Data Into Insights (Spring 2021): Difference between revisions

From CommunityData
 
(74 intermediate revisions by the same user not shown)
Line 7: Line 7:
:'''Instructor:''' [https://jeremydfoote.com Jeremy Foote]
:'''Email:''' jdfoote@purdue.edu
:'''Office Hours:''' Thursdays; 3:00-5:00pm and by appointment
:'''[[User:Jdfoote/OH|Office Hours]]:''' Fridays 10am-noon and by appointment
 


<div style="float:right;">__TOC__</div>
Line 18: Line 17:
Students who complete this course will be able to:
# Understand the role of narrative in interpreting and producing data analyses
# Competently import, process, and prepare data from analysis in the [https://www.r-project.org/ R programming language]
# Competently import, process, and prepare data for analysis in the [https://www.r-project.org/ R programming language]
# Critically analyze data visualizations and presentations, and recognize poor or misleading visualizations
# Produce beautiful, well-designed data visualizations in R using [https://ggplot2.tidyverse.org/ ggplot2]
Line 35: Line 34:
** '''Data Visualization: A Practical Introduction''' by Kieran Healy. [https://socviz.co/index.html Web version (free!)] or [https://amzn.to/2vfAixM Print version (Amazon)]
** '''R for Data Science''' by Hadley Wickham and Garrett Grolemund. [https://r4ds.had.co.nz/index.html Web version (free!)] or [http://amzn.to/2aHLAQ1 Print version (Amazon)]
** ''Effective Data Storytelling'' by Brent Dykes. [https://smile.amazon.com/dp/1119615712 Print version (Amazon)]
** '''Effective Data Storytelling''' by Brent Dykes. [https://purdue-primo-prod.hosted.exlibrisgroup.com/permalink/f/vjfldl/PURDUE_ALMA51860241510001081 Purdue libraries] or [https://smile.amazon.com/dp/1119615712 Print version (Amazon)]
 
* Other readings: Readings will be linked to from this page. Where necessary, they will be put on Brightspace
 
=== Reading Academic Articles ===


* Other readings: Other readings will be made available on Brightspace.
Some of the readings will be academic articles. I do not expect you to read every word of these articles. Rather, you should practice intentional directed skimming. [https://writingcenter.gmu.edu/guides/strategies-for-reading-academic-articles This article] gives a nice overview. The TL;DR is that you should carefully read the abstract, introduction, and conclusion. For the rest of the article, focus on section headings and topic sentences to extract the main ideas.


= Course logistics =
Line 53: Line 56:
This course will follow a "flipped" classroom model. I expect you to learn most of the content of the course asynchronously. The goal of our time together is not to tell you new things, but to consolidate knowledge and to clear up misconceptions.


The Tuesday meeting will be a collaborative, discussion-centric session. Typically, about half of each session will be devoted to going over assignments and the other half will be a discussion of the readings and videos from that week.
The Tuesday meeting will be a collaborative, discussion-centric session. Typically, about half of each session will be devoted to going over assignments and the other half will be a discussion of the readings and videos from that week. We will take collaborative notes [https://etherpad.wikimedia.org/p/com-495-data-insight using this Etherpad].
 
If you would like to create collaborative summaries of the readings, you can [https://etherpad.wikimedia.org/p/com-495-summaries use this Etherpad].


The Thursday meetings will be more like a lab. Some of these sessions will include synchronous activities but they will often be more of a co-working time, where you can work synchronously on assignments and I can be available to answer questions.
Line 61: Line 66:
Your first place to look for help should be each other. By asking and answering questions on Discord, you will help not only to build a repository of shared information but also to reinforce our learning community.


I will also hold office hours Thursday afternoons on Discord. If you come with a programming question, I will expect that you have already tried to solve it yourself in multiple ways and that you have discussed it with a classmate (e.g., on Discord). This policy lets me have time to help more students, but it's also a useful strategy. Often [https://en.wikipedia.org/wiki/Rubber_duck_debugging just trying to explain your code] can help you to recognize where you've gone wrong.
I will also hold office hours Friday mornings on Discord ([[User:Jdfoote/OH|sign up here]]). If you come with a programming question, I will expect that you have already tried to solve it yourself in multiple ways and that you have discussed it with a classmate (e.g., on Discord). This policy lets me have time to help more students, but it's also a useful strategy. Often [https://en.wikipedia.org/wiki/Rubber_duck_debugging just trying to explain your code] can help you to recognize where you've gone wrong.


I will also keep an eye on Discord during normal business hours. I encourage you to post questions there, and to use it as a space where we can help and instruct each other. In general, you should contact me there. I am also available by email. You can reach me at [mailto:jdfoote@purdue.edu jdfoote@purdue.edu]. I try hard to maintain a boundary between work and home and I typically respond only on weekdays during business hours.
=== Resources ===
Especially for the programming assignments, I will often create video walkthroughs that will be linked from the schedule. I also created the following general videos that may be helpful:
* Explanation of ggplot (and Chapter 3 in R4DS) [[https://purdue.brightspace.com/d2l/le/content/208726/viewContent/5507580/View Video]]
* Finding and fixing bugs in your code [[https://purdue.brightspace.com/d2l/le/content/208726/viewContent/5708092/View Video]] [[https://jeremydfoote.com/TDIS/week_8/debugging.Rmd R Markdown file]] [[https://jeremydfoote.com/TDIS/week_8/debugging.html HTML file]]


= Assignments =
Line 77: Line 89:
== Discussion Questions ==


This course will have two "modes". For much of the class, we will be reading about theories of communication and rhetoric, about principles of data visualization, etc. For these sessions, you will be required to submit 1-2 discussion questions on Discord on Monday by noon. I will then curate some of these questions (and add some of my own) to use to guide our discussion on Tuesday.
This course will have two "modes". For much of the class, we will be reading about theories of communication and rhetoric, about principles of data visualization, etc. For these sessions, you will be required to submit 1-2 discussion questions on Discord on Monday by noon. I will then curate some of these questions (and add some of my own) to use to guide our discussion on Tuesday. I will post the questions on the Etherpad at https://etherpad.wikimedia.org/p/com-495-data-insight
 
Questions should engage with the readings and either connect to other concepts or to the "real world". Here are some good example questions:
 
* The readings this week talked a lot about how data visualizations can be misleading. How can we tell when visualizations are intentionally trying to mislead versus when they are just poorly designed?
* I was confused by the reading on counterfactuals. We obviously can't really know what would have happened in different conditions, so why even try?
* Imagine you were asked to create an ad campaign to recruit students to Purdue. What types of appeals would you use and why?


During other weeks, we will be more focused on learning practical skills (mostly data manipulation and visualization in R). On those weeks, discussions will center around identifying places where folks are still confused and students will be randomly selected to share their responses to homework questions.
Line 126: Line 144:
* Exceed requirements, but in fairly straightforward ways - e.g., an additional post in discussion every week.
* Compose complete and sufficiently detailed reflections.
* Complete many of the homework assignments.
* Complete nearly all of the homework assignments, typically at a fairly high level


C: This reflects meeting the minimum expectations of the course. Students reaching this level of achievement
Line 133: Line 151:
* Be collegial and continue discussion, through asking simple or limited questions.
* Compose reflections with straightforward and easily manageable goals and/or avoid discussions of challenges.
* Not complete homework assignments or turn some in in a hasty or incomplete manner.
* Not complete homework assignments or turn many in in a hasty or incomplete manner.


D/F: These are reserved for cases in which students do not complete work or participate. Students may also be impeding the ability of others to learn.
Line 156: Line 174:


'''Assignment Due:'''  
* None
* [[/Discord signup|Sign up for Discord]] and introduce yourself
* Take [https://forms.gle/spJzcKBCsERVLHNSA this very brief survey]


'''Readings (before class):'''  
Line 169: Line 188:
'''Assignment Due:'''
* Read the entire syllabus (this document)
* Sign up for [https://discord.gg/WvzkwY4fDK Discord] and introduce yourself
* Take [https://forms.gle/spJzcKBCsERVLHNSA this very brief survey]


== Week 2: Storytelling and Narratives  ==
Line 178: Line 195:


'''Assignment Due:'''  
* Summary and discussion questions
* [[#Discussion Questions|Discussion questions]]




'''Readings (before class):'''  
* Zak, P. (2013). [https://greatergood.berkeley.edu/article/item/how_stories_change_brain How stories change the brain]
* Langston, C. [https://www.youtube.com/watch?v=3klMM9BkW5o How to use rhetoric to get what you want] (video)
* Leighfield, L. [https://boords.com/ethos-pathos-logos-aristotle-modes-of-persuasion Ethos, Pathos & Logos: Aristotle’s Modes of Persuasion]
Line 187: Line 205:
* [http://www.openculture.com/2014/02/kurt-vonnegut-masters-thesis-rejected-by-u-chicago.html Kurt Vonnegut's Shapes of Stories]
* Lafrance, A. [https://www.theatlantic.com/technology/archive/2016/07/the-six-main-arcs-in-storytelling-identified-by-a-computer/490733/ The Six Main Arcs in Storytelling, as Identified by an A.I.]
* (Optional) A Rulebook for Arguments (link on Brightspace)




Line 197: Line 216:


'''Assignment Due:'''
* [[#Discussion Questions|Discussion questions]]


'''Readings:'''  
'''Readings:'''
* Effective Data Storytelling (EDS) Ch. 1--3 ([https://purdue-primo-prod.hosted.exlibrisgroup.com/permalink/f/vjfldl/PURDUE_ALMA51860241510001081 Purdue libraries copy])
* Matei, S. [https://purdue.brightspace.com/d2l/le/content/208726/viewContent/4750659/View What is a (data) story?]
* [https://purdue.brightspace.com/d2l/le/content/208726/viewContent/5392546/View Counterfactuals and Storytelling lecture ] [4:49]
* (Optional) Levy, J. (2015). [https://www-tandfonline-com.ezproxy.lib.purdue.edu/doi/full/10.1080/09636412.2015.1070602 Counterfactuals, Causal Inference, and Historical Analysis]
* (Optional) [https://towardsdatascience.com/storytelling-for-data-scientists-317c2723aa31 Storytelling for Data Scientists]
* (Optional) [https://towardsdatascience.com/how-to-properly-tell-a-story-with-data-and-common-pitfalls-to-avoid-317d8817e0c9 How to properly tell a story with data — and common pitfalls to avoid]


'''Class Schedule:'''
Line 204: Line 230:
* Counterfactual thinking
* The role of statistics


== Week 4: The ethics of data stories (Part I) ==
Line 215: Line 238:
'''Assignment Due:'''
* Turn in your [[Self Assessment Reflection]] on Brightspace
* [[/Purdue WP Case|Case Study]] (Be prepared to talk about this case, based on the readings and the class so far)
* No Discussion Questions (but feel free to have discussions on Discord!)


'''Readings:'''  
* Salganik, M. (2017). [https://www.bitbybitbook.com/en/ethics/ethics-intro/ Chapter 6: Ethics] from ''Bit by Bit''.
* Kassner, M. [https://www.techrepublic.com/article/5-ethics-principles-big-data-analysts-must-follow/ 5 ethics principles big data analysts must follow]
* McNulty, K. (2018). [https://drkeithmcnulty.com/2018/07/22/beware-of-storytelling-in-data-and-analytics/ Beware of 'storytelling' in data and analytics]
* (Optional) Steinmann, M., Matei, S. A., & Collmann, J. (2016). A Theoretical Framework for Ethical Reflection in Big Data Research. (On Brightspace)


'''Class Schedule:'''
* Ethical frameworks
* What are ethical data stories?
* When do analysts need to make ethical decisions?
* Transparency, respect, beneficence, honesty


== Week 5: Where does data come from? ==
Line 226: Line 260:


'''Assignment Due:'''  
* [[#Discussion Questions|Discussion questions]]


'''Readings:'''
* [https://purdue.brightspace.com/d2l/le/content/208726/viewContent/5431820/View Where data comes from lecture] [14:02]
* Pelz, W. [https://courses.lumenlearning.com/suny-hccc-research-methods/chapter/chapter-6-measurement-of-constructs/ Measurement of Constructs] in ''Research Methods for the Social Sciences''.
* [https://uxplanet.org/dirty-data-what-is-it-and-how-to-prevent-it-742accad081e Dirty Data article]
* Salganik, M. [https://www.bitbybitbook.com/en/1st-ed/observing-behavior Observing behavior] in ''Bit by Bit''
* EDS Chapter 5
* Perkel, J. [https://www-nature-com.ezproxy.lib.purdue.edu/articles/d41586-018-05990-5 A toolkit for data transparency takes shape]
* (Optional) Tayi, G. K. and Ballou, D. P. (1998). [https://www.researchgate.net/publication/27297579_Examining_Data_Quality Examining Data Quality]


'''Class Schedule:'''


== Week 6: Introduction to R ==
Line 239: Line 279:


'''Assignment Due:'''  
 
* [[/R Lab 1|R Lab 1]]
** [https://purdue.brightspace.com/d2l/le/content/208726/viewContent/5457615/View Video to help with lab] [7:39]




'''Readings:'''  
* [https://purdue.brightspace.com/d2l/le/content/208726/viewContent/5477440/View Why Programming + Intro to R lecture] [12:53]
* [https://source.opennews.org/articles/what-i-learned-recreating-one-chart-using-24-tools/ What I Learned Recreating One Chart Using 24 Tools]. Lisa Charlotte Rost
* [https://r4ds.had.co.nz/introduction.html RFDS Ch. 1]
* [https://r4ds.had.co.nz/introduction.html R4DS Ch. 1]
(Optional)
* [https://rladiessydney.org/courses/ryouwithme/01-basicbasics-0/ Unit 1: Basic Basics (R Ladies Sydney)]
Line 250: Line 292:


'''Class Schedule:'''
* Why programming?
* Why R?
* Functions
* Variables
* Data frames
* Tidyverse


== Week 7: Making figures in R ==
Line 262: Line 298:


'''Assignment Due:'''  
* [[/R4DS Chapter 3 Exercises|R4DS Chapter 3 Exercises]]
** [https://purdue.brightspace.com/d2l/le/content/208726/viewContent/5507580/View Video overview of how to do assignment + ggplot explanation] [13:33]


'''Readings:'''  
* [https://r4ds.had.co.nz/data-visualisation.html R4DS Chapter 3]
* [https://socviz.co/gettingstarted.html DV Chapter 2]


'''Class Schedule:'''
* ggplot2


 
== Week 8: Manipulating and Aggregating Data ==
 
== Week 8: Visualization principles ==
 


March 9


'''Assignment Due:'''
* Start [[/R4DS Chapter 5 Exercises|R4DS Chapter 5 Exercises]]
** [https://purdue.brightspace.com/d2l/le/content/208726/viewContent/5562641/View Video explanation of homework] [26:45]
* Turn in your [[Self Assessment Reflection]] on Brightspace


'''Readings:'''  
'''Readings:'''
* [https://r4ds.had.co.nz/workflow-basics.html R4DS Chapter 4 - Workflow Basics]
* [https://r4ds.had.co.nz/transform.html R4DS Chapter 5 - Data transformation]


'''Class Schedule:'''
== Week 9: Visualization Principles ==


March 16


'''Assignment Due:'''
* [[/R4DS Chapter 5 Exercises|R4DS Chapter 5 Exercises]]
* [[#Discussion Questions|Discussion questions]]


== Week 9: Visualization Principles II ==
March 16
'''Assignment Due:'''


'''Readings:'''  
* [https://datavizm20.classes.andrewheiss.com/content/02-content/ Graphic Design] by Andrew Heiss. Make sure to watch all 4 videos.
* EDS Chapter 7
* Healy, K. [https://socviz.co/lookatdata.html Data Visualization Chapter 1]
* (Optional) Gelman, A. and Unwin, A. (2012). [http://www.stat.columbia.edu/~gelman/research/published/vis14.pdf Infovis and statistical graphics: Different goals, different looks].
* (Optional) Williams, R. (2008). [https://purdue-primo-prod.hosted.exlibrisgroup.com/primo-explore/fulldisplay?docid=PURDUE_ALMA51793773920001081&context=L&vid=PURDUE&lang=en_US&search_scope=everything&adaptor=Local%20Search%20Engine&tab=default_tab&query=any,contains,The%20Non-Designer%27s%20Design%20Book&mode=Basic The Non-Designer's Design Book], Chapters 1-6


'''Class Schedule:'''


March 18 - READING DAY


 
== Week 10: Visualization Principles II and Exploratory Data Analysis ==
 
== Week 10: Advanced visualizations in R ==


March 23


'''Assignment Due:'''  
* [[/Data Source Assignment|Submit the data source for your final project]]
* [[Data_Into_Insights_(Spring_2021)/Final_project#Step_1:_Identify_a_dataset|Submit the data source for your final project]]
* Submit 2 questions for take-home exam
* [[Data_Into_Insights_(Spring_2021)/Visualization Project|Visualization Project]]


'''Readings:'''  
* [https://socviz.co/groupfacettx.html#groupfacettx DV Chapter 4: Show the right numbers]
* EDS Chapter 8
* Hullman, J. [https://www-scientificamerican-com.ezproxy.lib.purdue.edu/article/how-to-get-better-at-embracing-unknowns/ How to get better at embracing unknowns]
* Yau, N. [https://flowingdata.com/2018/01/08/visualizing-the-uncertainty-in-data/ Visualizing the uncertainty in data].
* (Optional) Review [https://r4ds.had.co.nz/transform.html R4DS Ch 5]


'''Class Schedule:'''
* Summarize and discuss readings
* Peer feedback on data source + visualization project
* R4DS Chapter 5 (continued)


== Week 11: Text as data ==
Line 316: Line 367:


'''Assignment Due:'''
* [[#Discussion Questions|Discussion questions]] - One discussion question and one or more examples of "bad" visualizations that you found


'''Readings:'''
* Grimmer, J., & Stewart, B. M. (2013). [https://www.cambridge.org/core/services/aop-cambridge-core/content/view/F7AAC8B2909441603FEB25C156448F20/S1047198700013401a.pdf/text-as-data-the-promise-and-pitfalls-of-automatic-content-analysis-methods-for-political-texts.pdf Text as data: The promise and pitfalls of automatic content analysis methods for political texts]. Political Analysis.
* Reagan, A. J., Mitchell, L., Kiley, D., Danforth, C. M., & Dodds, P. S. (2016). [https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-016-0093-1 The emotional arcs of stories are dominated by six basic shapes]. EPJ Data Science.


'''Class Schedule:'''
* Guest lecture by [https://ryanjgallagher.github.io/ Ryan J. Gallagher]


 
== Week 12: Advanced visualizations in R ==
== Week 12: Importing and cleaning data ==


April 6


'''Assignment Due:'''
'''Assignment Due:'''  
* [[Self Assessment Reflection]]
* [[/Exam|Take-home Exam]]
* [[/Story Time|Story Time Mini-project]]
 
'''Readings:'''
* [https://socviz.co/maps.html#maps DV Chapter 7: Maps]
* [https://r4ds.had.co.nz/graphics-for-communication.html R4DS Ch. 28]


'''Readings:'''
'''Class Schedule:'''
* Maps
* [https://jeremydfoote.com/Communication-and-Social-Networks/week_6/ggraph_walkthrough.html Networks]
* Annotations


== Week 13: Manipulating and aggregating data ==
== Week 13: Importing and cleaning data ==


April 13
Line 340: Line 402:


* Synchronous session moved to April 15


April 15


'''Assignment Due:'''
* [[/Final project proposal|Proposal for final project]]
* [[Data Into Insights (Spring 2021)/Final project#Step_2:_Explore_the_data_and_write_a_proposal|Proposal for final project]]
* [[/R4DS Chapter 12|R4DS Chapter 12 (12.2 and 12.3)]]
 


'''Readings:'''
* [https://r4ds.had.co.nz/data-import.html R4DS Chapters 11--12]
* (Optional) Wickham, H. (2014). [http://vita.had.co.nz/papers/tidy-data.pdf Tidy Data]. Journal of statistical software, 59(10), 1-23.
* (Optional) Huntington-Klein, N. [https://www.youtube.com/watch?v=CnY5Y5ANnjE&t=785s Data Wrangling with R and the Tidyverse]


'''Course schedule:'''
'''Class schedule:'''
* Provide peer feedback on final project proposal


Line 357: Line 423:


'''Assignment Due:'''  
* [[/Final project proposal|New version of final project proposal]] (edited following peer feedback)
* [[#Discussion Questions|One discussion question]]
* [[Data_Into_Insights_(Spring_2021)/Final_project#Step_2:_Explore_the_data_and_write_a_proposal|New version of final project proposal]] (edited following peer feedback)
* [[/R4DS Chapter 12|R4DS Chapter 12 (12.4-12.6)]]


'''Readings:'''
* Kim, Y. et al. (2017). [http://users.eecs.northwestern.edu/~jhullman/explaining_the_gap.pdf Explaining the Gap: Visualizing One’s Predictions Improves Recall and Comprehension of Data].
* Knaflic, C. N. (2019). [https://purdue.alma.exlibrisgroup.com/view/uresolver/01PURDUE_PUWL/openurl?ctx_enc=info:ofi/enc:UTF-8&ctx_id=10_1&ctx_tim=2020-06-13T12%3A39%3A32IST&ctx_ver=Z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&url_ver=Z39.88-2004&rfr_id=info:sid/primo.exlibrisgroup.com-PURDUE_ALMA&req_id=_c20e3fe9e4a9a31b0162ece2023b8d45&rft_dat=ie=01PURDUE_PUWL:51807454010001081,language=eng,view=PURDUE&svc_dat=viewit&u.ignore_date_coverage=true&req.skin=PUWL&Force_direct=true&is_new_ui=true Storytelling with Data] Chapter 6
* EDS Chapter 9


== Week 15: Ethics of data stories (Part II) ==
Line 366: Line 437:


'''Assignment Due:'''  
* [[/Final project rough draft|Final project rough draft]] for peer feedback
* 1 [[#Discussion Questions|Discussion question]]
* [[Data_Into_Insights_(Spring_2021)/Final_project#Step_3:_Write_a_rough_draft|Final project rough draft]] for peer feedback


'''Readings:'''
* Re-read McNulty, K. (2018). [https://drkeithmcnulty.com/2018/07/22/beware-of-storytelling-in-data-and-analytics/ Beware of 'storytelling' in data and analytics] and reflect on how you see this differently now that you know more about data storytelling
'''Topics:'''
* What does an ethical data story look like?


April 29
Line 374: Line 450:
'''Assignment Due:'''
* Peer feedback (via email or Discord)


== Week 16: Finals week  ==

Latest revision as of 19:23, 8 May 2021

Course Information

COM 495/6/7: Turning Data into Insight and Stories
Location: ONLINE
Class Hours: Tuesdays and Thursdays; 10:30-11:45am

Instructor

Instructor: Jeremy Foote
Email: jdfoote@purdue.edu
Office Hours: Fridays 10am-noon and by appointment

Course Overview and Learning Objectives

We are increasingly surrounded by data, and those with the technical skills to analyze it are highly sought after. Even more valuable are those who can not only identify insights from data, but can communicate and persuade with those insights. This course will focus on both developing data skills and crafting persuasive data stories.

Students who complete this course will be able to:

  1. Understand the role of narrative in interpreting and producing data analyses
  2. Competently import, process, and prepare data for analysis in the R programming language
  3. Critically analyze data visualizations and presentations, and recognize poor or misleading visualizations
  4. Produce beautiful, well-designed data visualizations in R using ggplot2
  5. Craft compelling data presentations

Required resources and texts

Laptop

This is a data analysis class and you will need access to a decent computer. You will need a machine with at least 2GB of memory. Windows, Mac OS, and Linux are all fine but an iPad or Android tablet won't work.


Readings

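  • Data Visualization: A Practical Introduction by Kieran Healy. Web version (free!) or Print version (Amazon)
  • R for Data Science by Hadley Wickham and Garrett Grolemund. Web version (free!) or Print version (Amazon)
  • Effective Data Storytelling by Brent Dykes. Purdue libraries or Print version (Amazon)
  • Other readings: Readings will be linked to from this page. Where necessary, they will be put on Brightspace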

Reading Academic Articles

Some of the readings will be academic articles. I do not expect you to read every word of these articles. Rather, you should practice intentional directed skimming. This article gives a nice overview. The TL;DR is that you should carefully read the abstract, introduction, and conclusion. For the rest of the article, focus on section headings and topic sentences to extract the main ideas.

Course logistics

Note About This Syllabus

This is my first time teaching this course and this syllabus will be a dynamic document. Although the core expectations for this class are fixed, the details of readings and assignments may shift based on how the class goes. As a result, there are three important things to keep in mind:

  1. Although details on this syllabus will change, I will not change readings or assignments less than one week before they are due. If I don't fill in a "To Be Determined" one week before it's due, it is dropped. If you plan to read more than one week ahead, contact me first.
  2. Closely monitor the class Discord. Because this is a wiki, you will be able to track every change by clicking the history button on this page. I will also summarize these changes in an announcement on Discord that should be emailed to everybody in the class.
  3. I will ask the class for voluntary anonymous feedback frequently. Please let me know what is working and what can be improved.

Class Sessions

This course will follow a "flipped" classroom model. I expect you to learn most of the content of the course asynchronously. The goal of our time together is not to tell you new things, but to consolidate knowledge and to clear up misconceptions.

The Tuesday meeting will be a collaborative, discussion-centric session. Typically, about half of each session will be devoted to going over assignments and the other half will be a discussion of the readings and videos from that week. We will take collaborative notes using this Etherpad.

If you would like to create collaborative summaries of the readings, you can use this Etherpad.

The Thursday meetings will be more like a lab. Some of these sessions will include synchronous activities but they will often be more of a co-working time, where you can work synchronously on assignments and I can be available to answer questions.

Getting Help

Your first place to look for help should be each other. By asking and answering questions on Discord, you will help not only to build a repository of shared information but also to reinforce our learning community.

I will also hold office hours Friday mornings on Discord (sign up here). If you come with a programming question, I will expect that you have already tried to solve it yourself in multiple ways and that you have discussed it with a classmate (e.g., on Discord). This policy lets me have time to help more students, but it's also a useful strategy. Often just trying to explain your code can help you to recognize where you've gone wrong.

I will also keep an eye on Discord during normal business hours. I encourage you to post questions there, and to use it as a space where we can help and instruct each other. In general, you should contact me there. I am also available by email. You can reach me at jdfoote@purdue.edu. I try hard to maintain a boundary between work and home and I typically respond only on weekdays during business hours.

Resources

Especially for the programming assignments, I will often create video walkthroughs that will be linked from the schedule. I also created the following general videos that may be helpful:

Assignments

There will be multiple types of assignments, designed to encourage learning in different ways.

Participation

This will be a very participatory class, and I expect you to be an active member of our class, engaged in helping us all to gain insight and inspiration. This includes paying attention in class, participating in activities, and being actively engaged in learning, thinking about, and trying to understand the material.

This also includes doing the readings and watching the videos. To make sure that everyone has an opportunity to participate and to encourage you to do the assignments, I will randomly select students to answer discussion questions or to explain portions of homework assignments and labs. I will keep track of the quantity and quality of your responses and I will make that data available to you to help guide our discussion around grades.

Discussion Questions

This course will have two "modes". For much of the class, we will be reading about theories of communication and rhetoric, about principles of data visualization, etc. For these sessions, you will be required to submit 1-2 discussion questions on Discord on Monday by noon. I will then curate some of these questions (and add some of my own) to use to guide our discussion on Tuesday. I will post the questions on the Etherpad at https://etherpad.wikimedia.org/p/com-495-data-insight

Questions should engage with the readings and either connect to other concepts or to the "real world". Here are some good example questions:

  • The readings this week talked a lot about how data visualizations can be misleading. How can we tell when visualizations are intentionally trying to mislead versus when they are just poorly designed?
  • I was confused by the reading on counterfactuals. We obviously can't really know what would have happened in different conditions, so why even try?
  • Imagine you were asked to create an ad campaign to recruit students to Purdue. What types of appeals would you use and why?

During other weeks, we will be more focused on learning practical skills (mostly data manipulation and visualization in R). On those weeks, discussions will center around identifying places where folks are still confused and students will be randomly selected to share their responses to homework questions.

Homework/Labs

There will be a number of homework assignments. At the beginning of the class, these will be designed to help you to grasp foundational concepts about storytelling, visualization, and data. As the class progresses, more and more of them will be based on learning and developing proficiency in visualizing data in R.

Exams

There will be one in-class exam. It will assess your understanding of core concepts around storytelling and visualization.


Final Project

The main outcome of this course will be your final project, which will be a data presentation, either as a website or a slide deck + presentation. A detailed description of the project is at this link.

There will be a number of intermediate assignments through the semester to help you to identify a dataset, explore the data for insights, and get and give feedback on visualizations and story elements.

Grades

This course will follow a "self-assessment" philosophy. I am more interested in helping you to learn things that will be useful to you than in assigning grades. In general, I think that my time is much better spent in providing better feedback and in being available to work through problems together.

The university still requires grades, so you will be leading the evaluation of your work. This will be completed with me in four stages, at the end of weeks 4, 8, 12, and 16. In each stage, you will use this form to reflect on what you have accomplished thus far and how it has met, not met, or exceeded expectations, based on both rubrics and personal goals and objectives. At each of these stages you will receive feedback on your assessments. By the end of the semester, you should have a clear vision of your accomplishments and growth, which you will turn into a grade. As the instructor-of-record, I maintain the right to disagree with your assessment and alter grades as I see fit, but any time that I do this it will be accompanied by an explanation and discussion. These personal assessments, which should reflect honest and meaningful consideration of your work, will be the most important factor in final grades.

We will use the following rubric in our assessment:

  • 20%: Class participation, including attendance and participation in discussions and group work
  • 20%: Labs and homework assignments
  • 25%: Exam
  • 35%: Final Project

The exam will be graded like a normal exam and the score will make up 25% of your grade. For the rest of the assignments (and the other 75% of your grade), I will provide feedback which will inform an ongoing conversation about your work.
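As a rough illustration of how the rubric weights above combine, here is a minimal Python sketch. Only the four percentages come from this syllabus. The function name, the component keys, and the example scores are hypothetical, and because everything except the exam is self-assessed, a number like this is at most a starting point for our conversation, not a formula for your grade.

```python
# Rubric weights from this syllabus, expressed as fractions of the final grade.
WEIGHTS = {
    "participation": 0.20,
    "labs_homework": 0.20,
    "exam": 0.25,
    "final_project": 0.35,
}


def weighted_score(scores):
    """Combine per-component scores (each on a 0-100 scale) into one 0-100 total.

    Hypothetical helper: the 0-100 scale for non-exam components is an
    assumption made for illustration only.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up example scores, not real data.
example = {"participation": 90, "labs_homework": 85, "exam": 80, "final_project": 95}
total = weighted_score(example)
print(f"Weighted total: {total:.2f}")
```

The weights sum to 1, so the result stays on the same 0-100 scale as the inputs.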

My interpretation of grade levels (A, B, C, D/F) is the following:

A: Reflects work that exceeds expectations on multiple fronts and to a great degree. Students reaching this level of achievement will:

  • Do what it takes to learn the principles and techniques of data storytelling, including looking to outside sources if necessary.
  • Engage thoughtfully with an ambitious final project.
  • Take intellectual risks, offering interpretations based on synthesizing material and asking for feedback from peers.
  • Share work early, allowing extra time for engagement with others.
  • Write reflections that grapple meaningfully with lessons learned as well as challenges.
  • Complete all or nearly all homework assignments at a high level.

B: Reflects strong work. Work at this level will be of consistently high quality. Students reaching this level of achievement will:

  • Produce work that is safer or more consistent than the work described above.
  • Ask meaningful questions of peers and engage them in fruitful discussion.
  • Exceed requirements, but in fairly straightforward ways - e.g., an additional post in discussion every week.
  • Compose complete and sufficiently detailed reflections.
  • Complete nearly all of the homework assignments, typically at a fairly high level.

C: This reflects meeting the minimum expectations of the course. Students reaching this level of achievement will:

  • Turn in and complete the final project on time.
  • Be collegial and continue discussion, through asking simple or limited questions.
  • Compose reflections with straightforward and easily manageable goals and/or avoid discussions of challenges.
  • Leave homework assignments incomplete, or turn many of them in hastily or only partly finished.

D/F: These are reserved for cases in which students do not complete work or participate. Students may also be impeding the ability of others to learn.

Extra Credit for Participating in Research Studies

The Brian Lamb School of Communication uses an online program that expedites the process of recruiting, signing up, and granting extra credit to students for participating in research studies. The program is called the Research Participation System, and it provides an easy online method for you to sign up for research studies, to keep track of the studies you have completed, and to view how many credits you have earned for each study. You can access the system online at any time, from any computer with a standard web browser. By participating in studies done within the Brian Lamb School of Communication, you can learn first hand how a study is conducted, you can contribute to the advancement of the field, and you can improve your grade by earning extra credit.

  • You earn ½ percent of extra credit for every half-hour that you participate in a study. The maximum extra credit that you can earn for this course is 3%, which will be added to your total course points.
  • If you sign up to participate in a study and fail to show up without canceling your appointment in advance (up to 2 hours before the study), you can be restricted from signing up for any studies for 30 days. You may quickly cancel your appointment online using the Research Participation System.
  • Please review the instructions before you sign up for studies; to view the instructions go to https://www.cla.purdue.edu/communication/research/participation/students.html
  • You can sign up to participate in studies by logging into http://purdue-comm.sona-systems.com/.
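The extra-credit arithmetic above can be sketched in a few lines of Python. The rate (½ percent per half-hour) and the 3% cap come from this syllabus. The function name and the choice to count only completed half-hours are assumptions made for illustration, since the Research Participation System may credit partial sessions differently.

```python
CREDIT_PER_HALF_HOUR = 0.5  # percent of extra credit per half-hour (from the syllabus)
MAX_EXTRA_CREDIT = 3.0      # maximum percent for this course (from the syllabus)


def extra_credit(minutes_participated):
    """Return the percent of extra credit earned for a total number of study minutes.

    Assumption: only completed half-hours count, so 45 minutes is
    treated the same as 30 minutes.
    """
    if minutes_participated < 0:
        raise ValueError("minutes_participated must be non-negative")
    half_hours = minutes_participated // 30
    return min(half_hours * CREDIT_PER_HALF_HOUR, MAX_EXTRA_CREDIT)


# Three half-hour sessions.
print(extra_credit(90))
# Far more participation than the cap allows.
print(extra_credit(600))
```

Under these assumptions, reaching the 3% cap takes six half-hours, or three hours of participation in total.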

Schedule

NOTE: This section will be modified throughout the course to meet the class's needs. Check back weekly.


Week 1: Introduction

January 19

Assignment Due:

Readings (before class):

  • None

Class Schedule:

  • Class overview and expectations — We'll walk through this syllabus.


January 21

Assignment Due:

  • Read the entire syllabus (this document)

Week 2: Storytelling and Narratives

January 26

Assignment Due:


Readings (before class):


Class Schedule:

Week 3: Data insights and data stories

February 2

Assignment Due:

Readings:

Class Schedule:

  • Identifying insights
  • Counterfactual thinking
  • The role of statistics

Week 4: The ethics of data stories (Part I)

February 9

Assignment Due:

  • Turn in your Self Assessment Reflection on Brightspace
  • Case Study (Be prepared to talk about this case, based on the readings and the class so far)
  • No Discussion Questions (but feel free to have discussions on Discord!)

Readings:


Class Schedule:

  • Ethical frameworks
  • What are ethical data stories?
  • When do analysts need to make ethical decisions?
  • Transparency, respect, beneficence, honesty

Week 5: Where does data come from?

February 16

Assignment Due:

Readings:

Class Schedule:

Week 6: Introduction to R

February 23

Assignment Due:


Readings:

(Optional)

Class Schedule:

Week 7: Making figures in R

March 2

Assignment Due:

Readings:

Class Schedule:

  • ggplot2

Week 8: Manipulating and Aggregating Data

March 9

Assignment Due:

Readings:

Week 9: Visualization Principles

March 16

Assignment Due:


Readings:

Class Schedule:

March 18 - READING DAY

Week 10: Visualization Principles II and Exploratory Data Analysis

March 23

Assignment Due:

Readings:

Class Schedule:

  • Summarize and discuss readings
  • Peer feedback on data source + visualization project
  • R4DS Chapter 5 (continued)

Week 11: Text as data

March 30

Assignment Due:

  • Discussion questions - One discussion question and one or more examples of "bad" visualizations that you found

Readings:


Class Schedule:

Week 12: Advanced visualizations in R

April 6

Assignment Due:

Readings:

Class Schedule:

Week 13: Importing and cleaning data

April 13

READING DAY

  • Synchronous session moved to April 15

April 15

Assignment Due:


Readings:

Class schedule:

  • Provide peer feedback on final project proposal

Week 14: Crafting data stories

April 20

Assignment Due:

Readings:

Week 15: Ethics of data stories (Part II)

April 27

Assignment Due:

Readings:

Topics:

  • What does an ethical data story look like?

April 29

Assignment Due:

  • Peer feedback (via email or Discord)

Week 16: Finals week

Assignment Due:

Policies

Attendance

In general, I expect students to attend our Tuesday meetings and to typically attend our Thursday meetings. It is expected that students communicate well in advance to faculty so that arrangements can be made for making up the work that was missed. It is your responsibility to seek out support from classmates for notes, handouts, and other information.

Only the instructor can excuse a student from a course requirement or responsibility. When conflicts can be anticipated, such as for many University-sponsored activities and religious observations, the student should inform the instructor of the situation as far in advance as possible. For unanticipated or emergency conflicts, when advance notification to an instructor is not possible, the student should contact me as soon as possible on Discord or by email. In cases of bereavement, quarantine, or isolation, the student or the student’s representative should contact the Office of the Dean of Students via email or phone at 765-494-1747. Our course Brightspace includes a link to the Dean of Students under 'Campus Resources.'

Classroom Discussions and Peer Feedback

Throughout the course, you may receive, read, collaborate, and/or comment on classmates’ work. These assignments are for class use only. You may not share them with anybody outside of class without explicit written permission from the document’s author and pertaining to the specific piece.

It is essential to the success of this class that all participants feel comfortable discussing questions, thoughts, ideas, fears, reservations, apprehensions and confusion. Therefore, you may not create any audio or video recordings during class time nor share verbatim comments with those not in class linked to people’s identities unless you get clear and explicit permission. If you want to share general impressions or specifics of in-class discussions with those not in class, please do so without disclosing personal identities or details.


Academic Integrity

While I encourage collaboration, I expect that any work that you submit is your own. Basic guidelines for Purdue students are outlined here but I expect you to be exemplary members of the academic community. Please get in touch if you have any questions or concerns.


Nondiscrimination

I strongly support Purdue's policy of nondiscrimination (below). If you feel like any member of our classroom--including me--is not living up to these principles, then please come and talk to me about it.

Purdue University is committed to maintaining a community which recognizes and values the inherent worth and dignity of every person; fosters tolerance, sensitivity, understanding, and mutual respect among its members; and encourages each individual to strive to reach his or her own potential. In pursuit of its goal of academic excellence, the University seeks to develop and nurture diversity. The University believes that diversity among its many members strengthens the institution, stimulates creativity, promotes the exchange of ideas, and enriches campus life.


Accessibility

Purdue University strives to make learning experiences as accessible as possible. If you anticipate or experience physical or academic barriers based on disability, you are welcome to let me know so that we can discuss options. You are also encouraged to contact the Disability Resource Center at: drc@purdue.edu or by phone: 765-494-1247.


Emergency Preparation

In the event of a major campus emergency, I will update the requirements and deadlines as needed.


Mental Health

If you or someone you know is feeling overwhelmed, depressed, and/or in need of mental health support, services are available. For help, such individuals should contact Counseling and Psychological Services (CAPS) at 765-494-6995 during and after hours, on weekends and holidays, or by going to the CAPS office on the second floor of the Purdue University Student Health Center (PUSH) during business hours.


Incompletes

A grade of incomplete (I) will be given only in unusual circumstances. The request must describe the circumstances, along with a proposed timeline for completing the course work. Submitting a request does not ensure that an incomplete grade will be granted. If granted, you will be required to fill out and sign an “Incomplete Contract” form that will be turned in with the course grades. Any requests made after the course is completed will not be considered for an incomplete grade.


Additional Policies

Links to additional Purdue policies are on our Brightspace page. If you have questions about policies please get in touch.