Editing Statistics and Statistical Programming (Spring 2019)

From CommunityData


Latest revision Your text
Line 2: Line 2:


:'''Statistics and Statistical Programming'''
:'''Statistics and Statistical Programming'''
::Media, Technology & Society (MTS) 525
:'''MTS 525''' Media, Technology & Society
::Thursday 9-11:50am
:'''Northwestern University''' Spring 2019
::Frances Searle Building, Room 1-483
:'''Instructor:''' [http://aaronshaw.org Aaron Shaw] ([https://communication.northwestern.edu/faculty/AaronShaw Northwestern University])
::Spring, 2019
::Northwestern University
 
:'''Instructor:''' [http://aaronshaw.org Aaron Shaw] ([mailto:aaronshaw@northwestern.edu aaronshaw@northwestern.edu])
::Office Hours: M/Th 1-3pm (or by appointment)
::FSB 2-142
 
:'''Teaching Assistant:''' [https://jeremydfoote.com/ Jeremy Foote] ([mailto:jdfoote@u.northwestern.edu jdfoote@u.northwestern.edu])
::Office Hours: Tue/Wed 1-3pm (or by appointment)
:: FSB 2-419 (Next to CollabLab)
 
:'''Course Websites''':
:'''Course Websites''':
:* We will use [https://canvas.northwestern.edu/courses/90927 Canvas] for [https://canvas.northwestern.edu/courses/90927/announcements announcements], [https://canvas.northwestern.edu/courses/90927/assignments turning in most assignments], and [https://canvas.northwestern.edu/courses/90927/discussion_topics discussions].
:* We will use [https://canvas.northwestern.edu/courses/90927 Canvas] for [https://canvas.northwestern.edu/courses/90927/announcements announcements], [https://canvas.northwestern.edu/courses/90927/assignments turning in some assignments], and [https://canvas.northwestern.edu/courses/90927/discussion_topics discussions].
:* Everything else will be linked on this page.
:* Everything else will be linked on this page.
<!---:* List of student git repositories (will be a link)--->
:* List of student git repositories (will be a link)




Line 52: Line 41:
This syllabus will be a dynamic document that will evolve throughout the quarter. Although the core expectations are fixed, the details will shift. As a result, please keep in mind the following:
This syllabus will be a dynamic document that will evolve throughout the quarter. Although the core expectations are fixed, the details will shift. As a result, please keep in mind the following:


# I will not add readings or assignments less than one week before they are due. If I don't add something or fill in a "To Be Determined" one week before it's due, it is dropped. If you plan to read more than one week ahead, contact me first.
# I will not add readings or assignments less than one week before they are due. If I don't fill in a "To Be Determined" one week before it's due, it is dropped. If you plan to read more than one week ahead, contact me first.
# Closely monitor your email and/or [https://canvas.northwestern.edu the announcements section on the course website on Canvas]. When I make changes, these changes will be recorded in [http://wiki.communitydata.cc/ the history of this page] so that you can track what has changed. I will also do my best to summarize these changes in an announcement on Canvas that will be emailed to everybody in the class.
# Closely monitor your email and/or [https://canvas.northwestern.edu the announcements section on the course website on Canvas]. When I make changes, these changes will be recorded in [http://wiki.communitydata.cc/ the history of this page] so that you can track what has changed. I will also do my best to summarize these changes in an announcement on Canvas that will be emailed to everybody in the class.
# I will ask the class for voluntary anonymous feedback — especially toward the beginning of the quarter. Please let me know what is working and what can be improved. In the past, I have made many adjustments based on this feedback.
# I will ask the class for voluntary anonymous feedback — especially toward the beginning of the quarter. Please let me know what is working and what can be improved. In the past, I have made many adjustments based on this feedback.
Line 66: Line 55:
I will also be assigning several chapters from the following:
I will also be assigning several chapters from the following:


* Reinhart, Alex. 2015. ''Statistics Done Wrong: The Woefully Complete Guide''. SF, CA: No Starch Press. ([https://search.library.northwestern.edu/primo-explore/fulldisplay?docid=01NWU_ALMA51732460650002441&context=L&vid=NULVNEW&search_scope=NWU&tab=default_tab&lang=en_US Safari online via NU libraries])
* Reinhart, Alex. 2015. ''Statistics Done Wrong: The Woefully Complete Guide''. SF, CA: No Starch Press. ([https://www.safaribooksonline.com/library/view/statistics-done-wrong/9781457189845/ Safari online via NU libraries])


This book provides a conceptual introduction to some common failures in statistical analysis that you should learn to recognize and avoid. It was also written by a Ph.D. student. You have access to an electronic copy via the NU library, but you may find it helpful to purchase.
This book provides a conceptual introduction to some common failures in statistical analysis that you should learn to recognize and avoid. It was also written by a Ph.D. student. You have access to an electronic copy via the NU library, but you may find it helpful to purchase.
Line 83: Line 72:
* [http://rseek.org/ Rseek] — Rseek is a modified version of Google that searches only R-related websites. R can be hard to search for because "R" is a single, common letter. This has become much easier over time as R has become more popular, but when it is still a problem, Rseek is a good solution.
* [http://rseek.org/ Rseek] — Rseek is a modified version of Google that searches only R-related websites. R can be hard to search for because "R" is a single, common letter. This has become much easier over time as R has become more popular, but when it is still a problem, Rseek is a good solution.
* [https://ggplot2.tidyverse.org/ ggplot2 documentation] — Ggplot is a powerful data visualization package for R that I recommend highly. The documentation is indispensable for learning how to use it.
* [https://ggplot2.tidyverse.org/ ggplot2 documentation] — Ggplot is a powerful data visualization package for R that I recommend highly. The documentation is indispensable for learning how to use it.
* [https://depts.washington.edu/madlab/proj/Rstats/ Statistical Analysis and Reporting in R] — A set of resources created and distributed by Jacob Wobbrock (University of Washington, School of Information) in conjunction with a MOOC he teaches. Contains cheatsheets, code snippets, and data to help execute commonly encountered statistical procedures in R.
* [https://www.datacamp.com DataCamp] offers introductory R courses. Northwestern usually has some free accounts that get passed out via Research Data Services each quarter. Apparently, if you are taking or teaching relevant coursework, instructors can [https://www.datacamp.com/groups/education request] free access to DataCamp for their courses from DataCamp. If folks are interested in this, I can reach out.
Computing resources:
* If you are planning to analyze large-scale data (i.e., data that won't fit in memory on your laptop) then you will want to sign up for a research allocation on Quest, which is Northwestern's high-performance computing cluster. Instructions on how to do that are [[Statistics_and_Statistical_Programming_(Spring_2019)/Quest_at_Northwestern|here]].


== Assignments ==
== Assignments ==
Line 101: Line 85:
* '''Empirical paper questions''' about other assigned readings.  
* '''Empirical paper questions''' about other assigned readings.  


You should submit your solutions to the programming challenges (feel free to submit the others if you like, but they're not required!) ahead of each class session. While I will not grade them, we will spend a good chunk of class going through the answers to the assignment due on that day.
You should submit your solutions to the programming challenges ahead of each class session. While I will not grade them, we will spend a good chunk of class going through the answers to the assignment due on that day.


Because randomness is extremely important in statistics, I will use a small R program to '''randomly call on''' students to walk through their answers to statistics questions and empirical paper questions in class. We'll then discuss the answers, address points of confusion, and consider alternative approaches as a group.
Because randomness is extremely important in statistics, I will use a small R program to '''randomly call on''' students to walk through their answers to statistics questions and empirical paper questions in class. We'll then discuss the answers, address points of confusion, and consider alternative approaches as a group.
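
The random cold-call idea can be illustrated with a minimal sketch. This is not the R program used in class (that program is not shown on this page); it is a Python stand-in, and the function name <code>cold_call_order</code>, the roster, and the seed value are all hypothetical. It assumes each student should be called on at most once per pass, so it shuffles the roster instead of sampling with replacement.

```python
import random


def cold_call_order(students, seed=None):
    """Return the roster in a random order, each student exactly once.

    Hypothetical helper for illustration only; the instructor's actual
    R program may work differently.
    """
    rng = random.Random(seed)  # a seed makes the order reproducible
    order = list(students)     # copy so the caller's roster is not modified
    rng.shuffle(order)
    return order


# Placeholder roster (assumed names, not real students).
roster = ["Student A", "Student B", "Student C", "Student D"]
for position, name in enumerate(cold_call_order(roster, seed=525), start=1):
    print(position, name)
```

Shuffling guarantees that nobody is called twice before everyone has been called once. Sampling with replacement would be the alternative if repeat calls are acceptable.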
Line 107: Line 91:
For the programming challenges, you should submit code for your solutions before class (more on how in a moment) so we can walk through the material together. If you get completely stuck on a problem, that's okay, but please share whatever code you have so that you can tell us what you did and what you were thinking.
For the programming challenges, you should submit code for your solutions before class (more on how in a moment) so we can walk through the material together. If you get completely stuck on a problem, that's okay, but please share whatever code you have so that you can tell us what you did and what you were thinking.


Coming to class will be profoundly important to learning the material and to your final grade. Although the problem sets will not be graded, it is critical that you be present and able to discuss your answers to each of the questions. Your ability to do so will figure prominently in your participation grade for the course (40% of your final grade).
Coming to class will be profoundly important to learning the material and to your final grade. Although the problem sets will not be graded, it is critical that you be present and able to discuss your answers to each of the questions. Your ability to do so will figure prominently in your participation grade for the course (40% of your final grade). More on


I strongly encourage you to form groups to work on the problem sets if you find that helpful; however, you must still submit your work individually and respond to my cold-call prompts in class individually to help ensure that you learn and understand the material.
I strongly encourage you to form groups to work on the problem sets if you find that helpful; however, you must still submit your work individually and respond to my cold-call prompts in class individually to help ensure that you learn and understand the material.
Line 142: Line 126:


;Due date: Thursday, May 16, 2019
;Due date: Thursday, May 16, 2019
;Maximum length: ~5 pages
;Maximum length: 5 pages


The project planning document is a basic shell/outline of an empirical quantitative research paper. Your planning document should have the following sections: (a) Rationale, (b) Objectives; (b.1) General objectives; (b.2) Specific objectives; (c) (Null) hypotheses; (d) Conceptual diagram and explanation of the relationship(s) you plan to test; (e) Measures; (f) Dummy tables/figures; (g) anticipated finding(s) and research contribution(s). Longer descriptions of each of these planning document sections (as well as a few others) can be found [[CommunityData:Planning document|on this wiki page]].
The project planning document is a basic shell/outline of an empirical quantitative research paper. Your planning document should have the following sections: (a) Rationale, (b) Objectives; (b.1) General objectives; (b.2) Specific objectives; (c) Null hypotheses; (d) Conceptual diagram and/or explanation of the relationship you plan to test; (e) Measures; (f) Dummy tables. Descriptions of each of these planning document sections are available [[TODO-planningdoc|on this wiki page]].


I have also provided three example planning documents via our Canvas site:
An exemplary planning document from public health researcher Mika Matsuzaki is [https://canvas.northwestern.edu online in Canvas]. Your diagram will likely be much less complicated than Matsuzaki's. Also, please don't be distracted by the fact that Matsuzaki does public health research. You can (and should!) emulate the form rather than the content. You can also check out [http://ajcn.nutrition.org/content/99/6/1450.full the published paper] to see how the project wound up.
* [https://canvas.northwestern.edu/files/6908602/download?download_frd=1 One by public health researcher Mika Matsuzaki]. The first planning document I ever saw and still one of the best. It's missing a measures section. It's also focused on a research context that is probably very different from yours, but try not to get bogged down by that and imagine how you might map the structure of the document to your own work.
 
* [https://canvas.northwestern.edu/files/6919735/download?download_frd=1 One by Jim Maddock] created as part of a qualifying exam earlier in 2019. Jim doesn't provide dummy tables or anticipated findings/contributions, but he has an especially phenomenal explanation of the conceptual relationships and processes he wants to test.  
Please note that the Matsuzaki planning document includes everything except a "Measures" section. Your Measures section should include a two-column table where column 1 is the name of each variable in your analysis and column 2 describes the operationalization of each measure and (if necessary) how you will create it.
* [https://canvas.northwestern.edu/files/6908606/download?download_frd=1 One provided as an appendix to Gerber and Green's excellent textbook, ''Field Experiments: Design, Analysis, and Interpretation'' (FEDAI)]. It's over-detailed and incredibly long for our purposes, but nevertheless an exemplary approach to planning empirical quantitative research in a careful, intentional way that is worthy of imitation.


==== Project presentation and paper ====
==== Project presentation and paper ====
Line 156: Line 139:
;Maximum length: 6000 words (~20 pages)
;Maximum length: 6000 words (~20 pages)


;Presentation due date: Thursday, May 30 or Thursday, June 6, 2019
;Presentation due date: Thursday, June 6, 2019
;Maximum length: 8 minutes
;Maximum length: 12 minutes




''The paper:'' Ideally, I expect you to produce a high quality short research paper that you might revise and submit for publication and/or a dissertation milestone. I do not expect the paper to be ready for publication, but it should contain polished drafts of all the necessary components of a scholarly quantitative empirical research study. In terms of the structure, please see the page on the [[structure of a quantitative empirical research paper]].
''The paper:'' Ideally, I expect you to produce a high quality short research paper that you might revise and submit for publication and/or a dissertation milestone. I do not expect the paper to be ready for publication, but it should contain polished drafts of all the necessary components of a scholarly quantitative empirical research study. In terms of the structure, please see the page on the [[structure of a quantitative empirical research paper]].


As noted above, you should also provide data, code, and any documentation sufficient to enable the replication of all analysis and visualizations. If that is not possible/appropriate for some reason, please talk to me so that we can find another solution.
As noted above, you should also provide data, code, and any documentation sufficient to enable the replication of all analysis and visualizations. This can happen through Github. If that is not possible/appropriate for some reason, please talk to me so that we can find another solution.


Because the emphasis in this class is on statistics and methods and because I'm not an expert in each of your fields, I'm happy to assume that your paper, proposal, or thesis chapter has already established the relevance and significance of your study and has a comprehensive literature review, well-grounded conceptual approach, and compelling reason why this research is important. As a result, you need not focus on these elements of the work in your written submission. Instead, feel free to start with a brief summary of the purpose and importance of this research followed by an introduction of your research questions or hypotheses. If you provide more detail, that's fine, but I won't give you detailed feedback on these parts and they will not figure prominently in my assessment of the work.
Because the emphasis in this class is on statistics and methods and because I'm not an expert in each of your fields, I'm happy to assume that your paper, proposal, or thesis chapter has already established the relevance and significance of your study and has a comprehensive literature review, well-grounded conceptual approach, and compelling reason why this research is important. As a result, you need not focus on these elements of the work in your written submission. Instead, feel free to start with a brief summary of the purpose and importance of this research followed by an introduction of your research questions or hypotheses. If you provide more detail, that's fine, but I won't give you detailed feedback on these parts and they will not figure prominently in my assessment of the work.
Line 168: Line 151:
I have a strong preference for you to write the paper individually, but I'm open to the idea that you may want to work with others in the class. Please contact me ''before'' you attempt to pursue a collaborative final paper.
I have a strong preference for you to write the paper individually, but I'm open to the idea that you may want to work with others in the class. Please contact me ''before'' you attempt to pursue a collaborative final paper.


I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography. However, ''your paper must follow a standard format'' (e.g., [https://cscw.acm.org/2019/submit-papers.html ACM SIGCHI CSCW format] or [https://www.apastyle.org/index APA 6th edition] ([https://templates.office.com/en-us/APA-style-report-6th-edition-TM03982351 Word] and [https://www.overleaf.com/latex/templates/sample-apa-paper/fswjbwygndyq LaTeX] templates)) that is applicable for a peer-reviewed journal or conference proceedings in which you aim to publish the work (they all have formatting or submission guidelines published online and you should follow them). This includes the references. I also strongly recommend that you use reference management software to handle your bibliographic sources.
I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography. However, ''your paper must follow a standard format'' (e.g., <TODO link> ACM SIGCHI CSCW format or <TODO link> APA 6th edition) that is applicable for a peer-reviewed journal or conference proceedings in which you aim to publish the work (they all have formatting or submission guidelines published online and you should follow them). This includes the references. I also strongly recommend that you use reference management software to handle your bibliographic sources.
 
'' [[Statistics_and_Statistical_Programming_(Spring_2019)/Final_project_presentations|The presentation:]]'' The presentation will provide an opportunity to share a brief summary of your project and findings with the other members of the class. Since you will all give other research presentations throughout your career, I strongly encourage you to take the opportunity to refine your academic presentation skills. The document [https://canvas.northwestern.edu Creating a Successful Scholarly Presentation] (file will be posted to Canvas) may be useful.


: More details about the presentation goals, format suggestions, and more are available [[Statistics_and_Statistical_Programming_(Spring_2019)/Final_project_presentations|on this page]]
'' The presentation:'' The presentation will provide an opportunity to share a brief summary of your project and findings with the other members of the class. Since you will all give other research presentations throughout your career, I strongly encourage you to take the opportunity to refine your academic presentation skills. The document [https://canvas.northwestern.edu Creating a Successful Scholarly Presentation] (link is in Canvas) may be useful.


=== Grading ===
=== Grading ===
Line 184: Line 165:
* Final project paper: 40%
* Final project paper: 40%


My assessment of your paper will reflect the clarity of the written work, the effective execution and presentation of quantitative empirical analysis, as well as the quality and originality of the analysis. Throughout the quarter, we will talk a lot about the qualities of exemplary quantitative research. I expect your final project to embody these exemplary qualities.
My assessment of your paper will reflect the clarity of the written work, the effective execution and presentation of quantitative empirical analysis, as well as the quality and originality of the analysis. Throughout the quarter, we will talk a lot about the qualities of exemplary quantitative research. I expect your final project to embody these exemplary qualities.  


== Note on finding a dataset ==
== Note on finding a dataset ==


In order to complete your project, you will each need a dataset. If you already have a dataset for the project you plan to conduct, great! If not, there are many datasets to draw from. Some ideas are below. Jeremy and Aaron will also be available to help you brainstorm/find resources if needed:
In order to complete your project, you will each need a dataset. If you already have a dataset for the project you plan to conduct, great! If not, there are many datasets to draw from. Here are some ideas:


* Ask your advisor for a dataset they have collected and used in previous papers. Are there other variables you could use? Other relationships you could analyze?
* Ask your advisor for a dataset they have collected and used in previous papers. Are there other variables you could use? Other relationships you could analyze?
Line 196: Line 177:
* Look at the collection of social scientific datasets at [https://www.icpsr.umich.edu/icpsrweb/ICPSR/ ICPSR at the University of Michigan] (NU is a member). There are an enormous number of very rich datasets.
* Look at the collection of social scientific datasets at [https://www.icpsr.umich.edu/icpsrweb/ICPSR/ ICPSR at the University of Michigan] (NU is a member). There are an enormous number of very rich datasets.
* Use the [http://scientificdata.isa-explorer.org/index.html ISA Explorer] to find datasets. Keep in mind the large majority of datasets it will search are drawn from the natural sciences.
* Use the [http://scientificdata.isa-explorer.org/index.html ISA Explorer] to find datasets. Keep in mind the large majority of datasets it will search are drawn from the natural sciences.
* The City of Chicago has one of the best [https://data.cityofchicago.org/ data portal sites] of any municipality in the U.S. (and better than many federal agencies). There are also numerous administrative datasets released by other public entities (try searching!) that you might find inspiring.
* <TODO fix/update accordingly> Set up a meeting with Jennifer Muilenburg, the Data Curriculum and Communications Librarian who runs [https://www.lib.washington.edu/digitalscholarship/services/data research data services at the UW libraries]. Her email is libdata@uw.edu. I have talked to her about this course and she is excited about meeting with you to help.
<!---
* <TODO fix/update accordingly> Set up a meeting with Jennifer Muilenburg, the Data Curriculum and Communications Librarian who runs [https://www.lib.washington.edu/digitalscholarship/services/data research data services at the UW libraries]. Her email is libdata@uw.edu. I have talked to her about this course and she is excited about meeting with you to help.
-->
* [http://fivethirtyeight.com FiveThirtyEight.com] has published a [https://cran.r-project.org/web/packages/fivethirtyeight/vignettes/fivethirtyeight.html GitHub repository and an R package] with pre-processed and cleaned versions of many of the datasets they use for articles published on their website.  
* [http://fivethirtyeight.com FiveThirtyEight.com] has published a [https://cran.r-project.org/web/packages/fivethirtyeight/vignettes/fivethirtyeight.html GitHub repository and an R package] with pre-processed and cleaned versions of many of the datasets they use for articles published on their website.  


Line 214: Line 192:


When it comes to the statistics material, this will mostly be a so-called "flipped" classroom. This means we will rely on the textbook and other resources to introduce the material and we will use the class sessions to discuss questions as they come up.
When it comes to the statistics material, this will mostly be a so-called "flipped" classroom. This means we will rely on the textbook and other resources to introduce the material and we will use the class sessions to discuss questions as they come up.
The problem sets each week will


Although the day-to-day routine will vary, each class session will generally include the following:
Although the day-to-day routine will vary, each class session will generally include the following:
* Quick updates about assignments, projects, and meta-discussion about the class.
* Quick updates about assignments, projects, and meta-discussion about the class.
* Discussion of '''programming challenges''' due that day (and related to the previous week's R lecture materials).
* Discussion of '''programming challenges''' due that day.
* [''Sometimes''] Short lecture and/or Q&A about new material in Diez, Barr, and Çetinkaya-Rundel.
* Discussion of  '''statistics questions''' related to new material in Diez, Barr, and Çetinkaya-Rundel.
* Discussion of  '''statistics questions''' related to new material in Diez, Barr, and Çetinkaya-Rundel.
* Discussion of any exemplary empirical paper we have read and the '''empirical paper questions'''.
* Discussion of any exemplary empirical paper we have read.
* [''Sometimes''] Interactive lecture introducing new statistical programming concepts.


== Schedule ==
== Schedule ==
Line 229: Line 207:
=== Week 1: Thursday April 4: Introduction, Setup, and Data and Variables ===
=== Week 1: Thursday April 4: Introduction, Setup, and Data and Variables ===


* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 1]]
Please complete the readings prior to class so that we can discuss them and start talking through some of the examples in R together.
 
Please complete the readings and assignment prior to class so that we can discuss them and start talking through some of the examples in R together.


'''Required Readings:'''
'''Required Readings:'''


* Diez, Barr, and Çetinkaya-Rundel: §1 (Introduction to data)
* Diez, Barr, and Çetinkaya-Rundel: §1 (Introduction to data)
* Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks. ''Proceedings of the National Academy of Sciences'' 111(24):8788–90. [[http://www.pnas.org/content/111/24/8788.full Open Access]]
* Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks.” ''Proceedings of the National Academy of Sciences'' 111(24):8788–90. [[http://www.pnas.org/content/111/24/8788.full Available through NU libraries]]


'''Recommended Readings:'''
'''Recommended Readings:'''
Line 242: Line 218:
* Verzani: §1 (Getting Started), §2 (Univariate data) [[https://canvas.northwestern.edu/verzani_ch1-ch2.pdf Available via Canvas]]
* Verzani: §1 (Getting Started), §2 (Univariate data) [[https://canvas.northwestern.edu/verzani_ch1-ch2.pdf Available via Canvas]]
* Verzani: §A (Programming)
* Verzani: §A (Programming)
* Healy: §2 (and skim the prefatory material as well as §1)
* Healy: Chapter 2 (and skim the prefatory material as well as Chapter 1)
'''Assignment (Complete before class):'''
'''Assignment (Complete before class):'''


* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 1]]
* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 1]]


'''Lectures:'''
'''R screencasts:'''
* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w01-R_lecture.zip Week 1 R lecture materials] (.zip file)
* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w01-introduction.zip Week 1 R lecture materials] (.zip file)
* [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w01-s01-intro.webm Week 1 screencast (part 1, 23 minutes)] (the video should load directly in browser window)
* [https://communitydata.cc/~mako/2017-COM521/com521-week_01-r_programming_intro-20170103.ogv Week 1 R lecture screencast (Part I): Introduction to R and univariate statistics] (~1 hour 47 minutes)
* [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w01-s02-intro.webm Week 1 screencast (part 2, 27 minutes)]  
* [https://communitydata.cc/~mako/2017-COM521/com521-week_01-github_rscripts-20170104.ogv Week 1 R lecture screencast (Part II): Setting up git/GitHub and saving files in RStudio] (~40 minutes)
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 1]]


'''Resources:'''
'''Resources:'''
* [https://www.openintro.org/download.php?file=os3_slides_01&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §1 Lecture Notes]
* [https://www.openintro.org/download.php?file=os3_slides_01&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §1 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including some for §1
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including some for §1
* [[Statistics and Statistical Programming (Winter 2017)/Session plan: Week 1]]


=== Week 2: Thursday April 11: Probability and Visualization ===
=== Week 2: Thursday April 11: Probability and Visualization ===
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 2]]
* Questions? Topics you'd like to discuss? Add them to the [https://canvas.northwestern.edu/courses/90927/discussion_topics/601700 Canvas discussion] for this week's material.


'''Required Readings:'''
'''Required Readings:'''


* Diez, Barr, and Çetinkaya-Rundel: §2 (Probability)
* Diez, Barr, and Çetinkaya-Rundel: §2 (Probability)
* Shaw, Aaron and Yochai Benkler. 2012. A tale of two blogospheres: Discursive practices on the left and right. ''American Behavioral Scientist''. 56(4): 459-487. [[https://doi.org/10.1177%2F0002764211433793 available via NU libraries]]
* Verzani: §3.1-2 (Bivariate data), §4 (Multivariate data), §5 (Multivariate graphics) [[https://faculty.washington.edu/makohill/com521/verzani-usingr-ch3.1-2_ch4_ch5.pdf Available with UW NetID]]
 
'''Recommended Readings:'''
* Verzani: §3.1-2 (Bivariate data), §4 (Multivariate data), §5 (Multivariate graphics) <!---[[https://faculty.washington.edu/makohill/com521/verzani-usingr-ch3.1-2_ch4_ch5.pdf Available with UW NetID]]--->
* [https://seeing-theory.brown.edu/ Seeing Theory] §1 (Basic Probability) and §2 (Compound Probability). (Note: this site provides a beautiful visual introduction to core concepts in probability and statistics).
<!---
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on my personal website]]
--->
* Healy: §3.


'''Assignment (Complete Before Class):'''

* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 2]]
* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 2]]


'''Lectures:'''
* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w02-R_lecture.Rmd Week 2 R lecture materials] (.Rmd file)
 
* [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w02.webm Week 2 screencast (17 minutes)]
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 2]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_02-lists_dataframes_graphing-20170111.ogv Week 2 R lecture screencast: lists, matrixes, data frames, and beginning graphing] (~1 hour 8 minutes)


'''Resources:'''
* [https://www.openintro.org/download.php?file=os3_slides_02&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §2 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 short videos for §2
* [[Statistics and Statistical Programming (Winter 2017)/Session plan: Week 2]]


=== Week 3: Thursday April 18: Distributions ===
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 3]]


'''Required Readings:'''

* Diez, Barr, and Çetinkaya-Rundel: §3.1-3.2, §3.4. You should also read the rest of the chapter (§3.3 and §3.5); I won't assign problem set questions about it, but it's still important to be familiar with the material.
'''Recommended Readings:'''
* Verzani: §6 (Populations)
* [https://seeing-theory.brown.edu/ Seeing Theory] §3 (Probability Distributions).


'''Assignment (Complete Before Class):'''

* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 3]]
* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 3]]


'''Lectures:'''

* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w03-R_lecture.Rmd Week 3 R lecture materials] (.Rmd file)
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 3]]
* [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w03.webm Week 3 screencast (19 minutes)]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-loading_data_functions_apply_misc.ogv Week 3 R lecture screencast: Loading data, functions; apply(), lapply(), sapply(); several miscellaneous functions] (~34 minutes) — This is the same material I covered in class. If you followed it, there's no reason you need to go back to this.
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-dates_tapply_merge.ogv Week 3 R lecture screencast: Dates; tapply(); and merge()] (~38 minutes) [The audio seems to be broken for the last 10 minutes. Sorry about that! I've rerecorded that below.]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-merge.ogv Week 3 R lecture screencast: merge()] (~13 minutes) [Rerecording of the last few minutes of the previous video.]


'''Resources:'''
* [https://www.openintro.org/download.php?file=os3_slides_03&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §3 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 videos for §3.1 and §3.2
* [[Statistics and Statistical Programming (Winter 2017)/Session plan: Week 3]]


=== Week 4: Thursday April 25: Statistical significance and hypothesis testing ===
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 4]]


'''Required Readings:'''

* Diez, Barr, and Çetinkaya-Rundel: §4 (Foundations for inference)
'''Recommended Readings:'''
* Verzani: §7 (Statistical inference), §8 (Confidence intervals)
* [https://seeing-theory.brown.edu/ Seeing Theory] §4 (Frequentist Inference)


'''Assignment (Complete Before Class):'''

* [https://docs.google.com/forms/d/e/1FAIpQLScMkAPwWQUjB4C5wtbkemkNZYjNl3ipO4Dg5wsORFmdfduEtA/viewform?usp=sf_link Mid-quarter course evaluation survey] (by Monday please!)
* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 4]]
* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 4]]


'''Lectures:'''
*[https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w04-R_lecture.Rmd Week 4 R lecture materials] (.Rmd file)
 
*(No screencast for this week)
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 4]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_04-misc_confint_simulation-20170125.ogv Week 4 R lecture screencast: order(); confidence intervals; simulations drawn from repeated random samples] (~27 minutes)


'''Resources:'''
* [https://www.openintro.org/download.php?file=os3_slides_04&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §4 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 7 videos for nearly all of §4
* [[Statistics and Statistical Programming (Winter 2017)/Session plan: Week 4]]


=== Week 5: Thursday May 2: Continuous Numeric Data & ANOVA ===
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 5|Session plan]]


'''Required Readings:'''

* Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data)
<!---* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF from Hill's website]]--->
* Sweetser, K. D., & Metzgar, E. (2007). Communicating during crisis: Use of blogs as a relationship management tool. ''Public Relations Review'', 33(3), 340–342. [[https://doi.org/10.1016/j.pubrev.2007.05.016 Available through NU Libraries]]
* Reinhart, §1
'''Recommended Readings:'''
* Verzani: §9 (significance tests), §12 (Analysis of variance)
* Gelman, Andrew and Hal Stern. 2006. “The Difference Between ‘Significant’ and ‘Not Significant’ Is Not Itself Statistically Significant.” ''The American Statistician'' 60(4):328–31. [[http://dx.doi.org/10.1198/000313006X152649 Available through NU Libraries]]
* Gelman, Andrew and Hal Stern. 2006. “The Difference Between ‘Significant’ and ‘Not Significant’ Is Not Itself Statistically Significant.” ''The American Statistician'' 60(4):328–31. [[http://dx.doi.org/10.1198/000313006X152649 Available through UW Libraries]]
* Sweetser, K. D., & Metzgar, E. (2007). Communicating during crisis: Use of blogs as a relationship management tool. ''Public Relations Review'', 33(3), 340–342. https://doi.org/10.1016/j.pubrev.2007.05.016 [Available through UW Libraries]
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on my personal website]]


'''Assignment (Complete Before Class):'''

* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 5]]
* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 5]]


'''Lectures:'''
* No new R material for this week.
 
<!---
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 5]]
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 5]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_05-ttests_and_anova.ogv Week 5 R lecture screencast: t-tests] (~22 minutes)
* [https://communitydata.cc/~mako/2017-COM521/com521-week_05-for_if.ogv Week 5 R lecture screencast: for loops and if statements] (~12 minutes)
--->


'''Resources:'''
=== Week 6: Thursday May 9: Categorical data ===


* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 6|Session plan]]
'''Required Readings:'''

* Diez, Barr, and Çetinkaya-Rundel: §6.1-6.4 (Inference for categorical data).
* Diez, Barr, and Çetinkaya-Rundel: §6 (Inference for categorical data)
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on Hill's personal website]]
* Reinhart, §4 and §5.
 
'''Recommended Readings:'''
* Diez, Barr, and Çetinkaya-Rundel: §6.5-6.6 (Small samples and randomization inference)
* Verzani: §3.4 (Bivariate categorical data); §10.1-10.2 (Goodness of fit)
* Gelman, Andrew and Eric Loken. 2014. “The Statistical Crisis in Science Data-Dependent Analysis—a ‘garden of Forking Paths’—explains Why Many Statistically Significant Comparisons Don’t Hold Up.” ''American Scientist'' 102(6):460. [[https://www.americanscientist.org/issues/pub/2014/6/the-statistical-crisis-in-science/1 Available through NU Libraries]] (This is a reworked version of [http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf this unpublished manuscript], which provides more detailed examples.)
* Gelman, Andrew and Eric Loken. 2014. “The Statistical Crisis in Science Data-Dependent Analysis—a ‘garden of Forking Paths’—explains Why Many Statistically Significant Comparisons Don’t Hold Up.” ''American Scientist'' 102(6):460. [[https://www.americanscientist.org/issues/pub/2014/6/the-statistical-crisis-in-science/1 Available through UW Libraries]] (This is a reworked version of [http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf this unpublished manuscript], which provides more detailed examples.)
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on my personal website]]


'''Assignment (Complete Before Class):'''

* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 6]]
* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 6]]


'''Lectures:'''
*[https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w06-R_lecture.Rmd Week 6 R lecture materials] (.Rmd file)
 
*(No screencast for this week)
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 6]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_06-tables_chisq_debugging.ogv Week 6 R lecture screencast: Tables, <math>\chi^2</math>-tests, and debugging.] (~40 minutes)


'''Resources:'''
* [https://www.openintro.org/download.php?file=os3_slides_06&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §6 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 4 videos for §7


=== Week 7: Thursday May 16: Linear Regression ===
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 7|Session plan]]
 
'''Required Readings:'''

* Diez, Barr, and Çetinkaya-Rundel: §7 (Introduction to linear regression)
* Diez, Barr, and Çetinkaya-Rundel: §7 (Introduction to linear regression); §8.1-8.3 (Multiple regression)
* OpenIntro eschews a mathematical approach to correlation. Look over [https://en.wikipedia.org/wiki/Correlation_and_dependence the Wikipedia article on correlation and dependence] and pay attention to the formulas. It's tedious to compute, but you should be aware of what goes into it.
* OpenIntro eschews a mathematical introduction to correlation. Please look over [https://en.wikipedia.org/wiki/Correlation_and_dependence the Wikipedia article on correlation and dependence] and pay attention to the formulas. It's tedious to compute, but I'd like you to at least see what goes into it.
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available via NU libraries]]
* Verzani: §11.1-2 (Linear regression),
 
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available in UW libraries]]
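
As a companion to the correlation formulas mentioned in the readings above, here is one common way to write the sample Pearson correlation coefficient for paired observations <math>(x_i, y_i)</math>, <math>i = 1, \dots, n</math>. The notation is the standard textbook form and is an assumption here, not taken from OpenIntro or from the lecture materials:

:<math>r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}</math>

The numerator measures how the two variables move together around their means <math>\bar{x}</math> and <math>\bar{y}</math>, and the denominator rescales that quantity so that <math>r</math> always falls between -1 and 1. In R, <code>cor(x, y)</code> computes this quantity by default.
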
'''Recommended Readings:'''
* Verzani: §11.1-2 (Linear regression).
* [https://seeing-theory.brown.edu/ Seeing Theory] §5 (Regression Analysis)


'''Assignment (Complete Before Class):'''

* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 7]]
* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 7]]
* Final project planning document (see details above!)


'''Lectures:'''
* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w07-R_lecture.Rmd Week 7 R lecture materials]
 
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 7]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_07-linear_regression.ogv Week 7 R lecture screencast: linear regression] (~42 minutes)


'''Resources:'''
* [https://www.openintro.org/download.php?file=os3_slides_07&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §7 Lecture Notes]
* [https://www.openintro.org/download.php?file=os3_slides_08&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §8 Lecture Notes]


=== Week 8: Thursday May 23: Polynomial Terms, Interactions, and Logistic Regression ===
* [[Statistics_and_Statistical_Programming_(Spring_2019)/Session plan: Week 8|Session plan]]


'''Required Readings:'''
* Diez, Barr, and Çetinkaya-Rundel: §8 (Multiple and logistic regression)
 
* [https://onlinecourses.science.psu.edu/stat501/node/301 Lesson 8: Categorical Predictors] and [https://onlinecourses.science.psu.edu/stat501/node/318 Lesson 9: Data Transformations] from the Penn State Eberly College of Science STAT 501 Regression Methods course. There are several subparts (many quite short); please read them all carefully.
* (Revisit) Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available via NU libraries]]
* Diez, Barr, and Çetinkaya-Rundel: §8.4 (Multiple and logistic regression)
* Reinhart, §8 and §9.
 
'''Recommended Readings:'''
* Verzani: §11.3 (Linear regression), §13.1 (Logistic regression)
* Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” ''PLoS Medicine'' 2(8):e124. [[http://dx.doi.org/10.1371%2Fjournal.pmed.0020124 Open Access]]
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available in UW libraries]]
'''Optional Readings:'''
* Head, Megan L., Luke Holman, Rob Lanfear, Andrew T. Kahn, and Michael D. Jennions. 2015. “The Extent and Consequences of P-Hacking in Science.” ''PLOS Biology'' 13(3):e1002106. [[http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002106 Open Access]]


'''Assignment (Complete Before Class):'''

* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 8]]
* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 8]]


'''Lectures:'''
*[https://communitydata.science/~ads/teaching/2019/stats/r_lectures/w08-R_lecture.Rmd Week 8 R lecture materials]
 
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 8]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_08-more_regression_anova_redux.ogv Week 8 R lecture screencast: more on linear regression, including interactions, polynomials, log transformations; anova] (~28 minutes)


'''Resources:'''
* [https://www.openintro.org/download.php?file=os3_slides_08&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §8 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including a video on §8.4
* Mako Hill wrote this document which will likely be useful for many of you: [https://communitydata.cc/~mako/2017-COM521/logistic_regression_interpretation.html Interpreting Logistic Regression Coefficients with Examples in R]
* I've written this document which will likely be useful for many of you: [https://communitydata.cc/~mako/2017-COM521/logistic_regression_interpretation.html Interpreting Logistic Regression Coefficients with Examples in R]


=== Week 9: Thursday May 30: Loose ends and Final Presentations (part 1)  ===
=== Week 9: Thursday May 30: TBA ===


* [[Statistics_and_Statistical_Programming_(Spring_2019)/Session plan: Week 9|Session plan]]
Reserved for catch-up, supplementary topics, and maybe some final presentations.


'''Required readings:'''
=== Week 10: Thursday June 6: Final Presentations ===
* Reinhart, §10 and §11.
 
'''[[Statistics_and_Statistical_Programming_(Spring_2019)/Final_project_presentations|Final presentations]]: (part 1)'''
* First batch today. The rest next week.
 
'''Resources:'''
* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w09-R_lecture.html Week 9 R-lecture] (we will use this in class)
 
=== Week 10: Thursday June 6: Fully reproducible research example, Replications, Final Presentations (part 2), and wrap-up ===
 
* Fully [https://www.overleaf.com/read/tkdpdcspwtkp reproducible research example].
* [https://canvas.northwestern.edu/courses/90927/files/folder/resources/Straub-Cook%20Replication Research replication study] by Polly Straub-Cook (UW Comm. Ph.D. student)
:: (n.b.: cluster & heteroscedasticity robust standard errors!)
 
* '''[[Statistics_and_Statistical_Programming_(Spring_2019)/Final_project_presentations|Final presentations]]: (part 2)'''
:: Second batch of presenters today.
* Closing thoughts
:: What next? Beyond your final projects...
:: Class social gathering


Followed by much rejoicing!
I receive too much email and I sometimes fail to keep up. If, for some reason, I do not respond to a message related to this course within 48 hours, please do not take it personally and feel free to re-send the message with a polite reminder. This will help me and I will not resent you for it.


=== Office Hours ===
TBA.


=== Credit and Notes ===

This syllabus has, in ways that should be obvious, borrowed and built on the [https://www.openintro.org/stat/index.php OpenIntro Statistics curriculum]. I also based nearly every aspect of the course design on Benjamin Mako Hill's [[Statistics_and_Statistical_Programming_(Winter_2017)|COM 521 class]].