Editing Statistics and Statistical Programming (Spring 2019)

From CommunityData

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then publish the changes below to finish undoing the edit.

Latest revision Your text
Line 14: Line 14:
:'''Teaching Assistant:''' [https://jeremydfoote.com/ Jeremy Foote] ([mailto:jdfoote@u.northwestern.edu jdfoote@u.northwestern.edu])
:'''Teaching Assistant:''' [https://jeremydfoote.com/ Jeremy Foote] ([mailto:jdfoote@u.northwestern.edu jdfoote@u.northwestern.edu])
::Office Hours: Tue/Wed 1-3pm (or by appointment)
::Office Hours: Tue/Wed 1-3pm (or by appointment)
:: FSB 2-419 (Next to CollabLab)


:'''Course Websites''':
:'''Course Websites''':
Line 66: Line 65:
I will also be assigning several chapters from the following:
I will also be assigning several chapters from the following:


* Reinhart, Alex. 2015. ''Statistics Done Wrong: The Woefully Complete Guide''. SF, CA: No Starch Press. ([https://search.library.northwestern.edu/primo-explore/fulldisplay?docid=01NWU_ALMA51732460650002441&context=L&vid=NULVNEW&search_scope=NWU&tab=default_tab&lang=en_US Safari online via NU libraries])
* Reinhart, Alex. 2015. ''Statistics Done Wrong: The Woefully Complete Guide''. SF, CA: No Starch Press. ([https://www.safaribooksonline.com/library/view/statistics-done-wrong/9781457189845/ Safari online via NU libraries])


This book provides a conceptual introduction to some common failures in statistical analysis that you should learn to recognize and avoid. It was also written by a Ph.D. student. You have access to an electronic copy via the NU library, but you may find it helpful to purchase.
This book provides a conceptual introduction to some common failures in statistical analysis that you should learn to recognize and avoid. It was also written by a Ph.D. student. You have access to an electronic copy via the NU library, but you may find it helpful to purchase.
Line 83: Line 82:
* [http://rseek.org/ Rseek] — Rseek is a modified version of Google that searches only R-related websites. R can be hard to search for because "R" is a common letter. This has become much easier over time as R has become more popular, but it can still be a problem, and Rseek is a good solution.
* [http://rseek.org/ Rseek] — Rseek is a modified version of Google that searches only R-related websites. R can be hard to search for because "R" is a common letter. This has become much easier over time as R has become more popular, but it can still be a problem, and Rseek is a good solution.
* [https://ggplot2.tidyverse.org/ ggplot2 documentation] — Ggplot is a powerful data visualization package for R that I recommend highly. The documentation is indispensable for learning how to use it.
* [https://ggplot2.tidyverse.org/ ggplot2 documentation] — Ggplot is a powerful data visualization package for R that I recommend highly. The documentation is indispensable for learning how to use it.
* [https://depts.washington.edu/madlab/proj/Rstats/ Statistical Analysis and Reporting in R] — A set of resources created and distributed by Jacob Wobbrock (University of Washington, School of Information) in conjunction with a MOOC he teaches. Contains cheatsheets, code snippets, and data to help execute commonly encountered statistical procedures in R.
* [https://www.datacamp.com DataCamp] offers introductory R courses. Northwestern usually has some free accounts that get passed out via Research Data Services each quarter. Apparently, instructors teaching relevant coursework can also [https://www.datacamp.com/groups/education request] free access to DataCamp for their courses. If folks are interested in this, I can reach out.
Computing resources:
* If you are planning to analyze large-scale data (i.e., data that won't fit in memory on your laptop) then you will want to sign up for a research allocation on Quest, which is Northwestern's high-performance computing cluster. Instructions on how to do that are [[Statistics_and_Statistical_Programming_(Spring_2019)/Quest_at_Northwestern|here]].


== Assignments ==
== Assignments ==
Line 101: Line 95:
* '''Empirical paper questions''' about other assigned readings.  
* '''Empirical paper questions''' about other assigned readings.  


You should submit your solutions to the programming challenges (feel free to submit the others if you like, but they're not required!) ahead of each class session. While I will not grade them, we will spend a good chunk of class going through the answers to the assignment due on that day.
You should submit your solutions to the programming challenges ahead of each class session. While I will not grade them, we will spend a good chunk of class going through the answers to the assignment due on that day.


Because randomness is extremely important in statistics, I will use a small R program to '''randomly call on''' students to walk through their answers to statistics questions and empirical paper questions in class. We'll then discuss the answers, address points of confusion, and consider alternative approaches as a group.
Because randomness is extremely important in statistics, I will use a small R program to '''randomly call on''' students to walk through their answers to statistics questions and empirical paper questions in class. We'll then discuss the answers, address points of confusion, and consider alternative approaches as a group.
Line 142: Line 136:


;Due date: Thursday, May 16, 2019
;Due date: Thursday, May 16, 2019
;Maximum length: ~5 pages
;Maximum length: 5 pages
 
The project planning document is a basic shell/outline of an empirical quantitative research paper. Your planning document should have the following sections: (a) Rationale, (b) Objectives; (b.1) General objectives; (b.2) Specific objectives; (c) Null hypotheses; (d) Conceptual diagram and/or explanation of the relationship you plan to test; (e) Measures; (f) Dummy tables. Descriptions of each of these planning document sections are available [[TODO-planningdoc|on this wiki page]].


The project planning document is a basic shell/outline of an empirical quantitative research paper. Your planning document should have the following sections: (a) Rationale, (b) Objectives; (b.1) General objectives; (b.2) Specific objectives; (c) (Null) hypotheses; (d) Conceptual diagram and explanation of the relationship(s) you plan to test; (e) Measures; (f) Dummy tables/figures; (g) anticipated finding(s) and research contribution(s). Longer descriptions of each of these planning document sections (as well as a few others) can be found [[CommunityData:Planning document|on this wiki page]].
An exemplary planning document from public health researcher Mika Matsuzaki is [https://canvas.northwestern.edu online in Canvas]. Your diagram will likely be much less complicated than Matsuzaki's. Also, please don't be distracted by the fact that Matsuzaki does public health research. You can (and should!) emulate the form rather than the content. You can also check out [http://ajcn.nutrition.org/content/99/6/1450.full the published paper] to see how the project wound up.


I have also provided three example planning documents via our Canvas site:
Please note that the Matsuzaki planning document includes everything except a "Measures" section. Your Measures section should include a two-column table where column 1 is the name of each variable in your analysis and column 2 describes the operationalization of each measure and (if necessary) how you will create it.
* [https://canvas.northwestern.edu/files/6908602/download?download_frd=1 One by public health researcher Mika Matsuzaki]. The first planning document I ever saw and still one of the best. It's missing a measures section. It's also focused on a research context that is probably very different from yours, but try not to get bogged down by that and imagine how you might map the structure of the document to your own work.
* [https://canvas.northwestern.edu/files/6919735/download?download_frd=1 One by Jim Maddock] created as part of a qualifying exam earlier in 2019. Jim doesn't provide dummy tables or anticipated findings/contributions, but he has an especially phenomenal explanation of the conceptual relationships and processes he wants to test.
* [https://canvas.northwestern.edu/files/6908606/download?download_frd=1 One provided as an appendix to Gerber and Green's excellent textbook, ''Field Experiments: Design, Analysis, and Interpretation'' (FEDAI)]. It's over-detailed and incredibly long for our purposes, but nevertheless an exemplary approach to planning empirical quantitative research in a careful, intentional way that is worthy of imitation.


==== Project presentation and paper ====
==== Project presentation and paper ====
Line 156: Line 149:
;Maximum length: 6000 words (~20 pages)
;Maximum length: 6000 words (~20 pages)


;Presentation due date: Thursday, May 30 or Thursday, June 6, 2019
;Presentation due date: Thursday, June 6, 2019
;Maximum length: 8 minutes
;Maximum length: 12 minutes




''The paper:'' Ideally, I expect you to produce a high quality short research paper that you might revise and submit for publication and/or a dissertation milestone. I do not expect the paper to be ready for publication, but it should contain polished drafts of all the necessary components of a scholarly quantitative empirical research study. In terms of the structure, please see the page on the [[structure of a quantitative empirical research paper]].
''The paper:'' Ideally, I expect you to produce a high quality short research paper that you might revise and submit for publication and/or a dissertation milestone. I do not expect the paper to be ready for publication, but it should contain polished drafts of all the necessary components of a scholarly quantitative empirical research study. In terms of the structure, please see the page on the [[structure of a quantitative empirical research paper]].


As noted above, you should also provide data, code, and any documentation sufficient to enable the replication of all analysis and visualizations. If that is not possible/appropriate for some reason, please talk to me so that we can find another solution.
As noted above, you should also provide data, code, and any documentation sufficient to enable the replication of all analysis and visualizations. This can happen through GitHub. If that is not possible/appropriate for some reason, please talk to me so that we can find another solution.


Because the emphasis in this class is on statistics and methods and because I'm not an expert in each of your fields, I'm happy to assume that your paper, proposal, or thesis chapter has already established the relevance and significance of your study and has a comprehensive literature review, well-grounded conceptual approach, and compelling reason why this research is important. As a result, you need not focus on these elements of the work in your written submission. Instead, feel free to start with a brief summary of the purpose and importance of this research followed by an introduction of your research questions or hypotheses. If you provide more detail, that's fine, but I won't give you detailed feedback on these parts and they will not figure prominently in my assessment of the work.
Because the emphasis in this class is on statistics and methods and because I'm not an expert in each of your fields, I'm happy to assume that your paper, proposal, or thesis chapter has already established the relevance and significance of your study and has a comprehensive literature review, well-grounded conceptual approach, and compelling reason why this research is important. As a result, you need not focus on these elements of the work in your written submission. Instead, feel free to start with a brief summary of the purpose and importance of this research followed by an introduction of your research questions or hypotheses. If you provide more detail, that's fine, but I won't give you detailed feedback on these parts and they will not figure prominently in my assessment of the work.
Line 168: Line 161:
I have a strong preference for you to write the paper individually, but I'm open to the idea that you may want to work with others in the class. Please contact me ''before'' you attempt to pursue a collaborative final paper.
I have a strong preference for you to write the paper individually, but I'm open to the idea that you may want to work with others in the class. Please contact me ''before'' you attempt to pursue a collaborative final paper.


I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography. However, ''your paper must follow a standard format'' (e.g., [https://cscw.acm.org/2019/submit-papers.html ACM SIGCHI CSCW format] or [https://www.apastyle.org/index APA 6th edition] ([https://templates.office.com/en-us/APA-style-report-6th-edition-TM03982351 Word] and [https://www.overleaf.com/latex/templates/sample-apa-paper/fswjbwygndyq LaTeX] templates)) that is applicable for a peer-reviewed journal or conference proceedings in which you aim to publish the work (they all have formatting or submission guidelines published online and you should follow them). This includes the references. I also strongly recommend that you use reference management software to handle your bibliographic sources.
I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography. However, ''your paper must follow a standard format'' (e.g., <TODO link> ACM SIGCHI CSCW format or <TODO link> APA 6th edition) that is applicable for a peer-reviewed journal or conference proceedings in which you aim to publish the work (they all have formatting or submission guidelines published online and you should follow them). This includes the references. I also strongly recommend that you use reference management software to handle your bibliographic sources.
 
'' [[Statistics_and_Statistical_Programming_(Spring_2019)/Final_project_presentations|The presentation:]]'' The presentation will provide an opportunity to share a brief summary of your project and findings with the other members of the class. Since you will all give other research presentations throughout your career, I strongly encourage you to take the opportunity to refine your academic presentation skills. The document [https://canvas.northwestern.edu Creating a Successful Scholarly Presentation] (file will be posted to Canvas) may be useful.


: More details about the presentation goals, format suggestions, and more are available [[Statistics_and_Statistical_Programming_(Spring_2019)/Final_project_presentations|on this page]]
'' The presentation:'' The presentation will provide an opportunity to share a brief summary of your project and findings with the other members of the class. Since you will all give other research presentations throughout your career, I strongly encourage you to take the opportunity to refine your academic presentation skills. The document [https://canvas.northwestern.edu Creating a Successful Scholarly Presentation] (link is in Canvas) may be useful.


=== Grading ===
=== Grading ===
Line 188: Line 179:
== Note on finding a dataset ==
== Note on finding a dataset ==


In order to complete your project, you will each need a dataset. If you already have a dataset for the project you plan to conduct, great! If not, there are many datasets to draw from. Some ideas are below. Jeremy and Aaron will also be available to help you brainstorm/find resources if needed:
In order to complete your project, you will each need a dataset. If you already have a dataset for the project you plan to conduct, great! If not, there are many datasets to draw from. Here are some ideas:


* Ask your advisor for a dataset they have collected and used in previous papers. Are there other variables you could use? Other relationships you could analyze?
* Ask your advisor for a dataset they have collected and used in previous papers. Are there other variables you could use? Other relationships you could analyze?
Line 196: Line 187:
* Look at the collection of social scientific datasets at [https://www.icpsr.umich.edu/icpsrweb/ICPSR/ ICPSR at the University of Michigan] (NU is a member). There are an enormous number of very rich datasets.
* Look at the collection of social scientific datasets at [https://www.icpsr.umich.edu/icpsrweb/ICPSR/ ICPSR at the University of Michigan] (NU is a member). There are an enormous number of very rich datasets.
* Use the [http://scientificdata.isa-explorer.org/index.html ISA Explorer] to find datasets. Keep in mind the large majority of datasets it will search are drawn from the natural sciences.
* Use the [http://scientificdata.isa-explorer.org/index.html ISA Explorer] to find datasets. Keep in mind the large majority of datasets it will search are drawn from the natural sciences.
* The City of Chicago has one of the best [https://data.cityofchicago.org/ data portal sites] of any municipality in the U.S. (and better than many federal agencies). There are also numerous administrative datasets released by other public entities (try searching!) that you might find inspiring.
<!---
<!---
* <TODO fix/update accordingly> Set up a meeting with Jennifer Muilenburg — Data Curriculum and Communications Librarian who runs [https://www.lib.washington.edu/digitalscholarship/services/data research data services at the UW libraries]. Her email is: libdata@uw.edu. I have talked to her about this course and she is excited about meeting with you to help.
* <TODO fix/update accordingly> Set up a meeting with Jennifer Muilenburg — Data Curriculum and Communications Librarian who runs [https://www.lib.washington.edu/digitalscholarship/services/data research data services at the UW libraries]. Her email is: libdata@uw.edu. I have talked to her about this course and she is excited about meeting with you to help.
Line 242: Line 232:
* Verzani: §1 (Getting Started), §2 (Univariate data) [[https://canvas.northwestern.edu/verzani_ch1-ch2.pdf Available via Canvas]]
* Verzani: §1 (Getting Started), §2 (Univariate data) [[https://canvas.northwestern.edu/verzani_ch1-ch2.pdf Available via Canvas]]
* Verzani: §A (Programming)
* Verzani: §A (Programming)
* Healy: §2 (and skim the prefatory material as well as §1)
* Healy: Chapter 2 (and skim the prefatory material as well as Chapter 1)
'''Assignment (Complete before class):'''
'''Assignment (Complete before class):'''


Line 248: Line 238:


'''Lectures:'''
'''Lectures:'''
* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w01-R_lecture.zip Week 1 R lecture materials] (.zip file)
* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w01-R_lecture.Rmd Week 1 R lecture materials] (.Rmd file)
* [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w01-s01-intro.webm Week 1 screencast (part 1, 23 minutes)] (the video should load directly in browser window)
* [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w01-s01-intro.webm Week 1 screencast (part 1, 23 minutes)] (the video should load directly in browser window)
* [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w01-s02-intro.webm Week 1 screencast (part 2, 27 minutes)]  
* [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w01-s02-intro.webm Week 1 screencast (part 2, 27 minutes)]  
Line 257: Line 247:


=== Week 2: Thursday April 11: Probability and Visualization ===
=== Week 2: Thursday April 11: Probability and Visualization ===
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 2]]
<!---* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 2]]--->
* Questions? Topics you'd like to discuss? Add them to the [https://canvas.northwestern.edu/courses/90927/discussion_topics/601700 Canvas discussion] for this week's material.


'''Required Readings:'''
'''Required Readings:'''
Line 266: Line 255:


'''Recommended Readings:'''
'''Recommended Readings:'''
* Verzani: §3.1-2 (Bivariate data), §4 (Multivariate data), §5 (Multivariate graphics) <!---[[https://faculty.washington.edu/makohill/com521/verzani-usingr-ch3.1-2_ch4_ch5.pdf Available with UW NetID]]--->
* Verzani: §3.1-2 (Bivariate data), §4 (Multivariate data), §5 (Multivariate graphics) <!---[[https://faculty.washington.edu/makohill/com521/verzani-usingr-ch3.1-2_ch4_ch5.pdf Available with UW NetID]]
* [https://seeing-theory.brown.edu/ Seeing Theory] §1 (Basic Probability) and §2 (Compound Probability). (Note: this site provides a beautiful visual introduction to core concepts in probability and statistics).
<!---
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on my personal website]]
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on my personal website]]
--->
--->
* Healy: §3.
* Healy: Chapter 3.


'''Assignment (Complete Before Class):'''
'''Assignment (Complete Before Class):'''
Line 279: Line 266:
'''Lectures:'''
'''Lectures:'''
* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w02-R_lecture.Rmd Week 2 R lecture materials] (.Rmd file)
* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w02-R_lecture.Rmd Week 2 R lecture materials] (.Rmd file)
* [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w02.webm Week 2 screencast (17 minutes)]
* [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w02.webm Week 2 screencast (20 minutes)]


'''Resources:'''
'''Resources:'''
Line 287: Line 274:


=== Week 3: Thursday April 18: Distributions ===
=== Week 3: Thursday April 18: Distributions ===
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 3]]


'''Required Readings:'''
'''Required Readings:'''
Line 296: Line 281:
'''Recommended Readings:'''
'''Recommended Readings:'''
* Verzani: §6 (Populations)
* Verzani: §6 (Populations)
* [https://seeing-theory.brown.edu/ Seeing Theory] §3 (Probability Distributions).


'''Assignment (Complete Before Class):'''
'''Assignment (Complete Before Class):'''
Line 304: Line 288:
'''Lectures:'''
'''Lectures:'''


* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w03-R_lecture.Rmd Week 3 R lecture materials] (.Rmd file)
<!---
* [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w03.webm Week 3 screencast (19 minutes)]
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 3]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-loading_data_functions_apply_misc.ogv Week 3 R lecture screencast: Loading data, functions; apply(), lapply(), sapply(); several miscellaneous functions] (~34 minutes) — This is the same material I covered in class. If you followed it, there's no reason you need to go back to this.
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-dates_tapply_merge.ogv Week 3 R lecture screencast: Dates; tapply(); and merge()] (~38 minutes) [The audio seems to be broken for the last 10 minutes. Sorry about that! I've rerecorded that below.]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-merge.ogv Week 3 R lecture screencast: merge()] (~13 minutes) [Rerecording of the last few minutes of the previous video.]
--->


'''Resources:'''
'''Resources:'''
Line 311: Line 299:
* [https://www.openintro.org/download.php?file=os3_slides_03&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §3 Lecture Notes]
* [https://www.openintro.org/download.php?file=os3_slides_03&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §3 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 videos for §3.1 and §3.2
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 videos for §3.1 and §3.2
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 3]]


=== Week 4: Thursday April 25: Statistical significance and hypothesis testing ===
=== Week 4: Thursday April 25: Statistical significance and hypothesis testing ===
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 4]]


'''Required Readings:'''
'''Required Readings:'''


* Diez, Barr, and Çetinkaya-Rundel: §4 (Foundations for inference)
* Diez, Barr, and Çetinkaya-Rundel: §4 (Foundations for inference)
* Reinhart, Chapters 1, 4, and 5 (via Canvas).


'''Recommended Readings:'''
'''Recommended Readings:'''
* Verzani: §7 (Statistical inference), §8 (Confidence intervals)
* Verzani: §7 (Statistical inference), §8 (Confidence intervals)
* [https://seeing-theory.brown.edu/ Seeing Theory] §4 (Frequentist Inference)


'''Assignment (Complete Before Class):'''
'''Assignment (Complete Before Class):'''


* [https://docs.google.com/forms/d/e/1FAIpQLScMkAPwWQUjB4C5wtbkemkNZYjNl3ipO4Dg5wsORFmdfduEtA/viewform?usp=sf_link Mid-quarter course evaluation survey] (by Monday please!)
* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 4]]
* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 4]]


'''Lectures:'''
'''Lectures:'''
*[https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w04-R_lecture.Rmd Week 4 R lecture materials] (.Rmd file)
<!---
*(No screencast for this week)
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 4]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_04-misc_confint_simulation-20170125.ogv Week 4 R lecture screencast: order(); confidence intervals; simulations drawn from repeated random samples] (~27 minutes)
--->


'''Resources:'''
'''Resources:'''
Line 336: Line 325:
* [https://www.openintro.org/download.php?file=os3_slides_04&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §4 Lecture Notes]
* [https://www.openintro.org/download.php?file=os3_slides_04&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §4 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 7 videos for nearly all of §4
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 7 videos for nearly all of §4
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 4]]


=== Week 5: Thursday May 2: Continuous Numeric Data & ANOVA ===
=== Week 5: Thursday May 2: Continuous Numeric Data & ANOVA ===
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 5|Session plan]]


'''Required Readings:'''
'''Required Readings:'''
Line 345: Line 333:
* Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data)
* Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data)
<!---* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF from Hill's website]]--->
<!---* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF from Hill's website]]--->
* Sweetser, K. D., & Metzgar, E. (2007). Communicating during crisis: Use of blogs as a relationship management tool. ''Public Relations Review'', 33(3), 340–342. [[https://doi.org/10.1016/j.pubrev.2007.05.016 Available through NU Libraries]]
* Sweetser, K. D., & Metzgar, E. (2007). Communicating during crisis: Use of blogs as a relationship management tool. ''Public Relations Review'', 33(3), 340–342. https://doi.org/10.1016/j.pubrev.2007.05.016 [Available through NU Libraries]
* Reinhart, §1


'''Recommended Readings:'''
'''Recommended Readings:'''
Line 357: Line 344:


'''Lectures:'''
'''Lectures:'''
* No new R material for this week.
<!---
<!---
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 5]]
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 5]]
Line 370: Line 356:
=== Week 6: Thursday May 9: Categorical data ===
=== Week 6: Thursday May 9: Categorical data ===


* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 6|Session plan]]
'''Required Readings:'''
'''Required Readings:'''


* Diez, Barr, and Çetinkaya-Rundel: §6.1-6.4 (Inference for categorical data).
* Diez, Barr, and Çetinkaya-Rundel: §6 (Inference for categorical data)
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on Hill's personal website]]
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on Hill's personal website]]
* Reinhart, §4 and §5.
* Reinhart, Ch. 9.


'''Recommended Readings:
'''Recommended Readings:'''
* Diez, Barr, and Çetinkaya-Rundel: §6.5-6.6 (Small samples and randomization inference)
* Verzani: §3.4 (Bivariate categorical data); §10.1-10.2 (Goodness of fit)
* Verzani: §3.4 (Bivariate categorical data); §10.1-10.2 (Goodness of fit)
* Gelman, Andrew and Eric Loken. 2014. “The Statistical Crisis in Science Data-Dependent Analysis—a ‘garden of Forking Paths’—explains Why Many Statistically Significant Comparisons Don’t Hold Up.” ''American Scientist'' 102(6):460. [[https://www.americanscientist.org/issues/pub/2014/6/the-statistical-crisis-in-science/1 Available through NU Libraries]] (This is a reworked version of [http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf this unpublished manuscript] which provides more detailed examples.)
* Gelman, Andrew and Eric Loken. 2014. “The Statistical Crisis in Science Data-Dependent Analysis—a ‘garden of Forking Paths’—explains Why Many Statistically Significant Comparisons Don’t Hold Up.” ''American Scientist'' 102(6):460. [[https://www.americanscientist.org/issues/pub/2014/6/the-statistical-crisis-in-science/1 Available through NU Libraries]] (This is a reworked version of [http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf this unpublished manuscript] which provides more detailed examples.)
Line 387: Line 371:


'''Lectures:'''
'''Lectures:'''
*[https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w06-R_lecture.Rmd Week 6 R lecture materials] (.Rmd file)
<!---
*(No screencast for this week)
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 6]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_06-tables_chisq_debugging.ogv Week 6 R lecture screencast: Tables, <math>\chi^2</math>-tests, and debugging.] (~40 minutes)
--->


'''Resources:'''
Line 395: Line 381:


=== Week 7: Thursday May 16: Linear Regression ===
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 7|Session plan]]
 
'''Required Readings:'''


* Diez, Barr, and Çetinkaya-Rundel: §7 (Introduction to linear regression)
* Diez, Barr, and Çetinkaya-Rundel: §7 (Introduction to linear regression); §8.1-8.3 (Multiple regression)
* OpenIntro eschews a mathematical approach to correlation. Look over [https://en.wikipedia.org/wiki/Correlation_and_dependence the Wikipedia article on correlation and dependence] and pay attention to the formulas. It's tedious to compute, but you should be aware of what goes into it.
* OpenIntro eschews a mathematical instruction to correlation. Look over [https://en.wikipedia.org/wiki/Correlation_and_dependence the Wikipedia article on correlation and dependence] and pay attentions to the formulas. It's tedious to compute but I'd like to you to at least see what goes into it.
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available via NU libraries]]
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available in NU libraries]]
* Reinhart, Ch 8.


'''Recommended Readings:'''
* Verzani: §11.1-2 (Linear regression).
* Verzani: §11.1-2 (Linear regression),
* [https://seeing-theory.brown.edu/ Seeing Theory] §5 (Regression Analysis)


'''Assignment (Complete Before Class):'''


* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 7]]
* Final project planning document (see details above!)


'''Lectures:'''
* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w07-R_lecture.Rmd Week 7 R lecture materials]
<!---
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 7]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_07-linear_regression.ogv Week 7 R lecture screencast: linear regression] (~42 minutes)
--->


'''Resources:'''
Line 420: Line 408:


=== Week 8: Thursday May 23: Polynomial Terms, Interactions, and Logistic Regression ===
* [[Statistics_and_Statistical_Programming_(Spring_2019)/Session plan: Week 8|Session plan]]


'''Required Readings:'''
* Diez, Barr, and Çetinkaya-Rundel: §8 (Multiple and logistic regression)
 
* [https://onlinecourses.science.psu.edu/stat501/node/301 Lesson 8: Categorical Predictors] and [https://onlinecourses.science.psu.edu/stat501/node/318 Lesson 9: Data Transformations] from the PennState Eberly College of Science STAT 501 Regression Methods Course. There are several subparts (many quite short); please read them all carefully.
* (Revisit) Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available via NU libraries]]
* Diez, Barr, and Çetinkaya-Rundel: §8.4 (Multiple and logistic regression)
* Reinhart, §8 and §9.
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available via NU libraries]]


'''Recommended Readings:'''
Line 438: Line 425:


'''Lectures:'''
*[https://communitydata.science/~ads/teaching/2019/stats/r_lectures/w08-R_lecture.Rmd Week 8 R lecture materials]
 
<!---
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 8]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_08-more_regression_anova_redux.ogv Week 8 R lecture screencast: more on linear regression, including interactions, polynomials, log transformations; anova] (~28 minutes)
--->


'''Resources:'''
Line 446: Line 437:
* Mako Hill wrote this document which will likely be useful for many of you: [https://communitydata.cc/~mako/2017-COM521/logistic_regression_interpretation.html Interpreting Logistic Regression Coefficients with Examples in R]


=== Week 9: Thursday May 30: Loose ends and Final Presentations (part 1)  ===
=== Week 9: Thursday May 30: TBA ===


* [[Statistics_and_Statistical_Programming_(Spring_2019)/Session plan: Week 9|Session plan]]
Reserved for catch-up, supplementary topics, and maybe some final presentations.


'''Required readings:'''
* Reinhart, §10 and §11.
* Reinhart, Ch. 10-11.
 
'''[[Statistics_and_Statistical_Programming_(Spring_2019)/Final_project_presentations|Final presentations]]: (part 1)'''
* First batch today. The rest next week.
 
'''Resources:'''
* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w09-R_lecture.html Week 9 R-lecture] (we will use this in class)
 
=== Week 10: Thursday June 6: Fully reproducible research example, Replications, Final Presentations (part 2), and wrap-up ===
 
* Fully [https://www.overleaf.com/read/tkdpdcspwtkp reproducible research example].
* [https://canvas.northwestern.edu/courses/90927/files/folder/resources/Straub-Cook%20Replication Research replication study] by Polly Straub-Cook (UW Comm. Ph.D. student)
:: (n.b.: cluster & heteroscedasticity robust standard errors!)


* '''[[Statistics_and_Statistical_Programming_(Spring_2019)/Final_project_presentations|Final presentations]]: (part 2)'''
=== Week 10: Thursday June 6: Final Presentations ===
:: Second batch of presenters today.
* Closing thoughts
:: What next? Beyond your final projects...
:: Class social gathering


Followed by much rejoicing!