Editing Statistics and Statistical Programming (Spring 2019)
From CommunityData
Latest revision | Your text | ||
Line 85: | Line 85: | ||
* [https://depts.washington.edu/madlab/proj/Rstats/ Statistical Analysis and Reporting in R] — A set of resources created and distributed by Jacob Wobbrock (University of Washington, School of Information) in conjunction with a MOOC he teaches. Contains cheatsheets, code snippets, and data to help execute commonly encountered statistical procedures in R. | * [https://depts.washington.edu/madlab/proj/Rstats/ Statistical Analysis and Reporting in R] — A set of resources created and distributed by Jacob Wobbrock (University of Washington, School of Information) in conjunction with a MOOC he teaches. Contains cheatsheets, code snippets, and data to help execute commonly encountered statistical procedures in R. | ||
* [https://www.datacamp.com DataCamp] offers introductory R courses. Northwestern usually has some free accounts that get passed out via Research Data Services each quarter. Apparently, if you are taking or teaching relevant coursework, instructors can [https://www.datacamp.com/groups/education request] free access to DataCamp for their courses from DataCamp. If folks are interested in this, I can reach out. | * [https://www.datacamp.com DataCamp] offers introductory R courses. Northwestern usually has some free accounts that get passed out via Research Data Services each quarter. Apparently, if you are taking or teaching relevant coursework, instructors can [https://www.datacamp.com/groups/education request] free access to DataCamp for their courses from DataCamp. If folks are interested in this, I can reach out. | ||
== Assignments == | == Assignments == | ||
Line 142: | Line 139: | ||
;Due date: Thursday, May 16, 2019 | ;Due date: Thursday, May 16, 2019 | ||
;Maximum length: | ;Maximum length: 5 pages | ||
The project planning document is a basic shell/outline of an empirical quantitative research paper. Your planning document should have the following sections: (a) Rationale; (b) Objectives; (b.1) General objectives; (b.2) Specific objectives; (c) (Null) hypotheses; (d) Conceptual diagram and/or explanation of the relationship you plan to test; (e) Measures; (f) Dummy tables/figures. Descriptions of each of these planning document sections, as well as an exemplary planning document, will be available [[TODO-planningdoc|on this wiki page]]. | |||
<!--- | |||
An exemplary planning document from public health researcher Mika Matsuzaki is [https://canvas.northwestern.edu online in Canvas]. Your diagram will likely be much less complicated than Matsuzaki's. Also, please don't be distracted by the fact that Matsuzaki does public health research. You can (and should!) emulate the form rather than the content. You can also check out [http://ajcn.nutrition.org/content/99/6/1450.full the published paper] to see how the project wound up. | |||
Please note that the Matsuzaki planning document includes everything except a "Measures" section. Your Measures section should include a two-column table where column 1 is the name of each variable in your analysis and column 2 describes the operationalization of each measure and (if necessary) how you will create it. | |||
---> | |||
==== Project presentation and paper ==== | ==== Project presentation and paper ==== | ||
Line 156: | Line 154: | ||
;Maximum length: 6000 words (~20 pages) | ;Maximum length: 6000 words (~20 pages) | ||
;Presentation due date: | ;Presentation due date: Thursday, June 6, 2019 | ||
;Maximum length: | ;Maximum length: 12 minutes | ||
Line 170: | Line 168: | ||
I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography. However, ''your paper must follow a standard format'' (e.g., [https://cscw.acm.org/2019/submit-papers.html ACM SIGCHI CSCW format] or [https://www.apastyle.org/index APA 6th edition] ([https://templates.office.com/en-us/APA-style-report-6th-edition-TM03982351 Word] and [https://www.overleaf.com/latex/templates/sample-apa-paper/fswjbwygndyq LaTeX] templates)) that is applicable for a peer-reviewed journal or conference proceedings in which you aim to publish the work (they all have formatting or submission guidelines published online and you should follow them). This includes the references. I also strongly recommend that you use reference management software to handle your bibliographic sources. | I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography. However, ''your paper must follow a standard format'' (e.g., [https://cscw.acm.org/2019/submit-papers.html ACM SIGCHI CSCW format] or [https://www.apastyle.org/index APA 6th edition] ([https://templates.office.com/en-us/APA-style-report-6th-edition-TM03982351 Word] and [https://www.overleaf.com/latex/templates/sample-apa-paper/fswjbwygndyq LaTeX] templates)) that is applicable for a peer-reviewed journal or conference proceedings in which you aim to publish the work (they all have formatting or submission guidelines published online and you should follow them). This includes the references. I also strongly recommend that you use reference management software to handle your bibliographic sources. | ||
'' | '' The presentation:'' The presentation will provide an opportunity to share a brief summary of your project and findings with the other members of the class. Since you will all give other research presentations throughout your career, I strongly encourage you to take the opportunity to refine your academic presentation skills. The document [https://canvas.northwestern.edu Creating a Successful Scholarly Presentation] (file will be posted to Canvas) may be useful. | ||
=== Grading === | === Grading === | ||
Line 287: | Line 283: | ||
=== Week 3: Thursday April 18: Distributions === | === Week 3: Thursday April 18: Distributions === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
Line 311: | Line 305: | ||
* [https://www.openintro.org/download.php?file=os3_slides_03&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §3 Lecture Notes] | * [https://www.openintro.org/download.php?file=os3_slides_03&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §3 Lecture Notes] | ||
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 videos for §3.1 and §3.2 | * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 videos for §3.1 and §3.2 | ||
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 3]] | |||
=== Week 4: Thursday April 25: Statistical significance and hypothesis testing === | === Week 4: Thursday April 25: Statistical significance and hypothesis testing === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
Line 330: | Line 324: | ||
'''Lectures:''' | '''Lectures:''' | ||
*[https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w04-R_lecture.Rmd Week 4 R lecture materials] (.Rmd file) | *[https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w04-R_lecture.Rmd Week 4 R lecture materials] (.Rmd file) | ||
*(No screencast for this week) | *(No screencast recorded for this week) | ||
'''Resources:''' | '''Resources:''' | ||
Line 336: | Line 330: | ||
* [https://www.openintro.org/download.php?file=os3_slides_04&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §4 Lecture Notes] | * [https://www.openintro.org/download.php?file=os3_slides_04&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §4 Lecture Notes] | ||
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 7 videos for nearly all of §4 | * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 7 videos for nearly all of §4 | ||
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 4]] | |||
=== Week 5: Thursday May 2: Continuous Numeric Data & ANOVA === | === Week 5: Thursday May 2: Continuous Numeric Data & ANOVA === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
Line 345: | Line 338: | ||
* Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data) | * Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data) | ||
<!---* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF from Hill's website]]---> | <!---* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF from Hill's website]]---> | ||
* Sweetser, K. D., & Metzgar, E. (2007). Communicating during crisis: Use of blogs as a relationship management tool. ''Public Relations Review'', 33(3), 340–342. | * Sweetser, K. D., & Metzgar, E. (2007). Communicating during crisis: Use of blogs as a relationship management tool. ''Public Relations Review'', 33(3), 340–342. [https://doi.org/10.1016/j.pubrev.2007.05.016 Available through NU Libraries] | ||
'''Recommended Readings:''' | '''Recommended Readings:''' | ||
Line 357: | Line 349: | ||
'''Lectures:''' | '''Lectures:''' | ||
<!--- | <!--- | ||
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 5]] | * [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 5]] | ||
Line 370: | Line 361: | ||
=== Week 6: Thursday May 9: Categorical data === | === Week 6: Thursday May 9: Categorical data === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
* Diez, Barr, and Çetinkaya-Rundel: §6 | * Diez, Barr, and Çetinkaya-Rundel: §6 (Inference for categorical data) | ||
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on Hill's personal website]] | * Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on Hill's personal website]] | ||
* Reinhart, §4 and §5. | * Reinhart, §1, §4, and §5. | ||
'''Recommended Readings: | '''Recommended Readings:''' | ||
* Verzani: §3.4 (Bivariate categorical data); §10.1-10.2 (Goodness of fit) | * Verzani: §3.4 (Bivariate categorical data); §10.1-10.2 (Goodness of fit) | ||
* Gelman, Andrew and Eric Loken. 2014. “The Statistical Crisis in Science Data-Dependent Analysis—a ‘garden of Forking Paths’—explains Why Many Statistically Significant Comparisons Don’t Hold Up.” ''American Scientist'' 102(6):460. [[https://www.americanscientist.org/issues/pub/2014/6/the-statistical-crisis-in-science/1 Available through NU Libraries]] (This is a reworked version of [http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf this unpublished manuscript], which provides more detailed examples.) | * Gelman, Andrew and Eric Loken. 2014. “The Statistical Crisis in Science Data-Dependent Analysis—a ‘garden of Forking Paths’—explains Why Many Statistically Significant Comparisons Don’t Hold Up.” ''American Scientist'' 102(6):460. [[https://www.americanscientist.org/issues/pub/2014/6/the-statistical-crisis-in-science/1 Available through NU Libraries]] (This is a reworked version of [http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf this unpublished manuscript], which provides more detailed examples.) | ||
Line 387: | Line 376: | ||
'''Lectures:''' | '''Lectures:''' | ||
*[https://communitydata.cc/~ | <!--- | ||
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 6]] | |||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_06-tables_chisq_debugging.ogv Week 6 R lecture screencast: Tables, <math>\chi^2</math>-tests, and debugging.] (~40 minutes) | |||
---> | |||
'''Resources:''' | '''Resources:''' | ||
Line 395: | Line 386: | ||
=== Week 7: Thursday May 16: Linear Regression === | === Week 7: Thursday May 16: Linear Regression === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
* Diez, Barr, and Çetinkaya-Rundel: §7 (Introduction to linear regression) | * Diez, Barr, and Çetinkaya-Rundel: §7 (Introduction to linear regression); §8.1-8.3 (Multiple regression) | ||
* OpenIntro eschews a mathematical | * OpenIntro eschews a mathematical introduction to correlation. Look over [https://en.wikipedia.org/wiki/Correlation_and_dependence the Wikipedia article on correlation and dependence] and pay attention to the formulas. It's tedious to compute, but I'd like you to at least see what goes into it. | ||
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available via NU libraries]] | * Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available via NU libraries]] | ||
* Reinhart, §8 and §9. | |||
'''Recommended Readings:''' | '''Recommended Readings:''' | ||
Line 409: | Line 401: | ||
* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 7]] | * [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 7]] | ||
'''Lectures:''' | '''Lectures:''' | ||
* [https://communitydata.cc/~ | <!--- | ||
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 7]] | |||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_07-linear_regression.ogv Week 7 R lecture screencast: linear regression] (~42 minutes) | |||
---> | |||
'''Resources:''' | '''Resources:''' | ||
Line 420: | Line 414: | ||
=== Week 8: Thursday May 23: Polynomial Terms, Interactions, and Logistic Regression === | === Week 8: Thursday May 23: Polynomial Terms, Interactions, and Logistic Regression === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
* [https://onlinecourses.science.psu.edu/stat501/node/301 Lesson 8: Categorical Predictors] and [https://onlinecourses.science.psu.edu/stat501/node/318 Lesson 9: Data Transformations] from the Penn State Eberly College of Science STAT 501 Regression Methods Course. There are several subparts (many quite short); please read them all carefully. | * [https://onlinecourses.science.psu.edu/stat501/node/301 Lesson 8: Categorical Predictors] and [https://onlinecourses.science.psu.edu/stat501/node/318 Lesson 9: Data Transformations] from the Penn State Eberly College of Science STAT 501 Regression Methods Course. There are several subparts (many quite short); please read them all carefully. | ||
* ( | * Diez, Barr, and Çetinkaya-Rundel: §8.4 (Multiple and logistic regression) | ||
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available via NU libraries]] | |||
'''Recommended Readings:''' | '''Recommended Readings:''' | ||
Line 438: | Line 431: | ||
'''Lectures:''' | '''Lectures:''' | ||
*[https://communitydata. | |||
<!--- | |||
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 8]] | |||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_08-more_regression_anova_redux.ogv Week 8 R lecture screencast: more on linear regression, including interactions, polynomials, log transformations; anova] (~28 minutes) | |||
---> | |||
'''Resources:''' | '''Resources:''' | ||
Line 446: | Line 443: | ||
* Mako Hill wrote this document which will likely be useful for many of you: [https://communitydata.cc/~mako/2017-COM521/logistic_regression_interpretation.html Interpreting Logistic Regression Coefficients with Examples in R] | * Mako Hill wrote this document which will likely be useful for many of you: [https://communitydata.cc/~mako/2017-COM521/logistic_regression_interpretation.html Interpreting Logistic Regression Coefficients with Examples in R] | ||
=== Week 9: Thursday May 30: | === Week 9: Thursday May 30: TBA === | ||
Reserved for catch-up, supplementary topics, and maybe some final presentations. | |||
'''Required readings:''' | '''Required readings:''' | ||
* Reinhart, §10 and §11. | * Reinhart, §10 and §11. | ||
=== Week 10: Thursday June 6: Final Presentations === | |||
=== Week 10: Thursday June 6: | |||
Followed by much rejoicing! | Followed by much rejoicing! |