Editing Statistics and Statistical Programming (Spring 2019)
From CommunityData
Latest revision | Your text | ||
Line 14: | Line 14: | ||
:'''Teaching Assistant:''' [https://jeremydfoote.com/ Jeremy Foote] ([mailto:jdfoote@u.northwestern.edu jdfoote@u.northwestern.edu]) | :'''Teaching Assistant:''' [https://jeremydfoote.com/ Jeremy Foote] ([mailto:jdfoote@u.northwestern.edu jdfoote@u.northwestern.edu]) | ||
::Office Hours: Tue/Wed 1-3pm (or by appointment) | ::Office Hours: Tue/Wed 1-3pm (or by appointment) | ||
:'''Course Websites''': | :'''Course Websites''': | ||
Line 66: | Line 65: | ||
I will also be assigning several chapters from the following: | I will also be assigning several chapters from the following: | ||
* Reinhart, Alex. 2015. ''Statistics Done Wrong: The Woefully Complete Guide''. SF, CA: No Starch Press. ([https:// | * Reinhart, Alex. 2015. ''Statistics Done Wrong: The Woefully Complete Guide''. SF, CA: No Starch Press. ([https://www.safaribooksonline.com/library/view/statistics-done-wrong/9781457189845/ Safari online via NU libraries]) | ||
This book provides a conceptual introduction to some common failures in statistical analysis that you should learn to recognize and avoid. It was also written by a Ph.D. student. You have access to an electronic copy via the NU library, but you may find it helpful to purchase. | This book provides a conceptual introduction to some common failures in statistical analysis that you should learn to recognize and avoid. It was also written by a Ph.D. student. You have access to an electronic copy via the NU library, but you may find it helpful to purchase. | ||
Line 83: | Line 82: | ||
* [http://rseek.org/ Rseek] — Rseek is a modified version of Google that searches only R websites. R can be hard to search for because R is a common letter. This has become much easier over time as R has become more popular, but it can still be a problem, and Rseek is a good solution. | * [http://rseek.org/ Rseek] — Rseek is a modified version of Google that searches only R websites. R can be hard to search for because R is a common letter. This has become much easier over time as R has become more popular, but it can still be a problem, and Rseek is a good solution. | ||
* [https://ggplot2.tidyverse.org/ ggplot2 documentation] — Ggplot is a powerful data visualization package for R that I recommend highly. The documentation is indispensable for learning how to use it. | * [https://ggplot2.tidyverse.org/ ggplot2 documentation] — Ggplot is a powerful data visualization package for R that I recommend highly. The documentation is indispensable for learning how to use it. | ||
== Assignments == | == Assignments == | ||
Line 101: | Line 95: | ||
* '''Empirical paper questions''' about other assigned readings. | * '''Empirical paper questions''' about other assigned readings. | ||
You should submit your solutions to the programming challenges | You should submit your solutions to the programming challenges ahead of each class session. While I will not grade them, we will spend a good chunk of class going through the answers to the assignment due on that day. | ||
Because randomness is extremely important in statistics, I will use a small R program to '''randomly call on''' students to walk through their answers to statistics questions and empirical paper questions in class. We'll then discuss the answers, address points of confusion, and consider alternative approaches as a group. | Because randomness is extremely important in statistics, I will use a small R program to '''randomly call on''' students to walk through their answers to statistics questions and empirical paper questions in class. We'll then discuss the answers, address points of confusion, and consider alternative approaches as a group. | ||
Line 142: | Line 136: | ||
;Due date: Thursday, May 16, 2019 | ;Due date: Thursday, May 16, 2019 | ||
;Maximum length: | ;Maximum length: 5 pages | ||
The project planning document is a basic shell/outline of an empirical quantitative research paper. Your planning document should have the following sections: (a) Rationale, (b) Objectives; (b.1) General objectives; (b.2) Specific objectives; (c) Null hypotheses; (d) Conceptual diagram and/or explanation of the relationship you plan to test; (e) Measures; (f) Dummy tables. Descriptions of each of these planning document sections are available [[TODO-planningdoc|on this wiki page]]. | |||
An exemplary planning document from public health researcher Mika Matsuzaki is [https://canvas.northwestern.edu online in Canvas]. Your diagram will likely be much less complicated than Matsuzaki's. Also, please don't be distracted by the fact that Matsuzaki does public health research. You can (and should!) emulate the form rather than the content. You can also check out [http://ajcn.nutrition.org/content/99/6/1450.full the published paper] to see how the project wound up. | |||
Please note that the Matsuzaki planning document includes everything except a "Measures" section. Your Measures section should include a two-column table where column 1 is the name of each variable in your analysis and column 2 describes the operationalization of each measure and (if necessary) how you will create it. | |||
==== Project presentation and paper ==== | ==== Project presentation and paper ==== | ||
Line 156: | Line 149: | ||
;Maximum length: 6000 words (~20 pages) | ;Maximum length: 6000 words (~20 pages) | ||
;Presentation due date: | ;Presentation due date: Thursday, June 6, 2019 | ||
;Maximum length: | ;Maximum length: 12 minutes | ||
''The paper:'' Ideally, I expect you to produce a high quality short research paper that you might revise and submit for publication and/or a dissertation milestone. I do not expect the paper to be ready for publication, but it should contain polished drafts of all the necessary components of a scholarly quantitative empirical research study. In terms of the structure, please see the page on the [[structure of a quantitative empirical research paper]]. | ''The paper:'' Ideally, I expect you to produce a high quality short research paper that you might revise and submit for publication and/or a dissertation milestone. I do not expect the paper to be ready for publication, but it should contain polished drafts of all the necessary components of a scholarly quantitative empirical research study. In terms of the structure, please see the page on the [[structure of a quantitative empirical research paper]]. | ||
As noted above, you should also provide data, code, and any documentation sufficient to enable the replication of all analysis and visualizations. If that is not possible/appropriate for some reason, please talk to me so that we can find another solution. | As noted above, you should also provide data, code, and any documentation sufficient to enable the replication of all analysis and visualizations. This can happen through Github. If that is not possible/appropriate for some reason, please talk to me so that we can find another solution. | ||
Because the emphasis in this class is on statistics and methods and because I'm not an expert in each of your fields, I'm happy to assume that your paper, proposal, or thesis chapter has already established the relevance and significance of your study and has a comprehensive literature review, well-grounded conceptual approach, and compelling reason why this research is important. As a result, you need not focus on these elements of the work in your written submission. Instead, feel free to start with a brief summary of the purpose and importance of this research followed by an introduction of your research questions or hypotheses. If you provide more detail, that's fine, but I won't give you detailed feedback on these parts and they will not figure prominently in my assessment of the work. | Because the emphasis in this class is on statistics and methods and because I'm not an expert in each of your fields, I'm happy to assume that your paper, proposal, or thesis chapter has already established the relevance and significance of your study and has a comprehensive literature review, well-grounded conceptual approach, and compelling reason why this research is important. As a result, you need not focus on these elements of the work in your written submission. Instead, feel free to start with a brief summary of the purpose and importance of this research followed by an introduction of your research questions or hypotheses. If you provide more detail, that's fine, but I won't give you detailed feedback on these parts and they will not figure prominently in my assessment of the work. | ||
Line 168: | Line 161: | ||
I have a strong preference for you to write the paper individually, but I'm open to the idea that you may want to work with others in the class. Please contact me ''before'' you attempt to pursue a collaborative final paper. | I have a strong preference for you to write the paper individually, but I'm open to the idea that you may want to work with others in the class. Please contact me ''before'' you attempt to pursue a collaborative final paper. | ||
I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography. However, ''your paper must follow a standard format'' (e.g., | I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography. However, ''your paper must follow a standard format'' (e.g., <TODO link> ACM SIGCHI CSCW format or <TODO link> APA 6th edition) that is applicable for a peer-reviewed journal or conference proceedings in which you aim to publish the work (they all have formatting or submission guidelines published online and you should follow them). This includes the references. I also strongly recommend that you use reference management software to handle your bibliographic sources. | ||
: | '' The presentation:'' The presentation will provide an opportunity to share a brief summary of your project and findings with the other members of the class. Since you will all give other research presentations throughout your career, I strongly encourage you to take the opportunity to refine your academic presentation skills. The document [https://canvas.northwestern.edu Creating a Successful Scholarly Presentation] (link is in Canvas) may be useful. | ||
=== Grading === | === Grading === | ||
Line 188: | Line 179: | ||
== Note on finding a dataset == | == Note on finding a dataset == | ||
In order to complete your project, you will each need a dataset. If you already have a dataset for the project you plan to conduct, great! If not, there are many datasets to draw from. | In order to complete your project, you will each need a dataset. If you already have a dataset for the project you plan to conduct, great! If not, there are many datasets to draw from. Here are some ideas: | ||
* Ask your advisor for a dataset they have collected and used in previous papers. Are there other variables you could use? Other relationships you could analyze? | * Ask your advisor for a dataset they have collected and used in previous papers. Are there other variables you could use? Other relationships you could analyze? | ||
Line 196: | Line 187: | ||
* Look at the collection of social scientific datasets at [https://www.icpsr.umich.edu/icpsrweb/ICPSR/ ICPSR at the University of Michigan] (NU is a member). There are an enormous number of very rich datasets. | * Look at the collection of social scientific datasets at [https://www.icpsr.umich.edu/icpsrweb/ICPSR/ ICPSR at the University of Michigan] (NU is a member). There are an enormous number of very rich datasets. | ||
* Use the [http://scientificdata.isa-explorer.org/index.html ISA Explorer] to find datasets. Keep in mind the large majority of datasets it will search are drawn from the natural sciences. | * Use the [http://scientificdata.isa-explorer.org/index.html ISA Explorer] to find datasets. Keep in mind the large majority of datasets it will search are drawn from the natural sciences. | ||
<!--- | <!--- | ||
* <TODO fix/update accordingly> Set up a meeting with Jennifer Muilenburg — Data Curriculum and Communications Librarian who runs [https://www.lib.washington.edu/digitalscholarship/services/data research data services at the UW libraries]. Her email is libdata@uw.edu. I have talked to her about this course and she is excited about meeting with you to help. | * <TODO fix/update accordingly> Set up a meeting with Jennifer Muilenburg — Data Curriculum and Communications Librarian who runs [https://www.lib.washington.edu/digitalscholarship/services/data research data services at the UW libraries]. Her email is libdata@uw.edu. I have talked to her about this course and she is excited about meeting with you to help. | ||
Line 242: | Line 232: | ||
* Verzani: §1 (Getting Started), §2 (Univariate data) [[https://canvas.northwestern.edu/verzani_ch1-ch2.pdf Available via Canvas]] | * Verzani: §1 (Getting Started), §2 (Univariate data) [[https://canvas.northwestern.edu/verzani_ch1-ch2.pdf Available via Canvas]] | ||
* Verzani: §A (Programming) | * Verzani: §A (Programming) | ||
* Healy: | * Healy: Chapter 2 (and skim the prefatory material as well as Chapter 1) | ||
'''Assignment (Complete before class):''' | '''Assignment (Complete before class):''' | ||
Line 248: | Line 238: | ||
'''Lectures:''' | '''Lectures:''' | ||
* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w01-R_lecture. | * [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w01-R_lecture.Rmd Week 1 R lecture materials] (.Rmd file) | ||
* [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w01-s01-intro.webm Week 1 screencast (part 1, 23 minutes)] (the video should load directly in browser window) | * [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w01-s01-intro.webm Week 1 screencast (part 1, 23 minutes)] (the video should load directly in browser window) | ||
* [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w01-s02-intro.webm Week 1 screencast (part 2, 27 minutes)] | * [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w01-s02-intro.webm Week 1 screencast (part 2, 27 minutes)] | ||
Line 257: | Line 247: | ||
=== Week 2: Thursday April 11: Probability and Visualization === | === Week 2: Thursday April 11: Probability and Visualization === | ||
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 2]] | <!---* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 2]]---> | ||
'''Required Readings:''' | '''Required Readings:''' | ||
Line 266: | Line 255: | ||
'''Recommended Readings:''' | '''Recommended Readings:''' | ||
* Verzani: §3.1-2 (Bivariate data), §4 (Multivariate data), §5 (Multivariate graphics) <!---[[https://faculty.washington.edu/makohill/com521/verzani-usingr-ch3.1-2_ch4_ch5.pdf Available with UW NetID]] | * Verzani: §3.1-2 (Bivariate data), §4 (Multivariate data), §5 (Multivariate graphics) <!---[[https://faculty.washington.edu/makohill/com521/verzani-usingr-ch3.1-2_ch4_ch5.pdf Available with UW NetID]] | ||
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on my personal website]] | * Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on my personal website]] | ||
---> | ---> | ||
* Healy: | * Healy: Chapter 3. | ||
'''Assignment (Complete Before Class):''' | '''Assignment (Complete Before Class):''' | ||
Line 279: | Line 266: | ||
'''Lectures:''' | '''Lectures:''' | ||
* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w02-R_lecture.Rmd Week 2 R lecture materials] (.Rmd file) | * [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w02-R_lecture.Rmd Week 2 R lecture materials] (.Rmd file) | ||
* [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w02.webm Week 2 screencast ( | * [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w02.webm Week 2 screencast (20 minutes)] | ||
'''Resources:''' | '''Resources:''' | ||
Line 287: | Line 274: | ||
=== Week 3: Thursday April 18: Distributions === | === Week 3: Thursday April 18: Distributions === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
Line 296: | Line 281: | ||
'''Recommended Readings:''' | '''Recommended Readings:''' | ||
* Verzani: §6 (Populations) | * Verzani: §6 (Populations) | ||
'''Assignment (Complete Before Class):''' | '''Assignment (Complete Before Class):''' | ||
Line 304: | Line 288: | ||
'''Lectures:''' | '''Lectures:''' | ||
* [https://communitydata.cc/~ | <!--- | ||
* [https://communitydata.cc/~ | * [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 3]] | ||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-loading_data_functions_apply_misc.ogv Week 3 R lecture screencast: Loading data, functions; apply(), lapply(), sapply(); several miscellaneous functions] (~34 minutes) — This is the same material I covered in class. If you followed it, there's no reason you need to go back to this. | |||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-dates_tapply_merge.ogv Week 3 R lecture screencast: Dates; tapply(); and merge()] (~38 minutes) [The audio seems to be broken for the last 10 minutes. Sorry about that! I've rerecorded that below.] | |||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-merge.ogv Week 3 R lecture screencast: merge()] (~13 minutes) [Rerecording of the last few minutes of the previous video.] | |||
---> | |||
'''Resources:''' | '''Resources:''' | ||
Line 311: | Line 299: | ||
* [https://www.openintro.org/download.php?file=os3_slides_03&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §3 Lecture Notes] | * [https://www.openintro.org/download.php?file=os3_slides_03&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §3 Lecture Notes] | ||
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 videos for §3.1 and §3.2 | * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 videos for §3.1 and §3.2 | ||
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 3]] | |||
=== Week 4: Thursday April 25: Statistical significance and hypothesis testing === | === Week 4: Thursday April 25: Statistical significance and hypothesis testing === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
* Diez, Barr, and Çetinkaya-Rundel: §4 (Foundations for inference) | * Diez, Barr, and Çetinkaya-Rundel: §4 (Foundations for inference) | ||
* Reinhart, Chapters 1, 4, and 5 (via Canvas). | |||
'''Recommended Readings:''' | '''Recommended Readings:''' | ||
* Verzani: §7 (Statistical inference), §8 (Confidence intervals) | * Verzani: §7 (Statistical inference), §8 (Confidence intervals) | ||
'''Assignment (Complete Before Class):''' | '''Assignment (Complete Before Class):''' | ||
* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 4]] | * [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 4]] | ||
'''Lectures:''' | '''Lectures:''' | ||
*[https://communitydata.cc/~ | <!--- | ||
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 4]] | |||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_04-misc_confint_simulation-20170125.ogv Week 4 R lecture screencast: order(); confidence intervals; simulations drawn from repeated random samples] (~27 minutes) | |||
---> | |||
'''Resources:''' | '''Resources:''' | ||
Line 336: | Line 325: | ||
* [https://www.openintro.org/download.php?file=os3_slides_04&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §4 Lecture Notes] | * [https://www.openintro.org/download.php?file=os3_slides_04&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §4 Lecture Notes] | ||
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 7 videos for nearly all of §4 | * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 7 videos for nearly all of §4 | ||
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 4]] | |||
=== Week 5: Thursday May 2: Continuous Numeric Data & ANOVA === | === Week 5: Thursday May 2: Continuous Numeric Data & ANOVA === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
Line 345: | Line 333: | ||
* Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data) | * Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data) | ||
<!---* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF from Hill's website]]---> | <!---* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF from Hill's website]]---> | ||
* Sweetser, K. D., & Metzgar, E. (2007). Communicating during crisis: Use of blogs as a relationship management tool. ''Public Relations Review'', 33(3), 340–342. | * Sweetser, K. D., & Metzgar, E. (2007). Communicating during crisis: Use of blogs as a relationship management tool. ''Public Relations Review'', 33(3), 340–342. https://doi.org/10.1016/j.pubrev.2007.05.016 [Available through NU Libraries] | ||
'''Recommended Readings:''' | '''Recommended Readings:''' | ||
Line 357: | Line 344: | ||
'''Lectures:''' | '''Lectures:''' | ||
<!--- | <!--- | ||
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 5]] | * [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 5]] | ||
Line 370: | Line 356: | ||
=== Week 6: Thursday May 9: Categorical data === | === Week 6: Thursday May 9: Categorical data === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
* Diez, Barr, and Çetinkaya-Rundel: §6 | * Diez, Barr, and Çetinkaya-Rundel: §6 (Inference for categorical data) | ||
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on Hill's personal website]] | * Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on Hill's personal website]] | ||
* Reinhart, | * Reinhart, Ch. 9. | ||
'''Recommended Readings: | '''Recommended Readings:''' | ||
* Verzani: §3.4 (Bivariate categorical data); §10.1-10.2 (Goodness of fit) | * Verzani: §3.4 (Bivariate categorical data); §10.1-10.2 (Goodness of fit) | ||
* Gelman, Andrew and Eric Loken. 2014. “The Statistical Crisis in Science Data-Dependent Analysis—a ‘garden of Forking Paths’—explains Why Many Statistically Significant Comparisons Don’t Hold Up.” ''American Scientist'' 102(6):460. [[https://www.americanscientist.org/issues/pub/2014/6/the-statistical-crisis-in-science/1 Available through NU Libraries]] (This is a reworked version of [http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf this unpublished manuscript], which provides more detailed examples.) | * Gelman, Andrew and Eric Loken. 2014. “The Statistical Crisis in Science Data-Dependent Analysis—a ‘garden of Forking Paths’—explains Why Many Statistically Significant Comparisons Don’t Hold Up.” ''American Scientist'' 102(6):460. [[https://www.americanscientist.org/issues/pub/2014/6/the-statistical-crisis-in-science/1 Available through NU Libraries]] (This is a reworked version of [http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf this unpublished manuscript], which provides more detailed examples.) | ||
Line 387: | Line 371: | ||
'''Lectures:''' | '''Lectures:''' | ||
*[https://communitydata.cc/~ | <!--- | ||
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 6]] | |||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_06-tables_chisq_debugging.ogv Week 6 R lecture screencast: Tables, <math>\chi^2</math>-tests, and debugging.] (~40 minutes) | |||
---> | |||
'''Resources:''' | '''Resources:''' | ||
Line 395: | Line 381: | ||
=== Week 7: Thursday May 16: Linear Regression === | === Week 7: Thursday May 16: Linear Regression === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
* Diez, Barr, and Çetinkaya-Rundel: §7 (Introduction to linear regression) | * Diez, Barr, and Çetinkaya-Rundel: §7 (Introduction to linear regression); §8.1-8.3 (Multiple regression) | ||
* OpenIntro eschews a mathematical | * OpenIntro eschews a mathematical introduction to correlation. Look over [https://en.wikipedia.org/wiki/Correlation_and_dependence the Wikipedia article on correlation and dependence] and pay attention to the formulas. It's tedious to compute, but I'd like you to at least see what goes into it. | ||
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available | * Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available in NU libraries]] | ||
* Reinhart, Ch 8. | |||
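: As a pointer for the correlation reading above: the formula most readers will be looking for is the sample Pearson correlation coefficient. The version below is the standard textbook definition, not something taken from OpenIntro or from this course's materials. It assumes <math>n</math> paired observations <math>(x_i, y_i)</math> with sample means <math>\bar{x}</math> and <math>\bar{y}</math>:
: <math>r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}</math>
: The numerator measures how the two variables vary together around their means, and the denominator rescales by each variable's own spread, so <math>r</math> always falls between -1 and 1.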
'''Recommended Readings:''' | '''Recommended Readings:''' | ||
* Verzani: §11.1-2 (Linear regression) | * Verzani: §11.1-2 (Linear regression), | ||
'''Assignment (Complete Before Class):''' | '''Assignment (Complete Before Class):''' | ||
* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 7]] | * [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 7]] | ||
'''Lectures:''' | '''Lectures:''' | ||
* [https://communitydata.cc/~ | <!--- | ||
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 7]] | |||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_07-linear_regression.ogv Week 7 R lecture screencast: linear regression] (~42 minutes) | |||
---> | |||
'''Resources:''' | '''Resources:''' | ||
Line 420: | Line 408: | ||
=== Week 8: Thursday May 23: Polynomial Terms, Interactions, and Logistic Regression === | === Week 8: Thursday May 23: Polynomial Terms, Interactions, and Logistic Regression === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
* [https://onlinecourses.science.psu.edu/stat501/node/301 Lesson 8: Categorical Predictors] and [https://onlinecourses.science.psu.edu/stat501/node/318 Lesson 9: Data Transformations] from the Penn State Eberly College of Science STAT 501 Regression Methods Course. There are several subparts (many quite short); please read them all carefully. | * [https://onlinecourses.science.psu.edu/stat501/node/301 Lesson 8: Categorical Predictors] and [https://onlinecourses.science.psu.edu/stat501/node/318 Lesson 9: Data Transformations] from the Penn State Eberly College of Science STAT 501 Regression Methods Course. There are several subparts (many quite short); please read them all carefully. | ||
* ( | * Diez, Barr, and Çetinkaya-Rundel: §8.4 (Multiple and logistic regression) | ||
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available via NU libraries]] | |||
'''Recommended Readings:''' | '''Recommended Readings:''' | ||
Line 438: | Line 425: | ||
'''Lectures:''' | '''Lectures:''' | ||
*[https://communitydata. | |||
<!--- | |||
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 8]] | |||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_08-more_regression_anova_redux.ogv Week 8 R lecture screencast: more on linear regression, including interactions, polynomials, log transformations; anova] (~28 minutes) | |||
---> | |||
'''Resources:''' | '''Resources:''' | ||
Line 446: | Line 437: | ||
* Mako Hill wrote this document which will likely be useful for many of you: [https://communitydata.cc/~mako/2017-COM521/logistic_regression_interpretation.html Interpreting Logistic Regression Coefficients with Examples in R] | * Mako Hill wrote this document which will likely be useful for many of you: [https://communitydata.cc/~mako/2017-COM521/logistic_regression_interpretation.html Interpreting Logistic Regression Coefficients with Examples in R] | ||
=== Week 9: Thursday May 30: | === Week 9: Thursday May 30: TBA === | ||
Reserved for catch-up, supplementary topics, and maybe some final presentations. | |||
'''Required readings:''' | '''Required readings:''' | ||
* Reinhart, | * Reinhart, Ch. 10-11. | ||
=== Week 10: Thursday June 6: Final Presentations === | |||
: | |||
Followed by much rejoicing! | Followed by much rejoicing! |