Editing Statistics and Statistical Programming (Spring 2019)
From CommunityData
Latest revision | Your text | ||
Line 2: | Line 2: | ||
:'''Statistics and Statistical Programming''' | :'''Statistics and Statistical Programming''' | ||
: | :'''MTS 525''' Media, Technology & Society | ||
: | :'''Northwestern University''' Spring 2019 | ||
:'''Instructor:''' [http://aaronshaw.org Aaron Shaw] ([https://communication.northwestern.edu/faculty/AaronShaw Northwestern University]) | |||
:'''Instructor:''' [http://aaronshaw.org Aaron Shaw] ( | |||
:'''Course Websites''': | :'''Course Websites''': | ||
:* We will use [https://canvas.northwestern.edu/courses/90927 Canvas] for [https://canvas.northwestern.edu/courses/90927/announcements announcements], [https://canvas.northwestern.edu/courses/90927/assignments turning in | :* We will use [https://canvas.northwestern.edu/courses/90927 Canvas] for [https://canvas.northwestern.edu/courses/90927/announcements announcements], [https://canvas.northwestern.edu/courses/90927/assignments turning in some assignments], and [https://canvas.northwestern.edu/courses/90927/discussion_topics discussions]. | ||
:* Everything else will be linked on this page. | :* Everything else will be linked on this page. | ||
<!---:* List of student git repositories (will be a link)---> | <!---:* List of student git repositories (will be a link)---> | ||
Line 66: | Line 55: | ||
I will also be assigning several chapters from the following: | I will also be assigning several chapters from the following: | ||
* Reinhart, Alex. 2015. ''Statistics Done Wrong: The Woefully Complete Guide''. SF, CA: No Starch Press. ([https:// | * Reinhart, Alex. 2015. ''Statistics Done Wrong: The Woefully Complete Guide''. SF, CA: No Starch Press. ([https://www.safaribooksonline.com/library/view/statistics-done-wrong/9781457189845/ Safari online via NU libraries]) | ||
This book provides a conceptual introduction to some common failures in statistical analysis that you should learn to recognize and avoid. It was also written by a Ph.D. student. You have access to an electronic copy via the NU library, but you may find it helpful to purchase. | This book provides a conceptual introduction to some common failures in statistical analysis that you should learn to recognize and avoid. It was also written by a Ph.D. student. You have access to an electronic copy via the NU library, but you may find it helpful to purchase. | ||
Line 83: | Line 72: | ||
* [http://rseek.org/ Rseek] — Rseek is a modified version of Google that searches only R websites. R can be hard to search for because "R" is a common letter. This has become much easier over time as R has become more popular, but when it is still a problem, Rseek is a good solution. | * [http://rseek.org/ Rseek] — Rseek is a modified version of Google that searches only R websites. R can be hard to search for because "R" is a common letter. This has become much easier over time as R has become more popular, but when it is still a problem, Rseek is a good solution. | ||
* [https://ggplot2.tidyverse.org/ ggplot2 documentation] — Ggplot is a powerful data visualization package for R that I recommend highly. The documentation is indispensable for learning how to use it. | * [https://ggplot2.tidyverse.org/ ggplot2 documentation] — Ggplot is a powerful data visualization package for R that I recommend highly. The documentation is indispensable for learning how to use it. | ||
== Assignments == | == Assignments == | ||
Line 101: | Line 85: | ||
* '''Empirical paper questions''' about other assigned readings. | * '''Empirical paper questions''' about other assigned readings. | ||
You should submit your solutions to the programming challenges | You should submit your solutions to the programming challenges ahead of each class session. While I will not grade them, we will spend a good chunk of class going through the answers to the assignment due on that day. | ||
Because randomness is extremely important in statistics, I will use a small R program to '''randomly call on''' students to walk through their answers to the statistics questions and empirical paper questions in class. We'll then discuss the answers, address points of confusion, and consider alternative approaches as a group. | Because randomness is extremely important in statistics, I will use a small R program to '''randomly call on''' students to walk through their answers to the statistics questions and empirical paper questions in class. We'll then discuss the answers, address points of confusion, and consider alternative approaches as a group. | ||
Line 107: | Line 91: | ||
For the programming challenges, you should submit code for your solutions before class (more on how in a moment) so we can walk through the material together. If you get completely stuck on a problem, that's okay, but please share whatever code you have so that you can tell us what you did and what you were thinking. | For the programming challenges, you should submit code for your solutions before class (more on how in a moment) so we can walk through the material together. If you get completely stuck on a problem, that's okay, but please share whatever code you have so that you can tell us what you did and what you were thinking. | ||
Coming to class will be profoundly important to learning the material and to your final grade. Although the problem sets will not be graded, it is critical that you be present and able to discuss your answers to each of the questions. Your ability to do so will figure prominently in your participation grade for the course (40% of your final grade). | Coming to class will be profoundly important to learning the material and to your final grade. Although the problem sets will not be graded, it is critical that you be present and able to discuss your answers to each of the questions. Your ability to do so will figure prominently in your participation grade for the course (40% of your final grade). More on | ||
I strongly encourage you to form groups to work on the problem sets if you find that helpful; however, you must still submit your work individually and respond to my cold-call prompts in class individually to help ensure that you learn and understand the material. | I strongly encourage you to form groups to work on the problem sets if you find that helpful; however, you must still submit your work individually and respond to my cold-call prompts in class individually to help ensure that you learn and understand the material. | ||
Line 142: | Line 126: | ||
;Due date: Thursday, May 16, 2019 | ;Due date: Thursday, May 16, 2019 | ||
;Maximum length: | ;Maximum length: 5 pages | ||
The project | The project planning document is a basic shell/outline of an empirical quantitative research paper. Your planning document should have the following sections: (a) Rationale; (b) Objectives; (b.1) General objectives; (b.2) Specific objectives; (c) Null hypotheses; (d) Conceptual diagram and/or explanation of the relationship you plan to test; (e) Measures; (f) Dummy tables. Descriptions of each of these planning document sections are available [[TODO-planningdoc|on this wiki page]]. | ||
An exemplary planning document from public health researcher Mika Matsuzaki is [https://canvas.northwestern.edu online in Canvas]. Your diagram will likely be much less complicated than Matsuzaki's. Also, please don't be distracted by the fact that Matsuzaki does public health research. You can (and should!) emulate the form rather than the content. You can also check out [http://ajcn.nutrition.org/content/99/6/1450.full the published paper] to see how the project wound up. | |||
Please note that the Matsuzaki planning document includes everything except a "Measures" section. Your Measures section should include a two-column table where column 1 is the name of each variable in your analysis and column 2 describes the operationalization of each measure and (if necessary) how you will create it. | |||
==== Project presentation and paper ==== | ==== Project presentation and paper ==== | ||
Line 156: | Line 139: | ||
;Maximum length: 6000 words (~20 pages) | ;Maximum length: 6000 words (~20 pages) | ||
;Presentation due date: | ;Presentation due date: Thursday, June 6, 2019 | ||
;Maximum length: | ;Maximum length: 12 minutes | ||
''The paper:'' Ideally, I expect you to produce a high quality short research paper that you might revise and submit for publication and/or a dissertation milestone. I do not expect the paper to be ready for publication, but it should contain polished drafts of all the necessary components of a scholarly quantitative empirical research study. In terms of the structure, please see the page on the [[structure of a quantitative empirical research paper]]. | ''The paper:'' Ideally, I expect you to produce a high quality short research paper that you might revise and submit for publication and/or a dissertation milestone. I do not expect the paper to be ready for publication, but it should contain polished drafts of all the necessary components of a scholarly quantitative empirical research study. In terms of the structure, please see the page on the [[structure of a quantitative empirical research paper]]. | ||
As noted above, you should also provide data, code, and any documentation sufficient to enable the replication of all analysis and visualizations. If that is not possible/appropriate for some reason, please talk to me so that we can find another solution. | As noted above, you should also provide data, code, and any documentation sufficient to enable the replication of all analysis and visualizations. This can happen through GitHub. If that is not possible/appropriate for some reason, please talk to me so that we can find another solution. | ||
Because the emphasis in this class is on statistics and methods and because I'm not an expert in each of your fields, I'm happy to assume that your paper, proposal, or thesis chapter has already established the relevance and significance of your study and has a comprehensive literature review, well-grounded conceptual approach, and compelling reason why this research is important. As a result, you need not focus on these elements of the work in your written submission. Instead, feel free to start with a brief summary of the purpose and importance of this research followed by an introduction of your research questions or hypotheses. If you provide more detail, that's fine, but I won't give you detailed feedback on these parts and they will not figure prominently in my assessment of the work. | Because the emphasis in this class is on statistics and methods and because I'm not an expert in each of your fields, I'm happy to assume that your paper, proposal, or thesis chapter has already established the relevance and significance of your study and has a comprehensive literature review, well-grounded conceptual approach, and compelling reason why this research is important. As a result, you need not focus on these elements of the work in your written submission. Instead, feel free to start with a brief summary of the purpose and importance of this research followed by an introduction of your research questions or hypotheses. If you provide more detail, that's fine, but I won't give you detailed feedback on these parts and they will not figure prominently in my assessment of the work. | ||
Line 168: | Line 151: | ||
I have a strong preference for you to write the paper individually, but I'm open to the idea that you may want to work with others in the class. Please contact me ''before'' you attempt to pursue a collaborative final paper. | I have a strong preference for you to write the paper individually, but I'm open to the idea that you may want to work with others in the class. Please contact me ''before'' you attempt to pursue a collaborative final paper. | ||
I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography. However, ''your paper must follow a standard format'' (e.g., | I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography. However, ''your paper must follow a standard format'' (e.g., <TODO link> ACM SIGCHI CSCW format or <TODO link> APA 6th edition) that is applicable for a peer-reviewed journal or conference proceedings in which you aim to publish the work (they all have formatting or submission guidelines published online and you should follow them). This includes the references. I also strongly recommend that you use reference management software to handle your bibliographic sources. | ||
'' | '' The presentation:'' The presentation will provide an opportunity to share a brief summary of your project and findings with the other members of the class. Since you will all give other research presentations throughout your career, I strongly encourage you to take the opportunity to refine your academic presentation skills. The document [https://canvas.northwestern.edu Creating a Successful Scholarly Presentation] (link is in Canvas) may be useful. | ||
=== Grading === | === Grading === | ||
Line 184: | Line 165: | ||
* Final project paper: 40% | * Final project paper: 40% | ||
My assessment of your paper will reflect the clarity of the written work, the effective execution and presentation of quantitative empirical analysis, as well as the quality and originality of the analysis. Throughout the quarter, we will talk a lot about the qualities of exemplary quantitative research. I expect your final project to embody these exemplary qualities. | My assessment of your paper will reflect the clarity of the written work, the effective execution and presentation of quantitative empirical analysis, as well as the quality and originality of the analysis. Throughout the quarter, we will talk a lot about the qualities of exemplary quantitative research. I expect your final project to embody these exemplary qualities. | ||
== Note on finding a dataset == | == Note on finding a dataset == | ||
In order to complete your project, you will each need a dataset. If you already have a dataset for the project you plan to conduct, great! If not, there are many datasets to draw from. | In order to complete your project, you will each need a dataset. If you already have a dataset for the project you plan to conduct, great! If not, there are many datasets to draw from. Here are some ideas: | ||
* Ask your advisor for a dataset they have collected and used in previous papers. Are there other variables you could use? Other relationships you could analyze? | * Ask your advisor for a dataset they have collected and used in previous papers. Are there other variables you could use? Other relationships you could analyze? | ||
Line 196: | Line 177: | ||
* Look at the collection of social scientific datasets at [https://www.icpsr.umich.edu/icpsrweb/ICPSR/ ICPSR at the University of Michigan] (NU is a member). There are an enormous number of very rich datasets. | * Look at the collection of social scientific datasets at [https://www.icpsr.umich.edu/icpsrweb/ICPSR/ ICPSR at the University of Michigan] (NU is a member). There are an enormous number of very rich datasets. | ||
* Use the [http://scientificdata.isa-explorer.org/index.html ISA Explorer] to find datasets. Keep in mind the large majority of datasets it will search are drawn from the natural sciences. | * Use the [http://scientificdata.isa-explorer.org/index.html ISA Explorer] to find datasets. Keep in mind the large majority of datasets it will search are drawn from the natural sciences. | ||
<!--- | <!--- | ||
* <TODO fix/update accordingly> Set up a meeting with Jennifer Muilenburg — Data Curriculum and Communications Librarian who runs [https://www.lib.washington.edu/digitalscholarship/services/data research data services at the UW libraries]. Her email is libdata@uw.edu. I have talked to her about this course, and she is excited about meeting with you to help. | * <TODO fix/update accordingly> Set up a meeting with Jennifer Muilenburg — Data Curriculum and Communications Librarian who runs [https://www.lib.washington.edu/digitalscholarship/services/data research data services at the UW libraries]. Her email is libdata@uw.edu. I have talked to her about this course, and she is excited about meeting with you to help. | ||
Line 214: | Line 194: | ||
When it comes to the statistics material, this will mostly be a so-called "flipped" classroom. This means we will rely on the textbook and other resources to introduce the material and we will use the class sessions to discuss questions as they come up. | When it comes to the statistics material, this will mostly be a so-called "flipped" classroom. This means we will rely on the textbook and other resources to introduce the material and we will use the class sessions to discuss questions as they come up. | ||
Although the day-to-day routine will vary, each class session will generally include the following: | Although the day-to-day routine will vary, each class session will generally include the following: | ||
* Quick updates about assignments, projects, and meta-discussion about the class. | * Quick updates about assignments, projects, and meta-discussion about the class. | ||
* Discussion of '''programming challenges''' due that day | * Discussion of '''programming challenges''' due that day. | ||
* Discussion of '''statistics questions''' related to new material in Diez, Barr, and Çetinkaya-Rundel. | * Discussion of '''statistics questions''' related to new material in Diez, Barr, and Çetinkaya-Rundel. | ||
* Discussion of any exemplary empirical paper we have read and the '''empirical paper questions'''. | * Discussion of any exemplary empirical paper we have read and the '''empirical paper questions'''. | ||
Line 229: | Line 207: | ||
=== Week 1: Thursday April 4: Introduction, Setup, and Data and Variables === | === Week 1: Thursday April 4: Introduction, Setup, and Data and Variables === | ||
Please complete the readings prior to class so that we can discuss them and start talking through some of the examples in R together. | |||
Please complete the readings | |||
'''Required Readings:''' | '''Required Readings:''' | ||
* Diez, Barr, and Çetinkaya-Rundel: §1 (Introduction to data) | * Diez, Barr, and Çetinkaya-Rundel: §1 (Introduction to data) | ||
* Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. | * Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks.” ''Proceedings of the National Academy of Sciences'' 111(24):8788–90. [[http://www.pnas.org/content/111/24/8788.full Available through NU libraries]] | ||
'''Recommended Readings:''' | '''Recommended Readings:''' | ||
Line 242: | Line 218: | ||
* Verzani: §1 (Getting Started), §2 (Univariate data) [[https://canvas.northwestern.edu/verzani_ch1-ch2.pdf Available via Canvas]] | * Verzani: §1 (Getting Started), §2 (Univariate data) [[https://canvas.northwestern.edu/verzani_ch1-ch2.pdf Available via Canvas]] | ||
* Verzani: §A (Programming) | * Verzani: §A (Programming) | ||
* Healy: | * Healy: Chapter 2 (and skim the prefatory material as well as Chapter 1) | ||
'''Assignment (Complete before class):''' | '''Assignment (Complete before class):''' | ||
Line 248: | Line 224: | ||
'''Lectures:''' | '''Lectures:''' | ||
* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w01- | * [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w01-introduction.zip Week 1 R lecture materials] (.zip file) | ||
* [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w01-s01-intro.webm Week 1 screencast (part 1, 23 minutes)] (the video should load directly in browser window) | * [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w01-s01-intro.webm Week 1 screencast (part 1, 23 minutes)] (the video should load directly in browser window) | ||
* [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w01-s02-intro.webm Week 1 screencast (part 2, 27 minutes)] | * [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w01-s02-intro.webm Week 1 screencast (part 2, 27 minutes)] | ||
Line 255: | Line 231: | ||
* [https://www.openintro.org/download.php?file=os3_slides_01&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §1 Lecture Notes] | * [https://www.openintro.org/download.php?file=os3_slides_01&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §1 Lecture Notes] | ||
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including some for §1 | * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including some for §1 | ||
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 1]] | |||
=== Week 2: Thursday April 11: Probability and Visualization === | === Week 2: Thursday April 11: Probability and Visualization === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
* Diez, Barr, and Çetinkaya-Rundel: §2 (Probability) | * Diez, Barr, and Çetinkaya-Rundel: §2 (Probability) | ||
'''Recommended Readings:''' | '''Recommended Readings:''' | ||
* Verzani: §3.1-2 (Bivariate data), §4 (Multivariate data), §5 (Multivariate graphics) <!---[[https://faculty.washington.edu/makohill/com521/verzani-usingr-ch3.1-2_ch4_ch5.pdf Available with UW NetID]] | * Verzani: §3.1-2 (Bivariate data), §4 (Multivariate data), §5 (Multivariate graphics) <!---[[https://faculty.washington.edu/makohill/com521/verzani-usingr-ch3.1-2_ch4_ch5.pdf Available with UW NetID]] | ||
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on my personal website]] | * Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on my personal website]] | ||
---> | ---> | ||
'''Assignment (Complete Before Class):''' | '''Assignment (Complete Before Class):''' | ||
Line 278: | Line 249: | ||
'''Lectures:''' | '''Lectures:''' | ||
* [ | <!-- | ||
* [https://communitydata.cc/~ | * [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 2]] | ||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_02-lists_dataframes_graphing-20170111.ogv Week 2 R lecture screencast: lists, matrixes, data frames, and beginning graphing] (~1 hour 8 minutes) | |||
--> | |||
'''Resources:''' | '''Resources:''' | ||
* [https://www.openintro.org/download.php?file=os3_slides_02&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §2 Lecture Notes] | * [https://www.openintro.org/download.php?file=os3_slides_02&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §2 Lecture Notes] | ||
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 short videos for §2 | * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 short videos for §2 | ||
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 2]] | |||
=== Week 3: Thursday April 18: Distributions === | === Week 3: Thursday April 18: Distributions === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
Line 296: | Line 267: | ||
'''Recommended Readings:''' | '''Recommended Readings:''' | ||
* Verzani: §6 (Populations) | * Verzani: §6 (Populations) | ||
'''Assignment (Complete Before Class):''' | '''Assignment (Complete Before Class):''' | ||
Line 304: | Line 274: | ||
'''Lectures:''' | '''Lectures:''' | ||
* [https://communitydata.cc/~ | <!--- | ||
* [https://communitydata.cc/~ | * [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 3]] | ||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-loading_data_functions_apply_misc.ogv Week 3 R lecture screencast: Loading data, functions; apply(), lapply(), sapply(); several miscellaneous functions] (~34 minutes) — This is the same material I covered in class. If you followed it, there's no reason you need to go back to this. | |||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-dates_tapply_merge.ogv Week 3 R lecture screencast: Dates; tapply(); and merge()] (~38 minutes) [The audio seems to be broken for the last 10 minutes. Sorry about that! I've rerecorded that below.] | |||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-merge.ogv Week 3 R lecture screencast: merge()] (~13 minutes) [Rerecording of the last few minutes of the previous video.] | |||
---> | |||
'''Resources:''' | '''Resources:''' | ||
Line 311: | Line 285: | ||
* [https://www.openintro.org/download.php?file=os3_slides_03&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §3 Lecture Notes] | * [https://www.openintro.org/download.php?file=os3_slides_03&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §3 Lecture Notes] | ||
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 videos for §3.1 and §3.2 | * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 videos for §3.1 and §3.2 | ||
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 3]] | |||
=== Week 4: Thursday April 25: Statistical significance and hypothesis testing === | === Week 4: Thursday April 25: Statistical significance and hypothesis testing === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
Line 321: | Line 295: | ||
'''Recommended Readings:''' | '''Recommended Readings:''' | ||
* Verzani: §7 (Statistical inference), §8 (Confidence intervals) | * Verzani: §7 (Statistical inference), §8 (Confidence intervals) | ||
'''Assignment (Complete Before Class):''' | '''Assignment (Complete Before Class):''' | ||
* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 4]] | * [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 4]] | ||
'''Lectures:''' | '''Lectures:''' | ||
*[https://communitydata.cc/~ | <!--- | ||
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 4]] | |||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_04-misc_confint_simulation-20170125.ogv Week 4 R lecture screencast: order(); confidence intervals; simulations drawn from repeated random samples] (~27 minutes) | |||
---> | |||
'''Resources:''' | '''Resources:''' | ||
Line 336: | Line 310: | ||
* [https://www.openintro.org/download.php?file=os3_slides_04&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §4 Lecture Notes] | * [https://www.openintro.org/download.php?file=os3_slides_04&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §4 Lecture Notes] | ||
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 7 videos for nearly all of §4 | * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 7 videos for nearly all of §4 | ||
* [[Statistics and Statistical Programming (Spring 2019)/Session plan: Week 4]] | |||
=== Week 5: Thursday May 2: Continuous Numeric Data & ANOVA === | === Week 5: Thursday May 2: Continuous Numeric Data & ANOVA === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
* Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data) | * Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data) | ||
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF from Hill's website]] | |||
* Sweetser, K. D., & Metzgar, E. (2007). Communicating during crisis: Use of blogs as a relationship management tool. ''Public Relations Review'', 33(3), 340–342. | * Sweetser, K. D., & Metzgar, E. (2007). Communicating during crisis: Use of blogs as a relationship management tool. ''Public Relations Review'', 33(3), 340–342. https://doi.org/10.1016/j.pubrev.2007.05.016 [Available through UW Libraries] | ||
'''Recommended Readings:''' | '''Recommended Readings:''' | ||
* Verzani: §9 (significance tests), §12 (Analysis of variance) | * Verzani: §9 (significance tests), §12 (Analysis of variance) | ||
* Gelman, Andrew and Hal Stern. 2006. “The Difference Between ‘Significant’ and ‘Not Significant’ Is Not Itself Statistically Significant.” ''The American Statistician'' 60(4):328–31. [[http://dx.doi.org/10.1198/000313006X152649 Available through | * Gelman, Andrew and Hal Stern. 2006. “The Difference Between ‘Significant’ and ‘Not Significant’ Is Not Itself Statistically Significant.” ''The American Statistician'' 60(4):328–31. [[http://dx.doi.org/10.1198/000313006X152649 Available through UW Libraries]] | ||
'''Assignment (Complete Before Class):''' | '''Assignment (Complete Before Class):''' | ||
Line 357: | Line 329: | ||
'''Lectures:''' | '''Lectures:''' | ||
<!--- | <!--- | ||
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 5]] | * [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 5]] | ||
Line 370: | Line 341: | ||
=== Week 6: Thursday May 9: Categorical data === | === Week 6: Thursday May 9: Categorical data === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
* Diez, Barr, and Çetinkaya-Rundel: §6 | * Diez, Barr, and Çetinkaya-Rundel: §6 (Inference for categorical data) | ||
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on | * Gelman, Andrew and Eric Loken. 2014. “The Statistical Crisis in Science Data-Dependent Analysis—a ‘garden of Forking Paths’—explains Why Many Statistically Significant Comparisons Don’t Hold Up.” ''American Scientist'' 102(6):460. [[https://www.americanscientist.org/issues/pub/2014/6/the-statistical-crisis-in-science/1 Available through UW Libraries]] (This is a reworked version of [http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf this unpublished manuscript] which provides more detailed examples.) | ||
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on my personal website]] | |||
'''Recommended Readings: | '''Recommended Readings:''' | ||
* Verzani: §3.4 (Bivariate categorical data); §10.1-10.2 (Goodness of fit) | * Verzani: §3.4 (Bivariate categorical data); §10.1-10.2 (Goodness of fit) | ||
'''Assignment (Complete Before Class):''' | '''Assignment (Complete Before Class):''' | ||
Line 387: | Line 355: | ||
'''Lectures:''' | '''Lectures:''' | ||
*[https://communitydata.cc/~ | <!--- | ||
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 6]] | |||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_06-tables_chisq_debugging.ogv Week 6 R lecture screencast: Tables, <math>\chi^2</math>-tests, and debugging.] (~40 minutes) | |||
---> | |||
'''Resources:''' | '''Resources:''' | ||
Line 395: | Line 365: | ||
=== Week 7: Thursday May 16: Linear Regression === | === Week 7: Thursday May 16: Linear Regression === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
* Diez, Barr, and Çetinkaya-Rundel: §7 (Introduction to linear regression) | * Diez, Barr, and Çetinkaya-Rundel: §7 (Introduction to linear regression); §8.1-8.3 (Multiple regression) | ||
* OpenIntro eschews a mathematical | * OpenIntro eschews a mathematical introduction to correlation. Look over [https://en.wikipedia.org/wiki/Correlation_and_dependence the Wikipedia article on correlation and dependence] and pay attention to the formulas. Correlation is tedious to compute by hand, but I'd like you to at least see what goes into it. | ||
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available | * Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available in UW libraries]] | ||
'''Recommended Readings:''' | '''Recommended Readings:''' | ||
* Verzani: §11.1-2 (Linear regression) | * Verzani: §11.1-2 (Linear regression) | ||
'''Assignment (Complete Before Class):''' | '''Assignment (Complete Before Class):''' | ||
* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 7]] | * [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 7]] | ||
'''Lectures:''' | '''Lectures:''' | ||
* [https://communitydata.cc/~ | <!--- | ||
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 7]] | |||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_07-linear_regression.ogv Week 7 R lecture screencast: linear regression] (~42 minutes) | |||
---> | |||
'''Resources:''' | '''Resources:''' | ||
Line 420: | Line 391: | ||
=== Week 8: Thursday May 23: Polynomial Terms, Interactions, and Logistic Regression === | === Week 8: Thursday May 23: Polynomial Terms, Interactions, and Logistic Regression === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
* [https://onlinecourses.science.psu.edu/stat501/node/301 Lesson 8: Categorical Predictors] and [https://onlinecourses.science.psu.edu/stat501/node/318 Lesson 9: Data Transformations] from the Penn State Eberly College of Science STAT 501 Regression Methods Course. There are several subparts (many quite short); please read them all carefully. | * [https://onlinecourses.science.psu.edu/stat501/node/301 Lesson 8: Categorical Predictors] and [https://onlinecourses.science.psu.edu/stat501/node/318 Lesson 9: Data Transformations] from the Penn State Eberly College of Science STAT 501 Regression Methods Course. There are several subparts (many quite short); please read them all carefully. | ||
* ( | * Diez, Barr, and Çetinkaya-Rundel: §8.4 (Multiple and logistic regression) | ||
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available in UW libraries]] | |||
'''Recommended Readings:''' | '''Recommended Readings:''' | ||
Line 438: | Line 408: | ||
'''Lectures:''' | '''Lectures:''' | ||
*[https://communitydata. | |||
<!--- | |||
* [[Statistics and Statistical Programming (Spring 2019)/R lecture outline: Week 8]] | |||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_08-more_regression_anova_redux.ogv Week 8 R lecture screencast: more on linear regression, including interactions, polynomials, log transformations; anova] (~28 minutes) | |||
---> | |||
'''Resources:''' | '''Resources:''' | ||
Line 444: | Line 418: | ||
* [https://www.openintro.org/download.php?file=os3_slides_08&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §8 Lecture Notes] | * [https://www.openintro.org/download.php?file=os3_slides_08&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §8 Lecture Notes] | ||
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including a video on §8.4 | * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including a video on §8.4 | ||
* Mako | * Mako wrote this document which will likely be useful for many of you: [https://communitydata.cc/~mako/2017-COM521/logistic_regression_interpretation.html Interpreting Logistic Regression Coefficients with Examples in R] | ||
=== Week 9: Thursday May 30: TBA === | |||
Reserved for catch-up, supplementary topics, and maybe some final presentations. | |||
=== Week 10: Thursday June 6: | === Week 10: Thursday June 6: Final Presentations === | ||
Followed by much rejoicing! | Followed by much rejoicing! | ||
Line 511: | Line 466: | ||
I receive too much email and I sometimes fail to keep up. If, for some reason, I do not respond to a message related to this course within 48 hours, please do not take it personally and feel free to re-send the message with a polite reminder. This will help me and I will not resent you for it. | I receive too much email and I sometimes fail to keep up. If, for some reason, I do not respond to a message related to this course within 48 hours, please do not take it personally and feel free to re-send the message with a polite reminder. This will help me and I will not resent you for it. | ||
=== Office Hours === | |||
TBA. | |||
=== Credit and Notes === | === Credit and Notes === | ||
This syllabus has, in ways that should be obvious, borrowed and built on the [https://www.openintro.org/stat/index.php OpenIntro Statistics curriculum]. I also based nearly every aspect of the course design on Benjamin Mako Hill's [[Statistics_and_Statistical_Programming_(Winter_2017)|COM 521 class]]. | This syllabus has, in ways that should be obvious, borrowed and built on the [https://www.openintro.org/stat/index.php OpenIntro Statistics curriculum]. I also based nearly every aspect of the course design on Benjamin Mako Hill's [[Statistics_and_Statistical_Programming_(Winter_2017)|COM 521 class]]. |