Editing Statistics and Statistical Programming (Fall 2020)
Latest revision | Your text | ||
Line 18: | Line 18: | ||
:Also usually available via chat during "business hours." | :Also usually available via chat during "business hours." | ||
:'''Teaching Assistant:''' [http://nickmvincent.com Nick Vincent] ([mailto:nickvincent@u.northwestern.edu nickvincent@u.northwestern.edu]) | |||
:Office Hours: Monday 10am-12pm and by appointment. I'll try to respond to any asynchronous questions in a timely fashion during "business hours" (9a-5p Central Time), and will also have OH by appointment. I'll respond best to email (above), but am also happy to use Discord for quicker back-and-forth. | ::Office Hours: Monday 10am-12pm and by appointment. I'll try to respond to any asynchronous questions in a timely fashion during "business hours" (9a-5p Central Time), and will also have OH by appointment. I'll respond best to email (above), but am also happy to use Discord for quicker back-and-forth. | ||
:I am happy to try out alternative communication software for OH! | ::I am happy to try out alternative communication software for OH! | ||
<br> | <br> | ||
Line 180: | Line 180: | ||
''' Notes on finding a dataset ''' | |||
In order to complete your final project, you will each need a dataset. If you already have a dataset for the project you plan to conduct, great! If not, fear not! There are many datasets to draw from. Some ideas are below (please suggest others, provide updated links, or report problems). The teaching team will also be available to help you brainstorm/find resources if needed: | In order to complete your final project, you will each need a dataset. If you already have a dataset for the project you plan to conduct, great! If not, fear not! There are many datasets to draw from. Some ideas are below (please suggest others, provide updated links, or report problems). The teaching team will also be available to help you brainstorm/find resources if needed: | ||
Line 191: | Line 191: | ||
* Use the [http://scientificdata.isa-explorer.org/index.html ISA Explorer] to find datasets. Keep in mind the large majority of datasets it will search are drawn from the natural sciences. | * Use the [http://scientificdata.isa-explorer.org/index.html ISA Explorer] to find datasets. Keep in mind the large majority of datasets it will search are drawn from the natural sciences. | ||
* The City of Chicago has one of the best [https://data.cityofchicago.org/ data portal sites] of any municipality in the U.S. (and better than many federal agencies). There are also numerous administrative datasets released by other public entities (try searching!) that you might find inspiring. | * The City of Chicago has one of the best [https://data.cityofchicago.org/ data portal sites] of any municipality in the U.S. (and better than many federal agencies). There are also numerous administrative datasets released by other public entities (try searching!) that you might find inspiring. | ||
<!--- | |||
* <TODO fix/update accordingly> Set up a meeting with Jennifer Muilenburg, the Data Curriculum and Communications Librarian who runs [https://www.lib.washington.edu/digitalscholarship/services/data research data services at the UW libraries]. Her email is libdata@uw.edu. I have talked to her about this course and she is excited about meeting with you to help. | |||
--> | |||
* [http://fivethirtyeight.com FiveThirtyEight.com] has published a [https://cran.r-project.org/web/packages/fivethirtyeight/vignettes/fivethirtyeight.html GitHub repository and an R package] with pre-processed and cleaned versions of many of the datasets they use for articles published on their website. | * [http://fivethirtyeight.com FiveThirtyEight.com] has published a [https://cran.r-project.org/web/packages/fivethirtyeight/vignettes/fivethirtyeight.html GitHub repository and an R package] with pre-processed and cleaned versions of many of the datasets they use for articles published on their website. | ||
* If you are interested in studying online communities, there are some great resources for accessing data from Reddit, Wikipedia, and Stack Exchange. See [https://files.pushshift.io/reddit/ pushshift] for dumps of Reddit data, [https://meta.wikimedia.org/wiki/Research:Data here] for an overview of Wikipedia's data resources, and [https://data.stackexchange.com/ Stack Exchange's data portal]. | * If you are interested in studying online communities, there are some great resources for accessing data from Reddit, Wikipedia, and Stack Exchange. See [https://files.pushshift.io/reddit/ pushshift] for dumps of Reddit data, [https://meta.wikimedia.org/wiki/Research:Data here] for an overview of Wikipedia's data resources, and [https://data.stackexchange.com/ Stack Exchange's data portal]. | ||
==== Research project planning document ==== | ==== Research project planning document ==== | ||
Line 223: | Line 223: | ||
==== Research project paper ==== | ==== Research project paper ==== | ||
;Paper due date: December | ;Paper due date: December 8, 2020, 5pm CT | ||
;Maximum length: 6000 words (~20 pages) | ;Maximum length: 6000 words (~20 pages) | ||
Line 235: | Line 235: | ||
I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography. However, ''your paper must follow a standard format'' (e.g., [https://cscw.acm.org/2019/submit-papers.html ACM SIGCHI CSCW format] or [https://www.apastyle.org/index APA 6th edition] ([https://templates.office.com/en-us/APA-style-report-6th-edition-TM03982351 Word] and [https://www.overleaf.com/latex/templates/sample-apa-paper/fswjbwygndyq LaTeX] templates)) that is applicable for a peer-reviewed journal or conference proceedings in which you might aim to publish the work (they all have formatting or submission guidelines published online and you should follow them). This includes the references. I also strongly recommend that you use reference management software like Zotero to handle your bibliographic sources. | I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography. However, ''your paper must follow a standard format'' (e.g., [https://cscw.acm.org/2019/submit-papers.html ACM SIGCHI CSCW format] or [https://www.apastyle.org/index APA 6th edition] ([https://templates.office.com/en-us/APA-style-report-6th-edition-TM03982351 Word] and [https://www.overleaf.com/latex/templates/sample-apa-paper/fswjbwygndyq LaTeX] templates)) that is applicable for a peer-reviewed journal or conference proceedings in which you might aim to publish the work (they all have formatting or submission guidelines published online and you should follow them). This includes the references. I also strongly recommend that you use reference management software like Zotero to handle your bibliographic sources. | ||
==== Human subjects research, IRB, and ethics ==== | ==== Human subjects research, IRB, and ethics ==== | ||
Line 316: | Line 317: | ||
'''Recommended''' | '''Recommended''' | ||
* Work through one (or more) introduction(s) to R and Rstudio so that you can complete problem set 0. Here are several suggestions: | * Work through one (or more) introduction(s) to R and Rstudio so that you can complete problem set 0. Here are several suggestions: | ||
** '''From Aaron:''' The [https://communitydata.science/~ads/teaching/2020/stats/r_tutorials/w01-R_tutorial.html Week 01 R tutorial] (you should also download the [https://communitydata.science/~ads/teaching/2020/stats/r_tutorials/w01-R_tutorial. | ** '''From Aaron:''' The [https://communitydata.science/~ads/teaching/2020/stats/r_tutorials/w01-R_tutorial.html Week 01 R tutorial] (you should also download the [https://communitydata.science/~ads/teaching/2020/stats/r_tutorials/w01-R_tutorial.Rmd .Rmd version of the tutorial] that you can open and read/edit in RStudio). These are accompanied by the R and Rstudio intro screencasts ([https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w01-s01-intro.webm Part 1] and [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w01-s02-intro.webm Part 2]) Aaron created for the 2019 version of the course. | ||
** Modern Dive [https://moderndive.netlify.app/index.html Statistical inference via data science] Chapter 1: [https://moderndive.netlify.app/1-getting-started.html Getting started with R]. | ** Modern Dive [https://moderndive.netlify.app/index.html Statistical inference via data science] Chapter 1: [https://moderndive.netlify.app/1-getting-started.html Getting started with R]. | ||
** [https://rladiessydney.org/courses/ryouwithme/ RYouWithMe] course [https://rladiessydney.org/courses/ryouwithme/01-basicbasics-0/ "Basic basics" 1 & 2] (and maybe 3 if you're feeling ambitious). | ** [https://rladiessydney.org/courses/ryouwithme/ RYouWithMe] course [https://rladiessydney.org/courses/ryouwithme/01-basicbasics-0/ "Basic basics" 1 & 2] (and maybe 3 if you're feeling ambitious). | ||
Line 328: | Line 329: | ||
* Read Diez, Çetinkaya-Rundel, and Barr: §1.1-1.3 (Introduction to data). | * Read Diez, Çetinkaya-Rundel, and Barr: §1.1-1.3 (Introduction to data). | ||
* Watch [https://www.youtube.com/playlist?list=PLkIselvEzpM6pZ76FD3NoCvvgkj_p-dE8 Lecture materials for §1.1-3 (Videos 1-4 in the playlist)]. | * Watch [https://www.youtube.com/playlist?list=PLkIselvEzpM6pZ76FD3NoCvvgkj_p-dE8 Lecture materials for §1.1-3 (Videos 1-4 in the playlist)]. | ||
* Complete '''exercises from OpenIntro §1:''' 1.6, 1.9, 1.10, 1.16, 1.21, 1.40, 1.42, 1.43 (and remember that solutions to odd-numbered problems are in the book!) | |||
* Submit, review, and respond to questions or requests for discussion via Discord or some other means. | * Submit, review, and respond to questions or requests for discussion via Discord or some other means. | ||
Line 338: | Line 340: | ||
=== Week 3 (9/29, 10/1) === | === Week 3 (9/29, 10/1) === | ||
==== September 29: R fundamentals: Import, transform, tidy, and describe data ==== | ==== September 29: R fundamentals: Import, transform, tidy, and describe data ==== | ||
'''Required''' | '''Required''' | ||
Line 347: | Line 346: | ||
'''Recommended''' | '''Recommended''' | ||
* [https://communitydata.science/~ads/teaching/2020/stats/r_tutorials/w03-R_tutorial.html Week 3 R tutorial] (note that you can access .rmd or .pdf versions by replacing the suffix of the URL accordingly). | * [https://communitydata.science/~ads/teaching/2020/stats/r_tutorials/w03-R_tutorial.html Week 3 R tutorial] (note that you can access .rmd or .pdf versions by replacing the suffix of the URL accordingly). | ||
<!--- | <!--- | ||
'''Resources''' | '''Resources''' | ||
Line 364: | Line 362: | ||
=== Week 4 (10/6, 10/8) === | === Week 4 (10/6, 10/8) === | ||
==== October 6: Emotional contagion and more advanced R fundamentals: import, tidy, transform, and simulate data; write functions ==== | ==== October 6: Emotional contagion and more advanced R fundamentals: import, tidy, transform, and simulate data; write functions ==== | ||
'''Required''' | '''Required''' | ||
* Read the paper below as well as the attendant [https://www.pnas.org/content/111/29/10779.1 "Expression of editorial concern"] and [https://www.pnas.org/content/111/29/10779.2 "Correction"] that were subsequently appended to it. | * Read the paper below as well as the attendant [https://www.pnas.org/content/111/29/10779.1 "Expression of editorial concern"] and [https://www.pnas.org/content/111/29/10779.2 "Correction"] that were subsequently appended to it. | ||
:Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks.” ''Proceedings of the National Academy of Sciences'' 111(24):8788–90. [[http://www.pnas.org/content/111/24/8788.full Open access]] | :Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks.” ''Proceedings of the National Academy of Sciences'' 111(24):8788–90. [[http://www.pnas.org/content/111/24/8788.full Open access]] | ||
* Complete | * Complete problem set #2 (due Monday, October 5 at 1pm CT) | ||
''' | '''Resources''' | ||
==== October 8: Distributions ==== | ==== October 8: Distributions ==== | ||
Line 388: | Line 383: | ||
=== Week 5 (10/13, 10/15) === | === Week 5 (10/13, 10/15) === | ||
==== October 13: <Topic> ==== | |||
==== October 13: | |||
'''Required''' | '''Required''' | ||
* Complete | * Complete problem set #3 | ||
''' | '''Resources''' | ||
==== October 15: Foundations for (frequentist) inference ==== | ==== October 15: Foundations for (frequentist) inference ==== | ||
Line 401: | Line 394: | ||
* Watch [https://www.youtube.com/watch?v=oLW_uzkPZGA&list=PLkIselvEzpM4SHQojH116fYAQJLaN_4Xo foundations for inference] (videos 1-3 in the playlist) OpenIntro lectures. | * Watch [https://www.youtube.com/watch?v=oLW_uzkPZGA&list=PLkIselvEzpM4SHQojH116fYAQJLaN_4Xo foundations for inference] (videos 1-3 in the playlist) OpenIntro lectures. | ||
* Complete [https://www.openintro.org/book/stat/why05/ Why .05?] OpenIntro video/exercise. | * Complete [https://www.openintro.org/book/stat/why05/ Why .05?] OpenIntro video/exercise. | ||
* Complete '''exercises from OpenIntro §5:''' | * Complete '''exercises from OpenIntro §5:'''' | ||
'''Resources''' | '''Resources''' | ||
Line 408: | Line 401: | ||
=== Week 6 (10/20, 10/22) === | === Week 6 (10/20, 10/22) === | ||
==== October 20: <Topic> ==== | |||
==== October 20: | |||
'''Required''' | '''Required''' | ||
* Complete | * Complete problem set #4 | ||
* Revisit the Kramer et al. (2014) paper we read a few weeks ago: | * Revisit the Kramer et al. (2014) paper we read a few weeks ago: | ||
:Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks.” ''Proceedings of the National Academy of Sciences'' 111(24):8788–90. [[http://www.pnas.org/content/111/24/8788.full Open access]] | :Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks.” ''Proceedings of the National Academy of Sciences'' 111(24):8788–90. [[http://www.pnas.org/content/111/24/8788.full Open access]] | ||
'''Resources''' | |||
==== October 22: Inference for categorical data ==== | ==== October 22: Inference for categorical data ==== | ||
Line 420: | Line 412: | ||
* Read Diez, Çetinkaya-Rundel, and Barr: §6 (Inference for categorical data). | * Read Diez, Çetinkaya-Rundel, and Barr: §6 (Inference for categorical data). | ||
* Watch [https://www.youtube.com/watch?list=PLkIselvEzpM5Gn-sHTw1NF0e8IvMxwHDW&v=_iFAZgpWsx0 inference for categorical data] (videos 1-3 in the playlist) OpenIntro lectures. | * Watch [https://www.youtube.com/watch?list=PLkIselvEzpM5Gn-sHTw1NF0e8IvMxwHDW&v=_iFAZgpWsx0 inference for categorical data] (videos 1-3 in the playlist) OpenIntro lectures. | ||
* Complete '''exercises from OpenIntro §6:''' | * Complete '''exercises from OpenIntro §6:'''' | ||
'''Resources''' | '''Resources''' | ||
Line 426: | Line 418: | ||
=== Week 7 (10/27, 10/29) === | === Week 7 (10/27, 10/29) === | ||
==== October 27: <Topics> ==== | |||
==== October 27: | |||
'''Required''' | '''Required''' | ||
* Complete problem set #5 | |||
* Complete | |||
'''Resources''' | '''Resources''' | ||
==== October 29: Inference for numerical data (part 1) ==== | ==== October 29: Inference for numerical data (part 1) ==== | ||
Line 442: | Line 427: | ||
* Read Diez, Çetinkaya-Rundel, and Barr: §7.1-3 (Inference for numerical data: differences of means). | * Read Diez, Çetinkaya-Rundel, and Barr: §7.1-3 (Inference for numerical data: differences of means). | ||
* Watch [https://www.youtube.com/watch?list=PLkIselvEzpM5G3IO1tzQ-DUThsJKQzQCD&v=uVEj2uBJfq0 inference for numerical data] (videos 1-4 in the playlist) OpenIntro lectures (and featuring one of the textbook authors!). | * Watch [https://www.youtube.com/watch?list=PLkIselvEzpM5G3IO1tzQ-DUThsJKQzQCD&v=uVEj2uBJfq0 inference for numerical data] (videos 1-4 in the playlist) OpenIntro lectures (and featuring one of the textbook authors!). | ||
* Complete '''exercises from OpenIntro §7:''' | * Complete '''exercises from OpenIntro §7:'''' | ||
'''Resources''' | '''Resources''' | ||
* [https://gallery.shinyapps.io/CLT_mean/ OpenIntro Central | * [https://gallery.shinyapps.io/CLT_mean/ OpenIntro Central limit theorem for means demo]. | ||
==== October 30: [[#Research project planning document|Research project planning document]] due 5pm CT==== | ==== October 30: [[#Research project planning document|Research project planning document]] due 5pm CT==== | ||
* Submit via [https://canvas.northwestern.edu/courses/122522/assignments | * Submit via [https://canvas.northwestern.edu/courses/122522/assignments Canvas] (due by 5pm CT) | ||
=== Week 8 (11/3, 11/5) === | === Week 8 (11/3, 11/5) === | ||
==== November 3: | ==== November 3: Self-assessment exercise (no class meeting) ==== | ||
'''Election Day (U.S.): No class meeting today''' | |||
==== November 5: Inference for numerical data (part 2) ==== | ==== November 5: Inference for numerical data (part 2) ==== | ||
Line 460: | Line 443: | ||
* Read Diez, Çetinkaya-Rundel, and Barr: §7.4-5 (Inference for numerical data: power calculations, ANOVA, and multiple comparisons). | * Read Diez, Çetinkaya-Rundel, and Barr: §7.4-5 (Inference for numerical data: power calculations, ANOVA, and multiple comparisons). | ||
* Watch [https://www.youtube.com/watch?list=PLkIselvEzpM5G3IO1tzQ-DUThsJKQzQCD&v=uVEj2uBJfq0 inference for numerical data] (videos 4-8 in the playlist) OpenIntro lectures (and featuring one of the textbook authors!). | * Watch [https://www.youtube.com/watch?list=PLkIselvEzpM5G3IO1tzQ-DUThsJKQzQCD&v=uVEj2uBJfq0 inference for numerical data] (videos 4-8 in the playlist) OpenIntro lectures (and featuring one of the textbook authors!). | ||
* Complete '''exercises from OpenIntro §7:''' | * Complete '''exercises from OpenIntro §7:'''' | ||
'''Resources''' | '''Resources''' | ||
Line 466: | Line 449: | ||
=== Week 9 (11/10, 11/12) === | === Week 9 (11/10, 11/12) === | ||
==== November 10: | ==== November 10: <Topic> ==== | ||
'''Required''' | '''Required''' | ||
* Complete | * Complete problem set #6 | ||
'''Resources''' | '''Resources''' | ||
==== November 12: Linear regression ==== | ==== November 12: Linear regression ==== | ||
Line 480: | Line 460: | ||
* Watch [https://www.youtube.com/playlist?list=PLkIselvEzpM63ikRfN41DNIhSgzboELOM linear regression] (videos 1-4 in the playlist) OpenIntro lectures. | * Watch [https://www.youtube.com/playlist?list=PLkIselvEzpM63ikRfN41DNIhSgzboELOM linear regression] (videos 1-4 in the playlist) OpenIntro lectures. | ||
* Read [https://www.openintro.org/go/?id=stat_more_inference_for_linear_regression&referrer=/book/os/index.php More inference for linear regression] (OpenIntro supplement). | * Read [https://www.openintro.org/go/?id=stat_more_inference_for_linear_regression&referrer=/book/os/index.php More inference for linear regression] (OpenIntro supplement). | ||
* Complete '''exercises from OpenIntro §8:''' | * Complete '''exercises from OpenIntro §8:'''' | ||
* Complete '''exercises from OpenIntro supplement:''' | * Complete '''exercises from OpenIntro supplement:'''' | ||
'''Resources''' | '''Resources''' | ||
Line 487: | Line 467: | ||
=== Week 10 (11/17, 11/19) === | === Week 10 (11/17, 11/19) === | ||
==== November 17: <Topic> ==== | |||
==== November 17: | |||
'''Required''' | '''Required''' | ||
* Complete | * Complete Problem set #7 | ||
'''Resources''' | '''Resources''' | ||
==== November 19: Multiple and logistic regression ==== | ==== November 19: Multiple and logistic regression ==== | ||
'''Required''' | '''Required''' | ||
Line 501: | Line 480: | ||
* Read [https://www.openintro.org/go/?id=stat_interaction_terms&referrer=/book/os/index.php Interaction terms] (OpenIntro supplement). | * Read [https://www.openintro.org/go/?id=stat_interaction_terms&referrer=/book/os/index.php Interaction terms] (OpenIntro supplement). | ||
* Read [https://www.openintro.org/go/?id=stat_nonlinear_relationships&referrer=/book/os/index.php Fitting models for non-linear trends] (OpenIntro supplement). | * Read [https://www.openintro.org/go/?id=stat_nonlinear_relationships&referrer=/book/os/index.php Fitting models for non-linear trends] (OpenIntro supplement). | ||
* Complete '''exercises from OpenIntro §9:''' | * Complete '''exercises from OpenIntro §9:'''' | ||
* Complete '''exercises from OpenIntro supplements:'''' | |||
'''Resources''' | '''Resources''' | ||
=== Week 11 (11/24) === | === Week 11 (11/24) === | ||
==== November 24: | ==== November 24: <Topic> and assessment ==== | ||
'''Required''' | '''Required''' | ||
* Complete | * Complete Problem set #8 | ||
* Complete [https://apps3.cehd.umn.edu/artist/user/scale_select.html post-course assessment of statistical concepts] (access code TBA via email). '''Submission deadline: December 1, 11:00pm Chicago time''' | |||
'''Resources''' | '''Resources''' | ||
* Mako Hill created | * Mako Hill created an example of [https://communitydata.science/~mako/2017-COM521/logistic_regression_interpretation.html interpreting logistic regression coefficients with examples in R] | ||
=== Week 12+ === | === Week 12+ === | ||
==== December 3: [[#Research project presentation|Research project presentation]] due by 5pm CT ==== | ==== December 3: [[#Research project presentation|Research project presentation]] due by 5pm CT ==== | ||
==== December 10: [[#Research project paper|Research project paper]] due by 5pm CT ==== | ==== December 10: [[#Research project paper|Research project paper]] due by 5pm CT ==== | ||
== Credit and Notes == | == Credit and Notes == | ||
This syllabus has, in ways that should be obvious, borrowed and built on the [https://www.openintro.org/stat/index.php OpenIntro Statistics curriculum]. Most aspects of this course design extend Benjamin Mako Hill's [[Statistics_and_Statistical_Programming_(Winter_2017)|COM 521 class]] from the University of Washington as well as a [[Statistics_and_Statistical_Programming_(Spring_2019)|prior iteration of the same course]] offered at Northwestern in Spring 2019. | This syllabus has, in ways that should be obvious, borrowed and built on the [https://www.openintro.org/stat/index.php OpenIntro Statistics curriculum]. Most aspects of this course design extend Benjamin Mako Hill's [[Statistics_and_Statistical_Programming_(Winter_2017)|COM 521 class]] from the University of Washington as well as a [[Statistics_and_Statistical_Programming_(Spring_2019)|prior iteration of the same course]] offered at Northwestern in Spring 2019.