| | DRAFT -- IN PROGRESS -- IGNORE |
|
| |
|
| Welcome to the StatsGaps StudyGroup page -- a set of suggested learning pathways that draw on course resources produced by community data faculty, meant for folks with varying levels of familiarity with R, statistics, and research processes. The primary text is: [https://www.openintro.org/download.php?file=os3&referrer=/stat/textbook.php OpenIntro Statistics]
|
| |
|
| We borrow heavily from the course most recently taught by Aaron: [[Statistics_and_Statistical_Programming_(Spring_2019)]]
| | |
| To support participation from people with a range of prior experiences and learning goals this summer, we've organized this content into the following strands:
| |
|
| |
|
| | Follow the strand(s) that apply to you: |
| * '''Learn R''' -- you don't know R
| * '''Learn Stats''' -- you haven't taken much if any statistics, or otherwise feel you're mostly starting from scratch
| * '''Refresh''' -- overview and shore up your stats knowledge if it feels rusty
| * '''Stronger''' -- your stats knowledge is strong but your class stopped before you got to the good stuff you see used in lots of the papers in this group (e.g. regression)
| | | |
| Depending on which strand(s) best apply to you, we provide different recommended readings and assignments each week.
| |
| | |
| Meet at http://meet.jit.si/cdsc on Monday at 11:00 Pacific, 1:00 Central.
| |
| | |
| Schedule:
| |
| * [[CommunityData:StatsGaps#Week 1|7/15 -- Discuss weeks 1, 2, and 3]]
| |
| * [[CommunityData:StatsGaps#Week 4|7/22 -- Discuss weeks 4 and 5]]
| |
| * [[CommunityData:StatsGaps#Week 6|7/29 -- Discuss week 6]]
| |
| * [[CommunityData:StatsGaps#Week 7|8/5 -- Discuss week 7]]
| |
| * [[CommunityData:StatsGaps#Week 8|8/12 -- Discuss week 8]]
| |
| * [[CommunityData:StatsGaps#Week 9|8/19 -- Office Hours]]
| |
| * [[CommunityData:StatsGaps#Week 10|8/26 -- Office Hours]]
| |
| | |
| === Week 1 ===
|
| |
|
| All:
| * Read: '''Preventing harassment and increasing group participation through social norms in 2,190 online science discussions''' J. Nathan Matias | | * Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks. ''Proceedings of the National Academy of Sciences'' 111(24):8788–90. [[http://www.pnas.org/content/111/24/8788.full Open Access]] ''If you haven't read this, you should--also, check the 'Editorial expression of concern' at the top and search for "kramer guillory hancock 2014" if you want to see the firestorm that this article created. -khc'' |
| PNAS May 14, 2019 116 (20) 9785-9789; first published April 29, 2019 -- [https://doi.org/10.1073/pnas.1813486116 Matias--Harassment Prevention]
| |
|
| |
|
| Learn R:
|
| Learn Stats:
| * Diez, Barr, and Çetinkaya-Rundel: §1 (Introduction to data)
| * [[CommunityData:StatsGaps_PS1|Do Problem Set 1]] | | * [[https://wiki.communitydata.science/Statistics_and_Statistical_Programming_(Spring_2019)/Problem_Set:_Week_1 Do Problem Set 1]] |
|
| |
|
| Refresh:
| * Diez, Barr, and Çetinkaya-Rundel: §1 (Introduction to data)
| * [[CommunityData:StatsGaps_PS1|Read Problem Set 1]] | | * [[https://wiki.communitydata.science/Statistics_and_Statistical_Programming_(Spring_2019)/Problem_Set:_Week_1 Read Problem Set 1]] |
|
| |
|
| Stronger:
| * [[CommunityData:StatsGaps_PS1|Skim Problem Set 1]] -- since we may discuss it f2f. Take a look at the text's Chapter 1 if you find any of the questions to be confusing or the answer you came up with is different than the key. | | * [[https://wiki.communitydata.science/Statistics_and_Statistical_Programming_(Spring_2019)/Problem_Set:_Week_1 Skim Problem Set 1]] -- since we may discuss it f2f. Take a look at the text if you find any of the questions to be confusing or the answer you came up with is different than the key. |
|
| |
|
|
| |
|
| * [https://www.openintro.org/download.php?file=os3_slides_01&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §1 Lecture Notes]
| * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including some for §1
|
| |
| === Week 2: Probability and Visualization ===
| |
|
| |
| All:
| |
| * Shaw, Aaron and Yochai Benkler. 2012. A tale of two blogospheres: Discursive practices on the left and right. ''American Behavioral Scientist''. 56(4): 459-487. [https://doi.org/10.1177%2F0002764211433793]
| |
|
| |
| Learn R:
| |
| * [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w02-R_lecture.Rmd Week 2 R lecture materials] (.Rmd file)
| |
| * [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w02.webm Week 2 screencast (17 minutes)]
| |
|
| |
| Learn Stats:
| |
| * Diez, Barr, and Çetinkaya-Rundel: §2 (Probability)
| |
| * Do problem set 2 -- [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 2]]
| |
|
| |
| Refresh:
| |
| * Diez, Barr, and Çetinkaya-Rundel: §2 (Probability)
| |
| * Read problem set 2 [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 2]]
| |
|
| |
| Stronger:
| |
| * Skim problem set 2 [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 2]]
| |
|
| |
|
| |
| '''Extra Resources:'''
| |
| * [https://seeing-theory.brown.edu/ Seeing Theory] §1 (Basic Probability) and §2 (Compound Probability). (Note: this site provides a beautiful visual introduction to core concepts in probability and statistics).
| |
| * Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF]
| |
| * [https://www.openintro.org/download.php?file=os3_slides_02&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §2 Lecture Notes]
| |
| * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 short videos for §2
| |
|
| |
| === Week 3: Distributions ===
| |
|
| |
| All: (N/A)
| |
|
| |
| Learn R:
| |
| * [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w03-R_lecture.Rmd Week 3 R lecture materials] (.Rmd file)
| |
| * [https://communitydata.cc/~ads/teaching/2019/stats/screencasts/w03.webm Week 3 screencast (19 minutes)]
| |
|
| |
| Learn Stats:
| |
| * Read Diez, Barr, and Çetinkaya-Rundel: §3.1-3.2, §3.4 (Aaron says: You should read the rest of the chapter (§3.3 and §3.5). I won't assign problem set questions about it but it's still important to be familiar with.)
| |
| * Do Problem Set 3 [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 3]]
| |
|
| |
| Refresh:
| |
| * Read Problem Set 3 [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 3]]
| |
|
| |
| Stronger:
| |
| * Skim Problem Set 3 [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 3]]
| |
|
| |
|
| |
| '''Extra Resources:'''
| |
| * [https://seeing-theory.brown.edu/ Seeing Theory] §3 (Probability Distributions).
| |
| * [https://www.openintro.org/download.php?file=os3_slides_03&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §3 Lecture Notes]
| |
| * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 videos for §3.1 and §3.2
| |
|
| |
| === Week 4: Statistical significance and hypothesis testing ===
| |
|
| |
| All:
| |
| * Read Diez, Barr, and Çetinkaya-Rundel: §4 (Foundations for inference) (''I suggest everyone read this chapter -- this topic is a source of much confusion. -khc'')
| |
| * Gelman, Andrew and Hal Stern. 2006. “The Difference Between ‘Significant’ and ‘Not Significant’ Is Not Itself Statistically Significant.” ''The American Statistician'' 60(4):328–31. [[http://dx.doi.org/10.1198/000313006X152649 Available via your library]]
| |
|
| |
| Learn R: N/A
| |
|
| |
| Learn Stats:
| |
| * Do [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 4]]
| |
|
| |
| Refresh:
| |
| * Read Problem Set 4 [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 4]]
| |
|
| |
| Stronger:
| |
| * Skim Problem Set 4 [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 4]]
| |
|
| |
| '''Resources:'''
| |
| * [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w04-R_lecture.Rmd Week 4 R lecture materials] (.Rmd file)
| |
| * [https://www.openintro.org/download.php?file=os3_slides_04&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §4 Lecture Notes]
| |
| * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 7 videos for nearly all of §4
| |
| * Verzani: §7 (Statistical inference), §8 (Confidence intervals)
| |
| * [https://seeing-theory.brown.edu/ Seeing Theory] §4 (Frequentist Inference)
| |
|
| |
| === Week 5: Continuous Numeric Data & ANOVA ===
| |
|
| |
| All:
| |
| * Sweetser, K. D., & Metzgar, E. (2007). Communicating during crisis: Use of blogs as a relationship management tool. ''Public Relations Review'', 33(3), 340–342. [[https://doi.org/10.1016/j.pubrev.2007.05.016 Available through NU Libraries]]
| |
|
| |
| Learn R:
| |
| * [https://communitydata.cc/~mako/2017-COM521/com521-week_05-ttests_and_anova.ogv Week 5 R lecture screencast: t-tests] (~22 minutes)
| |
| * [https://communitydata.cc/~mako/2017-COM521/com521-week_05-for_if.ogv Week 5 R lecture screencast: for loops and if statements] (~12 minutes)
| |
|
| |
| Learn Stats:
| |
| * Read Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data)
| |
| * Do [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 5]]
| |
|
| |
| Refresh:
| |
| * Skim Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data)
| |
| * Do [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 5]]
| |
|
| |
| Stronger:
| |
| * Skim [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 5]]
| |
|
| |
| '''Resources:'''
| |
| * [https://www.openintro.org/download.php?file=os3_slides_05&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §5 Lecture Notes]
| |
|
| |
| === Week 6: Categorical data ===
| |
|
| |
| All:
| |
| * Gelman, Andrew and Eric Loken. 2014. “The Statistical Crisis in Science: Data-Dependent Analysis—a ‘garden of Forking Paths’—explains Why Many Statistically Significant Comparisons Don’t Hold Up.” ''American Scientist'' 102(6):460. [[https://www.americanscientist.org/issues/pub/2014/6/the-statistical-crisis-in-science/1 Available through Library Subscription]] (This is a reworked version of [http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf this unpublished manuscript], which provides more detailed examples.) Also note the correction here: https://statmodeling.stat.columbia.edu/2014/10/14/didnt-say-part-2/
| |
|
| |
| Learn R:
| |
| * [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w06-R_lecture.Rmd Week 6 R lecture materials] (.Rmd file)
| |
|
| |
| Learn Stats:
| |
| * Read Diez, Barr, and Çetinkaya-Rundel: §6.1-6.4 (Inference for categorical data).
| |
| * Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on Hill's personal website]]
| |
| * Do [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 6]]
| |
|
| |
| Refresh and get Stronger:
| |
| * Skim Diez, Barr, and Çetinkaya-Rundel: §6.1-6.4 (Inference for categorical data).
| |
| * Read over [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 6]]
| |
|
| |
| '''Resources'''
| |
| * Diez, Barr, and Çetinkaya-Rundel: §6.5-6.6 (Small samples and randomization inference)
| |
| * Verzani: §3.4 (Bivariate categorical data); §10.1-10.2 (Goodness of fit)
| |
| * [https://www.openintro.org/download.php?file=os3_slides_06&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §6 Lecture Notes]
| |
| * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 4 videos for §7
| |
|
| |
| === Week 7: Linear Regression ===
| |
| All:
| |
| * Diez, Barr, and Çetinkaya-Rundel: §7 (Introduction to linear regression)
| |
| * OpenIntro eschews a mathematical approach to correlation. Look over [https://en.wikipedia.org/wiki/Correlation_and_dependence the Wikipedia article on correlation and dependence] and pay attention to the formulas. It's tedious to compute, but you should be aware of what goes into it.
| |
| * Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available via library]]
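| |
| For reference, here is a sketch of what goes into the correlation coefficient. This is the standard sample Pearson correlation found in most references (including the Wikipedia article linked above), not a formula quoted from the OpenIntro text. The notation is generic and assumes <math>n</math> paired observations <math>(x_i, y_i)</math> with sample means <math>\bar{x}</math> and <math>\bar{y}</math>:
| |
| <math>r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}</math>
| |
| The numerator sums the products of each pair's deviations from the two means; the denominator rescales by the spread of each variable, so <math>r</math> always falls between -1 and 1. As a small made-up illustration, take <math>x = (1, 2, 3)</math> and <math>y = (1, 3, 2)</math>: both means are 2, the deviation products sum to 1, and each sum of squared deviations is 2, giving <math>r = 1 / (\sqrt{2}\sqrt{2}) = 0.5</math>. In R, <code>cor(x, y)</code> computes this quantity by default.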
| |
|
| |
| Learn Stats:
| |
| * [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 7]]
| |
|
| |
| Learn R:
| |
| * [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w07-R_lecture.Rmd Week 7 R lecture materials]
| |
|
| |
| '''Resources:'''
| |
| * [https://seeing-theory.brown.edu/ Seeing Theory] §5 (Regression Analysis)
| |
| * [https://www.openintro.org/download.php?file=os3_slides_07&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §7 Lecture Notes]
| |
| * [https://www.openintro.org/download.php?file=os3_slides_08&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §8 Lecture Notes]
| |
| * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 4 videos for §7 and 3 videos on the sections §8.1-8.3
| |
|
| |
|
| |
| === Week 8 ===
| |
| Polynomial Terms, Interactions, and Logistic Regression
| |
|
| |
| ====All:====
| |
| * Diez, Barr, and Çetinkaya-Rundel: §8 (Multiple and logistic regression)
| |
| * [https://onlinecourses.science.psu.edu/stat501/node/301 Lesson 8: Categorical Predictors] and [https://onlinecourses.science.psu.edu/stat501/node/318 Lesson 9: Data Transformations] from the Penn State Eberly College of Science STAT 501 Regression Methods course. There are several subparts (many quite short); please read them all carefully.
| |
| * Mako Hill wrote this document which will likely be useful for many of you: [https://communitydata.cc/~mako/2017-COM521/logistic_regression_interpretation.html Interpreting Logistic Regression Coefficients with Examples in R]
| |
|
| |
| ====Learn Stats:====
| |
| * [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 8]]
| |
|
| |
| ====Learn R:====
| |
| * [https://communitydata.science/~ads/teaching/2019/stats/r_lectures/w08-R_lecture.Rmd Week 8 R lecture materials]
| |
|
| |
| ====Resources====
| |
| * Verzani: §11.3 (Linear regression), §13.1 (Logistic regression)
| |
| * Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” ''PLoS Medicine'' 2(8):e124. [[http://dx.doi.org/10.1371%2Fjournal.pmed.0020124 Open Access]]
| |
| * Head, Megan L., Luke Holman, Rob Lanfear, Andrew T. Kahn, and Michael D. Jennions. 2015. “The Extent and Consequences of P-Hacking in Science.” ''PLOS Biology'' 13(3):e1002106. [[http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002106 Open Access]]
| |
| * [https://www.openintro.org/download.php?file=os3_slides_08&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §8 Lecture Notes]
| |
| * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including a video on §8.4
| |