CommunityData:StatsGaps



Welcome to the StatsGaps StudyGroup page -- a set of suggested learning pathways built from course resources produced by community data faculty, meant for folks with varying familiarity with R, statistics, and research processes. The primary text is [https://www.openintro.org/download.php?file=os3&referrer=/stat/textbook.php OpenIntro Statistics].


We borrow heavily from the course most recently taught by Aaron: [[Statistics_and_Statistical_Programming_(Spring_2019)]]
 
To support participation from people with a range of prior experience and learning goals this summer, we've organized this content into the following strands. Follow the strand(s) that apply to you:
* '''Learn R''' -- you don't know R
* '''Learn Stats''' -- you haven't taken much if any statistics, or otherwise feel you're mostly starting from scratch
* '''Refresh''' -- review and shore up your stats knowledge if it feels rusty
* '''Stronger''' -- your stats knowledge is strong, but your class stopped before you got to the good stuff you see used in lots of the papers in this group (e.g., regression)


Depending on which strand(s) best apply to you, we provide different recommended readings and assignments each week.

Meet at http://meet.jit.si/cdsc on Monday at 11:00 Pacific (1:00 Central).


Schedule:
* [[CommunityData:StatsGaps#Week 1|7/15 -- Discuss weeks 1, 2, and 3]]
* [[CommunityData:StatsGaps#Week 4|7/22 -- Discuss weeks 4 and 5]]
* [[CommunityData:StatsGaps#Week 6|7/29 -- Discuss week 6]]
* [[CommunityData:StatsGaps#Week 7|8/5 -- Discuss week 7]]
* [[CommunityData:StatsGaps#Week 8|8/12 -- Discuss week 8]]
* [[CommunityData:StatsGaps#Week 9|8/19 -- Office Hours]]
* [[CommunityData:StatsGaps#Week 10|8/26 -- Office Hours]]


=== Week 1 ===


All:
* Read: Matias, J. Nathan. 2019. '''Preventing harassment and increasing group participation through social norms in 2,190 online science discussions'''. ''PNAS'' 116(20):9785-9789 (May 14, 2019; first published April 29, 2019). [https://doi.org/10.1073/pnas.1813486116 Matias -- Harassment Prevention]
* Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks. ''Proceedings of the National Academy of Sciences'' 111(24):8788–90. [http://www.pnas.org/content/111/24/8788.full Open Access] ''If you haven't read this, you should. Also check the 'Editorial expression of concern' at the top, and search for "kramer guillory hancock 2014" if you want to see the firestorm that this article created. -khc''


Learn R:

Learn Stats:
* Diez, Barr, and Çetinkaya-Rundel: §1 (Introduction to data)
* [[CommunityData:StatsGaps_PS1|Do Problem Set 1]]


Refresh:
* Diez, Barr, and Çetinkaya-Rundel: §1 (Introduction to data)
* [[CommunityData:StatsGaps_PS1|Read Problem Set 1]]


Stronger:
* [[CommunityData:StatsGaps_PS1|Skim Problem Set 1]] -- since we may discuss it f2f. Take a look at the text's Chapter 1 if you find any of the questions confusing or if the answer you came up with differs from the key.






All:
* Shaw, Aaron and Yochai Benkler. 2012. A tale of two blogospheres: Discursive practices on the left and right. ''American Behavioral Scientist''. 56(4): 459-487. [https://doi.org/10.1177%2F0002764211433793]


Learn R:

'''Extra Resources:'''
* [https://seeing-theory.brown.edu/ Seeing Theory] §1 (Basic Probability) and §2 (Compound Probability). (Note: this site provides a beautiful visual introduction to core concepts in probability and statistics.)
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF]
* [https://www.openintro.org/download.php?file=os3_slides_02&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §2 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 short videos for §2

* [https://www.openintro.org/download.php?file=os3_slides_03&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §3 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 videos for §3.1 and §3.2
=== Week 4: Statistical significance and hypothesis testing ===
All:
* Read Diez, Barr, and Çetinkaya-Rundel: §4 (Foundations for inference) (''I suggest everyone read this chapter -- this topic is a source of much confusion. -khc'')
* Gelman, Andrew and Hal Stern. 2006. “The Difference Between ‘Significant’ and ‘Not Significant’ Is Not Itself Statistically Significant.” ''The American Statistician'' 60(4):328–31. [http://dx.doi.org/10.1198/000313006X152649 Available via your library]
Learn R: N/A
Learn Stats:
* Do [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 4]]
Refresh:
* Read Problem Set 4 [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 4]]
Stronger:
* Skim Problem Set 4 [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 4]]
'''Resources:'''
*[https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w04-R_lecture.Rmd Week 4 R lecture materials] (.Rmd file)
* [https://www.openintro.org/download.php?file=os3_slides_04&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §4 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 7 videos for nearly all of §4
* Verzani: §7 (Statistical inference), §8 (Confidence intervals)
* [https://seeing-theory.brown.edu/ Seeing Theory] §4 (Frequentist Inference)
=== Week 5: Continuous Numeric Data & ANOVA ===
All:
* Sweetser, K. D., & Metzgar, E. (2007). Communicating during crisis: Use of blogs as a relationship management tool. ''Public Relations Review'', 33(3), 340–342. [https://doi.org/10.1016/j.pubrev.2007.05.016 Available through NU Libraries]
Learn R:
* [https://communitydata.cc/~mako/2017-COM521/com521-week_05-ttests_and_anova.ogv Week 5 R lecture screencast: t-tests] (~22 minutes)
* [https://communitydata.cc/~mako/2017-COM521/com521-week_05-for_if.ogv Week 5 R lecture screencast: for loops and if statements] (~12 minutes)
Learn Stats:
* Read Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data)
* Do [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 5]]
Refresh:
* Skim Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data)
* Do [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 5]]
Stronger:
* Skim [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 5]]
'''Resources:'''
* [https://www.openintro.org/download.php?file=os3_slides_05&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §5 Lecture Notes]
=== Week 6: Categorical data ===
All:
* Gelman, Andrew and Eric Loken. 2014. “The Statistical Crisis in Science: Data-Dependent Analysis—a ‘garden of Forking Paths’—explains Why Many Statistically Significant Comparisons Don’t Hold Up.” ''American Scientist'' 102(6):460. [https://www.americanscientist.org/issues/pub/2014/6/the-statistical-crisis-in-science/1 Available through Library Subscription] (This is a reworked version of [http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf this unpublished manuscript], which provides more detailed examples.) Also note the correction here: https://statmodeling.stat.columbia.edu/2014/10/14/didnt-say-part-2/
Learn R:
*[https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w06-R_lecture.Rmd Week 6 R lecture materials] (.Rmd file)
Learn Stats:
* Read Diez, Barr, and Çetinkaya-Rundel: §6.1-6.4 (Inference for categorical data).
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on Hill's personal website]
* Do [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 6]]
Refresh and get Stronger:
* Skim Diez, Barr, and Çetinkaya-Rundel: §6.1-6.4 (Inference for categorical data).
* Read over [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 6]]
'''Resources'''
* Diez, Barr, and Çetinkaya-Rundel: §6.5-6.6 (Small samples and randomization inference)
* Verzani: §3.4 (Bivariate categorical data); §10.1-10.2 (Goodness of fit)
* [https://www.openintro.org/download.php?file=os3_slides_06&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §6 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 4 videos for §7
=== Week 7: Linear Regression ===
All:
* Diez, Barr, and Çetinkaya-Rundel: §7 (Introduction to linear regression)
* OpenIntro eschews a mathematical approach to correlation. Look over [https://en.wikipedia.org/wiki/Correlation_and_dependence the Wikipedia article on correlation and dependence] and pay attention to the formulas. It's tedious to compute, but you should be aware of what goes into it.
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [http://dx.doi.org/10.1145/985692.985761 Available via library]
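
For reference while you read the Wikipedia article: one standard way to write the sample (Pearson) correlation is sketched below. The notation is our own choice for this note: <code>x_i</code> and <code>y_i</code> are the paired observations, <code>\bar{x}</code> and <code>\bar{y}</code> their sample means, and <code>s_x</code> and <code>s_y</code> their sample standard deviations (computed with <code>n - 1</code> in the denominator). Treat it as a reading aid rather than the definitive presentation.

```latex
r \;=\; \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
             {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
  \;=\; \frac{1}{n-1} \sum_{i=1}^{n}
        \left(\frac{x_i - \bar{x}}{s_x}\right)
        \left(\frac{y_i - \bar{y}}{s_y}\right)
```

The second form makes the "what goes into it" part easier to see: each observation is standardized on both variables, the standardized values are multiplied, and the products are averaged (dividing by <code>n - 1</code>).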
Learn Stats:
* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 7]]
Learn R:
* [https://communitydata.cc/~ads/teaching/2019/stats/r_lectures/w07-R_lecture.Rmd Week 7 R lecture materials]
'''Resources:'''
* [https://seeing-theory.brown.edu/ Seeing Theory] §5 (Regression Analysis)
* [https://www.openintro.org/download.php?file=os3_slides_07&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §7 Lecture Notes]
* [https://www.openintro.org/download.php?file=os3_slides_08&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §8 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 4 videos for §7 and 3 videos on the sections §8.1-8.3
=== Week 8 ===
Polynomial Terms, Interactions, and Logistic Regression
====All:====
* Diez, Barr, and Çetinkaya-Rundel: §8 (Multiple and logistic regression)
* [https://onlinecourses.science.psu.edu/stat501/node/301 Lesson 8: Categorical Predictors] and [https://onlinecourses.science.psu.edu/stat501/node/318 Lesson 9: Data Transformations] from the Penn State Eberly College of Science STAT 501 Regression Methods Course. There are several subparts (many quite short); please read them all carefully.
* Mako Hill wrote this document which will likely be useful for many of you: [https://communitydata.cc/~mako/2017-COM521/logistic_regression_interpretation.html Interpreting Logistic Regression Coefficients with Examples in R]
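
As a quick orientation before reading about coefficient interpretation, here is the usual textbook form of a logistic regression with a single predictor. This is a generic sketch in our own notation (<code>p</code> for the probability of the outcome, <code>x</code> for the predictor, <code>\beta_0</code> and <code>\beta_1</code> for the coefficients), not something taken from the document linked above.

```latex
\log\!\left(\frac{p}{1-p}\right) \;=\; \beta_0 + \beta_1 x
\qquad\Longrightarrow\qquad
\frac{\operatorname{odds}(x+1)}{\operatorname{odds}(x)}
  \;=\; \frac{e^{\beta_0 + \beta_1 (x+1)}}{e^{\beta_0 + \beta_1 x}}
  \;=\; e^{\beta_1},
\qquad \text{where } \operatorname{odds}(x) = \frac{p}{1-p}.
```

In words: the coefficients live on the log-odds scale, so a one-unit increase in <code>x</code> multiplies the odds of the outcome by <code>e^{\beta_1}</code>. It does not change the probability by a fixed amount.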
====Learn Stats:====
* [[Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 8]]
====Learn R:====
*[https://communitydata.science/~ads/teaching/2019/stats/r_lectures/w08-R_lecture.Rmd Week 8 R lecture materials]
====Resources====
* Verzani: §11.3 (Linear regression), §13.1 (Logistic regression)
* Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” ''PLoS Medicine'' 2(8):e124. [http://dx.doi.org/10.1371%2Fjournal.pmed.0020124 Open Access]
* Head, Megan L., Luke Holman, Rob Lanfear, Andrew T. Kahn, and Michael D. Jennions. 2015. “The Extent and Consequences of P-Hacking in Science.” ''PLOS Biology'' 13(3):e1002106. [http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002106 Open Access]
* [https://www.openintro.org/download.php?file=os3_slides_08&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §8 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including a video on §8.4