Editing Statistics and Statistical Programming (Winter 2017)

From CommunityData

Latest revision Your text
Line 30: Line 30:
* Feel comfortable reading papers that use basic statistical techniques.
* Feel comfortable reading papers that use basic statistical techniques.
* Feel comfortable and prepared enrolling in future statistics courses in CSSS.
* Feel comfortable and prepared enrolling in future statistics courses in CSSS.
== Why Statistical Programming? ==
This class will focus much more on statistical programming in R than most similar classes do. Most comparable classes in communication focus on an easier-to-use statistical package like SPSS.
We're focusing on programming instead of a package like SPSS for several reasons:
* Students who understand a programming language won't be limited to the "canned" functions in off-the-shelf packages.
* Pedagogically, programming helps students build a deeper understanding of the mathematics and assumptions behind the canned functions, both by letting them read the code "behind" those functions and by letting them implement the functions themselves in assignments.
* Analyses composed of code instead of clicks support reproducible research: the code can document every step of an analysis, including data cleaning and conversion, where errors are common and very difficult to detect.
* Programming is a skill that is in demand in our department and in our discipline more generally, and I strongly believe it is generally useful.
Of course, there are other programming languages well suited to statistics, including Stata and Python. Ultimately, I'm teaching R because the faculty in the department who seemed most likely to teach this sequence and other statistics classes in the future got together and reached a consensus that R made the most sense.
Our reasoning was that:
* R is freely available and open source.
* R is becoming the most widely used package in statistical fields and (by our estimate) is already used by most academics of my cohort or later in statistics, political science, and economics.
* R is the system (along with Stata) that will be used in the other advanced CSSS stats classes we hope students will continue to take after COM521.
* R is a better general-purpose programming language than software like Stata, which means that R programming skills will let students solve non-statistical problems, like collecting data from the web, and will make it easier to learn other programming languages.
For students with a strong psychometric focus, or whose research will be limited to linear and logistic regression or ANOVA on small, pre-collected datasets, SPSS will likely be fine. R has a higher barrier to entry than SPSS but its ceiling is ''much'' higher.


== Note About This Syllabus ==
== Note About This Syllabus ==
Line 146: Line 124:


;Paper Due Date: March 19
;Paper Due Date: March 19
;Maximum length: 6000 words (~20 pages)
;Maximum outline length: 6000 words (~20 pages)
;Presentation Date: March 14
;Presentation Date: March 7
;All Deliverables: Turn in in Canvas
;All Deliverables: Turn in in Canvas


Line 156: Line 134:
I have a strong preference for you to write this paper individually but I'm open to the idea that you may want to work with others in the class.
I have a strong preference for you to write this paper individually but I'm open to the idea that you may want to work with others in the class.


In terms of content:
'''''Details Forthcoming:''''' ''Although this material is still somewhat thin, I'll be posting many additional details about the expectations for the final paper as we move forward through the quarter.''
 
* In terms of the structure of the paper, please see the page that I've written on the [[structure of a quantitative empirical research paper]].
* In terms of the structure of your presentation, you've got some latitude but this document on [https://canvas.uw.edu/files/40848246/download?download_frd=1 Creating a Successful Scholarly Presentation] (link is in Canvas) will likely be useful.


=== Grading ===
=== Grading ===
Line 225: Line 200:
'''Lectures:'''
'''Lectures:'''


* [https://communitydata.cc/~mako/2017-COM521/com521-week_01-r_programming_intro-20170103.ogv Week 1 R lecture screencast (Part I): Introduction to R and univariate statistics] (~1 hour 47 minutes)
* [https://communitydata.cc/~mako/com521-week_01-r_programming_intro-20170103.ogv Week 1 R lecture screencast (Part I): Introduction to R and univariate statistics] (~1 hour 47 minutes)
* [https://communitydata.cc/~mako/2017-COM521/com521-week_01-github_rscripts-20170104.ogv Week 1 R lecture screencast (Part II): Setting up git/GitHub and saving files in RStudio] (~40 minutes)
* [https://communitydata.cc/~mako/com521-week_01-github_rscripts-20170104.ogv Week 1 R lecture screencast (Part II): Setting up git/GitHub and saving files in RStudio] (~40 minutes)
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 1]]
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 1]]


Line 249: Line 224:


* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 2]]
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 2]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_02-lists_dataframes_graphing-20170111.ogv Week 2 R lecture screencast: lists, matrixes, data frames, and beginning graphing] (~1 hour 8 minutes)
* [https://communitydata.cc/~mako/com521-week_02-lists_dataframes_graphing-20170111.ogv Week 2 R lecture screencast: lists, matrixes, data frames, and beginning graphing] (~1 hour 8 minutes)


'''Resources:'''
'''Resources:'''
Line 271: Line 246:


* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 3]]
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 3]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-loading_data_functions_apply_misc.ogv Week 3 R lecture screencast: Loading data, functions; apply(), lapply(), sapply(); several miscellaneous functions] (~34 minutes) — This is the same material I covered in class. If you followed it, there's no reason you need to go back to this.
* [https://communitydata.cc/~mako/com521-week_03-loading_data_functions_apply_misc.ogv Week 3 R lecture screencast: Loading data, functions; apply(), lapply(), sapply(); several miscellaneous functions] (~34 minutes) — This is the same material I covered in class. If you followed it, there's no reason you need to go back to this.
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-dates_tapply_merge.ogv Week 3 R lecture screencast: Dates; tapply(); and merge()] (~38 minutes) [The audio seems to be broken for the last 10 minutes. Sorry about that! I've rerecorded that below.]
* [https://communitydata.cc/~mako/com521-week_03-dates_tapply_merge.ogv Week 3 R lecture screencast: Dates; tapply(); and merge()] (~38 minutes) [The audio seems to be broken for the last 10 minutes. Sorry about that! I've rerecorded that below.]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-merge.ogv Week 3 R lecture screencast: merge()] (~13 minutes) [Rerecording of the last few minutes of the previous video.]
* [https://communitydata.cc/~mako/com521-week_03-merge.ogv Week 3 R lecture screencast: merge()] (~13 minutes) [Rerecording of the last few minutes of the previous video.]


'''Resources:'''
'''Resources:'''
Line 295: Line 270:


* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 4]]
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 4]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_04-misc_confint_simulation-20170125.ogv Week 4 R lecture screencast: order(); confidence intervals; simulations drawn from repeated random samples] (~27 minutes)
* [https://communitydata.cc/~mako/com521-week_04-misc_confint_simulation-20170125.ogv Week 4 R lecture screencast: order(); confidence intervals; simulations drawn from repeated random samples] (~27 minutes)


'''Resources:'''
'''Resources:'''
Line 320: Line 295:


* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 5]]
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 5]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_05-ttests_and_anova.ogv Week 5 R lecture screencast: t-tests] (~22 minutes)
* [https://communitydata.cc/~mako/com521-week_05-ttests_and_anova.ogv Week 5 lecture screencast: t-tests] (~22 minutes)
* [https://communitydata.cc/~mako/2017-COM521/com521-week_05-for_if.ogv Week 5 R lecture screencast: for loops and if statements] (~12 minutes)
* [https://communitydata.cc/~mako/com521-week_05-for_if.ogv Week 5 R lecture screencast: for loops and if statements] (~12 minutes)


'''Resources:'''
'''Resources:'''
Line 343: Line 318:


* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 6]]
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 6]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_06-tables_chisq_debugging.ogv Week 6 R lecture screencast: Tables, <math>\chi^2</math>-tests, and debugging.] (~40 minutes)
* [https://communitydata.cc/~mako/com521-week_06-tables_chisq_debugging.ogv Week 6 R lecture screencast: Tables, <math>\chi^2</math>-tests, and debugging.] (~40 minutes)


'''Resources:'''
'''Resources:'''
Line 357: Line 332:
* OpenIntro eschews a mathematical introduction to correlation. Please look over [https://en.wikipedia.org/wiki/Correlation_and_dependence the Wikipedia article on correlation and dependence] and pay attention to the formulas. Correlation is tedious to compute by hand, but I'd like you to at least see what goes into it.
* OpenIntro eschews a mathematical introduction to correlation. Please look over [https://en.wikipedia.org/wiki/Correlation_and_dependence the Wikipedia article on correlation and dependence] and pay attention to the formulas. Correlation is tedious to compute by hand, but I'd like you to at least see what goes into it.
* Verzani: §11.1-2 (Linear regression)
* Verzani: §11.1-2 (Linear regression)
* Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” ''PLoS Medicine'' 2(8):e124. [[http://dx.doi.org/10.1371%2Fjournal.pmed.0020124 Open Access]]
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available in UW libraries]]
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available in UW libraries]]
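
For orientation when reading the correlation formulas mentioned above, the sample (Pearson) correlation coefficient is commonly written as shown here. This is the standard textbook form, not something taken from OpenIntro, and the notation is an assumption that may differ slightly from the Wikipedia article's: <math>x_i</math> and <math>y_i</math> are the <math>n</math> paired observations, and <math>\bar{x}</math> and <math>\bar{y}</math> are their sample means.

<math>r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}</math>

The numerator measures how the two variables vary together around their means, and the denominator rescales that quantity so that <math>r</math> always falls between -1 and 1.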


Line 366: Line 342:


* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 7]]
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 7]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_07-linear_regression.ogv Week 7 R lecture screencast: linear regression] (~42 minutes)
* [https://communitydata.cc/~mako/com521-week_07-linear_regression.ogv Week 7 R lecture screencast: linear regression] (~42 minutes)


'''Resources:'''
'''Resources:'''
Line 395: Line 371:


* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 8]]
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 8]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_08-more_regression_anova_redux.ogv Week 8 R lecture screencast: more on linear regression, including interactions, polynomials, log transformations; anova] (~28 minutes)
<!-- * [https://communitydata.cc/~mako/com521-week_06-tables_chisq_debugging.ogv Week 6 R lecture screencast: Tables, <math>\chi^2</math>-tests, and debugging.] (~40 minutes) -->


'''Resources:'''
'''Resources:'''
Line 401: Line 377:
* [https://www.openintro.org/download.php?file=os3_slides_08&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §8 Lecture Notes]
* [https://www.openintro.org/download.php?file=os3_slides_08&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §8 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including a video on §8.4
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including a video on §8.4
* I've written this document which will likely be useful for many of you: [https://communitydata.cc/~mako/2017-COM521/logistic_regression_interpretation.html Interpreting Logistic Regression Coefficients with Examples in R]


=== Week 9: Tuesday February 28: Consulting Meetings ===
=== Week 9: Tuesday February 28: Consulting Meetings ===
Line 411: Line 386:
We won't meet as a group. Instead, you will each meet one-on-one with me to work through challenges and issues with your analysis.
We won't meet as a group. Instead, you will each meet one-on-one with me to work through challenges and issues with your analysis.


=== Week 11: March 14: Final Presentations ===
=== Week 11: Date/Time TBD (Tentatively March 14): Final Presentations ===


== Administrative Notes ==
== Administrative Notes ==