Editing Statistics and Statistical Programming (Winter 2017)
Latest revision | Your text | ||
Line 7: | Line 7: | ||
:* We will use Canvas for [https://canvas.uw.edu/courses/1098035/announcements announcements], [https://canvas.uw.edu/courses/1098035/assignments turning in assignments], and [https://canvas.uw.edu/courses/1098035/discussion_topics discussion] (if you choose to use them) | :* We will use Canvas for [https://canvas.uw.edu/courses/1098035/announcements announcements], [https://canvas.uw.edu/courses/1098035/assignments turning in assignments], and [https://canvas.uw.edu/courses/1098035/discussion_topics discussion] (if you choose to use them) | ||
:* Everything else will be linked on this page. | :* Everything else will be linked on this page. | ||
:'''Course Catalog Description:[https://www.washington.edu/students/crscat/com.html#com521]''' | :'''Course Catalog Description:[https://www.washington.edu/students/crscat/com.html#com521]''' | ||
Line 30: | Line 29: | ||
* Feel comfortable reading papers that use basic statistical techniques. | * Feel comfortable reading papers that use basic statistical techniques. | ||
* Feel comfortable and prepared enrolling in future statistics courses in CSSS. | * Feel comfortable and prepared enrolling in future statistics courses in CSSS. | ||
== Note About This Syllabus == | == Note About This Syllabus == | ||
Line 70: | Line 47: | ||
Diez, Barr, and Çetinkaya-Rundel's book is a free, and freely licensed, online statistics textbook. Over the last seven years, the book has also developed a large online community of students and teachers who have shared other resources. The book, lecture notes, and more are all freely licensed, which has allowed the text to be adapted in a series of different fields. The book is excellent and it has been adopted extraordinarily widely. You can buy versions from Amazon in either [https://www.openintro.org/redirect.php?go=amazon_os3_hardcover&referrer=/stat/textbook.php full color hardcover] ($19.99) or in [https://www.openintro.org/redirect.php?go=createspace_os3&referrer=/stat/textbook.php black and white paperback] ($7.60). I haven't purchased a paper copy so I can't speak to the quality of either. | Diez, Barr, and Çetinkaya-Rundel's book is a free, and freely licensed, online statistics textbook. Over the last seven years, the book has also developed a large online community of students and teachers who have shared other resources. The book, lecture notes, and more are all freely licensed, which has allowed the text to be adapted in a series of different fields. The book is excellent and it has been adopted extraordinarily widely. You can buy versions from Amazon in either [https://www.openintro.org/redirect.php?go=amazon_os3_hardcover&referrer=/stat/textbook.php full color hardcover] ($19.99) or in [https://www.openintro.org/redirect.php?go=createspace_os3&referrer=/stat/textbook.php black and white paperback] ($7.60). I haven't purchased a paper copy so I can't speak to the quality of either. | ||
Verzani's book is an introduction to the R programming language. It's designed to be used as a companion to a basic introductory statistics textbook (like OpenIntro). It's a poor stand-alone text but it will provide good resources for the material we're covering in the course and it should act as a good reference going forward. The book is available online for about $50. | Verzani's book is an introduction to the R programming language. It's designed to be used as a companion to a basic introductory statistics textbook (like OpenIntro). It's a poor stand-alone text but it will provide good resources for the material we're covering in the course and it should act as a good reference going forward. The book is available online for about $50. ''I'd recommend holding off on purchasing the book until after the first class.'' | ||
Although they're not required for the course, I want to point you to these two books. When I was learning R, both of these were very useful references: | Although they're not required for the course, I want to point you to these two books. When I was learning R, both of these were very useful references: | ||
Line 131: | Line 108: | ||
* An identification of the dataset you will use and a description of the columns or type of data it will include. If you do not currently have access to these data, explain when you will have access to the data. | * An identification of the dataset you will use and a description of the columns or type of data it will include. If you do not currently have access to these data, explain when you will have access to the data. | ||
==== Final Project | ==== Final Project ==== | ||
;Outline Due Date: February 21 | ;Outline Due Date: February 21 | ||
;Maximum outline length: 5 pages | ;Maximum outline length: 5 pages | ||
;Paper Due Date: March 19 | ;Paper Due Date: March 19 | ||
;Maximum length: 6000 words (~20 pages) | ;Maximum outline length: 6000 words (~20 pages) | ||
;Presentation Date: March | ;Presentation Date: March 7 | ||
;All Deliverables: Turn in on Canvas | ;All Deliverables: Turn in on Canvas | ||
Line 156: | Line 123: | ||
I have a strong preference for you to write this paper individually but I'm open to the idea that you may want to work with others in the class. | I have a strong preference for you to write this paper individually but I'm open to the idea that you may want to work with others in the class. | ||
'''''Details Forthcoming:''''' ''Although this material is still somewhat thin, I'll be posting many additional details about the expectations for the final paper as we move forward through the quarter.'' | |||
=== Grading === | === Grading === | ||
Line 182: | Line 146: | ||
* Use the [http://scientificdata.isa-explorer.org/index.html ISA Explorer] to find datasets. Keep in mind the large majority of datasets it will search are drawn from the natural sciences. | * Use the [http://scientificdata.isa-explorer.org/index.html ISA Explorer] to find datasets. Keep in mind the large majority of datasets it will search are drawn from the natural sciences. | ||
* Set up a meeting with Jennifer Muilenburg, the Data Curriculum and Communications Librarian who runs [https://www.lib.washington.edu/digitalscholarship/services/data research data services at the UW libraries]. Her email is: libdata@uw.edu. I have talked to her about this course and she is excited about meeting with you to help. | * Set up a meeting with Jennifer Muilenburg, the Data Curriculum and Communications Librarian who runs [https://www.lib.washington.edu/digitalscholarship/services/data research data services at the UW libraries]. Her email is: libdata@uw.edu. I have talked to her about this course and she is excited about meeting with you to help. | ||
In general, you're responsible for making sure that you're on the right side of the human subjects rules and that your work is ethical. Class projects generally do not need IRB approval but I hope that each of your projects will turn into something more. If your study involves human subjects research, ''that'' work will need IRB oversight of some sort. In general, you can't do a class project without IRB approval and then retroactively get it later. Secondary analysis of anonymized data is generally not considered human subjects research but I strongly suggest that you get a determination from [https://www.washington.edu/research/hsd UW's Human Subjects Division] before you start. For work that is not considered human subjects research, this can often happen in a few hours or days. If you need a faculty sponsor, that should ideally be your advisor. If that doesn't make sense for any of you, I'm happy to talk about serving as the faculty supervisor for the work. | In general, you're responsible for making sure that you're on the right side of the human subjects rules and that your work is ethical. Class projects generally do not need IRB approval but I hope that each of your projects will turn into something more. If your study involves human subjects research, ''that'' work will need IRB oversight of some sort. In general, you can't do a class project without IRB approval and then retroactively get it later. Secondary analysis of anonymized data is generally not considered human subjects research but I strongly suggest that you get a determination from [https://www.washington.edu/research/hsd UW's Human Subjects Division] before you start. For work that is not considered human subjects research, this can often happen in a few hours or days. If you need a faculty sponsor, that should ideally be your advisor. If that doesn't make sense for any of you, I'm happy to talk about serving as the faculty supervisor for the work. | ||
Line 214: | Line 177: | ||
* Verzani: §1 (Getting Started), §2 (Univariate data) [[https://faculty.washington.edu/makohill/com521/verzani-usingr-ch1_ch2.pdf Available with UWNetID]] | * Verzani: §1 (Getting Started), §2 (Univariate data) [[https://faculty.washington.edu/makohill/com521/verzani-usingr-ch1_ch2.pdf Available with UWNetID]] | ||
* Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks.” ''Proceedings of the National Academy of Sciences'' 111(24):8788–90. [[http://www.pnas.org/content/111/24/8788.full Available through UW libraries]] | * Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks.” ''Proceedings of the National Academy of Sciences'' 111(24):8788–90. [[http://www.pnas.org/content/111/24/8788.full Available through UW libraries]] | ||
'''Assignment (Complete Before Class):''' | '''Assignment (Complete Before Class):''' | ||
Line 225: | Line 184: | ||
'''Lectures:''' | '''Lectures:''' | ||
* [https://communitydata.cc/~mako | * [https://communitydata.cc/~mako/com521-week_01-r_programming_intro-20170103.ogv Week 1 R Lecture (Part I): Introduction to R and Univariate statistics] (~1 hour 47 minutes) | ||
* [https://communitydata.cc/~mako | * [https://communitydata.cc/~mako/com521-week_01-r_programming_intro-20170103.ogv Week 1 R Lecture (Part II): Setting up Git/GitHub and saving files in RStudio] (~40 minutes) | ||
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 1]] | * [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 1]] | ||
'''Optional Readings:''' | |||
* Verzani: §A (Programming) | |||
'''Resources:''' | '''Resources:''' | ||
Line 239: | Line 202: | ||
* Diez, Barr, and Çetinkaya-Rundel: §2 (Probability) | * Diez, Barr, and Çetinkaya-Rundel: §2 (Probability) | ||
* Verzani: §3.1-2 (Bivariate data), §4 (Multivariate data), §5 (Multivariate graphics) | * Verzani: §3.1-2 (Bivariate data), §4 (Multivariate data), §5 (Multivariate graphics) | ||
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on my personal website]] | * Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on my personal website]] | ||
Line 245: | Line 208: | ||
* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 2]] | * [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 2]] | ||
'''Resources:''' | '''Resources:''' | ||
Line 255: | Line 213: | ||
* [https://www.openintro.org/download.php?file=os3_slides_02&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §2 Lecture Notes] | * [https://www.openintro.org/download.php?file=os3_slides_02&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §2 Lecture Notes] | ||
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 short videos for §2 | * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 short videos for §2 | ||
* [[Statistics and Statistical Programming (Winter 2017)/ | |||
'''Lectures:''' | |||
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 2]] | |||
=== Week 3: Tuesday January 17: Distributions === | === Week 3: Tuesday January 17: Distributions === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
* Diez, Barr, and Çetinkaya-Rundel: §3.1-3.2, §3.4 | * Diez, Barr, and Çetinkaya-Rundel: §3.1-3.2, §3.4 | ||
* Verzani: §6 (Populations) | * Verzani: §6 (Populations) | ||
* ''Empirical Paper TBD'' | |||
* | |||
=== Week 4: Tuesday January 24: Statistical significance and hypothesis testing === | === Week 4: Tuesday January 24: Statistical significance and hypothesis testing === | ||
Line 287: | Line 233: | ||
* Diez, Barr, and Çetinkaya-Rundel: §4 (Foundations for inference) | * Diez, Barr, and Çetinkaya-Rundel: §4 (Foundations for inference) | ||
* Verzani: §7 (Statistical inference), §8 (Confidence intervals) | * Verzani: §7 (Statistical inference), §8 (Confidence intervals) | ||
* ''Empirical Paper TBD'' | |||
* | |||
=== Week 5: Tuesday January 31: Continuous Numeric Data & ANOVA === | === Week 5: Tuesday January 31: Continuous Numeric Data & ANOVA === | ||
Line 309: | Line 241: | ||
* Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data) | * Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data) | ||
* Verzani: §9 (Significance tests), §12 (Analysis of variance) | * Verzani: §9 (Significance tests), §12 (Analysis of variance) | ||
* | * ''Empirical Paper TBD'' | ||
=== Week 6: Tuesday February 7: Categorical data === | === Week 6: Tuesday February 7: Categorical data === | ||
Line 333: | Line 249: | ||
* Diez, Barr, and Çetinkaya-Rundel: §6 (Inference for categorical data) | * Diez, Barr, and Çetinkaya-Rundel: §6 (Inference for categorical data) | ||
* Verzani: §3.4 (Bivariate categorical data); §10.1-10.2 (Goodness of fit) | * Verzani: §3.4 (Bivariate categorical data); §10.1-10.2 (Goodness of fit) | ||
* | * ''Empirical Paper TBD'' | ||
=== Week 7: Tuesday February 14: Simple Linear Regression === | |||
=== Week 7: Tuesday February 14: Linear Regression === | |||
'''Required Readings:''' | '''Required Readings:''' | ||
* Diez, Barr, and Çetinkaya-Rundel: §7 (Introduction to linear regression) | * Diez, Barr, and Çetinkaya-Rundel: §7 (Introduction to linear regression) | ||
* Verzani: §11.1-2 (Linear regression) | * Verzani: §11.1-2 (Linear regression) | ||
* | * ''Empirical Paper TBD'' | ||
=== Week 8: Tuesday February 21: Multiple and Logistic Regression === | |||
=== Week 8: Tuesday February 21: | |||
'''Required Readings:''' | '''Required Readings:''' | ||
* Diez, Barr, and Çetinkaya-Rundel: §8 (Multiple and logistic regression) | |||
* Diez, Barr, and Çetinkaya-Rundel: §8 | |||
* Verzani: §11.3 (Linear regression), §13.1 (Logistic regression) | * Verzani: §11.3 (Linear regression), §13.1 (Logistic regression) | ||
* | * ''Empirical Paper TBD'' | ||
=== Week 9: Tuesday February 28: Consulting Meetings === | === Week 9: Tuesday February 28: Consulting Meetings === | ||
Line 411: | Line 275: | ||
We won't meet as a group. Instead, you will each meet one-on-one with me to work through challenges and issues with your analysis. | We won't meet as a group. Instead, you will each meet one-on-one with me to work through challenges and issues with your analysis. | ||
=== Week 11: March 14: Final Presentations === | === Week 11: Date/Time TBD (Tentatively March 14): Final Presentations === | ||
== Administrative Notes == | == Administrative Notes == |