Editing User:Aaronshaw/Stats course
From CommunityData
Latest revision | Your text | ||
Line 2: | Line 2: | ||
:'''Statistics and Statistical Programming''' | :'''Statistics and Statistical Programming''' | ||
:'''MTS 525''' Media, Technology & Society | :'''MTS 525''' Media, Technology & Society, Northwestern University | ||
:'''Instructor:''' [http://aaronshaw.org Aaron Shaw] ([https://communication.northwestern.edu/faculty/AaronShaw Northwestern University]) | :'''Instructor:''' [http://aaronshaw.org Aaron Shaw] ([https://communication.northwestern.edu/faculty/AaronShaw Northwestern University]) | ||
:'''Course Websites''': | :'''Course Websites''': | ||
:* We will use | :* We will use Canvas for [https://canvas.northwestern.edu announcements], [https://canvas.northwestern.edu/ turning in assignments], and [https://canvas.northwestern.edu discussion] (if you choose to use them) | ||
:* Everything else will be linked on this page. | :* Everything else will be linked on this page. | ||
:* List of student git repositories (will be a link) | :* List of student git repositories (will be a link) | ||
Line 53: | Line 52: | ||
The textbook (in any format) is required material for the course. You can download it at no cost and/or buy (affordable!) hard copy versions in either [https://www.openintro.org/redirect.php?go=amazon_os3_hardcover&referrer=/stat/textbook.php full color hardcover] or in [https://www.openintro.org/redirect.php?go=createspace_os3&referrer=/stat/textbook.php black and white paperback]. The book is excellent and has been adopted widely. It has also developed a large online community of students and teachers who have shared other resources. Lecture slides, videos, notes, and more are all freely licensed (many through the website and others elsewhere). | The textbook (in any format) is required material for the course. You can download it at no cost and/or buy (affordable!) hard copy versions in either [https://www.openintro.org/redirect.php?go=amazon_os3_hardcover&referrer=/stat/textbook.php full color hardcover] or in [https://www.openintro.org/redirect.php?go=createspace_os3&referrer=/stat/textbook.php black and white paperback]. The book is excellent and has been adopted widely. It has also developed a large online community of students and teachers who have shared other resources. Lecture slides, videos, notes, and more are all freely licensed (many through the website and others elsewhere). | ||
I | I am also assigning several chapters from the book | ||
* | * <TODO> Reinhardt book. | ||
This book provides a conceptual introduction to some common failures in statistical analysis that you | This book provides a conceptual introduction to some common failures in statistical analysis that you need to learn to recognize and avoid. It was also written by a Ph.D. student. | ||
A few other books may be useful resources while you're learning to analyze, visualize, and interpret statistical data with R | A few other (optional) books may be useful resources while you're learning to analyze, visualize, and interpret statistical data with R: | ||
* Teetor, Paul. 2011. ''R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics''. 1 edition. Sebastopol, CA: O’Reilly Media. ([http://proquest.safaribooksonline.com/9780596809287 Safari Proquest/UW Libraries]; [https://en.wikipedia.org/wiki/Special:BookSources/978-0-596-80915-7 Various Sources]; [https://www.amazon.com/Cookbook-Analysis-Statistics-Graphics-Cookbooks/dp/0596809158/ref=sr_1_1?ie=UTF8&qid=1482802812&sr=8-1&keywords=r+cookbook Amazon]) | |||
* Teetor, Paul. 2011. ''R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics''. 1 edition. Sebastopol, CA: O’Reilly Media. ([http://proquest.safaribooksonline.com/9780596809287 Safari Proquest/ | |||
* Verzani, John. 2014. ''Using R for Introductory Statistics, Second Edition''. 2 edition. Boca Raton: Chapman and Hall/CRC. ([https://en.wikipedia.org/wiki/Special:BookSources/978-1-4665-9073-1 Various Sources]; [https://www.amazon.com/Using-Introductory-Statistics-Second-Chapman/dp/1466590734/ref=mt_hardcover?_encoding=UTF8&me= Amazon]) | * Verzani, John. 2014. ''Using R for Introductory Statistics, Second Edition''. 2 edition. Boca Raton: Chapman and Hall/CRC. ([https://en.wikipedia.org/wiki/Special:BookSources/978-1-4665-9073-1 Various Sources]; [https://www.amazon.com/Using-Introductory-Statistics-Second-Chapman/dp/1466590734/ref=mt_hardcover?_encoding=UTF8&me= Amazon]) | ||
* Wickham, Hadley. 2010. ''ggplot2: Elegant Graphics for Data Analysis''. 1st ed. 2009. Corr. 3rd printing 2010 edition. New York: Springer. ([https://link.springer.com/book/10.1007%2F978-3-319-24277-4 Springer/ | * Wickham, Hadley. 2010. ''ggplot2: Elegant Graphics for Data Analysis''. 1st ed. 2009. Corr. 3rd printing 2010 edition. New York: Springer. ([https://link.springer.com/book/10.1007%2F978-3-319-24277-4 Springer/UW Libraries]; [https://en.wikipedia.org/wiki/Special:BookSources/978-0-596-80915-7 Various Sources]) | ||
There are also some | There are also some non-textbook resources that are invaluable: | ||
* [ftp://cran.r-project.org/pub/R/doc/contrib/Baggott-refcard-v2.pdf Baggott's R Reference Card v2] — Print this out. Take it with you everywhere and look at it dozens of times a day. You will learn the language faster! | * [ftp://cran.r-project.org/pub/R/doc/contrib/Baggott-refcard-v2.pdf Baggott's R Reference Card v2] — Print this out. Take it with you everywhere and look at it dozens of times a day. You will learn the language faster! | ||
* [https://stackoverflow.com/questions/tagged/r StackOverflow R Tag] — Somebody already had your question about how to do ''X'' in R. They asked it, and several people have answered it, on StackOverflow. Learning to read this effectively will take time, but as you build up some basic familiarity with R and with StackOverflow, it will get easier. I promise. | * [https://stackoverflow.com/questions/tagged/r StackOverflow R Tag] — Somebody already had your question about how to do ''X'' in R. They asked it, and several people have answered it, on StackOverflow. Learning to read this effectively will take time, but as you build up some basic familiarity with R and with StackOverflow, it will get easier. I promise. | ||
* [http://rseek.org/ Rseek] — Rseek is a modified version of Google that searches only R websites. R can be hard to search for because R is a common letter. This has become much easier over time as R has become more popular, but it can still be a problem, and Rseek is a good solution. | * [http://rseek.org/ Rseek] — Rseek is a modified version of Google that searches only R websites. R can be hard to search for because R is a common letter. This has become much easier over time as R has become more popular, but it can still be a problem, and Rseek is a good solution. | ||
* | * <TODO> ggplot2 documentation — ggplot2 is a powerful data visualization package for R that I recommend highly. The documentation is indispensable for learning how to use it. | ||
== Assignments == | == Assignments == | ||
Line 93: | Line 91: | ||
Coming to class will be profoundly important to learning the material and to your final grade. Although the problem sets will not be graded, it is critical that you be present and able to discuss your answers to each of the questions. Your ability to do so will figure prominently in your participation grade for the course (40% of your final grade). More on | Coming to class will be profoundly important to learning the material and to your final grade. Although the problem sets will not be graded, it is critical that you be present and able to discuss your answers to each of the questions. Your ability to do so will figure prominently in your participation grade for the course (40% of your final grade). More on | ||
I | I encourage you to form groups to work on the problem sets if you find that helpful; however, you must still submit your work individually to help ensure that you learn and understand the material. | ||
<TODO create rubric?> The "Participation Rubric" section of [https://mako.cc/teaching/assessment.html my page on assessment] gives the details on how I evaluate participation in my classes. If you sense a conflict between material in this section and material on that page, you can safely assume that the syllabus takes precedence. | |||
=== Research project === | === Research project === | ||
Line 104: | Line 102: | ||
* '''Find a dataset''' — Very quickly, you should identify a dataset you will use to complete this project. For most of you, I suspect you will be engaging in secondary data analysis or an analysis of a previously collected dataset. | * '''Find a dataset''' — Very quickly, you should identify a dataset you will use to complete this project. For most of you, I suspect you will be engaging in secondary data analysis or an analysis of a previously collected dataset. | ||
* '''Engage in descriptive data analysis''' — Use R to calculate descriptive statistics and visualizations to describe your data. | * '''Engage in descriptive data analysis''' — Use R to calculate descriptive statistics and visualizations to describe your data. | ||
* ''' | * '''Test at least one hypothesis about relationships between two or more variables''' | ||
* '''Report and interpret your findings''' — You will do this in both a short paper and a short presentation. | * '''Report and interpret your findings''' — You will do this in both a short paper and a short presentation. | ||
* '''Ensure that your work is replicable''' — You will need to provide code and data for your analysis in a way that makes your work replicable by other researchers. | * '''Ensure that your work is replicable''' — You will need to provide code and data for your analysis in a way that makes your work replicable by other researchers. | ||
Line 114: | Line 112: | ||
==== Project plan and dataset identification ==== | ==== Project plan and dataset identification ==== | ||
;Due date: | ;Due date: <TBA> | ||
;Maximum length: 500 words (~1-2 pages) | ;Maximum length: 500 words (~1-2 pages) | ||
Line 125: | Line 123: | ||
==== Project planning document ==== | ==== Project planning document ==== | ||
;Due date: | ;Due date: <TBA> | ||
;Maximum length: 5 pages | ;Maximum length: 5 pages | ||
Line 136: | Line 134: | ||
==== Project presentation and paper ==== | ==== Project presentation and paper ==== | ||
;Paper due date: | ;Paper due date: <TBA> | ||
;Maximum length: 6000 words (~20 pages) | ;Maximum length: 6000 words (~20 pages) | ||
;Presentation due date: | ;Presentation due date: <TBA> | ||
;Maximum length: | ;Maximum length: <TBA> minutes | ||
Line 147: | Line 145: | ||
As noted above, you should also provide data, code, and any documentation sufficient to enable the replication of all analysis and visualizations. This can happen through GitHub. If that is not possible/appropriate for some reason, please talk to me so that we can find another solution. | As noted above, you should also provide data, code, and any documentation sufficient to enable the replication of all analysis and visualizations. This can happen through GitHub. If that is not possible/appropriate for some reason, please talk to me so that we can find another solution. | ||
Because the emphasis in this class is on statistics and methods and because I'm not an expert in each of your fields, I'm happy to assume that your paper, proposal, or thesis chapter has already established the relevance and significance of your study and has a comprehensive literature review, well-grounded conceptual approach, and compelling reason why this research is important. As a result, you need not focus on these elements of the work in your written submission. Instead, feel free to start with a brief summary of the purpose and importance of this research followed by an introduction of your research questions or hypotheses. If you provide more detail, that's fine, but I won't give you detailed feedback on these parts | Because the emphasis in this class is on statistics and methods and because I'm not an expert in each of your fields, I'm happy to assume that your paper, proposal, or thesis chapter has already established the relevance and significance of your study and has a comprehensive literature review, well-grounded conceptual approach, and compelling reason why this research is important. As a result, you need not focus on these elements of the work in your written submission. Instead, feel free to start with a brief summary of the purpose and importance of this research followed by an introduction of your research questions or hypotheses. If you provide more detail, that's fine, but I won't give you detailed feedback on these parts. | ||
I have a strong preference for you to write the paper individually, but I'm open to the idea that you may want to work with others in the class | I have a strong preference for you to write the paper individually, but I'm open to the idea that you may want to work with others in the class. | ||
I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography. However, ''your paper must follow a standard format'' (e.g., <TODO link> ACM SIGCHI CSCW format or <TODO link> APA 6th edition) that is applicable for | I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography. However, ''your paper must follow a standard format'' (e.g., <TODO link> ACM SIGCHI CSCW format or <TODO link> APA 6th edition) that is applicable for the journal or conference in which you aim to publish the work (they all have formatting or submission guidelines published online and you can follow them). This includes the references. I also strongly recommend that you use reference management software to handle your bibliographic sources. | ||
'' The presentation:'' The presentation will provide an opportunity to share a brief summary of your project and findings with the other members of the class. Since you will all give other research presentations throughout your career, I strongly encourage you to take the opportunity to refine your academic presentation skills. The document [https://canvas.northwestern.edu Creating a Successful Scholarly Presentation] (link is in Canvas) | '' The presentation:'' The presentation will provide an opportunity to share a brief summary of your project and findings with the other members of the class. Since you will all give other research presentations throughout your career, I strongly encourage you to take the opportunity to refine your academic presentation skills. The document [https://canvas.northwestern.edu Creating a Successful Scholarly Presentation] (link is in Canvas) will likely be useful. | ||
=== Grading === | === Grading === | ||
<TODO decide/update?> I have put together a very detailed page that describes the [https://mako.cc/teaching/assessment.html grading rubric] I will be using in this course. Please read it carefully. | |||
I will assign grades (usually a numeric value ranging from 0-10) for each of the following aspects of your performance. The percentage values in parentheses are weights that will be applied to calculate your overall grade for the course. | I will assign grades (usually a numeric value ranging from 0-10) for each of the following aspects of your performance. The percentage values in parentheses are weights that will be applied to calculate your overall grade for the course. | ||
Line 164: | Line 164: | ||
* Final project presentation: 10% | * Final project presentation: 10% | ||
* Final project paper: 40% | * Final project paper: 40% | ||
== Note on finding a dataset == | == Note on finding a dataset == | ||
Line 185: | Line 183: | ||
Class projects generally do not need IRB approval, but research for publications, dissertations, and sometimes even pilot studies generally fall under IRB purview. You should ''not'' plan to seek IRB approval/determination retroactively. If your study may involve human subjects and you may ever publish it in any form, you will need IRB oversight of some sort. | Class projects generally do not need IRB approval, but research for publications, dissertations, and sometimes even pilot studies generally fall under IRB purview. You should ''not'' plan to seek IRB approval/determination retroactively. If your study may involve human subjects and you may ever publish it in any form, you will need IRB oversight of some sort. | ||
Secondary analysis of anonymized data is generally not considered human subjects research, but I strongly suggest that you get a determination from [ | Secondary analysis of anonymized data is generally not considered human subjects research, but I strongly suggest that you get a determination from [LINK the Northwestern IRB] before you start. For work that is not considered human subjects research, this can often happen in a few hours or days. If you need to list a faculty sponsor or Principal Investigator, that should ideally be your advisor. If that doesn't make sense for some reason, please talk to me. | ||
== Structure of Class == | == Structure of Class == | ||
Line 205: | Line 203: | ||
When reading the schedule below, the following key might help resolve ambiguity: §n denotes chapter n; §n.x denotes section x of chapter n; §n.x-y denotes sections x through y of chapter n. | When reading the schedule below, the following key might help resolve ambiguity: §n denotes chapter n; §n.x denotes section x of chapter n; §n.x-y denotes sections x through y of chapter n. | ||
=== Week 1: | === Week 1: Tuesday January 3: Introduction, Setup, and Data and Variables === | ||
Please complete the readings prior to class so that we can discuss them and start talking through some of the examples in R together. | Please complete the readings prior to class so that we can discuss them and start talking through some of the examples in R together. | ||
Line 212: | Line 210: | ||
* Diez, Barr, and Çetinkaya-Rundel: §1 (Introduction to data) | * Diez, Barr, and Çetinkaya-Rundel: §1 (Introduction to data) | ||
* Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks.” ''Proceedings of the National Academy of Sciences'' 111(24):8788–90. [[http://www.pnas.org/content/111/24/8788.full Available through | * Verzani: §1 (Getting Started), §2 (Univariate data) [[https://faculty.washington.edu/makohill/com521/verzani-usingr-ch1_ch2.pdf Available with UWNetID]] | ||
* Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks.” ''Proceedings of the National Academy of Sciences'' 111(24):8788–90. [[http://www.pnas.org/content/111/24/8788.full Available through UW libraries]] | |||
''' | '''Optional Readings:''' | ||
* Verzani: §A (Programming) | * Verzani: §A (Programming) | ||
'''Assignment (Complete | '''Assignment (Complete Before Class):''' | ||
* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 1]] | * [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 1]] | ||
''' | '''Lectures:''' | ||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_01-r_programming_intro-20170103.ogv Week 1 R lecture screencast (Part I): Introduction to R and univariate statistics] (~1 hour 47 minutes) | * [https://communitydata.cc/~mako/2017-COM521/com521-week_01-r_programming_intro-20170103.ogv Week 1 R lecture screencast (Part I): Introduction to R and univariate statistics] (~1 hour 47 minutes) | ||
* [https://communitydata.cc/~mako/2017-COM521/com521-week_01-github_rscripts-20170104.ogv Week 1 R lecture screencast (Part II): Setting up git/GitHub and saving files in RStudio] (~40 minutes) | * [https://communitydata.cc/~mako/2017-COM521/com521-week_01-github_rscripts-20170104.ogv Week 1 R lecture screencast (Part II): Setting up git/GitHub and saving files in RStudio] (~40 minutes) | ||
* [[Statistics and Statistical Programming ( | * [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 1]] | ||
'''Resources:''' | '''Resources:''' | ||
Line 234: | Line 232: | ||
* [[Statistics and Statistical Programming (Winter 2017)/Session plan: Week 1]] | * [[Statistics and Statistical Programming (Winter 2017)/Session plan: Week 1]] | ||
=== Week 2: | === Week 2: Tuesday January 10: Probability and Visualization === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
Line 257: | Line 255: | ||
* [[Statistics and Statistical Programming (Winter 2017)/Session plan: Week 2]] | * [[Statistics and Statistical Programming (Winter 2017)/Session plan: Week 2]] | ||
=== Week 3: | === Week 3: Tuesday January 17: Distributions === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
Line 281: | Line 279: | ||
* [[Statistics and Statistical Programming (Winter 2017)/Session plan: Week 3]] | * [[Statistics and Statistical Programming (Winter 2017)/Session plan: Week 3]] | ||
=== Week 4: | === Week 4: Tuesday January 24: Statistical significance and hypothesis testing === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
Line 303: | Line 301: | ||
* [[Statistics and Statistical Programming (Winter 2017)/Session plan: Week 4]] | * [[Statistics and Statistical Programming (Winter 2017)/Session plan: Week 4]] | ||
=== Week 5: | === Week 5: Tuesday January 31: Continuous Numeric Data & ANOVA === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
Line 327: | Line 325: | ||
* [https://www.openintro.org/download.php?file=os3_slides_05&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §5 Lecture Notes] | * [https://www.openintro.org/download.php?file=os3_slides_05&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §5 Lecture Notes] | ||
=== Week 6: | === Week 6: Tuesday February 7: Categorical data === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
Line 350: | Line 348: | ||
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 4 videos for §7 | * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 4 videos for §7 | ||
=== Week 7: | === Week 7: Tuesday February 14: Linear Regression === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
Line 374: | Line 372: | ||
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 4 videos for §7 and 3 videos on the sections §8.1-8.3 | * [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 4 videos for §7 and 3 videos on the sections §8.1-8.3 | ||
=== Week 8: | === Week 8: Tuesday February 21: Polynomial Terms, Interactions, and Logistic Regression === | ||
'''Required Readings:''' | '''Required Readings:''' | ||
Line 403: | Line 401: | ||
* I've written this document which will likely be useful for many of you: [https://communitydata.cc/~mako/2017-COM521/logistic_regression_interpretation.html Interpreting Logistic Regression Coefficients with Examples in R] | * I've written this document which will likely be useful for many of you: [https://communitydata.cc/~mako/2017-COM521/logistic_regression_interpretation.html Interpreting Logistic Regression Coefficients with Examples in R] | ||
=== Week 9: | === Week 9: Tuesday February 28: Consulting Meetings === | ||
We won't meet as a group. Instead, you will each meet one-on-one with me to work through challenges and issues with your analysis. | |||
=== Week 10: | === Week 10: Tuesday March 7: Consulting Meetings === | ||
We won't meet as a group. Instead, you will each meet one-on-one with me to work through challenges and issues with your analysis. | |||
== | === Week 11: March 14: Final Presentations === | ||
== Administrative Notes == | |||
=== Attendance === | === Attendance === | ||
As detailed in [https://mako.cc/teaching/assessment.html my page on assessment], attendance in class is expected of all participants. If you need to miss class for any reason, please contact me ahead of time (email is best). Multiple unexplained absences will likely result in a lower grade or (in extreme circumstances) a failing grade. In the event of an absence, you are responsible for obtaining class notes, handouts, assignments, etc. | |||
=== | === Office Hours === | ||
I will not hold regular office hours. In general, I will be available to meet after class. Please contact me by email to arrange a meeting then or at another time. | |||
=== Accommodations === | === Accommodations === | ||
In general, if you have an issue, such as needing an accommodation for a religious obligation or learning disability, speak with me before it affects your performance; afterward it is too late. Do not ask for favors; instead, offer proposals that show initiative and a willingness to work. | |||
To request academic accommodations due to a disability please contact Disability Resources for Students, 448 Schmitz, 206-543-8924/V, 206-5430-8925/TTY. If you have a letter from Disability Resources for Students indicating that you have a disability that requires academic accommodations, please present the letter to me so we can discuss the accommodations that you might need for the class. I am happy to work with you to maximize your learning experience. | |||
=== | === Academic Misconduct === | ||
I | I am committed to upholding the academic standards of the University of Washington’s Student Conduct Code. If I suspect a student violation of that code, I will first engage in a conversation with that student about my concerns. | ||
If we cannot successfully resolve a suspected case of academic misconduct through our conversations, I will refer the situation to the Department of Communication advising office, which can then work with the COM Chair to seek further input and, if necessary, move the case up through the College. | |||
While evidence of academic misconduct may result in a lower grade, I will not unilaterally lower a grade without addressing the issue with you first through the process outlined above. | |||
=== Credit and Notes === | === Credit and Notes === | ||
This syllabus has, in ways that should be obvious, borrowed and built on the [https://www.openintro.org/stat/index.php OpenIntro Statistics curriculum]. I also | This syllabus has, in ways that should be obvious, borrowed and built on the [https://www.openintro.org/stat/index.php OpenIntro Statistics curriculum]. In the sense that he used the same two textbooks, I also drew some inspiration and confidence from Tom S. Clark's [http://www.tomclarkphd.com/teaching/POLS508F14.pdf syllabus for POLS 508: Data Analysis in Fall 2014]. | ||