Statistics and Statistical Programming (Winter 2021)
- Statistical Methods in Communication
- Introductory Statistics and Statistical Programming
- COM 521 — Department of Communication, University of Washington
- Description in Course Catalog
- Reviews the steps taken in social scientific research on communication, with emphasis on the conceptualization, operationalization, and analysis of quantifiable variables. Highlights understanding of computer application of univariate and bivariate statistics, focusing on both parametric and nonparametric tests.
- Mondays and Wednesdays, 9:30-11:20am
- Course websites
- Benjamin Mako Hill (firstname.lastname@example.org)
- Office Hours: By appointment (I'm usually available via chat during "business hours"). You can view my calendar and/or put yourself on it.
Overview and learning objectives
This course provides a get-your-hands-dirty introduction to inferential statistics and statistical programming for applications in communication research. My main objectives are for all participants to acquire the conceptual, technical, and practical skills to conduct their own statistical analyses and to become more sophisticated consumers of quantitative research in communication, human-computer interaction (HCI), and adjacent disciplines.
I will consider the course a complete success if every student is able to do all of the following things at the end of the quarter:
- Design and execute a quantitative research project that involves statistical inference—from start to finish.
- Read, modify, and create short programs in the R statistical programming language.
- Feel comfortable reading and interpreting papers that use basic statistical techniques.
- Feel prepared to enroll in more specialized and advanced statistics courses and to pursue the Statistics Concentration in the Communication MA/PhD program, offered jointly by the Department of Communication and the Center for Statistics and the Social Sciences.
There will be readings on conceptualization and operationalization in quantitative research, although these will overlap with readings in COM 501. The course will focus on a number of techniques, including the following: t-tests; chi-squared tests; ANOVA; linear regression; and logistic regression. We will also consider salient issues in quantitative research such as reproducibility and "the statistical crisis in science." We may cover other topics as time and interest allow.
The course materials will consist of readings, problem sets, assessment exercises, and recorded lectures and screencasts (some created by me, some created by other people). The course requirements will emphasize active participation and self-evaluation and will include a final project focused on the design and execution of an original piece of quantitative research. We will use the R programming language for all examples and assignments.
You are not required to have any prior training in statistics or statistical programming to take this class. I will assume some (very little!) knowledge of the basics of empirical research methods and design, basic algebra and arithmetic, and a willingness to work to learn the rest. In general, we are not going to cover most of the math behind the techniques we'll be learning. Although we may do some math, this is not a math class. This course will also not require knowledge of calculus or matrix algebra. I will *not* do proofs on the board. Instead, the class is unapologetically focused on the application of statistical methods. Likewise, while some exposure to R, other programming languages, or other statistical computing resources will be helpful, none is assumed.
Why statistical programming? Why R?
Some courses in statistics and quantitative methods do not emphasize statistical programming and rely on point-and-click tools like SPSS instead. Why bother learning R?
By learning statistical programming you will gain a deeper understanding of both the principles behind your analysis techniques and the tools you use to apply those techniques. In addition, a solid grasp of statistical programming will prepare you to create reproducible research, avoid common errors, and enable both greater durability and validity of your work.
Other programming languages are also well suited to statistics, including Stata and Python. Ultimately, I'm teaching R because R is ascendant (i.e., its use is increasing and it is well on its way to "taking over") and there was consensus among the faculty in the department who were likely to teach statistics classes in the future that this made the most sense. I also do quite a lot of my own statistical work with R, so that also guides my choice. Beyond that, I opt to use and teach with R for a few reasons:
- R is freely available and open source.
- R is the most widely used package in statistics and several social scientific fields.
- R (along with Stata) will be used in most of the advanced stats classes I hope you will take after this course.
- R is a better general-purpose programming language than Stata, which means that R programming skills will let you solve non-statistical problems and may make it easier to learn other programming languages like Python.
For students with a strong psychometric focus or whose research will be limited to linear and logistic regression or ANOVA on small pre-collected datasets and similar, SPSS will likely be fine. R has a higher barrier to entry than SPSS but its ceiling is much higher.
Note About This Syllabus
You should expect this syllabus to be a dynamic document. Although the core expectations for this class are fixed, the details of readings and assignments will shift based on how the class goes, guest speakers that I might arrange, my own readings in this area, etc. As a result, there are a few important things to keep in mind:
- Although details on this syllabus will change, I will try to ensure that I never change readings fewer than six days before they are due. We will send an announcement no later than before we go to sleep each Tuesday evening that fixes the schedule for the next week. This means that if I don't fill in a reading marked "[To Be Decided]" or "[Forthcoming]" six days before it's due, it is dropped. If we don't change something marked "[Tentative]" before the deadline, then it is assigned. This also means that if you plan to read more than six days ahead, contact the teaching team first.
- Because this syllabus is a wiki, you will be able to track every change by clicking the history button on this page when I make changes. I will summarize these changes in a weekly announcement on Canvas that will be emailed to everybody in the class. Closely monitor your email or the announcements section on the course website on Canvas to make sure you don't miss these announcements.
- I will ask the class for voluntary anonymous feedback frequently — especially toward the beginning of the quarter. Please let me know what is working and what can be improved. In the past, I have made many adjustments to courses that I teach while the quarter progressed based on this feedback.
- Many readings are marked as "[Available through UW libraries]". Most of these will be accessible to anybody who connects from the UW network. This means that if you're on campus, it will likely work. Although you can go through the UW libraries website to get most of these, the easiest way to get things is to use the UW library proxy bookmarklet. This is a little button you can drag-and-drop onto your bookmarks toolbar on your browser. When you press the button, it will ask you to log in using your UW NetID and then will automatically send your traffic through UW libraries. You can also use the other tools on this UW libraries webpage.
Class format and structure
This course will proceed in a remote format that includes asynchronous and synchronous elements (more on those below). In general, the organization of the course adopts a "flipped" approach where participants consume, discuss, and process instructional materials outside of "class" and we use synchronous meetings to answer questions, address challenges or concerns, work through solutions, and hold semi-structured discussions.
The course introduces both basic statistical concepts as well as applications of those concepts through statistical programming. As a result, we will usually dedicate part of each week to a particular set of concepts and part of each week to applied data analysis and/or interpretation.
Asynchronous elements of the course
These include all readings, recorded lectures/slides, tutorials, textbook exercises, problem sets, and other assignments. I expect you to complete—or put a good effort into attempting to complete so you can share your progress—these problem sets outside of our class meeting times. I also strongly encourage you to identify, submit, and discuss questions about the material before each class meeting whenever possible.
We will use Discord for everyday discussions and chat related to the course. In general, I will try to keep an eye on the various server channels during "business hours." To the extent that I can respond to questions and concerns there, I'll do so. I strongly prefer that you raise issues in the "public" channels so that your classmates can help answer them and can benefit from the answers if they are struggling with similar issues.
We'll also use the discussion channels to identify topics that might benefit from conversation during synchronous course meetings. Hopefully, writing and talking about questions and concerns outside of our formal course meetings will help support accountability, learning, and more effective use of our limited time together.
For nearly all of the "instructional" material introducing particular statistical concepts and techniques, you are assigned materials from the OpenIntro textbook and lecture materials created by the textbook authors. Please note that this means I will not deliver any formal lectures during our class meetings. Please also note that this means you are responsible for coordinating any collaborative work with other members of the class outside of our class meeting times.
Synchronous elements of the course
The synchronous elements of the course will be the two weekly class meetings that will happen via video conference on the "Classroom Voice" channel of the course Discord server. These are scheduled to run for a maximum of 110 minutes. I plan to use the entire time.
We will use the class meetings to discuss and work through any questions or challenges you encounter in the materials assigned for that day. This means that I encourage you to identify, submit, and discuss questions about the material before each class meeting over Discord whenever possible. Doing so will give me time to sift, sort, and organize your questions into a plan for each class session that is tailored to the specific concerns you have encountered in the material. Obviously, questions will arise during the class sessions too and we'll do our best to adapt as we go.
A couple of other notes about the synchronous course meetings:
- I plan to record the course meetings and have them available to class participants in an access-control-restricted fashion. Please get in touch if you have concerns or requests about this.
- I will do my best to notice and respond to any questions or comments that come up via Discord or Zoom during the class. Please do what you can to support these efforts.
- You might want to create/acquire something like NU Mechanical Engineering Professor Michael Peshkin's homebrew document camera to facilitate sharing hand-written notes/drawings during class.
In addition, because randomness is extremely important in statistics, I plan to randomly call on students to share and discuss their solutions to selected textbook exercises or problem set questions during class. The idea here is to structure some participation in the synchronous sessions to ensure an equitable distribution of the responsibility to discuss questions, answers, points of confusion, and alternatives.
Although the day-to-day routine will vary, each class session will include some combination of the following:
- Quick updates about assignments, projects, and meta-discussion about the class.
- Discussion of programming challenges due that day (and related to the previous week's R lecture materials).
- Discussion of statistics questions related to new material we've covered.
Textbook, readings, and resources
This class will use a freely-licensed textbook:
- Diez, David M., Mine Çetinkaya-Rundel, and Christopher D. Barr. 2019. OpenIntro Statistics. 4th edition. OpenIntro, Inc. [Available free online]
The textbook (in any format) is required for the course. You can download it at no cost and purchase hard copy versions in either full color ($60) or in black and white ($20). The B&W version is very affordable and I strongly recommend buying a hard copy for the purposes of the course and subsequent reference use. The book is excellent and has been adopted widely. It has also developed a large online community of students and teachers who have shared other resources. Lecture slides, videos, notes, and more are all freely licensed (many through the website and others elsewhere).
I will also assign several chapters from the following:
- Reinhart, Alex. 2015. Statistics Done Wrong: The Woefully Complete Guide. San Francisco, CA: No Starch Press. [Available from UW libraries]
This book provides a readable conceptual introduction to some common failures in statistical analysis that you should learn to recognize and avoid. It was also written by a Ph.D. student. You have access to an electronic copy via the UW libraries (you'll need to sign-in and/or use the UW VPN to access it), but you may find it helpful to purchase as well.
A few other books may be useful resources while you're learning to analyze, visualize, and interpret statistical data with R. I will share some advice about these during the first class meeting:
- Babbie, Earl R. 2015. The Practice of Social Research, 14th edition. Boston, MA: Cengage Learning. [Chapters will be made available in Canvas]
- Healy, Kieran. 2019. Data Visualization: A Practical Introduction. Princeton, NJ: Princeton University Press. [Available free online]
- Teetor, Paul. 2011. R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics. 1st edition. Sebastopol, CA: O'Reilly Media. [Available from UW libraries]; [Available for purchase through various sources]; [Available for purchase through Amazon]
- Verzani, John. 2014. Using R for Introductory Statistics. 2nd edition. Boca Raton, FL: Chapman and Hall/CRC. [Available for purchase through various sources]; [Available for purchase through Amazon]
- Wickham, Hadley. 2010. ggplot2: Elegant Graphics for Data Analysis. 1st edition, corrected 3rd printing. New York: Springer. [Available from UW libraries]; [Available for purchase through various sources]
- Wickham, Hadley, and Garrett Grolemund. 2017. R for Data Science. Sebastopol, CA: O'Reilly Media. [Available free online]
There are also some invaluable non-textbook resources:
- Baggott's R Reference Card v2 — Print this out. Take it with you everywhere and look at it dozens of times a day. You will learn the language faster!
- StackOverflow R Tag — Somebody already had your question about how to do X in R. They asked it, and several people have answered it, on StackOverflow. Learning to read this effectively will take time but as you build up some basic familiarity with R and with StackOverflow, it will get easier. I promise.
- Rseek — Rseek is a modified version of Google that just searches R websites online. Sometimes, R is hard to search because R is a common letter. This has become much easier over time as R has become more popular, but it can still be an issue sometimes and Rseek is a good solution.
- ggplot2 documentation — ggplot is a powerful data visualization package for R that I recommend highly. The documentation is indispensable for learning how to use it.
- Statistical Analysis and Reporting in R — A set of resources created and distributed by Jacob Wobbrock (University of Washington, School of Information) in conjunction with a MOOC he teaches. Contains cheatsheets, code snippets, and data to help execute commonly encountered statistical procedures in R.
- DataCamp offers introductory R courses. Northwestern usually has some free accounts that get passed out via Research Data Services each quarter. Apparently, if you are taking or teaching relevant coursework, instructors can request free access to DataCamp for their courses from DataCamp. If folks are interested in this, I can reach out.
- Statistics symbols you need to know, which is just what it says on the tin. Thanks Kate Rich!
There are two types of assignments in the course: (a) problem sets that we will discuss during each class session; and (b) a large course project.
In order to support continuous progress towards the learning goals for the course, I have assigned problem sets for each class. These problem sets include some textbook exercises, some programming challenges, and some other questions.
Problem sets may incorporate several kinds of questions:
- Statistics questions about statistical concepts and principles.
- Programming challenges that you should solve using R.
- Empirical paper questions about other assigned readings.
Although you will never hand these in to be graded, I will randomly call on students to share your answers to these questions and I will assess your preparedness after every single class meeting. I will not grade you on whether you get these answers correct or incorrect. Although the problem sets will not be assigned a letter grade, they are the central focus of the course and completing them will support your mastery of the material in multiple ways. These assignments will provide the basis on which I will assess and provide feedback on your participation and engagement with the course material.
For the programming challenges, be ready to share code and text for your solutions via screen share. If you get completely stuck on a problem, that's okay, but be ready to provide whatever you have and describe what tripped you up. In general, we will cover the problem sets in the first session of the week and the textbook materials in the second session.
As a demonstration of your learning in this course, you will design and carry out a quantitative research project, start to finish. This means you will all:
- Design and describe a plan for a study — The study you design should involve quantitative analysis and should be something you can complete at least a first pass on during this quarter.
- Find a dataset — Very quickly, you should identify a dataset you will use to complete this project. For most of you, I suspect you will be engaging in secondary data analysis or an analysis of a previously collected dataset.
- Engage in descriptive data analysis — Use R to calculate descriptive statistics and visualizations to describe your data.
- Motivate and test at least one hypothesis about relationships between two or more variables — I'm happy to discuss alternatives to formal hypothesis testing procedures (even if some of them are beyond the scope of this course).
- Report and interpret your findings — You will do this in both a short paper and a short (recorded) presentation.
- Ensure that your work is replicable — You will need to provide code and data for your analysis in a way that makes your work replicable by other researchers.
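As a rough illustration of the descriptive analysis step, here is a minimal sketch in R. It uses the built-in `mtcars` dataset purely as a stand-in for your own data; the dataset and the choice of variables are assumptions made for the example and are not part of any assignment.

```r
# Stand-in data: mtcars ships with base R. Substitute your own dataset.
d <- mtcars

# Univariate summaries of a numerical variable (miles per gallon)
mpg_mean <- mean(d$mpg)
mpg_sd <- sd(d$mpg)
summary(d$mpg)

# Frequency table for a categorical variable (number of cylinders)
cyl_counts <- table(d$cyl)
cyl_counts

# Group means: average mpg within each cylinder category
mpg_by_cyl <- tapply(d$mpg, d$cyl, mean)
mpg_by_cyl

# Simple visualizations using base graphics
hist(d$mpg, main = "Miles per gallon", xlab = "mpg")
boxplot(mpg ~ cyl, data = d, xlab = "Cylinders", ylab = "mpg")
```

Keeping steps like these in a single script (or an RMarkdown file) that runs from raw data to output is one simple way to work toward the replicability goal listed above.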
I strongly urge you to produce a project that will further your academic career outside of the class. There are many ways that this can happen. Some obvious options are to prepare a project that you can submit for publication, use as pilot analysis that you can report in a grant or thesis proposal, and/or use to fulfill a degree requirement. The last time I taught a statistics course, a majority of students in the class used their course projects either to satisfy a general examination requirement, as a published paper, or both.
There are several intermediate milestones, deliverables, and deadlines to help you accomplish a successful research project. Unless otherwise noted, all deliverables should be submitted via Canvas at 11:59pm Seattle time on the day they are due.
Research project plan and dataset identification
- Due date
- Friday January 15, 2021
- Maximum length
- 500 words (~1-2 pages)
Very early on, I want you to identify and describe your final project. Your description should be short and can be either paragraphs or bullets. It should include the following:
- An abstract of the proposed study including the topic, research question, theoretical motivation, object(s) of study, and anticipated research contribution.
- An identification of the dataset you will use and a description of the rows and columns or type(s) of data it will include. If you do not currently have access to these data, explain why and when you will.
- A short (several sentences?) description of how the project will fit into your career trajectory.
Notes on finding a dataset
In order to complete your final project, you will each need a dataset. If you already have a dataset for the project you plan to conduct, great! If not, fear not! There are many datasets to draw from. Some ideas are below (please suggest others, provide updated links, or report problems). The teaching team will also be available to help you brainstorm/find resources if needed:
- Ask your advisor for a dataset they have collected and used in previous papers. Are there other variables you could use? Other relationships you could analyze?
- If there's an important study you loved, you can send a polite email to the author(s) asking if they are willing and able to share an archival or replication version of the dataset used in their paper. Be very polite and make it clear that this is starting as a class project, but that it might turn into a paper for publication. Make your timeline clear. In Communication and HCI, replication datasets are still very rare, so be prepared for a negative answer and/or questions about your motives in conducting the analysis.
- Do some Google Scholar and normal internet searching for datasets in your research area. You'll probably be surprised at what's available.
- Take a look at datasets available in the Harvard Dataverse (a very large collection of social science research data) or one of the other members of the Dataverse network.
- Look at the collection of social scientific datasets at ICPSR at the University of Michigan (UW is a member). There are an enormous number of very rich datasets.
- Use the ISA Explorer to find datasets. Keep in mind the large majority of datasets it will search are drawn from the natural sciences.
- The City of Seattle has one of the best data portal sites of any municipality in the U.S. (and better than many federal agencies). There are also numerous administrative datasets released by other public entities (try searching!) that you might find inspiring.
- FiveThirtyEight.com has published a GitHub repository and an R package with pre-processed and cleaned versions of many of the datasets they use for articles published on their website.
- If you are interested in studying online communities, there are some great resources for accessing data from Reddit, Wikipedia, and StackExchange. See pushshift for dumps of Reddit data, here for an overview of Wikipedia's data resources, and Stack Exchange's data portal.
- The NY Times is publishing a COVID-19 data repository that includes county-level metrics for deaths, mask usage, and other pandemic-related data. They release a lot of it as frequently updated .csv files and the repository includes documentation of the measurements, data collection details, and more.
- The Community Data Science Collective and colleagues have created a COVID-19 digital observatory (hosted in part right here on this wiki!) that publishes a bunch of pandemic-related data as csv and json files.
- The Stanford Open Policing project has published a huge archive of policing data related mostly to traffic stops in states and many cities of the U.S. We'll use at least one of these files for a problem set.
Research project planning document
- Due date
- February 12, 2021
- Suggested length
- ~5 pages
The project planning document is a shell/outline of an empirical quantitative research paper. Your planning document should have the following sections: (a) Rationale; (b) Objectives; (b.1) General objectives; (b.2) Specific objectives; (c) (Null) hypotheses; (d) Conceptual diagram and explanation of the relationship(s) you plan to test; (e) Measures; (f) Dummy tables/figures; (g) Anticipated finding(s) and research contribution(s). Longer descriptions of each of these planning document sections (as well as a few others) can be found on this wiki page.
I will also provide example planning documents via our Canvas site:
- One by public health researcher Mika Matsuzaki. The first planning document I ever saw and still one of the best. It's missing a measures section. It's also focused on a research context that is probably very different from yours, but try not to get bogged down by that and imagine how you might map the structure of the document to your own work.
- [One by Jim Maddock] created as part of a qualifying exam early in 2019. Jim doesn't provide dummy tables or anticipated findings/contributions, but he has an especially phenomenal explanation of the conceptual relationships and processes he wants to test. [Forthcoming]
- [One provided as an appendix to Gerber and Green's excellent textbook, Field Experiments: Design, Analysis, and Interpretation (FEDAI)]. It's over-detailed and over-long for the purposes of this assignment, but nevertheless an exemplary approach to planning empirical quantitative research in a careful, intentional way that is worthy of imitation. [Forthcoming]
Research project presentation
- Presentation due date
- March 10, 2021
- Maximum length
- 15 minutes
You will also create and record a short presentation of your final project. The presentation will provide an opportunity to share a brief overview of your project and findings with the other members of the class. Since you will all give other research presentations throughout your career, I strongly encourage you to take the opportunity to refine your academic presentation skills. The document [Creating a Successful Scholarly Presentation] (file posted to Canvas) may be useful. [Forthcoming]
Additional details about the presentation goals, format suggestions, resources, and more will be provided later in the quarter.
Research project paper
- Paper due date
- March 19, 2021
- Maximum length
- 6000 words (~20 pages)
I expect you to produce a short, high quality research paper that you might revise, extend, and submit for publication and/or a dissertation milestone like a methods general examination. I do not expect the paper to be ready for publication, but it should contain polished drafts of all the necessary components of a scholarly quantitative empirical research study. In terms of the structure, please see the page on the structure of a quantitative empirical research paper.
As noted above, you should also provide data, code, and any documentation sufficient to enable the replication of all analysis and visualizations. If that is not possible/appropriate for some reason, please talk to me so that we can find another solution.
Because the emphasis in this class is on statistics and methods and because I'm probably not an expert in the substance of your research domain, I'm happy to assume that your paper, proposal, or thesis chapter has already established the relevance and significance of your study and has a comprehensive literature review, well-grounded conceptual approach, and compelling reason why this research is important. As a result, you need not focus on these elements of the work in your written submission. Instead, feel free to start with a brief summary of the purpose and importance of this research followed by an introduction of your research questions or hypotheses. If you provide more detail, that's fine, but I won't give you detailed feedback on these parts and they will not figure prominently in my assessment of the work.
I have a strong preference for you to write the paper individually, but I'm open to the idea that you may want to work with others in the class. Please contact me before you attempt to pursue a collaborative final paper.
I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography. However, your paper must follow a standard format (e.g., APA 6th edition (Word and LaTeX templates) or ACM SIGCHI CSCW format) that is applicable for a peer-reviewed journal or conference proceedings in which you might aim to publish the work (they all have formatting or submission guidelines published online and you should follow them). This includes the references. I also strongly recommend that you use reference management software like Zotero to handle your bibliographic sources.
Human subjects research, IRB, and ethics
In general, you are responsible for making sure that you're on the right side of the IRB requirements and that your work meets applicable ethical norms and standards.
Class projects generally do not need IRB approval, but research for publications, dissertations, and sometimes even pilot studies do fall under IRB purview. You should not plan to seek IRB approval/determination retroactively. If your study may involve human subjects and you may ever publish it in any form, you will need IRB oversight of some sort.
Secondary analysis of anonymized data is generally not considered human subjects research, but I strongly suggest that you get a determination from Human Subjects Division (the UW IRB) before you start. For work that is not considered human subjects research, this can often happen in a few hours or days. If you need to list a faculty sponsor or Principal Investigator, that should ideally be your advisor. If that doesn't make sense for some reason, please talk to me.
Research ethics is a broad and complex topic. We'll talk about issues related to ethics and quantitative empirical research a bit more during class, but will likely only scratch the surface. I strongly encourage you to pursue further reading, conversation, coursework, and reflection as you consider how to understand and apply ethical principles in the context of your own research and teaching.
Grading and assessment
I will assign grades (typically on the UW 4.0 grade scale) for each of the following aspects of your performance. The percentage values listed are weights that will be applied to calculate your overall grade for the course.
- Problem set discussion: 40%
- Project identification: 5%
- Final project planning document: 5%
- Final project presentation: 15%
- Final project paper: 35%
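To make the weighting concrete, here is a small sketch in R of how a weighted overall grade could be computed from the five components. The component scores are invented for illustration, and treating the overall grade as a simple weighted mean on the 4.0 scale is an assumption for the sake of the example, not a statement of the exact procedure.

```r
# Weights from the list above (they sum to 1)
weights <- c(
  problem_set_discussion = 0.40,
  project_identification = 0.05,
  planning_document      = 0.05,
  presentation           = 0.15,
  paper                  = 0.35
)

# Hypothetical component grades on the 4.0 scale (invented for illustration)
scores <- c(
  problem_set_discussion = 3.8,
  project_identification = 4.0,
  planning_document      = 3.5,
  presentation           = 3.7,
  paper                  = 3.6
)

# Sanity check: the weights should account for the whole grade
stopifnot(isTRUE(all.equal(sum(weights), 1)))

# Weighted mean: each score multiplied by its weight, then summed.
# Indexing by name guards against the two vectors being in different orders.
overall <- sum(weights * scores[names(weights)])
overall
```

Base R's `weighted.mean(scores, weights)` gives the same result when the two vectors are in the same order.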
I will jointly and holistically evaluate your participation in problem set discussions along four dimensions: participation, preparation, engagement, and contribution. These are quite similar to the dimensions described in the "Participation Rubric" section of my assessment page. Exceptional participation means excelling along all four dimensions. Please note that participation ≠ talking/typing more and I encourage all of us to seek balance in our discussions.
My assessment of your final project proposal, planning document, presentation, and paper will reflect the clarity of the work, the effective execution and presentation of quantitative empirical analysis, as well as the quality and originality of the analysis. Throughout the quarter, we will talk about the qualities of exemplary quantitative research. In general, I expect your final project to embody these exemplary qualities.
When reading the schedule below, the following key might help resolve ambiguity: §n denotes chapter n; §n.x denotes section x of chapter n; §n.x-y denotes sections x through y (inclusive) of chapter n.
The required and recommended tasks are meant to be completed before class and will typically be necessary to complete the problem sets for each day.
Day 1: Monday January 4: Intro and setup
- Read this syllabus, discuss any questions/concerns with the teaching team.
- Confirm course registration and access to the textbook (pdf download available for $0 and b&w paperbacks for $20) as well as any software and web services you'll need for the course (Discord, Canvas, this wiki, R, RStudio). Discord invites will be sent via email.
Day 2: Wednesday January 6: Data and R
Required readings and resources:
- Read Diez, Çetinkaya-Rundel, and Barr: §1.1-1.3 (Introduction to data)
Recommended readings and resources:
- Complete Problem set 2: exercises from OpenIntro §1: (1.6, 1.9, 1.10, 1.16, 1.21, 1.40, 1.42, 1.43). Remember that solutions to odd-numbered problems are in the book!
- Problem set 2 worked solutions [HTML, RMarkdown, PDF]
Day 3: Monday January 11: Numerical and categorical data
- Read Diez, Çetinkaya-Rundel, and Barr: §2.1-2 (Numerical and categorical data).
- The R tutorial webcast and RMarkdown tutorial that I've put together including:
- Watch Lecture materials for §2.1 and §2.2 (Videos 6-7 in the playlist).
- Watch COM520 R Tutorial #1 Screencast on Panopto
- Watch COM520 R Tutorial #2 Screencast on Panopto
- If you want additional material that will provide an introduction to R, these are great resources:
- Complete Problem set 3 (OpenIntro questions & programming challenges)
- Problem set 3 worked solutions [HTML, RMarkdown, PDF]
Day 4: Wednesday January 13: Applied data manipulation
- Additional material from any of the recommended R learning resources suggested last week or elsewhere in the syllabus. In particular, you may find the ModernDive, RYouWithMe, Healy, and/or Wickham and Grolemund resources valuable.
- Complete Problem set 4 (programming challenges and statistical questions)
- Problem set 4 worked solutions [HTML, RMarkdown, PDF]
NO CLASS: Monday January 18: Martin Luther King Jr Day
Day 5: Wednesday January 20: Probability and R fundamentals
- Read Diez, Çetinkaya-Rundel, and Barr: §3 (Probability).
- COM520 R Tutorial #4: Additional R fundamentals [HTML, RMarkdown, PDF]
- Watch Probability introduction and Probability trees OpenIntro lectures (just videos 1 and 2 in the playlist).
- Watch COM520 R Tutorial #4.1 Screencast on Panopto
- Watch COM520 R Tutorial #4.2 Screencast on Panopto
- Complete Problem set 5 (OpenIntro exercises & programming challenges)
- Problem set 5 worked solutions [HTML, RMarkdown, PDF]
Day 6: Monday January 25: Distributions
- Read Diez, Çetinkaya-Rundel, and Barr: §4.1-3 (Normal and binomial distributions).
- Watch normal and binomial distributions OpenIntro lectures (videos 1-3 in the playlist)
- Seeing Theory §3 (Probability distributions)
- Go back and complete any questions from Problem set 5 that you were not able to get last time.
- Complete Problem set 6: exercises from OpenIntro §4: 4.4, 4.6, 4.15, 4.22
Day 7: Wednesday January 27: Descriptive analysis and visualization
- Complete Problem set 7
Day 8: Monday February 1: Foundations for inference
- Read Diez, Çetinkaya-Rundel, and Barr: §5 (Foundations for inference).
- Complete Why .05? OpenIntro video/exercise.
- Read Kelly M., Emily Dickinson and monkeys on the stair Or: What is the significance of the 5% significance level? Significance 10:5. 2013.
- Seeing Theory §4 (Frequentist Inference)
- Watch foundations for inference (videos 1-3 in the playlist) OpenIntro lectures.
- Complete Problem set 8: exercises from OpenIntro §5: 5.4, 5.8, 5.10, 5.17, 5.30, 5.35, 5.36
Day 9: Wednesday February 3: Reinforced foundations for inference
- COM520 R Tutorial #6: Distributions in R and more [HTML, RMarkdown, PDF, Screencast]
- Read Reinhart, §1. [Available from UW libraries]
- Read the following paper (it will be familiar to those of you in COM501): Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks.” Proceedings of the National Academy of Sciences 111 (24): 8788–90. https://doi.org/10.1073/pnas.1320040111. [Available from UW libraries]
- Complete Problem set 9
Day 10: Monday February 8: Inference for categorical data
- Read Diez, Çetinkaya-Rundel, and Barr: §6 (Inference for categorical data).
- Watch inference for categorical data (videos 1-3 in the playlist) OpenIntro lectures.
- OpenIntro Central limit theorem for proportions demo.
- Complete Problem set 10: exercises from OpenIntro §6: 6.10, 6.16, 6.22, 6.30, 6.40 (just parts a and b; part c gets tedious)
Day 11: Wednesday February 10: Applied inference for categorical data
- COM520 R Tutorial #7: Categorical data [HTML, RMarkdown, PDF]
- Read Reinhart, §4 and §5 (both are quite short).
- Skim the following (all are referenced in the problem set)
- Aronow PM, Karlan D, Pinson LE. (2018). The effect of images of Michelle Obama’s face on trick-or-treaters’ dietary choices: A randomized control trial. PLoS ONE 13(1): e0189693. https://doi.org/10.1371/journal.pone.0189693. [Available free online]
- Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in Proceedings of the 8th ACM Conference on Designing Interactive Systems. Aarhus, Denmark: ACM. [Available free online]
- Complete Problem set 11
NO CLASS: Monday February 15: Presidents' Day
Day 12: Wednesday February 17: Inference for numerical data (t-tests and ANOVA)
- Read Diez, Çetinkaya-Rundel, and Barr: §7.1-5 (Inference for numerical data: differences of means; power calculations, ANOVA, and multiple comparisons).
- OpenIntro supplement on ANOVA calculations (particularly useful if you think you'll be doing more ANOVAs).
- Watch inference for numerical data (videos 1-8 in the playlist) OpenIntro lectures (and featuring one of the textbook authors!).
- COM520 R Tutorial #8: t-tests and ANOVA [HTML, RMarkdown, PDF, Screencast]
- Complete Problem set 12
Day 13: Monday February 22: Linear regression
- Read Diez, Çetinkaya-Rundel, and Barr: §8 (Linear regression).
- Read More inference for linear regression (OpenIntro supplement).
- Watch linear regression (videos 1-4 in the playlist) OpenIntro lectures.
- Read Seeing Theory §6 (Regression analysis)
- Complete Problem set 13
Day 14: Wednesday February 24: Applied linear regression
- Complete Problem set 14
Day 15: Monday March 1: Multiple and logistic regression
- Read Diez, Çetinkaya-Rundel, and Barr: §9 (Multiple and logistic regression). (Skim §9.2-9.4)
- Disclaimer: Mako doesn't like §9.2-9.3, but it should be useful to understand and discuss them, so we'll do that.
- Read Interaction terms (OpenIntro supplement).
- Read Fitting models for non-linear trends (OpenIntro supplement).
- Watch multiple and logistic regression (videos 1-4 in the playlist) OpenIntro lectures.
- Complete Problem set 15: exercises from OpenIntro §9: 9.4, 9.13, 9.16, 9.18
Day 16: Wednesday March 3: Applied multiple and logistic regression
- Complete Problem set 16
Day 17: Monday March 8: Consulting Day
We'll forgo meeting as a group. Instead, I will meet one-on-one with each of you to work through challenges you're having with your own projects.
- COM520 R Tutorial #11: Bonus material [Forthcoming]
Day 18: Wednesday March 10: Final Presentations
Post your video via this "Discussion" on Canvas [Forthcoming]. Please view and provide constructive feedback on others' videos!
- Post videos directly to the "Discussion." The Canvas text editor has an option to upload/record a video. That's what you want.
- Please remember not to over-work/think this. I mentioned this in class, but just to reiterate, the focus of this assignment should not be your video editing skills. Please do what you can to record and convey your ideas clearly without devoting insane hours to creating the perfect video.
- Some resources for recording presentations: There are a bunch of ways you might record/share your video. Some ideas include using the embedded media recorder in Canvas (!) that can record with your webcam (maybe attach a few visuals to accompany this?); recording a "meeting" with yourself in Zoom; and "Panopto," a piece of high-end video recording, sharing, and editing software that UW licenses for campus use. Here are some pointers:
- You should be able to use your UW Zoom account to create a Zoom meeting, record your meeting (in which you deliver your presentation and share your screen with any visuals), and then share a link to the recording via the "Recordings" item in the left-hand menu of your https://washington.zoom.us/ account page.
- If nothing works, please get in touch.
Teaching and learning in a pandemic
The COVID-19 pandemic will impact this course in various ways, some of them obvious and tangible and others harder to pin down. On the obvious and tangible front, we have things like a mix of remote, synchronous, and asynchronous instruction and the fact that many of us will not be anywhere near campus or each other this year. These will reshape our collective "classroom" experience in major ways.
On the "harder to pin down" side, many of us may experience elevated levels of exhaustion, stress, uncertainty and distraction. We may need to provide unexpected support to family, friends, or others in our communities. I have personally experienced all of these things at various times over the past six months and I expect that some of you have too. It is a difficult time.
I believe it is important to acknowledge these realities of the situation and create the space to discuss and process them in the context of our class throughout the quarter. As your instructor and colleague, I commit to do my best to approach the course in an adaptive, generous, and empathetic way. I will try to be transparent and direct with you throughout—both with respect to the course material as well as the pandemic and the university's evolving response to it. I ask that you try to extend a similar attitude towards everyone in the course. When you have questions, feedback, or concerns, please try to share them in an appropriate way. If you require accommodations of any kind at any time (directly related to the pandemic or not), please contact the teaching team.
- This text is borrowed and adapted from Aaron Shaw's statistics course.
Expectations for synchronous remote sessions
The following are some baseline expectations for our synchronous remote class sessions. I expect that these can and will evolve. Please feel free to ask questions, suggest changes, or raise concerns during the quarter. I welcome all input:
- All members of the class are expected to create a supportive and welcoming environment that is respectful of the conditions under which we are participating in this class.
- All members of the class are expected to take reasonable steps to create an effective teaching/learning environment for themselves and others.
And here are suggested protocols for any video/audio portions of our class:
- Please mute your microphone whenever you're not speaking and learn to use "push-to-talk" if/when possible (Discord supports the feature).
- Video is optional for all students at all times, although if you're willing/able to keep the instructor company in the video channel, it would be nice.
- If you need to excuse yourself at any time and for any reason you may do so.
- Children, family, pets, roommates, and others with whom you may share your workspace are welcome to join our class as needed.
Statistics and power
The subject matter of this course—statistics and statistical programming—has historical and present-day affinities with a variety of oppressive ideologies and projects, including white supremacy, discrimination on the basis of gender and sexuality, state violence, genocide, and colonialism. It has also been used to challenge and undermine these projects in various ways. I will work throughout the quarter to acknowledge and represent these legacies accurately, at the same time as I also strive to advance equity, inclusion, and justice through my teaching practice, the selection of curricular materials, and the cultivation of an inclusive classroom environment.
Your Presence in Class
As detailed in the section on assignments and in my detailed page on assessment, your homework in the class is to prepare for discussion of problem sets, which means that presence is an important way that I will assess learning. Obviously, you must be in class in order to participate. In the event of an absence, you are responsible for obtaining class notes, handouts, assignments, etc.
Washington state law requires that UW develop a policy for accommodation of student absences or significant hardship due to reasons of faith or conscience, or for organized religious activities. The UW’s policy, including more information about how to request an accommodation, is available at Religious Accommodations Policy. Accommodations must be requested within the first two weeks of this course using the Religious Accommodations Request form.
The University of Washington Student Conduct Code (WAC 478-121) defines prohibited academic and behavioral conduct and describes how the University holds students accountable as they pursue their academic goals. Allegations of misconduct by students may be referred to the appropriate campus office for investigation and resolution. More information can be found online at https://www.washington.edu/studentconduct/
Safety
Call SafeCampus at 206-685-7233 anytime–no matter where you work or study–to anonymously discuss safety and well-being concerns for yourself or others. SafeCampus’s team of caring professionals will provide individualized support, while discussing short- and long-term solutions and connecting you with additional resources when requested.
The University takes academic integrity very seriously. Behaving with integrity is part of our responsibility to our shared learning community. If you’re uncertain about whether something is academic misconduct, ask us. We are willing to discuss questions you might have.
Acts of academic misconduct may include but are not limited to:
- Cheating (working collaboratively on quizzes/exams and discussion submissions, sharing answers and previewing quizzes/exams)
- Plagiarism (representing the work of others as your own without giving appropriate credit to the original author(s))
- Unauthorized collaboration (working with each other on assignments)
Concerns about these or other behaviors prohibited by the Student Conduct Code will be referred for investigation and adjudication by the College’s Director of Community Standards and Student Conduct.
If you have already established accommodations with Disability Resources for Students (DRS), please communicate your approved accommodations to us at your earliest convenience so we can discuss your needs in this course.
If you have not yet established services through DRS, but have a temporary health condition or permanent disability that requires accommodations (conditions include but are not limited to: mental health, attention-related, learning, vision, hearing, physical or health impacts), you are welcome to contact DRS at 206-543-8924 or email@example.com or disability.uw.edu. DRS offers resources and coordinates reasonable accommodations for students with disabilities and/or temporary health conditions. Reasonable accommodations are established through an interactive process between you, your instructor(s) and DRS. It is the policy and practice of the University of Washington to create inclusive and accessible learning environments consistent with federal and state law.
Other Student Support
Any student who has difficulty affording groceries or accessing sufficient food to eat every day, or who lacks a safe and stable place to live, and believes this may affect their performance in the course, is urged to contact the graduate program advisor for support. Furthermore, please notify the professors if you are comfortable in doing so. This will enable us to provide any resources that we may possess (adapted from Sara Goldrick-Rab). Please also note the student food pantry, Any Hungry Husky at the ECC.
Students should ensure that they can access all Internet resources required for this course reliably and safely before registering for this course. Participation in this course requires students to access Internet resources that may not be accessible directly in some places outside of the UW campus. Specifically, students in this course will need to access UW resources including Canvas, UW Libraries which require users to login with a UW NetID, and some external resources such as Zoom, Google Docs, YouTube, and/or eBook websites. For students who are off-campus and are in a situation where direct access to these required resources is not possible, UW IT recommends that students use the official UW VPN, called Husky OnNet VPN (see instructions below). However, students who are outside the US while taking this course should be aware that they may be subject to laws, policies and/or technological systems which restrict the use of any VPNs. UW does not guarantee students’ access to UW resources when students are off-campus, and students are responsible for their own compliance with all laws regarding the use of Husky OnNet and all other UW resources.
UW-IT provides the Husky OnNet VPN free for UW students via this link, and advises students to use it with the “All Internet Traffic” option enabled (see the UW Libraries instructions and UW-IT’s FAQs regarding the Husky OnNet VPN). Doing so will route all incoming and outgoing Internet traffic through UW servers while it is enabled.
Credit and Notes
This syllabus has, in ways that should be obvious, borrowed and built on the OpenIntro Statistics curriculum. Many aspects of this course design extend from a version of COM 521 I taught in 2017 as well as versions of this course taught at Northwestern University in Spring 2019 and Fall 2020 by Aaron Shaw.