Statistics and Statistical Programming (Winter 2017)


 * Advanced Statistical Methods in Communication: Statistics and Statistical Programming
 * COM521 - Department of Communication, University of Washington
 * Instructor: Benjamin Mako Hill (University of Washington)
 * Course Websites:
 * We will use Canvas for announcements, turning in assignments, and discussion (if you choose to use them)
 * Everything else will be linked on this page.
 * Course Catalog Description:


 * Discusses complexities in quantitative research on communication. Focus on multivariate data design and analysis, including multiple and logistic regression, ANOVA and MANOVA, and factor analysis.

Overview and Learning Objectives
This course is the second course in a two-quarter quantitative methods sequence in the University of Washington's Department of Communication MA/PhD program. The first course (COM 520) is a first introduction to quantitative social science in communication and focuses primarily on what you might think of as the "soft skills" associated with doing social science: the conceptualization and operationalization of quantifiable variables and the design of quantitative analyses. That course introduces some univariate and bivariate statistics at the end and briefly touches on linear regression. That said, all of the statistical work in that course is done using tools that students already know (e.g., spreadsheet software like LibreOffice, Google Sheets, or Microsoft Excel). This class assumes that students have taken COM 520, that they understand what is involved in describing and testing social scientific theories with data, and that the basic terminology of quantitative social science is familiar.

This course (COM 521) is focused on technical skill-building and aims to be a get-your-hands-dirty introduction to statistics and statistical programming. The point of the course is to give you the mathematical and technical tools to carry out your own statistical analyses. Through the process, we're going to try to help you become more sophisticated consumers of quantitative research.

Although we'll be doing some math in the course, this is not a math class. I am going to assume you're familiar with basic algebra and arithmetic. This course will not require knowledge of calculus. In general, we're not going to cover the math behind the techniques we'll be covering. Unlike many statistics classes, I'm definitely not going to be doing proofs on the board. Instead, the class is unapologetically focused on the application of statistical methodology. In that sense, the goal of this course is to create informed consumers of quantitative methodology, not producers of new types of methods. My goal is to train producers of social scientific research that use statistics as a means toward an end.

This course does not seek to be the last stats class you take. I started grad school having not taken a math class since high school (basically) and took 12 different statistics and math courses over the course of my time in graduate school. Honestly, I wish I had done more. What this class seeks to do is give you a solid basis on which to build statistical knowledge. Anyone who finishes this class should feel comfortable moving on to take advanced classes in CSSS and to start building toward a Concentration in Statistics in Communication certificate.

We'll cover these basic statistical techniques: t-tests; chi-squared tests; ANOVA, MANOVA, and related methods; linear regression; and end with logistic regression.

I will consider the course a complete success if every student is able to do all of these things at the end of the quarter:


 * Carry out a complete analysis of a quantitative research project, start to finish. This means you will all:
 * Design a study.
 * Find, collect, or build a quantitative dataset.
 * Test a hypothesis about relationships between two or more variables using a real dataset.
 * Report your findings in a short paper and a short presentation.
 * Provide code and data for your analysis in a way that makes your work replicable by other researchers.
 * Read, modify, and create short programs in the GNU R statistical programming language.
 * Feel comfortable reading papers that use basic statistical techniques.
 * Feel comfortable and prepared enrolling in future statistics courses in CSSS.

Note About This Syllabus
You should expect this syllabus to be a dynamic document and you will notice that there are a few places marked "To Be Determined." Although the core expectations for this class are fixed, the details of readings and assignments will shift. As a result, there are three important things to keep in mind:


 * 1) Although details on this syllabus will change, I will not change readings or assignments less than one week before they are due. If I don't fill in a "To Be Determined" one week before it's due, it is dropped. If you plan to read more than one week ahead, contact me first.
 * 2) Closely monitor your email or the announcements section on the course website on Canvas. When I make changes, these changes will be recorded in the history of this page so that you can track what has changed and I will summarize these changes in an announcement on Canvas that will be emailed to everybody in the class.
 * 3) I will ask the class for voluntary anonymous feedback frequently — especially toward the beginning of the quarter. Please let me know what is working and what can be improved. In the past, I have made many adjustments based on this feedback.

Books
This class is going to use two textbooks:


 * Diez, David M., Christopher D. Barr, and Mine Çetinkaya-Rundel. 2015. OpenIntro Statistics. 3rd edition. OpenIntro, Inc. (PDF; Tablet-friendly PDF)
 * Verzani, John. 2014. Using R for Introductory Statistics. 2nd edition. Boca Raton: Chapman and Hall/CRC. (Amazon)

OpenIntro, by Diez, Barr, and Çetinkaya-Rundel, is a free and freely licensed online statistics textbook as well as a large online community of students and teachers. The book, lecture notes, and more are all freely licensed, which has allowed the text to be adapted in a series of different fields. The book is excellent and it has been adopted extraordinarily widely. You can buy versions from Amazon in either full color hardcover ($19.99) or in black and white paperback ($7.60). I haven't purchased a paper copy so I can't speak to the quality of either.

Verzani's book is an introduction to the R programming language. It's designed to be used as a companion to a basic introductory statistics textbook (like OpenIntro). It's a poor stand-alone text but it will provide a good resource for the material we're covering in the course and it should act as a good reference going forward.

Although it's not required for the course, I strongly suggest that you all consider adding this book to your library. When I was learning to program for the first time, the "cookbooks" from this publisher came in enormously handy:


 * Teetor, Paul. 2011. R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics. 1st edition. Beijing; Sebastopol, CA: O’Reilly Media. (Safari (UW Libraries); Various Sources; Amazon)

Assignments
The assignments in this class are designed to give you an opportunity to try your hand at using the conceptual material taught in the class. There will be no exams or quizzes. Unless otherwise noted, all assignments are due at the end of the day (i.e., 11:59pm on the day they are due).

Research Project
As a demonstration of your learning in this course, you will design a plan for an internet research project and will, if possible, also collect (at least) an initial sample of a dataset that you will use to complete the project.

The paper you produce can be in one of the following three genres:


 * 1) A draft of a manuscript for submission to a conference or journal.
 * 2) A proposal for funding (e.g., for submission for the NSF for a graduate student fellowship).
 * 3) A draft of the methods chapter of your dissertation.

In any of the three paths, I expect you to take this opportunity to produce a document that will further your academic career outside of the class.

Project Identification

 * Due Date: April 10
 * Maximum paper length: 500 words (~1-2 pages)
 * Deliverables: Turn in on Canvas

Early on, I want you to identify your final project. Your proposal should be short and can be either paragraphs or bullets. It should include the following things:


 * The genre of the project and a short description of how it fits into your career trajectory.
 * A one paragraph abstract of the proposed study and research question, theory, community, and/or groups you plan to study.
 * A short description of the type of data you plan to collect as part of your final project.

Final Project

 * Outline Due Date: May 8
 * Maximum outline length: 2 pages
 * Paper Due Date: June 12
 * Maximum paper length: 6000 words (~20 pages)
 * Presentation Date: June 2
 * All Deliverables: Turn in on Canvas

Because the emphasis in this class is on methods and because I'm not an expert in each of your areas or fields, I'm happy to assume that your paper, proposal, or thesis chapter has already established the relevance and significance of your study and has a comprehensive literature review, a well-grounded conceptual approach, and a compelling reason why this research is so important. Instead of providing all of these details, feel free to start with a brief summary of the purpose and importance of this research and an introduction of your research questions or hypotheses. If you provide more detail, that's fine, but I won't give you detailed feedback on these parts.

The final paper should include:


 * a statement of the purpose, central focus, relevance and significance of this research;
 * a description of the specific Internet application(s) and/or environment(s) and/or objects to be studied and employed in the research;
 * key research questions or hypotheses;
 * operationalization of key concepts;
 * a description and rationale of the specific method(s) (if more than one method will be used, explain how the methods will produce complementary findings);
 * a description of the step-by-step plan for data collection;
 * a description and rationale of the level(s), unit(s) and process of analysis (if more than one kind of data is generated, explain how each kind will be analyzed individually and/or comparatively);
 * an explanation of how these analyses will enable you to answer the RQs;
 * a sample instrument (as appropriate);
 * a sample dataset and description of a formative analysis you have completed;
 * a description of actual or anticipated results and any potential problems with their interpretation;
 * a plan for publishing/disseminating the findings from this research;
 * a summary of technical, ethical, human subjects and legal issues that may be encountered in this research, and how you will address them;
 * a schedule (using specific dates) and proposed budget.

I also expect each student to begin collecting data for your project (i.e., using the technical skills you learn in the class) and to describe your progress in this regard in your paper. If collecting data for a proposed project is impractical (e.g., because of IRB applications, funding, etc.), I would love for you to engage in the collection of a public dataset as part of a pilot or formative study. If this is not feasible or useful, we can discuss other options.

I have a strong preference for you to write this paper individually but I'm open to the idea that you may want to work with others in the class.

Participation
The course relies heavily on participation and discussion. It is important to realize that we will not summarize reading in class and I will not cover it in lecture. I expect you all to have read it and we will jump in and start discussing it. The "Participation Rubric" section of my detailed page on assessment gives the rubric I will use in evaluating participation.

Grading
I have put together a very detailed page that describes the grading rubric I will be using in this course. Please read it carefully. I will assign grades for each of the following items on the UW 4.0 grade scale according to the weights below:


 * Participation: 25%
 * Presentation of method/approach: 15%
 * Proposal identification: 5%
 * Final paper outline: 5%
 * Final Presentation: 10%
 * Final Paper: 40%

Week 1: Tuesday January 3: Introduction, Setup, and Data and Variables
Assignment before class:


 * Install RStudio

Readings:


 * Diez, Barr, and Çetinkaya-Rundel: §1 (Introduction to data)
 * Verzani: §1 (Getting Started), §2 (Univariate data), §A (Programming)

Week 2: Tuesday January 10: Probability and Visualization
Readings:


 * Diez, Barr, and Çetinkaya-Rundel: §2 (Probability)
 * Verzani: §3.1-2 (Bivariate data), §4 (Multivariate data), §5 (Multivariate graphics)

Week 3: Tuesday January 17: Distributions
Readings:


 * Diez, Barr, and Çetinkaya-Rundel: §3.1-3.2, §3.4
 * Verzani: §6 (Populations)

Week 4: Tuesday January 24: Statistical significance and hypothesis testing
Readings:


 * Diez, Barr, and Çetinkaya-Rundel: §4 (Foundations for inference)
 * Verzani: §7 (Statistical inference), §8 (Confidence intervals)

Week 5: Tuesday January 31: Continuous Numeric Data & ANOVA
Readings:


 * Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data)
 * Verzani: §9 (Significance tests), §12 (Analysis of variance)

Week 6: Tuesday February 7: Categorical data
Readings:


 * Diez, Barr, and Çetinkaya-Rundel: §6 (Inference for categorical data)
 * Verzani: §3.4 (Bivariate categorical data); §10.1-10.2 (Goodness of fit)

Week 7: Tuesday February 14: Simple Linear Regression

 * Diez, Barr, and Çetinkaya-Rundel: §7 (Introduction to linear regression)
 * Verzani: §11.1-2 (Linear regression)

Week 8: Tuesday February 21: Multiple and Logistic Regression

 * Diez, Barr, and Çetinkaya-Rundel: §8 (Multiple and logistic regression)
 * Verzani: §11.3 (Linear regression), §13.1 (Logistic regression)

Attendance
As detailed in my page on assessment, attendance in class is expected of all participants. If you need to miss class for any reason, please contact me ahead of time (email is best). Multiple unexplained absences will likely result in a lower grade or (in extreme circumstances) a failing grade. In the event of an absence, you are responsible for obtaining class notes, handouts, assignments, etc.

Office Hours
I will not hold regular office hours. In general, I will be available to meet after class. Please contact me by email to arrange a meeting then or at another time.

Accommodations
In general, if you have an issue, such as needing an accommodation for a religious obligation or learning disability, speak with me before it affects your performance; afterward it is too late. Do not ask for favors; instead, offer proposals that show initiative and a willingness to work.

To request academic accommodations due to a disability, please contact Disability Resources for Students, 448 Schmitz, 206-543-8924/V, 206-543-8925/TTY. If you have a letter from Disability Resources for Students indicating that you have a disability that requires academic accommodations, please present the letter to me so we can discuss the accommodations that you might need for the class. I am happy to work with you to maximize your learning experience.

Academic Misconduct
I am committed to upholding the academic standards of the University of Washington’s Student Conduct Code. If I suspect a student violation of that code, I will first engage in a conversation with that student about my concerns.

If we cannot successfully resolve a suspected case of academic misconduct through our conversations, I will refer the situation to the Department of Communication advising office, which can then work with the COM Chair to seek further input and, if necessary, move the case up through the College.

While evidence of academic misconduct may result in a lower grade, I will not unilaterally lower a grade without addressing the issue with you first through the process outlined above.

Credit and Notes
This syllabus was inspired by, and borrows with permission from, a syllabus from an earlier version of this class taught by Kirsten Foot in Spring 2014.