Statistics and Statistical Programming (Winter 2021)
From CommunityData
== Assignments ==

There are two types of assignments in the course: (a) problem sets that we will discuss during each class session; and (b) a large course project.

=== Problem Sets ===

In order to support continuous progress towards the learning goals for the course, I have assigned problem sets for each class. These problem sets include some textbook exercises, some programming challenges, and some other questions. They may incorporate several kinds of questions:

* '''Statistics questions''' about statistical concepts and principles.
* '''Programming challenges''' that you should solve using R.
* '''Empirical paper questions''' about other assigned readings.

<!-- For the problem sets, I ask that you submit your work [https://canvas.uw.edu/courses/1434003/assignments via Canvas 24 hours before class] (i.e., Monday afternoon for our Tuesday class sessions). Details of exactly how this will work will be elaborated during the first class. -->

Although you will never hand these in to be graded, I will ''randomly'' call on students to share their answers to these questions and I will assess your preparedness after every single class meeting. I will ''not'' grade you on whether you get these answers correct or incorrect. Although the problem sets will not be assigned a letter grade, they are the central focus of the course and completing them will support your mastery of the material in multiple ways. These assignments will provide the basis on which I will assess and provide feedback on your participation and engagement with the course material.

For the programming challenges, be ready to share code and text for your solutions via screen share. If you get completely stuck on a problem, that's okay, but be ready to provide whatever you have and describe what tripped you up.

In general, we will cover the problem sets in the first session of the week and the textbook materials in the second session.
=== Research project ===

As a demonstration of your learning in this course, you will design and carry out a quantitative research project, start to finish. This means you will all:

* '''Design and describe a plan for a study''': The study you design should involve quantitative analysis and should be something you can complete at least a first pass on during this quarter.
* '''Find a dataset''': Very quickly, you should identify a dataset you will use to complete this project. For most of you, I suspect you will be engaging in secondary data analysis or an analysis of a previously collected dataset.
* '''Engage in descriptive data analysis''': Use R to calculate descriptive statistics and visualizations to describe your data.
* '''Motivate and test at least one hypothesis about relationships between two or more variables''': I'm happy to discuss alternatives to formal hypothesis testing procedures (even if some of them are beyond the scope of this course).
* '''Report and interpret your findings''': You will do this in both a short paper and a short (recorded) presentation.
* '''Ensure that your work is replicable''': You will need to provide code and data for your analysis in a way that makes your work replicable by other researchers.

''I strongly urge you'' to produce a project that will further your academic career outside of the class. There are many ways that this can happen. Some obvious options are to prepare a project that you can submit for publication, use as pilot analysis that you can report in a grant or thesis proposal, and/or use to fulfill a degree requirement. The last time I taught a statistics course, a majority of students in the class used their course projects either to satisfy a general examination requirement, as a published paper, or both.

There are several intermediate milestones, deliverables, and deadlines to help you accomplish a successful research project.
Unless otherwise noted, all deliverables should be submitted via Canvas by 11:59pm Seattle time on the day they are due.

==== Research project plan and dataset identification ====

;Due date: Friday, January 15, 2021
;Maximum length: 500 words (~1-2 pages)

Very early on, I want you to identify and describe your final project. Your description should be short and can be either paragraphs or bullets. It should include the following:

* An abstract of the proposed study including the topic, research question, theoretical motivation, object(s) of study, and anticipated research contribution.
* An identification of the dataset you will use and a description of the rows and columns or type(s) of data it will include. If you do not currently have access to these data, explain why not and when you expect to have access.
* A short (several sentences?) description of how the project will fit into your career trajectory.

===== Notes on finding a dataset =====

In order to complete your final project, you will each need a dataset. If you already have a dataset for the project you plan to conduct, great! If not, fear not! There are many datasets to draw from. Some ideas are below (please suggest others, provide updated links, or report problems). The teaching team will also be available to help you brainstorm/find resources if needed:

* Ask your advisor for a dataset they have collected and used in previous papers. Are there other variables you could use? Other relationships you could analyze?
* If there's an important study you loved, you can send a polite email to the author(s) asking if they are willing and able to share an archival or replication version of the dataset used in their paper. Be very polite and make it clear that this is starting as a class project, but that it might turn into a paper for publication. Make your timeline clear. In Communication and HCI, replication datasets are still very rare, so be prepared for a negative answer and/or questions about your motives in conducting the analysis.
* Do some Google Scholar and normal internet searching for datasets in your research area. You'll probably be surprised at what's available.
* Take a look at datasets available in the [https://dataverse.harvard.edu/ Harvard Dataverse] (a very large collection of social science research data) or one of the other members of the [http://dataverse.org/ Dataverse network].
* Look at the collection of social scientific datasets at [https://www.icpsr.umich.edu/icpsrweb/ICPSR/ ICPSR at the University of Michigan] (UW is a member). There are an enormous number of very rich datasets.
* Use the [http://scientificdata.isa-explorer.org/index.html ISA Explorer] to find datasets. Keep in mind that the large majority of datasets it will search are drawn from the natural sciences.
* The City of Seattle has one of the best [https://data.seattle.gov/ data portal sites] of any municipality in the U.S. (and better than many federal agencies). There are also numerous administrative datasets released by other public entities (try searching!) that you might find inspiring.
* [http://fivethirtyeight.com FiveThirtyEight.com] has published a [https://cran.r-project.org/web/packages/fivethirtyeight/vignettes/fivethirtyeight.html GitHub repository and an R package] with pre-processed and cleaned versions of many of the datasets they use for articles published on their website.
* If you are interested in studying online communities, there are some great resources for accessing data from Reddit, Wikipedia, and StackExchange. See [https://files.pushshift.io/reddit/ pushshift] for dumps of Reddit data, [https://meta.wikimedia.org/wiki/Research:Data here] for an overview of Wikipedia's data resources, and [https://data.stackexchange.com/ Stack Exchange's data portal].
* The NY Times is publishing a [https://github.com/nytimes/covid-19-data COVID-19 data repository] that includes county-level metrics for deaths, mask usage, and other pandemic-related data. They release much of it as frequently updated .csv files and the repository includes documentation of the measurements, data collection details, and more.
* The Community Data Science Collective and colleagues have created a [[COVID-19_Digital_Observatory| COVID-19 digital observatory]] (hosted in part right here on this wiki!) that publishes a range of pandemic-related data as CSV and JSON files.
* The [https://openpolicing.stanford.edu Stanford Open Policing project] has published a huge archive of policing data related mostly to traffic stops in states and many cities of the U.S. We'll use at least one of these files for a problem set.

==== Research project planning document ====

;Due date: February 12, 2021
;Suggested length: ~5 pages

The project planning document is a shell/outline of an empirical quantitative research paper. Your planning document should have the following sections: (a) Rationale; (b) Objectives; (b.1) General objectives; (b.2) Specific objectives; (c) (Null) hypotheses; (d) Conceptual diagram and explanation of the relationship(s) you plan to test; (e) Measures; (f) Dummy tables/figures; (g) Anticipated finding(s) and research contribution(s). Longer descriptions of each of these planning document sections (as well as a few others) can be found [[CommunityData:Planning document|on this wiki page]].

I will also provide example planning documents via our Canvas site:

* [https://canvas.northwestern.edu/files/9439380/download?download_frd=1 One by public health researcher Mika Matsuzaki]. The first planning document I ever saw and still one of the best. It's missing a measures section. It's also focused on a research context that is probably very different from yours, but try not to get bogged down by that and imagine how you might map the structure of the document to your own work.
* One provided as an appendix to Gerber and Green's excellent textbook, ''Field Experiments: Design, Analysis, and Interpretation'' (FEDAI). It's over-detailed and over-long for the purposes of this assignment, but nevertheless an exemplary approach to planning empirical quantitative research in a careful, intentional way that is worthy of imitation.

==== Research project presentation ====

;Presentation due date: March 10, 2021
;Maximum length: 15 minutes

<!-- TODO revisit old presentations page to update/adapt [[Statistics_and_Statistical_Programming_(Spring_2019)/Final_project_presentations]] --->

You will also create and record a short presentation of your final project. The presentation will provide an opportunity to share a brief overview of your project and findings with the other members of the class. Since you will all give other research presentations throughout your career, I strongly encourage you to take the opportunity to refine your academic presentation skills. The document [https://canvas.uw.edu/files/74392679/download?download_frd=1 Creating a Successful Scholarly Presentation] (file posted to Canvas) may be useful.

==== Research project paper ====

;Paper due date: March 19, 2021
;Maximum length: 6000 words (~20 pages)

I expect you to produce a short, high-quality research paper that you might revise, extend, and submit for publication and/or a dissertation milestone like a methods general examination. I do not expect the paper to be ready for publication, but it should contain polished drafts of all the necessary components of a scholarly quantitative empirical research study. In terms of the structure, please see the page on the [[structure of a quantitative empirical research paper]].
As noted above, you should also provide data, code, and any documentation sufficient to enable the replication of all analyses and visualizations. If that is not possible/appropriate for some reason, please talk to me so that we can find another solution.

Because the emphasis in this class is on statistics and methods and because I'm probably not an expert in the substance of your research domain, I'm happy to assume that your paper, proposal, or thesis chapter has already established the relevance and significance of your study and has a comprehensive literature review, well-grounded conceptual approach, and compelling reason why this research is important. As a result, you need not focus on these elements of the work in your written submission. Instead, feel free to start with a brief summary of the purpose and importance of this research followed by an introduction of your research questions or hypotheses. If you provide more detail, that's fine, but I won't give you detailed feedback on these parts and they will not figure prominently in my assessment of the work.

I have a strong preference for you to write the paper individually, but I'm open to the idea that you may want to work with others in the class. Please contact me ''before'' you attempt to pursue a collaborative final paper.

I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography.
However, ''your paper must follow a standard format'' (e.g., [https://www.apastyle.org/index APA 6th edition] ([https://templates.office.com/en-us/APA-style-report-6th-edition-TM03982351 Word] and [https://www.overleaf.com/latex/templates/sample-apa-paper/fswjbwygndyq LaTeX] templates) or [https://cscw.acm.org/2019/submit-papers.html ACM SIGCHI CSCW format]) that is applicable for a peer-reviewed journal or conference proceedings in which you might aim to publish the work (they all have formatting or submission guidelines published online and you should follow them). This includes the references. I also strongly recommend that you use reference management software like Zotero to handle your bibliographic sources.

==== Human subjects research, IRB, and ethics ====

In general, you are responsible for making sure that you're on the right side of the IRB requirements and that your work meets applicable ethical norms and standards. Class projects generally do not need IRB approval, but research for publications, dissertations, and sometimes even pilot studies do fall under IRB purview. You should ''not'' plan to seek IRB approval/determination retroactively. If your study may involve human subjects and you may ever publish it in any form, you will need IRB oversight of some sort.

Secondary analysis of anonymized data is generally not considered human subjects research, but I strongly suggest that you get a determination from the [https://www.washington.edu/research/hsd/ Human Subjects Division] (the UW IRB) before you start. For work that is not considered human subjects research, this can often happen in a few hours or days. If you need to list a faculty sponsor or Principal Investigator, that should ideally be your advisor. If that doesn't make sense for some reason, please talk to me.

Research ethics is a broad and complex topic. We'll talk about issues related to ethics and quantitative empirical research a bit more during class, but will likely only scratch the surface.
I strongly encourage you to pursue further reading, conversation, coursework, and reflection as you consider how to understand and apply ethical principles in the context of your own research and teaching.

=== Grading and assessment ===

I will assign grades (typically on the UW 4.0 grade scale) for each of the following aspects of your performance. The percentage values listed are weights that will be applied to calculate your overall grade for the course.

* Problem set discussion: 40%
* Project identification: 5%
* Final project planning document: 5%
* Final project presentation: 15%
* Final project paper: 35%

I will jointly and holistically evaluate your participation in problem set discussions along four dimensions: participation, preparation, engagement, and contribution. These are quite similar to the dimensions described in the "Participation Rubric" section of [[User:Benjamin Mako Hill/Assessment|my assessment page]]. Exceptional participation means excelling along all four dimensions. Please note that participation does not mean talking/typing more, and I encourage all of us to seek balance in our discussions.

My assessment of your final project proposal, planning document, presentation, and paper will reflect the clarity of the work, the effective execution and presentation of quantitative empirical analysis, as well as the quality and originality of the analysis. Throughout the quarter, we will talk about the qualities of exemplary quantitative research. In general, I expect your final project to embody these exemplary qualities.
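
To make the weighting in the grading section concrete, here is a minimal sketch of how the five weights could combine into an overall grade. It assumes each component is graded on the 4.0 scale and that the overall grade is a plain weighted average; the syllabus does not spell out rounding or any other adjustment, so treat this as an illustration and not the actual grading procedure. The function name, dictionary keys, and example grades are all hypothetical.

```python
# Weights copied from the grading section of this syllabus (they sum to 1.0).
WEIGHTS = {
    "problem_set_discussion": 0.40,
    "project_identification": 0.05,
    "planning_document": 0.05,
    "presentation": 0.15,
    "paper": 0.35,
}


def overall_grade(component_grades):
    """Return the weighted average of component grades on the 4.0 scale.

    Assumption: a plain weighted average with no rounding or curve.
    """
    missing = set(WEIGHTS) - set(component_grades)
    if missing:
        raise ValueError(f"missing grades for: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_grades[name] for name in WEIGHTS)


# Hypothetical component grades, invented purely for illustration.
example_grades = {
    "problem_set_discussion": 3.8,
    "project_identification": 4.0,
    "planning_document": 3.5,
    "presentation": 3.7,
    "paper": 3.6,
}

print(round(overall_grade(example_grades), 2))
```

Because problem set discussion and the final paper together carry 75% of the weight, a change in either of those moves the overall grade far more than a change in the project identification or planning document.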