Statistics and Statistical Programming (Fall 2020)
From CommunityData
=== Research project (major) assignments ===

==== Overview ====

As a demonstration of your learning in this course, you will design and carry out a quantitative research project, start to finish. This means you will all:

* '''Design and describe a plan for a study''': The study you design should involve quantitative analysis and should be something you can complete at least a first pass on during this quarter.
* '''Find a dataset''': Very quickly, you should identify a dataset you will use to complete this project. For most of you, I suspect you will be engaging in secondary data analysis or an analysis of a previously collected dataset.
* '''Engage in descriptive data analysis''': Use R to calculate descriptive statistics and create visualizations to describe your data.
* '''Motivate and test at least one hypothesis about relationships between two or more variables''': I'm happy to discuss alternatives to formal hypothesis testing procedures (even if some of them are beyond the scope of this course).
* '''Report and interpret your findings''': You will do this in both a short paper and a short (recorded) presentation.
* '''Ensure that your work is replicable''': You will need to provide code and data for your analysis in a way that makes your work replicable by other researchers.

''I strongly urge you'' to produce a project that will further your academic career outside of the class. There are many ways that this can happen. Some obvious options are to prepare a project that you can submit for publication, use as pilot analysis that you can report in a grant or thesis proposal, and/or use to fulfill a degree requirement.

There are several intermediate milestones, deliverables, and deadlines to help you accomplish a successful research project. Unless otherwise noted, all deliverables should be submitted via Canvas by 5pm CT on the day they are due.

==== Research project plan and dataset identification ====

;Due date: October 9, 2020, 5pm CT
;Maximum length: 500 words (~1-2 pages)

Early on, I want you to identify and describe your final project. Your description should be short and can be either paragraphs or bullets. It should include the following:

* An abstract of the proposed study including the topic, research question, theoretical motivation, object(s) of study, and anticipated research contribution.
* An identification of the dataset you will use and a description of the rows and columns or type(s) of data it will include. If you do not currently have access to these data, explain why and when you will.
* A short (several sentences) description of how the project will fit into your career trajectory.

===== Notes on finding a dataset =====

In order to complete your final project, you will each need a dataset. If you already have a dataset for the project you plan to conduct, great! If not, fear not! There are many datasets to draw from. Some ideas are below (please suggest others, provide updated links, or report problems). The teaching team will also be available to help you brainstorm/find resources if needed:

* Ask your advisor for a dataset they have collected and used in previous papers. Are there other variables you could use? Other relationships you could analyze?
* If there's an important study you loved, you can send a polite email to the author(s) asking if they are willing and able to share an archival or replication version of the dataset used in their paper. Be very polite and make it clear that this is starting as a class project, but that it might turn into a paper for publication. Make your timeline clear. In Communication and HCI, replication datasets are still very rare, so be prepared for a negative answer and/or questions about your motives in conducting the analysis.
* Do some Google Scholar and normal internet searching for datasets in your research area. You'll probably be surprised at what's available.
* Take a look at datasets available in the [https://dataverse.harvard.edu/ Harvard Dataverse] (a very large collection of social science research data) or one of the other members of the [http://dataverse.org/ Dataverse network].
* Look at the collection of social scientific datasets at [https://www.icpsr.umich.edu/icpsrweb/ICPSR/ ICPSR at the University of Michigan] (NU is a member). There are an enormous number of very rich datasets.
* Use the [http://scientificdata.isa-explorer.org/index.html ISA Explorer] to find datasets. Keep in mind the large majority of datasets it will search are drawn from the natural sciences.
* The City of Chicago has one of the best [https://data.cityofchicago.org/ data portal sites] of any municipality in the U.S. (and better than many federal agencies). There are also numerous administrative datasets released by other public entities (try searching!) that you might find inspiring.
* [http://fivethirtyeight.com FiveThirtyEight.com] has published a [https://cran.r-project.org/web/packages/fivethirtyeight/vignettes/fivethirtyeight.html GitHub repository and an R package] with pre-processed and cleaned versions of many of the datasets they use for articles published on their website.
* If you are interested in studying online communities, there are some great resources for accessing data from Reddit, Wikipedia, and StackExchange. See [https://files.pushshift.io/reddit/ pushshift] for dumps of Reddit data, [https://meta.wikimedia.org/wiki/Research:Data here] for an overview of Wikipedia's data resources, and [https://data.stackexchange.com/ Stack Exchange's data portal].
* The NY Times is publishing a [https://github.com/nytimes/covid-19-data COVID-19 data repository] that includes county-level metrics for deaths, mask usage, and other pandemic-related data. They release a lot of it as frequently updated .csv files, and the repository includes documentation of the measurements, data collection details, and more.
* The Community Data Science Collective and colleagues have created a [[COVID-19_Digital_Observatory| COVID-19 digital observatory]] (hosted in part right here on this wiki!) that publishes a bunch of pandemic-related data as csv and json files.
* The [https://openpolicing.stanford.edu Stanford Open Policing project] has published a huge archive of policing data related mostly to traffic stops in states and many cities of the U.S. We'll use at least one of these files for a problem set.

==== Research project planning document ====

;Due date: October 30, 2020, 5pm CT
;Suggested length: ~5 pages

The project planning document is a shell/outline of an empirical quantitative research paper. Your planning document should have the following sections: (a) Rationale; (b) Objectives; (b.1) General objectives; (b.2) Specific objectives; (c) (Null) hypotheses; (d) Conceptual diagram and explanation of the relationship(s) you plan to test; (e) Measures; (f) Dummy tables/figures; (g) Anticipated finding(s) and research contribution(s). Longer descriptions of each of these planning document sections (as well as a few others) can be found [[CommunityData:Planning document|on this wiki page]].

I will also provide three example planning documents via our Canvas site (links to be updated for the 2020 edition of the course):

* [https://canvas.northwestern.edu/files/9439380/download?download_frd=1 One by public health researcher Mika Matsuzaki]. The first planning document I ever saw and still one of the best. It's missing a measures section. It's also focused on a research context that is probably very different from yours, but try not to get bogged down by that and imagine how you might map the structure of the document to your own work.
* [https://canvas.northwestern.edu/files/9421229/download?download_frd=1 One by Jim Maddock] created as part of a qualifying exam early in 2019. Jim doesn't provide dummy tables or anticipated findings/contributions, but he has an especially phenomenal explanation of the conceptual relationships and processes he wants to test.
* [https://canvas.northwestern.edu/files/9439379/download?download_frd=1 One provided as an appendix to Gerber and Green's excellent textbook, ''Field Experiments: Design, Analysis, and Interpretation'' (FEDAI)]. It's over-detailed and over-long for the purposes of this assignment, but nevertheless an exemplary approach to planning empirical quantitative research in a careful, intentional way that is worthy of imitation.

==== Research project presentation ====

;Presentation due date: December 3, 2020, 5pm CT
;Maximum length: 10 minutes

<!-- TODO revisit old presentations page to update/adapt [[Statistics_and_Statistical_Programming_(Spring_2019)/Final_project_presentations]] --->

You will also create and record a short presentation of your final project. The presentation will provide an opportunity to share a brief overview of your project and findings with the other members of the class. Since you will all give other research presentations throughout your career, I strongly encourage you to take the opportunity to refine your academic presentation skills. The document [https://canvas.northwestern.edu/files/9439377/download?download_frd=1 Creating a Successful Scholarly Presentation] (file posted to Canvas) may be useful. Additional details about the presentation goals, format suggestions, resources, and more will be provided later in the quarter.

==== Research project paper ====

;Paper due date: December 10, 2020, 5pm CT
;Maximum length: 6000 words (~20 pages)

I expect you to produce a short, high-quality research paper that you might revise, extend, and submit for publication and/or a dissertation milestone.
I do not expect the paper to be ready for publication, but it should contain polished drafts of all the necessary components of a scholarly quantitative empirical research study. In terms of the structure, please see the page on the [[structure of a quantitative empirical research paper]].

As noted above, you should also provide data, code, and any documentation sufficient to enable the replication of all analysis and visualizations. If that is not possible/appropriate for some reason, please talk to me so that we can find another solution.

Because the emphasis in this class is on statistics and methods and because I'm probably not an expert in the substance of your research domain, I'm happy to assume that your paper, proposal, or thesis chapter has already established the relevance and significance of your study and has a comprehensive literature review, well-grounded conceptual approach, and compelling reason why this research is important. As a result, you need not focus on these elements of the work in your written submission. Instead, feel free to start with a brief summary of the purpose and importance of this research followed by an introduction of your research questions or hypotheses. If you provide more detail, that's fine, but I won't give you detailed feedback on these parts and they will not figure prominently in my assessment of the work.

I have a strong preference for you to write the paper individually, but I'm open to the idea that you may want to work with others in the class. Please contact me ''before'' you attempt to pursue a collaborative final paper.

I do not have strong preferences about the style or formatting guidelines you follow for the paper and its bibliography.
However, ''your paper must follow a standard format'' (e.g., [https://cscw.acm.org/2019/submit-papers.html ACM SIGCHI CSCW format] or [https://www.apastyle.org/index APA 6th edition] ([https://templates.office.com/en-us/APA-style-report-6th-edition-TM03982351 Word] and [https://www.overleaf.com/latex/templates/sample-apa-paper/fswjbwygndyq LaTeX] templates)) that is applicable for a peer-reviewed journal or conference proceedings in which you might aim to publish the work (they all have formatting or submission guidelines published online and you should follow them). This includes the references. I also strongly recommend that you use reference management software like Zotero to handle your bibliographic sources.

==== Human subjects research, IRB, and ethics ====

In general, you are responsible for making sure that you're on the right side of the IRB requirements and that your work meets applicable ethical norms and standards.

Class projects generally do not need IRB approval, but research for publications, dissertations, and sometimes even pilot studies do fall under IRB purview. You should ''not'' plan to seek IRB approval/determination retroactively. If your study may involve human subjects and you may ever publish it in any form, you will need IRB oversight of some sort.

Secondary analysis of anonymized data is generally not considered human subjects research, but I strongly suggest that you get a determination from [https://irb.northwestern.edu/ the Northwestern IRB] before you start. For work that is not considered human subjects research, this can often happen in a few hours or days. If you need to list a faculty sponsor or Principal Investigator, that should ideally be your advisor. If that doesn't make sense for some reason, please talk to me.

Research ethics are a broad and complex topic. We'll talk about issues related to ethics and quantitative empirical research a bit more during class, but will likely only scratch the surface.
I strongly encourage you to pursue further reading, conversation, coursework, and reflection as you consider how to understand and apply ethical principles in the context of your own research and teaching.