Statistics and Statistical Programming (Winter 2021)/Problem set 7
From CommunityData
== Programming Challenges ==
'''Do police in the United States engage in discriminatory behavior on the basis of race and ethnicity?''' For this problem set, you will investigate the relationship between traffic stops, vehicle searches, and driver attributes (especially race as recorded by police officers conducting traffic stops). Doing so will involve some more advanced data wrangling, visualization, and analysis. We'll use data from [https://openpolicing.stanford.edu The Stanford Open Policing Project] (SOPP) that looks at records of traffic stops in Washington state between 2012 and 2017. The full SOPP dataset for Washington is about 11 million rows, so I've created a 1% random sample for us to work with here.
Overall, the dataset is well-documented and pretty "clean" (as far as these things go), but a number of its features may still be confusing, weird, and/or poorly organized for answering the questions I've asked you below. Thank goodness you know how to use R to address these issues...
Review the project overview on the [https://openpolicing.stanford.edu/ SOPP homepage], the [https://openpolicing.stanford.edu/data/ overview of the data], the [https://github.com/stanford-policylab/opp/blob/master/data_readme.md#description-of-standardized-data description of the standardized data], the [https://github.com/stanford-policylab/opp/blob/master/data_readme.md#statewide-wa codebook/notes for the Washington data] from the [https://github.com/stanford-policylab/opp/blob/master/data_readme.md data_readme.md], as well as any other ancillary materials that you can find that seem likely to help you get oriented with the data.
For the questions below we'll focus on the following measures recorded for each traffic stop in Washington from 2012 to 2017: <code>date</code>, <code>subject_age</code>, <code>subject_race</code>, <code>subject_sex</code>, and <code>search_conducted</code>.
Record any questions or issues you might notice related to these measures as you review the information about the project and dataset.
=== PC2. Import, explore, clean ===
As I mentioned above, the full WA-SOPP dataset is over 11 million rows, so I have created a random 1% subset for us to work with in this assignment (and it's about XXMB). [FIXME]
To get started, you'll want to import the data and explore its structure as well as key variables that we'll be focusing on in this analysis (<code>date</code>, <code>subject_age</code>, <code>subject_race</code>, <code>subject_sex</code>, and <code>search_conducted</code>). Inspect a random sample of rows to get a sense of the data. What (if anything) is missing? You may also want to clean/recode some of the key variables. Make sure to explain and justify any data cleanup and/or recoding steps you decide to take.
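If it helps to see the general shape of these steps, here is a minimal base-R sketch. It is only an illustration under some assumptions: the <code>stops</code> data frame is a tiny made-up stand-in (not real SOPP records), the file name in the comment is hypothetical, and the recoding choices shown are one reasonable option rather than the required ones.

```r
# Toy stand-in for the 1% sample: these six rows are invented for illustration.
# With the real data you would import the file instead, for example:
#   stops <- read.csv("wa_sopp_sample.csv", stringsAsFactors = FALSE)  # hypothetical file name
stops <- data.frame(
  date = c("2012-03-14", "2014-07-01", "2016-11-23", NA, "2017-02-09", "2013-05-30"),
  subject_age = c(34, NA, 19, 52, 41, 27),
  subject_race = c("white", "black", "hispanic", "white", NA, "asian/pacific islander"),
  subject_sex = c("male", "female", "male", "male", "female", NA),
  search_conducted = c("FALSE", "TRUE", "FALSE", NA, "FALSE", "FALSE"),
  stringsAsFactors = FALSE
)

# Explore the structure and look at a random sample of rows
str(stops)
set.seed(42)
stops[sample(nrow(stops), 3), ]

# Count missing values in each key variable
missing_counts <- colSums(is.na(stops))
missing_counts

# One possible recoding: parse dates, treat categories as factors,
# and turn the search indicator into a logical
cleaned <- stops
cleaned$date <- as.Date(cleaned$date)
cleaned$subject_race <- factor(cleaned$subject_race)
cleaned$subject_sex <- factor(cleaned$subject_sex)
cleaned$search_conducted <- as.logical(cleaned$search_conducted)

summary(cleaned)
```

Whatever recoding you settle on, the point is to check the result (for example with <code>summary()</code>) and to explain why each step makes sense for the questions below.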