Difference between revisions of "Seattle open data"

From CommunityData
(new evergreen page for seattle open data stuff)
 
 
(17 intermediate revisions by the same user not shown)
Line 1: Line 1:
[[File:SeattleGovLogoHome.png|right|250px]]
+
[[File:Burke_gilman.jpg|thumb|right|250px|Who's riding on the Burke Gilman trail this week?]]
  
In this project, we will explore a few ways to gather data from [https://data.seattle.gov data.seattle.gov]. Once we've done that, we will extend this to code to create our own datasets of civic data that will allow us to ask and answer questions about the Emerald City!
+
In this project, we will gather civic data from [https://data.seattle.gov data.seattle.gov] and use it to ask and answer important questions about the Emerald City! We will start with a series of analyses of bike and pedestrian traffic patterns on the [https://en.wikipedia.org/wiki/Burke-Gilman_Trail Burke-Gilman Trail].
  
TO FILL IN [[User:Jtmorgan|Jtmorgan]] ([[User talk:Jtmorgan|talk]]) 16:57, 20 January 2020 (EST)
+
We will learn how to collect that data from the API of Seattle's open data portal, filter and transform it, and create timeseries graphs that show daily, weekly, and yearly traffic trends.
  
 
== Goals ==
 
== Goals ==
 +
[[File:SeattleGovLogoHome.png|right|250px]]
 +
 +
In this session, we will focus on...
  
* Learn how to gather datasets from data.seattle.gov with the Socrata API and the Open Data Portal
+
* Learn how to pose useful research questions that can be asked and answered with civic data
* Identify interesting datasets and research questions that can be asked and answered with those datasets
+
* Learn how to filter, bucket, and format data for building timeseries graphs in a spreadsheet program
 +
* Familiarizing ourselves with a new API
 
* Practice reading and extending other people's code
 
* Practice reading and extending other people's code
* Create a few collections of different types of data from data.seattle.gov that you can do research with in the final section
 
  
 
== Setup ==
 
== Setup ==
Line 16: Line 19:
  
 
=== Download the Seattle open data project ===
 
=== Download the Seattle open data project ===
# Click the following link and save the file to your Desktop directory: TODOLINK
+
# Click the following link and save the file to your computer: https://github.com/jtmorgan/cdsw-2020/archive/master.zip
# Unzip FIXME.zip file
+
# Unzip <tt>cdsw-2020-master.zip</tt> folder and place the folder in your CDSW working directory (or just your desktop)
  
=== Test the Seattle open data project ===
+
=== Test the Seattle open data API ===
 
;Test an API call to data.seattle.gov
 
;Test an API call to data.seattle.gov
  
Open the Jupyter notebook FOO
+
#Open the Jupyter notebook <tt>SODA_API_demo.ipynb</tt>
 
+
#Run the first code cell in the notebook
Run the first X cells in the notebook in order
 
 
 
The output of cell FIXME should be
 
example output
 
 
 
;Test downloading a CSV file and opening it in a notebook
 
 
 
Open FIXMELINK in your browser
 
  
CLICK on DOWNLOADBUTTON FIXME
+
The output of the cell should look like:
 +
 +
"https://data.seattle.gov/resource/76t5-zqzr.json?$where=(PermitNum='6531736-PH')"
 +
[{'applieddate': '2016-10-07',
 +
  'contractorcompanyname': 'M A MORTENSON COMPANY',
 +
  'description': 'Construct institutional building (University of Washington, '
 +
                'Computer Science and Engineering Dept.), occupy per plan.',
 +
  'estprojectcost': '23886804',
 +
  'expiresdate': '2020-04-03',
 +
  'housingunitsadded': '0',
 +
  'housingunitsremoved': '0',
 +
  'issueddate': '2017-04-03',
 +
  'latitude': '47.65300378',
 +
  'link': {'url': 'https://cosaccela.seattle.gov/portal/customize/LinkToRecord.aspx?altId=6531736-PH'},
 +
  'location1': {'human_address': '{"address": "3800 EAST STEVENS WAY NE", '
 +
                                '"city": "SEATTLE", "state": "WA", "zip": '
 +
                                '"98195"}',
 +
                'latitude': '47.65300378',
 +
                'longitude': '-122.30500427'},
 +
  'longitude': '-122.30500427',
 +
  'originaladdress1': '3800 EAST STEVENS WAY NE',
 +
  'originalcity': 'SEATTLE',
 +
  'originalstate': 'WA',
 +
  'originalzip': '98195',
 +
  'permitclass': 'Institutional',
 +
  'permitclassmapped': 'Non-Residential',
 +
  'permitnum': '6531736-PH',
 +
  'permittype': 'Building',
 +
  'permittypedesc': 'New',
 +
  'statuscurrent': 'Completed'}]
  
SAVE FIXME.csv to the FIXME directory with your notebooks
+
== Analyzing traffic on the Burke-Gilman trail ==
 +
[[File:Bgt_bikes_and_peds_2019.png|thumb|right|250px|In this session we'll learn how to analyze and transform data about traffic on the Burke-Gilman trail over time, and create useful timeseries visualizations like this one!]]
 +
We will spend the first part of the session today walking through the included notebook <tt>Burke-Gilman_commuter_traffic.ipynb</tt>. We will be reproducing this notebook section by section, coding as we go, until we culminate in exporting a CSV file that can be used to build the timeseries visualization above.
  
OPEN the Jupyter notebook FOO
+
After that, you'll have time to explore next steps on your own, either tackling the "Challenge questions" below, exploring the capabilities of the SODA API, or asking your own research questions with any of the other datasets on data.seattle.gov!
  
Run the first X cells of the notebook in order
+
=== Research questions we will answer in this session ===
 +
# How many people used the Burke Gilman during commute hours in 2019?
 +
# What were the busiest hours on the Burke Gilman in 2019?
 +
# What are the busiest hours for bikes vs pedestrians?
 +
# What are the busiest hours for bikes vs. peds AND northbound vs. southbound?
  
The output of the cell FIXME should be
+
=== Challenge questions to apply what you've learned ===
example output
+
''These are questions you now have the basic tools to answer using the BGT dataset (potentially in combination with other open datasets listed below):''
 +
# What day of the week is busiest on the Burke Gilman?
 +
# What day of the week is busiest for bikes? Is it the same as the busiest day for pedestrians?
 +
# What month of the year is busiest? (aka do Seattleites really like to ride in the rain?)
 +
# Has the Burke Gilman gotten busier over time? (the dataset we have goes back to 2014!)
 +
# Do fewer people commute on the Burke Gilman when it's cold out? (hint: try combining this dataset with the dataset on road temperature over time!)
 +
# Do more people commute into Seattle in the mornings by bike on the Burke Gilman, or on the Mountain to Sound Trail?
  
 +
== SODA API tutorial ==
 +
The included notebook <tt>SODA_API_demo.ipynb</tt> can help you familiarize yourself with the [https://dev.socrata.com/ Socrata Open Data API] (which is used on data.seattle.gov). This API allows you to write powerful queries to get exactly the data you want from any of these Seattle Open Data portal sites (as well as any other site that uses the SODA API!). If you'd like to spend more time in the session practicing with this API, grab a mentor!
  
== Socrata API tutorial ==
+
=== Data sources that use this API ===
 +
* https://data.medicare.gov/
 +
* https://opendata.cityofnewyork.us/
 +
* https://data.cityofchicago.org/
 +
* Most (all?) of the sites listed at https://www.opendatanetwork.com/
  
== Datasets to explore ==
+
== Other open Seattle datasets to explore ==
 +
* Fremont bridge bicycle counter: https://data.seattle.gov/Transportation/Fremont-Bridge-Bicycle-Counter/65db-xm6k
 +
* Spokane Street bridge bicycle counter: https://data.seattle.gov/Transportation/Spokane-St-Bridge-Bicycle-Counter/upms-nr8w
 +
* Mountain to Sound trail bicycle + pedestrian counter: https://data.seattle.gov/Transportation/MTS-Trail-west-of-I-90-Bridge-Bicycle-and-Pedestri/u38e-ybnc
 +
* Seattle police [https://en.wikipedia.org/wiki/Terry_stop Terry stops]: https://data.seattle.gov/Public-Safety/Terry-Stops/28ny-9ts8
 +
* Seattle building permits: https://data.seattle.gov/Permitting/Building-Permits/76t5-zqzr
 +
* Seattle road temperature: https://data.seattle.gov/Public-Safety/Road-Weather-Information-Stations/egc4-d24i/data
  
 
== External links ==
 
== External links ==

Latest revision as of 20:15, 15 February 2020

Who's riding on the Burke Gilman trail this week?

In this project, we will gather civic data from data.seattle.gov and use it to ask and answer important questions about the Emerald City! We will start with a series of analyses of bike and pedestrian traffic patterns on the Burke-Gilman Trail.

We will learn how to collect that data from the API of Seattle's open data portal, filter and transform it, and create timeseries graphs that show daily, weekly, and yearly traffic trends.

Goals

In this session, we will:

  • Learn how to pose useful research questions that can be asked and answered with civic data
  • Learn how to filter, bucket, and format data for building timeseries graphs in a spreadsheet program
  • Familiarize ourselves with a new API
  • Practice reading and extending other people's code

Setup

If you are confused by these steps, go back and refresh your memory with the Day 0 setup and tutorial.

Download the Seattle open data project

  1. Click the following link and save the file to your computer: https://github.com/jtmorgan/cdsw-2020/archive/master.zip
  2. Unzip cdsw-2020-master.zip and place the resulting folder in your CDSW working directory (or just on your desktop)

Test the Seattle open data API

Test an API call to data.seattle.gov
  1. Open the Jupyter notebook SODA_API_demo.ipynb
  2. Run the first code cell in the notebook

The output of the cell should look like:

"https://data.seattle.gov/resource/76t5-zqzr.json?$where=(PermitNum='6531736-PH')"
[{'applieddate': '2016-10-07',
 'contractorcompanyname': 'M A MORTENSON COMPANY',
 'description': 'Construct institutional building (University of Washington, '
                'Computer Science and Engineering Dept.), occupy per plan.',
 'estprojectcost': '23886804',
 'expiresdate': '2020-04-03',
 'housingunitsadded': '0',
 'housingunitsremoved': '0',
 'issueddate': '2017-04-03',
 'latitude': '47.65300378',
 'link': {'url': 'https://cosaccela.seattle.gov/portal/customize/LinkToRecord.aspx?altId=6531736-PH'},
 'location1': {'human_address': '{"address": "3800 EAST STEVENS WAY NE", '
                                '"city": "SEATTLE", "state": "WA", "zip": '
                                '"98195"}',
               'latitude': '47.65300378',
               'longitude': '-122.30500427'},
 'longitude': '-122.30500427',
 'originaladdress1': '3800 EAST STEVENS WAY NE',
 'originalcity': 'SEATTLE',
 'originalstate': 'WA',
 'originalzip': '98195',
 'permitclass': 'Institutional',
 'permitclassmapped': 'Non-Residential',
 'permitnum': '6531736-PH',
 'permittype': 'Building',
 'permittypedesc': 'New',
 'statuscurrent': 'Completed'}]
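
If you are curious how a request like this can be put together, here is a minimal sketch using only the Python standard library. It is not the notebook's actual code: the function names build_query_url and fetch_rows are made up for illustration, and only the dataset ID and permit number come from the output above.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://data.seattle.gov/resource"


def build_query_url(dataset_id, where):
    """Return a human-readable SODA query URL with a $where filter."""
    return f"{BASE_URL}/{dataset_id}.json?$where=({where})"


def fetch_rows(url):
    """Percent-encode the query string, request it, and parse the JSON rows.

    Hypothetical helper: it needs network access, so it is defined here
    but not called.
    """
    base, _, query = url.partition("?")
    # Leave the characters used by the query syntax unescaped.
    safe_url = base + "?" + urllib.parse.quote(query, safe="$=&()'")
    with urllib.request.urlopen(safe_url, timeout=30) as response:
        return json.load(response)


url = build_query_url("76t5-zqzr", "PermitNum='6531736-PH'")
print(url)
```

Calling fetch_rows(url) while online should return a list of dictionaries shaped like the record shown above, assuming the dataset is still published under the same ID.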

Analyzing traffic on the Burke-Gilman trail

In this session we'll learn how to analyze and transform data about traffic on the Burke-Gilman trail over time, and create useful timeseries visualizations like this one!

We will spend the first part of the session today walking through the included notebook Burke-Gilman_commuter_traffic.ipynb. We will reproduce this notebook section by section, coding as we go, and finish by exporting a CSV file that can be used to build the timeseries visualization above.

After that, you'll have time to explore next steps on your own, either tackling the "Challenge questions" below, exploring the capabilities of the SODA API, or asking your own research questions with any of the other datasets on data.seattle.gov!
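
To give a feel for the filter, bucket, and export steps described above, here is a small sketch on made-up data. The column names (timestamp, total), the sample rows, and the set of commute hours are all assumptions for illustration; the notebook and the real counter dataset may use different names and definitions.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows; real counter exports may use other column names.
SAMPLE = """timestamp,total
2019-01-02T07:00:00,40
2019-01-02T08:00:00,55
2019-01-02T12:00:00,30
2019-01-03T17:00:00,62
2018-12-31T08:00:00,99
"""

COMMUTE_HOURS = {7, 8, 9, 16, 17, 18}  # assumed definition of "commute hours"


def daily_commute_totals(rows, year=2019):
    """Sum counts per calendar day, keeping only commute hours in one year."""
    totals = defaultdict(int)
    for row in rows:
        ts = datetime.fromisoformat(row["timestamp"])
        if ts.year == year and ts.hour in COMMUTE_HOURS:
            totals[ts.date().isoformat()] += int(row["total"])
    return dict(sorted(totals.items()))


def to_csv(totals):
    """Format the daily totals as CSV text that a spreadsheet can open."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "commute_total"])
    writer.writerows(totals.items())
    return out.getvalue()


totals = daily_commute_totals(csv.DictReader(io.StringIO(SAMPLE)))
print(to_csv(totals))
```

Writing the returned text to a .csv file gives one row per day, which is the shape most spreadsheet programs expect for a timeseries chart.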

Research questions we will answer in this session

  1. How many people used the Burke Gilman during commute hours in 2019?
  2. What were the busiest hours on the Burke Gilman in 2019?
  3. What are the busiest hours for bikes vs. pedestrians?
  4. What are the busiest hours for bikes vs. peds AND northbound vs. southbound?
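
Questions like these come down to grouping counts by hour of day and comparing the totals. The sketch below shows one way to do that in plain Python; the records are invented, and the (hour, bikes, pedestrians) layout is an assumption rather than the real dataset's schema.

```python
from collections import Counter

# Hypothetical hourly records: (hour of day, bike count, pedestrian count).
RECORDS = [
    (8, 120, 30),
    (8, 100, 25),
    (12, 40, 90),
    (17, 150, 45),
    (17, 90, 35),
]


def totals_by_hour(records):
    """Return two Counters mapping hour of day to total bikes and total peds."""
    bikes, peds = Counter(), Counter()
    for hour, bike_count, ped_count in records:
        bikes[hour] += bike_count
        peds[hour] += ped_count
    return bikes, peds


bikes, peds = totals_by_hour(RECORDS)
busiest_bike_hour = max(bikes, key=bikes.get)
busiest_ped_hour = max(peds, key=peds.get)
print("busiest bike hour:", busiest_bike_hour)
print("busiest pedestrian hour:", busiest_ped_hour)
```

Splitting by direction as well would mean keeping one counter per (mode, direction) pair instead of one per mode.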

Challenge questions to apply what you've learned

These are questions you now have the basic tools to answer using the BGT dataset (potentially in combination with other open datasets listed below):

  1. What day of the week is busiest on the Burke Gilman?
  2. What day of the week is busiest for bikes? Is it the same as the busiest day for pedestrians?
  3. What month of the year is busiest? (aka do Seattleites really like to ride in the rain?)
  4. Has the Burke Gilman gotten busier over time? (the dataset we have goes back to 2014!)
  5. Do fewer people commute on the Burke Gilman when it's cold out? (hint: try combining this dataset with the dataset on road temperature over time!)
  6. Do more people commute into Seattle in the mornings by bike on the Burke Gilman, or on the Mountain to Sound Trail?
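
As a starting point for the day-of-week questions, here is a rough sketch that buckets daily totals by weekday. The numbers are invented, and comparing raw sums is only fair when each weekday appears the same number of times in your data, so you may prefer averages on the real dataset.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical daily totals keyed by ISO date.
DAILY = {
    "2019-07-01": 900,   # a Monday
    "2019-07-06": 1500,  # a Saturday
    "2019-07-08": 1000,  # a Monday
    "2019-07-13": 1300,  # a Saturday
}


def totals_by_weekday(daily):
    """Sum daily totals into one bucket per weekday name."""
    buckets = Counter()
    for day, count in daily.items():
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        buckets[WEEKDAYS[date.fromisoformat(day).weekday()]] += count
    return buckets


by_weekday = totals_by_weekday(DAILY)
busiest_day = max(by_weekday, key=by_weekday.get)
print(busiest_day, by_weekday[busiest_day])
```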

SODA API tutorial

The included notebook SODA_API_demo.ipynb can help you familiarize yourself with the Socrata Open Data API (which is used on data.seattle.gov). This API allows you to write powerful queries to get exactly the data you want from any of these Seattle Open Data portal sites (as well as any other site that uses the SODA API!). If you'd like to spend more time in the session practicing with this API, grab a mentor!
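
Here is a small sketch of what a more targeted query might look like, using the $where, $order, and $limit parameters of the SODA query language. The helper name soda_url is made up, and the column name date is an assumption about the Fremont Bridge bicycle counter dataset (ID 65db-xm6k), so check the dataset's own API documentation before relying on it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def soda_url(domain, dataset_id, **soql):
    """Build a SODA request URL, turning keyword arguments into $-parameters.

    Hypothetical helper for illustration; it only builds the URL and
    does not contact the server.
    """
    params = {f"${name}": value for name, value in soql.items()}
    # safe="$" keeps the leading dollar sign readable in the final URL.
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params, safe='$')}"


url = soda_url(
    "data.seattle.gov",
    "65db-xm6k",
    where="date >= '2019-01-01T00:00:00'",  # "date" is an assumed column name
    order="date",
    limit=5,
)
print(url)

# Decode the query string again to check what the server would receive.
query = parse_qs(urlsplit(url).query)
print(query)
```

Because the same URL pattern is shared by other Socrata-powered portals, swapping in a different domain and dataset ID should be enough to point this at another site.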

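Data sources that use this API

  • https://data.medicare.gov/
  • https://opendata.cityofnewyork.us/
  • https://data.cityofchicago.org/
  • Most (all?) of the sites listed at https://www.opendatanetwork.com/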

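Other open Seattle datasets to explore

  • Fremont bridge bicycle counter: https://data.seattle.gov/Transportation/Fremont-Bridge-Bicycle-Counter/65db-xm6k
  • Spokane Street bridge bicycle counter: https://data.seattle.gov/Transportation/Spokane-St-Bridge-Bicycle-Counter/upms-nr8w
  • Mountain to Sound trail bicycle + pedestrian counter: https://data.seattle.gov/Transportation/MTS-Trail-west-of-I-90-Bridge-Bicycle-and-Pedestri/u38e-ybnc
  • Seattle police Terry stops: https://data.seattle.gov/Public-Safety/Terry-Stops/28ny-9ts8
  • Seattle building permits: https://data.seattle.gov/Permitting/Building-Permits/76t5-zqzr
  • Seattle road temperature: https://data.seattle.gov/Public-Safety/Road-Weather-Information-Stations/egc4-d24i/data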

External links