Editing Intro to Programming and Data Science (Spring 2020)/Day 8 Coding Challenges

From CommunityData

Latest revision Your text
Line 3: Line 3:
__NOTOC__
__NOTOC__


In this project, we will explore a few ways to gather data using the Twitter API. Once we've done that, we will extend the example code to create our own dataset of tweets.
In this project, we will explore a few ways to gather data using the Twitter API. Once we've done that, we will extend the example code to create our own dataset of tweets. In next week's assignment, we will ask and answer questions with the data we've collected.


== Goals ==
== Goals ==
Line 18: Line 18:


===Download the Twitter API project===
===Download the Twitter API project===
We will be building on material created for the [Community Data Science Workshops].


* Download the following zip file: https://github.com/CommunityDataScienceCollective/twitter-cdsw/archive/master.zip
* Download the following zip file: https://github.com/CommunityDataScienceCollective/twitter-cdsw/archive/master.zip
Line 38: Line 36:




=== Making your own notebooks ===
===Making your own notebooks===


We are using [http://www.tweepy.org/ tweepy], a Python library that simplifies accessing the Twitter API.
We are using [http://www.tweepy.org/ tweepy], a Python library that simplifies accessing the Twitter API.
Line 54: Line 52:
This will enable your authenticated Twitter API calls via the variable <code>api</code>
This will enable your authenticated Twitter API calls via the variable <code>api</code>
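
The setup code itself is not shown in this excerpt, so the following is only a minimal sketch of how an <code>api</code> variable is commonly created with tweepy. It assumes tweepy's <code>OAuthHandler</code> interface; the helper name <code>make_api</code> and the four placeholder credential strings are hypothetical and should be replaced with the keys from your own Twitter developer account.

```python
def make_api(consumer_key, consumer_secret, access_token, access_token_secret):
    """Return an authenticated tweepy API object (assumes tweepy is installed)."""
    # Imported inside the function so this sketch loads even without tweepy.
    import tweepy

    # Assumption: OAuth 1a user authentication via tweepy's OAuthHandler.
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    return tweepy.API(auth)


# Hypothetical usage with placeholder strings, not real credentials:
# api = make_api("CONSUMER_KEY", "CONSUMER_SECRET",
#                "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
```

Keeping the credentials as function arguments (rather than pasting them into every notebook) makes it easier to avoid sharing them by accident.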


== Potential exercises ==


== Exercises ==
Read through the example notebooks and try to figure out what they are doing. It may also be helpful to look at the [http://docs.tweepy.org/en/latest/index.html Tweepy documentation].


'''Topics and Trends'''
'''Topics and Trends'''
Line 70: Line 66:
This section will require you to investigate the filter function in example 2 in more detail.
This section will require you to investigate the filter function in example 2 in more detail.


# Get the last 50 tweets from West Lafayette.
# Get the last 50 tweets from Ballard.
# Get the last 50 tweets from Times Square.
# Get the last 50 tweets from Times Square.
# Using timestamps, can you estimate whether people tweet more often in West Lafayette or Times Square?
# Using timestamps, can you estimate whether people tweet more often in Ballard or Times Square?
# A Premier League soccer game happened today between Liverpool and Chelsea. Using two geo searches, see if you can tell which city hosted the game. Note: if you do this some other day, you should pick a new sporting event.
# A baseball game happened today (May 11) between the Seattle Mariners and the Tampa Bay Rays. Using two geo searches, see if you can tell which city hosted the game. Note: if you do this some other day, you should pick a new sporting event.
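
One possible way to approach the timestamp question is sketched below. It assumes you have already collected one <code>datetime</code> per tweet (in tweepy, each returned status usually carries this as <code>created_at</code>); the function name <code>tweets_per_hour</code> and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def tweets_per_hour(timestamps):
    """Estimate a tweeting rate from a list of datetime objects."""
    if len(timestamps) < 2:
        return 0.0  # not enough data to measure a time span
    span = max(timestamps) - min(timestamps)
    hours = span.total_seconds() / 3600
    if hours == 0:
        return float("inf")  # all tweets share one timestamp
    # N timestamps bound N - 1 gaps between tweets.
    return (len(timestamps) - 1) / hours


# Made-up data: 50 tweets spaced 2 minutes apart vs. 30 seconds apart.
start = datetime(2020, 5, 11, 12, 0, 0)
place_a = [start + timedelta(minutes=2 * i) for i in range(50)]
place_b = [start + timedelta(seconds=30 * i) for i in range(50)]

print(tweets_per_hour(place_a), tweets_per_hour(place_b))
```

Because both searches return the same number of tweets, the place whose 50 tweets cover the shorter time span is the one where people are tweeting more often.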


'''Geolocation in the streaming API'''
'''Geolocation in the streaming API'''


# Alter the streaming algorithm to include a "locations" filter. You need to use the order sw_lng, sw_lat, ne_lng, ne_lat for the four coordinates.  (Recall the stop button will stop an active process like the stream.)
# Alter the streaming algorithm to include a "locations" filter. You need to use the order sw_lng, sw_lat, ne_lng, ne_lat for the four coordinates.  (Recall the stop button will stop an active process like the stream.)
# What are people tweeting about in Times Square today? (Bonus points: set up a bounding box around TS and around NYC as a whole.)
# What are people tweeting about in Times Square today? (Bonus points: set up a bounding box around TS and around NYC as a whole.)
# Can you find words that are more likely to appear in Times Square (hint: you'll need two bounding boxes)?
# Can you find words that are more likely to appear in Times Square (hint: you'll need two bounding boxes)?
# Purdue is playing basketball against Iowa tonight. Set up a bounding box around West Lafayette and Iowa City, Iowa. Can you identify tweets about basketball? Who tweets more about the game? Can you tell which team is the home team?  
# Oregon State is playing basketball against UC Berkeley. Set up a bounding box around Berkeley and Corvallis, Oregon. Can you identify tweets about basketball? Who tweets more about the game? Can you tell which team is the home team?  
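
For the word comparison exercise, one rough approach is to compare how often each word appears in tweets from the smaller box with how often it appears in tweets from the larger box. The sketch below assumes you have already saved the tweet texts from each stream into two lists; the function names, the <code>min_count</code> cutoff, and the sample texts are all hypothetical.

```python
from collections import Counter
import re


def word_counts(texts):
    """Count lowercase words across a list of tweet texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def overrepresented_words(inner_texts, outer_texts, min_count=2):
    """Rank words by how much more frequent they are in inner_texts."""
    inner = word_counts(inner_texts)
    outer = word_counts(outer_texts)
    inner_total = sum(inner.values())
    outer_total = sum(outer.values())
    scores = {}
    for word, n in inner.items():
        if n < min_count:
            continue  # skip rare words, which give noisy ratios
        inner_rate = n / inner_total
        # Add-one smoothing avoids dividing by zero for unseen words.
        outer_rate = (outer[word] + 1) / (outer_total + 1)
        scores[word] = inner_rate / outer_rate
    return sorted(scores, key=scores.get, reverse=True)


# Made-up tweet texts standing in for the two bounding boxes.
times_square = ["Broadway show tonight", "Loved the Broadway lights",
                "tonight in the city"]
nyc = ["Traffic in the city", "the city is loud tonight"]

print(overrepresented_words(times_square, nyc))
```

With real streams you will want many more tweets than this before trusting the ranking, and you may want to drop very common words such as "the".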


Geolocation hint: You can use <code>d = api.search(geocode='[lat],[lng],5mi')</code> to get tweets within a 5 mile radius of a point (the Twitter search API expects latitude first, then longitude). Use Google or Bing maps to get a similar bounding box around Fenway Park.
Geolocation hint: You can use <code>d = api.search(geocode='[lat],[lng],5mi')</code> to get tweets within a 5 mile radius of a point (the Twitter search API expects latitude first, then longitude). Use Google or Bing maps to get a similar bounding box around Fenway Park.
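
The two kinds of location filter use different coordinate orders, which is easy to mix up: the search <code>geocode</code> string is latitude, longitude, radius, while the streaming <code>locations</code> box is sw_lng, sw_lat, ne_lng, ne_lat. The sketch below builds both from a single center point. The helper names are hypothetical, the Fenway Park coordinates are approximate, and the 69 miles per degree of latitude figure is a rough approximation, so treat the box as an estimate rather than an exact 5 mile square.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # rough average, good enough for a sketch


def geocode_string(lat, lng, radius_mi):
    """Build a search geocode value: latitude,longitude,radius."""
    return f"{lat},{lng},{radius_mi}mi"


def bounding_box(lat, lng, radius_mi):
    """Approximate box in streaming order: sw_lng, sw_lat, ne_lng, ne_lat."""
    dlat = radius_mi / MILES_PER_DEGREE_LAT
    # A degree of longitude shrinks with distance from the equator.
    dlng = radius_mi / (MILES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return [lng - dlng, lat - dlat, lng + dlng, lat + dlat]


def in_box(lat, lng, box):
    """Check whether a point falls inside a sw_lng, sw_lat, ne_lng, ne_lat box."""
    sw_lng, sw_lat, ne_lng, ne_lat = box
    return sw_lng <= lng <= ne_lng and sw_lat <= lat <= ne_lat


# Approximate coordinates for Fenway Park (an assumption; check a map).
fenway_lat, fenway_lng = 42.3467, -71.0972

print(geocode_string(fenway_lat, fenway_lng, 5))
print(bounding_box(fenway_lat, fenway_lng, 5))
```

The string from <code>geocode_string</code> can be passed as the <code>geocode</code> argument in a search, and the list from <code>bounding_box</code> follows the coordinate order that the streaming <code>locations</code> filter asks for.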
Line 91: Line 87:
# Identify the follower you have that also follows the most of your followers.
# Identify the follower you have that also follows the most of your followers.
# How many handles follow you but none of your followers?
# How many handles follow you but none of your followers?
# Repeat this for people you follow, rather than those that follow you.
# Repeat this for people you follow, rather than those that follow you.
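
Once the follower data has been collected, these questions reduce to set operations. The sketch below assumes you have already gathered your followers and, for each of them, the set of accounts they follow (in tweepy 3.x this would likely come from calls such as <code>api.followers_ids</code> and <code>api.friends_ids</code>, which are rate limited). The function name and the handles in the sample data are hypothetical.

```python
def mutual_follow_counts(my_followers, friends_of):
    """For each follower, count how many of my followers they also follow."""
    my_followers = set(my_followers)
    return {
        follower: len(set(friends_of.get(follower, ())) & my_followers)
        for follower in my_followers
    }


# Made-up data: who follows me, and who each of those accounts follows.
my_followers = {"ana", "ben", "cy", "dee"}
friends_of = {
    "ana": {"ben", "cy", "me"},
    "ben": {"ana", "me"},
    "cy": {"me"},
    "dee": {"me", "zed"},
}

counts = mutual_follow_counts(my_followers, friends_of)

# The follower who follows the most of my other followers.
most_connected = max(counts, key=counts.get)

# Followers who follow me but none of my followers.
isolated = [name for name, n in counts.items() if n == 0]

print(most_connected, len(isolated))
```

To repeat the exercise for people you follow, pass in the set of accounts you follow in place of <code>my_followers</code>.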


== Congratulations!!!!==
== Congratulations!!!!==


You now know how to capture data from Twitter that you can use in your research!!! Next workshop we'll play with some fun analytical tools. In the meantime, here are [[Twitter words of warning|a few words of caution about using Twitter data for science]].
You now know how to capture data from Twitter that you can use in your research!!! Next workshop we'll play with some fun analytical tools. In the meantime, here are [[Twitter words of warning|a few words of caution about using Twitter data for science]].