Community Data Science Workshops (Fall 2014)/Day 2 Twitter project: Difference between revisions

From CommunityData
(Initial commit)
 
(added our exercises)
 
(4 intermediate revisions by one other user not shown)
Line 1: Line 1:
[[File:Twitter.png|right|250px]]
[[File:Twitter.png|right|260px]]


__NOTOC__
__NOTOC__
Line 16: Line 16:
=== Prerequisite ===
=== Prerequisite ===


To participate in the Twitter afternoon session, you ''must'' have registered with Twitter as a developer before the session by following the [[Community Data Science Workshops/Twitter authentication setup|Twitter authentication setup instructions]]. If you did not do this, or if you tried but did not succeed, please attend one of the other two sessions instead.
To participate in the Twitter afternoon session, you ''must'' have registered with Twitter as a developer before the session by following the [[Twitter authentication setup|Twitter authentication setup instructions]]. If you did not do this, or if you tried but did not succeed, please attend one of the other two sessions instead.


=== Download and test the Twitter project ===
=== Download and test the Twitter project ===


If you are confused by these steps, go back and refresh your memory with the [[Community Data Science Workshops/Friday April 4th setup and tutorial|Friday April 4th setup and tutorial]] and [[Community Data Science Workshops/Friday April 4th Tutorial|Friday April 4th Tutorial]]
If you are confused by these steps, go back and refresh your memory with the [[Community_Data_Science_Workshops_(Fall_2014)/Day_0_setup_and_tutorial|Friday Nov 7th Tutorial]]


(Estimated time: 10 minutes)
(Estimated time: 10 minutes)


* [[Community Data Science Workshops/May 3rd Twitter project Windows setup|Windows]]
* [[Twitter project Windows setup|Windows]]
* [[Community Data Science Workshops/May 3rd Twitter project OS X setup|OS X]]
* [[Twitter project OS X setup|OS X]]
* [[Community Data Science Workshops/May 3rd Twitter project Linux setup|Linux]]
* [[Twitter project Linux setup|Linux]]
 
=== Potential exercises ===
 
'''Who are my followers?'''
 
1) Use sample 2 to get your followers.
 
2) For each of your followers, get *their* followers (investigate time.sleep to throttle your computation)
 
3) Identify which of your followers also follows the largest number of your other followers.
 
4) How many handles follow you but none of your followers?
 
5) Repeat this for the people you follow, rather than the people who follow you.
 
 
'''Topics and Trends'''
 
1) Use sample 3 to produce a list of 1000 tweets about a topic.
 
2) Look at those tweets. How does Twitter interpret a two-word query like "data science"?
 
3) Eliminate retweets [hint: look at the tweet object!]
 
4) For each original tweet, list the number of times you see it retweeted.
 
5) Get a list of the URLs that are associated with your topic.
 
'''Geolocation'''
 
1) Alter the streaming algorithm to include a "locations" filter. You need to use the order sw_lng, sw_lat, ne_lng, ne_lat for the four coordinates.
 
2) What are people tweeting about in Times Square today?
 
2.5) Bonus points: set up a bounding box around TS and around NYC as a whole.
Can you find words that are more likely to appear in TS?
 
3) UW is playing Arizona in football today. Set up a bounding box around the Arizona stadium and around UW. Can you identify tweets about football? Who tweets more about the game?
 
# you can use d = api.search(geocode='37.781157,-122.398720,1mi')  to do
# static geo search.

Latest revision as of 22:46, 15 November 2014



Building a Dataset using the Twitter API

In this project, we will explore a few ways to gather data using the Twitter API. Once we've done that, we will extend this code to create our own datasets of tweets that we might be able to use to ask and answer questions in the final session.

Goals

  • Get set up to build datasets with the Twitter API
  • Have fun collecting different types of tweets using a variety of ways to search
  • Practice reading and extending other people's code
  • Create a few collections of tweets you can do research with in the final session

Prerequisite

To participate in the Twitter afternoon session, you must have registered with Twitter as a developer before the session by following the Twitter authentication setup instructions. If you did not do this, or if you tried but did not succeed, please attend one of the other two sessions instead.

Download and test the Twitter project

If you are confused by these steps, go back and refresh your memory with the Friday Nov 7th Tutorial.

(Estimated time: 10 minutes)

Potential exercises

Who are my followers?

1) Use sample 2 to get your followers.

2) For each of your followers, get *their* followers (investigate time.sleep to throttle your computation)

3) Identify which of your followers also follows the largest number of your other followers.

4) How many handles follow you but none of your followers?

5) Repeat this for the people you follow, rather than the people who follow you.
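
One possible way to approach steps 2 through 4 is sketched below. It is only an illustration under stated assumptions: get_followers is a hypothetical stand-in for whatever call sample 2 uses to fetch a follower list, handles are plain strings, and the default delay is a guess rather than a documented rate limit. The sketch pauses between requests with time.sleep, then counts, for each of your followers, how many of your other followers they follow.

```python
import time


def collect_followers(handles, get_followers, delay_seconds=60):
    """Fetch the followers of each handle, pausing between requests.

    get_followers is a hypothetical function (handle -> iterable of
    handles); swap in the call from sample 2. The default delay is an
    assumption, not a documented rate limit.
    """
    followers_of = {}
    for i, handle in enumerate(handles):
        if i > 0:
            time.sleep(delay_seconds)  # throttle between requests
        followers_of[handle] = set(get_followers(handle))
    return followers_of


def overlap_counts(my_followers, followers_of):
    """For each follower f, count how many of my other followers f follows."""
    counts = {f: 0 for f in my_followers}
    for g in my_followers:
        # Everyone in followers_of[g] follows g.
        for f in followers_of.get(g, set()):
            if f in counts and f != g:
                counts[f] += 1
    return counts


# Toy data standing in for API results: ben and cat follow ana, cat follows ben.
my_followers = ["ana", "ben", "cat"]
toy_graph = {"ana": ["ben", "cat"], "ben": ["cat"], "cat": []}

followers_of = collect_followers(my_followers, toy_graph.get, delay_seconds=0)
counts = overlap_counts(my_followers, followers_of)

# Step 3: the follower who follows the most of my other followers.
top_follower = max(counts, key=counts.get)
# Step 4: followers who follow me but none of my other followers.
isolated = [f for f, n in counts.items() if n == 0]
```

For step 5, the same functions can be reused by passing in the list of people you follow and a function that returns who each of them follows.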


Topics and Trends

1) Use sample 3 to produce a list of 1000 tweets about a topic.

2) Look at those tweets. How does Twitter interpret a two-word query like "data science"?

3) Eliminate retweets [hint: look at the tweet object!]

4) For each original tweet, list the number of times you see it retweeted.

5) Get a list of the URLs that are associated with your topic.
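
Steps 3 through 5 could be sketched roughly as follows. This assumes each tweet is available as a dict shaped like Twitter REST API v1.1 JSON, where a retweet carries a "retweeted_status" key and links appear under entities → urls. If sample 3 hands you objects instead of dicts, the lookups would need to become attribute access. The function names and the toy tweets are made up for illustration.

```python
from collections import Counter


def split_retweets(tweets):
    """Separate original tweets from retweets.

    Assumes a retweet is marked by the presence of a "retweeted_status"
    key holding the original tweet.
    """
    originals = [t for t in tweets if "retweeted_status" not in t]
    retweets = [t for t in tweets if "retweeted_status" in t]
    return originals, retweets


def retweet_counts(tweets):
    """Count how often each original tweet id shows up as a retweet."""
    _, retweets = split_retweets(tweets)
    return Counter(t["retweeted_status"]["id"] for t in retweets)


def collect_urls(tweets):
    """Gather the expanded URLs listed under each tweet's entities."""
    urls = []
    for t in tweets:
        for u in t.get("entities", {}).get("urls", []):
            urls.append(u["expanded_url"])
    return urls


# Toy tweets standing in for search results.
tweets = [
    {"id": 1, "text": "data science is fun",
     "entities": {"urls": [{"expanded_url": "http://example.com/a"}]}},
    {"id": 2, "text": "RT data science is fun",
     "retweeted_status": {"id": 1}, "entities": {"urls": []}},
    {"id": 3, "text": "RT data science is fun",
     "retweeted_status": {"id": 1}},
]

originals, retweets = split_retweets(tweets)
seen_retweets = retweet_counts(tweets)
topic_urls = collect_urls(originals)
```

Note that this only counts retweets that appear in your own collection, which is what step 4 asks for; it is not the total number of retweets a tweet has received.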

Geolocation

1) Alter the streaming algorithm to include a "locations" filter. You need to use the order sw_lng, sw_lat, ne_lng, ne_lat for the four coordinates.

2) What are people tweeting about in Times Square today?

2.5) Bonus points: set up a bounding box around TS and around NYC as a whole. Can you find words that are more likely to appear in TS?

3) UW is playing Arizona in football today. Set up a bounding box around the Arizona stadium and around UW. Can you identify tweets about football? Who tweets more about the game?

Note: you can use d = api.search(geocode='37.781157,-122.398720,1mi') to do a static geo search.
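
Since the coordinate order is easy to get wrong, here is a small sketch of a helper that builds a bounding box in the sw_lng, sw_lat, ne_lng, ne_lat order and checks whether a point falls inside it. The helper names are hypothetical and the coordinates are rough approximations chosen for illustration, not surveyed boundaries.

```python
def make_box(sw_lng, sw_lat, ne_lng, ne_lat):
    """Return a box as [sw_lng, sw_lat, ne_lng, ne_lat].

    Raises ValueError when the corners look swapped, a common mistake
    when latitude and longitude are mixed up.
    """
    if sw_lng > ne_lng or sw_lat > ne_lat:
        raise ValueError("south-west corner must be below and left of north-east")
    return [sw_lng, sw_lat, ne_lng, ne_lat]


def in_box(box, lng, lat):
    """Check whether the point (lng, lat) lies inside the box."""
    sw_lng, sw_lat, ne_lng, ne_lat = box
    return sw_lng <= lng <= ne_lng and sw_lat <= lat <= ne_lat


# Approximate boxes (assumptions for illustration only).
times_square = make_box(-73.99, 40.755, -73.98, 40.76)
nyc = make_box(-74.0, 40.0, -73.0, 41.0)

# A point near the middle of Times Square, as (longitude, latitude).
point_lng, point_lat = -73.9855, 40.758
in_ts = in_box(times_square, point_lng, point_lat)
in_nyc = in_box(nyc, point_lng, point_lat)

# The list form is what a "locations" filter would be given, for example
# locations=times_square in the streaming sample (adapt to the sample's call).
```

The in_box check can also be used after collection, for example to sort tweets gathered with the wider NYC box into those inside and outside Times Square for the bonus exercise.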