Community Data Science Course (Spring 2019)/Day 3 Notes

From CommunityData
Jump to: navigation, search

Online Data Sets: An Important Question[edit]

Can you get bulk access to data?

Bad Signs

You must authenticate as a particular user in order to access data, and you can only see data for that user.

For example: you must log into instagram's api as a particular user

Look at this link!

Good signs

The organization owning the data wants everyone to access it. Like wikipedia or most government data.

You may have to authenticate as a particular user, but you can access general data.

For example: once you log into Twitter, you can get all tweets about a place

Twitter API Docs



Dictionaries[edit]

  • Use dictionaries to store key/value pairs.
  • Dictionaries do not guarantee ordering.
  • A given key can only have one value, but multiple keys can have the same value.

Initialization[edit]

>>> my_dict = {}
>>> my_dict
{}
>>> your_dict = {"Alice" : "chocolate", "Bob" : "strawberry", "Cara" : "mint chip"}
>>> your_dict
{'Bob': 'strawberry', 'Cara': 'mint chip', 'Alice': 'chocolate'}

Adding elements to a dictionary[edit]

>>> your_dict["Dora"] = "vanilla"
>>> your_dict
{'Bob': 'strawberry', 'Cara': 'mint chip', 'Dora': 'vanilla', 'Alice': 'chocolate'}

Accessing elements of a dictionary[edit]

>>> your_dict["Alice"]
'chocolate'
>>> your_dict.get("Alice")
'chocolate'
>>> your_dict["Eve"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Eve'
>>> "Eve" in your_dict
False
>>> "Alice" in your_dict
True
>>> your_dict.get("Eve")
>>> person = your_dict.get("Eve")
>>> print(person)
None
>>> print(type(person))
<type 'NoneType'>
>>> your_dict.get("Alice")
'chocolate'

Changing elements of a dictionary[edit]

>>> your_dict["Alice"] = "coconut"
>>> your_dict
{'Bob': 'strawberry', 'Cara': 'mint chip', 'Dora': 'vanilla', 'Alice': 'coconut'}

Histograms[edit]

Challenge: using wordplay example from last week, count the number of words that start with each letter.

This kind of problem is very common Data Science, and it is easy with a dictionary.

(note: I will post the solution after class)

For-loops and dictionaries[edit]

There are two common ways to iterate through dictionaries:

>>> ages = {'Tommy': 34, Heather: 30, 'Joanna': 20}
>>> for key in ages:
>>>     print(key + " is " + str(ages[key]) + " years old")
>>> for key, value in ages.items():
>>>     print(key + " is " + str(value) + " years old")