Community Data Science Course (Spring 2023)/Week 3 lecture notes

Online Data Sets: An Important Question
Can you get bulk access to data?

Bad signs:

 * You must authenticate as a particular user in order to access data, and you can only see data for that user. (For example, you must log into Instagram's API as a particular user.)

Good signs:

 * The organization that owns the data wants everyone to access it, as with Wikipedia or most government data.
 * You may have to authenticate as a particular user, but you can still access general data.
 * For example: once you log into Twitter, you can get all tweets about almost anything (Twitter API Docs)

Programming lecture outline

 * Dictionaries!
 * Purpose (use dictionaries to store key/value pairs)
 * Initialization
 * Accessing elements
 * Adding elements
 * Changing elements
 * and
 * using  to look into dictionaries
 * using  loops to iterate over dictionaries (e.g., let's build a list of every letter in the alphabet using wordplay data)
 * A few notes about dictionaries:
 * A given key can only have one value, but multiple keys can have the same value.
 * Older versions of Python did not guarantee dictionary ordering; since Python 3.7, dictionaries keep keys in the order they were inserted.
 * Additional loop control
 * Note: These can be useful in combination with if statements and can also be super useful for debugging!
 * Writing to files:
 * Using the  function
 * Using the  statement
 * Writing to a file with
 * Writing a tab-separated value file using "\t" (make sure we leave a header!)
 * Now let's open it up and make a little graph
 * Defining our own functions!
 * A little bit on looking for help (if it hasn't come up already)
 * Looking at StackOverflow
 * Walking through the Python API documentation
 * Using a reference card or cheatsheet
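The outline's file-writing steps (open a file, let a `with` block close it for you, write tab-separated lines starting with a header) and "defining our own functions" fit naturally together. Below is a minimal sketch using Python's built-in `open()` and a `with` block; it is not the in-class code, and the function name `write_tsv`, the sample `letter_counts` dictionary, and the file name `letter_counts.tsv` are all hypothetical.

```python
# Hypothetical counts, e.g. something like the output of the "Histograms" challenge.
letter_counts = {"a": 3, "b": 2, "c": 1}


def write_tsv(path, counts):
    """Write a dictionary to a tab-separated file, with a header row first."""
    # "w" opens the file for writing (replacing any existing contents);
    # the with-block closes the file automatically when it ends.
    with open(path, "w") as output_file:
        output_file.write("letter\tcount\n")  # header row
        for letter, count in counts.items():
            # "\t" separates the columns, "\n" ends each row
            output_file.write(letter + "\t" + str(count) + "\n")


write_tsv("letter_counts.tsv", letter_counts)
```

Because every row has the same two tab-separated columns under a header, the resulting file can be opened in a spreadsheet program to make a quick graph.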

Initialization
>>> my_dict = {}
>>> my_dict
{}
>>> your_dict = {"Alice": "chocolate", "Bob": "strawberry", "Cara": "mint chip"}
>>> your_dict
{'Bob': 'strawberry', 'Cara': 'mint chip', 'Alice': 'chocolate'}

Adding elements to a dictionary
>>> your_dict["Dora"] = "vanilla"
>>> your_dict
{'Bob': 'strawberry', 'Cara': 'mint chip', 'Dora': 'vanilla', 'Alice': 'chocolate'}

Accessing elements of a dictionary
>>> your_dict["Alice"]
'chocolate'
>>> your_dict.get("Alice")
'chocolate'

>>> your_dict["Eve"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Eve'
>>> "Eve" in your_dict
False
>>> "Alice" in your_dict
True
>>> your_dict.get("Eve")
>>> person = your_dict.get("Eve")
>>> print(person)
None
>>> print(type(person))
<class 'NoneType'>
>>> your_dict.get("Alice")
'chocolate'
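Besides checking with `in`, `.get()` accepts an optional second argument that it returns instead of `None` when the key is missing. A small sketch; the default string "no favorite yet" and the variable names are made up for illustration.

```python
your_dict = {"Alice": "chocolate", "Bob": "strawberry", "Cara": "mint chip"}

# The second argument to .get() is returned when the key is missing.
eve_flavor = your_dict.get("Eve", "no favorite yet")      # "Eve" is not a key
alice_flavor = your_dict.get("Alice", "no favorite yet")  # "Alice" is a key

print(eve_flavor)    # no favorite yet
print(alice_flavor)  # chocolate

# The other common pattern: check with `in` before using square brackets,
# which avoids a KeyError for missing keys.
if "Eve" in your_dict:
    print("Eve likes " + your_dict["Eve"])
else:
    print("Eve is not in the dictionary")
```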

Changing elements of a dictionary
>>> your_dict["Alice"] = "coconut"
>>> your_dict
{'Bob': 'strawberry', 'Cara': 'mint chip', 'Dora': 'vanilla', 'Alice': 'coconut'}

"Histograms"
Challenge: using the wordplay example from last week, count the number of words that start with each letter.

This kind of problem is very common in data science, and it is easy to solve with a dictionary.

(note: I will post the solution after class)
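One possible approach is sketched below; it is not the solution that will be posted. It assumes the words are already loaded into a Python list of strings. The short `words` list is made up, and lowercasing the first letter (so "Avocado" and "apple" count together) is a choice, not part of the challenge as stated.

```python
# Hypothetical stand-in for the wordplay word list from last week.
words = ["apple", "Avocado", "banana", "cherry", "", "blueberry", "apricot"]

letter_counts = {}  # maps a first letter to how many words start with it
for word in words:
    if word == "":
        continue  # skip blank entries; they have no first letter
    first_letter = word[0].lower()  # treat "A" and "a" as the same letter
    if first_letter in letter_counts:
        letter_counts[first_letter] = letter_counts[first_letter] + 1
    else:
        letter_counts[first_letter] = 1  # first time we see this letter

# Print the counts in alphabetical order, tab-separated.
for letter in sorted(letter_counts):
    print(letter + "\t" + str(letter_counts[letter]))
```

The same "check if the key exists, then add one or start at one" pattern works for counting almost anything: words, authors, dates, and so on.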

For-loops and dictionaries
There are two common ways to iterate through dictionaries:

>>> ages = {'Tommy': 34, 'Heather': 30, 'Joanna': 20}
>>> for key in ages:
...     print(key + " is " + str(ages[key]) + " years old")
...

>>> for key, value in ages.items():
...     print(key + " is " + str(value) + " years old")
...
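Both loop styles can also collect results instead of only printing them. A small sketch using the same `ages` data; the list names `names` and `over_25` are illustrative, and `continue` is one of the "additional loop control" statements from the outline, used here together with an if statement.

```python
ages = {'Tommy': 34, 'Heather': 30, 'Joanna': 20}

# Way 1: looping over a dictionary gives you its keys.
names = []
for name in ages:
    names.append(name)

# Way 2: .items() gives (key, value) pairs, so both can be unpacked at once.
over_25 = []
for name, age in ages.items():
    if age <= 25:
        continue  # skip anyone 25 or younger and move on to the next pair
    over_25.append(name)

print(names)    # ['Tommy', 'Heather', 'Joanna'] (insertion order, Python 3.7+)
print(over_25)  # ['Tommy', 'Heather']
```

Use the first style when you mostly need the keys, and `.items()` when you need both the key and its value in the loop body.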