Community Data Science Course (Spring 2023)/Week 3 lecture notes
Online Data Sets: An Important Question
Can you get bulk access to data?
Bad Signs:
- You must authenticate as a particular user in order to access data, and you can only see data for that user. (For example: you must log into Instagram's API as a particular user!)
Good signs:
- The organization that owns the data wants everyone to access it (for example, Wikipedia or most government data).
- You may have to authenticate as a particular user, but you can access general data.
- For example: once you log into Reddit, you can get all posts about almost anything (Twitter API Docs)
Programming lecture outline
- Dictionaries!
- Purpose (use dictionaries to store key/value pairs)
- Initialization with {}
- Accessing elements
- Adding elements
- Changing elements
- .keys() and .values()
- Using in to look into dictionaries
- Using for loops to iterate over dictionaries (e.g., let's build a list of every letter in the alphabet using wordplay data)
- A few notes about dictionaries:
- A given key can only have one value, but multiple keys can have the same value.
- Older versions of Python do not guarantee any ordering for dictionaries; since Python 3.7, dictionaries preserve the order in which keys were inserted.
- Additional loop control: break and continue
- Note: These can be useful in combination with if statements and can also be super useful for debugging!
- Writing to files:
- Using the open("whatever.tsv", "w") function
- Using the with open() as my_file: statement
- Writing to a file with print(file=my_file)
- Writing a tab-separated value file using "\t" (make sure we leave a header!)
- Now let's open it up and make a little graph
- Using the
- Defining our own functions!
- A little bit on looking for help (if it hasn't come up already)
- Looking at StackOverflow
- Walking through the Python API documentation
- Using a reference card or cheatsheet
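The file-writing steps in the outline (open a file for writing with a with statement, print to it with file=, separate columns with "\t", and start with a header) can be sketched roughly as follows. This is only a minimal sketch: the write_tsv function, the letter_counts data, and the column names are made up for illustration, and the file goes into a temporary directory so that running the example leaves nothing behind.

```python
import os
import tempfile


def write_tsv(path, rows, header):
    """Write a header line, then one tab-separated line per row."""
    with open(path, "w") as my_file:
        # The header goes first so the columns are labeled.
        print("\t".join(header), file=my_file)
        for row in rows:
            # Convert every value to a string before joining with tabs.
            print("\t".join(str(value) for value in row), file=my_file)


# Hypothetical data: these counts are invented for the example.
letter_counts = {"a": 3, "b": 1, "c": 2}

# Write into a temporary directory, then read the file back in.
with tempfile.TemporaryDirectory() as tmp_dir:
    tsv_path = os.path.join(tmp_dir, "whatever.tsv")
    write_tsv(tsv_path, letter_counts.items(), ["letter", "count"])
    with open(tsv_path) as my_file:
        tsv_text = my_file.read()

print(tsv_text)
```

Wrapping the steps in a function also shows the "defining our own functions" item: the same code can be reused for any dictionary by passing a different path, rows, and header.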
Resources and Example Code
Initialization
>>> my_dict = {}
>>> my_dict
{}
>>> your_dict = {"Alice" : "chocolate", "Bob" : "strawberry", "Cara" : "mint chip"}
>>> your_dict
{'Alice': 'chocolate', 'Bob': 'strawberry', 'Cara': 'mint chip'}
Adding elements to a dictionary
>>> your_dict["Dora"] = "vanilla"
>>> your_dict
{'Alice': 'chocolate', 'Bob': 'strawberry', 'Cara': 'mint chip', 'Dora': 'vanilla'}
Accessing elements of a dictionary
>>> your_dict["Alice"]
'chocolate'
>>> your_dict.get("Alice")
'chocolate'
>>> your_dict["Eve"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Eve'
>>> "Eve" in your_dict
False
>>> "Alice" in your_dict
True
>>> your_dict.get("Eve")
>>> person = your_dict.get("Eve")
>>> print(person)
None
>>> print(type(person))
<class 'NoneType'>
>>> your_dict.get("Alice")
'chocolate'
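As the session above shows, .get() returns None for a missing key instead of raising a KeyError. It also accepts an optional second argument that is returned in place of None. A small sketch, where the "no favorite yet" default is just a made-up placeholder:

```python
your_dict = {"Alice": "chocolate", "Bob": "strawberry", "Cara": "mint chip"}

# With one argument, .get() returns None for a missing key.
missing = your_dict.get("Eve")

# With a second argument, .get() returns that default instead of None.
fallback = your_dict.get("Eve", "no favorite yet")

# For a key that exists, the default is ignored.
found = your_dict.get("Alice", "no favorite yet")

print(missing)
print(fallback)
print(found)
```

Note that .get() only looks things up: it does not add "Eve" to the dictionary.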
Changing elements of a dictionary
>>> your_dict["Alice"] = "coconut"
>>> your_dict
{'Alice': 'coconut', 'Bob': 'strawberry', 'Cara': 'mint chip', 'Dora': 'vanilla'}
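The outline notes that a key can hold only one value, while several keys can share the same value. Here is a short sketch of both points, using .keys() and .values() to look at each side of the dictionary separately. The printed order assumes Python 3.7 or newer, where insertion order is preserved.

```python
your_dict = {"Alice": "chocolate", "Bob": "strawberry", "Cara": "mint chip"}

# Assigning to an existing key replaces its value; no second "Alice" appears.
your_dict["Alice"] = "coconut"

# Two different keys may share the same value.
your_dict["Dora"] = "coconut"

# Wrap the results in list() to get ordinary lists.
names = list(your_dict.keys())
flavors = list(your_dict.values())

print(names)
print(flavors)
print(len(your_dict))
```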
"Histograms"
Challenge: using the wordplay example from last week, count the number of words that start with each letter.
This kind of problem is very common in data science, and it is easy with a dictionary.
(note: I will post the solution after class)
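To show the general counting pattern without giving away the challenge, here is a sketch that answers a different question: how many words there are of each length. The word list is made up and merely stands in for the wordplay data.

```python
# Hypothetical word list, standing in for the wordplay data.
words = ["apple", "kiwi", "pear", "banana", "plum", "fig"]

length_counts = {}
for word in words:
    length = len(word)
    if length in length_counts:
        # Seen this length before: add one to its count.
        length_counts[length] = length_counts[length] + 1
    else:
        # First word of this length: start its count at one.
        length_counts[length] = 1

print(length_counts)
```

The same "check with in, then add one or start at one" shape works for any kind of counting; only the line that computes the key changes.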
For-loops and dictionaries
There are two common ways to iterate through dictionaries:
>>> ages = {'Tommy': 34, 'Heather': 30, 'Joanna': 20}
>>> for key in ages:
...     print(key + " is " + str(ages[key]) + " years old")
...
>>> for key, value in ages.items():
...     print(key + " is " + str(value) + " years old")
...
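The break and continue statements from the outline work the same way when looping over a dictionary. A minimal sketch using the same ages data; the age cutoffs and variable names are arbitrary choices for the example:

```python
ages = {"Tommy": 34, "Heather": 30, "Joanna": 20}

# continue: skip anyone who is 32 or older, keep going with the rest.
under_32 = []
for name, age in ages.items():
    if age >= 32:
        continue
    under_32.append(name)

# break: stop looping as soon as the first person under 25 is found.
first_under_25 = None
for name, age in ages.items():
    if age < 25:
        first_under_25 = name
        break

print(under_32)
print(first_under_25)
```

For debugging, a temporary break after the first pass through a long loop is a quick way to check one item before running everything.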