Community Data Science Course (Spring 2023)/Week 3 lecture notes
From CommunityData
Latest revision as of 23:48, 10 April 2023
Online Data Sets: An Important Question
Can you get bulk access to data?
Bad signs:
- You must authenticate as a particular user in order to access data, and you can only see data for that user. (For example: you must log into Instagram's API as a particular user.)
Good signs:
- The organization that owns the data wants everyone to access it, as with Wikipedia or most government data.
- You may have to authenticate as a particular user, but you can access general data.
- For example: once you log into Reddit, you can get all posts about almost anything (Twitter API Docs)
Programming lecture outline
- Dictionaries!
  - Purpose (use dictionaries to store key/value pairs)
  - Initialization with {}
  - Accessing elements
  - Adding elements
  - Changing elements
  - .keys() and .values()
  - Using in to look into dictionaries
  - Using for loops to iterate over dictionaries (e.g., let's build a list of every letter in the alphabet using the wordplay data)
  - A few notes about dictionaries:
    - A given key can only have one value, but multiple keys can have the same value.
    - Since Python 3.7, dictionaries are guaranteed to preserve insertion order; older versions of Python did not guarantee any ordering.
- Additional loop control
  - break
  - continue
  - Note: These can be useful in combination with if statements and can also be super useful for debugging!
- Writing to files:
  - Using the open("whatever.tsv", "w") function
  - Using the with open() as my_file: statement
  - Writing to a file with print(file=my_file)
  - Writing a tab-separated value file using "\t" (make sure to include a header row!)
  - Now let's open it up and make a little graph
- Defining our own functions!
- A little bit on looking for help (if it hasn't come up already)
  - Looking at Stack Overflow
  - Walking through the Python API documentation
  - Using a reference card or cheatsheet
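The file-writing steps and the "defining our own functions" item can be tied together in a minimal sketch. The function name write_counts_tsv, the file name letter_counts.tsv and the sample counts are hypothetical; the code defines a function that writes a dictionary as a tab-separated file with a header row, then reads the file back to check it.

```python
# Minimal sketch: write a dictionary out as a tab-separated (TSV) file.
# write_counts_tsv and "letter_counts.tsv" are hypothetical names for illustration.


def write_counts_tsv(counts, filename):
    """Write each key/value pair in `counts` as one tab-separated row."""
    # "w" opens the file for writing and overwrites any existing contents.
    # The with statement closes the file automatically when the block ends.
    with open(filename, "w") as my_file:
        # Header row first, so other tools know what each column means.
        print("letter" + "\t" + "count", file=my_file)
        for key, value in counts.items():
            # str() is needed because a string and an int cannot be added.
            print(key + "\t" + str(value), file=my_file)


letter_counts = {"a": 3, "b": 1, "c": 2}  # made-up sample data
write_counts_tsv(letter_counts, "letter_counts.tsv")

# Open the file again (read mode is the default) and show its contents.
with open("letter_counts.tsv") as my_file:
    print(my_file.read())
```

Each key/value pair becomes one row, so the resulting file should open in a spreadsheet or plotting tool as two columns, letter and count.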
Resources and Example Code
Initialization
>>> my_dict = {}
>>> my_dict
{}
>>> your_dict = {"Alice": "chocolate", "Bob": "strawberry", "Cara": "mint chip"}
>>> your_dict
{'Alice': 'chocolate', 'Bob': 'strawberry', 'Cara': 'mint chip'}
Adding elements to a dictionary
>>> your_dict["Dora"] = "vanilla"
>>> your_dict
{'Alice': 'chocolate', 'Bob': 'strawberry', 'Cara': 'mint chip', 'Dora': 'vanilla'}
Accessing elements of a dictionary
>>> your_dict["Alice"]
'chocolate'
>>> your_dict.get("Alice")
'chocolate'
>>> your_dict["Eve"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Eve'
>>> "Eve" in your_dict
False
>>> "Alice" in your_dict
True
>>> your_dict.get("Eve")
>>> person = your_dict.get("Eve")
>>> print(person)
None
>>> print(type(person))
<class 'NoneType'>
>>> your_dict.get("Alice")
'chocolate'
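Because looking up a missing key with square brackets raises a KeyError, two common ways to avoid the crash are to check with in first, or to pass a default value as the optional second argument to .get(). A minimal sketch using a fresh copy of the ice-cream dictionary; the helper favorite_flavor and the "unknown" default are hypothetical choices for illustration.

```python
your_dict = {"Alice": "chocolate", "Bob": "strawberry", "Cara": "mint chip"}


# Option 1 (hypothetical helper): check with `in` before indexing,
# so a missing key never raises KeyError.
def favorite_flavor(name):
    if name in your_dict:
        return your_dict[name]
    return "unknown"


# Option 2: .get() returns its second argument when the key is missing
# (and None when no default is given).
print(your_dict.get("Alice", "unknown"))  # chocolate
print(your_dict.get("Eve", "unknown"))    # unknown
print(favorite_flavor("Eve"))             # unknown
```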
Changing elements of a dictionary
>>> your_dict["Alice"] = "coconut"
>>> your_dict
{'Alice': 'coconut', 'Bob': 'strawberry', 'Cara': 'mint chip', 'Dora': 'vanilla'}
"Histograms"
Challenge: using the wordplay example from last week, count the number of words that start with each letter.
This kind of problem is very common in data science, and a dictionary makes it easy.
(Note: I will post the solution after class.)
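One possible approach to the challenge, as a minimal sketch rather than the solution that will be posted: it assumes the words are already in a Python list (the short list below is made-up sample data, not the wordplay data) and keeps one count per first letter in a dictionary.

```python
# Made-up sample words; in the real exercise these would come from the wordplay data.
words = ["apple", "avocado", "banana", "cherry", "Carrot", "date"]

# Maps each first letter to the number of words that start with it.
letter_counts = {}

for word in words:
    if word == "":
        continue  # skip empty strings, which have no first letter
    first_letter = word[0].lower()  # treat "C" and "c" as the same letter
    if first_letter in letter_counts:
        letter_counts[first_letter] = letter_counts[first_letter] + 1
    else:
        letter_counts[first_letter] = 1  # first time we see this letter

for letter, count in letter_counts.items():
    print(letter + "\t" + str(count))
```

Lower-casing the first letter is a design choice; drop .lower() if "C" and "c" should be counted separately.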
For-loops and dictionaries
There are two common ways to iterate through dictionaries:
>>> ages = {'Tommy': 34, 'Heather': 30, 'Joanna': 20}
>>> for key in ages:
...     print(key + " is " + str(ages[key]) + " years old")
>>> for key, value in ages.items():
...     print(key + " is " + str(value) + " years old")
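The .keys() and .values() methods and the break and continue statements can be shown on the same ages dictionary. A minimal sketch; the cutoffs of 21 and 30 are arbitrary values chosen for illustration, and the commented results assume Python 3.7 or later (insertion order).

```python
ages = {'Tommy': 34, 'Heather': 30, 'Joanna': 20}

# .keys() and .values() return views of just the keys or just the values.
names = list(ages.keys())   # ['Tommy', 'Heather', 'Joanna']
total = sum(ages.values())  # 84
print("Average age:", total / len(ages))

# continue: skip the rest of this iteration and move on to the next person.
adults = []
for name, age in ages.items():
    if age < 21:
        continue
    adults.append(name)
print("21 or older:", adults)

# break: leave the loop entirely as soon as we find what we are looking for.
first_thirty_or_under = None
for name, age in ages.items():
    if age <= 30:
        first_thirty_or_under = name
        break
print("First person aged 30 or under:", first_thirty_or_under)
```

Because Joanna comes after Heather, the last loop never looks at her: break stops the loop at the first match.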