Community Data Science Course (Spring 2023)/Week 3 lecture notes

From CommunityData

Latest revision as of 01:48, 11 April 2023

Online Data Sets: An Important Question

Can you get bulk access to data?

Bad Signs:

  • You must authenticate as a particular user in order to access data, and you can only see data for that user. (For example: you must log into Instagram's API as a particular user, as described at https://www.instagram.com/developer/endpoints/users/.)

Good signs:

  • The organization owning the data wants everyone to access it, like Wikipedia or most government data.
  • You may have to authenticate as a particular user, but you can access general data.
  • For example: once you log into Reddit, you can get all posts about almost anything (Twitter API Docs: https://developer.twitter.com/en/docs/geo/place-information/api-reference/get-geo-id-place_id)

Programming lecture outline

  • Dictionaries!
    • Purpose (use dictionaries to store key/value pairs)
    • Initialization {}
    • Accessing elements
    • Adding elements
    • Changing elements
    • .keys() and .values()
    • using in to look into dictionaries
    • using for loops to iterate over dictionaries (e.g., let's build a list of every letter in the alphabet using wordplay data)
    • A few notes about dictionaries:
      • A given key can only have one value, but multiple keys can have the same value.
      • Older versions of Python do not guarantee the order of items in a dictionary. Since Python 3.7, dictionaries preserve the order in which keys were inserted.
  • Additional loop control
    • break
    • continue
    • Note: These can be useful in combination with if statements and can also be super useful for debugging!
  • Writing to files:
    • Using the open("whatever.tsv", "w") function
    • Using the with open() as my_file: statement
    • Writing to a file with print(file=my_file)
    • Writing a tab-separated value file using "\t" (make sure we leave a header!)
    • Now let's open it up and make a little graph
  • Defining our own functions!
  • A little bit on looking for help (if it hasn't come up already)
    • Looking at StackOverflow
    • Walking through the Python API documentation
    • Using a reference card or cheatsheet
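
The file-writing steps in the outline can be sketched roughly as follows. This is a minimal sketch, not the code from lecture: the file name flavors.tsv, the column names, and the flavors dictionary are all made up for illustration.

```python
# Hypothetical example data; any dictionary of key/value pairs would work.
flavors = {"Alice": "chocolate", "Bob": "strawberry", "Cara": "mint chip"}

# "w" opens the file for writing and replaces any existing contents.
with open("flavors.tsv", "w") as my_file:
    # Write the header row first, with a tab ("\t") between column names.
    print("name\tflavor", file=my_file)
    for name, flavor in flavors.items():
        print(name + "\t" + flavor, file=my_file)

# Read the file back in to check what was written.
with open("flavors.tsv") as my_file:
    lines = my_file.read().splitlines()

print(lines[0])
```

Because the with statement closes the file when its block ends, there is no need to call my_file.close() by hand.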

Resources and Example Code

Initialization

>>> my_dict = {}
>>> my_dict
{}
>>> your_dict = {"Alice" : "chocolate", "Bob" : "strawberry", "Cara" : "mint chip"}
>>> your_dict
{'Alice': 'chocolate', 'Bob': 'strawberry', 'Cara': 'mint chip'}

Adding elements to a dictionary

>>> your_dict["Dora"] = "vanilla"
>>> your_dict
{'Alice': 'chocolate', 'Bob': 'strawberry', 'Cara': 'mint chip', 'Dora': 'vanilla'}

Accessing elements of a dictionary

>>> your_dict["Alice"]
'chocolate'
>>> your_dict.get("Alice")
'chocolate'
>>> your_dict["Eve"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Eve'
>>> "Eve" in your_dict
False
>>> "Alice" in your_dict
True
>>> your_dict.get("Eve")
>>> person = your_dict.get("Eve")
>>> print(person)
None
>>> print(type(person))
<class 'NoneType'>
>>> your_dict.get("Alice")
'chocolate'
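
As a small follow-up to the session above, here is a sketch of two ways to handle a key that might be missing. It uses its own short dictionary so it can run by itself, and the fallback text is just an example.

```python
your_dict = {"Alice": "chocolate", "Bob": "strawberry"}

# With one argument, .get() returns None for a missing key.
missing = your_dict.get("Eve")

# With a second argument, .get() returns that default instead of None.
fallback = your_dict.get("Eve", "no favorite yet")

# Checking with "in" first avoids the KeyError from your_dict["Eve"].
if "Eve" in your_dict:
    message = "Eve likes " + your_dict["Eve"]
else:
    message = "We don't know what Eve likes"

print(fallback)
print(message)
```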

Changing elements of a dictionary

>>> your_dict["Alice"] = "coconut"
>>> your_dict
{'Alice': 'coconut', 'Bob': 'strawberry', 'Cara': 'mint chip', 'Dora': 'vanilla'}

"Histograms"

Challenge: using the wordplay example from last week, count the number of words that start with each letter.

This kind of problem is very common in data science, and it is easy with a dictionary.

(note: I will post the solution after class)
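
In the meantime, here is one possible way to approach the counting pattern. This is only a sketch and not the posted solution: the short words list is a made-up stand-in for the wordplay data.

```python
# Hypothetical stand-in for the wordplay word list from last week.
words = ["apple", "avocado", "banana", "blueberry", "bread", "cherry"]

counts = {}
for word in words:
    first_letter = word[0]
    if first_letter in counts:
        # We have seen this letter before, so add one to its count.
        counts[first_letter] = counts[first_letter] + 1
    else:
        # First time we see this letter, so start its count at 1.
        counts[first_letter] = 1

print(counts)
```

The keys of counts are the first letters and the values are how many words start with each one.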

For-loops and dictionaries

There are two common ways to iterate through dictionaries:

>>> ages = {'Tommy': 34, 'Heather': 30, 'Joanna': 20}
>>> for key in ages:
...     print(key + " is " + str(ages[key]) + " years old")
...
>>> for key, value in ages.items():
...     print(key + " is " + str(value) + " years old")
...
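
The outline also mentions .keys(), .values(), break, and continue. A short sketch of how they might fit together with a dictionary loop is below. The variable names and the age cutoffs are chosen only for illustration.

```python
ages = {"Tommy": 34, "Heather": 30, "Joanna": 20}

# .keys() gives the keys and .values() gives the values.
names = list(ages.keys())        # ['Tommy', 'Heather', 'Joanna']
total_age = sum(ages.values())   # 84

# continue skips the rest of this pass and moves on to the next item.
under_thirty_four = []
for name, age in ages.items():
    if age >= 34:
        continue
    under_thirty_four.append(name)

# break stops the loop entirely as soon as we find a match.
first_thirty = None
for name, age in ages.items():
    if age == 30:
        first_thirty = name
        break

print(under_thirty_four)
print(first_thirty)
```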