Community Data Science Course (Spring 2023)/Week 3 lecture notes: Difference between revisions

From CommunityData

Revision as of 01:41, 11 April 2023

Online Data Sets: An Important Question

Can you get bulk access to data?

Bad signs:

  • You must authenticate as a particular user in order to access data, and you can only see data for that user. (For example, you must log into Instagram's API as a particular user; see https://www.instagram.com/developer/endpoints/users/.)

Good signs:

  • The organization owning the data wants everyone to access it, as with Wikipedia or most government data.
  • You may have to authenticate as a particular user, but you can access general data.
  • For example: once you log into Reddit, you can get all posts about almost anything (Twitter API Docs)

Programming lecture

  • Dictionaries!
    • Purpose (use dictionaries to store key/value pairs)
    • Initialization {}
    • Accessing elements
    • Adding elements
    • Changing elements
    • .keys() and .values()
    • using in to look into dictionaries
    • using for loops to iterate over dictionaries (e.g., let's build a list of every letter in the alphabet using wordplay data)
    • A few notes about dictionaries:
      • A given key can only have one value, but multiple keys can have the same value.
      • Older versions of Python do not guarantee dictionary ordering. Since Python 3.7, dictionaries preserve the order in which keys were inserted.
  • Additional loop control
    • break
    • continue
    • Note: These can be useful in combination with if statements and can also be super useful for debugging!
  • Writing to files:
    • Using the open() function
    • Using the with statement
    • Writing to a file with print(file=my_file)
    • Writing a tab-separated value file using "\t" (make sure we include a header row!)
    • Now let's open it up and make a little graph
  • Defining our own functions!
  • A little bit on looking for help (if it hasn't come up already)
    • Looking at StackOverflow
    • Walking through the Python API documentation
    • Using a reference card or cheatsheet
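
The file-writing and function-definition steps in the outline can be sketched together as follows. This is a minimal illustration, not the code used in class: the helper name write_tsv, the file name flavors.tsv, and the column names are all hypothetical.

```python
import os
import tempfile

def write_tsv(path, rows):
    """Write name/flavor pairs to a tab-separated file with a header row."""
    # The with statement closes the file automatically, even if an error occurs.
    with open(path, "w") as my_file:
        # Header row first; columns are separated by "\t".
        print("name\tflavor", file=my_file)
        for name, flavor in rows.items():
            print(name + "\t" + flavor, file=my_file)

# Hypothetical data and file location, for illustration only.
your_dict = {"Alice": "chocolate", "Bob": "strawberry", "Cara": "mint chip"}
tsv_path = os.path.join(tempfile.gettempdir(), "flavors.tsv")
write_tsv(tsv_path, your_dict)

# Read the file back to check what was written.
with open(tsv_path) as my_file:
    tsv_lines = my_file.read().splitlines()
print(tsv_lines)
```

A file written this way should open directly in a spreadsheet program, which is one way to make the little graph mentioned above.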

Resources and Example Code

Initialization

>>> my_dict = {}
>>> my_dict
{}
>>> your_dict = {"Alice" : "chocolate", "Bob" : "strawberry", "Cara" : "mint chip"}
>>> your_dict
{'Alice': 'chocolate', 'Bob': 'strawberry', 'Cara': 'mint chip'}

Adding elements to a dictionary

>>> your_dict["Dora"] = "vanilla"
>>> your_dict
{'Alice': 'chocolate', 'Bob': 'strawberry', 'Cara': 'mint chip', 'Dora': 'vanilla'}

Accessing elements of a dictionary

>>> your_dict["Alice"]
'chocolate'
>>> your_dict.get("Alice")
'chocolate'
>>> your_dict["Eve"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Eve'
>>> "Eve" in your_dict
False
>>> "Alice" in your_dict
True
>>> your_dict.get("Eve")
>>> person = your_dict.get("Eve")
>>> print(person)
None
>>> print(type(person))
<class 'NoneType'>
>>> your_dict.get("Alice")
'chocolate'
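
As a small extension of the transcript above, .get() also accepts an optional second argument that is returned when the key is missing. The sketch below assumes the same your_dict; the fallback string "unknown" is just an illustrative choice.

```python
your_dict = {"Alice": "chocolate", "Bob": "strawberry", "Cara": "mint chip"}

# The second argument to .get() is returned instead of None
# when the key is not in the dictionary.
eve_flavor = your_dict.get("Eve", "unknown")
alice_flavor = your_dict.get("Alice", "unknown")
print(eve_flavor)
print(alice_flavor)

# Checking with "in" first avoids the KeyError raised by square brackets.
if "Eve" in your_dict:
    print(your_dict["Eve"])
else:
    print("Eve is not in the dictionary")
```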

Changing elements of a dictionary

>>> your_dict["Alice"] = "coconut"
>>> your_dict
{'Alice': 'coconut', 'Bob': 'strawberry', 'Cara': 'mint chip', 'Dora': 'vanilla'}
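
Two related dictionary methods, .keys() and .values(), return all of the keys or all of the values. A short sketch using the same example data (the variable names names, flavors, and has_vanilla are illustrative):

```python
your_dict = {"Alice": "coconut", "Bob": "strawberry",
             "Cara": "mint chip", "Dora": "vanilla"}

# Wrapping the results in list() gives ordinary lists we can print or index.
names = list(your_dict.keys())
flavors = list(your_dict.values())
print(names)
print(flavors)

# "in" on the dictionary itself tests keys, not values.
# To look for a value, test against .values() instead.
has_vanilla = "vanilla" in your_dict.values()
print(has_vanilla)
```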

"Histograms"

Challenge: using wordplay example from last week, count the number of words that start with each letter.

This kind of problem is very common in data science, and it is easy with a dictionary.

(note: I will post the solution after class)
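
Until then, here is one possible approach, offered as a sketch rather than the official solution. The words list is a hypothetical stand-in for the wordplay data, which is not reproduced in these notes.

```python
# Hypothetical stand-in for the wordplay word list.
words = ["apple", "avocado", "banana", "blueberry", "cherry", "apricot"]

letter_counts = {}
for word in words:
    first_letter = word[0]
    if first_letter in letter_counts:
        # Seen this letter before: add one to its count.
        letter_counts[first_letter] = letter_counts[first_letter] + 1
    else:
        # First word starting with this letter: start counting at one.
        letter_counts[first_letter] = 1

print(letter_counts)
```

The same pattern works for counting almost anything: the thing being counted is the key and the running count is the value.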

For-loops and dictionaries

There are two common ways to iterate through dictionaries:

>>> ages = {'Tommy': 34, 'Heather': 30, 'Joanna': 20}
>>> for key in ages:
...     print(key + " is " + str(ages[key]) + " years old")
>>> for key, value in ages.items():
...     print(key + " is " + str(value) + " years old")
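
The break and continue statements from the lecture outline combine naturally with these loops. The sketch below is illustrative only; the age thresholds and the variable names over_25 and first_under_32 are assumptions made for the example.

```python
ages = {"Tommy": 34, "Heather": 30, "Joanna": 20}

# continue skips the rest of the loop body for this item
# and moves on to the next one.
over_25 = []
for name, age in ages.items():
    if age <= 25:
        continue
    over_25.append(name)
print(over_25)

# break stops the loop entirely as soon as we find what we need.
first_under_32 = None
for name, age in ages.items():
    if age < 32:
        first_under_32 = name
        break
print(first_under_32)
```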