DS4UX (Spring 2016)/Day 4 lecture: Difference between revisions

From CommunityData
(Replaced content with "==Review== {{:DS4UX_(Spring_2016)/Day_3_follow_up}}")
<div style="font-family:Rockwell,'Courier Bold',Courier,Georgia,'Times New Roman',Times,serif; min-width:10em;">
==Review==
<div style="float:left; width:100%; margin-right:2%;">
{{:DS4UX_(Spring_2016)/Day_3_follow_up}}
{{Link/Graphic/Main/2
|highlight color= 27666b
|color=460c40
|link=
|image=
|text-align=left
|top font-size= 1.1em
|top color=FFF
|line color=FFF
|top text=This page is a work in progress.
|bottom font-size= 1em
|bottom color= FFF
|bottom text=
|line= none
}}</div></div>
<div style="clear:both;"></div>
 
 
<!--
*Database concepts (re-use slides)
*Introduction to Wikipedia data
*Introduction to MySQL and Quarry
*Querying with Socrata SOQL API
-->
 
== Lecture 1 ==
*reading/writing tsv files
*reading/writing csv files
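
A minimal sketch of writing and then reading a TSV file with Python's built-in <code>csv</code> module. The sample rows and the file name <code>example.tsv</code> are made up for illustration; leaving out the <code>delimiter</code> argument would give a comma-separated (CSV) file instead.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; the first row is the header.
rows = [
    ["name", "count"],
    ["graham", "3"],
    ["llama", "12"],
]

# Write to a temporary folder so the example does not clutter your files.
folder = tempfile.mkdtemp()
tsv_path = os.path.join(folder, "example.tsv")

# delimiter="\t" makes a TSV file; leave it out to get a CSV file.
with open(tsv_path, "w", newline="", encoding="utf-8") as outfile:
    writer = csv.writer(outfile, delimiter="\t")
    writer.writerows(rows)

# Read the file back in, one row (a list of strings) at a time.
with open(tsv_path, "r", newline="", encoding="utf-8") as infile:
    reader = csv.reader(infile, delimiter="\t")
    rows_read = [row for row in reader]

print(rows_read)
```

Note that everything comes back as a string, so numbers need to be converted with <code>int()</code> before you do math with them.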
 
 
 
== Lecture 2 ==
[[File:Highfivekitten.jpeg|200px|thumb|In which you learn how to use Python and web APIs to meet the likes of her!]]
 
;Introduction and context
 
* You can write some tools in Python now. Congratulations!
* Today we'll learn how to find/create data sets
* Next week we'll get into data science (asking and answering questions)
 
 
;Outline:
 
* What is an API?
* How do we use one to fetch interesting datasets?
* How do we write programs that use the internet?
* How can we use the placekitten API to fetch kitten pictures?
* Introduction to structured data (JSON)
* How do we use APIs in general?
 
 
;What is a (web) API?
 
* API: a structured way for programs to talk to each other (aka an interface for programs)
* Web APIs: like a website your programs can visit (you:a website::your program:a web API)
 
 
; How do we use an API to fetch datasets?
 
Basic idea: your program sends a request, the API sends data back
* Where do you direct your request? The site's API endpoint.
** For example: Wikipedia's web API endpoint is http://en.wikipedia.org/w/api.php
* How do I write my request? Put together a URL; it will be different for different web APIs.
** Check the documentation, look for code samples
* How do you send a request?
** Python has modules you can use, like <code>requests</code> (they make HTTP requests)
* What do you get back?
** Structured data (usually in the JSON format)
* How do you understand (i.e. parse) the data?
** There's a module for that!
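
As a rough sketch of the "put together a URL" step, the snippet below builds a request URL for the Wikipedia endpoint mentioned above using only the standard library. The particular parameters are an illustrative assumption; check the API documentation for the ones you actually need. The request itself is left as a comment so the example runs without an internet connection.

```python
from urllib.parse import urlencode

# The endpoint from the notes above.
endpoint = "http://en.wikipedia.org/w/api.php"

# Example parameters (chosen for illustration; check the API documentation).
parameters = {
    "action": "query",
    "titles": "Harry Potter",
    "format": "json",
}

# urlencode() turns the dictionary into key=value pairs joined by "&",
# and escapes characters such as spaces that are not allowed in URLs.
request_url = endpoint + "?" + urlencode(parameters)
print(request_url)

# With the requests module installed, sending the request could look like:
#   import requests
#   response = requests.get(request_url)
#   data = response.json()
```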
 
 
; How do we write Python programs that make web requests?
 
To use APIs to build a dataset we will need:
* all our tools from last session: variables, etc
* the ability to open URLs on the web
* the ability to create custom URLs
* the ability to save to files
* the ability to understand (i.e., parse) JSON data that APIs usually give us
 
 
; New programming concepts:
 
* interpolate variables into a string using % (positional) and %(name)s (named)
* requests
* open files and write to them
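
A small sketch of both interpolation styles, used here to build a custom URL. The width/height URL pattern for placekitten is an assumption for illustration; check the site's documentation for the exact format.

```python
width = 200
height = 300

# Positional style: values are filled in from the tuple, in order.
url_positional = "http://placekitten.com/%d/%d" % (width, height)

# Named style: values are looked up by name in a dictionary.
url_named = "http://placekitten.com/%(width)d/%(height)d" % {
    "width": width,
    "height": height,
}

print(url_positional)
print(url_named)
```

Both lines print the same URL. The named style is easier to read when a string has many values to fill in.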
 
 
; How do we use an API to fetch kitten pictures?
 
[http://placekitten.com/ placekitten.com]
* An API that takes specially crafted URLs and returns appropriately sized pictures of kittens
* Exploring placekitten in a browser:
** visit the API documentation
** kittens of different sizes
** kittens in greyscale or color
* Now we write a small program to grab an arbitrary square picture from placekitten by asking for the size on standard input: [http://mako.cc/teaching/2014/cdsw-autumn/placekitten_raw_input.py placekitten_raw_input.py]
 
 
; Introduction to structured data (JSON, JavaScriptObjectNotation)
 
* what is json: useful for more structured data
* import json; json.loads()
* like Python (except no single quotes)
* simple lists, dictionaries
* can reflect more complicated data structures
* Example file at http://mako.cc/cdsw.json
* You can parse data directly with <code>.json()</code> on a <code>requests</code> call
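
A minimal sketch of parsing JSON with <code>json.loads()</code>. The JSON text here is made up for illustration; it is not the contents of the example file linked above.

```python
import json

# A made-up JSON document. Note the double quotes and the lowercase "true".
json_text = '{"workshop": "cdsw", "sessions": [1, 2, 3], "open": true}'

# json.loads() turns JSON text into ordinary Python dictionaries and lists.
data = json.loads(json_text)

print(data["workshop"])     # a string
print(data["sessions"][0])  # the first item of a list
print(data["open"])         # JSON true becomes Python True

# json.dumps() goes the other way: Python data back into JSON text.
round_trip = json.dumps(data)
print(round_trip)
```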
 
; Using other APIs
 
* every API is different, so read the documentation!
* If the documentation isn't helpful, search online
* for popular APIs, there are Python modules that help you make requests and parse JSON
 
Possible issues:
* rate limiting
* authentication
* text encoding issues
 
== Other Potentially Useful Resources ==
 
My friend Frances gave a version of this lecture last year and created slides. They are written for Python 2, so the code might not all work (remember, use <code>print()</code> with parentheses) but the basic ideas might be helpful:
 
* [http://mako.cc/teaching/2014/cdsw-autumn/lecture2-web_apis.pdf Slides (PDF)] — For viewing
* [http://mako.cc/teaching/2014/cdsw-autumn/lecture2-web_apis.odp Slides (ODP Libreoffice Slides Format)] — For editing and modification
 
<!-- FROM http://wiki.communitydata.cc/Community_Data_Science_Workshops_(Spring_2015)/Day_3_Lecture
 
== Material for the lecture ==
 
For the lecture, you will need two files. Download both of these to your computer by using right or control click on the link and then using ''Save as'' or ''Save link as''. Keep track of where you put the files.
 
* http://mako.cc/teaching/2015/cdsw-spring/harrypotter-wikipedia-cdsw.zip
* http://communitydata.cc/~mako/hp_wiki.tsv
 
== Overview of the day ==
 
* Lecture
** Our philosophy around data visualization
** Introduce some new programming tools!
** We're going to walk through some analysis of edits to Harry Potter in Wikipedia, start to finish
** We'll focus on manipulating data in Python
** Visualizing things in Google Docs
* Project based work
** More [[Harry Potter on Wikipedia]] project (or your own topic) on doing analysis using Google Docs
** [[Matplotlib]]
** Civic Data - More interactive working on projects
 
== Lecture outline ==
 
'''Step 1: Pre-Requisites'''
 
* My philosophy about data analysis: ''use the tools you have''
* Four things in Python I have to teach you:
** while loops
*** infinite loops
*** loops with a greater than or less than
** break / continue
** "\t".join()
** defining your own functions with <code>def foo(argument):</code>
 
'''Step 2: Walking through a Program'''
 
* Walk-through of <code>get_hpwp_dataset.py</code>
* Look at dataset with <code>more</code> and/or in spreadsheet
 
'''Step 3: Loading Data Back In'''
 
* Load data into Python
** review of opening files
*** we can also open them for reading with <code>open('file', 'r', encoding="utf-8")</code>
** csv.DictReader()
* Basic counting: <code>hpwp-minor.py</code>
** Answer question: ''What proportion of edits to Wikipedia Harry Potter articles are minor?''
*** Count the number of minor edits and calculate proportion
* Looking at time series data <code>hpwp-trend.py</code>
** "Bin" data by day to generate the trend line
* Exporting and visualizing data
** Export dataset on edits over time
** Export dataset on articles over users
** Load data into Google Docs
-->
[[Category:DS4UX (Spring 2016)]]

Revision as of 00:19, 18 April 2016

Review

Here are some important concepts that we didn't have a chance to go into in great detail last week. You can use the sections below to review the concepts individually. You can also review how they work together in math_game.py, which is included in the week 4 lecture files.

Return random values with the random module

Use random.choice() to select items at random from a list.

>>> import random
>>> my_list = ["terry j.","john","parrot","michael","terry g.", "graham", "llama"]
>>> random.choice(my_list)
'graham'
>>> random.choice(my_list)
'terry j.'
>>> 

Use random.sample() to gather a given number of random items from a list. The first argument you pass to the random.sample() function is the set of items you are sampling from. The second argument is the number of items you want to gather from that set.

>>> random.sample(my_list,3)
['terry j.', 'llama', 'michael']
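
One detail worth knowing, sketched below: random.sample() never picks the same position in the list twice, so you cannot ask for more items than the list contains. The variable names picked and too_many_worked are just for illustration.

```python
import random

my_list = ["terry j.", "john", "parrot", "michael", "terry g.", "graham", "llama"]

# Ask for three items. Each one comes from a different position in the list.
picked = random.sample(my_list, 3)
print(picked)

# Asking for more items than the list holds raises a ValueError.
try:
    random.sample(my_list, 10)
    too_many_worked = True
except ValueError:
    too_many_worked = False

print(too_many_worked)
```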

Use random.randint() to pick a random whole number from a range of numbers. You specify the range by passing the starting number as the first argument and the final number as the second argument. Unlike the range() function discussed below, randint() includes both the first and last numbers you specify in the set it samples from.

>>> random.randint(1,10)
8
>>> random.randint(1,10)
3
>>> random.randint(1,10)
10
>>> 
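
A quick sketch of the difference in end points: randint(1, 10) can return 10, while range(1, 10) stops at 9. The 1000 rolls are only there to make the lowest and highest values likely to show up.

```python
import random

# Roll a random number between 1 and 10 (both included) many times.
rolls = [random.randint(1, 10) for i in range(1000)]
print(min(rolls), max(rolls))

# range() leaves out its end value, so this list stops at 9.
range_values = list(range(1, 10))
print(range_values)
```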

Generating a list of numbers easily with range()

>>> list(range(5))
[0, 1, 2, 3, 4]
>>> for i in range(5):
...     print("Hi" * i)
...

Hi
HiHi
HiHiHi
HiHiHiHi

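The range() function returns a sequence of numbers. In Python 3 this is a range object rather than a list, so wrap it in list() when you want to see or store the numbers as a list. This is handy for when you want to generate a list of numbers on the fly instead of creating the list yourself.

>>> list(range(5))
[0, 1, 2, 3, 4]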

Use range when you want to loop over a bunch of numbers in a list, or perform an operation a certain number of times:

>>> numbers = range(5)
>>> for number in numbers:
...     print(number * number)
...
0
1
4
9
16

We could rewrite the above example like this:

>>> for number in range(5):
...     print(number * number)
...
0
1
4
9
16

You can also set the start, end, and increment value (called "step") for a range. The end value itself is never included, so the loop below stops at 18.

>>> for i in range(2,20,2):
...     print(i)
...
2
4
6
8
10
12
14
16
18
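
A few more range() variations, sketched with list() so the numbers are visible. The variable names are only for illustration.

```python
# Start at 2, stop before 20, count by 2.
evens = list(range(2, 20, 2))
print(evens)

# A negative step counts down. The end value (0) is still left out.
countdown = list(range(10, 0, -2))
print(countdown)

# If the start is already past the end, the range is simply empty.
empty = list(range(5, 2))
print(empty)
```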


Get user input with input()

>>> for i in range(100):
...     my_input = input("Please type something> ")
...     if my_input == "Quit":
...         print("Goodbye!")
...         break
...     else:
...         print("You said: " + my_input)
... 
Please type something> Hello
You said: Hello
Please type something> How are you?
You said: How are you?
Please type something> Quit
Goodbye!
>>>

Things to remember about input()

  • input() simply asks the user to type something.
  • You can test out input() interactively. Just go into the Python interpreter and type: input("What's your favorite color?")
  • The stuff that goes inside the parentheses is the "prompt". It's a string, and should be surrounded by quotes. When you run your program, the prompt text will be shown to the user immediately to the left of the blinking cursor where they will type their input.
  • Python will ask the user to type something at the point in the script where input() is called. Remember that Python executes scripts from top to bottom. If you put input() inside a loop, it will ask the user to type something every time the loop body runs.
  • What you DO with that user input is up to you. The best thing to do is to save it in a variable, e.g. user_name = input("Please type your name")
  • Python saves user input as a string, so if the user types "Daria" in the example above, then user_name will equal "Daria".
  • Once you've saved your user's input, you can use it like any other string variable. In the case of the babynames challenges, you probably want to compare it with the keys in one of the babynames dictionaries (ssadata.boys or ssadata.girls), so that you can find out how many people share that name. These keys are also strings.
  • REMEMBER: the keys in the babynames dictionaries are all in lowercase, but you can't necessarily control how a user will type their input. It's natural that people will want to capitalize their own name! Fortunately, there are string methods (https://docs.python.org/3/library/stdtypes.html#string-methods) that will convert any string into all lowercase. You can make a string lowercase by adding .lower() to the end of the string (or the variable that holds the string)!
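
Here is a small sketch of that lowercase lookup. The name_counts dictionary and its numbers are made up to stand in for ssadata.boys or ssadata.girls, and lookup_name is a hypothetical helper, not part of the course files. In a real script, the typed name would come from input().

```python
# Made-up stand-in for a babynames dictionary such as ssadata.girls.
name_counts = {"daria": 120, "maya": 3054}

def lookup_name(typed_name, counts):
    """Return the count for a name, ignoring capitalization and extra spaces."""
    cleaned = typed_name.strip().lower()
    # .get() returns 0 instead of raising an error when the name is missing.
    return counts.get(cleaned, 0)

print(lookup_name("Daria", name_counts))
print(lookup_name("  MAYA ", name_counts))
print(lookup_name("Zaphod", name_counts))
```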

Iterating an indeterminate number of times with while loops

Use while loops when you don't know how many times you want to repeat ("iterate") an operation.

grocery_list = []
testAnswer = input('Press y if you want to enter more groceries: ')
while testAnswer == 'y':
    food = input('Next item:')
    grocery_list.append(food)
    testAnswer = input('Press y if you want to enter more groceries: ')
print('Your grocery list:')
for food in grocery_list:
    print(food)

Most of the time, you will find that for loops are more common for the kind of coding that you will be doing. For example, if you are reading through a CSV file, a for loop makes perfect sense: there are a set number of lines in the file, and you want to loop through the file line by line until you reach the end of the file. However, whenever your code is accepting input from a person or an API, you may find that you don't know ahead of time how many times you will need to perform an operation before stopping. In these cases, it's useful to know how to keep looping until a particular condition is met, and then stop.
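
As a sketch of the API case, the loop below keeps asking for the next "page" of results until an empty page comes back. The fetch_page function and the fake_pages data are hypothetical stand-ins for a real API, so the example runs without an internet connection.

```python
# Made-up pages of results that a web API might return one at a time.
fake_pages = [["edit 1", "edit 2"], ["edit 3"], []]

def fetch_page(page_number):
    """Pretend API call: return the requested page, or [] when there are no more."""
    if page_number < len(fake_pages):
        return fake_pages[page_number]
    return []

all_results = []
page_number = 0
page = fetch_page(page_number)

# We don't know ahead of time how many pages there are,
# so keep looping until a page comes back empty.
while len(page) > 0:
    all_results.extend(page)
    page_number = page_number + 1
    page = fetch_page(page_number)

print(all_results)
```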

Splicing list items together with .join

Use .join() when you have a list of strings that you want to combine into a single string. Write the DELIMITER (the text that should separate the items) in quotes first, then add a dot (".") followed by the word join and, inside the parentheses, the list that you want to join together.

>>> print("The members of Monty Python are: %s" % (", ".join(my_list)))
The members of Monty Python are: terry j., john, parrot, michael, terry g., graham, llama
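
One thing to watch for: join() only works on strings. The sketch below joins a row that contains numbers by converting each item with str() first, using a tab as the delimiter (as you would for one line of a TSV file). The row contents are made up for illustration.

```python
# A made-up row that mixes a string with numbers.
row = ["graham", 3, 12.5]

# join() only accepts strings, so convert each item with str() first.
line = "\t".join([str(item) for item in row])
print(line)

# Without the conversion, join() raises a TypeError.
try:
    "\t".join(row)
    joined_without_str = True
except TypeError:
    joined_without_str = False

print(joined_without_str)
```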


Putting it all together with a math game

"""
This program uses the concepts that we just reviewed (random.choice, range, input, while, and join) to build a math guessing game.

It asks people to add together two random numbers between 1 and 1000, and keeps asking new questions as long as they answered the previous problem correctly. Once they give an incorrect answer, it prints how many problems they got right and lists all of their correct responses using join.
"""
import random

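# All the numbers we might ask about: 1 through 1000 (range() leaves out 1001).
numbers_to_add = list(range(1,1001))
correct_answers = []
# Start with matching values so the while loop runs at least once.
true_answer = 0
your_answer = 0
while true_answer == your_answer:
    num1 = random.choice(numbers_to_add)
    num2 = random.choice(numbers_to_add)
    true_answer = num1 + num2
    # input() returns a string, so convert it to an integer before comparing.
    your_answer = int(input("%d + %d = " % (num1,num2)))
    if your_answer == true_answer:
        print("Correct! Let's try another.")
        correct_answers.append("%d + %d = %s" % (num1, num2, your_answer))
    else:
        # A wrong answer makes the while condition false, so the loop ends.
        print("Incorrect!")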

print("You got %d problems right:" % (len(correct_answers)))
print(", ".join(correct_answers))
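
One rough edge in the game: int() raises a ValueError if the player types something that isn't a whole number, which stops the program with an error. The helper below is one possible way to guard against that. The name parse_answer is hypothetical and not part of math_game.py.

```python
def parse_answer(text):
    """Return the typed text as an integer, or None if it isn't a whole number."""
    try:
        return int(text)
    except ValueError:
        return None

print(parse_answer("42"))
print(parse_answer("forty-two"))
```

In the game, you could call parse_answer(input(...)) and treat None as an incorrect answer, since None is never equal to true_answer.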