Editing Community Data Science Course (Spring 2023)/Week 3 coding challenges
* Open the file <code>BabyNames.ipynb</code> as a Jupyter notebook and run the first cell to make sure that it works.
You'll be playing with data from the list of all baby names in the US (used at least five times in a year) from the last several years:
# Right click the following file, click "Save Target as..." or "Save link as...", and save it to your Desktop directory: http://jtmorgan.net/ds4ux/week3/babynames.zip
# The ".zip" extension on the above file indicates that it is a compressed Zip archive. We need to "extract" its contents. To do this, click on "Start", then "Computer", and navigate to your Desktop directory. Find babynames.zip on your Desktop and double-click on it to "unzip" it. That will create a folder called babynames containing several files.
Each of these files begins with this line:
<code>import ssadata</code>
This imports the ssadata module, which is a special Python module we created for this project that includes only two things:
* <code>girls</code> - A dictionary where the keys are names of girls and the values are the number of infants born in 2021 who had that particular name.
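To get a feel for how a dictionary like this is shaped, here is a minimal sketch. It uses a small made-up dictionary, <code>toy_girls</code>, in place of <code>ssadata.girls</code>; the names and counts are invented, and the real data may differ (for example in capitalization).

```python
# Hypothetical stand-in for ssadata.girls: name -> number of infants.
toy_girls = {"olivia": 110, "emma": 90, "riley": 30}

# Looping over .items() yields (name, count) pairs, one per name.
seen = []
for name, count in toy_girls.items():
    seen.append((name, count))
    print(name, count)

# len() on a dictionary gives the number of distinct keys (names).
print(len(toy_girls))
```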
== #1 Your own name! ==
# Search for your own name. Are there both boys and girls that have your name? Is it more popular for one group than for the other? (''Hint: don't use a for loop for this one.'')
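One way to look up a single name without a loop is the dictionary <code>.get()</code> method, which returns a default when the key is missing. The sketch below is only an illustration: <code>toy_boys</code>, <code>toy_girls</code>, and <code>counts_for</code> are made-up names, and the counts are invented rather than taken from <code>ssadata</code>.

```python
# Hypothetical stand-ins for ssadata.boys and ssadata.girls.
toy_boys = {"liam": 120, "riley": 12}
toy_girls = {"olivia": 110, "riley": 30}


def counts_for(name, boys, girls):
    """Return (boy_count, girl_count) for a name, using 0 when it is absent."""
    return boys.get(name, 0), girls.get(name, 0)


riley = counts_for("riley", toy_boys, toy_girls)
olivia = counts_for("olivia", toy_boys, toy_girls)
print(riley, olivia)
```

Comparing the two numbers in each pair shows whether a name appears in both dictionaries and which group uses it more.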
== #2 A sense of what's common ==
# What is the most common name for each gender in 2021?
# What is the least common name?
# How often do the least common names occur? (Does that bother you?)
# What about among names that start with "a"?
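The questions above can be approached with <code>max()</code> and <code>min()</code>. The following is a rough sketch on invented data (<code>toy_girls</code> is a made-up stand-in for <code>ssadata.girls</code>). It collects every name tied for the lowest count, since ties at the bottom are likely in the real data.

```python
# Hypothetical stand-in for ssadata.girls.
toy_girls = {"olivia": 110, "emma": 90, "ava": 75, "zuri": 5, "alba": 5}

# max() over the keys, ranked by their counts, gives the most common name.
most_common = max(toy_girls, key=toy_girls.get)

# Several names may share the lowest count, so collect all of them.
lowest = min(toy_girls.values())
least_common = sorted(n for n, c in toy_girls.items() if c == lowest)

# Restrict to names starting with "a" and repeat the same idea.
a_names = {n: c for n, c in toy_girls.items() if n.startswith("a")}
most_common_a = max(a_names, key=a_names.get)

print(most_common, lowest, least_common, most_common_a)
```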
== #3 She wasn't long for this dataset ==
# What is the longest name in the dataset? How many boys/girls names are exactly that length? What's going on?
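A possible approach is to find the maximum name length first and then collect every name of that length. This sketch uses an invented dictionary, <code>toy_names</code>, rather than the real <code>ssadata</code> dictionaries.

```python
# Hypothetical stand-in for one of the ssadata dictionaries.
toy_names = {"maximiliano": 40, "christopher": 300, "liam": 120, "noah": 95}

# Length of the longest key in the dictionary.
longest = max(len(name) for name in toy_names)

# Every name that has exactly that length.
at_longest = sorted(name for name in toy_names if len(name) == longest)

print(longest, len(at_longest), at_longest)
```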
== #4 Sum it up for me ==
# How many boys and girls are described in the dataset (i.e., how many boys and girls born in 2021 have names given to at least four others)?
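Since each value is a count of infants, adding up the values gives the number of children described. A minimal sketch with invented numbers (<code>toy_boys</code> and <code>toy_girls</code> are made-up stand-ins for the <code>ssadata</code> dictionaries):

```python
# Hypothetical stand-ins for ssadata.boys and ssadata.girls.
toy_boys = {"liam": 120, "noah": 95, "riley": 12}
toy_girls = {"olivia": 110, "emma": 90, "riley": 30}

# Sum the counts, not the number of keys.
total_boys = sum(toy_boys.values())
total_girls = sum(toy_girls.values())

print(total_boys, total_girls)
```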
== #5 Name twins ==
# On average, how many "name twins" will a baby born in 2021 have (i.e., how many other children will share their name)?
## How many "name twins" will a boy have on average? How about a girl?
# Create a list of names where at least 90% of the children with that name are listed as girls. Then do the same for boys.
## Now create the same list but only include names that are given to at least 1000 children total. Why are the answers different?
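The sketch below shows one possible reading of these questions, on invented data. It assumes that a child whose name has count <code>c</code> has <code>c - 1</code> name twins within the same dictionary, and that "90%" means a share of at least 0.9. The function names and toy dictionaries are hypothetical and not part of <code>ssadata</code>.

```python
def average_name_twins(counts):
    """Average number of other children sharing a child's name.

    Each of the c children with a given name has c - 1 twins, so the
    average over all children is sum(c * (c - 1)) / sum(c).
    """
    total = sum(counts.values())
    return sum(c * (c - 1) for c in counts.values()) / total


def mostly_in_group(group, other, share=0.9, min_total=0):
    """Names whose share in `group` is at least `share` of both groups combined."""
    result = []
    for name, count in group.items():
        total = count + other.get(name, 0)
        if total >= min_total and count / total >= share:
            result.append(name)
    return sorted(result)


# Invented data standing in for ssadata.boys and ssadata.girls.
small = {"liam": 6, "noah": 4}
toy_boys = {"liam": 1200, "riley": 12, "emma": 1}
toy_girls = {"olivia": 1100, "riley": 30, "emma": 19}

twins = average_name_twins(small)
girl_names = mostly_in_group(toy_girls, toy_boys)
popular_girl_names = mostly_in_group(toy_girls, toy_boys, min_total=1000)
boy_names = mostly_in_group(toy_boys, toy_girls)

print(twins, girl_names, popular_girl_names, boy_names)
```

Other definitions of "name twins" (for example, counting children of either gender with the same name) would need the two dictionaries to be combined first.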
== #6 Write it out ==
# Create a tab-separated values file that lists, for each letter of the alphabet (a-z), the number of unique girls' names for that letter. Be sure to include descriptive column headers!
## Now do the same for boys (be sure to save it into a different file!)
## Once you've done this, load up the two files into Google Sheets or Excel.
## For every letter, be ready to tell if there are more boys' names or girls' names.
## Play around with graphing and see if you can build some instructive graph that shows us something.
'''Note:''' Obviously, you won't be able to include your Google Sheet result in your notebook. That's OK, but just be ready to describe what you found!
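As a starting point for the file-writing step, here is a minimal sketch using Python's standard <code>csv</code> module with a tab delimiter. It assumes "for that letter" means names that start with that letter. The dictionary, function names, column headers, and output filename are all made up for illustration.

```python
import csv
import os
import string
import tempfile

# Hypothetical stand-in for ssadata.girls.
toy_girls = {"olivia": 110, "emma": 90, "ava": 75, "alba": 5}


def count_by_first_letter(names):
    """Count how many distinct names start with each letter a-z."""
    counts = {letter: 0 for letter in string.ascii_lowercase}
    for name in names:
        first = name[0].lower()
        if first in counts:
            counts[first] += 1
    return counts


def write_letter_counts(names, path, header):
    """Write one row per letter to a tab-separated file, header first."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(header)
        for letter, n in count_by_first_letter(names).items():
            writer.writerow([letter, n])


# Written to the system temp directory here; any path would do.
out_path = os.path.join(tempfile.gettempdir(), "toy_girls_by_letter.tsv")
write_letter_counts(toy_girls, out_path, ["letter", "unique_girl_names"])
```

Calling <code>write_letter_counts</code> a second time with the boys' dictionary and a different path would produce the second file.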
== #7 Something extra ==
# Discover at least one fact about the names that is not listed above. |