Editing Community Data Science Course (Spring 2023)/Week 3 coding challenges



Latest revision Your text
Line 29: Line 29:
* <code>girls</code> - A dictionary where the keys are names of girls and the values are the number of infants born in 2021 who had that particular name.
* <code>girls</code> - A dictionary where the keys are names of girls and the values are the number of infants born in 2021 who had that particular name.


== #1 Your own name! ==
== #1 Your own your name! ==


# Search for your own name. Are there both boys and girls that have your name? Is your name more popular for one group than for the other? (''Hint: don't use a for loop for this one.'')
# Search for your own name. Are there both boys and girls that have your name? Is your name more popular for one group than for the other? (''Hint: don't use a for loop for this one.'')
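As a minimal sketch of the hint above, a single name can be looked up directly in each dictionary, with no loop. The dictionaries here are small made-up stand-ins for the course's <code>boys</code> and <code>girls</code>, and the lower-case keys and the name <code>jordan</code> are assumptions for illustration only.

```python
# Toy stand-ins for the course's `boys` and `girls` dictionaries (made-up counts).
boys = {"jordan": 120, "liam": 300}
girls = {"jordan": 45, "olivia": 280}

name = "jordan"  # hypothetical name to look up

# dict.get returns the default (0 here) instead of raising KeyError
# when the name is missing from a dictionary.
boy_count = boys.get(name, 0)
girl_count = girls.get(name, 0)

print(name, "- boys:", boy_count, "girls:", girl_count)
```

Comparing the two counts then shows whether the name appears in both groups and where it is more popular.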
Line 36: Line 36:


# How many boy names and girl names are described in the dataset?
# How many boy names and girl names are described in the dataset?
# How many boys and girls (actual babies!) are described in the dataset?
# How many boys and girls are described in the dataset?
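The two questions above measure different things: the number of distinct names is the number of keys, while the number of babies is the sum of the values. A short sketch on made-up data (the counts below are not from the real dataset):

```python
# Toy stand-ins for the course's dictionaries (made-up counts).
boys = {"liam": 300, "noah": 250, "jordan": 120}
girls = {"olivia": 280, "emma": 260}

# Number of distinct names = number of keys.
num_boy_names = len(boys)
num_girl_names = len(girls)

# Number of babies = sum of the per-name counts.
num_boys = sum(boys.values())
num_girls = sum(girls.values())

print(num_boy_names, "boy names covering", num_boys, "boys")
print(num_girl_names, "girl names covering", num_girls, "girls")
```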


== #3 A sense of what's common ==
== #3 A sense of what's common ==


# What is the most common name for each gender in your data (i.e. 2021)?
# What is the most common name for each gender in 2021?
# What is the least common name?
# What is the least common name?
# How often do the least common names occur? (Does your answer to this question bother you? Why?)
# How often do the least common names occur? (Does your answer to this question bother you? Why?)
# What about boys names and girls names that start with "a"?
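One possible way to approach these questions, sketched on made-up data: <code>max</code> with a <code>key</code> function finds the most common name, and collecting every name that shares the minimum count shows that "the least common name" may not be unique. The same idea works on a filtered dictionary of names starting with "a".

```python
# Toy stand-in for the course's `girls` dictionary (made-up counts).
girls = {"olivia": 280, "emma": 260, "ava": 90, "zuri": 5, "aria": 5}

# Most common name: the key whose value is largest.
most_common = max(girls, key=girls.get)

# Least common: several names can tie at the minimum, so collect them all.
lowest = min(girls.values())
least_common = [name for name, count in girls.items() if count == lowest]

# Restrict to names starting with "a", then repeat the same lookup.
a_names = {name: count for name, count in girls.items() if name.startswith("a")}
most_common_a = max(a_names, key=a_names.get)

print(most_common, least_common, most_common_a)
```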


== #4 She wasn't long for this dataset ==
== #4 She wasn't long for this dataset ==
Line 48: Line 49:
# What is the longest name in the dataset? How many boys/girls names are exactly that length? What's going on?
# What is the longest name in the dataset? How many boys/girls names are exactly that length? What's going on?


== #5 Initials to spreadsheets ==
== #5 Name twins ==
# Make a dictionary <code>girl_initials</code> that says, for each letter of the alphabet (a-z), how many unique girl names begin with that letter.
 
# Do the same for boy names to make <code>boy_initials</code>.
# On average, how many "name twins" will a baby born in 2021 have (i.e., how many other children will share their name)?
# Create a "tab separated values" (TSV) file that reports the data in <code>girl_initials</code>. Be sure to include a descriptive header row! You will probably want the file name to end with <code>.tsv</code> so that your computer knows it's a TSV file.
# How many "name twins" will a boy have on average? How about a girl?
# Now do the same for <code>boy_initials</code> (be sure to save it into a different file!)
# Create a list of names where at least 90% of the children with that name are listed as girls. Then do the same for boys.
# Now create the same list but only include names that are given to at least 1000 children total. Why are the answers different?
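The "name twins" and 90% questions can be read in more than one way. The sketch below takes one interpretation as an assumption: a child with a name given to <code>c</code> children of the same group has <code>c - 1</code> twins, and the average is taken over children rather than over names. The helper names <code>average_name_twins</code> and <code>mostly_girl_names</code> are hypothetical, and the data is made up.

```python
# Toy stand-ins for the course's dictionaries (made-up counts).
boys = {"liam": 300, "jordan": 120, "avery": 5}
girls = {"olivia": 280, "jordan": 45, "avery": 95}


def average_name_twins(counts):
    """Average number of same-name children, averaged over children (not names)."""
    total_babies = sum(counts.values())
    # Each of the c children with a given name has c - 1 twins.
    return sum(c * (c - 1) for c in counts.values()) / total_babies


def mostly_girl_names(boys, girls, share=0.9, min_total=0):
    """Names where girls make up at least `share` of all children with that name."""
    result = []
    for name, girl_count in girls.items():
        total = girl_count + boys.get(name, 0)
        if total == 0:
            continue  # guard against division by zero
        if total >= min_total and girl_count / total >= share:
            result.append(name)
    return result


print(average_name_twins(boys))
print(mostly_girl_names(boys, girls))
print(mostly_girl_names(boys, girls, min_total=200))
```

Raising <code>min_total</code> drops rare names, whose shares are easily pushed to extremes by a handful of children.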
 
== #6 Write it out ==
 
# Create a tab-separated values (TSV) file that lists, for each letter of the alphabet (a-z), the number of unique girl names that begin with that letter. Be sure to include a descriptive header row!
# Now do the same for boys (be sure to save it into a different file!)
# Once you've done this, load up the two files into Google Sheets or Excel.
# Once you've done this, load up the two files into Google Sheets or Excel.
# For every letter, be ready to tell if there are more boys names or girls names.
# For every letter, be ready to tell if there are more boys names or girls names.
# Play around with graphing and see if you can build some instructive graph that shows us something.  
# Play around with graphing and see if you can build some instructive graph that shows us something.  
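Counting initials and writing them out as a TSV file could look roughly like the sketch below. It uses Python's standard <code>csv</code> module with a tab delimiter; the helper names, the column names, and the output file name <code>girl_initials.tsv</code> are assumptions for illustration, and the data is made up.

```python
import csv
import string

# Toy stand-in for the course's `girls` dictionary (made-up counts).
girls = {"olivia": 280, "emma": 260, "ava": 90, "aria": 5}


def count_initials(names):
    """Count how many unique names begin with each letter a-z."""
    initials = {letter: 0 for letter in string.ascii_lowercase}
    for name in names:
        first = name[:1].lower()
        if first in initials:  # skips empty names and non a-z initials
            initials[first] += 1
    return initials


def write_tsv(path, initials):
    """Write one header row, then one row per letter, separated by tabs."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["initial", "unique_names"])
        for letter, count in initials.items():
            writer.writerow([letter, count])


girl_initials = count_initials(girls)
write_tsv("girl_initials.tsv", girl_initials)
```

Calling the same two functions on the boys dictionary with a different file name produces the second file, and both can then be imported into a spreadsheet.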


'''Note:''' Obviously, you won't be able to include your Google Sheet result in your notebook. Instead, please put it online somewhere (e.g., in Google Drive, OneDrive, Dropbox, or similar), create a sharing link that doesn't require signing in, and put that link into your notebook so we can click through and look at it!
'''Note:''' Obviously, you won't be able to include your Google Sheet result into your notebook. That's OK but just be ready to describe what you found!
 
== #6 Concentration in names ==
 
# What percentage of boys have one of the 10 most popular boys names? What percentage of girls have one of the 10 most popular girls names?
# Take the top 10% most popular boys names. (For instance: If there were 500 boys names, we're looking for the 50 most popular ones.) How many girls were given one of those names? Take the top 10% most popular girls names. How many boys were given one of those names?
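A rough sketch of the concentration questions, on made-up data: sort names by count, take the top slice, and compare its total against the whole. The helper names are hypothetical. Because the toy dictionaries are tiny, the example uses the top 2 names instead of the top 10, and rounds "the top 10%" up to at least one name.

```python
# Toy stand-ins for the course's dictionaries (made-up counts).
boys = {"jordan": 300, "liam": 250, "noah": 120, "avery": 5}
girls = {"olivia": 280, "jordan": 45, "avery": 95}


def top_names(counts, n):
    """Return the n names with the highest counts, most popular first."""
    return sorted(counts, key=counts.get, reverse=True)[:n]


def share_of_top(counts, n):
    """Fraction of all children who have one of the n most popular names."""
    top_total = sum(counts[name] for name in top_names(counts, n))
    return top_total / sum(counts.values())


# Share of boys with one of the 2 most popular boy names (toy data is small).
boys_top_share = share_of_top(boys, 2)

# Top 10% of boy names (at least one name), then count girls given those names.
k = max(1, len(boys) // 10)
top_boy_names = top_names(boys, k)
girls_with_top_boy_names = sum(girls.get(name, 0) for name in top_boy_names)

print(round(100 * boys_top_share, 1), "% of boys")
print(top_boy_names, girls_with_top_boy_names)
```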


== #7 Something extra ==
== #7 Something extra ==


# Discover at least one fact about the names that is not listed above! Include the code, and a description of your observation written in English text, into your notebook.
# Discover at least one fact about the names that is not listed above.
 
== #8 Thinking about this dataset ==
 
What are some questions you have about this dataset and how it was collected or created? What are at least two challenges that the people creating this dataset must have faced? How did they resolve them? What are some assumptions they made?