Community Data Science Course (Spring 2023)/Week 3 coding challenges

From CommunityData

Revision as of 01:49, 11 April 2023

Baby Names

Baby Names: How many babies were born last year with your name? Are a wider variety of names used for boys or for girls? What are the most popular names used for both boys and girls? This set of coding challenges will use data from the US Social Security Administration on baby names from the last several years to answer these questions and more!

Goals

  • Have fun exploring real data on baby names in the US
  • Practice manipulating and searching strings
  • Practice using dictionaries
  • Practice using numbers and doing simple arithmetic

#0 Setup

The notebook will begin with a cell that says:

 from ssadata import boys, girls

This imports from ssadata, a special Python module we created for this project, which includes only two things:

  • boys - A dictionary where the keys are names of boys and the values are the number of infants born in 2021 who had that particular name.
  • girls - A dictionary where the keys are names of girls and the values are the number of infants born in 2021 who had that particular name.
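
To make the shape of these two dictionaries concrete, here is a minimal sketch. Because ssadata is only available inside the course notebook, the boys and girls dictionaries below are tiny stand-ins with invented names and counts, not real SSA figures. The sketch shows two ordinary dictionary operations: looking up a key with .get() and testing membership with in.

```python
# Toy stand-ins for the real ssadata dictionaries.
# ASSUMPTION: these names and counts are invented for illustration only.
boys = {"Alex": 120, "Jordan": 45, "Liam": 300}
girls = {"Alex": 80, "Emma": 310, "Jordan": 15}

name = "Alex"

# .get() returns the default (0 here) instead of raising KeyError
# when the name is not a key in the dictionary.
boy_count = boys.get(name, 0)
girl_count = girls.get(name, 0)
print(name, "boys:", boy_count, "girls:", girl_count)

# The `in` operator tests whether a name is a key in the dictionary.
emma_in_boys = "Emma" in boys
print("Emma in boys?", emma_in_boys)
```

In the notebook, the same lookups should work on the real dictionaries once the import cell has been run.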

#1 Your own name!

  1. Search for your own name. Are there both boys and girls that have your name? Is your name more popular for one group than for the other? (Hint: don't use a for loop for this one.)

#2 Every baby counts

  1. How many boy names and girl names are described in the dataset?
  2. How many boys and girls (actual babies!) are described in the dataset?
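
The two questions above ask for different things: the number of keys versus the sum of the values. A minimal sketch of the distinction, using an invented toy dictionary rather than the real data:

```python
# ASSUMPTION: invented toy data, shaped like the boys/girls dictionaries.
toy = {"Ada": 5, "Bo": 7, "Cy": 12}

# Number of distinct names = number of keys.
n_names = len(toy)

# Number of babies = sum of the per-name counts.
n_babies = sum(toy.values())

print(n_names, "names describing", n_babies, "babies")
```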

#3 A sense of what's common

  1. What is the most common name for each gender in 2021?
  2. What is the least common name?
  3. How often do the least common names occur? (Does your answer to this question bother you? Why?)
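
One possible approach to finding extremes in a dictionary of counts is sketched below on invented toy data. Note that max() with a key function returns only one name even when several are tied, so the sketch collects every name that shares the lowest count.

```python
# ASSUMPTION: invented toy data, shaped like the boys/girls dictionaries.
toy = {"Ada": 5, "Bo": 7, "Cy": 12, "Di": 5}

# max() over the keys, ranked by each key's count.
most_common = max(toy, key=toy.get)

# Several names can share the lowest count, so gather all of them.
lowest = min(toy.values())
least_common = sorted(name for name, count in toy.items() if count == lowest)

print("most common:", most_common)
print("least common:", least_common, "with count", lowest)
```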

#4 She wasn't long for this dataset

  1. What is the longest name in the dataset? How many boy and girl names are exactly that length? What's going on?

#5 Initials to spreadsheets

  1. Make a dictionary girl_initials that says, for each letter of the alphabet (a-z), how many unique girl names begin with that letter.
  2. Do the same for boy names to make boy_initials.
  3. Create a "tab separated values" file that reports the data in girl_initials. Be sure to include descriptive column headers!
  4. Now do the same for boy_initials (be sure to save it into a different file!)
  5. Once you've done this, load up the two files into Google Sheets or Excel.
  6. For every letter, be ready to say whether there are more boy names or girl names.
  7. Play around with graphing and see if you can build some instructive graph that shows us something.

Note: Obviously, you won't be able to include your Google Sheets result in your notebook. That's OK, but be ready to describe what you found!
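
The counting and file-writing steps in this challenge can be sketched as follows. This is only one way to do it, under stated assumptions: toy_girls is invented data, the column names initial and unique_names are hypothetical, every name is assumed to start with a letter a-z, and the file is written to the system's temporary directory so the sketch does not clutter your working folder.

```python
import os
import string
import tempfile

# ASSUMPTION: invented toy data, shaped like the girls dictionary.
toy_girls = {"Ada": 5, "Amy": 9, "Bea": 7}

# Start every letter a-z at zero so letters with no names still appear.
girl_initials = {letter: 0 for letter in string.ascii_lowercase}
for name in toy_girls:
    # Each key is one unique name, so add 1 per name (not the baby count).
    girl_initials[name[0].lower()] += 1

# Hypothetical output location and filename.
path = os.path.join(tempfile.gettempdir(), "toy_girl_initials.tsv")
with open(path, "w") as f:
    # Header row first; "\t" is the tab character separating the columns.
    f.write("initial\tunique_names\n")
    for letter, count in girl_initials.items():
        f.write(f"{letter}\t{count}\n")

# Read the file back to check what was written.
with open(path) as f:
    contents = f.read()
print(contents)
```

Google Sheets and Excel can both import a file in this layout and should split it into two columns on the tab character.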

#6 Name twins

  1. On average, how many "name twins" will a baby born in 2021 have (i.e., how many other children will share their name)?
  2. How many "name twins" will a boy have on average? How about a girl?
  3. Create a list of names where at least 90% of the children with that name are listed as girls. Then do the same for boys.
  4. Now create the same list but only include names that are given to at least 1000 children total. Why are the answers different?
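
The "name twins" average is easy to get subtly wrong, so here is a hedged sketch of one reading of the question, on invented toy data. If a name is given to n babies, each of those n babies has n - 1 name twins. Averaging over babies (rather than over names) therefore means summing n * (n - 1) across all names and dividing by the total number of babies.

```python
# ASSUMPTION: invented toy data; one baby named Ada, three named Bo.
toy = {"Ada": 1, "Bo": 3}

total_babies = sum(toy.values())

# Each of the `count` babies with a given name has (count - 1) twins.
total_twin_links = sum(count * (count - 1) for count in toy.values())

# Average over babies: Ada has 0 twins, each Bo has 2, so (0 + 2 + 2 + 2) / 4.
avg_twins_per_baby = total_twin_links / total_babies

# For contrast, averaging over names weights Ada and Bo equally: (0 + 2) / 2.
avg_twins_per_name = sum(count - 1 for count in toy.values()) / len(toy)

print(avg_twins_per_baby, avg_twins_per_name)
```

The two averages differ because popular names account for more babies, so the per-baby figure gives them more weight. Whether boys and girls who share a name count as twins of each other is a separate choice this sketch does not make.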

#7 Something extra

  1. Discover at least one fact about the names that is not listed above.