Community Data Science Course (Spring 2023)/Week 3 coding challenges

== Baby Names ==

[[File:Being a twin means you always have a pillow or blanket handy.jpg|350px]]

[[Baby Names]]: How many babies were born last year with your name? Is a wider variety of names used for boys or for girls? What are the most popular names used for both boys and girls? This set of coding challenges will use data from the US Social Security Administration on baby names from the last several years to answer these questions and more!

== Goals ==

* Have fun exploring real data on baby names in the US
* Practice manipulating and searching strings
* Practice using dictionaries
* Practice using numbers and doing simple arithmetic

== #0 Setup ==

* Download the following file that contains the project for the week: https://github.com/CommunityDataScienceCollective/babynames-cdsw/archive/refs/heads/master.zip
* Once you have downloaded the file, extract the contents of the file into a folder on your desktop.
* Open the file <code>BabyNames.ipynb</code> as a Jupyter notebook and run the first cell to make sure that it works.


You'll be playing with data from the list of all baby names in the US (used at least five times in a year) from the last several years:

# Right click the following file, click "Save Target as..." or "Save link as...", and save it to your Desktop directory: http://jtmorgan.net/ds4ux/week3/babynames.zip
# The ".zip" extension on the above file indicates that it is a compressed Zip archive. We need to "extract" its contents. To do this, click on "Start", then "Computer", and navigate to your Desktop directory. Find babynames.zip on your Desktop and double-click on it to "unzip" it. That will create a folder called babynames containing several files.

Each of these files begins with this line:

 import ssadata

This imports the <code>ssadata</code> module, a special Python module we created for this project that includes only two things:

* <code>boys</code> - A dictionary where the keys are names of boys and the values are the number of infants born in 2021 who had that particular name.
* <code>girls</code> - A dictionary where the keys are names of girls and the values are the number of infants born in 2021 who had that particular name.
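
As a minimal sketch of how dictionaries shaped like these can be queried, the example below uses two tiny made-up dictionaries, <code>toy_boys</code> and <code>toy_girls</code>, standing in for <code>ssadata.boys</code> and <code>ssadata.girls</code>. The names and counts are invented for illustration and are not real SSA figures, and the capitalization of the keys is an assumption that may not match the real module.

```python
# Hypothetical stand-ins for ssadata.boys and ssadata.girls.
# The names and counts below are invented, not real SSA data.
toy_boys = {"Avery": 120, "Liam": 900, "Noah": 850}
toy_girls = {"Avery": 400, "Emma": 800, "Olivia": 950}

name = "Avery"

# `in` tests whether a key is present; no loop is needed.
in_boys = name in toy_boys
in_girls = name in toy_girls

# .get() returns a default (here 0) when the key is missing,
# instead of raising a KeyError the way toy_boys["Emma"] would.
boy_count = toy_boys.get(name, 0)
girl_count = toy_girls.get(name, 0)
missing_count = toy_boys.get("Emma", 0)

print(name, "boys:", boy_count, "girls:", girl_count)
```

With the real module, the same pattern should apply after replacing the toy dictionaries with the imported ones.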

== Challenges ==

# Search for your own name. Are there both boys and girls that have your name? Is it more popular for one group than for the other? (''Hint: don't use a for loop for this one.'')
# What is the most common name for each gender? 
# What is the least common name? 
# How often do the least common names occur? (Does that bother you?)
# Are there more boys names or girls names? 
## What about names that start with "a"?
## For every letter, tell if there are more boys names or girls names.
# What is the longest name in the dataset?
# How many boys and girls are described in the dataset (i.e., how many boys and girls born in 2021 have names given to at least four others)?
# How many boys names are also girls names? How many girls names are also boys names?
# What is the most popular girls name that is also a boys name?
# Discover at least one fact about the names that is not listed above.
# ''Challenge:'' Plot (in Excel) the number of people who share a name with n other people in the dataset, where n is 4 to 19.
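
Several of the challenges above rely on a handful of general dictionary techniques. The sketch below shows a few of them on invented toy data (<code>toy_boys</code> and <code>toy_girls</code> are hypothetical stand-ins, not the real <code>ssadata</code> dictionaries). It is one possible approach rather than the expected solution, and the real data will need more care, for example when several names tie.

```python
# Hypothetical toy data; names and counts are invented for illustration.
toy_boys = {"Avery": 120, "Liam": 900, "Noah": 850, "Alexander": 300}
toy_girls = {"Avery": 400, "Emma": 800, "Olivia": 950}

# Key with the largest value: max() over the keys, ranked by their counts.
most_common_boy = max(toy_boys, key=toy_boys.get)

# Longest key: rank the keys by their length instead.
longest_boy = max(toy_boys, key=len)

# Total number of babies: add up every value.
total_boys = sum(toy_boys.values())

# Names present in both dictionaries: intersect the two sets of keys.
shared = set(toy_boys) & set(toy_girls)

# Number of distinct names per initial letter (lowercased).
initial_counts = {}
for name in toy_boys:
    letter = name[0].lower()
    initial_counts[letter] = initial_counts.get(letter, 0) + 1

print(most_common_boy, longest_boy, total_boys, shared, initial_counts)
```

Note that <code>max()</code> returns only the first winner it encounters, so it will not reveal ties on its own.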