Editing Community Data Science Course (Spring 2019)

From CommunityData

Latest revision Your text
Line 1: Line 1:
:'''Community Data Science: Programming and Data Science for Communicators'''
:'''Community Data Science: Programming and Data Science for Social Media'''
:'''COMMLD 520 B''' - Department of Communication
:'''COM597I''' - Department of Communication
:'''Location:''' CMU 302
:'''Instructor:''' [http://guyrt.github.com Richard Thomas (Tommy) Guy]  
:'''Instructor:''' [http://guyrt.github.com Richard Thomas (Tommy) Guy]  
:'''Course Website''': We will use Canvas for assignments and [https://canvas.uw.edu/courses/1272567/discussion_topics discussion]. Everything else will be linked on this page.
:'''Course Website''': We will use Canvas for assignments and [https://canvas.uw.edu/courses/1272567/discussion_topics discussion]. Everything else will be linked on this page.
:'''Course Catalog Description:'''
:'''Course Catalog Description:'''


This course will introduce basic programming and data science tools to give students the skills to use data to read, critique, and produce stories and insights. The class will cover the basics of the Python programming language, acquiring and processing public data, and basic tools and techniques for data analysis and visualization. We will focus on gaining access to data and basic data manipulation rather than complex statistical methods. The class will be built around student-designed independent projects and is targeted at students with no previous programming experience.
This course will introduce basic programming and data science tools to give students the skills to use data to answer questions about social media and online communities. The class will cover the basics of the Python programming language, acquiring and processing public data, and basic tools and techniques for data analysis and visualization. We will focus on gaining access to data and basic data manipulation rather than complex statistical methods. The class will be built around student-designed independent projects and is targeted at students with no previous programming experience.


== Overview and Learning Objectives ==
== Overview and Learning Objectives ==
<div style="float:right;">__TOC__</div>
<div style="float:right;">__TOC__</div>


In a world that is increasingly driven by software and data, developing a basic level of fluency with programming and the basic tools of data analysis is a crucial skill. This course will introduce basic programming and data science tools to give students the skills to operate in a data-driven environment.
In a world that is increasingly driven by software and data, developing a basic level of fluency with programming and the basic tools of data analysis is a crucial skill. This course will introduce basic programming and data science tools to give students the skills to use data to answer questions about social media and online communities.


In particular, the class will cover the basics of the Python programming language, an introduction to web APIs, and basic tools and techniques for data analysis and visualization. To cover an end-to-end data analysis project efficiently, we will focus on publicly available data sets from the United States Government and the City of Seattle. Our goal is to enable you to gather and analyze data from any available source, but there are often subtle differences between data providers, and I would rather we see the full process once than get bogged down in data collection. Time will also be reserved to cover data access for popular social media platforms including Twitter.
In particular, the class will cover the basics of the Python programming language, an introduction to web APIs, and basic tools and techniques for data analysis and visualization. To cover an end-to-end data analysis project efficiently, we will focus on publicly available data sets from the United States Government and the City of Seattle. Our goal is to enable you to gather and analyze data from any available source, but there are often subtle differences between data providers, and I would rather we see the full process once than get bogged down in data collection. Time will also be reserved to cover data access for popular social media platforms including Twitter.
Line 17: Line 16:
As part of the class, participants will learn to write software in Python to collect data from web APIs and process that data to produce numbers, hypothesis tests, tables, and graphical visualizations that answer real questions. The class will be built around student-designed independent projects. Every student will pick a question or issue they are interested in pursuing in the first week and will work with the instructor to build from that question toward a completed analysis of data that the student has collected using software they have written.
As part of the class, participants will learn to write software in Python to collect data from web APIs and process that data to produce numbers, hypothesis tests, tables, and graphical visualizations that answer real questions. The class will be built around student-designed independent projects. Every student will pick a question or issue they are interested in pursuing in the first week and will work with the instructor to build from that question toward a completed analysis of data that the student has collected using software they have written.


This is not a computer science class and I am not going to be training you to become professional programmers. This introduction to programming is intentionally quick and dirty and is focused on what you need to get things done. We will focus on effectively answering questions from public data sets by writing your own software and by managing and communicating more effectively with programmers.
This is not a computer science class and I am not going to be training you to become professional programmers. This introduction to programming is intentionally quick and dirty and is focused on what you need to get things done. We will focus on effectively answering questions about social media by writing your own software and by managing and communicating more effectively with programmers.




Line 48: Line 47:


* I expect you to come to class every day ''with your own laptop''. Windows, Mac OS and Linux are all fine but an iPad or Android tablet is not going to cut it. We're going to install software during the class and you'll be working on projects for homework so please bring the same laptop each time. If for some reason your laptop dies mid-course, please contact me so we can get your new one up to speed.
* I expect you to come to class every day ''with your own laptop''. Windows, Mac OS and Linux are all fine but an iPad or Android tablet is not going to cut it. We're going to install software during the class and you'll be working on projects for homework so please bring the same laptop each time. If for some reason your laptop dies mid-course, please contact me so we can get your new one up to speed.
* If you need access to a computer, please reach out to me as soon as possible. The Department has laptops you can borrow for the course, but it's important to have that laptop in the first week.
* I can be reached at the following: richardtguy84@gmail.com or guyrt@uw.edu (it all flows to the same place). Email is generally the easiest way to reach out, but Google Hangouts at richardtguy84 will also work. Like many of you, I work 9-5 but I commit to responding to any email within 24 hours of receipt and generally faster than that.
* I can be reached at the following: richardtguy84@gmail.com or guyrt@uw.edu (it all flows to the same place). Email is generally the easiest way to reach out, but Google Hangouts at richardtguy84 will also work. Like many of you, I work 9-5 but I commit to responding to any email within 24 hours of receipt and generally faster than that.


Line 60: Line 58:




In this assignment, you should identify an area of interest, at least 2 sources with relevant data, and at least 3-4 questions that you plan to explore. We will discuss appropriate data sources for your project in the first and second week of the course. I am hoping that each of you will pick an area that you are intellectually committed to and invested in (e.g., in your business or personal life). You will be successful if you describe the scope of the problem and explain why you think the data sources you've identified are relevant.  
In this assignment, you should identify an area of interest, at least 2 sources with relevant data, and at least 3-4 questions that you plan to explore. I am hoping that each of you will pick an area that you are intellectually committed to and invested in (e.g., in your business or personal life). You will be successful if you describe the scope of the problem and explain why you think the data sources you've identified are relevant.  


   
   
Line 67: Line 65:
=== Final Project Proposal ===  
=== Final Project Proposal ===  
:'''Maximum Length:''' 1500 words (~5 pages)
:'''Maximum Length:''' 1500 words (~5 pages)
:'''Due Date:''' Week 8
:'''Due Date:''' Week 7


This proposal should focus on two questions:
This proposal should focus on two questions:
Line 107: Line 105:
Finally, you should also share with me the full Python source code you used to collect the data as well as the data set itself. Your code alone will not form a large portion of your final grade. Rather, I will focus on the degree to which you have been successful at answering the ''substantive'' questions you have identified.
Finally, you should also share with me the full Python source code you used to collect the data as well as the data set itself. Your code alone will not form a large portion of your final grade. Rather, I will focus on the degree to which you have been successful at answering the ''substantive'' questions you have identified.


Visualization is critical to storytelling, so 25% of your grade for this project will be determined by the visualizations and tables in your report. Good visualizations should "stand alone" and motivate the core results in your paper all by themselves. A good question to keep in mind is "could I tell this story with the visualizations and a tweet?"
At least 25% of your grade for this project will be determined by the visualizations and tables in your report. Good visualizations should "stand alone" and motivate the core results in your paper all by themselves. A good question to keep in mind is "could I tell this story with the visualizations and a tweet?"


==== Presentation ====
==== Presentation ====
Line 133: Line 131:


Please do not share answers to challenges before midnight on Sunday so that everybody has a chance to work through answers on their own. After midnight on Sunday, you are all welcome and encouraged to share your solutions and/or to discuss different approaches.  We will discuss the coding challenges for a short period of time at the beginning of each class.
Please do not share answers to challenges before midnight on Sunday so that everybody has a chance to work through answers on their own. After midnight on Sunday, you are all welcome and encouraged to share your solutions and/or to discuss different approaches.  We will discuss the coding challenges for a short period of time at the beginning of each class.
== Grades ==
Assignments will accrue to your final grade in the following way:
* 10% will be class participation, including attendance, participation in discussions and group work, and significant effort towards weekly assignments.
* 5% will be the Final Project Idea.
* 10% will be the Final Project Proposal.
* 50% will be the Final Project write up including visualizations.
* 25% will be your Final Presentation including your slides and presentation.


== Schedule ==
== Schedule ==
Line 149: Line 136:
'''This section will be updated weekly'''  This section will be modified throughout the course to introduce the week's material and any hand-ins. Check back in weekly.
'''This section will be updated weekly'''  This section will be modified throughout the course to introduce the week's material and any hand-ins. Check back in weekly.


=== Week 1: April 3 ===
=== Week 1: TBD ===


'''Readings:'''
'''Readings:'''
Line 156: Line 143:


* Class overview and expectations — We'll walk through this syllabus.
* Class overview and expectations — We'll walk through this syllabus.
* [[Community_Data_Science_Course/Day_1_Exercise|Day 1 Exercise]] — You'll install software including the Python programming language and run through a series of exercises.
* [[Community_Data_Science_Course_%28Spring_2017%29/Day_1_Exercise|Day 1 Exercise]] — You'll install software including the Python programming language and run through a series of exercises.
* [[Community_Data_Science_Course_(Spring_2017)/Day_1_Tutorial|Day 1 Tutorial]] — You'll work through a self-guided tutorial introducing you to some basic concepts. When you're done, you'll meet with me and I'll check you off.
* [[Community_Data_Science_Course_%28Spring_2017%29/Day_1_Tutorial|Day 1 Tutorial]] — You'll work through a self-guided tutorial introducing you to some basic concepts. When you're done, you'll meet with me and I'll check you off.


* A few interesting links we discussed in class are [[Community_Data_Science_Course_%28Spring_2019%29/DataSources|here]]
* A few interesting links we discussed in class are [[Community_Data_Science_Course_%28Spring_2017%29/DataSources|here]]
* Hints
* Hints
** For exercise 5, look at chapter 3 of the textbook. This introduces "if" statements.
** For exercise 5, look at chapter 3 of the textbook. This introduces "if" statements.
Line 168: Line 155:
* Have written your first program in the Python language.
* Have written your first program in the Python language.


=== Week 2: April 10 ===
'''Assignment Due (nothing to turn in):'''
Read chapters 2 and 3 of Python for Everyone:
* Chapter 2, Variables
* Chapter 3, Conditionals
Finish the setup, tutorial and Codecademy portions of the week 01 exercises.
Do the Tip Calculator exercise in Codecademy. You can access this exercise after you finish the first 14 exercises.
'''Class Schedule:'''
* Discuss a successful final project from last year.
* [[Community_Data_Science_Course_%28Spring_2019%29/Day_2_Lecture|Lecture notes]]
* Review material from last week: variables, assignments, if statements
* Introduce new material: loops and lists
* Project time — We'll begin working on the [[wordplay]] independent projects independently or in small groups.
Here are your [[Community_Data_Science_Course_(Spring_2019)/Day_2_Coding_Challenges|Exercises]]
'''By the end of class you will:'''
* Have written a program with loops and lists.
* Have a better understanding of the expectations for your final project, and be ready to hand in your initial assignment.
=== Week 3: April 17 ===
'''Assignment Due:'''
Final project idea.  Turn in on [https://canvas.uw.edu/courses/1272567/assignments/4788468 Canvas].
Finish Wordplay examples
Reading
* Read chapters 4 and 5 of Python for Informatics:
** Functions (this is mostly new)
** Iteration (this is mostly review)
'''Course plan:'''
* Go over last week's assignment.
* Dictionaries and aggregations [[Community Data Science Course (Spring 2019)/Day 3 Notes|Day 3 Notes]]
* A break! Let's really aim for 7:30 this time.
* Discuss average, median using the wordplay data.
* Project time — We'll begin working on a series of projects based on the [http://mako.cc/teaching/2015/cdsw-autumn/babynames.zip Baby names] project.
* [[Community Data Science Course (Spring 2019)/Day 3 Coding Challenges|Day 3 Coding Challenges]]
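
The dictionary aggregation and the average and median discussion from this week's plan can be sketched roughly as follows. This is only an illustration: the word list and variable names are made up for the example and are not the actual wordplay data or the code used in class.

```python
from statistics import mean, median

# Hypothetical stand-in for the wordplay word list used in class.
words = ["data", "science", "data", "python", "community", "data", "python"]

# Aggregate with a dictionary: map each word to the number of times it appears.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Summarize word lengths with an average and a median.
lengths = [len(word) for word in words]
average_length = mean(lengths)
median_length = median(lengths)

print(counts)
print(average_length, median_length)
```

The median is less sensitive than the average to a few unusually long words, which is one reason to look at both.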
'''Resources:'''
* [[Python_data_types_cheat_sheet]] A cheat sheet with everything we've covered in class so far including today.
=== Week 4: April 24 ===
'''Assignment Due:'''
Finish Baby Names examples.
Reading
* Read chapters 10 and 8 of Python for Informatics: Dictionaries and Files.
'''Course Plan'''
* Let's discuss two visualizations I found.
* Discuss week of May 8. I'm in North Carolina.
* Go over last week's assignment.
* Discuss histograms in python, and build a few.
* Project time - We'll reuse the babynames code.
* [[Community Data Science Course (Spring 2019)/Day 4 Coding Challenges|Day 4 Coding Challenges]]
=== Week 5: May 1 ===
'''Assignment Due:'''
Turn in (on Canvas!) a solution to this problem:


List '''how many babies''' were born that share a name with 4, 6, 7, 8, ..., 19 other babies. Also, list how many babies share names with more than 20 other babies under the key "common".
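
One possible way to structure this kind of count is sketched below. It is not a reference solution. It assumes you have already built a dictionary mapping each name to the number of babies given that name; <code>name_counts</code>, <code>babies_by_shared_count</code>, and the sample data are hypothetical names invented for this illustration, and the sketch does not restrict its output to the specific values listed in the problem.

```python
from collections import defaultdict

def babies_by_shared_count(name_counts, common_threshold=20):
    """Group babies by how many other babies share their name.

    name_counts is assumed to map a name to the number of babies with it.
    """
    result = defaultdict(int)
    for name, count in name_counts.items():
        others = count - 1  # every baby shares the name with count - 1 others
        key = "common" if others > common_threshold else others
        result[key] += count  # add babies, not names
    return dict(result)

# Made-up sample data, not the real baby names file.
sample = {"Ada": 5, "Grace": 5, "Linus": 8, "Emma": 300}
print(babies_by_shared_count(sample))
```

Note that the sketch adds <code>count</code> rather than 1 for each name, because the question asks how many babies fall into each group, not how many names.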
'''Course Plan'''
* Let's discuss week of May 8. (Doodle poll results)
* Go over last week's assignment and review histograms.
* Discuss APIs and downloading data from the internet. Refer to [[Community Data Science Course (Spring 2019)/Day 5 Notes|Day 5 Notes]]
* Spend time on [[Community Data Science Course (Spring 2019)/Day 5 Coding Challenges|Day 5 Coding Challenges]]
=== Week 7: May 15 ===
'''Course Plan'''
* Let's discuss remaining schedule
* Discuss data downloading and cleaning. Refer to [[Community Data Science Course (Sprint 2019)/Day 7 Notes|Day 7 Notes]]
* We will be discussing this data set: https://data.seattle.gov/Transportation/Collisions/vac5-r8kk
* Spend time on [[Community Data Science Course (Spring 2019)/Day 7 Coding Challenges|Day 7 Coding Challenges]] which are group challenges.
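
As a rough illustration of what cleaning downloaded data can involve, here is a minimal sketch using Python's built-in <code>csv</code> module. The column names (<code>SEVERITY</code>, <code>INJURIES</code>, <code>WEATHER</code>) and the sample rows are assumptions invented for this example; they are not taken from the Seattle collisions data set, so check the real column names before adapting it.

```python
import csv
import io

# Made-up sample in CSV form; the real data set has different columns.
raw = """SEVERITY,INJURIES,WEATHER
Injury,2,Rain
Property Damage,,Clear
Injury,1,
 property damage ,0,Clear
"""

def clean_rows(text):
    """Normalize text fields and turn missing values into explicit markers."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        severity = row["SEVERITY"].strip().title()  # fix spacing and case
        # An empty cell becomes None instead of crashing int().
        injuries = int(row["INJURIES"]) if row["INJURIES"].strip() else None
        weather = row["WEATHER"].strip() or "Unknown"
        cleaned.append(
            {"severity": severity, "injuries": injuries, "weather": weather}
        )
    return cleaned

rows = clean_rows(raw)
for row in rows:
    print(row)
```

Keeping missing values as <code>None</code> or <code>"Unknown"</code>, rather than silently dropping rows, makes it easier to report how much of the data was incomplete.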
=== Week 8: May 22 ===
'''Assignment Due:'''
Final Project Proposal. Canvas link [https://canvas.uw.edu/courses/1272567/assignments/4821879 here].
'''Course Plan'''
* Discuss pivot tables in Excel
* [[Community Data Science Course (Spring 2019)/Day 8 notes|Day 8 notes]]
=== Week 9: May 29 ===
'''Assignment Due:'''
Nothing! But I hope you are making good progress.
'''Course Plan'''
* Follow up from last week: let's discuss inference and A/B testing.
** [https://www.exp-platform.com/Documents/2016-11BestRefutedCausalClaimsFromObservationalStudies.pdf Examples of bad observational studies]
* Visualization dos and don'ts. We'll discuss the European Environment Agency's [https://www.eea.europa.eu/data-and-maps/daviz/learn-more/chart-dos-and-donts list of advice for making charts]. '''I will refer to this guide as I grade your final projects.'''
* Two options for remainder of class. You can work through this introductory guide to visualization in python or you can work on your final project. I'll be here to answer any questions.
'''Optional visualization in python tutorial'''
Self-guided visualization tutorial in Python. [https://raw.githubusercontent.com/guyrt/teaching/master/2019/Com520B/VisualizationNotebook.ipynb Download here]. Save the file in a new directory on your desktop and open it with Jupyter Notebook.
If you are on Windows, you may run into an issue with missing path variables. [https://stackoverflow.com/questions/52821162/jupyter-notebook-failed-to-load-dll This SO post helped me solve it.]
=== Week 10: June 5 ===
'''Assignment Due:'''
Final Project Presentation!


== Administrative Notes ==
== Administrative Notes ==
Line 303: Line 161:
=== Attendance ===
=== Attendance ===


We understand that students in a professional program will now and again have work or personal conflicts. Students are expected to communicate these to faculty well in advance so that arrangements can be made for making up the work that was missed. It is the students' responsibility to seek out support from classmates for notes, handouts, and other information.
Attendance in class is expected of all participants. This class is going to move very quickly and the things we learn will build on the things we've covered the week before. ''It will be extremely difficult to miss classes.'' If you need to miss class for any reason, please contact the instructor ahead of time (email is best). Multiple unexplained absences will likely result in a lower grade or (in extreme circumstances) a failing grade. In the event of an absence, you are responsible for obtaining class notes, handouts, assignments, etc.


=== Office Hours ===
=== Office Hours ===


Because this is an evening degree program and I understand you have busy schedules that keep us away from campus during the day, I will not hold regular office hours. In general, I am very happy to have a Skype or Hangouts session where we can share our screens and discuss your questions. I'm also happy to meet in the evenings in the University District. Please contact me by email to arrange a meeting.
Because this is an evening degree program and I understand you have busy schedules that keep us away from campus during the day, I will not hold regular office hours. In general, I am very happy to have a Skype or Hangouts session where we can share our screens and discuss your questions. Please contact me by email to arrange a meeting.


=== Disability Accommodations Statement ===
=== Disability Accommodations Statement ===


Your experience in this class is important to me. If you have already established accommodations with Disability Resources for Students (DRS), please communicate your approved accommodations to me at your earliest convenience so we can discuss your needs in this course.
To request academic accommodations due to a disability, please contact Disability Resources for Students, 448 Schmitz, 206-543-8924/V, 206-543-8925/TTY. If you have a letter from Disability Resources for Students indicating that you have a disability that requires academic accommodations, please present the letter to me so we can discuss the accommodations that you might need for the class. I am happy to work with you to maximize your learning experience.
 
If you have not yet established services through DRS, but have a temporary health condition or permanent disability that requires accommodations (conditions include but are not limited to: mental health, attention-related, learning, vision, hearing, physical or health impacts), you are welcome to contact DRS at 206-543-8924 or uwdrs@uw.edu or https://disability.uw.edu.
 
DRS offers resources and coordinates reasonable accommodations for students with disabilities and/or temporary health conditions.  Reasonable accommodations are established through an interactive process between you, your instructor(s) and DRS.  It is the policy and practice of the University of Washington to create inclusive and accessible learning environments consistent with federal and state law.
 
 
=== Incomplete ===
 
An Incomplete may be given only when the student has been in attendance and has done satisfactory work to within two weeks of the end of the quarter and has furnished proof satisfactory to the instructor that the work cannot be completed because of illness or other circumstances beyond the student’s control.
To obtain credit for the course, a student must successfully complete the work and the instructor must submit a grade. In no case may an Incomplete be converted into a passing grade after a lapse of two years or more. An incomplete received by the graduate student does not automatically convert to a grade of 0.0 but the “I” will remain as a permanent part of the student’s record.
 


=== Comm Lead Electronic Mail Standards of Conduct ===
=== Comm Lead Electronic Mail Standards of Conduct ===
Line 355: Line 202:
=== Academic Misconduct ===
=== Academic Misconduct ===
   
   
Comm Lead is committed to upholding the academic standards of the University of Washington’s Student Conduct Code. It is the responsibility of each UW student to know and uphold all tenets of the code, including those regarding integrity in academic conduct (http://www.washington.edu/admin/rules/policies/SGP/SPCH209.html#7). In this course, avoiding plagiarism, falsification of fieldwork data, and inappropriate collaboration are particularly important. All assignments will be reviewed for integrity. All rules regarding academic integrity extend to electronic communication and the use of online sources. All instances of suspected dishonesty or misconduct will be reported in accordance with UW policy, and may result in failure and removal from this course. If a faculty member suspects a violation of the Student Conduct Code by one of their students, the instructor will notify the student directly and file a report with the College of Arts and Sciences Student Conduct Office, as required by the College. Comm Lead faculty (indeed, all UW faculty) may neither attempt to reach a mutually agreeable resolution with a student suspected of academic misconduct NOR unilaterally lower a student’s grade based on academic misconduct without taking the necessary steps outlined above.
Comm Lead is committed to upholding the academic standards of the University of Washington’s Student Conduct Code. If I suspect a student violation of that code, I will first engage in a conversation with that student about my concerns.
 
If we cannot successfully resolve a suspected case of academic misconduct through our conversations, I will refer the situation to Anita Crofts, Comm Lead Associate Director of Academic Affairs. The Comm Lead Associate Director of Academic Affairs, in consultation with the Comm Lead Director, can then work with the COM Chair to seek further input and, if necessary, move the case up to the Dean.
While evidence of academic misconduct may result in a lower grade, Comm Lead faculty (indeed, all UW faculty) may '''not''' unilaterally lower a grade without taking the necessary steps outlined above.
   
   
In closing, Comm Lead  students are expected to:
In closing, Comm Lead  students are expected to: