Community Data Science Workshops (Winter 2020)/Reflections


 * If you're interested in putting on your own CDSW, you should also see our reflections from Fall 2014.

Some of this text was copied from the Fall 2014 reflections.

Over three weekends in Winter 2020, a group of volunteers organized the Community Data Science Workshops (Winter 2020), the latest in a series of four-session workshops designed to introduce absolute beginners to some of the basic tools of programming and of analyzing data from online communities. The Winter 2020 events were held between January 17 and February 15, 2020, at the University of Washington in Seattle.

This page hosts reflections on organization and curriculum and is written for anybody interested in organizing their own CDSW — including the authors!


 * add comments on overall success here after the last session

If you have any questions or issues, you can contact Benjamin Mako Hill directly or email the whole group of mentors at cdsw-au2020-mentors@uw.edu.

Structure
The Community Data Science Workshops (Winter 2020) consisted of four sessions:


 * Session 0 (Friday January 17th): Setup and Programming Practice
 * Session 1 (Saturday January 18th): Introduction to Python
 * Session 2 (Saturday January 15th): Building data sets using web APIs
 * Session 3 (Saturday February 15th): Data analysis and visualization

Our organization and the curriculum for Sessions 0 and 1 were originally borrowed from the Boston Python Workshop (BPW). Since then, our curriculum has diverged quite a bit as we've improved it and tailored it to the specific learning goals of our sessions.

Session 0 was a three-hour evening session to install software. The other three sessions were day-long sessions (10am to 4pm) that followed this schedule:


 * Morning, 10am-12:20pm: A lecture of roughly two hours
 * Lunch, 12:20pm-1pm
 * Afternoon, 1pm-3:30pm: Practice working on projects in 3 breakout sessions
 * Wrap-up, 3:30pm-4pm: Wrap-up, next steps, and upcoming opportunities

We collected detailed feedback from participants at three points using the following Google forms (these are copies):


 * add surveys here

We used this feedback for two purposes. First, we evaluated what worked well and what did not. Second, we got a sense of what students wanted to learn in the next session and which afternoon sessions they might find interesting.

Participants
We had n mentors who attended at least one of the sessions, and at least n mentors at each session. Many of our mentors were UW students in more technical departments like Computer Science and Engineering and Human Centered Design & Engineering. Perhaps half of them worked outside of the university as software developers.

About n people applied to attend the sessions. We selected participants based on programming skill (to ensure that all attendees were complete beginners) and enthusiasm. We then selected randomly to maintain a learner-to-mentor ratio of between 4 and 5. We admitted n participants, n of whom listed a UW affiliation. Affiliations listed by at least three people include the following:

other affiliations?

Retention between sessions 0 and 1 was nearly 100%. Retention between sessions 1 and 2, and between sessions 2 and 3, was roughly n%. That left us with perhaps n% retention between session 0 and session 3.

Once again, quite a large number of the people who applied were already skilled programmers. We're still not sure why they apply, since we think it is very clear that the workshops are for absolute beginners. Perhaps people just want more exposure to data science?

Once again, the constraint on scaling the workshop was the number of mentors. Every mentor we add means that the workshop can accommodate four more participants.

One suggestion was to admit participants who have some programming skills, especially for the second and third workshops (given predictable rates of retention). There was no consensus among the organizers and mentors on this approach. Some preferred admitting more newcomers and investing more in them.

Organization

 * We had a little less time to prepare this time. It may be helpful to streamline what needs to happen before the first session and to divide up the work.
 * We have an organization TODO list shared in Google Docs.
 * Participant recruitment and selection, along with room reservations, should be done early in the preparation.

Morning Lectures
Benjamin Mako Hill gave the lectures in Sessions 1 and 3, and Tommy Guy gave the lecture in Session 2. An important future goal is getting other people to give lectures. Different faces, perspectives, and backgrounds help communicate the breadth of interest here. Mako does not want to be the only one giving these lectures.

Our biggest challenge with growing the workshops was physical space for the lectures. At UW, rooms that can hold more than 100 people are almost exclusively lecture halls. Their layout makes it almost impossible for mentors to physically reach students to help them debug and solve problems.

We reserved a lecture hall that fit 200 people and seated 100 students in alternating rows, so that mentors could at least reach each person. This worked reasonably well, although it was still suboptimal.

People continue to want a record of the lectures. At the very minimum, we should make sure to turn on console logging so that we can post the logs after each lecture. Mako recorded the first lecture with recordmydesktop(?).

Afternoon Sessions
Projects are done in breakout sessions across three rooms. The general problem was that we insisted on one teacher per topic, and the topics were very unequal in popularity. Next time, we will likely plan for multiple teachers in multiple rooms for topics we know will be more popular.

Several changes we hope to make include:


 * add notes

Session 0: Python Setup
The goal of this session was to get participants set up with Python and started on some Python basics. We substantially changed the curriculum originally used by BPW, switching to Continuum's Anaconda instead of installing Python directly from python.org. The result was staggering. Not a single person reported "many problems with set-up"; every respondent reported either "no problems" or "a few problems."

That said, we had several major concerns:


 * add notes

Changes for next time include:


 * add notes

Session 1: Introduction to Python
The goal of this session was to teach the basics of programming in Python. The basic curriculum was originally built on the Boston Python Workshop curriculum, which has been used many times and is well tested. Unsurprisingly, it worked well for us too.

Afternoon sessions
We felt that the new Baby Names project was excellent, and feedback was overwhelmingly positive. One reason is that it includes both dictionaries and lists of names (in the form of  methods).
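The sketch below gives future organizers a sense of the dictionary-and-list skills involved. It counts names from a list into a dictionary and then ranks them.

This is a minimal illustration only. The list of names and the variables `names`, `name_counts`, and `ranked` are hypothetical. They are not taken from the actual Baby Names project materials.

```python
# Hypothetical list of names (made-up data for illustration only).
names = ["Ava", "Liam", "Ava", "Noah", "Liam", "Ava"]

# Count how often each name appears, using a dictionary
# whose keys are names and whose values are counts.
name_counts = {}
for name in names:
    if name in name_counts:
        name_counts[name] = name_counts[name] + 1
    else:
        name_counts[name] = 1

# Turn the dictionary's keys into a list sorted by count, most common first.
ranked = sorted(name_counts, key=name_counts.get, reverse=True)
print(ranked)  # ['Ava', 'Liam', 'Noah']
```

The explicit `if`/`else` is deliberately more verbose than `name_counts.get(name, 0) + 1`. It makes the "is this key already present?" step visible to beginners.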

Suggestions based on feedback include:


 * add notes

Issues that students had

 * Variable names on loops
 * Accessing and modifying dictionaries and lists
 * Dictionary keys versus values
 * Variable names versus strings
 * Variable names versus what they contain -- the name itself does not explain the content.
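Several of the confusions listed above can be demonstrated in a few lines of Python. This is a hypothetical snippet that mentors might use for quick demonstrations. The dictionary and its contents are made up.

```python
# Hypothetical dictionary mapping names to counts (made-up data).
counts = {"Ava": 3, "Liam": 2}

# Variable name versus string: counts refers to the dictionary,
# while "counts" in quotes is just a piece of text.
print(type(counts))    # <class 'dict'>
print(type("counts"))  # <class 'str'>

# Keys versus values: looping over a dictionary yields its keys.
for name in counts:
    print(name, counts[name])  # look up each value using its key

# Loop variable names are arbitrary: "x" behaves exactly like "name",
# but a descriptive name makes the code easier to follow.
for x in counts:
    print(x)

# A name does not explain its contents: "name" now holds a number,
# which Python allows even though it is confusing to read.
name = counts["Ava"]
print(name)  # 3
```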

Session 2: Learning APIs
The goal of this session was to cover four things:
 * what web APIs are;
 * how they work (making HTTP requests and receiving data back);
 * how to understand JSON data;
 * how to use common web APIs from Wikipedia and Twitter.
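The request-and-parse cycle described above can be sketched with Python's standard library. The sketch assumes the MediaWiki Action API at `https://en.wikipedia.org/w/api.php` with `action=query&prop=info`. It is not necessarily what the session's materials used.

The response text is a made-up, abridged stand-in, so the example runs without network access. The page ID `12345` and the `User-Agent` string are hypothetical.

```python
import json
from urllib.parse import urlencode

# Base URL of the MediaWiki API that serves English Wikipedia.
API_URL = "https://en.wikipedia.org/w/api.php"

# Query parameters: ask for basic info about one page, returned as JSON.
params = {
    "action": "query",
    "prop": "info",
    "titles": "Seattle",
    "format": "json",
}

# An HTTP GET request is just a URL with the parameters encoded into the query string.
request_url = API_URL + "?" + urlencode(params)
print(request_url)

# To actually send the request (requires network access):
# from urllib.request import Request, urlopen
# req = Request(request_url, headers={"User-Agent": "cdsw-example/0.1"})  # hypothetical agent string
# with urlopen(req) as response:
#     raw = response.read().decode("utf-8")

# Hypothetical, abridged response text so the example runs offline.
raw = '{"query": {"pages": {"12345": {"pageid": 12345, "ns": 0, "title": "Seattle"}}}}'

# json.loads turns JSON text into nested Python dictionaries and lists.
data = json.loads(raw)
for page_id, page in data["query"]["pages"].items():
    print(page_id, page["title"])
```

Keeping the network call separate from the parsing step lets learners practice with JSON before they have to deal with HTTP errors. The commented-out request sets a `User-Agent` header because Wikimedia asks API clients to send a descriptive one.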

Morning lecture
The morning lecture was given by Tommy Guy.

Afternoon sessions
There were three parallel afternoon sessions on add topic1, add topic2 and add topic3.

Topic 1:

Topic 2:

Topic 3:

Session 3: Data Analysis and Visualization
The goal of the lecture was to walk people through the real mess of writing code from scratch. It focused on a single example of code that builds a dataset from Wikipedia.

Afternoon sessions
We ran n sessions this time.

Topic 1

Topic 2

General Feedback

 * add notes

Mentorship

 * add notes

More Projects or Better Projects

 * add notes

Budget
We spent a total of $x on the CDSW. We spent approximately $94 per session on coffee (5 boxes), or about $282 across 3 sessions. About $x of this funded food and refreshments during post-session meetings among the mentors.

The rest (the large majority) was spent on food. Because we were better able to model retention this time around, we did a much better job of ordering the "right" amount of food. We ordered:


 * Session 1: Pizza from Jet City Pizza ($658 for pizza plus a $45 tip). This time, we ordered about 10 boxes too many, especially of the special-diet pizza. Next time, we should perhaps ask accepted participants to send special diet requests before Session 0. We could also order fewer pizzas with more toppings.
 * Session 2: Indian (four entrees) from Jewel of India?
 * Session 3: Greek food (e.g., salad, hummus, spinach pies, souvlaki) from Costas?

Because Mika did the ordering, everybody ate vegetarian. At least one person complained about the lack of meat.