The Community Data Science Collective is an interdisciplinary research group made up of faculty and students at the University of Washington Department of Communication, the Northwestern University Department of Communication Studies, the Carleton College Computer Science Department, and the Purdue University School of Communication.

CDSC members at the CDSC group retreat in October 2022 in Seattle. In a spiral starting from the top: Mako, Carl, Jeremy, Nick, Salt, Hazel, Yibin, Regina, Kaylea, Ellie, Aaron, Floor, Sohyeon, Molly, Emilia, Ryan, Charlie, Dyuti. Check out our other group photos!

We are social scientists applying a range of quantitative and qualitative methods to the study of online communities. We seek to understand both how and why some attempts at collaborative production, like Wikipedia and Linux, build large volunteer communities and high-quality work products.

Our research is particularly focused on how the design of communication and information technologies shapes fundamental social outcomes with broad theoretical and practical implications, such as an individual’s decision to join a community or contribute to a public good, or a group’s ability to make decisions democratically.

Our research is deeply interdisciplinary, most frequently consists of “big data” quantitative analyses, and lies at the intersection of communication, sociology, and human-computer interaction.

In addition to research, we teach classes and run workshops. Some of that work is coordinated on this wiki. A more detailed list of workshops and teaching material on this wiki is on our Workshops and Classes page. On this page, we list only ongoing classes and workshops.

Purdue Courses

  • [Summer 2023] Advanced Computational Communication Methods – In this class, we will investigate a number of more advanced methods or concepts not covered in the Intro to Programming and Data Science course, including SQL, computational text analysis, creating reproducible projects, and advanced visualization.
  • [Spring 2023] Quantitative Methods for Communication – This course introduces students to a range of social-scientific research methods used to investigate human communication, with a focus on research design, statistics, and statistical software. Taught by Jeremy Foote and Hazel Chiu.

University of Washington Courses

Public Data Science Workshops

Community Data Science Workshops – The Community Data Science Workshops (CDSW) are a series of workshops designed to introduce absolute beginners to some of the basic tools of programming and of analyzing data from online communities. The CDSW were held six times in Seattle between 2014 and 2020. So far, more than 100 people have volunteered their weekends to teach more than 500 people to program in Python, to build datasets from Web APIs, and to ask and answer questions using these data.

Research Resources

If you are a member of the collective, perhaps you're looking for CommunityData:Resources, which includes details on email, TeX templates, documentation on our computing resources, and more.

About This Wiki

This wiki is open to the public and hackable by all, but it mostly contains information that will be useful to collective members, their collaborators, people enrolled in their projects, or people interested in building off of their work. If you're interested in making a change or creating content here, generally feel empowered to Be Bold. If things don't fit, somebody who watches this wiki will be in touch.

This is mostly a standard MediaWiki installation, although there are a few things to know:

  • There's a CAPTCHA enabled. If you create an account and then contact any collective member with the username (on or off wiki), they can turn the CAPTCHA off for you.
  • Extension:Math is installed, so you can write math here. Add math by putting TeX inside <math> tags like this: <math>\frac{\sigma}{\sqrt{n}}</math>, and it will render as σ/√n.

Research News

FOSSY Wrap-Up: Kaylea Champion’s Lightning Talk on Undermaintained Packages
Welcome to part 4 of a 7-part series spotlighting the excellent talks we were fortunate enough to host during the Science of Community track at FOSSY 23! Kaylea presented on her new research project to identify how packages come to be undermaintained, in particular investigating assumptions that it’s all about “the old stuff” — old …
— kaylea 2023-09-14
The State of Wikimedia Research, 2022–2023
Wikimania, the annual global conference of the Wikimedia movement, took place in Singapore last month. For the first time since 2019, the conference was held in person. It was attended by over 670 people in person and more than 1,500 remotely. At the conference, Benjamin Mako Hill, Tilman Bayer, and Miriam Redi presented “The State …
— Community Data Science Collective 2023-09-29
FOSSY Wrap-Up: Anita Sarma’s Lightning Talk on Inclusion Bugs
Welcome to part 3 of a 7-part series spotlighting the excellent talks we were fortunate enough to host during the Science of Community track at FOSSY 23! Dr. Anita Sarma gave us an excellent introduction to her and her team’s work on understanding how to make FOSS more inclusive by identifying errors in user interaction …
— kaylea 2023-09-14
FOSSY Wrap-Up: Matt Gaughan’s Kernel Dataset Lightning Talk
Welcome to part 2 of a 7-part series spotlighting the excellent talks we were fortunate enough to host during the Science of Community track at FOSSY 23! Matt Gaughan delivered a rapid introduction to his dataset highlighting the numerous places where the Linux Kernel is using unsafe memory practices. You can watch the talk HERE, …
— kaylea 2023-09-14
FOSSY Wrap-up – Sophia Vargas on Proactive Metrics to Combat Maintainer Burnout
Welcome to part 1 of a 7-part series spotlighting the excellent talks we were fortunate enough to host during the Science of Community track at FOSSY 23! Sophia Vargas presented ‘Can we combat maintainer burnout with proactive metrics?’ In this talk, Sophia takes us through her extensive investigations across multiple projects to weigh the value …
— kaylea 2023-09-14