Main Page


The Community Data Science Collective is an interdisciplinary research group made up of faculty and students at the University of Washington Department of Communication, the Northwestern University Department of Communication Studies, the Carleton College Computer Science Department, and the Purdue University School of Communication.

CDSC members at the CDSC group retreat in October 2022 in Seattle. In a spiral starting from the top: Mako, Carl, Jeremy, Nick, Salt, Hazel, Yibin, Regina, Kaylea, Ellie, Aaron, Floor, Sohyeon, Molly, Emilia, Ryan, Charlie, Dyuti. Check out our other group photos!

We are social scientists applying a range of quantitative and qualitative methods to the study of online communities. We seek to understand both how and why some attempts at collaborative production, like Wikipedia and Linux, build large volunteer communities and high-quality work products.

Our research focuses particularly on how the design of communication and information technologies shapes fundamental social outcomes with broad theoretical and practical implications, such as an individual’s decision to join a community or contribute to a public good, or a group’s ability to make decisions democratically.

Our research is deeply interdisciplinary, most frequently consists of “big data” quantitative analyses, and lies at the intersection of communication, sociology, and human-computer interaction.


In addition to research, we teach classes and run workshops. Some of that work is coordinated on this wiki. A more detailed list of workshops and teaching material on this wiki is on our Workshops and Classes page. On this page, we list only ongoing classes and workshops.

Purdue Courses

  • [Fall 2022] Communication and Social Networks (COM 411) – This class focuses on how the structure of relationships between people influences communication patterns and behavior. This perspective can help us understand a broad set of phenomena, from online communities to friendships to businesses. The course also introduces students to using network visualizations to gain and share insights about network phenomena. Taught by Jeremy Foote.

University of Washington Courses

Public Data Science Workshops

Community Data Science Workshops — The Community Data Science Workshops (CDSW) are a series of workshops designed to introduce some of the basic tools of programming and analysis of data from online communities to absolute beginners. The CDSW have been held six times in Seattle between 2014 and 2020. So far, more than 100 people have volunteered their weekends to teach more than 500 people to program in Python, to build datasets from Web APIs, and to ask and answer questions using these data.

Research Resources

If you are a member of the collective, perhaps you're looking for CommunityData:Resources, which includes details on email, TeX templates, documentation on our computing resources, etc.

About This Wiki

This wiki is open to the public and hackable by all, but it mostly contains information that will be useful to collective members, their collaborators, people enrolled in their projects, or people interested in building on their work. If you're interested in making a change or creating content here, feel empowered to Be Bold. If things don't fit, somebody who watches this wiki will be in touch.

This is mostly a normal MediaWiki although there are a few things to know:

  • There's a CAPTCHA enabled. If you create an account and then contact any collective member with the username (on or off wiki), they can turn the CAPTCHA off for you.
  • Extension:Math is installed, so you can write math here. Put TeX inside <math> tags, like this: <math>\frac{\sigma}{\sqrt{n}}</math>, and the wiki will render it as a typeset formula (here, σ divided by √n).
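
As a slightly fuller sketch of the <math> example above: the TeX below, placed between <math> and </math> on a wiki page, should typeset a labeled equation. Reading the fraction as the standard error of the mean is an assumption made for illustration; the label SE and the bar notation do not come from this page.

```latex
% Goes between <math> and </math> in the wikitext.
% Assumption: \sigma is a standard deviation and n a sample size.
\mathrm{SE}_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
```

If the formula does not render, previewing the page before saving is the quickest way to spot a TeX syntax error.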

Research News

Follow us as @comdatasci on Twitter and on the Fediverse/Mastodon, and subscribe to the Community Data Science Collective blog.

Recent posts from the blog include:

Effects of Algorithmic Flagging on Fairness: Quasi-experimental Evidence from Wikipedia
Many online platforms are adopting machine learning as a tool to maintain order and high quality information in the face of massive influxes of user generated content. Of course, machine learning algorithms can be inaccurate, biased or unfair. How do signals from machine learning predictions shape the fairness of online content moderation? How can …
— Nate TeBlunthuis 2023-03-22
Community Dialogue on Accountable Governance and Data
Our fourth Community Dialogue covered topics on accountable governance and data leverage as a tool for accountable governance. It featured Amy X. Zhang (University of Washington) and recent CDSC graduate Nick Vincent (Northwestern, UC Davis). Designing and Building Governance in Online Communities (Amy X. Zhang) This session discussed different methods of engagement between communities and …
— mollydb 2023-03-09
Literature on Inequality and Discrimination in the Gig Economy
Inequality and discrimination in the labor market is a persistent and sometimes devastating problem for job seekers. Increasingly, labor is moving to online platforms, but labor inequality and discrimination research often overlooks work that happens on such platforms. Do research findings from traditional labor contexts generalize to the online realm? We have reason to think …
— Floor Fiers 2023-02-16
How to cite Wikipedia (better)
— Aaron Shaw 2023-02-07
2022 Year in Review
One of the fun things about being in a large lab is getting to celebrate everyone’s accomplishments, wins, and the good stuff that happens. Here is a brief-ish overview of some real successes from 2022. Graduations and New Positions Our lab gained SIX new grad student members, Kevin Ackermann, Yibin Fang, Ellie Ross, Dyuti Jha, …
— mollydb 2023-01-26