Community Data Science Workshops (Winter 2020)/Resources

Revision as of 23:59, 15 February 2020 by St3f (update altair link)
Resources Winter 2020

This list of resources was created to help CDSW workshop participants continue developing their skills and answer some of the questions that arose during the workshops.

Learning Python

LearnPython.org provides a series of self-guided tutorials for familiarizing yourself with basic Python concepts, syntax, and operations.

You can spend the afternoon session working through these lessons on your own if you want more practice. Mentors will be nearby and ready to help with any questions you might have!

If you would like to run through the same lessons in Jupyter Notebooks rather than on LearnPython.org, you can download notebooks for each of the "Learn the Basics" lessons here. Run them on your computer by unzipping the file, placing the .ipynb files on your desktop, and opening them in Jupyter.
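If you'd rather script the unzipping step, here is a minimal sketch using only Python's standard library. It assumes you have already downloaded the zip file; the filename `learn-the-basics.zip` in the usage note is hypothetical, since the actual download name isn't given here.

```python
import zipfile
from pathlib import Path

def extract_notebooks(zip_path, dest_dir):
    """Extract only the .ipynb files from zip_path into dest_dir and return their paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # skip folder entries, non-notebook files, and macOS metadata folders
            if name.endswith(".ipynb") and not name.startswith("__MACOSX/"):
                # drop any folders inside the archive so notebooks land directly in dest
                target = dest / Path(name).name
                target.write_bytes(archive.read(name))
                extracted.append(target)
    return extracted
```

For example, `extract_notebooks("learn-the-basics.zip", Path.home() / "Desktop")` would put the notebooks on your desktop, where you can open them from Jupyter as described above.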

Scraping Data from the Web

Helena-Lang.org lets you scrape data from websites automatically: you demonstrate the steps in your browser and Helena repeats them for you. It requires no programming and is available as a free Chrome plug-in. The website has a series of tutorials: Tutorials
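Helena needs no code. If you later want to script a simple scraping task in Python yourself, here is a minimal sketch that pulls every link (its `href` and text) out of a page's HTML using the standard library's `html.parser`. This is not how Helena works internally, and real pages often need more targeted parsing than this.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs for every <a href=...> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []    # collected (href, text) pairs
        self._href = None  # href of the <a> tag we are currently inside, if any
        self._text = []    # text fragments seen inside the current <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

To run it on a live page, you could fetch the HTML with `urllib.request.urlopen(url).read().decode("utf-8")` (assuming the page is UTF-8 encoded) and pass the result to `extract_links`.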


Quantitative Data Analysis

Tea-Lang.org lets you write a high-level specification of your data and hypothesis and get back valid statistical test results and explanations. It requires only a little programming and some comfort with Python, and it is available as a free Python package.
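Tea's job is to choose an appropriate test for you from what you declare. To give a sense of what one such test involves, here is a hand-computed Welch's two-sample t statistic using only Python's standard library. This is not Tea's API. It also stops short of a p-value, which needs a t-distribution function the standard library lacks (`scipy.stats.ttest_ind(a, b, equal_var=False)` covers that case).

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # squared standard error of each mean, using the sample variance (n - 1 denominator)
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df
```

With the made-up samples `[1, 2, 3, 4, 5]` and `[2, 3, 4, 5, 6]`, this gives t = -1.0 with 8 degrees of freedom. Part of what Tea adds is deciding whether a test like this one suits your data in the first place.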

Data Visualization

Altair-Viz.github.io lets you write a high-level specification of your data and the visualization you want, and get back the finished chart. It requires some programming and comfort with Python, and it is available as a free Python package.


Finding a dataset

If you are looking for datasets for your projects, here are some potential leads:

  • Do some Google Scholar and general web searches for datasets in your research area. You'll probably be surprised at what's available.
  • Take a look at datasets available in the Harvard Dataverse (a very large collection of social science research data) or one of the other members of the Dataverse network.
  • Look at the collection of social scientific datasets at ICPSR at the University of Michigan (NU is a member). There is an enormous number of very rich datasets.
  • Use the ISA Explorer to find datasets. Keep in mind that the large majority of the datasets it searches are drawn from the natural sciences.
  • The City of Chicago has one of the best data portal sites of any municipality in the U.S. (and better than many federal agencies). There are also numerous administrative datasets released by other public entities (try searching!) that you might find inspiring.
  • FiveThirtyEight.com has published a GitHub repository and an R package with pre-processed and cleaned versions of many of the datasets they use for articles published on their website.
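Many of these sources offer tabular data as CSV downloads. Here is a minimal sketch, using only Python's standard library, for taking a first look at a downloaded file: its column names, its row count, and the most common values in one column. The file name `my_dataset.csv` and column name `some_column` in the usage note are hypothetical placeholders.

```python
import csv
from collections import Counter

def summarize_csv(file_obj, column):
    """Return (column names, row count, three most common values of `column`)."""
    reader = csv.DictReader(file_obj)
    counts = Counter()
    n_rows = 0
    for row in reader:
        n_rows += 1
        counts[row[column]] += 1
    return reader.fieldnames, n_rows, counts.most_common(3)
```

For example:

```python
with open("my_dataset.csv", newline="", encoding="utf-8") as f:
    fields, n_rows, top_values = summarize_csv(f, "some_column")
```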