Community Data Science Workshops (Winter 2020)/Resources: Difference between revisions

Latest revision as of 16:25, 17 March 2020

Resources Winter 2020

This list of resources was created to help CDSW workshop participants continue developing their skills and to answer some of the questions that arose during the workshops.

Scraping Data from the Web

Helena-Lang.org demonstrates how to scrape data from the web automatically. It requires no programming and has a free Chrome plug-in. The website has a series of tutorials available at http://helena-lang.org/demonstration/

Quantitative Data Analysis

Tea-Lang.org lets you write a high-level specification of your data and hypotheses and get back valid statistical test results and explanations. It requires minimal programming and some comfort in Python, and it has a free Python package.

Data Visualization

Altair-Viz.github.io lets you write a high-level specification of your data and the visualization you want, and get back the resulting data visualization. It requires some programming and comfort in Python, and it has a free Python package.

Finding a dataset

If you are looking for available datasets for your projects, here are some potential leads:

Search Google Scholar and the wider web for datasets in your research area. You'll probably be surprised at what's available.
Take a look at datasets available in the Harvard Dataverse (a very large collection of social science research data) or one of the other members of the Dataverse network.
Look at the collection of social scientific datasets at ICPSR at the University of Michigan (NU is a member). There is an enormous number of very rich datasets.
Use the ISA Explorer to find datasets. Keep in mind the large majority of datasets it will search are drawn from the natural sciences.
The City of Chicago has one of the best data portal sites of any municipality in the U.S. (and better than many federal agencies). There are also numerous administrative datasets released by other public entities (try searching!) that you might find inspiring.
FiveThirtyEight.com has published a GitHub repository and an R package with pre-processed and cleaned versions of many of the datasets they use for articles published on their website.

@@ Line 4: / Line 4: @@
 This list of resources was created in order to support CDSW workshop participants to continue to develop their skills or answer some of the questions that arose during the workshops
-== Learning Python ==
-[https://www.learnpython.org/ LearnPython.org] provides a series of self-guided tutorials for familiarizing yourself with basic Python concepts, syntax, and operations.
-You can spend the afternoon session working through these lessons on your own if you want more practice. Mentors will be nearby and ready to help with any questions you might have!
-If you would like to run through the same lessons in Jupyter Notebooks rather than on LearnPython.org, you can download notebooks for each of the "Learn the Basics" lessons [https://communitydata.science/~mako/learnpython_basics_notebooks_2020-01-18.zip here]. Run them on your computer by unzipping the file, placing the .ipynb files on your desktop, and opening them in Jupyter.
 == Scraping Data from the Web ==
 [https://Helena-lang.org/ Helena-Lang.org] demonstrates how to get data scraped automatically. It requires no programming and has a free Chrome Plug-in. The website has a series of tutorials available here: [http://helena-lang.org/demonstration/ Tutorials]
 == Quantitative Data Analysis ==
@@ Line 20: / Line 12: @@
 == Data Visualization ==
-[https://altair-viz.github.io/gallery/index.html/ Altair] allows you to write a high-level specification about desired visualization and data. The platform allows you to get back data visualization and requires some programming and comfort in Python. The platform has a free Python package.
+[https://altair-viz.github.io/gallery/index.html/ Altair-Viz.github.io] allows you to write a high-level specification about desired visualization and data. The platform allows you to get back data visualization and requires some programming and comfort in Python. The platform has a free Python package.
 == Finding a dataset ==