Latest revision |
Your text |
Line 1: |
Line 1: |
| [[File:Matplotlib-hist2d.png|right|500px]] | | [[File:Matplotlib-hist2d.png|right|500px]] |
| __NOTOC__ | | __NOTOC__ |
| == Getting Started == | | == Visualizing data with Matplotlib and Wiki-bios == |
| Download the CDSW Matplotlib code from [https://github.com/makoshark/matplotlib-cdsw/archive/master.zip here] and unzip it on your machine.
| |
|
| |
|
| You will also need the data file, which is available [http://communitydata.cc/~mako/hp_wiki.tsv here].
| | In this project, we will explore how to produce clear, informative charts, graphs, and plots with [http://matplotlib.org/ Matplotlib], the most popular toolkit for scientific data visualization in Python. |
|
| |
|
| == Visualizing data with Matplotlib ==
| | We'll be focusing on a dataset drawn from Wikipedia and [http://dbpedia.org DBpedia] that contains the names, birth dates, genders, article creation dates, and edit counts of more than 180,000 Wikipedia biography articles. |
| | |
| In this session, we will explore how to produce clear, informative charts, graphs, and plots with [http://matplotlib.org/ Matplotlib], the most popular toolkit for scientific data visualization in Python.
| |
| | |
| We'll start with the dataset created [[Community_Data_Science_Workshops_(Fall_2015)/Day_3_Lecture|this morning]], which contains information about edits to the Harry Potter Wikipedia article.
| |
| | |
| We will then proceed to visualize different aspects of data from the [[Community_Data_Science_Workshops_(Fall_2015)/Day_2_Projects/Socrata|Socrata web API]].
| |
| | |
| | |
| === Inspiration ===
| |
| | |
| * [https://flowingdata.com/ Flowing Data]
| |
| * [http://www-01.ibm.com/software/analytics/many-eyes/ Many Eyes]
| |
| * [http://www.edwardtufte.com/tufte/ Edward Tufte]
| |
| * [http://www.visualizing.org/ visualizing.org]
| |
| * [http://idl.cs.washington.edu/ UW Interactive Data Lab]
| |
|
| |
|
| === Goals === | | === Goals === |
Line 33: |
Line 17: |
| * Exercise your creativity by making your own visualization | | * Exercise your creativity by making your own visualization |
|
| |
|
| # First plot: <tt>001-hello-plot.py</tt>
| | === Download and test the Matplotlib-with-Wiki-bios project === |
| # Subplots: <tt>002-subplots.py</tt>
| | |
| # Let's do something more interesting: <tt>003-plot-timeseries.py</tt>
| | (Estimated time: 10 minutes) |
| # Visit the [http://matplotlib.org/gallery.html Matplotlib gallery].
| | |
| # Make another kind of plot: <tt>004-plot-histogram.py</tt>
| | After installing Matplotlib and downloading and unpacking the Wikibios bundle, move into that directory with '''cd'''. You can test your installation by running '''python histograms.py'''. If Matplotlib is installed correctly, a chart file named '''histograms.pdf''' will appear in the current directory. |
| # Dive deeper into web APIs: <tt>005-traffic-timeseries.py</tt>
| |
| # Play around with any or all of the data you've seen! You can find some more examples in the <tt>wikibios</tt> folder.
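The installation test described above (run a script, then look for the chart file it writes) can be sketched independently of the workshop files. The following is a minimal illustration, not the contents of <tt>histograms.py</tt>: the function names <tt>matplotlib_available</tt> and <tt>write_test_histogram</tt>, the output file name, and the randomly generated values are all assumptions made for this example.

```python
import importlib.util
import os
import random


def matplotlib_available():
    """Return True if Matplotlib can be imported in this Python environment."""
    return importlib.util.find_spec("matplotlib") is not None


def write_test_histogram(path):
    """Draw a histogram of made-up values, save it to `path`, and report success.

    The data are random numbers generated here (hypothetical), not the
    Wikipedia biography data used in the workshop.
    """
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, so no display is needed
    import matplotlib.pyplot as plt

    rng = random.Random(0)  # fixed seed so repeated runs draw the same chart
    values = [rng.gauss(0, 1) for _ in range(1000)]

    fig, ax = plt.subplots()
    ax.hist(values, bins=30)
    ax.set_xlabel("value")
    ax.set_ylabel("count")
    fig.savefig(path)  # the file extension (.pdf, .png, ...) selects the format
    plt.close(fig)

    # The chart file appearing on disk is the sign that everything worked.
    return os.path.exists(path)
```

If <tt>matplotlib_available()</tt> returns <tt>True</tt>, calling <tt>write_test_histogram("check-histogram.pdf")</tt> should leave a PDF in the current directory, which mirrors the check that running <tt>histograms.py</tt> performs.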
| |
|
| |
| === References ===
| |
|
| |
|
| * [http://matplotlib.org/api/pyplot_summary.html matplotlib API reference]
| | [http://mako.cc/teaching/2014/cdsw-autumn/wikibios.zip Wikibios bundle for all platforms] |
| * [http://matplotlib.org/examples/index.html matplotlib Examples] (many, with source)
| |
| * Other plotting resources
| |
| ** [http://web.stanford.edu/~mwaskom/software/seaborn/ Seaborn]: fancy matplotlib-based visualizations
| |
| ** [http://ggplot.yhathq.com/ ggplot]: port of the R language's ggplot2 library to Python
| |
| ** [http://d3js.org/ D3.js]: interactive data visualization for the browser (JavaScript)
| |
|
| |
|
| === Example topics to cover in Lecture === | | === Example topics to cover in Lecture === |
Line 61: |
Line 36: |
|
| |
|
| [[File:Wikipedia.png|right|250px]] | | [[File:Wikipedia.png|right|250px]] |
| [[Category:CDSW]]
| |