Statistics and Statistical Programming (Winter 2017)
From CommunityData
== Schedule ==

When reading the schedule below, the following key might help resolve ambiguity: §n denotes chapter n; §n.x denotes section x of chapter n; §n.x-y denotes sections x through y of chapter n.

=== Week 1: Tuesday January 3: Introduction, Setup, and Data and Variables ===

Hopefully, the material in OpenIntro feels very familiar from COM520. The programming material will be new, but I want you to read it before you come to class so we can work through the examples as a group.

'''Required Readings:'''

* Diez, Barr, and Çetinkaya-Rundel: §1 (Introduction to data)
* Verzani: §1 (Getting Started), §2 (Univariate data) [[https://faculty.washington.edu/makohill/com521/verzani-usingr-ch1_ch2.pdf Available with UW NetID]]
* Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. 2014. “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks.” ''Proceedings of the National Academy of Sciences'' 111(24):8788–90. [[http://www.pnas.org/content/111/24/8788.full Available through UW libraries]]

'''Optional Readings:'''

* Verzani: §A (Programming)

'''Assignment (Complete Before Class):'''

* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 1]]

'''Lectures:'''

* [https://communitydata.cc/~mako/2017-COM521/com521-week_01-r_programming_intro-20170103.ogv Week 1 R lecture screencast (Part I): Introduction to R and univariate statistics] (~1 hour 47 minutes)
* [https://communitydata.cc/~mako/2017-COM521/com521-week_01-github_rscripts-20170104.ogv Week 1 R lecture screencast (Part II): Setting up git/GitHub and saving files in RStudio] (~40 minutes)
* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 1]]

'''Resources:'''

* [https://www.openintro.org/download.php?file=os3_slides_01&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §1 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including some for §1
* [[Statistics and Statistical Programming (Winter 2017)/Session plan: Week 1]]

=== Week 2: Tuesday January 10: Probability and Visualization ===

'''Required Readings:'''

* Diez, Barr, and Çetinkaya-Rundel: §2 (Probability)
* Verzani: §3.1-2 (Bivariate data), §4 (Multivariate data), §5 (Multivariate graphics) [[https://faculty.washington.edu/makohill/com521/verzani-usingr-ch3.1-2_ch4_ch5.pdf Available with UW NetID]]
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on my personal website]]

'''Assignment (Complete Before Class):'''

* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 2]]

'''Lectures:'''

* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 2]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_02-lists_dataframes_graphing-20170111.ogv Week 2 R lecture screencast: lists, matrixes, data frames, and beginning graphing] (~1 hour 8 minutes)

'''Resources:'''

* [https://www.openintro.org/download.php?file=os3_slides_02&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §2 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 short videos for §2
* [[Statistics and Statistical Programming (Winter 2017)/Session plan: Week 2]]

=== Week 3: Tuesday January 17: Distributions ===

'''Required Readings:'''

* Diez, Barr, and Çetinkaya-Rundel: §3.1-3.2, §3.4. You should also read the rest of the chapter (§3.3 and §3.5). I won't assign problem set questions about it, but it's still important to be familiar with.
* Verzani: §6 (Populations)

'''Assignment (Complete Before Class):'''

* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 3]]

'''Lectures:'''

* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 3]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-loading_data_functions_apply_misc.ogv Week 3 R lecture screencast: Loading data, functions; apply(), lapply(), sapply(); several miscellaneous functions] (~34 minutes). This is the same material I covered in class. If you followed it then, there's no reason you need to go back to this.
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-dates_tapply_merge.ogv Week 3 R lecture screencast: Dates; tapply(); and merge()] (~38 minutes) [The audio seems to be broken for the last 10 minutes. Sorry about that! I've rerecorded that part below.]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_03-merge.ogv Week 3 R lecture screencast: merge()] (~13 minutes) [Rerecording of the last few minutes of the previous video.]
'''Resources:'''

* [https://www.openintro.org/download.php?file=os3_slides_03&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §3 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 2 videos for §3.1 and §3.2
* [[Statistics and Statistical Programming (Winter 2017)/Session plan: Week 3]]

=== Week 4: Tuesday January 24: Statistical significance and hypothesis testing ===

'''Required Readings:'''

* Diez, Barr, and Çetinkaya-Rundel: §4 (Foundations for inference)
* Verzani: §7 (Statistical inference), §8 (Confidence intervals)

'''Assignment (Complete Before Class):'''

* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 4]]

'''Lectures:'''

* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 4]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_04-misc_confint_simulation-20170125.ogv Week 4 R lecture screencast: order(); confidence intervals; simulations drawn from repeated random samples] (~27 minutes)

'''Resources:'''

* [https://www.openintro.org/download.php?file=os3_slides_04&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §4 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 7 videos for nearly all of §4
* [[Statistics and Statistical Programming (Winter 2017)/Session plan: Week 4]]

=== Week 5: Tuesday January 31: Continuous Numeric Data & ANOVA ===

'''Required Readings:'''

* Diez, Barr, and Çetinkaya-Rundel: §5 (Inference for numerical data)
* Verzani: §9 (Significance tests), §12 (Analysis of variance)
* Gelman, Andrew and Hal Stern. 2006. “The Difference Between ‘Significant’ and ‘Not Significant’ Is Not Itself Statistically Significant.” ''The American Statistician'' 60(4):328–31. [[http://dx.doi.org/10.1198/000313006X152649 Available through UW Libraries]]
* Sweetser, K. D., & Metzgar, E. (2007). Communicating during crisis: Use of blogs as a relationship management tool. ''Public Relations Review'', 33(3), 340–342. [[https://doi.org/10.1016/j.pubrev.2007.05.016 Available through UW Libraries]]
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on my personal website]]

'''Assignment (Complete Before Class):'''

* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 5]]

'''Lectures:'''

* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 5]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_05-ttests_and_anova.ogv Week 5 R lecture screencast: t-tests] (~22 minutes)
* [https://communitydata.cc/~mako/2017-COM521/com521-week_05-for_if.ogv Week 5 R lecture screencast: for loops and if statements] (~12 minutes)

'''Resources:'''

* [https://www.openintro.org/download.php?file=os3_slides_05&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §5 Lecture Notes]

=== Week 6: Tuesday February 7: Categorical data ===

'''Required Readings:'''

* Diez, Barr, and Çetinkaya-Rundel: §6 (Inference for categorical data)
* Verzani: §3.4 (Bivariate categorical data); §10.1-10.2 (Goodness of fit)
* Gelman, Andrew and Eric Loken. 2014. “The Statistical Crisis in Science: Data-Dependent Analysis—a ‘Garden of Forking Paths’—Explains Why Many Statistically Significant Comparisons Don’t Hold Up.” ''American Scientist'' 102(6):460. [[https://www.americanscientist.org/issues/pub/2014/6/the-statistical-crisis-in-science/1 Available through UW Libraries]] (This is a reworked version of [http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf this unpublished manuscript], which provides more detailed examples.)
* Buechley, Leah and Benjamin Mako Hill. 2010. “LilyPad in the Wild: How Hardware’s Long Tail Is Supporting New Engineering and Design Communities.” Pp. 199–207 in ''Proceedings of the 8th ACM Conference on Designing Interactive Systems.'' Aarhus, Denmark: ACM. [[https://mako.cc/academic/buechley_hill_DIS_10.pdf PDF available on my personal website]]

'''Assignment (Complete Before Class):'''

* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 6]]

'''Lectures:'''

* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 6]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_06-tables_chisq_debugging.ogv Week 6 R lecture screencast: Tables, <math>\chi^2</math>-tests, and debugging.] (~40 minutes)

'''Resources:'''

* [https://www.openintro.org/download.php?file=os3_slides_06&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §6 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 4 videos for §7

=== Week 7: Tuesday February 14: Linear Regression ===

'''Required Readings:'''

* Diez, Barr, and Çetinkaya-Rundel: §7 (Introduction to linear regression); §8.1-8.3 (Multiple regression)
* OpenIntro eschews a mathematical introduction to correlation. Please look over [https://en.wikipedia.org/wiki/Correlation_and_dependence the Wikipedia article on correlation and dependence] and pay attention to the formulas. Correlation is tedious to compute by hand, but I'd like you to at least see what goes into it.
* Verzani: §11.1-2 (Linear regression)
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available in UW libraries]]

'''Assignment (Complete Before Class):'''

* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 7]]

'''Lectures:'''

* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 7]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_07-linear_regression.ogv Week 7 R lecture screencast: linear regression] (~42 minutes)

'''Resources:'''

* [https://www.openintro.org/download.php?file=os3_slides_07&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §7 Lecture Notes]
* [https://www.openintro.org/download.php?file=os3_slides_08&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §8 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including 4 videos for §7 and 3 videos on the sections §8.1-8.3

=== Week 8: Tuesday February 21: Polynomial Terms, Interactions, and Logistic Regression ===

'''Required Readings:'''

* [https://onlinecourses.science.psu.edu/stat501/node/301 Lesson 8: Categorical Predictors] and [https://onlinecourses.science.psu.edu/stat501/node/318 Lesson 9: Data Transformations] from the Penn State Eberly College of Science STAT 501 Regression Methods Course. There are several subparts (many quite short); please read them all carefully.
* Diez, Barr, and Çetinkaya-Rundel: §8.4 (Multiple and logistic regression)
* Verzani: §11.3 (Linear regression), §13.1 (Logistic regression)
* Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” ''PLoS Medicine'' 2(8):e124. [[http://dx.doi.org/10.1371%2Fjournal.pmed.0020124 Open Access]]
* Lampe, Cliff, and Paul Resnick. 2004. “Slash(Dot) and Burn: Distributed Moderation in a Large Online Conversation Space.” In ''Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '04)'', 543–550. New York, NY, USA: ACM. doi:10.1145/985692.985761. [[http://dx.doi.org/10.1145/985692.985761 Available in UW libraries]]

'''Optional Readings:'''

* Head, Megan L., Luke Holman, Rob Lanfear, Andrew T. Kahn, and Michael D. Jennions. 2015. “The Extent and Consequences of P-Hacking in Science.” ''PLOS Biology'' 13(3):e1002106. [[http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002106 Open Access]]

'''Assignment (Complete Before Class):'''

* [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 8]]

'''Lectures:'''

* [[Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 8]]
* [https://communitydata.cc/~mako/2017-COM521/com521-week_08-more_regression_anova_redux.ogv Week 8 R lecture screencast: more on linear regression, including interactions, polynomials, log transformations; anova] (~28 minutes)

'''Resources:'''

* [https://www.openintro.org/download.php?file=os3_slides_08&referrer=/stat/slides/slides_0x.php Mine Çetinkaya-Rundel's OpenIntro §8 Lecture Notes]
* [https://www.openintro.org/stat/videos.php OpenIntro Video Lectures] including a video on §8.4
* I've written this document, which will likely be useful for many of you: [https://communitydata.cc/~mako/2017-COM521/logistic_regression_interpretation.html Interpreting Logistic Regression Coefficients with Examples in R]

=== Week 9: Tuesday February 28: Consulting Meetings ===

We won't meet as a group. Instead, you will each meet one-on-one with me to work through challenges and issues with your analysis.

=== Week 10: Tuesday March 7: Consulting Meetings ===

We won't meet as a group. Instead, you will each meet one-on-one with me to work through challenges and issues with your analysis.

=== Week 11: Tuesday March 14: Final Presentations ===