UW Statistics Courses

From CommunityData
== First Year Introduction Sequences ==
A sequence is typically a 2-3 quarter group of classes that will give you a solid basis in statistics. These will all cover probability, introductory statistics, and statistical programming in R, Stata, SPSS, SAS, etc. They should each cover hypothesis testing and statistical inference, descriptive statistics, some visualization, and linear regression. They might go further or touch on other things as well. Many of these sequences will also cover more basic features of quantitative social scientific research like operationalization and measure construction, experiment design, etc. A sequence like this is the foundation for quantitative analysis and statistics but it is ''not'' a complete training. In almost all cases, it will need to be supplemented with additional classes. All CDSC students should take a sequence during their first year.

These are listed more or less in the order of recommendation although different courses will make sense for different students:

'''Biostatistics (BIOST):''' Applied Biostats II (518) is an excellent course, well-taught, with a tight relationship between theory, method, and application. As of Summer '18, no one in the [[CDSC]] has taken Applied Biostats I (517), but it's this year's recommendation for a first class and we'll update this page when possible.
'''Sociology (SOC):''' SOC504, SOC505, and SOC506. These are good courses but are quite applied. For [[CDSC]] members, this is the easiest minimum option and is slightly discouraged.
'''Communication (COM):''' COM520 + COM521. COM520 covers quantitative research design and basic social scientific epistemology. COM521 is taught by Mako, but it is a truly introductory stats class with a strong emphasis on application in GNU R. Like the SOC sequence, this is discouraged for CDSC folks, who are encouraged to take a more technical course.
'''Political Science (POLS):''' This sequence begins with POLS500 in the autumn which is similar to COM520/COM521 and is an introduction to quantitative research in the social sciences.
POLS501/CS&SS501 focuses on "testing theories with empirical evidence. Examines current topics in research methods and statistical analysis in political science. Content varies according to recent developments in the field and with interests of instructor."
POLS503/CS&SS503 is Advanced Quantitative Political Methodology and might be a good choice for a 2nd or 3rd quarter in statistics. It is a slightly mathematical applied statistics class which introduces regression and multi-variable techniques for developing causal arguments using statistics. In Spring 2018 the course stuck fairly closely to two textbooks, ''Real Stats'' and ''Mastering Metrics'' (an undergrad textbook). The course sites from the last two years [https://uw-pols503.github.io/2017/] [https://uw-pols503.github.io/2018/] are published on GitHub, with instructor notes at [https://jrnold.github.io/intro-methods-notes/], so take a look there if you want a preview of what will be covered. The class is sponsored by Political Science, so some of the content is influenced by their disciplinary norms.
'''Economics/Statistics (ECON/STAT):''' If you already have good linear algebra and multivariate calculus, taking ECON 580 and 581 is a good short-cut to getting a lot of methods covered in other classes. You could take SOC 504, 505, 506, CS&SS 503, 504, 560, and 564, or you could take ECON 580 and 581 and read a few books. This is the ideal sequence for any CDSC folks, although it will likely be a poor choice until you have a relatively strong mathematics background. Details on these classes are provided below.
'''Educational Psychology (EDPSY):''' EDPSY490 and EDPSY491 have a strong focus on psychometric techniques drawn from psychology. They should be relatively easier and very applied but will not provide good training for research in non-experimental settings. Due to the level of rigor and the focus on experiments, ANOVA, and SPSS, this is discouraged for CDSC members, who will typically be dealing with observational data or should, at the very least, build the skills necessary to do so.
== Other First-Year Courses ==
Although they will not be taught in the sequences above, CDSC members should be comfortable with the material taught in these very short courses and camps by the end of their first year. Taking these courses is a good way to make sure that happens!
'''Math Camp:''' Math Camp is an intensive one-week introductory course offered during the summer. 
'''Review of Mathematics for Social Scientists (CS&SS 505):''' A 1-credit course that covers the same material as Math Camp but at a slower pace.
Math Camp and CS&SS 505 are recommended for incoming students and for students who are entering their 2nd year and plan to take an advanced statistics course. They assume basic math skills through high school algebra but nothing else. Topics reviewed are algebra, functions and limits, differentiation, maximization of functions, integration, matrix algebra, linear equations and least squares, and probability. Typically offered during winter and spring quarters.
'''Introduction to R (CS&SS 508):''' Another 1-credit class that will familiarize students with the R environment for statistical computing.
== Advanced Statistics Courses ==
There are many useful courses offered by the Center for Statistics in the Social Sciences (CSSS). Most CSSS classes are applied and will give you a chance to apply the methods that you learn to your own projects. Try to take advantage of these opportunities to make progress on your research. CS&SS 509 is an exception and is discussed below under econometrics.
'''CS&SS504, Applied Regression''' is an applied, but still technical course on regression. It may vary based on who is teaching the class. This will be the default option for [[CDSC]] students.

'''CS&SS 560, Hierarchical Modeling''' is important. Hierarchical models are the bread and butter for working with datasets that have both community-level and individual-level variables, or that have longitudinal data.

'''CS&SS 564, Bayesian Statistics for the Social Sciences''' This class is very good. It may vary by instructor and text, but in 2023 it was taught using R/JAGS/Stan with a project and no tests; the content is a nice blend of mathematical and applied perspectives. There are a lot of online resources that accompany the text, so you can learn and re-learn the material a few different ways. It's a fair amount of work because you are building familiarity with doing a lot of simulation and digging into how models work, but the prerequisites are low; it's not brain-breaking, just some solid grinding, and that takes time. It is probably easier than 560 because you will review the basics of probability, the binomial model, etc. from the intro sequence, but in a Bayesian way. That said, the R is a bit more intense in 564 than it is in 560. Taught using mostly base R -- not tidyverse!

'''CS&SS 566, Causal Inference''' This class is good. It covers experimental, instrumental variable, and quasi-experimental designs, structural equation modeling, and DAGs. It takes a relatively philosophical and theoretical approach to causality and shows that common assumptions about causal identification can be wrong (e.g., adding a variable to your model ''can'' introduce bias, in theory).
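The point that adding a variable ''can'' introduce bias can be illustrated with a small simulation. The following is a minimal sketch and is not taken from the course materials: it assumes a made-up data-generating process in which x and y are independent and a third variable c is caused by both (a "collider"). The function names, the sample size, and the use of Python rather than R are all illustrative choices. Under these assumptions the true effect of x on y is zero, and the population coefficient on x after controlling for c works out to -0.5.

```python
import random


def slope(xs, ys):
    """OLS slope from regressing ys on xs (with an intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def residuals(ys, xs):
    """Residuals from regressing ys on xs (with an intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = slope(xs, ys)
    return [(y - my) - b * (x - mx) for x, y in zip(xs, ys)]


def simulate_collider_bias(n=20000, seed=1):
    """Return (naive, adjusted) coefficients on x in a regression for y.

    Hypothetical setup: x and y are independent standard normals, and
    c = x + y + noise is caused by both of them.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]

    # Regressing y on x alone recovers (approximately) the true effect, zero.
    naive = slope(x, y)

    # Coefficient on x in a regression of y on x and c, computed by
    # partialling c out of both variables (Frisch-Waugh-Lovell theorem).
    adjusted = slope(residuals(x, c), residuals(y, c))
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = simulate_collider_bias()
    print(f"y ~ x:      coefficient on x = {naive:.3f}")
    print(f"y ~ x + c:  coefficient on x = {adjusted:.3f}")
```

In this sketch the naive estimate should land near zero, while the "controlled" estimate should land near -0.5 even though x has no effect on y at all.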

== Mathematical Statistics and Econometrics ==

Do you want to get serious about statistics? Do you want to learn ''why'' statistical methods depend on assumptions and not just ''how'' to apply them? Do you already have a strong math or statistics background? Then these classes are for you.

'''ECON 580/CS&SS 509, Mathematical Statistics''' This class is great. It is a big class with quantitative methods folks from all over the social sciences. It is a "meta-methods class": the goal is for you to understand, in mathematical terms and notation, how to derive statistical methods from probability theory. You work through proofs of various statistical methods. It covers probability theory, statistical tests, OLS, MLE, and Bayesian inference. There is some R programming where you use simulations to demonstrate theorems or analytical results. This class is very much not applied. There are pretty hard tests and homework assignments where you prove theorems and derive corollaries. To enjoy this class you should have at least 2 quarters of college calculus and an introductory stats sequence under your belt, or a strong math background (e.g., you were a math or physics major).
You can brush up on your calculus and stats to prepare for this class. Kaylea recommends Khan Academy for calculus and [https://onlinecourses.science.psu.edu/stat414 PSU (414 and 415)] for statistics.
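To give a flavor of what "using simulations to demonstrate theorems" in ECON 580/CS&SS 509 can look like, here is a minimal sketch. It is not an actual course assignment, and it is written in Python rather than the R used in class. It checks by simulation the standard result that dividing the sum of squared deviations by n - 1 gives an unbiased estimate of the variance, while dividing by n is biased downward by a factor of (n - 1)/n. The function name and all parameter values are illustrative assumptions.

```python
import random


def simulate_variance_estimators(n=5, reps=20000, sigma=2.0, seed=42):
    """Average the two variance estimators over many simulated samples.

    Each sample has n draws from a normal distribution with mean 0 and
    standard deviation sigma (true variance sigma ** 2).
    Returns (mean of the n-1 estimator, mean of the n estimator).
    """
    rng = random.Random(seed)
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((v - mean) ** 2 for v in sample)  # sum of squared deviations
        total_unbiased += ss / (n - 1)  # sample variance (Bessel's correction)
        total_biased += ss / n          # plug-in estimator
    return total_unbiased / reps, total_biased / reps


if __name__ == "__main__":
    unbiased, biased = simulate_variance_estimators()
    print(f"true variance:          {2.0 ** 2:.2f}")
    print(f"mean of n-1 estimator:  {unbiased:.2f}")
    print(f"mean of n estimator:    {biased:.2f}")
```

With the assumed values (n = 5, true variance 4), theory says the first average should be close to 4 and the second close to 4 × 4/5 = 3.2. In the class you would also prove the result rather than only simulate it.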
'''ECON 581, Econometrics''' The first few weeks of ECON581 generalize 580 into the multivariate case. The second part provides regression methods (instrumental variables, two stage least squares, GMM) for when OLS assumptions are violated.
In addition to the calculus you used in CS&SS 509, ECON 581 uses multivariate calculus (partial derivatives, gradients) and linear algebra. Most students in 581 are PhD students in Stats, ECON, or finance.
CS&SS 509 and ECON 581 cover pretty much all the econometrics useful for applied empirical research. Applied courses in CSSS will be more useful for learning about time series, longitudinal data, count data, and so on. ECON582 is on nonparametric models, and ECON583 and ECON584 are "Econometric Theory I and II"; they will be excellent, but only really for folks building new econometric theories. Most students taking 583 and 584 will be PhD students in the ECON department specializing in methods. 580 is great. 581 is good, but CS&SS 503 and 510 should cover the most useful stuff in 581, except you won't do the proofs yourself. If you are seriously considering taking 583 or 584 you might also consider switching to a PhD in economics. :)

== Other Topics ==

There are also 400 level introduction to machine learning classes in CSE and STAT, but STAT 588 looks better than either of these.
== More Courses ==
IMT 573, Data Science I, focuses on the theoretical foundations of data science and provides a nontechnical overview of the key concepts and skills required for data science. It introduces common data science pipelines, data collection and storage, basic analytics, machine learning, and data visualization with industry-standard statistical packages.
IMT 574, Data Science II, is the second course in the sequence and offers a theoretical and practical introduction to techniques for the analysis of large-scale data. The course does have prerequisites, but depending on where you are in the program it can be a good choice.
DATA 512 is Human-Centered Data Science. It introduces the fundamental principles of data science and its human implications: data ethics, privacy, algorithmic bias, legal frameworks, intellectual property, and more.
CSSS 594 is a 1-credit special topics course. Have a peek to see if whatever is being offered in the current quarter is something you're interested in.
CSE 160 is a 3-credit introduction to data manipulation in Python. It is an undergraduate course, but if you're coming in unfamiliar with how to manipulate your dataset this course can be helpful. ''It is intended for students without prior programming experience.''