UW Statistics Courses: Difference between revisions

From CommunityData
(→‎Advanced Statistics Courses: adds review of 564, reorganizes order to be numeric)

Revision as of 19:11, 4 May 2023

First Year Introduction Sequences

A sequence is typically a 2-3 quarter group of classes that will give you a solid basis in statistics. These will all cover probability, introductory statistics, and statistical programming in R, Stata, SPSS, SAS, etc. They should each cover hypothesis testing and statistical inference, descriptive statistics, some visualization, and linear regression. They might go further or touch on other things as well. Many of these sequences will also cover more basic features of quantitative social scientific research like operationalization and measure construction, experiment design, etc. A sequence like this is the foundation for quantitative analysis and statistics, but it is not a complete training. In almost all cases, it will need to be supplemented with additional classes. All CDSC students should take a sequence during their first year.

These are listed more or less in the order of recommendation although different courses will make sense for different students:

Biostatistics (BIOST): Applied Biostats II (518) is an excellent course, well-taught, with a tight relationship between theory, method, and application. As of Summer '18, no one in the CDSC has taken Applied Biostats I (517), but it's this year's recommendation for a first class and we'll update this page when possible.

Sociology (SOC): SOC504, SOC505, and SOC506. These are good courses but are quite applied. For CDSC members, this would be the easiest minimum option and would be slightly discouraged.

Communication (COM): COM520 + COM521. Some combination of quantitative research design and basic social scientific epistemology and design. One of these courses is taught by Mako, but it is a truly introductory stats class with a strong emphasis on application in GNU R. Like the SOC sequence, this would be discouraged for CDSC folks, who would be encouraged to take a more technical course.

Political Science (POLS): This sequence begins with POLS500 in the autumn which is similar to COM520/COM521 and is an introduction to quantitative research in the social sciences.

POLS501/CS&SS501 focuses on "testing theories with empirical evidence. Examines current topics in research methods and statistical analysis in political science. Content varies according to recent developments in the field and with interests of instructor."

POLS503/CS&SS503 is Advanced Quantitative Political Methodology and might be a good choice for a 2nd or 3rd quarter in statistics. It is a slightly mathematical applied statistics class which introduces regression and multi-variable techniques for developing causal arguments using statistics. In Spring 2018 the course stuck fairly closely to two textbooks, Real Stats and Mastering Metrics (an undergrad textbook). The course sites from the last two years [1][2] are published on GitHub, with instructor notes at [3], so take a look there if you want a preview of what will be covered. The class is sponsored by Political Science, so some of the content is influenced by their disciplinary norms.

Economics/Statistics (ECON/STAT): If you already have good linear algebra and multivariate calculus, taking ECON 580 and 581 is a good shortcut to getting a lot of methods covered in other classes. You could take SOC 504, 505, 506 and CS&SS 503, 504, 560, and 564, or you could take ECON 580 and 581 and read a few books. This is the ideal option for CDSC folks, although it will likely be a poor choice until you have a relatively strong mathematics background. Details on these classes are provided below.

Educational Psychology (EDPSY): EDPSY490 and EDPSY491 have a strong focus on psychometric techniques drawn from psychology. They should be relatively easier and very applied but will not provide good training for research in non-experimental settings. Due to the level of rigor and the focus on experiments, ANOVA, and SPSS, this is discouraged for CDSC members, who will typically be dealing with observational data or should, at the very least, build the skills necessary to do so.

Other First-Year Courses

Although they will not be taught in the sequences above, CDSC members should be comfortable with the material taught in these very short courses and camps by the end of their first year. Taking these courses is a good way to make sure that happens!

Math Camp: Math Camp is an intensive one-week introductory course offered during the summer. Review of Mathematics for Social Scientists (CS&SS 505): A 1-credit course that covers the same material as Math Camp but at a slower pace.

Math Camp/CSSS 505 are recommended for incoming students and students who are entering their 2nd year and plan to take an advanced statistics course. It will assume basic math skills through high school algebra but nothing else. Topics reviewed are algebra, functions and limits, differentiation, maximization of functions, integration, matrix algebra, linear equations and least squares, and probability. Typically offered during winter and spring quarters.

Introduction to R (CS&SS 508): Another 1-credit class that will familiarize students with the R environment for statistical computing.

Advanced Statistics Courses

There are many useful courses offered by the Center for Statistics in the Social Sciences (CSSS). Most CSSS classes are applied and will give you a chance to apply the methods that you learn to your own projects. Try to take advantage of these opportunities to make progress on your research. CSSS 509 is an exception, and will be discussed below under econometrics.

CS&SS504, Applied Regression is an applied but still technical course on regression. The content may vary based on who is teaching the class. This will be the default option for CDSC students.

CS&SS 560, Hierarchical Modeling is important. Hierarchical models are the bread and butter for working with datasets that have both community-level and individual-level variables, or that have longitudinal data.

CSSS564, Bayesian Statistics for the Social Sciences: CS&SS 564 is very good. This may vary by the instructor/text, but in 2023 it was taught using R/JAGS/Stan with a project and no tests; the content is a nice blend of mathematical and applied perspectives. There are a lot of online resources that accompany the text so you can learn/re-learn the material a few different ways. It's a fair amount of work because you are building familiarity with doing a lot of simulation and digging into how models work, but the prerequisites are low; it's not brain-breaking, just some solid grinding, and that takes time. Probably easier than 560 because you will review basics of probability, the binomial model, etc. from the intro sequence but in a Bayesian way. That said, the R is a bit more intense in 564 than it is in 560.

CSSS566, Causal Inference: CS&SS 566 is good. It covers experimental, instrumental variable, and quasi-experimental designs, structural equation modeling, and DAGs. It takes a relatively philosophical and theoretical approach to causality and shows that common assumptions about causal identification can be wrong (e.g. adding a variable to your model can introduce bias, in theory).
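The point about an added variable introducing bias can be seen in a small simulation. This is only an illustrative sketch, not material from the course: the variable names, the sample size, and the data-generating process are all assumptions made up for the example. Here x and y are independent by construction, and c is a "collider" caused by both of them.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible
n = 20000       # hypothetical sample size

# x and y are generated independently, so the true effect of x on y is zero.
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]
# c is a collider: it is caused by both x and y (plus noise).
c = [xi + yi + random.gauss(0, 1) for xi, yi in zip(x, y)]


def cov(a, b):
    """Sample covariance (dividing by n, which is fine for comparing slopes)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


# Slope from regressing y on x alone.
b_simple = cov(x, y) / cov(x, x)

# Coefficient on x from regressing y on x and c together,
# solved directly from the 2x2 normal equations.
det = cov(x, x) * cov(c, c) - cov(x, c) ** 2
b_controlled = (cov(c, c) * cov(x, y) - cov(x, c) * cov(c, y)) / det

print(f"y ~ x:     coefficient on x = {b_simple:.3f}")
print(f"y ~ x + c: coefficient on x = {b_controlled:.3f}")
```

Under these made-up variances, the first coefficient should sit near 0 and the second near -0.5, even though x has no effect on y at all. "Controlling for" c is what creates the spurious association.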

Mathematical Statistics and Econometrics

Do you want to get serious about Statistics? Do you want to learn why statistical methods depend on assumptions and not just *how* to apply them? Do you already have a strong math or statistics background? Then these classes are for you.

ECON 580/CS&SS 509, Mathematical Statistics: This class is great. It is a big class with quantitative methods folks from all over the social sciences. It is a "meta-methods class." The goal of the course is for you to understand in mathematical terms and notation how to derive statistical methods from probability theory. You work through proofs of various statistical methods. It covers probability theory, statistical tests, OLS, MLE, and Bayesian inference. There is some R programming where you use simulations to demonstrate theorems or analytical results. This class is very much not applied. There are pretty hard tests and homework assignments where you prove theorems and derive corollaries. To enjoy this class you should have at least 2 quarters of college calculus and an introductory stats sequence under your belt, or a strong math background (e.g. you were a math or physics major).

You can brush up on your calculus and stats to prepare for this class. Kaylea recommends Khan Academy for calculus and PSU (414 and 415) for statistics.
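To give a flavor of what "using simulations to demonstrate theorems" can mean, here is a rough sketch that checks the central limit theorem numerically. The class itself uses R; Python is used here purely for illustration, and the distribution, sample size, and number of replications are arbitrary assumptions rather than anything taken from the course.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible
n = 30          # observations per simulated sample (arbitrary choice)
reps = 5000     # number of simulated samples (arbitrary choice)

# Uniform(0, 1) has mean 1/2 and variance 1/12, so theory says the
# sample mean has standard error sqrt((1/12) / n).
true_mean = 0.5
theory_se = (1 / 12 / n) ** 0.5

# Draw many samples and record the mean of each one.
sample_means = [sum(random.random() for _ in range(n)) / n for _ in range(reps)]

# Compare the simulated spread of the sample means with the theoretical one.
simulated_se = statistics.pstdev(sample_means)

# If the sample means are roughly normal, about 95% of them should fall
# within 1.96 standard errors of the true mean.
coverage = sum(abs(m - true_mean) <= 1.96 * theory_se for m in sample_means) / reps

print(f"theoretical SE: {theory_se:.4f}, simulated SE: {simulated_se:.4f}")
print(f"share of sample means within 1.96 SE: {coverage:.3f}")
```

The analytical result comes first (here, the standard error formula), and the simulation is a sanity check that the derivation behaves as claimed.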

ECON 581, Econometrics: The first few weeks of ECON581 generalize 580 to the multivariate case. The second part provides regression methods (instrumental variables, two stage least squares, GMM) for when OLS assumptions are violated. In addition to the calculus you used in CS&SS 509, ECON 581 uses multivariate calculus (partial derivatives, gradients) and linear algebra. Most students in 581 are PhD students in Stats, ECON, or finance.
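As a rough illustration of why instrumental variables and two stage least squares matter when OLS assumptions fail, the sketch below simulates an unobserved confounder. Everything here is assumed for the example: the variable names, the true effect of 2, and the data-generating process are hypothetical and not taken from the course.

```python
import random

random.seed(2)      # fixed seed so the sketch is reproducible
n = 20000           # hypothetical sample size
true_effect = 2.0   # hypothetical causal effect of x on y

z = [random.gauss(0, 1) for _ in range(n)]  # instrument: affects y only through x
u = [random.gauss(0, 1) for _ in range(n)]  # unobserved confounder of x and y
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_effect * xi + 3 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]


def cov(a, b):
    """Sample covariance (dividing by n)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def ols(a, b):
    """Intercept and slope from a simple regression of b on a."""
    slope = cov(a, b) / cov(a, a)
    intercept = sum(b) / len(b) - slope * sum(a) / len(a)
    return intercept, slope


# Naive OLS is biased because x is correlated with the omitted confounder u.
_, ols_slope = ols(x, y)

# Stage 1: regress x on the instrument z and keep the fitted values.
a1, b1 = ols(z, x)
x_hat = [a1 + b1 * zi for zi in z]
# Stage 2: regress y on the fitted values from stage 1.
_, tsls_slope = ols(x_hat, y)

# With one instrument and one regressor this equals cov(z, y) / cov(z, x).
iv_ratio = cov(z, y) / cov(z, x)

print(f"OLS slope:  {ols_slope:.3f}")
print(f"2SLS slope: {tsls_slope:.3f}")
```

In this setup OLS should come out near 3 while the two-stage estimate should be close to the assumed true effect of 2. Note that running the second stage by hand like this gives the right point estimate but not the right standard errors, which is one reason to use a dedicated routine in practice.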

CS&SS 509 and ECON 581 cover pretty much all the econometrics useful for applied empirical research. Applied courses in CSSS will be more useful for learning about time series, longitudinal data, count data, and so on. ECON582 is on nonparametric models, and ECON583 and ECON 584 are "Econometric Theory I and II" and will be excellent but only really for folks building new econometric theories. Most students taking 583 and 584 will be PhD students in the ECON department specializing in methods. 580 is great. 581 is good, but CS&SS 503 and 510 should cover the most useful stuff in 581 except you won't do the proofs yourself. If you are seriously considering taking 583 or 584 you might also consider switching to a PhD in economics. :)

Other Topics

Machine Learning

Sometimes statistical inference is very hard. Prediction is often easier, and sometimes predicting an outcome can be a useful contribution. Prediction is also useful for constructing variables (e.g. content analysis). Supervised machine learning is essentially giving up on inference and focusing on prediction. "Unsupervised machine learning" (i.e. clustering) can be very useful for operationalization.

If you do not have a computer science background, STAT 588 looks like a good place to get some quick and dirty machine learning. Fitting machine learning models can be difficult when your data is very big (as ours often is). STAT 548 is a good class to learn how to solve these problems. It mainly focuses on stochastic optimization. It isn't very difficult, but you will get more out of it if you are good at linear algebra and multivariate calculus.
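For a sense of what stochastic optimization means in practice, here is a minimal sketch of stochastic gradient descent fitting a one-variable linear regression. It is not drawn from STAT 548; the simulated data, learning rate, and number of passes are all assumptions chosen for illustration. The key idea is that each update looks at a single observation, so the full dataset never needs to be processed at once.

```python
import random

random.seed(3)              # fixed seed so the sketch is reproducible
n = 2000                    # hypothetical number of observations
true_w, true_b = 2.0, 1.0   # hypothetical slope and intercept

# Simulated data: y = true_b + true_w * x + noise.
data = []
for _ in range(n):
    xi = random.gauss(0, 1)
    data.append((xi, true_b + true_w * xi + random.gauss(0, 0.3)))

w, b = 0.0, 0.0             # start from an arbitrary guess
learning_rate = 0.005       # small constant step size (an assumption)

for epoch in range(5):      # a few passes over the data
    random.shuffle(data)    # visit observations in random order
    for xi, yi in data:
        error = yi - (w * xi + b)
        # The gradient of 0.5 * error**2 is -error * xi for w and -error for b,
        # so stepping against the gradient adds these terms.
        w += learning_rate * error * xi
        b += learning_rate * error

print(f"estimated slope: {w:.3f}, estimated intercept: {b:.3f}")
```

For a problem this small you would just solve the regression directly; the per-observation update only pays off when the data is too large to fit or process in one go.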

There are also 400 level introduction to machine learning classes in CSE and STAT, but STAT 588 looks better than either of these.


More Courses

IMT 573, Data Science I, is focused on the theoretical foundations of data science and provides a nontechnical overview of the key concepts and skills required for data science. It introduces common data science pipelines, data collection and storage, basic analytics, machine learning, and data visualization with industry standard statistical packages.

IMT 574, Data Science II, is the second course in the sequence and offers a theoretical and practical introduction to techniques for the analysis of large-scale data. The course does have prerequisites, but depending on where you are in the program it can be a good choice.

DATA 512 is Human-Centered Data Science. It introduces the fundamental principles of data science and its human implications. It covers data ethics, privacy, algorithmic bias, legal frameworks, intellectual property, and more.

CSSS 594 is a 1-credit special topics course. Have a peek to see if whatever is being offered in the current quarter is something you're interested in.

CSE 160 is a 3-credit introduction to data manipulation in Python. It is an undergraduate course, but if you're coming in unfamiliar with how to manipulate your dataset this course can be helpful. It is intended for students without prior programming experience.