Advanced Computational Communication Methods (Summer 2023): Difference between revisions

From CommunityData
 
== Visualization in Python ==

===Resources added by Ryan===
'''Refresher/Basic resources'''
* Quick video overview: https://www.youtube.com/watch?v=a9UrKTVEeZA
* Longer, but still simple, video course outlining visualization techniques: https://www.simplilearn.com/tutorials/python-tutorial/data-visualization-in-python
* And of course, don't forget that one of the greatest resources for getting input on how to change visualizations is ChatGPT: https://chat.openai.com/
 
'''Understanding which visualization libraries to learn/use'''
* A useful academic article suggesting Matplotlib, Seaborn, and Plotly as the best options: https://ieeexplore.ieee.org/abstract/document/8757088?casa_token=REAm2SOC93MAAAAA:fCJHaTYgHA8FXZMbVEdZcevcXKsNJBBvB83F5HGgSEh504YPfROjnI08K1f2CJ1b6ZDVhhxF
* An excellent article on Medium about what use case scenarios are best for each of Matplotlib, Seaborn, and Plotly: https://medium.com/mlearning-ai/comparing-python-libraries-for-visualization-b2eb6c862542#:~:text=Matplotlib%20is%20a%20great%20choice,choice%20for%20creating%20interactive%20visualizations.
 
'''Matplotlib'''
* Excellent general overview: https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed
* Great, more in-depth guide on how to really take visualizations to the next level: https://towardsdatascience.com/5-steps-to-build-beautiful-bar-charts-with-python-3691d434117a
* Documentation: https://matplotlib.org/stable/index.html
 
'''Seaborn'''
* Great overview of Seaborn: https://medium.com/insight-data/data-visualization-in-python-advanced-functionality-in-seaborn-20d217f1a9a6
* Third-party documentation-style site that helps make it really easy to figure out how to do each kind of visualization: https://www.geeksforgeeks.org/python-seaborn-tutorial/
* Documentation: https://seaborn.pydata.org/
 
'''Plotly'''
* Excellent quick overview of what Plotly can do and how to use it: https://towardsdatascience.com/the-next-level-of-data-visualization-in-python-dd6e99039d5e
* Third-party documentation-style site that helps make it really easy to figure out how to do each kind of visualization: https://www.geeksforgeeks.org/python-plotly-tutorial/
* Documentation: https://plotly.com/python/
 
'''Visualization for Exploratory Data Analysis'''
* Academic article that goes over objectives and processes for EDA using visualizations: https://www.researchgate.net/profile/Dr-Subhendu-Pani/publication/337146539_IJITEE/links/5dc70b124585151435fb427e/IJITEE.pdf
* Great article showing how visualizations are useful for EDA in NLP scenarios, for example to inspect the distribution of sentiments: https://medium.com/towards-data-science/a-complete-exploratory-data-analysis-and-visualization-for-text-data-29fb1b96fb6a
* EDA applied to Machine Learning contexts: https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-1-exploratory-data-analysis-with-pandas-de57880f1a68 and visualizations applied to machine learning contexts: https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-2-visual-data-analysis-in-python-846b989675cd
* A gentle introduction to EDA: https://towardsdatascience.com/a-gentle-introduction-to-exploratory-data-analysis-f11d843b8184


== Advanced Pandas ==
[https://pandas.pydata.org/pandas-docs/stable/index.html '''Pandas Documentation''']
[https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf '''Pandas Cheatsheet''']
'''Tutorials:'''
* [https://www.packtpub.com/product/pandas-1x-cookbook-second-edition/9781839213106 Pandas Cookbook]
* [https://tomaugspurger.net/posts/modern-1-intro/ Modern Pandas]
* [https://www.youtube.com/playlist?list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS Video Series of Tutorials]
* [https://wesmckinney.com/book/ Python for Data Analysis]
* [https://realpython.com/pandas-project-gradebook/ Make a Gradebook with Pandas]
* [https://jakevdp.github.io/PythonDataScienceHandbook/ Python Data Science Handbook]
'''GPT & Pandas:'''
* [https://www.sharpsightlabs.com/blog/gpt-writes-bad-pandas-code/ GPT Writes Horrible Pandas Code]
* [https://github.com/rvanasa/pandas-gpt Package to have GPT Write Good Pandas Code]
'''Extra:'''
[https://towardsdatascience.com/one-word-of-code-to-stop-using-pandas-so-slowly-793e0a81343c Make Pandas Run Faster with Swifter]
'''Class Tutorial:'''


'''[https://drive.google.com/file/d/162nO8u2Sr3bPOqoq8KLLhKR8OhmosGjJ/view?usp=sharing Jupyter Notebook]'''


== Agent-based modeling ==


* [https://doi.org/10.1080/19312458.2021.1986478 Waldherr, A., Hilbert, M., & González-Bailón, S. (2021). Worlds of agents: Prospects of agent-based modeling for communication research. Communication Methods and Measures, 15(4), 243–254. https://doi.org/10.1080/19312458.2021.1986478]
* [https://ijoc.org/index.php/ijoc/article/view/10588 Waldherr, A., & Wettstein, M. (2019). Bridging the gaps: Using agent-based modeling to reconcile data and theory in computational communication science. International Journal of Communication, 13, 3976–3999. https://ijoc.org/index.php/ijoc/article/view/10588]
* [http://www.jstor.org/stable/3069238 Macy, M. W., & Willer, R. (2002). From Factors to Actors: Computational Sociology and Agent-Based Modeling. Annual Review of Sociology, 28, 143–166.]


'''Seminal papers'''


'''Examples of agent-based models in communication and other research fields'''
* [https://doi.org/10.1086/681254 DellaPosta, D., Shi, Y., & Macy, M. (2015). Why Do Liberals Drink Lattes? American Journal of Sociology, 120(5), 1473–1511. https://doi.org/10.1086/681254]


* [https://doi.org/10.1016/j.ecolecon.2022.107651 Foramitti, J. (2023). A framework for agent-based models of human needs and ecological limits. Ecological Economics, 204, 107651. https://doi.org/10.1016/j.ecolecon.2022.107651]
== SQL ==
Muqing Liu

'''Introduction to SQL:'''
* General introduction to SQL: https://www.khanacademy.org/computing/computer-programming/sql
* The relational model and the foundation of SQL: https://dl.acm.org/doi/10.1145/362384.362685
* Principles and rules for relational database management systems: https://www.dcs.warwick.ac.uk/~hugh/TTM/

'''Textbook guidance for writing SQL:'''
* "The complete idiot's guide to SQL" by Steven Holzner. A beginner-friendly guide that introduces SQL concepts and commands. https://www.amazon.com/Complete-Idiots-Guide-SQL/dp/1615641092
* "SQL and Relational Theory: How to write accurate SQL code" by C.J. Date. A comprehensive guide to understanding SQL and relational theory. https://www.amazon.com/SQL-Relational-Theory-Write-Accurate/dp/1449316409
* "SQL pocket guide" by Jonathan Gennick. A handy reference for SQL syntax and commands. https://www.amazon.com/SQL-Pocket-Guide-Usage/dp/1449394094

'''Online courses:'''
* SQL for beginners: https://www.udemy.com/course/sql-for-beginners/ This beginner-friendly course covers database design, querying with SQL, data manipulation, and database management.
* SQL essential training: https://www.linkedin.com/learning/sql-essential-training/ This course covers basic SQL commands and querying techniques.
* The complete SQL bootcamp: https://www.udemy.com/course/the-complete-sql-bootcamp/ This course covers both SQL fundamentals and advanced concepts. It also includes real-world projects and hands-on exercises.
* SQL for data science: https://www.coursera.org/learn/sql-for-data-science This course is designed for data science professionals who use SQL for data manipulation and analysis. It covers SQL queries, joins, and aggregations for data science tasks.
* Advanced SQL for query tuning: https://www.pluralsight.com/courses/advanced-sql-query-tuning This course is for intermediate to advanced SQL users looking to optimize their SQL queries and improve database performance.

'''SQL tutorial videos:'''
* MySQL tutorial for beginners: https://www.youtube.com/watch?v=7S_tz1z_5bA
* SQL Tutorial - Full Database Course for Beginners: https://www.youtube.com/watch?v=HXV3zeQKqGY
* SQL Advanced Tutorial | Advanced SQL Tutorial With Examples: https://www.youtube.com/watch?v=M-55BmjOuXY

'''The use of SQL in data science:'''
* A Comparative Analysis on different aspects of Database Management System: https://www.researchgate.net/publication/352178674_A_Comparative_Analysis_on_different_aspects_of_Database_Management_System This paper compares different database management systems for handling big data storage and processing tasks.
* Twitter Sentiment Analysis Approaches: A Survey: https://www.learntechlib.org/p/217980/
* Analysis of Healthcare Data using SQL: https://www.linkedin.com/pulse/analysis-healthcare-data-using-sql-kristopher-bosch/
* SQL for Stock Market Analysis: https://medium.datadriveninvestor.com/sql-for-stock-market-analysis-f2145031e125
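
Several of the courses above cover queries, joins, and aggregations. As a rough illustration of what those look like from Python, here is a minimal sketch using the standard-library sqlite3 module. The table names, column names, and rows are made up for this example and do not come from any of the linked resources.

```python
import sqlite3


def likes_per_user():
    """Build a tiny in-memory database and aggregate posts per user."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and data, invented only for illustration.
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, likes INTEGER);
        INSERT INTO users VALUES (1, 'ana'), (2, 'ben');
        INSERT INTO posts VALUES (1, 1, 10), (2, 1, 4), (3, 2, 7);
        """
    )
    # JOIN links each post to its author; GROUP BY aggregates per author.
    query = """
        SELECT users.name, COUNT(posts.id) AS n_posts, SUM(posts.likes) AS total_likes
        FROM users
        JOIN posts ON posts.user_id = users.id
        GROUP BY users.name
        ORDER BY users.name;
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows
```

Calling likes_per_user() returns one tuple per user with the name, the number of posts, and the summed likes. SQLite is used here only because it ships with Python; the MySQL and other courses above use different database engines with slightly different dialects.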


== Command line ==


== Large language models==

Resources posted by Dyuti

* [https://www.techtarget.com/searchenterpriseai/definition/languagemodeling#:~:text=Importance%20of%20language%20modeling&text=It%20is%20the%20reason%20that,other%20to%20a%20limited%20extent What is a language model and why do we need it?]
* [https://medium.com/analytics-vidhya/a-comprehensive-guide-to-build-your-own-language-model-in-python-5141b3917d6d A comprehensive guide to build your own language model]

'''LLMs and Research:'''
* Large Language Models and Underrepresented Languages [https://arxiv.org/ftp/arxiv/papers/2007/2007.05872.pdf Paper]

'''Social Biases:'''
* [http://proceedings.mlr.press/v139/liang21a.html Towards Understanding and Mitigating Social Biases in Language Models]
* [https://medium.com/@arpitnarain/unmasking-bias-assessing-fairness-in-large-language-models-a722624e4483 Unmasking Bias: Assessing Fairness in Large Language Models]
* [https://aclanthology.org/2022.bigscience-1.6.pdf Pipelines for Social Bias Testing of Large Language Models]
* [https://huggingface.co/blog/evaluating-llm-bias#evaluating-language-model-bias-with-%F0%9F%A4%97-evaluate Evaluating Language Model Bias with 🤗 Evaluate]

'''Mitigating Bias:'''
* [https://www.aneesmerchant.com/personal-musings/large-language-models-and-bias-an-unresolved-issue#:~:text=Bias%20in%20LLMs%20can%20manifest,these%20models%20are%20trained%20on. LLMs and Biases]
* [https://news.mit.edu/2023/large-language-models-are-biased-can-logic-help-save-them-0303 Logic-aware models (MIT)]

'''LLMs and Research:'''
* [https://proceedings.mlr.press/v202/aher23a/aher23a.pdf Using LLMs to Simulate Multiple Humans and Replicate Human Subject Studies] (I am a little uneasy about the ethics of this and would like to hear what everyone else thinks.)


== Cluster / large-scale computing ==


Elizabeth: Topic presentation and additional resources
- Google intro documentation: https://cloud.google.com/architecture/using-clusters-for-large-scale-technical-computing
- A cool example tutorial of how UCLA uses a cluster: https://github.com/chris-german/Hoffman2Tutorials
- Link for Purdue RCAC: https://www.rcac.purdue.edu/compute
- A workshop summary on reproducibility and large-scale computing: https://arxiv.org/ftp/arxiv/papers/1412/1412.5557.pdf
- Basics of high performance computing: https://hbctraining.github.io/Intro-to-shell-flipped/lessons/08_HPC_intro_and_terms.html
- RedHat and HPC: https://www.redhat.com/en/products/high-performance-computing


== Network analysis ==
Hazel
* [https://youtu.be/flwcAf1_1RU Network Analysis Introduction Video]
* [https://youtu.be/flwcAf1_1RU Network Analysis]
 
Resources added by Hazel
 
NetworkX
* [https://towardsdatascience.com/network-analysis-d734cd7270f8 What is Network Analysis]
* [https://www.researchgate.net/publication/236407765_Exploring_Network_Structure_Dynamics_and_Function_Using_NetworkX Exploring Network Structure, Dynamics, and Function Using NetworkX]
* [https://youtu.be/VetBkjcm9Go Crash Course of NetworkX on Youtube]
*[https://trenton3983.github.io/files/projects/2020-05-21_intro_to_network_analysis_in_python/2020-05-21_intro_to_network_analysis_in_python.html Python Notebook Introduction of NetworkX]
 
Applications of NetworkX in academic research
*[https://doi.org/10.1080/13683500.2020.1777950 Valeri, M., & Baggio, R. (2020). Italian tourism intermediaries: A social network analysis exploration. Current Issues in Tourism, 24(9), 1270–1283.]
*[https://doi.org/10.1016/j.gloenvcha.2015.03.006 Williams, H. T. P., McMurray, J. R., Kurz, T., & Hugo Lambert, F. (2015). Network analysis reveals open forums and Echo Chambers in social media discussions of climate change. Global Environmental Change, 32, 126–138.]
 
iGraph
*[https://towardsdatascience.com/newbies-guide-to-python-igraph-4e51689c35b4 Newbies Guide to Python-igraph]
*[https://www.cs.rhul.ac.uk/home/tamas/development/igraph/tutorial/tutorial.html iGraph Tutorial]
*[https://www.youtube.com/watch?v=DuTROLV1760 iGraph with R Video Tutorial]
 
Application of iGraph in academic research
*[https://doi.org/10.1016/j.socnet.2015.07.003 González-Bailón, S., & Wang, N. (2016). Networked discontent: The anatomy of protest campaigns in social media. Social Networks, 44, 95–104]
*[https://doi.org/10.1187/cbe.13-08-0162 Grunspan, D. Z., Wiggins, B. L., & Goodreau, S. M. (2014). Understanding Classrooms through Social Network Analysis: A Primer for Social Network Analysis in Education Research. CBE—Life Sciences Education, 13(2), 167–178]
*[https://doi.org/10.1080/01292986.2018.1453849 Kokil Jaidka, Saifuddin Ahmed, Marko Skoric & Martin Hilbert (2019) Predicting elections from social media: a three-country, three-method comparative study, Asian Journal of Communication, 29:3, 252-273]


== Object-oriented programming ==

Latest revision as of 05:10, 1 August 2023

Course Information

COM 682: Advanced Computational Communication Methods
Location: Discord
Class Hours: Tuesdays, 10–12 ET in the #General channel on Matrix

Instructor

Instructor: Jeremy Foote
Email: jdfoote@purdue.edu
Office Hours: By appointment

Course Overview and Learning Objectives

I teach an Intro to Programming and Data Science course that gives students an introduction to programming in Python, and some basic skills for gathering and analyzing data from the web.

There are many, many aspects of computational communication research that we don't cover in that class. This class is intended to take the next step in providing resources for students who want to do computational social science research.

That next step typically looks different for different students, depending on what they want to research. One goal of this class is to collect useful resources for learning many of the types of tools and methods that computational social scientists typically use. Some of these will be explicitly discussed in class, but others will not.

In particular, following conversation with group members, we will focus on a fairly deep dive into computational text analysis and reproducible workflows. Other topics will be more self-organized and self-directed.

I will consider this class a complete success if, at the end, every student can:

  • Understand and engage in creating open, reproducible workflows for their academic research
  • Understand some of the key tools for doing computational text analysis, including topic models, word embeddings, machine learning classifiers, and LLMs
  • Learn how to identify the key texts, software libraries, and resources for learning a new computational method


Required resources and texts

Laptop

Most of this class will be asynchronous, so you will need access to your own laptop. It is assumed that you have experience running Python programs on your computer, and have the needed hardware and software to do so. We will mostly be using Jupyter Notebooks, but you are welcome to use other IDEs.

In order to participate in class, you will need to sign up for our Matrix "space", using this link. If you haven't used Matrix before (most people haven't), then you will need to create an account and choose software to use to connect to the room. The default software is Element, which I highly recommend using. There is a web version of Element, but I'd recommend downloading desktop and mobile apps at https://element.io/download.

Readings

  • We will be working together to identify readings, videos, and other resources

Course logistics

Note About This Syllabus

This is a very collaborative class, and one of the outcomes of the class will be to update this wiki with resources regarding topics of interest to class members. Therefore, the syllabus will be in flux both before the class and during the class.

Lectures

Our class time will follow a "flipped" classroom model. We will identify asynchronous materials (readings, recorded lectures, assignments, etc.) which you will work on before class and we will use our class time to review concepts, identify confusion, and synthesize.

Because we only have a few hours a week, some of this work will also happen outside of class, through conversations in the Matrix space.

For the first ~half of class we will all be working on the same topics. For the second half, we will use part of class time for students to present on either a topic or a project that they are working on.

Office hours and email

  • I will be traveling for much of June and part of July. I'm happy to make time to meet, but my schedule will be much less consistent than during a typical semester.
  • I am also available by email. You can reach me at jdfoote@purdue.edu.

Assignments

There will be two primary assignments for this class: one topic exploration, and a final project.

Topic exploration

One goal of this class is to learn to identify and assess tools, libraries, and learning resources. Each student will identify a topic that they would like to be in charge of. This will involve researching the conceptual background of the topic, identifying and evaluating resources related to the topic, and sharing the best resources on this wiki page.

Ideally, this will include at least:

  • one or more academic papers about the topic
  • one or more Python libraries
  • one or more walkthroughs of using the method in Python (text and/or video)

Final Project

The final project can take one of two forms: either an explainer of a computational method or a research project.

Explainer

For the explainer, you will create a polished explainer of one of the topics covered in class. This should include an explanation of the topic/method, with a focus on how and why the topic is useful for doing social science research, as well as its limitations. You should also include an explanation of how to use the associated method, probably using public data.

Ideally, this explainer will fill a gap in the resources that already exist about a topic. One gap that often exists is resources specifically designed for communication scholars or social scientists more broadly; another is resources aimed at more novice programmers.

The format of this explainer is up to you, but it will likely at least include a Jupyter Notebook (or similar); I think that a video walkthrough of a notebook is also nice, but I'm open to other options.

Research Project

The other option for a final project is to push forward one of your quantitative research projects that would benefit from one or more computational methods. I strongly urge you to work on a project that will further your academic career outside of the class. There are many ways that this can happen. Some obvious options are to prepare a project that you can submit for publication, that you can use as pilot analysis that you can report in a grant or thesis proposal, and/or that fulfills a degree requirement. I prefer that you do projects on your own but it may be possible to work as a small team (maximum 3 people). Team projects are expected to be more ambitious than individual projects.

Planning Document

If you would like to take this option, you will submit a 2-4 page planning document of what you would like to do within the first few weeks of class, just to make sure we are on the same page about the scope of what you want to accomplish and so that I can give some initial feedback.

The project planning document is a basic shell/outline of an empirical quantitative research paper. The planning document should focus around three big questions:

  • Why are you planning to do this analysis? Make sure to introduce any background information about the topic, the community, your business, or anything else that will be required to properly contextualize your study.
  • How will you get the data to analyze? Describe the data sources you will use and how the data will be collected.
  • How will you analyze the data? Describe the visualizations, tables, or statistical tests that you will produce.

One approach that I have found helpful is outlined on this wiki page.

Project report

Final projects will likely look different, depending on the stage of the project when beginning the class. Typically, you will write a document or a Jupyter Notebook that will ideally provide the foundation for a high quality short research paper that you might revise and submit for publication. I do not expect the report to be ready for publication, but it should contain polished drafts of all the necessary components of a scholarly quantitative empirical research study. In terms of the structure, please see the page on the structure of a quantitative empirical research paper.

The great thing about a Jupyter Notebook is that it allows you to provide data, code, and any documentation sufficient to enable the replication of all analysis and visualizations. If you choose to write the report as a Word document, then you will need to include the code in a separate file.

Because the emphasis in this class is on methods and because I'm not an expert in each of your fields, I'm happy to assume that your paper, proposal, or thesis chapter has already established the relevance and significance of your study and has a comprehensive literature review, well-grounded conceptual approach, and compelling reason why this research is important. As a result, you do not need to focus on these elements of the work in your written submission. Instead, feel free to start with a brief summary of the purpose and importance of this research followed by an introduction of your research questions or hypotheses. If you provide more detail, that's fine, but I won't give you detailed feedback on these parts and they will not figure prominently in my assessment of the work.

Jupyter Notebooks do not have all of the tools for citations that Word or LaTeX or even Google Docs have, so while I expect you to cite related work your references section does not need to be as polished as citation management software would make it.

Grades

This course will follow a "self-assessment" philosophy. I am more interested in helping you to learn things that will be useful to you than in assigning grades. The university still requires grades, so you will be leading the evaluation of your work. At the beginning of the course, I will encourage you to think about and write down what you hope to get out of the course. Halfway through the course you will reflect on what you have accomplished thus far and how it has met, not met, or exceeded expectations, based on both rubrics and personal goals and objectives. At each of these stages you will receive feedback on your assessments. By the end of the semester, you should have a clear vision of your accomplishments and growth, which you will turn into a grade. As the instructor-of-record, I maintain the right to disagree with your assessment and alter grades as I see fit, but any time that I do this it will be accompanied by an explanation and discussion. These personal assessments, grounded in honest and meaningful reflection on your work, will be the most important factor in final grades.

I suggest that we use the following rubric in our assessment:

  • 30%: class participation, including attendance, participation in discussions and group work
  • 30%: topic exploration
  • 40%: Final Project paper/Jupyter notebook.

My interpretation of grade levels (A, B, C, D/F) is the following:

A: Reflects work that exceeds expectations on multiple fronts and to a great degree. Students reaching this level of achievement will:

  • Do what it takes to learn the programming principles and techniques, including looking to outside sources if necessary.
  • Engage thoughtfully with an ambitious final project.
  • Take intellectual risks, offering interpretations based on synthesizing material and asking for feedback from peers.
  • Share work early allowing extra time for engagement with others.

B: Reflects strong work. Work at this level will be of consistently high quality. Students reaching this level of achievement will:

  • Be more safe or consistent than the work described above.
  • Ask meaningful questions of peers and engage them in fruitful discussion.
  • Exceed requirements, but in fairly straightforward ways
  • Compose complete and sufficiently detailed reflections.
  • Complete nearly all of the programming assignments at a high level

C: This reflects meeting the minimum expectations of the course. Students reaching this level of achievement will:

  • Turn in and complete required assignments on time.
  • Be collegial and continue discussion, through asking simple or limited questions.
  • Leave some assignments incomplete, or turn some in in a hasty or incomplete manner.

D/F: These are reserved for cases in which students do not complete work or participate. Students may also be impeding the ability of others to learn.

Schedule

NOTE: This section will be modified throughout the course to meet the class's needs. Check back in often.

Each week will include the topic of the week. This is where we will gather and organize resources regarding each topic. Please be bold in editing this portion of the wiki to add or arrange resources.


Week 1: Welcome! (May 16)

Assignment Due:

  • None

Required Readings:


Agenda:

  • Class overview and expectations — We'll walk through this syllabus.
  • Make assignments for topic exploration

Slides:

Welcome slides

Week 2: Reproducible Research I (May 23)

Resources:

Slides:


Organization

Key ideas:

  • Folder structure
    • Different options, but separate code from data
    • Jeremy's approach:
my_cool_project
|
|-- README.md # Explanation of project and how to navigate it
|-- Snakefile # Or Makefile - workflow tool
|
|-- data/
|   |-- raw_data/
|   |-- processed_data/
|
|-- code/
|
|-- results/
|   |-- figures/
|
|-- papers/
|
|-- presentations/
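
The layout above can be created by hand, but a short script makes it repeatable. The following is a minimal sketch using only the Python standard library. The directory and file names are copied from the tree above; the function name create_project is an assumption made for this example.

```python
from pathlib import Path

# Directory and file names copied from the layout above.
PROJECT_DIRS = [
    "data/raw_data",
    "data/processed_data",
    "code",
    "results/figures",
    "papers",
    "presentations",
]
PROJECT_FILES = ["README.md", "Snakefile"]


def create_project(root):
    """Create the folder skeleton under `root` and return the root path."""
    root = Path(root)
    for rel in PROJECT_DIRS:
        # parents=True also creates intermediate folders such as data/
        (root / rel).mkdir(parents=True, exist_ok=True)
    for name in PROJECT_FILES:
        # touch() creates an empty file and leaves an existing one untouched
        (root / name).touch(exist_ok=True)
    return root
```

For example, create_project("my_cool_project") builds the skeleton in the current directory. Running it again is safe because existing folders and files are left in place.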


Data Management

Key ideas:

  • Back up raw data
  • Keep raw data (and make it read-only)
  • Step one is to clean the data: create the data you wish you received
    • Name variables well
    • Use a tidy data structure
  • Share data (when possible)
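
Two of these ideas can be sketched in a few lines of standard-library Python: marking a raw data file read-only, and reshaping "wide" rows into a tidy structure with one row per observation. This is only an illustration under assumptions: the function names and the example column names are invented here, and in practice a library such as pandas would usually handle the reshaping.

```python
import stat
from pathlib import Path


def make_read_only(path):
    """Remove write permission so raw data is not edited by accident."""
    Path(path).chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)


def wide_to_tidy(rows, id_column, variable_name, value_name):
    """Turn wide rows (one column per measurement) into one row per observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every tidy row instead
            tidy.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return tidy
```

As a usage example, a row such as {"user": "a", "2022": 3, "2023": 5} passed through wide_to_tidy(rows, "user", "year", "posts") becomes two rows, one per user and year, each with a single "posts" value.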

Code management

Key ideas:

  • Version control
  • Don't repeat yourself (DRY)
  • Build at least a few high-level test cases
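
To make the last two ideas concrete, here is a small hypothetical sketch: one shared cleaning helper that every other function reuses (DRY), and one high-level test that checks a known answer on a tiny input. The function names and the example texts are invented for illustration.

```python
def clean_text(text):
    """Lowercase text and collapse runs of whitespace (the one shared helper)."""
    return " ".join(text.lower().split())


def count_word(texts, word):
    """Count the documents that contain `word` after cleaning."""
    target = clean_text(word)
    # Reuse clean_text rather than repeating the cleaning logic here
    return sum(target in clean_text(t).split() for t in texts)


def test_count_word():
    """High-level check: a tiny known input must give a known answer."""
    texts = ["Hello   World", "hello again", "Goodbye"]
    assert count_word(texts, "HELLO") == 2
    assert count_word(texts, "missing") == 0
```

A test runner such as pytest collects functions whose names start with test_, so a check like this can be rerun whenever the code changes.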


Week 3: Reproducible Research II (May 30)

Resources:

Slides:


Reproducible analyses and papers

Key ideas:

  • Some big benefits (and some drawbacks) to using text-based tools (Markdown or LaTeX)
    • Can be put in version control
    • Tools like knitr can be used to put code directly into a document
  • Make figure creation part of your workflow, have documents point to your figures directory
  • Use citation management software that integrates with your document (use Zotero)


Sharing

Key ideas:

  • Share your code and data whenever possible!
  • Lots of options - OSF.io, Harvard Dataverse, etc.
  • Share preprints online


Advanced: Workflow Management

Key ideas:

  • Tools to reproduce as much of the workflow as possible
  • README file is much better than nothing
  • Even better is a "wrapper" script that runs everything
    • Very clear exactly what is run and how
    • Some fairly simple options:
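
As one illustration of a "wrapper" script, the sketch below runs each step of an analysis in order and stops at the first failure. It is a sketch under assumptions, not a recommended tool: the script names in STEPS are hypothetical placeholders, and the runner parameter exists only so the function can be tried without real scripts.

```python
import subprocess
import sys

# Hypothetical script names; replace them with the real steps of a project.
STEPS = [
    "code/01_clean_data.py",
    "code/02_run_analysis.py",
    "code/03_make_figures.py",
]


def run_all(steps, runner=subprocess.run):
    """Run each step in order; check=True raises an error if a step fails."""
    for script in steps:
        print(f"Running {script}")
        # sys.executable is the Python interpreter running this wrapper
        runner([sys.executable, script], check=True)
    return len(steps)
```

Calling run_all(STEPS) from the project root makes it explicit what is run and in what order. Workflow tools such as Make or Snakemake add dependency tracking on top of this idea, so only out-of-date steps are rerun.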


Week 4: Computational text analysis: Introduction and Key Concepts (June 6)

Resources: Text As Data: A New Framework for Machine Learning and the Social Sciences (2022). Justin Grimmer, Margaret E. Roberts, and Brandon M. Stewart.

Read chapters 1-7

Week 5: Computational text analysis: Some "traditional" approaches (June 13)

Topic modeling

Embeddings

Resources:

Classification

Semantic networks

Week 6: Computational text analysis: using LLMs for research (June 20)

Due:

  • Final project proposal (details on Brightspace)

Resources:

Intro to LLMs:

Reflections on LLMs for research:

Papers using LLMs:

Week 7: Share and discuss works-in-progress (June 27)

Assignment Due: self-assessment reflection

Topic Presentations:

  • Juan Pablo (JP) Loaiza-Ramírez

Work-in-progress Presentations:

  • Juan Pablo (JP) Loaiza-Ramírez
  • Christina Walker

Week 8: No class - July 4

Week 9: Share and discuss works-in-progress (July 11)

Topic Presentations:

  • Elizabeth Thompson
  • Ryan Funkhouser

Work-in-progress Presentations:

  • Dyuti Jha

Week 10: Share and discuss works-in-progress (July 18)

Topic Presentations:

  • Christina Walker
  • Hazel Chiu

Work-in-progress Presentations:

  • Elizabeth Thompson

Week 11: Share and discuss works-in-progress (July 25)

Visit from Bill Rand, an expert in agent-based modeling.

Topic Presentations:

  • Dyuti Jha


Work-in-progress Presentations:

  • Hazel Chiu

Week 12: Share and discuss works-in-progress (August 1)

Topic Presentations:

  • Muqing Liu

Work-in-progress Presentations:

  • Ryan Funkhouser
  • Muqing Liu

Assignment Due:

  • Final project due

Additional Topics

This is a key goal of this page - a curated collection of resources around computational social science topics. I'll start with a list of topics, but please add others.

Visualization in Python

Resources added by Ryan

Refresher/Basic resources

Understanding which visualization libraries to learn/use

Matplotlib

Seaborn

Plotly

Visualization for Exploratory Data Analysis

Advanced Pandas

Pandas Documentation

Pandas Cheatsheet

Tutorials:

GPT & Pandas:

Extra:

Make Pandas Run Faster with Swifter

Class Tutorial:

Jupyter Notebook

Agent-based modeling

Resources added by Juan Pablo (JP) Loaiza-Ramírez

The following resources are listed in order of importance. Consider them as a "gentle" introduction to agent-based modeling.

Best papers overall


Seminal papers


Examples of agent-based models in communication and other research fields


YouTube Playlists


GitHub Repositories


Online courses


Tutorials

SQL

Muqing Liu

Introduction to SQL:

  • General introduction to SQL: https://www.khanacademy.org/computing/computer-programming/sql
  • Relational model and the foundation of SQL: https://dl.acm.org/doi/10.1145/362384.362685
  • Principles and rules for relational database management systems: https://www.dcs.warwick.ac.uk/~hugh/TTM/

Textbook guidance for writing SQL:

  • "The Complete Idiot's Guide to SQL" by Steven Holzner. This beginner-friendly guide introduces SQL concepts and commands. https://www.amazon.com/Complete-Idiots-Guide-SQL/dp/1615641092
  • "SQL and Relational Theory: How to Write Accurate SQL Code" by C.J. Date. This book provides a comprehensive guide to understanding SQL and relational theory. https://www.amazon.com/SQL-Relational-Theory-Write-Accurate/dp/1449316409
  • "SQL Pocket Guide" by Jonathan Gennick. This book is a handy reference for SQL syntax and commands. https://www.amazon.com/SQL-Pocket-Guide-Usage/dp/1449394094

Online courses:

  • SQL for beginners: https://www.udemy.com/course/sql-for-beginners/ This beginner-friendly course covers database design, querying with SQL, data manipulation, and database management.
  • SQL essential training: https://www.linkedin.com/learning/sql-essential-training/ This course covers basic SQL commands and querying techniques.
  • The complete SQL bootcamp: https://www.udemy.com/course/the-complete-sql-bootcamp/ This course covers both SQL fundamentals and advanced concepts. It also includes real-world projects and hands-on exercises.
  • SQL for data science: https://www.coursera.org/learn/sql-for-data-science This course is designed for data science professionals who use SQL for data manipulation and analysis. It covers SQL queries, joins, and aggregations for data science tasks.
  • Advanced SQL for query tuning: https://www.pluralsight.com/courses/advanced-sql-query-tuning This course is for intermediate to advanced SQL users looking to optimize their SQL queries and improve database performance.

SQL tutorial videos:

  • MySQL tutorial for beginners: https://www.youtube.com/watch?v=7S_tz1z_5bA
  • SQL Tutorial - Full Database Course for Beginners: https://www.youtube.com/watch?v=HXV3zeQKqGY
  • SQL Advanced Tutorial | Advanced SQL Tutorial With Examples: https://www.youtube.com/watch?v=M-55BmjOuXY

The use of SQL in data science:

  • A Comparative Analysis on different aspects of Database Management System: https://www.researchgate.net/publication/352178674_A_Comparative_Analysis_on_different_aspects_of_Database_Management_System This paper compares different database management systems for handling big data storage and processing tasks.
  • Twitter Sentiment Analysis Approaches: A Survey: https://www.learntechlib.org/p/217980/
  • Analysis of Healthcare Data using SQL: https://www.linkedin.com/pulse/analysis-healthcare-data-using-sql-kristopher-bosch/
  • SQL for Stock Market Analysis: https://medium.datadriveninvestor.com/sql-for-stock-market-analysis-f2145031e125
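Quick illustration: for anyone who wants to try a query before working through the resources above, here is a minimal sketch of the "queries, joins, and aggregations" pattern that several of the courses cover. It uses Python's built-in sqlite3 module with an in-memory database. The tables, columns, and data (users, posts, likes) are made up for illustration and are not taken from any of the linked materials.

```python
import sqlite3

# Hypothetical toy data: table and column names are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, likes INTEGER);
    INSERT INTO users VALUES (1, 'ana'), (2, 'ben');
    INSERT INTO posts VALUES (1, 1, 10), (2, 1, 4), (3, 2, 7);
""")

# Join each post to its author, then aggregate per author.
query = """
    SELECT u.name, COUNT(p.id) AS n_posts, SUM(p.likes) AS total_likes
    FROM users AS u
    JOIN posts AS p ON p.user_id = u.id
    GROUP BY u.name
    ORDER BY total_likes DESC;
"""
rows = conn.execute(query).fetchall()
for name, n_posts, total_likes in rows:
    print(name, n_posts, total_likes)

conn.close()
```

The same SELECT / JOIN / GROUP BY structure carries over to MySQL or PostgreSQL, although connection setup and some syntax details differ between systems.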

Command line

Large language models

Resources posted by Dyuti

- What is a language model and why do we need it?

- A comprehensive guide to build your own language model

LLMs and Research:

Large Language Models and Underrepresented Languages Paper

Social Biases:

- Towards Understanding and Mitigating Social Biases in Language Models

- Unmasking Bias —Assessing Fairness in Large Language Models

- Pipelines for Social Bias Testing of Large Language Models

- Evaluating Language Model Bias with 🤗 Evaluate

Mitigating Bias:

- LLM and Biases

- Logic-aware models (MIT)

LLM and Research:

- Using LLMs to Simulate Multiple Humans and Replicate Human Subject Studies (I am a little uneasy about the ethics of it. I would like to hear what everyone else thinks.)

Cluster / large-scale computing

Elizabeth: Topic presentation and additional resources

- Google intro documentation: https://cloud.google.com/architecture/using-clusters-for-large-scale-technical-computing

- A cool example tutorial of how UCLA uses a cluster: https://github.com/chris-german/Hoffman2Tutorials

- Link for Purdue RCAC: https://www.rcac.purdue.edu/compute

- A workshop summary on reproducibility and large-scale computing: https://arxiv.org/ftp/arxiv/papers/1412/1412.5557.pdf

- Basics of high performance computing: https://hbctraining.github.io/Intro-to-shell-flipped/lessons/08_HPC_intro_and_terms.html

- RedHat and HPC: https://www.redhat.com/en/products/high-performance-computing


Network analysis

Resources added by Hazel

NetworkX

Applications of NetworkX in academic research

iGraph

Application of iGraph in academic research

Object-oriented programming


Screen scraping

Regular expressions

Administrative Notes

Attendance Policy

Attendance is very important, and it will be difficult to make up for any classes that are missed. Students are expected to communicate with faculty well in advance so that arrangements can be made for making up missed work. It is your responsibility to seek out support from classmates for notes, handouts, and other information.

Incomplete

A grade of incomplete (I) will be given only in unusual circumstances. A request for an incomplete must describe the circumstances and include a proposed timeline for completing the course work. Submitting a request does not ensure that an incomplete grade will be granted. If granted, you will be required to fill out and sign an “Incomplete Contract” form that will be turned in with the course grades. Requests made after the course is completed will not be considered.

Academic Integrity

While I encourage collaboration, I expect that any work that you submit is your own. Basic guidelines for Purdue students are outlined here but I expect you to be exemplary members of the academic community. Please get in touch if you have any questions or concerns.

Nondiscrimination

I strongly support Purdue's policy of nondiscrimination (below). If you feel like any member of our classroom--including me--is not living up to these principles, then please come and talk to me about it.

Purdue University is committed to maintaining a community which recognizes and values the inherent worth and dignity of every person; fosters tolerance, sensitivity, understanding, and mutual respect among its members; and encourages each individual to strive to reach his or her own potential. In pursuit of its goal of academic excellence, the University seeks to develop and nurture diversity. The University believes that diversity among its many members strengthens the institution, stimulates creativity, promotes the exchange of ideas, and enriches campus life.

Students with Disabilities

Purdue University strives to make learning experiences as accessible as possible. If you anticipate or experience physical or academic barriers based on disability, you are welcome to let me know so that we can discuss options. You are also encouraged to contact the Disability Resource Center at: drc@purdue.edu or by phone: 765-494-1247.

Emergency Preparation

In the event of a major campus emergency, I will update the requirements and deadlines as needed.

Mental Health

If you or someone you know is feeling overwhelmed, depressed, and/or in need of mental health support, services are available. For help, such individuals should contact Counseling and Psychological Services (CAPS) at 765-494-6995 during and after hours, on weekends and holidays, or by going to the CAPS office on the second floor of the Purdue University Student Health Center (PUSH) during business hours.

Acknowledgements

This course is heavily based on earlier courses taught by Tommy Guy and Mako Hill at the University of Washington as well as a course taught by Laura Nelson at Northeastern University.