This module will develop
students’ understanding of quantitative analysis and impart the practical
skills necessary for carrying out advanced statistical analysis of social data
using modern statistical software and programming.
The first term of the
module focuses on statistical models. It begins with simple OLS regression
and provides a framework for modelling strategy and variable selection.
Students are then taken through extensions to the basic OLS model, with
categorical predictors, interactions and non-linear terms. Next, we introduce
models for categorical outcomes: binary logistic and multinomial logit. The
term concludes with a discussion of practical topics in survey data analysis –
how to deal with complex sample designs, weighting and non-response
adjustments. The modelling framework outlined in this term builds the
foundations for advanced quantitative social science methods.
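As a small illustration of why weighting matters in survey analysis, the sketch below compares an unweighted and a weighted proportion in plain Python. The responses, the weights, and the helper name `weighted_mean` are hypothetical and chosen only for illustration; the example is not taken from the module materials.

```python
def weighted_mean(values, weights):
    """Return the weighted mean of `values` using the given survey weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical toy sample: 1 = respondent agrees, 0 = respondent disagrees.
responses = [1, 1, 1, 0]

# Hypothetical weights: the last respondent belongs to an under-sampled
# group, so they stand in for more people in the population.
weights = [1.0, 1.0, 1.0, 3.0]

unweighted = sum(responses) / len(responses)
weighted = weighted_mean(responses, weights)

print(f"Unweighted share agreeing: {unweighted:.2f}")
print(f"Weighted share agreeing:   {weighted:.2f}")
```

In this toy sample the unweighted share is 0.75 and the weighted share is 0.50, which shows how ignoring the sample design can shift an estimate.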
The second term of the
module introduces students to the data science concepts, techniques, and skills
necessary to perform reproducible data analysis of a variety of quantitative
social data. Students will engage in a hands-on, reproducible data analysis
workflow using open-source computational tools, including the Python
programming language, JupyterLab (and Jupyter Notebook), GitHub, and Markdown.
Prior knowledge of programming is not required, and students who experience
difficulties installing software will have the opportunity to access it
online from their laptops, tablets, or smartphones via JupyterHub. Students
will learn, in an accessible way, basic models for machine learning, causal
inference, and network analysis as well as practical data science skills,
including data wrangling and visualization of various big data sources. The
content is organized around three fundamental data science tasks—description
(and exploratory data analysis), prediction, and causal inference (which
includes experimental design). Attention is given to model evaluation and
problems of selection bias, measurement error, confounding, and overfitting.
Issues of ethics, privacy, and fairness of quantitative models in the social
sciences are discussed throughout the course.
Case studies from social
sciences (e.g., decision making in criminal justice, censorship and collective
action, social networks and public health) will be used throughout the course
to connect sociological issues with data analysis. Students
will complete data-driven exercises, which they will consolidate in research
portfolios demonstrating their data science accomplishments and employability.
Transferable skills and learning outcomes
By the end of the module, you will be able to:
critically interpret and communicate results from analysis using OLS
regression, including models with categorical predictors, interactions, and
non-linear terms.
critically interpret and communicate results from analysis using logistic and
multinomial logit models.
deal with practical issues of data analysis, including complex sample designs,
weighting and non-response adjustments.
flexibly use computational tools (Python, Jupyter, Markdown, GitHub, and the
shell) to perform reproducible data analysis and communicate your results.
explore, visualize, and model your dataset using various Python libraries.
an open and reproducible research workflow ranging from raw data to research
critically interpret and communicate results from analysis using basic models
for machine learning, causal inference, and network analysis.
deal with issues of selection bias, measurement error, confounding, and
overfitting.
address issues of ethics, privacy, and fairness of quantitative models in
the social domain.
write clean, reusable code in Python.
maintain a personal GitHub repository to keep track of your work and collaborate
with others, and use GitHub to submit your final Research Portfolio.