This module will develop students’ understanding of quantitative analysis and impart the practical skills necessary for carrying out advanced statistical analysis of social data using modern statistical software and programming.

The first term of the module focuses on statistical models. It begins with simple ordinary least squares (OLS) regression and provides a framework for modelling strategy and variable selection. Students are then taken through extensions to the basic OLS model: categorical predictors, interactions, and non-linear terms. Next, we introduce models for categorical outcomes: binary logistic and multinomial logit models. The term concludes with a discussion of practical topics in survey data analysis, namely how to deal with complex sample designs, weighting, and non-response adjustments. The modelling framework outlined in this term lays the foundations for advanced quantitative social science methods.

The second term of the module introduces students to the data science concepts, techniques, and skills necessary to perform reproducible data analysis of a variety of quantitative social data. Students will engage in a hands-on, reproducible data analysis workflow using open source computational tools, including the Python programming language, JupyterLab (and Jupyter Notebook), GitHub, and Markdown. Prior knowledge of programming is not required, and students who experience difficulties installing software will be able to access it online from their laptops, tablets, or smartphones via JupyterHub. Students will learn, in an accessible way, basic models for machine learning, causal inference, and network analysis, as well as practical data science skills, including data wrangling and visualization of various big data sources. The content is organized around three fundamental data science tasks: description (and exploratory data analysis), prediction, and causal inference (which includes experimental design). Attention is given to model evaluation and to problems of selection bias, measurement error, confounding, and overfitting. Issues of ethics, privacy, and fairness of quantitative models in the social sciences are discussed throughout the course.

Case studies from the social sciences (e.g., decision making in criminal justice, censorship and collective action, social networks and public health) will be used throughout the course to connect sociological issues with data analysis. Students will engage with data-driven exercises, which they will consolidate in research portfolios demonstrating their data science accomplishments and employability skills.

Transferable skills and learning outcomes

By the end of the module, you will be able to:

·         Perform, critically interpret, and communicate results from analysis using OLS regression, including models with categorical predictors, interactions, and non-linear terms.

·         Perform, critically interpret, and communicate results from analysis using logistic and multinomial logit models.

·         Deal with practical issues of survey data analysis, including complex sample designs, weighting, and non-response adjustments.

·         Use computational tools (Python, Jupyter, Markdown, GitHub, and the shell) freely and flexibly to perform reproducible data analysis and communicate your results.

·         Wrangle, explore, visualize, and model a dataset using various Python libraries.

·         Build an open and reproducible research workflow, from raw data to research report.

·         Perform, critically interpret, and communicate results from analysis using basic models for machine learning, causal inference, and network analysis.

·         Identify and deal with issues of selection bias, measurement error, confounding, and overfitting.

·         Articulate and address issues of ethics, privacy, and fairness of quantitative models in the social domain.

·         Write clean, reusable code in Python.

·         Create a personal GitHub repository to keep track of your work and collaborate with others, and use GitHub to submit your final Research Portfolio.