The module opens with an introduction to the basics of probability and random variables. Major topics are distribution theory, estimation and maximum likelihood estimators, hypothesis testing, and simple and multiple linear regression implemented in R. The module concludes with a review of Generalised Linear Models and the use of deviance and Wald statistics, with examples in R.

Learning Outcomes

On completion of the module students should be able to:

1. Use the principles of probability as a function over sets of events and manipulate conditional probabilities via Bayes' theorem. [(ii)]

2. Manipulate random variables, PDFs, distribution functions and expectation, especially with respect to the moments of a PDF, the mean and the variance. [(iii)]

3. Define and appropriately apply the discrete distributions (binomial, Poisson and uniform), and work with the continuous distributions (normal, exponential, gamma, chi-squared, t, F and uniform). [(v), 1-2]

4. Use the one-to-one correspondence between an MGF and a PDF for sums of random variables. [(iv); (ix), 1; (xiv), 3]

5. Perform operations with bivariate distributions: joint, marginal and conditional distributions, and freely use independence arguments. [(vi)]

6. Apply the central limit theorem. [(vii)]

7. Determine maximum likelihood and least squares estimates of unknown parameters. Be able to define the terms efficiency, bias and mean squared error. [(ix), 1-5]

8. Determine confidence intervals for means, variances and differences between means. [(x)]

9. Work with the concepts of random sampling, statistical inference, sampling distributions and hypothesis tests: null and alternative hypotheses, type I and type II errors, test statistic, critical region, level of significance, probability-value and power of a test. Use tables of the t-, F- and chi-squared distributions. [(viii) and (xi), 1]

10. Investigate linear relationships between variables using regression analysis. Use the correlation coefficient for bivariate data and the coefficient of determination. Explain what is meant by response and explanatory variables. Derive and calculate the least squares estimates of the slope and intercept parameters in a simple linear regression model. Perform multiple linear regression using R and interpret the output. [(xii), 1 to 10]

11. Justify and use a Generalised Linear Model, including inference arguments using deviance and Wald statistics from R output. [RSS level 7 standards developing from level 6 standards]

12. Use R [R Core Team (2017), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/ ] for the data analysis examples of the module.

Syllabus

Probability

Sample space for an experiment, an event, and the definition of probability as a set function on a collection of events. The basic properties satisfied by probability. Conditional probability and Bayes' Theorem for events. Definition of independence for two events. Random variables, means, variances and moments.

Distribution theory

Standard distributions and their use in modelling, including Bernoulli, binomial, Poisson, discrete uniform, Normal, exponential, gamma, continuous uniform and multivariate Normal. Expectation, variance and generating functions. Sums of IID random variables, weak law of large numbers, central limit theorem.

Joint, marginal and conditional distributions. Independence. Covariance and correlation.

Moment generating functions to find moments of the PDF and distributions of sums of random variables.

Estimation

Sampling distributions. Bias in estimators, efficiency and mean squared error. Maximum likelihood estimation and finding estimators analytically. The mean and variance of a sample mean. The distribution of the t-statistic for random samples from a normal distribution. The F distribution for the ratio of two sample variances from independent samples taken from normal distributions. The chi-squared distribution for the sum of squared standard normal variates.

Hypothesis testing and confidence intervals

Confidence intervals for means, variances and differences between means. Hypothesis tests concerning means and variances. Null and alternative hypotheses, type I and type II errors, test statistic, critical region, level of significance, probability-value and power of a test. Use of tables of the t-, F- and chi-squared distributions.

Linear models

Linear relationships between variables using regression analysis. The correlation coefficient for bivariate data and the coefficient of determination. Response and explanatory variables and the least squares estimates of the slope and intercept parameters in a simple linear regression model. Multiple linear regression with IID normal errors implemented in R.

Generalised Linear Models

Definition of a Generalised Linear Model. Inference arguments based on deviance and Wald statistics, using data examples within R.

- Module Supervisor: Hongsheng Dai
- Module Supervisor: Berthold Lausen