Who is the course for?

This is an introductory course for anyone interested in correlation and linear regression. The course will be useful to researchers who need to use correlation and linear regression in their data analysis, and to readers of the medical literature who need a better understanding of what these analyses mean. Attendees will also gain insight into the working principles underlying all regression techniques. The use of specific statistical packages will not be covered.

Attendees should have completed the introductory sessions on statistics, or have an equivalent level of knowledge, before taking this course.

Course description

Correlation is a frequently used and reported measure of the linear relationship between two numerical variables. For example, is body weight associated with plasma volume? Whereas linear regression estimates the best-fitting straight line to describe an association between two variables, correlation describes the strength of the linear association between them.

Linear regression is used to estimate the best-fitting straight line to describe the association between a numerical exposure and a numerical outcome. Simple linear regression models the effect of one exposure on an outcome, while multiple linear regression models the effect of more than one exposure on an outcome.
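
As a rough illustration of the two ideas above, the following minimal sketch computes a Pearson correlation coefficient and a least-squares regression line in plain Python. The weight and plasma volume values are hypothetical numbers made up for this example, not real measurements, and the function names are illustrative choices rather than part of the course material.

```python
from math import sqrt

# Hypothetical illustrative values (NOT real measurements):
# body weight in kg and plasma volume in litres for five people.
weight = [60.0, 65.0, 70.0, 75.0, 80.0]
plasma = [2.8, 2.9, 3.1, 3.2, 3.5]


def mean(values):
    return sum(values) / len(values)


def sums_of_squares(x, y):
    """Return (Sxx, Syy, Sxy): sums of squared and cross deviations from the means."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def pearson_r(x, y):
    """Strength and direction of the linear association, between -1 and 1."""
    sxx, syy, sxy = sums_of_squares(x, y)
    return sxy / sqrt(sxx * syy)


def least_squares_line(x, y):
    """Intercept and slope of the best-fitting straight line y = intercept + slope * x."""
    sxx, _, sxy = sums_of_squares(x, y)
    slope = sxy / sxx
    intercept = mean(y) - slope * mean(x)
    return intercept, slope


r = pearson_r(weight, plasma)
intercept, slope = least_squares_line(weight, plasma)
print(f"r = {r:.3f}")
print(f"plasma volume = {intercept:.3f} + {slope:.4f} * weight")
```

The correlation coefficient summarises how closely the points follow a straight line, while the regression line gives the expected change in the outcome per unit change in the exposure.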

The course will cover the following elements:

  • Explain the concept of association between two variables
  • Use scatter diagrams to sketch the main forms of relationship: independence, positive/negative linear association, non-linear association
  • Describe and interpret simple linear regression
  • Explain the term extrapolation and describe why it can give misleading predictions
  • Describe the purpose of correlation coefficients
  • Compare ‘Pearson’ and ‘Spearman rank’ correlation, stating the situations in which they might be used
  • Compare the type of information given by correlation with that given by regression
  • Outline a graphical method of illustrating agreement between two continuous variables.
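
The comparison of Pearson and Spearman rank correlation listed above can be sketched as follows. This is a minimal plain-Python example under stated assumptions: the data are an artificial, perfectly monotonic but non-linear relationship chosen for illustration, and Spearman's coefficient is computed as the Pearson correlation of the ranks (with average ranks for ties).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: strength of the linear association."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank the values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Artificial data: y always increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

pearson = pearson_r(x, y)
spearman = spearman_rho(x, y)
print(f"Pearson r    = {pearson:.3f}")
print(f"Spearman rho = {spearman:.3f}")
```

Because the ranks of x and y agree exactly, Spearman's coefficient reaches its maximum, whereas Pearson's coefficient stays below 1 because the points do not lie on a straight line.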

Learning objectives

By the end of the session students should be able to:

  • Display the relationship between two quantitative variables in a scatter plot;
  • Measure the degree of association using Pearson’s or Spearman’s correlation coefficient;
  • Describe the methods used to obtain regression lines;
  • Obtain regression lines describing the relationship(s) between quantitative variables;
  • Obtain confidence intervals for regression coefficients;
  • Test the null hypothesis of no relationship;
  • Explain when it is appropriate to use correlation and linear regression.
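
For orientation, the objectives on regression lines, confidence intervals and the test of no relationship correspond to the standard formulas for simple linear regression, sketched below. The notation is an assumption made for this summary and is not taken from the course material, and the results rely on the usual assumptions of independent, normally distributed errors with constant variance.

```latex
% Model: y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \dots, n
\begin{align*}
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2, &
S_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\
\hat{\beta}_1 &= \frac{S_{xy}}{S_{xx}}, &
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} \\
s^2 &= \frac{1}{n-2} \sum_{i=1}^{n} \bigl(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\bigr)^2, &
\operatorname{SE}(\hat{\beta}_1) &= \frac{s}{\sqrt{S_{xx}}}
\end{align*}

% 100(1 - \alpha)\% confidence interval for the slope
\[
\hat{\beta}_1 \pm t_{n-2,\,1-\alpha/2} \, \operatorname{SE}(\hat{\beta}_1)
\]

% Test of the null hypothesis of no linear relationship, H_0 : \beta_1 = 0
\[
t = \frac{\hat{\beta}_1}{\operatorname{SE}(\hat{\beta}_1)} \sim t_{n-2} \quad \text{under } H_0
\]
```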

About the trainer

Dr Yanzhong Wang is a Senior Lecturer in Medical Statistics in the School of Population Health and Environmental Sciences, King’s College London. He also leads the statistics consultancy service at King’s College Hospital, Guy’s and St Thomas’ Hospital and the NIHR Guy’s and St Thomas’ Biomedical Research Centre.