Who is this course for?

High-quality datasets are essential for statistical analysis. This course is for researchers who need to use the open-source R programming language to prepare their data for statistical analysis. The course uses RStudio to illustrate the techniques that can be used to prepare a dataset that is ready for statistical analysis.

Attendees are expected to be familiar with collecting data using an electronic tool such as MS Excel.

Course description

The course is a four-hour practical session and will use the open-source RStudio software. It will focus on the R data frame and its manipulation. Topics covered include:

  • Factor data types
  • Variable labelling, renaming and dropping
  • Control structures useful for data cleaning.

Learning objectives

  • Reading MS Excel files into R
  • Creating categorical variables for analysis
  • Recoding free text descriptions into numbers
  • Merging different datasets horizontally and vertically
  • Changing the structure of a dataset for different statistical methods.

About the trainer

Bola Coker is the Senior Data Manager for the NIHR Guy’s and St Thomas’ Biomedical Research Centre (BRC). He is based in the King’s College London Department of Population Health Sciences, Unit of Medical Statistics, and is involved with statistical consultancy and teaching. He runs a team that provides services for the BRC in the areas of database development, statistical data management and statistical programming.