R For Data Science

@lrdegeest


About

A series of lecture notes and workbooks from a short, master’s-level course about data science in R. The content is based mostly on the excellent book R For Data Science. (For solutions to the chapter exercises, check out Jeffrey Arnold’s book.)

The course emphasizes using code to build models and gives less attention to tasks like data cleaning. It assumes no background in R programming.

Notebooks

All the lectures are delivered as R Notebooks. Each lecture has two notebooks: a student copy (e.g., logit_student.Rmd) and an instructor copy (logit_completed.Rmd).

The idea is that during lecture you work through the student copy as a class or in small groups. The notebooks have checkpoints in which students answer questions. Answers are given in the instructor copy.

Code chunks are named and each checkpoint is a third-level header (<h3> or ###), so you can use RStudio’s navigation panes to jump from one exercise or section to another.
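The same convention makes a notebook easy to outline outside the IDE. The sketch below is a hypothetical helper (`outline_rmd` is not part of the course materials) that assumes checkpoints are lines starting with exactly three `#` characters and that named chunks use the usual `{r chunk-name}` header form.

```r
# Hypothetical helper: list checkpoint headers and named chunks in an .Rmd file.
# `lines` is a character vector, e.g. the result of readLines().
outline_rmd <- function(lines) {
  # Third-level headers: exactly three '#' followed by whitespace.
  headers <- sub("^###\\s+", "", grep("^###\\s", lines, value = TRUE))

  # Named chunks: ```{r chunk-name} or ```{r chunk-name, echo=FALSE}.
  # Unnamed chunks such as ```{r} or ```{r echo=FALSE} are skipped.
  chunk_pat <- "^```\\{r\\s+([^,}=]+)\\s*[,}].*$"
  chunks <- sub(chunk_pat, "\\1", grep(chunk_pat, lines, value = TRUE))

  list(checkpoints = headers, chunks = trimws(chunks))
}

# Usage (file name taken from the example above):
# outline_rmd(readLines("logit_student.Rmd"))
```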

Completed notebooks

  1. Why code?
  2. Introduction to R
  3. Exploratory Data Analysis
  4. Tibbles and Tidying
  5. Relational Data
  6. Functions
  7. Iteration
  8. Linear Models
  9. Nonlinear Models
  10. Memo on visualizing models
  11. Final exam

Highlights

Data

The lectures use a few interesting data sets not found in the book: Covid-19, avocado prices, NHANES II, Boston property, and HMDA.

The Covid-19, avocado prices, NHANES II and Boston property data sets are hosted on GitHub. The HMDA data set ships with the AER package.
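For the one data set that comes from a package, a minimal loading sketch might look like the following. The wrapper name `load_hmda` is hypothetical; it only assumes that AER is installed and exposes a data set called `HMDA`.

```r
# Hypothetical wrapper: load the HMDA data set from the AER package
# into a private environment and return it as a data frame.
load_hmda <- function() {
  if (!requireNamespace("AER", quietly = TRUE)) {
    stop("Package 'AER' is required: install.packages(\"AER\")")
  }
  env <- new.env()
  data("HMDA", package = "AER", envir = env)
  env$HMDA
}

# Usage:
# hmda <- load_hmda()
# str(hmda)
```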

Modeling

Since model building is the main emphasis of this short course, most of the code exercises involve statistics, econometrics, or modeling more generally.
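As a rough illustration of the flavor of such an exercise (not taken from the notebooks), the sketch below fits a logit model and computes fitted probabilities. It uses `mtcars`, which ships with base R and is not one of the course data sets.

```r
# Illustrative only: mtcars is a stand-in, not a course data set.
# Logit model: probability of a manual transmission (am = 1) given weight.
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# Predicted probabilities over a grid of weights, e.g. for plotting the fitted curve.
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50))
grid$p_manual <- predict(fit, newdata = grid, type = "response")

summary(fit)$coefficients
head(grid)
```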

Exams

The final exam is an R Markdown file with blank code chunks and custom-made writing chunks (so students know where to write their non-code answers).

Students submit their completed .Rmd files. I use a script to auto-render each file to HTML and log which files could or could not render. A log file is written for each submission that records which code chunks, if any, failed to compile. A summary report for all submissions is also generated.
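The grading script itself is not shown here. A minimal sketch of the idea, assuming the rmarkdown package and a hypothetical function name `render_submissions`, could look like this. It records success or failure per file only; per-chunk detail would have to come from the error message that knitr reports.

```r
# Hypothetical sketch, not the author's script.
# Renders each submission to HTML and records whether it succeeded.
# `render_fn` defaults to rmarkdown::render (requires the rmarkdown package);
# it is a parameter so another renderer can be swapped in.
render_submissions <- function(files, render_fn = rmarkdown::render) {
  results <- lapply(files, function(f) {
    tryCatch({
      # A fresh environment keeps one submission's objects out of the next.
      render_fn(f, output_format = "html_document", quiet = TRUE, envir = new.env())
      data.frame(file = f, rendered = TRUE, message = "",
                 stringsAsFactors = FALSE)
    }, error = function(e) {
      data.frame(file = f, rendered = FALSE, message = conditionMessage(e),
                 stringsAsFactors = FALSE)
    })
  })
  do.call(rbind, results)
}

# Usage (the directory name is a placeholder):
# report <- render_submissions(list.files("submissions", "\\.Rmd$", full.names = TRUE))
# write.csv(report, "summary_report.csv", row.names = FALSE)
```

Catching the error per file means one broken submission does not stop the rest of the batch from rendering.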

Dependencies

You need the following packages installed to run the notebooks:
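To check which packages still need installing before opening a notebook, a small helper along these lines may be handy. The function name `missing_packages` is hypothetical, and the vector in the usage line is a placeholder: only AER is named elsewhere in this document.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Usage (placeholder vector; substitute the packages the notebooks require):
# to_install <- missing_packages(c("AER"))
# if (length(to_install) > 0) install.packages(to_install)
```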

Use

These notebooks are published under the MIT License. Feel free to use and modify them for your own needs. If you find any bugs, let me know by posting an issue on GitHub.

Last updated

May 2021