A series of lecture notes and workbooks from a short, master’s-level course about data science in R. The content is based mostly on the excellent book R for Data Science. (For solutions to the chapter exercises, check out Jeffrey Arnold’s book.)
Emphasis is placed on using code to build models. Less attention is given to tasks like data cleaning. Assumes no background in R programming.
All the lectures are delivered as R Notebooks. Each lecture has two notebooks: a student copy (e.g., logit_student.Rmd) and an instructor copy (logit_completed.Rmd).
The idea is that during lecture you work through the student copy as a class or in small groups. The notebooks have checkpoints in which students answer questions. Answers are given in the instructor copy.
Code chunks are named and each checkpoint is a third-level header (<h3> or ###), so you can use RStudio’s navigation panes to jump from one exercise or section to another.
The lectures use a few interesting data sets not found in the book, including the Covid-19, avocado prices, NHANES II, Boston property, and HMDA data sets. The first four are hosted on GitHub. The HMDA data set is imported by the AER package.
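As a minimal sketch of how a packaged data set such as HMDA can be loaded, assuming the AER package is already installed (the notebooks themselves may load it differently):

```r
# Assumption: AER is installed, e.g. via install.packages("AER").
# data() copies the HMDA data frame from the package into the workspace.
data("HMDA", package = "AER")

# Inspect the variables before modeling.
str(HMDA)
```

Loading with `data(..., package = ...)` avoids attaching the whole package when only the data set is needed.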
Since model building is the main emphasis of this short course, the majority of the code exercises have something to do with stats/econometrics/modeling in general. For example:
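To give a flavor of this kind of exercise, here is a hedged sketch of a logit model. It is not taken from the course notebooks: it uses the built-in mtcars data as a stand-in for the course data sets, and the object names are illustrative.

```r
# Stand-in data: mtcars ships with R and is not one of the course data sets.
# am is 1 for a manual and 0 for an automatic transmission; wt is weight in 1000 lbs.
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# Coefficient table; estimates are on the log-odds scale.
print(coef(summary(logit_fit)))

# Predicted probability of a manual transmission for a 3,000 lb car.
p_manual <- predict(logit_fit, newdata = data.frame(wt = 3), type = "response")
print(p_manual)
```

If the broom package is installed, `broom::tidy(logit_fit)` returns the same coefficient table as a data frame, which is easier to pipe into further steps.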
The final exam is an R Markdown file with blank code chunks and custom-made writing chunks (so students know where to write their non-code answers).
Students submit their completed .Rmd files. I use a script to auto-render each file to HTML and log which files could or could not render. A log file is written for each submission that tells which code chunks failed to compile, if any. A summary report for all submissions is also generated.
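The grading script itself is not shown here. A rough sketch of the rendering step, under the assumption that all submissions sit in one folder, could look like the following. The function name `render_submissions` and the `render_fn` argument are hypothetical, and this version records only the error message for each file rather than the individual chunks that failed.

```r
# Hypothetical helper: try to render every .Rmd file in `dir` to HTML and
# record which files rendered. `render_fn` defaults to rmarkdown::render
# (requires the rmarkdown package) and can be swapped out for testing.
render_submissions <- function(dir, render_fn = rmarkdown::render) {
  files <- list.files(dir, pattern = "\\.Rmd$", full.names = TRUE)
  if (length(files) == 0) {
    return(data.frame(file = character(), rendered = logical(),
                      error = character(), stringsAsFactors = FALSE))
  }

  results <- lapply(files, function(f) {
    # tryCatch returns NA when rendering succeeds and the error message otherwise.
    error_message <- tryCatch({
      render_fn(f, output_format = "html_document", quiet = TRUE)
      NA_character_
    }, error = function(e) conditionMessage(e))

    data.frame(file = basename(f),
               rendered = is.na(error_message),
               error = error_message,
               stringsAsFactors = FALSE)
  })

  # One row per submission: a summary across all files.
  do.call(rbind, results)
}
```

Calling `render_submissions("submissions")` on a folder of that (assumed) name returns one row per file, and the result can be saved with `write.csv()` as a summary report.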
You need the following packages installed to run the notebooks:

- tidyverse for all things tidy
- patchwork to combine ggplots
- sjPlot to visualize models
- broom for tidy model objects
- AER for some data sets

These notebooks are published with an MIT License. Feel free to use and modify them for your own needs. Let me know if you find any bugs by posting an issue on GitHub.
May 2021