Three Frameworks for Cleaning Data

  1. Base R - R comes with an extensive set of functions for data manipulation. These functions are not always intuitive and take practice to use well.
  2. Tidyverse - The tidyverse was created to make many aspects of using R more intuitive. However, learning the tidyverse too early may leave gaps in your understanding of base R, which is essential to mastering the language.
  3. data.table package - This package is excellent for large data files. It is by far the fastest of the three, and its speed is admired even outside the R community. However, it follows its own syntax, which is very different from the other two frameworks I mentioned.
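To make the syntax differences concrete, here is a minimal sketch of one cleaning step written in each of the three frameworks: drop rows with a missing value, then compute the mean per group. The data frame `df` and its columns `group` and `value` are hypothetical, and the sketch assumes the dplyr and data.table packages are installed.

```r
# Hypothetical example data: one missing value in group "b"
df <- data.frame(
  group = c("a", "b", "a", "b", "c"),
  value = c(1, 2, 3, NA, 5)
)

# 1. Base R: logical indexing, then aggregate() with a formula
clean_base <- df[!is.na(df$value), ]
res_base <- aggregate(value ~ group, data = clean_base, FUN = mean)

# 2. Tidyverse (dplyr): a pipeline of verbs
suppressPackageStartupMessages(library(dplyr))
res_tidy <- df %>%
  filter(!is.na(value)) %>%
  group_by(group) %>%
  summarise(value = mean(value))

# 3. data.table: filter, compute, and group inside one dt[i, j, by] call
suppressPackageStartupMessages(library(data.table))
dt <- as.data.table(df)
res_dt <- dt[!is.na(value), .(value = mean(value)), by = group]
```

All three return the same group means (2, 2 and 5 for groups a, b and c). The data.table version fits the whole operation into a single bracket call, which is what makes its syntax feel so different from the other two.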

Data Cleaning Resources

Online Books