Restaurant Data

Let’s get read in some fun data to play round with.

Polina found this data from Seattle open data portal about Restaurant Inspections.

I moved the data locally so it would be faster to read in. The file is about 50 MB. I like to put file locations at the top of the code so people don’t have to hunt to find them when reusing your code. Please move your file some place on your local drive and practice changing the path to it.

Let’s read in the csv now, and check the dimensions.

# We will need these libraries in the class.
library(dplyr)
library(stringr)
library(tidyr)
library(ggplot)

restaurant_file<-'C:/Users/SChildress/Documents/GitHub/intro-dplyr/data/king_county_restaurant_inspections.csv'
rest_df<-read.csv(restaurant_file, fileEncoding = 'UTF-8-BOM')
# find the dimensions of the data frame
dim(rest_df)
#look at the first five rows
head(rest_df)
#what are the column names
colnames(rest_df)

Each row in the data represents a note about a particular restaurant on a particular date. There may be multiple notes from the same restaurant on the same date. The data has 267,167 notes, of 15 variables.

Tidy data is data in which:

Every column is variable. Every row is an observation. Every cell is a single value.

Is this data tidy? How would you shape or aggregate this data?

Let’s look at some examples of non-tidy data and discusses pivot_wider and pivot_longer: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html