Grouping and Summarizing

Aggregating data

When working with data, we usually want to aggregate it to extract meaning. The most common aggregations are counting records, summing values, and finding the minimum or maximum.

# Let's count the records in our data.

rest_df_count <- rest_df %>% count()

# Now let's count the number of observations for each restaurant.

rest_df_count2 <- rest_df %>% count(NAME)

# Sort the counts so the restaurants with the most records come first.

rest_df %>% count(NAME, sort = TRUE)
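As a rough illustration of what count() is doing, the sketch below builds a small made-up data frame (toy_df, with hypothetical restaurant names and scores; it is not the real rest_df) and checks that counting by NAME gives the same tally as grouping by NAME and counting rows with n().

```r
library(dplyr)

# Hypothetical toy data standing in for rest_df; names and scores are made up.
toy_df <- tibble(
  NAME = c("Cafe A", "Cafe B", "Cafe A", "Cafe C", "Cafe A", "Cafe B"),
  SCORE_INSPECTION = c(10, 0, 5, NA, 15, 20)
)

# count() with a column returns one row per distinct value plus a column n.
by_count <- toy_df %>% count(NAME, sort = TRUE)

# The same tally written out with group_by() and summarize().
by_group <- toy_df %>%
  group_by(NAME) %>%
  summarize(n = n()) %>%
  arrange(desc(n))

print(by_count)
print(by_group)
```

Both results have one row per restaurant and a column n holding the number of records, so count() can be read as a shortcut for the longer group_by() and summarize() form.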

Summarizing and Grouping

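Often we need to total a column, and we frequently want those totals broken out by category. In dplyr, summarize() (also spelled summarise()) computes the summaries and group_by() defines the categories.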

summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.
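The row and column behavior described above can be seen in a minimal sketch. It assumes a small made-up data frame (toy_df and its values are hypothetical, not the real rest_df) and uses min() and max() as the summary statistics.

```r
library(dplyr)

# Hypothetical toy data; names and scores are made up for illustration.
toy_df <- tibble(
  NAME = c("Cafe A", "Cafe B", "Cafe A", "Cafe C", "Cafe A", "Cafe B"),
  SCORE_INSPECTION = c(10, 0, 5, 25, 15, 20)
)

# No grouping variables: a single row summarising every observation.
overall <- toy_df %>%
  summarize(
    n_rows = n(),
    min_score = min(SCORE_INSPECTION),
    max_score = max(SCORE_INSPECTION)
  )

# Grouped by NAME: one row per restaurant, with one column for the
# grouping variable and one column per summary statistic.
per_rest <- toy_df %>%
  group_by(NAME) %>%
  summarize(
    n_rows = n(),
    min_score = min(SCORE_INSPECTION),
    max_score = max(SCORE_INSPECTION)
  )

print(overall)
print(per_rest)
```

Here overall has one row and three columns, while per_rest has three rows (one per restaurant) and four columns: NAME plus the three summaries.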

# Let's sum the inspection scores.

rest_df %>% summarize(total_points = sum(SCORE_INSPECTION, na.rm = TRUE))

# Frequently we also want to group data and summarize it, here as the mean score per restaurant:
rest_df %>% group_by(NAME) %>% summarize(rest_points = mean(SCORE_INSPECTION, na.rm = TRUE))
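Both calls above pass na.rm to sum() and mean(). A short sketch on a made-up vector (scores is hypothetical) shows why that argument matters when a column contains missing values.

```r
# Hypothetical scores with one missing value.
scores <- c(10, NA, 5)

# Without na.rm, a single NA makes the whole result NA.
with_na <- sum(scores)

# With na.rm = TRUE, missing values are dropped before aggregating.
total <- sum(scores, na.rm = TRUE)
avg <- mean(scores, na.rm = TRUE)

print(with_na)
print(total)
print(avg)
```

Note that mean() with na.rm = TRUE divides by the number of non-missing values, so the average here is taken over two scores rather than three.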

Suzanne Childress

1/6/2021
