Given the following graph:

title <- 'Fuel Economy of Popular Cars'
legend.title <- 'Type of Car'

ggplot(mpg, aes(displ, hwy, color = class)) + 
  geom_point()

Which of the following change the legend title? Select all that apply.

  1. scale_color_discrete(name = legend.title)
  2. theme(legend.title = element_text(legend.title))
  3. theme(legend.text = element_text(title = legend.title))
  4. labs(color = legend.title)

Which of the following add a title to your graph? Select all that apply.

  1. labs(title = title)
  2. ggtitle(title)
  3. annotate('text', label = title, x = min(mpg$displ) + 3.5, y = max(mpg$hwy), size = 4)
  4. theme(plot.title = element_text(title))

Create a scatterplot exploring select breakfast cereals [1]. What is the relationship between cereal ratings and grams of sugar?

  1. Download the cereals dataset and read it into R. Metadata can be found here.
  2. Create a new column, sugars_per_oz, that calculates grams of sugar per ounce.
  3. Create a scatterplot using geom_point() with sugars_per_oz on the x-axis and rating on the y-axis.
  4. Map the manufacturer of the cereal to color and the cereal type to shape.
  5. Change the shapes to anything but the default ones.
    • To see which shape options are available, run the following:
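# One row per point shape code; R defines shapes 0 through 25
df_shapes <- data.frame(shape = 0:25)
ggplot(df_shapes, aes(0, 0, shape = shape)) +
  # fill only affects the fillable shapes (21 to 25)
  geom_point(size = 5, fill = 'red') +
  # use the numeric codes as-is instead of mapping them to a scale
  scale_shape_identity() +
  facet_wrap(~shape) +
  theme_void()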
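
Steps 2 through 5 can be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame is a hypothetical stand-in with made-up values, and the column names (mfr, type, sugars, weight, rating) are assumed and may not match the downloaded file.

```r
library(ggplot2)

# Hypothetical stand-in rows; values and column names are assumptions
cereals <- data.frame(
  name   = c("A", "B", "C", "D"),
  mfr    = c("K", "G", "K", "Q"),
  type   = c("C", "C", "H", "C"),
  sugars = c(6, 12, 0, 9),     # grams per serving
  weight = c(1, 1.33, 1, 1),   # ounces per serving
  rating = c(60, 30, 65, 45)
)

# Step 2: grams of sugar per ounce
cereals$sugars_per_oz <- cereals$sugars / cereals$weight

# Steps 3-5: scatterplot with color and shape mapped, then non-default shapes
p <- ggplot(cereals, aes(sugars_per_oz, rating, color = mfr, shape = type)) +
  geom_point(size = 3) +
  scale_shape_manual(values = c(C = 15, H = 17))
p
```

scale_shape_manual() takes one shape code per level of the mapped variable; naming the vector ties each code to a specific level rather than relying on level order.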

Notice anything interesting with the sugar column in the dataset?

  1. What is the range of the sugars per ounce column?
  2. Which cereal(s) contain the greatest amount of sugar per ounce? Which contain 0 grams? Less than 0?
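
One way to approach these two questions is with range() and logical subsetting. The sketch below uses a hypothetical data frame with made-up values, not the real cereals data.

```r
# Hypothetical values, not the real cereals data
cereals <- data.frame(
  name          = c("A", "B", "C", "D"),
  sugars_per_oz = c(6, 9.02, 0, -1)
)

# Smallest and largest value, returned as c(min, max)
sugar_range <- range(cereals$sugars_per_oz)

# Rows matching each condition
top_sugar  <- cereals[cereals$sugars_per_oz == max(cereals$sugars_per_oz), ]
zero_sugar <- cereals[cereals$sugars_per_oz == 0, ]
below_zero <- cereals[cereals$sugars_per_oz < 0, ]
```

If the column contains missing values, range() and max() would need na.rm = TRUE.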

On which shelf can you find the cereal with the highest rating?

  1. Facet wrap by display shelf (1, 2, or 3, counting from the floor)
  2. Rename the x- and y-axis labels
  3. Rename the legend titles
  4. Add a label using geom_text() for the cereal with the highest rating
  5. Things might look a bit squished. Try reducing the size of the label and the legend labels
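
A minimal sketch of these steps, assuming a hypothetical data frame with made-up values and assumed column names (name, mfr, shelf, sugars_per_oz, rating). The label text and sizes are placeholders, not recommended values.

```r
library(ggplot2)

# Hypothetical stand-in rows; values and column names are assumptions
cereals <- data.frame(
  name          = c("A", "B", "C", "D"),
  mfr           = c("K", "G", "K", "Q"),
  shelf         = c(1, 2, 3, 3),
  sugars_per_oz = c(6, 9, 0, 9),
  rating        = c(60, 30, 65, 45)
)

# Single row holding the highest-rated cereal
best <- cereals[which.max(cereals$rating), ]

p <- ggplot(cereals, aes(sugars_per_oz, rating, color = mfr)) +
  geom_point() +
  # Label only the top-rated cereal by passing a one-row data frame
  geom_text(data = best, aes(label = name), size = 3, vjust = -1,
            show.legend = FALSE) +
  facet_wrap(~shelf) +
  labs(x = "Sugars (g per oz)", y = "Rating", color = "Manufacturer") +
  theme(legend.text = element_text(size = 8))
p
```

Because best keeps the shelf column, the label lands in the facet panel of the shelf it belongs to.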

Create a bar graph of cities & towns with the greatest nominal growth between 2010 and 2020 like the one below…

Using ofm_april1_population_final_tidied.xlsx

  1. Subset to cities & towns where the year is 2010 or 2020
  2. Cast the data using the dcast() function from the reshape2 package:
    • reshape2::dcast(<your data frame>, County + Jurisdiction ~ paste0("Year_", Year_chr), value.var = "Estimate")
  3. Create a new column calculating the difference between 2020 and 2010 estimates
  4. Sort the data frame so that the difference column is in descending order
  5. Take the top 10 rows (use head())
  6. Plot with cities/towns on the x-axis and the total population estimate on the y-axis
    • Add a thousands separator to the y-axis labels
    • Add a title and source caption
    • Angle the city/town axis labels and change the appearance of other text
    • Change the axis titles
    • Fill the bars to reflect which county the cities and towns are in
  7. What order are the cities/towns in? Can you reorder the cities/towns based on the value of the difference column?
    • Convert the Jurisdiction column to a factor
    • Use reorder()
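
The steps above could be sketched along these lines. Everything in the data frame is hypothetical: the town and county names and the numbers are made up, and the long layout (County, Jurisdiction, Year_chr, Estimate) is assumed from the dcast() call in step 2. The column name Diff, the title and caption text, and the choice to plot the 2020 estimate on the y-axis are also assumptions.

```r
library(ggplot2)

# Hypothetical rows, already subset to cities/towns for 2010 and 2020
pop <- data.frame(
  County       = rep(c("County X", "County X", "County Y"), each = 2),
  Jurisdiction = rep(c("Town A", "Town B", "Town C"), each = 2),
  Year_chr     = rep(c("2010", "2020"), times = 3),
  Estimate     = c(1000, 1500, 800, 900, 700, 1000)  # made-up numbers
)

# Step 2: one row per jurisdiction, one column per year
wide <- reshape2::dcast(pop, County + Jurisdiction ~ paste0("Year_", Year_chr),
                        value.var = "Estimate")

# Steps 3-5: difference, sort descending, keep the top rows
wide$Diff <- wide$Year_2020 - wide$Year_2010
wide <- wide[order(wide$Diff, decreasing = TRUE), ]
top <- head(wide, 10)

# Steps 6-7: bars ordered by the difference, filled by county
p <- ggplot(top, aes(reorder(Jurisdiction, -Diff), Year_2020, fill = County)) +
  geom_col() +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Placeholder title", caption = "Source: placeholder",
       x = "City or town", y = "2020 population estimate") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p
```

reorder() converts its first argument to a factor whose levels follow the second argument, so negating Diff puts the largest growth first on the x-axis.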

  1. Data Source: https://www.kaggle.com/crawford/80-cereals, gathered and cleaned up by Petra Isenberg, Pierre Dragicevic and Yvonne Jansen. Original source can be found here