Homework

Given the following graph:

library(ggplot2)

title <- 'Fuel Economy of Popular Cars'
legend.title <- 'Type of Car'

ggplot(mpg, aes(displ, hwy, color = class)) +
  geom_point()

Which of the following change the legend title? Select all that apply.

scale_color_discrete(name = legend.title)
theme(legend.title = element_text(legend.title))
theme(legend.text = element_text(title = legend.title))
labs(color = legend.title)

Which of the following add a title to your graph? Select all that apply.

labs(title = title)
ggtitle(title)
annotate('text', label = title, x = min(mpg$displ) + 3.5, y = max(mpg$hwy), size = 4)
theme(plot.title = element_text(title))

Create a scatterplot exploring select breakfast cereals¹. What is the relationship between cereal ratings and grams of sugar?

Download the cereals dataset and read it into R. Metadata can be found here.
Create a new column, sugars_per_oz, with the grams of sugar per ounce.
Create a scatterplot using geom_point() with sugars_per_oz on the x-axis and rating on the y-axis.
Map the cereal's manufacturer to color and the cereal type to shape.
Change the shapes to anything other than the defaults.
- To see which shape options are available, run the following:

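library(ggplot2)

# One panel per shape code 0-24; scale_shape_identity() uses the numbers as-is
df_shapes <- data.frame(shape = 0:24)
ggplot(df_shapes, aes(0, 0, shape = shape)) +
  geom_point(size = 5, fill = 'red') +  # fill only affects shapes 21-24
  scale_shape_identity() +
  facet_wrap(~shape) +
  theme_void()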
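A minimal sketch of the mapping and shape steps, using the mpg data that ships with ggplot2 as a stand-in for the cereals so it runs without the download: class stands in for manufacturer and drv (three levels) for cereal type. The shape codes 15, 17 and 18 are an arbitrary pick from the shape chart; scale_shape_manual() assigns them to the levels of drv in order.

```r
library(ggplot2)

# mpg ships with ggplot2: class stands in for manufacturer,
# drv ("4", "f", "r") stands in for cereal type
p_shapes <- ggplot(mpg, aes(displ, hwy, color = class, shape = drv)) +
  geom_point(size = 2) +
  # three non-default shape codes, matched to the levels of drv in order
  scale_shape_manual(values = c(15, 17, 18))

print(p_shapes)
```

With the cereals, the same pattern needs one value per cereal type.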

Notice anything interesting about the sugars column in the dataset?

What is the range of the sugars_per_oz column?
Which cereal(s) contain the most sugar per ounce? Which have 0 grams? Which have less than 0?

On which shelf can you find the cereal with the highest rating?
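The questions above come down to a few base R lookups. Here is a sketch on a small made-up data frame. The cereal names and values are hypothetical, and the column names sugars, weight (ounces per serving), shelf and rating are assumed to match the downloaded file, so check them against the metadata.

```r
# Hypothetical stand-in for the cereals data: names and values are made up,
# only the column names are assumed to match the real file
toy <- data.frame(
  name   = c("Cereal A", "Cereal B", "Cereal C", "Cereal D"),
  sugars = c(12, 0, -1, 6),        # grams per serving
  weight = c(1.33, 1, 1, 0.5),     # ounces per serving
  shelf  = c(2, 3, 1, 3),
  rating = c(30.5, 68.2, 50.8, 41.0)
)

# grams of sugar per ounce
toy$sugars_per_oz <- toy$sugars / toy$weight

range(toy$sugars_per_oz)                                # -1 12
toy$name[toy$sugars_per_oz == max(toy$sugars_per_oz)]   # most sugar per ounce
toy$name[toy$sugars_per_oz == 0]                        # exactly 0 grams
toy$name[toy$sugars_per_oz < 0]                         # less than 0
toy$shelf[which.max(toy$rating)]                        # shelf of the top-rated cereal
```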

Use facet_wrap() to facet by display shelf (1, 2, or 3, counting from the floor).
Rename the x- and y-axis labels.
Rename the legend titles.
Use geom_text() to label the cereal with the highest rating.
Things might look a bit squished. Try reducing the size of the label and of the legend labels.
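One way these finishing steps might fit together, again sketched on mpg rather than the cereals: drv stands in for shelf, hwy for rating, and model for the cereal name. The label layer gets its own one-row data set, so only the top row is annotated. The text size of 2.5 (mm) and legend text size of 7 (points) are arbitrary choices.

```r
library(ggplot2)

# Row with the highest highway mileage (which.max() keeps the first tie)
top_hwy <- mpg[which.max(mpg$hwy), ]

p_facets <- ggplot(mpg, aes(displ, hwy, color = class)) +
  geom_point() +
  # label only the row in top_hwy; text size is in mm, smaller than the default
  geom_text(data = top_hwy, aes(label = model),
            size = 2.5, hjust = -0.15, show.legend = FALSE) +
  # one panel per drive train, analogous to one panel per shelf
  facet_wrap(~ drv) +
  labs(x = "Engine displacement (L)", y = "Highway MPG", color = "Vehicle class") +
  # shrink the legend labels (size in points)
  theme(legend.text = element_text(size = 7))

print(p_facets)
```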

Create a bar graph of the cities & towns with the greatest nominal population growth between 2010 and 2020, like the one below…

Using ofm_april1_population_final_tidied.xlsx…

Subset for cities & towns where the year is 2010 or 2020.
Cast the data using the dcast() function from the reshape2 package:
- reshape2::dcast(<your data frame>, County + Jurisdiction ~ paste0("Year_", Year_chr), value.var = "Estimate")
Create a new column with the difference between the 2020 and 2010 estimates.
Sort the data frame so that the difference column is in descending order.
Keep the top 10 rows (use head()).
Plot the cities/towns on the x-axis and the total population estimate on the y-axis.
- Add a thousands separator to the y-axis labels
- Add a title and a source caption
- Angle the city/town axis labels and change the appearance of other text
- Change the axis titles
- Fill the bars according to the county each city or town is in
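A sketch of the steps from the difference column through the bar chart, assuming the dcast() call yields columns named County, Jurisdiction, Year_2010 and Year_2020. The table below is a made-up stand-in, not the OFM data, and plotting the 2020 estimate on the y-axis is an assumption about which total the chart shows. scales::comma (the scales package is installed with ggplot2) adds the thousands separator.

```r
library(ggplot2)

# Hypothetical wide table shaped like the dcast() output:
# one row per County + Jurisdiction, one column per year. Values are made up.
wide <- data.frame(
  County       = c("County A", "County A", "County B", "County C"),
  Jurisdiction = c("Town 1", "Town 2", "Town 3", "Town 4"),
  Year_2010    = c(10000, 52000, 8000, 120000),
  Year_2020    = c(15500, 54000, 21000, 131000)
)

# nominal growth between 2010 and 2020
wide$difference <- wide$Year_2020 - wide$Year_2010

# sort descending by difference, then keep the top rows
# (top 3 here because the toy table is small; the exercise uses 10)
wide <- wide[order(wide$difference, decreasing = TRUE), ]
top <- head(wide, 3)

p_bars <- ggplot(top, aes(x = Jurisdiction, y = Year_2020, fill = County)) +
  geom_col() +
  scale_y_continuous(labels = scales::comma) +   # thousands separator
  labs(title = "Hypothetical growth, 2010 to 2020",
       caption = "Source: made-up example data",
       x = "City/Town", y = "Population estimate (2020)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p_bars)
```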
What order are the cities/towns in? Can you reorder them based on the value of the difference column?
- Convert the Jurisdiction column to a factor
- Use reorder()
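A short sketch of the reorder() step on a made-up three-row table. reorder() sorts the factor levels by the (mean) value of its second argument, so negating the difference puts the largest growth first. The same call can also go straight into the plot mapping, e.g. aes(x = reorder(Jurisdiction, -difference)).

```r
# Hypothetical top-growth table (made-up values), Jurisdiction as character
top <- data.frame(
  Jurisdiction = c("Town 1", "Town 3", "Town 4"),
  difference   = c(5500, 13000, 11000)
)

# Default order: levels of a plain factor are alphabetical
levels(factor(top$Jurisdiction))   # "Town 1" "Town 3" "Town 4"

# reorder() levels by -difference, i.e. largest difference first
top$Jurisdiction <- reorder(factor(top$Jurisdiction), -top$difference)
levels(top$Jurisdiction)           # "Town 3" "Town 4" "Town 1"
```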

¹ Data source: https://www.kaggle.com/crawford/80-cereals, gathered and cleaned up by Petra Isenberg, Pierre Dragicevic and Yvonne Jansen. The original source can be found here.