For this week's TidyTuesday challenge on unions, I built a dumbbell chart that contrasts the average wages of union members and non-union workers by educational background from 2013 to 2022.

Dumbbell charts are particularly effective when you want to emphasize changes or differences between two data points while keeping the visualization simple and easy to understand. They are commonly used in fields such as economics, finance, healthcare, and data analysis to illustrate trends, improvements, or comparisons.
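To make the idea concrete, here is a minimal sketch of a dumbbell chart built from a segment and two points per row. The data frame `toy` and its columns `before` and `after` are made up for illustration and are not part of the wages dataset; the `linewidth` argument assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Hypothetical toy data: two values per category (not from the wages dataset)
toy <- data.frame(
  group  = c("A", "B", "C"),
  before = c(10, 14, 9),
  after  = c(13, 15, 12)
)

p <- ggplot(toy, aes(y = group)) +
  # The bar of the dumbbell: a segment joining the two values
  geom_segment(aes(x = before, xend = after, yend = group),
               colour = "grey70", linewidth = 2) +
  # The two weights of the dumbbell: one point per value
  geom_point(aes(x = before), colour = "#E69F00", size = 4) +
  geom_point(aes(x = after), colour = "#56B4E9", size = 4) +
  labs(x = "Value", y = NULL)
p
```

The chart later in this post follows the same pattern, but draws the connecting bar as a rectangle instead of a segment.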

Loading packages, data, fonts and defining colors

# Loading packages
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidytuesdayR)
library(showtext)
## Loading required package: sysfonts
## Loading required package: showtextdb
library(glue)
library(ggtext)
library(ggchicklet)

# Loading and filtering data
wages <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-09-05/wages.csv')
## Rows: 1247 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): facet
## dbl (8): year, sample_size, wage, at_cap, union_wage, nonunion_wage, union_w...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Filter the data
df <- wages %>%
  filter(facet == "demographics: college or more" & year > 2012)
df2 <- wages %>%
  filter(facet == "demographics: less than college" & year > 2012)

# Loading fonts and colors
font_add_google("Poppins", "poppins")
font_add('fa-reg', 'c:/Users/info/OneDrive/Dokumente/fonts/Font Awesome 6 Free-Regular-400.otf')
font_add('fa-brands', 'c:/Users/info/OneDrive/Dokumente/fonts/Font Awesome 6 Brands-Regular-400.otf')
font_add('fa-solid', 'c:/Users/info/OneDrive/Dokumente/fonts/Font Awesome 6 Free-Solid-900.otf')
showtext_auto()
bg <- "white"
col1 <- thematic::okabe_ito()[1]
col2 <- thematic::okabe_ito()[2]
col3 <- thematic::okabe_ito()[4]
col4 <- thematic::okabe_ito()[3]
grey1 <- "lightgrey"
grey2 <- "darkgrey"
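The two `filter()` calls above produce one data frame per education level. As an alternative sketch, both levels could be kept in a single data frame with `%in%`, which would allow mapping `facet` to an aesthetic instead of adding each layer twice. The tibble `toy_wages` below is a made-up stand-in for `wages`, and its values (including the third facet label) are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the wages data (all values are made up)
toy_wages <- tibble(
  year  = c(2012, 2013, 2013, 2014),
  facet = c("demographics: college or more",
            "demographics: college or more",
            "demographics: less than college",
            "some other facet"),
  union_wage = c(30, 31, 20, 25)
)

# Keep both education levels from 2013 onwards in one data frame
education <- toy_wages %>%
  filter(
    facet %in% c("demographics: college or more",
                 "demographics: less than college"),
    year > 2012
  )
education
```

Separate conditions inside `filter()` are combined with a logical AND, so this behaves like the `&` used in the original code.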

Text generation

# text creation
twitter <- glue("<span style='color:{col4};font-family:fa-brands;'>&#xf099;</span>")
mastodon <- glue("<span style='color:{col4};font-family:fa-brands;'>&#xf4f6;</span>")
link <- glue("<span style='color:{col4};font-family:fa-solid;'>&#xf0c1;</span>")
data <- glue("<span style='color:{col4};font-family:fa-solid;'>&#xf1c0;</span>")
quote <- glue("<span style='color:{col4};font-family:fa-solid;'>&#xf10d;</span>")
space <- glue("<span style='color:{bg}'>-</span>")
space2 <- glue("<span style='color:{bg}'>--</span>") # can't believe I'm doing this
union <- glue("<span style='color:{col1}'><b>Union Members</b></span>")
nonunion <- glue("<span style='color:{col2}'><b>Non Union Workers</b></span>")
wage <- glue("<span style='color:{col3}'><b>overall wage</b></span>")
less <- glue("<span style='color:{grey2}'><b>less than college</b></span>")
more <- glue("<span style='color:{grey1}'><b>college or more</b></span>")

t <- glue("<b>Wages of {union} and {nonunion} by Educational Background<br>from 2013 to 2022</b>")
s <- glue("Educational Background: {less} | {more} ({wage})")
cap <- glue("{twitter}{space2}@web_design_fh{space2} 
	{space2}{mastodon}{space2}@frankhaenel @fosstodon.org{space2}
	{space2}{link}{space}{space2}www.frankhaenel.de<br>
	{data}{space2}Union{space}Membership,{space}Coverage,{space}and{space}Earnings{space}from{space}the{space}CPS{space}by{space}Barry{space}Hirsch{space}(Georgia{space}State{space}University),{space}David{space}Macpherson{space}(Trinity{space}University),{space}and{space}William{space}Even{space}(Miami{space}University)<br>
	{quote}{space2}Macpherson,{space}David{space}A.{space}and{space}Hirsch,{space}Barry{space}T..{space}2023.{space}“{space}Five{space}decades{space}of{space}CPS{space}wages,{space}methods,{space}and{space}union-nonunion{space}wage{space}gaps{space}at{space}Unionstats.com.”<br>{space2}{space2}Industrial{space}Relations:{space}A{space}Journal{space}of{space}Economy{space}and{space}Society{space}00:{space}1–9.")
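The colored labels above all follow the same pattern: `glue()` interpolates a color into an HTML `<span>`, which `ggtext::element_markdown()` later renders inside the plot text. A minimal sketch of that pattern, using a hypothetical helper `coloured_label()` that does not appear in the original code and an illustrative hex color:

```r
library(glue)

# Hypothetical helper (not in the original code): wrap a label in a bold, coloured span
coloured_label <- function(label, colour) {
  glue("<span style='color:{colour}'><b>{label}</b></span>")
}

example <- coloured_label("Union Members", "#E69F00")
example
```

Because the same colors are used for the points in the plot, the colored words in the title and subtitle can stand in for a legend.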

Plot

# Define bar_height
bar_height <- 0.2

# Create the plot
ggplot(data = df) +
 ggchicklet:::geom_rrect(
     aes(
         xmin = union_wage,
         xmax = nonunion_wage,
         ymin = year - bar_height,
         ymax = year + bar_height,
         ),color=grey1,fill=grey1,
     # Corner radius in relative npc units (values between 0 and 1)
     # A radius of 0 gives square corners; keep it small to fit the canvas
     r = unit(0, 'npc')
 ) +
 ggchicklet:::geom_rrect(data=df2,
     aes(
         xmin = union_wage,
         xmax = nonunion_wage,
         ymin = year - bar_height,
         ymax = year + bar_height,
     ),color= grey2,fill=grey2,
     # Corner radius in relative npc units (values between 0 and 1)
     # A radius of 0 gives square corners; keep it small to fit the canvas
     r = unit(0, 'npc')
 ) +
 geom_point(data = df, aes(x = union_wage,y = year),color=col1,size = 6) +
 geom_point(data = df2, aes(x = union_wage,y = year),color=col1,size = 6) +
 geom_point(data = df, aes(x = nonunion_wage,y = year),color=col2,size = 6) +
 geom_point(data = df2, aes(x = nonunion_wage,y = year),color=col2,size = 6) +
 geom_point(data = df, aes(x = wage,y = year),color=col3,size = 3) +
 geom_point(data = df2, aes(x = wage,y = year),color=col3,size = 3) +
 labs(title = t, subtitle = s, caption = cap, x = "Mean hourly earnings in dollars", y = "Year") +
 theme_minimal() +
 theme(plot.margin = margin(10, 10, 10, 10),
 plot.title = element_markdown(size = 18, hjust = 0, lineheight = 1.3, family = "poppins"),
 plot.subtitle = element_markdown(size = 15, hjust = 0, lineheight = 1.3, family = "poppins"),
 plot.caption = element_markdown(size = 9, hjust = 0, lineheight = 1.3, color = grey2, family = "poppins"),
 axis.title = element_markdown(size = 8, color = grey2, family = "poppins"),
 axis.text = element_markdown(size = 8, color = grey2, family = "poppins")) +
 ylim(2012, 2023)
Comparison of union and non-union wages from 2013 to 2022 by educational background. The plot shows dumbbells spanning the gap between union and non-union mean hourly earnings, with large points marking each group's wage and a smaller point marking the overall wage.
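The plot code reaches `geom_rrect()` through `:::`, which reads directly from the ggchicklet namespace and is typically needed for functions a package does not export. Since the radius is set to 0, a plain `ggplot2::geom_rect()` should give a similar-looking bar without that dependency. This is a sketch under that assumption, not a tested drop-in replacement; the `toy` data frame holds made-up values standing in for `df`.

```r
library(ggplot2)

# Hypothetical toy values standing in for df (not real wage data)
toy <- data.frame(
  year          = c(2020, 2021, 2022),
  union_wage    = c(40, 41, 43),
  nonunion_wage = c(36, 37, 38)
)
bar_height <- 0.2

p <- ggplot(toy) +
  # Square-cornered bar between the two wages, drawn with an exported geom
  geom_rect(aes(xmin = union_wage, xmax = nonunion_wage,
                ymin = year - bar_height, ymax = year + bar_height),
            fill = "lightgrey", colour = "lightgrey") +
  geom_point(aes(x = union_wage, y = year), colour = "#E69F00", size = 6) +
  geom_point(aes(x = nonunion_wage, y = year), colour = "#56B4E9", size = 6)
p
```

If rounded ends are wanted, the original `geom_rrect()` call with a small positive `r` remains the way to get them.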

R Code Documentation

Introduction

This section documents the R code used to create the dumbbell chart above.

Code Overview

The R code in question is used to create a data visualization plot that compares union and non-union wages over the years, with a focus on demographics. It uses the 'ggplot2' package for data visualization and 'showtext' for font handling.

Data Source

The code reads data from an external CSV file using the 'readr' package. The data source is a CSV file hosted on GitHub, containing wage-related information.
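As readr's own message points out, the column specification output can be silenced with `show_col_types = FALSE`. The sketch below demonstrates this on a tiny made-up CSV written to a temporary file, so it runs without network access; the file contents and column names are invented and are not the real wages data.

```r
library(readr)

# Write a tiny made-up CSV to a temporary file so the example runs offline
path <- tempfile(fileext = ".csv")
writeLines(c("year,facet,wage", "2013,demo,25.5", "2014,demo,26.1"), path)

# show_col_types = FALSE suppresses the column specification message
toy <- read_csv(path, show_col_types = FALSE)
toy
```

The same argument can be passed to the `read_csv()` call that loads the data from GitHub.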

Code Components

  • Loading Libraries: The code starts by loading the necessary R libraries: 'tidyverse', 'tidytuesdayR', 'showtext', 'glue', 'ggtext', and 'ggchicklet'. Colors come from 'thematic', which is called by namespace rather than attached.
  • Loading Fonts and Colors: Custom fonts and color variables are defined and loaded to be used in the plot.
  • Data Filtering: The code filters the data to create two data frames ('df' and 'df2') based on specific criteria.
  • Title and Subtitle Creation: The title, subtitle, and caption are built as 'glue' strings that embed HTML spans for colors and icon fonts.
  • Plot Creation: The 'ggplot' function is used to create the plot, including bar-like structures representing wage ranges, points indicating mean hourly earnings, and various visual elements.
  • Styling: The code applies styles and themes to the plot, including font families, colors, and margins.

Output

The output of the code is a dumbbell chart that compares union and non-union wages from 2013 to 2022 for two educational backgrounds.