This week's TidyTuesday data comes from the CRAN collaboration graph, a project led by David Schoch.

The CRAN collaboration graph consists of R package developers. Two developers are linked if they are listed as co-authors of the same R package in its DESCRIPTION file.
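To make the linking rule concrete, here is a minimal base R sketch that turns a package-author table into co-authorship edges. The toy names and packages are invented, the `Package` column name is an assumption (only `authorsR` appears in the code below), and `coauthor_edges()` is a hypothetical helper, not part of the CRAN collaboration project.

```r
# Toy package-author table; names and packages are invented for illustration.
# The column name `Package` is an assumption.
toy_authors <- data.frame(
  Package  = c("pkgA", "pkgA", "pkgA", "pkgB", "pkgB", "pkgC"),
  authorsR = c("Ann", "Bob", "Cem", "Bob", "Dee", "Eve"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for every package, list all unordered pairs of authors.
coauthor_edges <- function(df) {
  by_pkg <- split(df$authorsR, df$Package)
  pairs <- lapply(by_pkg, function(a) {
    a <- sort(unique(a))
    if (length(a) < 2) return(NULL)   # single-author packages add no edge
    t(combn(a, 2))                    # one row per author pair
  })
  edges <- do.call(rbind, pairs)
  edges <- unique(as.data.frame(edges, stringsAsFactors = FALSE))
  names(edges) <- c("from", "to")
  rownames(edges) <- NULL
  edges
}

edges <- coauthor_edges(toy_authors)
print(edges)
```

In this toy table, pkgA contributes three edges, pkgB one, and the single-author pkgC none.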

A developer's 'Hadley number' is the distance between that developer and Hadley Wickham in the collaboration graph, that is, the number of co-authorship links on the shortest path between them.
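As a rough illustration of this distance, the sketch below runs a breadth-first search over a toy edge list in base R. All names other than Hadley Wickham are invented, and `hadley_numbers()` is a hypothetical helper. The dataset already ships the distances in the `dist2HW` column, so this is not how the post obtains them.

```r
# Toy undirected edge list; all names except the source are invented.
edges <- data.frame(
  from = c("Hadley Wickham", "Hadley Wickham", "Ann", "Bob", "Eve"),
  to   = c("Ann", "Bob", "Cem", "Cem", "Fay"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: breadth-first search from `source` to every node.
hadley_numbers <- function(edges, source = "Hadley Wickham") {
  nodes <- unique(c(edges$from, edges$to))
  dist <- setNames(rep(NA_integer_, length(nodes)), nodes)
  dist[source] <- 0L
  frontier <- source
  step <- 0L
  while (length(frontier) > 0) {
    step <- step + 1L
    # Neighbours of the current frontier, following edges in both directions
    nb <- unique(c(edges$to[edges$from %in% frontier],
                   edges$from[edges$to %in% frontier]))
    frontier <- nb[is.na(dist[nb])]   # keep only nodes not visited yet
    dist[frontier] <- step
  }
  dist                                # unreachable nodes stay NA
}

hn <- hadley_numbers(edges)
print(hn)
```

Authors with no path to the source, such as Eve and Fay in this toy graph, keep `NA` as their distance.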

My goal is a bar chart of the 20 authors who have contributed to the most R packages, with each bar colored by the author's distance from Hadley Wickham in the graph.

The Code

# Load necessary libraries
library(tidyverse)            # For data manipulation and visualization
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidytuesdayR)        # For accessing TidyTuesday datasets
library(showtext)            # For working with fonts
## Loading required package: sysfonts
## Loading required package: showtextdb
library(glue)                # For text formatting
library(ggtext)              # For enhanced text formatting in ggplot2

# Load Data
# Download the TidyTuesday data for week 38 of 2023 via tidytuesdayR.
tuesdata <- tidytuesdayR::tt_load(2023, week = 38)
## --- Compiling #TidyTuesday Information for 2023-09-19 ----
## --- There are 4 files available ---
## --- Starting Download ---
## 
## 	Downloading file 1 of 4: `cran_20230905.csv`
## 	Downloading file 2 of 4: `package_authors.csv`
## 	Downloading file 3 of 4: `cran_graph_nodes.csv`
## 	Downloading file 4 of 4: `cran_graph_edges.csv`
## --- Download complete ---
package_authors <- tuesdata$package_authors
cran_graph_nodes <- tuesdata$cran_graph_nodes

# Load Fonts and Define Colors
# Fonts are loaded, and colors are defined for text and symbols in visualizations.
font_add_google("Raleway", "raleway")
# Font Awesome files are read from local paths; adjust these to your machine.
font_add('fa-reg', 'c:/Users/info/OneDrive/Dokumente/fonts/Font Awesome 6 Free-Regular-400.otf')
font_add('fa-brands', 'c:/Users/info/OneDrive/Dokumente/fonts/Font Awesome 6 Brands-Regular-400.otf')
font_add('fa-solid', 'c:/Users/info/OneDrive/Dokumente/fonts/Font Awesome 6 Free-Solid-900.otf')
showtext_auto()
bg <- "lightgrey"
col1 <- "#57375D" # for text
col2 <- "#FF3FA4"
col3 <- "#FF9B82"
col4 <- "#FFC8C8"

# Define Symbols
# Symbols are defined using HTML-style code with appropriate colors and fonts.
rproject <- glue("<span style='font-family:fa-brands;'>&#xf4f7;</span>")
twitter <- glue("<span style='color:{col2};font-family:fa-brands;'>&#xf099;</span>")
mastodon <- glue("<span style='color:{col2};font-family:fa-brands;'>&#xf4f6;</span>")
link <- glue("<span style='color:{col2};font-family:fa-solid;'>&#xf0c1;</span>")
data <- glue("<span style='color:{col2};font-family:fa-solid;'>&#xf1c0;</span>")
space <- glue("<span style='color:{bg}'>-</span>")   # Invisible spacer: a dash drawn in the background color.
space2 <- glue("<span style='color:{bg}'>--</span>") # Wider invisible spacer (two dashes).

# Define Title, Subtitle, and Axis Title
# The title, subtitle, and y-axis title are formatted with glue.
t <- glue("<b>Discovering the Key Players in the {rproject} Package Development Community</b>")
s <- glue("The top twenty authors with the highest number of {rproject} packages<br>and their distance to Hadley Wickham in the CRAN collaboration graph.")
ytitle <- glue("Number of {rproject} packages")

# Define Caption
# A formatted caption is defined, including social media icons and links.
cap <- glue("{twitter}{space2}@web_design_fh{space2} 
    {space2}{mastodon}{space2}@frankhaenel @fosstodon.org{space2}
    {space2}{link}{space}{space2}www.frankhaenel.de<br>
    {data}{space2}Schochastics (n.d.) GitHub - schochastics/CRAN_collaboration: <i>Analysing the<br>{space2}{space2}collaboration graph of R package developers on CRAN</i> [Internet].")

# Create plot
package_authors %>%
    count(authorsR) %>%                      # number of packages per author
    arrange(desc(n)) %>%
    slice(1:20) %>%                          # keep the 20 most prolific authors
    left_join(
        cran_graph_nodes %>% select(name, dist2HW),
        by = c("authorsR" = "name")
    ) %>%
    # hwz is the Hadley number; authors missing from the graph keep NA
    mutate(hwz = dist2HW) %>%
    select(-dist2HW) %>%
    ggplot(aes(x = fct_reorder(authorsR, n), y = n, fill = as.factor(hwz))) +
        geom_col() +
        scale_fill_manual(values = c(col2, col3, col4), name = "Hadley-<br>Number") +
        labs(title = t, subtitle = s, caption = cap) +
        ylab(ytitle) +
        theme(
            plot.margin = margin(10, 10, 10, 20),
            panel.background = element_rect(fill=bg, colour = bg),
            plot.background = element_rect(fill=bg),
            plot.title = element_markdown(size = 16, hjust = 0.5, lineheight = 1.3, family = "raleway", color = col1),
            plot.subtitle = element_markdown(size = 12, hjust = 0.5, lineheight = 1.3, family = "raleway", color = col1),
            plot.caption = element_markdown(size = 10, hjust = 0, lineheight = 1.3, family = "raleway", color = col1),
            legend.background = element_rect(fill=bg, colour = bg),
            legend.title = element_markdown(size = 12, hjust = 0, lineheight = 1.3, family = "raleway", color = col1),
            legend.text = element_markdown(size = 10, hjust = 0, lineheight = 1.3, family = "raleway", color = col1),
            axis.title.x = element_blank(),
            axis.text.x = element_markdown(angle=45, vjust=1, hjust=1, size = 10, color = col1, family = "raleway"),
            axis.text.y = element_markdown(size = 10, color = col1, family = "raleway"),
            axis.title.y = element_markdown(size = 10, color = col1, family = "raleway")
        )
A bar chart of the top twenty authors in the R package development community. Each bar shows the number of R packages an author has contributed to, and its color encodes the author's 'Hadley Number', the distance to Hadley Wickham in the CRAN collaboration graph. The title reads 'Discovering the Key Players in the R Project Package Development Community,' and further details are given in the subtitle and caption.

R Code Documentation

Overview

This R code produces a data visualization that is embedded in an HTML document. The plot shows the twenty authors with the most R packages on CRAN (the Comprehensive R Archive Network) and their distance to Hadley Wickham in the CRAN collaboration graph. The code uses several libraries for data manipulation, font handling, and text formatting.

Libraries

The following R libraries are used in this code:

  • tidyverse: Used for data manipulation and visualization.
  • tidytuesdayR: Enables access to TidyTuesday datasets.
  • showtext: Facilitates working with fonts.
  • glue: Supports text formatting.
  • ggtext: Enhances text formatting capabilities in ggplot2.

Data Loading

The code downloads the data for week 38 of 2023 with the 'tidytuesdayR' package. Of the four files in that release, two are used and stored as the data frames 'package_authors' and 'cran_graph_nodes'.

Font and Color Configuration

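Custom fonts are registered with 'font_add_google()' and 'font_add()', and 'showtext_auto()' turns on showtext rendering so that plots use them. Colors are defined for the background, the text, and the symbols and bars in the visualization.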

Symbols and Text Styling

Various symbols are defined as HTML-style spans that set a Font Awesome font family and, for most of them, a color. Symbols for 'R Project,' Twitter, Mastodon, links, and data are created for use in the graphical output, along with two invisible spacers drawn in the background color.
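The repeated span pattern could be wrapped in a small function. The sketch below assumes a hypothetical helper named `icon_span()`, which is not part of the original code. It only builds the HTML string, so it runs without the Font Awesome files installed.

```r
library(glue)

# Hypothetical helper: wrap a Font Awesome code point in a styled <span>.
# `code` is the hexadecimal code point, `family` a font family registered
# with showtext, and `colour` an optional CSS color.
icon_span <- function(code, family, colour = NULL) {
  style <- if (is.null(colour)) {
    glue("font-family:{family};")
  } else {
    glue("color:{colour};font-family:{family};")
  }
  glue("<span style='{style}'>&#x{code};</span>")
}

twitter_icon <- icon_span("f099", "fa-brands", "#FF3FA4")
print(twitter_icon)
```

The resulting string matches the hand-written `twitter` definition, and ggtext's `element_markdown()` renders it in the same way.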

Title, Subtitle, and Caption

  • Define Title and Subtitle: A formatted title and subtitle are defined using the 'glue' package. The title highlights the analysis theme, and the subtitle provides additional context about the top authors and their packages.
  • Define Caption: A formatted caption is defined using 'glue', incorporating social media icons and links. The caption provides additional information about the data source and references.

Plot Generation

  • Create Plot: Data manipulation is performed to analyze and prepare the data for visualization. A bar plot is created using ggplot2 to display the number of R packages for the top authors. The colors of the bars are determined by their distance to Hadley Wickham in the CRAN collaboration graph.
  • Customize Plot: Various plot settings are customized, including titles, labels, colors, fonts, and styling.
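The data-preparation steps can be tried on a toy table before running them on the full dataset. The sketch below uses invented authors and distances, and the `Package` column name is an assumption. It keeps the top three authors instead of twenty, and uses `count(sort = TRUE)` and `slice_head()` as shorthand for the `arrange()` and `slice()` calls in the post.

```r
library(dplyr)

# Toy stand-ins for package_authors and cran_graph_nodes (invented values).
toy_authors <- tibble(
  Package  = c("p1", "p2", "p3", "p4", "p5", "p6"),
  authorsR = c("Ann", "Ann", "Ann", "Bob", "Bob", "Cem")
)
toy_nodes <- tibble(
  name    = c("Ann", "Bob"),   # Cem is deliberately missing from the graph
  dist2HW = c(1, 2)
)

top_authors <- toy_authors %>%
  count(authorsR, sort = TRUE) %>%   # packages per author, largest first
  slice_head(n = 3) %>%              # keep the top 3 (the post keeps 20)
  left_join(toy_nodes, by = c("authorsR" = "name"))

print(top_authors)
```

An author without a matching node, like Cem here, ends up with `NA` in `dist2HW` after the join, so the fill scale falls back to its NA color for that bar.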

Output

The primary output of this code is a PNG image of the plot, including its title, subtitle, caption, and styling elements.
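The code shown above does not include an export step, so how the PNG was written is not stated. One common way, given here as an assumption rather than as the author's method, is `ggsave()`. The placeholder plot, file name, and dimensions are all hypothetical.

```r
library(ggplot2)

# Placeholder plot standing in for the bar chart built above.
p <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# Write to a temporary directory; file name and size are illustrative only.
out_file <- file.path(tempdir(), "cran_collaboration.png")
ggsave(out_file, plot = p, width = 8, height = 6, dpi = 300)
```

When showtext renders the text, `showtext_opts(dpi = 300)` is usually set to the same resolution so that font sizes in the saved file match the plot.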

References