This week's TidyTuesday data comes from the CRAN collaboration graph, a project led by David Schoch.
The CRAN collaboration graph comprises R package developers who are linked if they share authorship of an R package as indicated in the DESCRIPTION file.
The 'Hadley number' is the distance between an R developer and Hadley Wickham in the collaboration graph, that is, the number of co-authorship links on the shortest path between them.
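To make the idea of a graph distance concrete, here is a minimal base R sketch that computes it with a breadth-first search. The edge list and the developer names are made up for illustration, and `graph_distance()` is a hypothetical helper, not part of the CRAN collaboration project.

```r
# Toy co-authorship edges (hypothetical names, not taken from the CRAN data).
edges <- data.frame(
  from = c("Hadley", "Hadley", "Ann", "Bob", "Eve"),
  to   = c("Ann",    "Bob",    "Cy",  "Cy",  "Dan"),
  stringsAsFactors = FALSE
)

# Breadth-first search: number of co-authorship links on the shortest path
# from `source` to every developer; NA means no path exists.
graph_distance <- function(edges, source) {
  nodes <- unique(c(edges$from, edges$to))
  dist <- setNames(rep(NA_integer_, length(nodes)), nodes)
  dist[source] <- 0L
  frontier <- source
  while (length(frontier) > 0) {
    # Neighbours of the frontier, following edges in both directions.
    nb <- unique(c(edges$to[edges$from %in% frontier],
                   edges$from[edges$to %in% frontier]))
    nb <- nb[is.na(dist[nb])]              # keep only unvisited developers
    dist[nb] <- dist[[frontier[1]]] + 1L   # all frontier nodes share one distance
    frontier <- nb
  }
  dist
}

hadley_number <- graph_distance(edges, "Hadley")
```

In this toy graph, Ann and Bob are direct co-authors of the source node, Cy is reached through either of them, and Eve and Dan are not connected at all, so their distance stays `NA`.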
My goal is to create a bar chart of the top 20 authors by number of contributed R packages, colored by each author's distance from Hadley Wickham in the graph.
The Code
# Load necessary libraries
library(tidyverse)    # For data manipulation and visualization
library(tidytuesdayR) # For accessing TidyTuesday datasets
library(showtext)     # For working with fonts
library(glue)         # For text formatting
library(ggtext)       # For enhanced text formatting in ggplot2

# Load Data
# Data is downloaded with the 'tidytuesdayR' package for a specific week (2023, week 38).
tuesdata <- tidytuesdayR::tt_load(2023, week = 38)
##
## Downloading file 1 of 4: `cran_20230905.csv`
## Downloading file 2 of 4: `package_authors.csv`
## Downloading file 3 of 4: `cran_graph_nodes.csv`
## Downloading file 4 of 4: `cran_graph_edges.csv`
package_authors <- tuesdata$package_authors
cran_graph_nodes <- tuesdata$cran_graph_nodes

# Load Fonts and Define Colors
# Fonts are registered, and colors are defined for text and symbols in visualizations.
font_add_google("Raleway", "raleway")
font_add('fa-reg', 'c:/Users/info/OneDrive/Dokumente/fonts/Font Awesome 6 Free-Regular-400.otf')
font_add('fa-brands', 'c:/Users/info/OneDrive/Dokumente/fonts/Font Awesome 6 Brands-Regular-400.otf')
font_add('fa-solid', 'c:/Users/info/OneDrive/Dokumente/fonts/Font Awesome 6 Free-Solid-900.otf')
showtext_auto()
bg <- "lightgrey"
col1 <- "#57375D" # for text
col2 <- "#FF3FA4"
col3 <- "#FF9B82"
col4 <- "#FFC8C8"

# Define Symbols
# Symbols are defined using HTML-style code with appropriate colors and fonts.
rproject <- glue("<span style='font-family:fa-brands;'></span>")
twitter <- glue("<span style='color:{col2};font-family:fa-brands;'></span>")
mastodon <- glue("<span style='color:{col2};font-family:fa-brands;'></span>")
link <- glue("<span style='color:{col2};font-family:fa-solid;'></span>")
data <- glue("<span style='color:{col2};font-family:fa-solid;'></span>")
# Spacers: hyphens drawn in the background color act as invisible padding.
space <- glue("<span style='color:{bg}'>-</span>")
space2 <- glue("<span style='color:{bg}'>--</span>")

# Define Title
# A formatted title for the analysis is defined using glue.
t <- glue("<b>Discovering the Key Players in the {rproject} Package Development Community</b>")
s <- glue("The top twenty authors with the highest number of {rproject} packages<br>and their distance to Hadley Wickham in the CRAN collaboration graph.")
ytitle <- glue("Number of {rproject} packages")

# Define Caption
# A formatted caption is defined, including social media icons and links.
cap <- glue("{twitter}{space2}@web_design_fh{space2}
  {space2}{mastodon}{space2}@frankhaenel
  @fosstodon.org{space2}
  {space2}{link}{space}{space2}www.frankhaenel.de<br>
  {data}{space2}Schochastics (n.d.)
  GitHub - schochastics/CRAN_collaboration: <i>Analysing the<br>{space2}{space2}collaboration graph of R package developers on CRAN</i> [Internet].")

# create plot
package_authors %>%
  count(authorsR) %>%
  arrange(desc(n)) %>%
  slice(1:20) %>%
  left_join(cran_graph_nodes %>% select(name, dist2HW), by = c("authorsR" = "name")) %>%
  # hwz holds the Hadley number. The fallback value `hwz` used for authors
  # missing from the graph is not defined in this script and must exist beforehand.
  mutate(hwz = ifelse(is.na(dist2HW), hwz, dist2HW)) %>%
  select(-dist2HW) %>%
  ggplot(aes(x = fct_reorder(authorsR, n), y = n, fill = as.factor(hwz))) +
  geom_col() +
  scale_fill_manual(values = c(col2, col3, col4), name = "Hadley-<br>Number") +
  labs(title = t, subtitle = s, caption = cap) +
  ylab(ytitle) +
  theme(
    plot.margin = margin(10, 10, 10, 20),
    panel.background = element_rect(fill = bg, colour = bg),
    plot.background = element_rect(fill = bg),
    plot.title = element_markdown(size = 16, hjust = 0.5, lineheight = 1.3, family = "raleway", color = col1),
    plot.subtitle = element_markdown(size = 12, hjust = 0.5, lineheight = 1.3, family = "raleway", color = col1),
    plot.caption = element_markdown(size = 10, hjust = 0, lineheight = 1.3, family = "raleway", color = col1),
    legend.background = element_rect(fill = bg, colour = bg),
    legend.title = element_markdown(size = 12, hjust = 0, lineheight = 1.3, family = "raleway", color = col1),
    legend.text = element_markdown(size = 10, hjust = 0, lineheight = 1.3, family = "raleway", color = col1),
    axis.title.x = element_blank(),
    axis.text.x = element_markdown(angle = 45, vjust = 1, hjust = 1, size = 10, color = col1, family = "raleway"),
    axis.text.y = element_markdown(size = 10, color = col1, family = "raleway"),
    axis.title.y = element_markdown(size = 10, color = col1, family = "raleway")
  )
R Code Documentation
Overview
This R code generates a data visualization. The plot shows the twenty authors with the most R packages on CRAN (the Comprehensive R Archive Network) and their distance to Hadley Wickham in the CRAN collaboration graph. The code uses several libraries for data manipulation, fonts, and text formatting.
Libraries
The following R libraries are used in this code:
- tidyverse: Used for data manipulation and visualization.
- tidytuesdayR: Enables access to TidyTuesday datasets.
- showtext: Facilitates working with fonts.
- glue: Supports text formatting.
- ggtext: Enhances text formatting capabilities in ggplot2.
Data Loading
The code downloads the data with the 'tidytuesdayR' package, specifically for week 38 of 2023. Two data frames, 'package_authors' and 'cran_graph_nodes', store the tables used in the plot.
Font and Color Configuration
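Custom fonts are registered with 'font_add_google()' (Raleway) and 'font_add()' (local Font Awesome files), and 'showtext_auto()' switches on showtext rendering so that plots use them. Colors are defined for the background, text, and symbols.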
Symbols and Text Styling
Symbols are defined as HTML-style 'span' elements that set a font family and, where needed, a color. Symbols for the R Project, Twitter, Mastodon, links, and data are created for use in the graphical output, along with two spacers drawn in the background color.
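The styling mechanism can be sketched in isolation. The snippet below assumes only the 'glue' package; the handle text and the `example.org` address are placeholders, and `accent` is a stand-in name for the script's `col2`.

```r
library(glue)

accent <- "#FF3FA4"   # same value as col2 in the script
bg     <- "lightgrey" # plot background color

# glue() substitutes {accent} and {bg} into the inline CSS.
handle <- glue("<span style='color:{accent};'>@example</span>")
spacer <- glue("<span style='color:{bg}'>--</span>")

# Pieces are combined the same way, by interpolating them into a larger string.
caption <- glue("{handle}{spacer}example.org")
```

A string built this way only turns into styled text when the corresponding theme element is an `element_markdown()` from 'ggtext'; with a plain `element_text()` the tags would be printed literally.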
Title, Subtitle, and Caption
- Define Title and Subtitle: A formatted title and subtitle are defined using the 'glue' package. The title highlights the analysis theme, and the subtitle provides additional context about the top authors and their packages.
- Define Caption: A formatted caption is defined using 'glue', incorporating social media icons and links. The caption provides additional information about the data source and references.
Plot Generation
- Create Plot: The packages are counted per author, the twenty authors with the most packages are kept, and each is matched to their distance from Hadley Wickham. A bar plot created with ggplot2 displays the number of R packages per author, and the fill color of each bar encodes that author's distance to Hadley Wickham in the CRAN collaboration graph.
- Customize Plot: Various plot settings are customized, including titles, labels, colors, fonts, and styling.
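The data preparation described above can be tried on a small example. This is a sketch on made-up data, not the post's exact pipeline: `toy_authors` and `toy_nodes` are hypothetical stand-ins for the real tables, base `reorder()` replaces `fct_reorder()`, and labelling authors who are missing from the graph as "not in graph" is an assumption about how such cases could be handled.

```r
library(dplyr)
library(ggplot2)

# Toy stand-ins for package_authors and cran_graph_nodes (hypothetical values).
toy_authors <- tibble(
  Package  = c("p1", "p2", "p3", "p4", "p5", "p6"),
  authorsR = c("Ann", "Ann", "Ann", "Bob", "Bob", "Cy")
)
toy_nodes <- tibble(name = c("Ann", "Bob"), dist2HW = c(1, 2))

top_authors <- toy_authors %>%
  count(authorsR) %>%                 # packages per author
  arrange(desc(n)) %>%
  slice(1:3) %>%                      # top 3 here; the post keeps the top 20
  left_join(toy_nodes, by = c("authorsR" = "name")) %>%
  # Assumption: authors absent from the graph get an explicit label.
  mutate(hadley = if_else(is.na(dist2HW), "not in graph", as.character(dist2HW)))

# Bars ordered by package count, filled by the (discrete) Hadley number.
p <- ggplot(top_authors, aes(x = reorder(authorsR, n), y = n, fill = hadley)) +
  geom_col() +
  labs(x = NULL, y = "Number of packages", fill = "Hadley number")
```

Treating the distance as a discrete label rather than a number is what allows a manual palette with one color per distance value.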
Output
The primary output of this code is a PNG image of the data visualization, including the title, subtitle, caption, and styling elements.
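The script shown in this post draws the plot but contains no export step. One common way to write a ggplot to a PNG file is `ggsave()`; the sketch below uses a placeholder plot, and the file name and dimensions are assumptions rather than the values used for the original image.

```r
library(ggplot2)

# Placeholder plot; in the post this would be the finished bar chart.
p <- ggplot(data.frame(x = c("a", "b"), y = c(2, 1)), aes(x, y)) +
  geom_col()

# Hypothetical file name and dimensions (in inches).
out_file <- file.path(tempdir(), "hadley_number.png")
ggsave(out_file, plot = p, width = 8, height = 6, dpi = 300)
```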
References
- Schochastics (n.d.) GitHub - schochastics/CRAN_collaboration: Analysing the collaboration graph of R package developers on CRAN [Internet]. Available from https://github.com/schochastics/CRAN_collaboration.