Research is Becoming More Connected

Crossref metadata adoption reveals that emerging regions are modernizing fastest — even as established systems still lead in absolute ORCID author coverage, 2025–2026

TidyTuesday

Data Visualization

R Programming

2026

A two-panel chart showing that while established publishing regions still lead in absolute ORCID author coverage, emerging regions — particularly Sub-Saharan Africa and South Asia — are modernizing fastest, with indexed ORCID growth of 32% and 23% respectively between January 2025 and April 2026. Coverage rates are normalized by total journal article DOIs to enable fair cross-regional comparison, with indexed growth computed from a January 2025 baseline. Built with R, ggplot2, and patchwork.

Author

Steven Ponce

Published

May 18, 2026

Figure 1: A two-panel chart titled “Research is Becoming More Connected.” The left panel, “Emerging Regions Still Have Lower ORCID Coverage,” shows a horizontal bar chart of seven world regions ranked by share of journal articles with ORCID author identifiers as of April 2026. Latin America & Caribbean leads at 25.4%, followed by Europe & Central Asia at 15.8% and North America at 14.0%. A visual gap separates these from the highlighted emerging regions: Sub-Saharan Africa at 9.6% and South Asia at 3.9%, shown in magenta and yellow, respectively. The right panel, “But Adoption Is Rising Faster,” shows indexed ORCID coverage from January 2025 to April 2026, with January 2025 set to 100. Sub-Saharan Africa reaches an index of 132 and South Asia reaches 123 by April 2026, both rising steeply against near-flat gray comparator lines. Data source: Crossref, Bédard-Vallée (2026). Journal articles only.

Steps to Create this Graphic

1. Load Packages & Setup

Show code

```{r}
#| label: load
#| warning: false
#| message: false      
#| results: "hide"     

## 1. LOAD PACKAGES & SETUP ----
suppressPackageStartupMessages({
if (!require("pacman")) install.packages("pacman")
pacman::p_load(
    tidyverse, ggtext, showtext, janitor, ggrepel,      
    scales, glue, skimr, patchwork, ggrepel
    )
})

### |- figure size ----
camcorder::gg_record(
  dir    = here::here("temp_plots"),
  device = "png",
  width  = 14,
  height = 7,
  units  = "in",
  dpi    = 320
)

# Source utility functions
suppressMessages(source(here::here("R/utils/fonts.R")))
source(here::here("R/utils/social_icons.R"))
source(here::here("R/utils/image_utils.R"))
source(here::here("R/themes/base_theme.R"))
```

2. Read in the Data

Show code

```{r}
#| label: read
#| include: true
#| eval: true
#| warning: false

tt <- tidytuesdayR::tt_load(2026, week = 20)

member_participation_stats_by_country <- tt$member_participation_stats_by_country |> clean_names()
metadata_coverage_stats_by_country <- tt$metadata_coverage_stats_by_country |> clean_names()

rm(tt)
```

3. Examine the Data

Show code

```{r}
#| label: examine
#| include: true
#| eval: true
#| results: 'hide'
#| warning: false

glimpse(member_participation_stats_by_country)
glimpse(metadata_coverage_stats_by_country)
```

4. Tidy Data

Show code

```{r}
#| label: tidy
#| warning: false

### |- Region lookup ----
region_labels <- c(
  "EAS" = "East Asia & Pacific",
  "ECS" = "Europe & Central Asia",
  "LCN" = "Latin America & Caribbean",
  "MEA" = "Middle East & North Africa",
  "NAC" = "North America",
  "SAS" = "South Asia",
  "SSF" = "Sub-Saharan Africa"
)

### |- Panel A: Current ORCID coverage by region (latest snapshot, journal articles) ----
panel_a_data <- metadata_coverage_stats_by_country |>
  filter(
    document_type == "journal-article",
    current_up_to == max(current_up_to)
  ) |>
  mutate(region_label = region_labels[region_id]) |>
  filter(!is.na(region_label)) |>
  summarise(
    with_orcid_for_authors = sum(with_orcid_for_authors),
    n_dois = sum(n_dois),
    .by = c(region_id, region_label)
  ) |>
  mutate(
    orcid_rate = with_orcid_for_authors / n_dois,
    # Narrative ordering: highlighted emerging regions at top of chart,
    # gray established regions below — groups the story visually
    narrative_order = case_when(
      region_label == "Sub-Saharan Africa" ~ 2,
      region_label == "South Asia" ~ 1,
      TRUE ~ orcid_rate + 10
    ),
    region_label = fct_reorder(region_label, narrative_order)
  ) |>
  # Insert blank row to create visual gap between gray and highlighted groups
  (\(x) {
    separator <- tibble(
      region_id              = NA_character_,
      region_label           = factor(" ", levels = c(levels(x$region_label), " ")),
      with_orcid_for_authors = NA_real_,
      n_dois                 = NA_real_,
      orcid_rate             = NA_real_,
      narrative_order        = 2.5
    )
    bind_rows(x, separator) |>
      mutate(region_label = fct_reorder(region_label, narrative_order))
  })()

### |- Panel B: ORCID acceleration (monthly 2025-2026, journal articles) ----

# Monthly window only (Jan 2025 onward)
monthly_data <- metadata_coverage_stats_by_country |>
  filter(
    document_type == "journal-article",
    current_up_to >= as.Date("2025-01-01")
  ) |>
  mutate(region_label = region_labels[region_id]) |>
  filter(!is.na(region_label)) |>
  summarise(
    with_orcid_for_authors = sum(with_orcid_for_authors),
    n_dois = sum(n_dois),
    .by = c(current_up_to, region_id, region_label)
  ) |>
  mutate(orcid_rate = with_orcid_for_authors / n_dois) |>
  arrange(region_id, current_up_to)

# Baseline: Jan 2025 rate per region
baseline <- monthly_data |>
  filter(current_up_to == min(current_up_to)) |>
  select(region_id, baseline_rate = orcid_rate)

# Filter: only regions with a meaningful baseline (>= 3%)
eligible_regions <- baseline |>
  filter(baseline_rate >= 0.03) |>
  pull(region_id)

panel_b_data <- monthly_data |>
  filter(region_id %in% eligible_regions) |>
  left_join(baseline, by = "region_id") |>
  mutate(orcid_indexed = orcid_rate / baseline_rate * 100)

# Identify fastest and slowest growing regions for annotation
region_growth <- panel_b_data |>
  filter(current_up_to == max(current_up_to)) |>
  arrange(desc(orcid_indexed))

top_regions <- region_growth |>
  slice_head(n = 2) |>
  pull(region_id)
bottom_regions <- region_growth |>
  slice_tail(n = 1) |>
  pull(region_id)

# End-of-line label data
end_labels <- panel_b_data |>
  filter(current_up_to == max(current_up_to)) |>
  mutate(label = glue("{region_label}\n({round(orcid_indexed, 0)})"))
```

5. Visualization Parameters

Show code

```{r}
#| label: params
#| include: true
#| warning: false

### |- plot aesthetics ----
colors <- get_theme_colors(
  palette = list(
    EAS = "#4E79A7",
    ECS = "#722F37",
    LCN = "#F28E2B",
    MEA = "#76B7B2",
    NAC = "#59A14F",
    SAS = "#EDC948",
    SSF = "#B07AA1",
    highlight = "#722F37",
    muted = "gray70",
    reference = "gray40"
  )
)

region_colors <- c(
  "East Asia & Pacific" = colors$palette$EAS,
  "Europe & Central Asia" = colors$palette$ECS,
  "Latin America & Caribbean" = colors$palette$LCN,
  "Middle East & North Africa" = colors$palette$MEA,
  "North America" = colors$palette$NAC,
  "South Asia" = colors$palette$SAS,
  "Sub-Saharan Africa" = colors$palette$SSF
)

### |- titles and caption ----
title_text <- str_glue("Research is Becoming More Connected")
subtitle_text <- str_glue(
    "Crossref metadata adoption reveals that emerging regions are modernizing fastest —\n",
    "even as established systems still lead in absolute ORCID author coverage, 2025\u20132026"
)

caption_text <- create_social_caption(
    tt_year = 2026,
    tt_week = 20,
    source_text = "Crossref · Bédard-Vallée (2026)"
)

panel_a_title <- "Emerging Regions Still Have Lower ORCID Coverage"
panel_a_sub <- "ORCID author coverage by region · journal articles · Apr 2026"

panel_b_title <- "But Adoption Is Rising Faster"
panel_b_sub <- "Indexed ORCID author coverage by region, Jan 2025 = 100\nJournal articles · regions below 3% ORCID baseline in Jan 2025 excluded from index"

### |- fonts ----
setup_fonts()
fonts <- get_font_families()

### |- plot theme ----
base_theme <- create_base_theme(colors)

weekly_theme <- extend_weekly_theme(
    base_theme,
    theme(
        plot.title.position   = "plot",
        plot.caption.position = "plot",
        plot.title = element_text(
            size = 18, face = "bold", family = fonts$title,
            margin = margin(b = 4)
        ),
        plot.subtitle = element_text(
            size = 9.5, family = fonts$text, color = "gray40",
            lineheight = 1.3, margin = margin(b = 16)
        ),
        plot.caption = element_markdown(
            size = 8, family = fonts$text, color = "gray50",
            margin = margin(t = 12)
        ),
        panel.grid.major = element_line(color = "gray92", linewidth = 0.3),
        panel.grid.minor = element_blank(),
        axis.ticks = element_blank(),
        axis.title.x = element_markdown(size = 9, color = "gray30", family = fonts$text),
        axis.title.y = element_markdown(size = 9, color = "gray30", family = fonts$text),
        axis.text = element_text(size = 8, color = "gray40", family = fonts$text),
        legend.position = "bottom",
        legend.title = element_blank(),
        legend.text = element_text(size = 8, family = fonts$text),
        legend.key.size = unit(0.4, "cm"),
        strip.text = element_blank(),
        axis.text.x = element_text(angle = 0, hjust = 0.5, size = 7.5,
                                  color = "gray40", family = fonts$text)
    )
)

theme_set(weekly_theme)
```

6. Plot

Show code

```{r}
#| label: plot
#| warning: false

#### |- Panel A: Horizontal bar chart — current ORCID coverage by region ----

# Accent colors matched exactly to Panel B for cross-panel continuity
accent_regions <- c("Sub-Saharan Africa", "South Asia")

bar_fill_values <- panel_a_data |>
  mutate(bar_color = case_when(
    region_label == "Sub-Saharan Africa" ~ region_colors["Sub-Saharan Africa"],
    region_label == "South Asia" ~ region_colors["South Asia"],
    region_label == " " ~ NA_character_,
    TRUE ~ "gray78"
  )) |>
  select(region_label, bar_color) |>
  deframe()

label_color_values <- panel_a_data |>
  mutate(lbl_color = case_when(
    region_label %in% accent_regions ~ "gray15",
    region_label == " " ~ NA_character_,
    TRUE ~ "gray45"
  )) |>
  select(region_label, lbl_color) |>
  deframe()

p_a <- ggplot(
  panel_a_data,
  aes(x = orcid_rate, y = region_label, fill = region_label)
) +
  # Geoms
  geom_col(width = 0.55, alpha = 0.9, na.rm = TRUE) +
  geom_text(
    data = panel_a_data |> filter(!is.na(orcid_rate)),
    aes(
      label = percent(orcid_rate, accuracy = 0.1),
      color = region_label
    ),
    hjust = -0.15, size = 3, family = fonts$text
  ) +
  # Scale
  scale_fill_manual(values = bar_fill_values, guide = "none", na.value = NA) +
  scale_color_manual(values = label_color_values, guide = "none", na.value = NA) +
  scale_x_continuous(
    labels = percent_format(accuracy = 1),
    expand = expansion(mult = c(0, 0.20)),
    limits = c(0, NA)
  ) +
  scale_y_discrete() +
  coord_cartesian(clip = "off") +
  # Labs
  labs(
    title = panel_a_title,
    subtitle = panel_a_sub,
    x = "Share of journal articles with ORCID author identifier",
    y = NULL
  ) +
  # Theme
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.major.x = element_line(color = "gray92", linewidth = 0.3)
  )


### |- Panel B: ORCID momentum line chart ----

# Highlight lines vs muted lines
highlight_ids <- top_regions

# Keep only 3 background comparator regions to avoid noisy texture
comparator_ids <- region_growth |>
  filter(!region_id %in% highlight_ids) |>
  slice(c(1, 2, 3)) |>
  pull(region_id)

p_b <- ggplot(
  panel_b_data,
  aes(
    x = current_up_to, y = orcid_indexed,
    color = region_label, group = region_label
  )
) +

  # Geoms
  geom_hline(
    yintercept = 100, color = "gray70",
    linetype = "solid", linewidth = 0.4
  ) +
  annotate("text",
    x = as.Date("2025-01-15"), y = 101,
    label = "Jan 2025 baseline", hjust = 0, vjust = 0,
    size = 2.5, color = "gray55", fontface = "italic"
  ) +
  geom_line(
    data = panel_b_data |> filter(region_id %in% comparator_ids),
    linewidth = 0.6, alpha = 0.12, color = "gray75"
  ) +
  geom_line(
    data = panel_b_data |> filter(region_id %in% highlight_ids),
    linewidth = 1.1, alpha = 0.9
  ) +
  geom_text_repel(
    data = end_labels |> filter(region_id %in% highlight_ids),
    aes(label = label),
    hjust = 0, direction = "y", nudge_x = 45,
    size = 2.7, family = fonts$text,
    segment.size = 0.3, segment.color = "gray60",
    box.padding = 0.1
  ) +
  geom_point(
    data = panel_b_data |>
      filter(
        region_id %in% highlight_ids,
        current_up_to == max(current_up_to)
      ),
    size = 2.5, stroke = 0.3
  ) +
  # Scales
  scale_color_manual(values = region_colors, guide = "none") +
  scale_x_date(
    date_labels = "%b '%y",
    date_breaks = "3 months",
    expand = expansion(mult = c(0.02, 0.28))
  ) +
  scale_y_continuous(
    labels = function(x) glue("{x}"),
    breaks = c(100, 125, 150, 175, 200)
  ) +
  coord_cartesian(clip = "off") +
  # Labs
  labs(
    title = panel_b_title,
    subtitle = panel_b_sub,
    x = NULL,
    y = "Indexed ORCID coverage<br>(Jan 2025 = 100)"
  )

### |- Combined layout ----
p_combined <- p_a + p_b +
  plot_layout(widths = c(1, 1)) +
  plot_annotation(
    title = title_text,
    subtitle = subtitle_text,
    caption = caption_text,
    theme = theme(
      plot.title = element_text(
        size = 20, face = "bold", family = fonts$title,
        margin = margin(b = 6)
      ),
      plot.subtitle = element_text(
        size = 11, family = fonts$text, color = "gray30",
        lineheight = 1.3, margin = margin(b = 20)
      ),
      plot.caption = element_markdown(
        size = 8, family = fonts$caption, color = "gray50",
        margin = margin(t = 16), linewidth = 1.2
      ),
      plot.background = element_rect(fill = colors$background, color = NA),
      plot.margin = margin(20, 20, 12, 20)
    )
  )
```

7. Save

Show code

```{r}
#| label: save
#| warning: false

### |-  plot image ----  
save_plot_patchwork(
  plot = p_combined, 
  type = "tidytuesday", 
  year = 2026, 
  week = 20, 
  width  = 14,
  height = 7
  )
```

8. Session Info

Expand for Session Info

R version 4.5.3 (2026-03-11 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] here_1.0.2      patchwork_1.3.2 skimr_2.2.2     glue_1.8.0     
 [5] scales_1.4.0    ggrepel_0.9.8   janitor_2.2.1   showtext_0.9-8 
 [9] showtextdb_3.0  sysfonts_0.8.9  ggtext_0.1.2    lubridate_1.9.5
[13] forcats_1.0.1   stringr_1.6.0   dplyr_1.2.1     purrr_1.2.2    
[17] readr_2.2.0     tidyr_1.3.2     tibble_3.3.1    ggplot2_4.0.3  
[21] tidyverse_2.0.0 pacman_0.5.1   

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1   farver_2.1.2       S7_0.2.1           fastmap_1.2.0     
 [5] gh_1.5.0           digest_0.6.39      timechange_0.4.0   lifecycle_1.0.5   
 [9] rsvg_2.7.0         magrittr_2.0.5     compiler_4.5.3     rlang_1.2.0       
[13] tools_4.5.3        yaml_2.3.12        knitr_1.51         labeling_0.4.3    
[17] htmlwidgets_1.6.4  bit_4.6.0          curl_7.0.0         xml2_1.5.2        
[21] camcorder_0.1.0    repr_1.1.7         RColorBrewer_1.1-3 tidytuesdayR_1.3.2
[25] withr_3.0.2        grid_4.5.3         gitcreds_0.1.2     cli_3.6.6         
[29] rmarkdown_2.31     crayon_1.5.3       generics_0.1.4     otel_0.2.0        
[33] rstudioapi_0.18.0  tzdb_0.5.0         commonmark_2.0.0   parallel_4.5.3    
[37] ggplotify_0.1.3    base64enc_0.1-6    vctrs_0.7.3        yulab.utils_0.2.4 
[41] jsonlite_2.0.0     litedown_0.9       gridGraphics_0.5-1 hms_1.1.4         
[45] bit64_4.6.0-1      systemfonts_1.3.2  magick_2.9.1       gifski_1.32.0-2   
[49] codetools_0.2-20   stringi_1.8.7      gtable_0.3.6       pillar_1.11.1     
[53] rappdirs_0.3.4     htmltools_0.5.9    R6_2.6.1           httr2_1.2.2       
[57] textshaping_1.0.5  rprojroot_2.1.1    vroom_1.7.1        evaluate_1.0.5    
[61] markdown_2.0       gridtext_0.1.6     snakecase_0.11.1   Rcpp_1.1.1        
[65] svglite_2.2.2      xfun_0.57          fs_2.0.1           pkgconfig_2.0.3

9. GitHub Repository

Expand for GitHub Repo

The complete code for this analysis is available in tt_2026_20.qmd.

For the full repository, click here.

10. References

Expand for References

Data Source:
- TidyTuesday 2026 Week 20: Crossref Metadata Coverage

11. Custom Functions Documentation

📦 Custom Helper Functions

This analysis uses custom functions from my personal module library for efficiency and consistency across projects.

Functions Used:

fonts.R: setup_fonts(), get_font_families() - Font management with showtext
social_icons.R: create_social_caption() - Generates formatted social media captions
image_utils.R: save_plot() - Consistent plot saving with naming conventions
base_theme.R: create_base_theme(), extend_weekly_theme(), get_theme_colors() - Custom ggplot2 themes

Why custom functions?
These utilities standardize theming, fonts, and output across all my data visualizations. The core analysis (data tidying and visualization logic) uses only standard tidyverse packages.

Source Code:
View all custom functions → GitHub: R/utils

Citation

BibTeX citation:

@online{ponce2026,
  author = {Ponce, Steven},
  title = {Research Is {Becoming} {More} {Connected}},
  date = {2026-05-18},
  url = {https://stevenponce.netlify.app/data_visualizations/TidyTuesday/2026/tt_2026_20.html},
  langid = {en}
}

For attribution, please cite this work as:

Ponce, Steven. 2026. “Research Is Becoming More Connected.” May 18. https://stevenponce.netlify.app/data_visualizations/TidyTuesday/2026/tt_2026_20.html.