• Steven Ponce

Living WWII Veterans by State (2025)

45,340 veterans remain; rates per 100k use U.S. Census 2023 population estimates

Multi-panel analysis of 45,340 living WWII veterans by state, comparing raw counts vs population-adjusted rates. New Hampshire leads per capita (42.8 per 100k) while five states hold 37% of all veterans.

Published: November 11, 2025

Original

The original visualization comes from the National WWII Museum (2025), WWII Veteran Statistics.

Original visualization

Makeover

Figure 1: Four-panel dashboard showing the number of living WWII veterans by state in 2025. Panel A: diverging bar chart of veterans per 100k vs US mean, with New Hampshire highest at +29.1. Panel B: lollipop chart of the top 20 states, all above the national average of 13.7 per 100k. Panel C: histogram showing right-skewed distribution, with most states under 1,000 veterans and the top 5 states holding 37% of the total. Panel D: box plots by region showing that the Northeast has the highest concentration per capita, while the South has the most states but lower median rates.
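
The rate and the difference shown in Panel A follow from simple arithmetic. Here is a minimal sketch in base R; the veteran count and population are made-up round numbers for illustration, while 13.7, 42.8 and 29.1 are the figures quoted in the caption above.

```r
# Hypothetical inputs for illustration only (not values from the dataset)
veterans   <- 500
population <- 2000000

# Rate per 100,000 residents: 1e5 * count / population
rate_per_100k <- 1e5 * veterans / population

# Difference from the mean rate quoted in the caption (13.7 per 100k)
mean_rate      <- 13.7
diff_from_mean <- rate_per_100k - mean_rate

# The caption's New Hampshire bar uses the same subtraction: 42.8 - 13.7
nh_diff <- round(42.8 - 13.7, 1)

rate_per_100k
diff_from_mean
nh_diff
```

A positive difference puts a state on the right-hand side of the diverging bar chart, a negative one on the left.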

Steps to Create this Graphic

1. Load Packages & Setup

```{r}
#| label: load
#| warning: false
#| message: false
#| results: "hide"

## 1. LOAD PACKAGES & SETUP ----
suppressPackageStartupMessages({
  if (!require("pacman")) install.packages("pacman")
  pacman::p_load(
    tidyverse,   # Easily Install and Load the 'Tidyverse'
    janitor,     # Simple Tools for Examining and Cleaning Dirty Data
    skimr,       # Compact and Flexible Summaries of Data
    scales,      # Scale Functions for Visualization
    ggtext,      # Improved Text Rendering Support for 'ggplot2'
    showtext,    # Using Fonts More Easily in R Graphs
    glue,        # Interpreted String Literals
    patchwork,   # The Composer of Plots
    ggrepel,     # Automatically Position Non-Overlapping Text Labels
    tidycensus   # Load US Census Boundary and Attribute Data
  )
})

### |- figure size ----
camcorder::gg_record(
    dir    = here::here("temp_plots"),
    device = "png",
    width  = 14,
    height = 10,
    units  = "in",
    dpi    = 320
)

# Source utility functions
suppressMessages(source(here::here("R/utils/fonts.R")))
source(here::here("R/utils/social_icons.R"))
source(here::here("R/utils/image_utils.R"))
source(here::here("R/themes/base_theme.R"))
```

2. Read in the Data

```{r}
#| label: read
#| include: true
#| eval: true
#| warning: false

ww2_veterans_raw <- read_csv(
  here::here("data/MakeoverMonday/2025/Living_WWII_Veterans_by_State_2025.csv")) |>
  clean_names()

### |-  Get state population data from US Census Bureau ----
# Source: US Census Bureau, Population Division
# Annual Estimates of the Resident Population: July 1, 2023
# Retrieved via tidycensus package
state_pop_census <- get_estimates(
  geography = "state",
  product = "population",
  vintage = 2023,
  year = 2023
  ) |>
  filter(variable == "POPESTIMATE") |>
  select(NAME, value) |>
  rename(state = NAME, population_2023 = value)
```
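
Because the population table is later joined to the veteran counts by state name, it can be worth checking that the names line up before computing rates. The following is a minimal sketch in base R with toy data; the data frames `veterans_toy` and `population_toy` and all numbers in them are hypothetical stand-ins, not the real tables.

```r
# Toy stand-ins for the two tables (hypothetical values, not the real data)
veterans_toy <- data.frame(
  state = c("Maine", "Texas", "Island Areas & Foreign"),
  living_wwii_veterans_2025 = c(300, 2500, 150)
)
population_toy <- data.frame(
  state = c("Maine", "Texas", "Ohio"),
  population_2023 = c(1400000, 30500000, 11800000)
)

# Names in the veterans table that have no match in the population table;
# these rows would end up with an NA population after a left join
unmatched <- setdiff(veterans_toy$state, population_toy$state)
unmatched

# Base-R equivalent of a left join on `state`
joined <- merge(veterans_toy, population_toy, by = "state", all.x = TRUE)
n_missing <- sum(is.na(joined$population_2023))
n_missing
```

Any name that shows up in `unmatched` either needs recoding or, as with a non-state category, needs to be filtered out before per-capita rates are calculated.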

3. Examine the Data

```{r}
#| label: examine
#| include: true
#| eval: true
#| results: 'hide'
#| warning: false

glimpse(ww2_veterans_raw)
skim_without_charts(ww2_veterans_raw) |> summary()
```

4. Tidy Data

```{r}
#| label: tidy
#| warning: false

ww2_veterans_clean <- ww2_veterans_raw |>
  filter(state != "Island Areas & Foreign") |>
  left_join(state_pop_census, by = "state") |>
  mutate(
    veterans_per_100k = if_else(!is.na(population_2023),
      1e5 * living_wwii_veterans_2025 / population_2023,
      NA_real_
    ),
    region = case_when(
      state %in% c(
        "Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island",
        "Vermont", "New Jersey", "New York", "Pennsylvania"
      ) ~ "Northeast",
      state %in% c(
        "Illinois", "Indiana", "Iowa", "Kansas", "Michigan", "Minnesota", "Missouri",
        "Nebraska", "North Dakota", "Ohio", "South Dakota", "Wisconsin"
      ) ~ "Midwest",
      state %in% c(
        "Alabama", "Arkansas", "Delaware", "District of Columbia", "Florida", "Georgia",
        "Kentucky", "Louisiana", "Maryland", "Mississippi", "North Carolina", "Oklahoma",
        "South Carolina", "Tennessee", "Texas", "Virginia", "West Virginia"
      ) ~ "South",
      state %in% c(
        "Alaska", "Arizona", "California", "Colorado", "Hawaii", "Idaho", "Montana",
        "Nevada", "New Mexico", "Oregon", "Utah", "Washington", "Wyoming"
      ) ~ "West",
      state == "Puerto Rico" ~ "Territory",
      TRUE ~ "Other"
    )
  )

summary_stats <- ww2_veterans_clean |>
  summarise(
    total_veterans   = sum(living_wwii_veterans_2025, na.rm = TRUE),
    mean_per_state   = mean(living_wwii_veterans_2025, na.rm = TRUE),
    median_per_state = median(living_wwii_veterans_2025, na.rm = TRUE),
    mean_per_100k    = mean(veterans_per_100k, na.rm = TRUE)
  )

# Note: this is the unweighted mean of the state-level rates, not a pooled U.S. rate
national_mean_per_100k <- as.numeric(summary_stats$mean_per_100k)
national_median_count <- as.numeric(summary_stats$median_per_state)

### |-  panel 1 data ----
# A) Veterans per 100k vs US mean
panel_1_data <- ww2_veterans_clean |>
  filter(!is.na(veterans_per_100k)) |>
  mutate(
    diff_from_mean = veterans_per_100k - national_mean_per_100k,
    state = fct_reorder(state, diff_from_mean),
    above_avg = diff_from_mean > 0
  ) |>
  slice_max(order_by = abs(diff_from_mean), n = 20)

### |-  panel 2 data ----
# B) Top states by per-capita rate
panel_2_data <- ww2_veterans_clean |>
  filter(!is.na(veterans_per_100k)) |>
  arrange(desc(veterans_per_100k)) |>
  slice_head(n = 20) |>
  mutate(state = fct_reorder(state, veterans_per_100k))

### |-  panel 3 data ----
# C) Histogram + density with labels for states ≥ 3,000 veterans
big_states <- ww2_veterans_clean |>
  filter(living_wwii_veterans_2025 >= 3000) |>
  mutate(y_pos = 0.00003)

top5_share <- ww2_veterans_clean |>
  arrange(desc(living_wwii_veterans_2025)) |>
  slice_head(n = 5) |>
  summarise(share = sum(living_wwii_veterans_2025, na.rm = TRUE) /
    sum(ww2_veterans_clean$living_wwii_veterans_2025, na.rm = TRUE)) |>
  pull(share)

### |-  panel 4 data ----
# D) Box + jitter
panel_4_data <- ww2_veterans_clean |>
  filter(!is.na(veterans_per_100k)) |>
  group_by(region) |>
  mutate(
    regional_median = median(veterans_per_100k),
    n_states = n()
  ) |>
  ungroup() |>
  mutate(region = fct_reorder(region, regional_median, .desc = TRUE))

region_levels <- levels(panel_4_data$region)
n_map <- panel_4_data |>
  distinct(region, n_states) |>
  deframe()
label_map <- setNames(
  paste0(region_levels, "\n(n=", n_map[region_levels], ")"),
  region_levels
)
```
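
The reference line used in the panels is the mean of the state-level rates, which weights every state equally. A pooled rate (total veterans over total population) can differ from it. Here is a minimal sketch in base R of the two calculations and of the top-k share; the three-row data frame `toy` is hypothetical and only meant to show the mechanics.

```r
# Toy data (hypothetical values, not the real dataset)
toy <- data.frame(
  state      = c("A", "B", "C"),
  veterans   = c(100, 200, 3000),
  population = c(500000, 1000000, 30000000)
)
toy$rate <- 1e5 * toy$veterans / toy$population

# Unweighted mean of state rates: each state counts once
unweighted_mean <- mean(toy$rate)

# Pooled rate: total veterans over total population
pooled_rate <- 1e5 * sum(toy$veterans) / sum(toy$population)

# Share of all veterans held by the k largest states (k = 1 in this toy case)
k <- 1
top_share <- sum(sort(toy$veterans, decreasing = TRUE)[seq_len(k)]) /
  sum(toy$veterans)

unweighted_mean
pooled_rate
top_share
```

In this toy case the large state has a low rate, so the pooled rate sits below the unweighted mean. Which reference is more appropriate depends on whether the comparison is between states or between residents.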

5. Visualization Parameters

```{r}
#| label: params
#| include: true
#| warning: false

### |-  plot aesthetics ----
# Get base colors with custom palette
colors <- get_theme_colors(
  palette = list(
    below_avg = "#D97548",
    above_avg = "#4A7C8C",
    primary_accent = "#2B4C5E",
    secondary_accent = "#D97548",
    box_northeast = "#4A7C8C",
    box_west = "#8B9D57",
    box_south = "#D97548",
    box_midwest = "#6B8CAE",
    box_territory = "#999999",
    gray_dark = "#3D3D3D",
    gray_medium = "#9A9A9A",
    gray_light = "#E6E6E6"
  )
)   
 
### |-  titles and caption ----
title_text <- "Living WWII Veterans by State (2025)"

subtitle_text <- str_glue(
  "**{comma(summary_stats$total_veterans)}** veterans remain; ",
  "rates per 100k use U.S. Census 2023 population estimates"
)

# Create caption
caption_text <- create_mm_caption(
  mm_year = current_year,
  mm_week = current_week,
  source_text = "(1) National WWII Museum (2025), (2) U.S. Census Bureau (2023 Population Estimates via tidycensus, retrieved 2025-11-11). "
)

### |-  fonts ----
setup_fonts()
fonts <- get_font_families()

### |-  plot theme ----

# Start with base theme
base_theme <- create_base_theme(colors)

# Add weekly-specific theme elements
weekly_theme <- extend_weekly_theme(
  base_theme,
  theme(
    # Text styling
    plot.title = element_text(
      size = rel(1.6), family = fonts$title, face = "bold",
      color = colors$title, lineheight = 1.1, hjust = 0,
      margin = margin(t = 5, b = 10)
    ),
    plot.subtitle = element_markdown(
      size = rel(0.95), family = fonts$subtitle, face = "italic",
      color = alpha(colors$subtitle, 0.9), lineheight = 1.1,
      margin = margin(t = 0, b = 20)
    ),
    
    # Legend formatting
    legend.position = "none",
    legend.justification = "top",
    legend.margin = margin(l = 12, b = 5),
    legend.key.size = unit(0.8, "cm"),
    legend.box.margin = margin(b = 10),
    legend.title = element_text(face = "bold"),
    
    # Axis formatting
    axis.ticks.y = element_blank(),
    axis.ticks.x = element_line(color = "gray", linewidth = 0.5),
    axis.title.x = element_text(
      face = "bold", size = rel(0.85),
      margin = margin(t = 10)
    ),
    axis.title.y = element_text(
      face = "bold", size = rel(0.85),
      margin = margin(r = 10)
    ),
    axis.text.x = element_text(
      size = rel(0.85), family = fonts$subtitle,
      color = colors$text
    ),
    axis.text.y = element_text(
      size = rel(0.85), family = fonts$subtitle,
      color = colors$text
    ),
    
    # Grid lines
    panel.grid.minor = element_line(color = "#ecf0f1", linewidth = 0.2),
    panel.grid.major = element_line(color = "#ecf0f1", linewidth = 0.4),
    
    # Margin
    plot.margin = margin(20, 20, 20, 20)
  )
)

# Set theme
theme_set(weekly_theme)
```
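
The subtitle combines a formatted total with Markdown bold markers that `element_markdown()` later renders. As a rough base-R sketch of the same string construction (the chunk above uses `str_glue()` and `comma()` instead; the total of 45,340 is the figure stated in this post's subtitle):

```r
# Total reported in the subtitle of this post
total_veterans <- 45340

# Thousands separator for a whole number
total_label <- format(total_veterans, big.mark = ",")

# The ** markers stay literal here; a Markdown-aware theme element renders them
subtitle <- paste0(
  "**", total_label, "** veterans remain; ",
  "rates per 100k use U.S. Census 2023 population estimates"
)
subtitle
```

Keeping the number in a variable rather than typing it into the string means the subtitle stays in sync with the data if the source file is updated.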

6. Plot

```{r}
#| label: plot
#| warning: false

### |-  panel 1 plot ----
# A) Veterans per 100k vs US mean
panel_1 <- ggplot(panel_1_data, aes(diff_from_mean, state, fill = above_avg)) +
  # Geoms
  geom_col(width = 0.68) +
  geom_vline(xintercept = 0, linewidth = 0.8, color = colors$palette$gray_dark) +
  geom_text(aes(label = number(diff_from_mean, accuracy = 0.1)),
    hjust = ifelse(panel_1_data$diff_from_mean > 0, -0.15, 1.15),
    size = 2.8, color = colors$palette$gray_dark
  ) +
  # Scales
  scale_fill_manual(values = c(`TRUE` = colors$palette$above_avg, `FALSE` = colors$palette$below_avg)) +
  scale_x_continuous(
    labels = label_number(accuracy = 1),
    expand = expansion(mult = c(0.1, 0.12))
  ) +
  # Labs
  labs(
    title = "A. Veterans per 100k vs US Mean",
    subtitle = "Higher values indicate greater veteran concentration (†)",
    x = "Difference from US mean", y = NULL
  ) +
  # Theme
  theme(
    plot.margin = margin(5, 5, 5, 5),
    panel.grid.major.y = element_blank(),
    axis.text.y = element_text(margin = margin(r = 4))
  )

### |-  panel 2 plot ----
# B) Top states by per-capita rate
panel_2 <-
  ggplot(panel_2_data, aes(veterans_per_100k, state)) +
  # Geoms
  geom_segment(aes(x = 0, xend = veterans_per_100k, yend = state),
    color = colors$palette$gray_light, linewidth = 1.1
  ) +
  geom_point(size = 3.2, color = colors$palette$primary_accent) +
  geom_vline(
    xintercept = national_mean_per_100k, linetype = "dashed",
    color = colors$palette$gray_dark, linewidth = 0.6
  ) +
  geom_text(aes(label = number(veterans_per_100k, accuracy = 0.1)),
    nudge_x = 1.0, hjust = 0, size = 2.8, color = colors$palette$gray_dark
  ) +
  # Scales
  scale_x_continuous(
    labels = label_number(accuracy = 1),
    expand = expansion(mult = c(0.01, 0.15))
  ) +
  # Labs
  labs(
    title = "B. Top States by Veterans per 100k",
    subtitle = glue(
      "Each exceeds the national average of {number(national_mean_per_100k, accuracy = 0.1)} per 100k (†)"
    ),
    x = "Veterans per 100k residents", y = NULL
  ) +
  # Theme
  theme(
    plot.margin = margin(5, 5, 5, 5),
    panel.grid.major.y = element_blank(),
    axis.text.y = element_text(margin = margin(r = 4))
  )

### |-  panel 3 plot ----
# C) Histogram + density with labels for states ≥ 3,000 veterans
panel_3 <-
  ggplot(ww2_veterans_clean, aes(living_wwii_veterans_2025)) +
  # Geoms
  geom_histogram(aes(y = after_stat(density)),
    binwidth = 250, boundary = 0,
    fill = colors$palette$primary_accent, alpha = 0.72,
    color = "white", linewidth = 0.3
  ) +
  geom_density(color = colors$palette$secondary_accent, linewidth = 1.0) +
  geom_vline(
    xintercept = national_median_count, linetype = "dotted",
    color = colors$palette$gray_dark, linewidth = 0.6
  ) +
  geom_label_repel(
    data = big_states,
    aes(x = living_wwii_veterans_2025, y = y_pos, label = state),
    seed = 2025, size = 3, label.size = 0.25,
    fill = alpha("white", 0.9), color = colors$palette$gray_dark,
    direction = "y", nudge_y = 0.00002, min.segment.length = 0,
    box.padding = 0.25
  ) +
  # Annotate
  annotate("label",
    x = national_median_count, y = 0.0009, label = "State median",
    size = 2.7, label.size = 0, fill = alpha("white", 0.92),
    color = colors$palette$gray_dark
  ) +
  annotate("label",
    x = Inf, y = Inf, hjust = 1.02, vjust = 1.2,
    label = paste0("Top 5 hold ", percent(top5_share, accuracy = 1)),
    size = 3, label.size = 0, fill = alpha("white", 0.9),
    color = colors$palette$gray_dark
  ) +
  # Scales
  scale_x_continuous(
    labels = label_comma(),
    expand = expansion(mult = c(0.02, 0.02))
  ) +
  scale_y_continuous(labels = label_number(accuracy = 0.0001)) +
  # Labs
  labs(
    title = "C. Distribution of Veterans Across States",
    subtitle = glue(
      "Most states have fewer than 1,000; top 5 hold {percent(top5_share, accuracy = 1)} of all veterans"
    ),
    x = "Living WWII veterans (count)", y = "Density"
  ) +
  # Theme
  theme(plot.margin = margin(5, 5, 5, 5))

### |-  panel 4 plot ----
# D) Box + jitter
panel_4 <-
  ggplot(panel_4_data, aes(region, veterans_per_100k, color = region)) +
  # Geoms
  geom_boxplot(width = 0.58, alpha = 0.28, linewidth = 0.8, outlier.shape = NA) +
  geom_jitter(width = 0.15, size = 2.2, alpha = 0.75) +
  geom_hline(
    yintercept = national_mean_per_100k, linetype = "dashed",
    color = colors$palette$gray_dark, linewidth = 0.6
  ) +
  geom_label_repel(
    data = filter(panel_4_data, state == "New Hampshire"),
    aes(label = state),
    seed = 2025, size = 3, label.size = 0.25,
    fill = alpha("white", 0.9), color = colors$palette$gray_dark,
    nudge_y = 2, nudge_x = 0.2
  ) +
  # Scales
  scale_color_manual(values = c(
    "Northeast" = colors$palette$box_northeast,
    "West" = colors$palette$box_west,
    "South" = colors$palette$box_south,
    "Midwest" = colors$palette$box_midwest,
    "Territory" = colors$palette$box_territory
  ), guide = "none") +
  scale_x_discrete(limits = region_levels, labels = label_map[region_levels]) +
  coord_cartesian(ylim = c(0, NA)) +
  # Labs
  labs(
    title = "D. Regional Distribution of Veterans per 100k",
    subtitle = "Northeast has highest rates; South has many states but lower median (†)",
    x = "Region", y = "Veterans per 100k residents"
  ) +
  # Theme
  theme(
    plot.margin = margin(5, 5, 5, 5),
    axis.text.x = element_text(lineheight = 1.1)
    )

### |-  combined plot ----
combined_plots <- (panel_1 | panel_2) / (panel_3 | panel_4) +
  plot_layout(heights = c(1.2, 0.8), widths = c(1, 1))

combined_plots <- combined_plots +
  plot_annotation(
    title = title_text,
    subtitle = subtitle_text,
    caption = caption_text,
    # tag_levels = "A",
    theme = theme(
      plot.title = element_text(
        size = rel(2.4),
        family = fonts$title,
        face = "bold",
        color = colors$title,
        lineheight = 1.1,
        margin = margin(t = 5, b = 5)
      ),
      plot.subtitle = element_markdown(
        size = rel(1.2),
        family = fonts$subtitle,
        color = alpha(colors$subtitle, 0.9),
        lineheight = 1.2,
        margin = margin(t = 5, b = 15)
      ),
      plot.caption = element_markdown(
        size = rel(0.65),
        family = fonts$caption,
        color = colors$caption,
        hjust = 0,
        margin = margin(t = 10)
      )
    )
  )
```
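
Panel C puts the histogram on a density scale so that the bars and the density curve share one y-axis. On that scale each bar's height is its count divided by (number of observations × bin width), which is why the label positions in the chunk above are as small as 0.00003 and 0.0009. Here is a minimal base-R sketch of that scaling; the `counts` vector is hypothetical, and `hist()` stands in for `geom_histogram()` purely to show the arithmetic.

```r
# Toy veteran counts (hypothetical values, not the real dataset)
counts   <- c(120, 180, 240, 400, 650, 900, 1300, 3200)
binwidth <- 250

# Bins start at 0 and are 250 wide, mirroring binwidth = 250, boundary = 0
breaks <- seq(0, ceiling(max(counts) / binwidth) * binwidth, by = binwidth)
h <- hist(counts, breaks = breaks, plot = FALSE)

# Density per bin = bin count / (n * bin width)
manual_density <- h$counts / (length(counts) * binwidth)

# Bar areas (height * width) add up to 1 on the density scale
total_area <- sum(h$density * binwidth)

max(manual_density)
total_area
```

With a bin width of 250 and counts in the thousands, the densities are necessarily tiny, so annotation coordinates on the y-axis have to be chosen on that scale rather than in raw counts.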

7. Save

```{r}
#| label: save
#| warning: false

### |-  plot image ----  
save_plot_patchwork(
  plot = combined_plots, 
  type = "makeovermonday", 
  year = current_year,
  week = current_week,
  width = 14, 
  height = 10
  )
```

8. Session Info

R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] here_1.0.1       tidycensus_1.7.3 ggrepel_0.9.6    patchwork_1.3.0 
 [5] glue_1.8.0       showtext_0.9-7   showtextdb_3.0   sysfonts_0.8.9  
 [9] ggtext_0.1.2     scales_1.3.0     skimr_2.1.5      janitor_2.2.0   
[13] lubridate_1.9.3  forcats_1.0.0    stringr_1.5.1    dplyr_1.1.4     
[17] purrr_1.0.2      readr_2.1.5      tidyr_1.3.1      tibble_3.2.1    
[21] ggplot2_3.5.1    tidyverse_2.0.0  pacman_0.5.1    

loaded via a namespace (and not attached):
 [1] DBI_1.2.3          rlang_1.1.6        magrittr_2.0.3     snakecase_0.11.1  
 [5] e1071_1.7-16       compiler_4.4.0     systemfonts_1.1.0  vctrs_0.6.5       
 [9] rvest_1.0.4        pkgconfig_2.0.3    crayon_1.5.3       fastmap_1.2.0     
[13] magick_2.8.5       labeling_0.4.3     utf8_1.2.4         promises_1.3.0    
[17] rmarkdown_2.29     markdown_1.13      tzdb_0.5.0         ps_1.8.1          
[21] camcorder_0.1.0    bit_4.5.0          xfun_0.49          jsonlite_1.8.9    
[25] later_1.3.2        uuid_1.2-1         parallel_4.4.0     R6_2.5.1          
[29] stringi_1.8.4      Rcpp_1.0.13-1      knitr_1.49         base64enc_0.1-3   
[33] timechange_0.3.0   tidyselect_1.2.1   rstudioapi_0.17.1  yaml_2.3.10       
[37] codetools_0.2-20   websocket_1.4.2    curl_6.0.0         processx_3.8.4    
[41] withr_3.0.2        evaluate_1.0.1     gridGraphics_0.5-1 sf_1.0-19         
[45] units_0.8-5        proxy_0.4-27       xml2_1.3.6         pillar_1.9.0      
[49] tigris_2.2.1       KernSmooth_2.23-22 renv_1.0.3         generics_0.1.3    
[53] vroom_1.6.5        rprojroot_2.0.4    chromote_0.4.0     hms_1.1.3         
[57] commonmark_1.9.2   munsell_0.5.1      class_7.3-22       tools_4.4.0       
[61] fs_1.6.5           grid_4.4.0         colorspace_2.1-1   repr_1.1.7        
[65] cli_3.6.4          rappdirs_0.3.3     rsvg_2.6.1         fansi_1.0.6       
[69] svglite_2.1.3      gtable_0.3.6       yulab.utils_0.1.8  digest_0.6.37     
[73] classInt_0.4-10    ggplotify_0.1.2    gifski_1.32.0-1    htmlwidgets_1.6.4 
[77] farver_2.1.2       htmltools_0.5.8.1  lifecycle_1.0.4    httr_1.4.7        
[81] gridtext_0.1.5     bit64_4.5.2       

9. GitHub Repository

The complete code for this analysis is available in mm_2025_44.qmd.

For the full repository, see https://github.com/poncest/personal-website/.

10. References

  1. Data:
    • Makeover Monday 2025 Week 44: Living WWII Veterans by State 2025
  2. Article:
    • Living WWII Veterans by State 2025
  3. Citation:
    • National WWII Museum. (2025). WWII Veteran Statistics. Retrieved from https://www.nationalww2museum.org/war/wwii-veteran-statistics
    • US Census Bureau. (2023). Annual Estimates of the Resident Population. Retrieved via tidycensus package.

11. Custom Functions Documentation

Custom Helper Functions

This analysis uses custom functions from my personal module library for efficiency and consistency across projects.

Functions Used:

  • fonts.R: setup_fonts(), get_font_families() - Font management with showtext
  • social_icons.R: create_social_caption() - Generates formatted social media captions
  • image_utils.R: save_plot() - Consistent plot saving with naming conventions
  • base_theme.R: create_base_theme(), extend_weekly_theme(), get_theme_colors() - Custom ggplot2 themes

Why custom functions?
These utilities standardize theming, fonts, and output across all my data visualizations. The core analysis (data tidying and visualization logic) uses only publicly available packages: the tidyverse plus the add-ons loaded in step 1 (for example tidycensus, patchwork, and ggrepel).

Source Code:
View all custom functions → GitHub: R/utils

Source Code
---
title: "Living WWII Veterans by State (2025)"
subtitle: "45,340 veterans remain; rates per 100k use U.S. Census 2023 population estimates"
description: "Multi-panel analysis of 45,340 living WWII veterans by state, comparing raw counts vs population-adjusted rates. New Hampshire leads per capita (42.8 per 100k) while five states hold 37% of all veterans. "
date: "2025-11-11"
tags: [
  "makeover-monday",
  "data-visualization",
  "ggplot2",
  "patchwork",
  "veterans",
  "wwii",
  "demographic-analysis",
  "population-normalization",
  "diverging-bar-chart",
  "lollipop-chart",
  "distribution-analysis",
  "box-plot",
  "tidycensus",
  "us-census"
]
image: "thumbnails/mm_2025_44.png"
format:
  html:
    toc: true
    toc-depth: 5
    code-link: true
    code-fold: true
    code-tools: true
    code-summary: "Show code"
    self-contained: true
    theme: 
      light: [flatly, assets/styling/custom_styles.scss]
      dark: [darkly, assets/styling/custom_styles_dark.scss]
editor_options: 
  chunk_output_type: inline
execute: 
  freeze: true                                      
  cache: true                                       
  error: false
  message: false
  warning: false
  eval: true
---

```{r}
#| label: setup-links
#| include: false

# CENTRALIZED LINK MANAGEMENT

## Project-specific info 
current_year <- 2025
current_week <- 44
project_file <- "mm_2025_44.qmd"
project_image <- "mm_2025_44.png"

## Data Sources
data_main <- "https://data.world/makeovermonday/week-44-2025-wwii-veteran-statistics"
data_secondary <- "https://data.world/makeovermonday/week-44-2025-wwii-veteran-statistics"

## Repository Links  
repo_main <- "https://github.com/poncest/personal-website/"
repo_file <- paste0("https://github.com/poncest/personal-website/blob/master/data_visualizations/MakeoverMonday/", current_year, "/", project_file)

## External Resources/Images
chart_original <- "https://raw.githubusercontent.com/poncest/MakeoverMonday/refs/heads/master/2025/Week_44/original_chart.png"

## Organization/Platform Links
org_primary <- "https://www.nationalww2museum.org/war/wwii-veteran-statistics"
org_secondary <- "https://www.nationalww2museum.org/war/wwii-veteran-statistics"

# Helper function to create markdown links
create_link <- function(text, url) {
  paste0("[", text, "](", url, ")")
}

# Helper function for citation-style links
create_citation_link <- function(text, url, title = NULL) {
  if (is.null(title)) {
    paste0("[", text, "](", url, ")")
  } else {
    paste0("[", text, "](", url, ' "', title, '")')
  }
}
```

### Original

The original visualization comes from the `r create_link("National WWII Museum (2025), WWII Veteran Statistics", data_secondary)`.

<!-- ![Original visualization](`r chart_original`) -->

![Original visualization](https://raw.githubusercontent.com/poncest/MakeoverMonday/refs/heads/master/2025/Week_44/original_chart.png)

### Makeover

![Four-panel dashboard showing the number of living WWII veterans by state in 2025. Panel A: diverging bar chart of veterans per 100k vs US mean, with New Hampshire highest at +29.1. Panel B: lollipop chart of the top 20 states, all above the national average of 13.7 per 100k. Panel C: histogram showing right-skewed distribution, with most states under 1,000 veterans and the top 5 states holding 37% of the total. Panel D: box plots by region showing that the Northeast has the highest concentration per capita, while the South has the most states but lower median rates.](mm_2025_44.png){#fig-1}

### <mark> **Steps to Create this Graphic** </mark>

#### 1. Load Packages & Setup

```{r}
#| label: load
#| warning: false
#| message: false      
#| results: "hide"     

## 1. LOAD PACKAGES & SETUP ----
suppressPackageStartupMessages({
  if (!require("pacman")) install.packages("pacman")
  pacman::p_load(
    tidyverse,   # Easily Install and Load the 'Tidyverse'
    janitor,     # Simple Tools for Examining and Cleaning Dirty Data
    skimr,       # Compact and Flexible Summaries of Data
    scales,      # Scale Functions for Visualization
    ggtext,      # Improved Text Rendering Support for 'ggplot2'
    showtext,    # Using Fonts More Easily in R Graphs
    glue,        # Interpreted String Literals
    patchwork,   # The Composer of Plots
    ggrepel,     # Automatically Position Non-Overlapping Text Labels
    tidycensus   # Load US Census Boundary and Attribute Data
  )
})

### |- figure size ----
camcorder::gg_record(
    dir    = here::here("temp_plots"),
    device = "png",
    width  = 14,
    height = 10,
    units  = "in",
    dpi    = 320
)

# Source utility functions
suppressMessages(source(here::here("R/utils/fonts.R")))
source(here::here("R/utils/social_icons.R"))
source(here::here("R/utils/image_utils.R"))
source(here::here("R/themes/base_theme.R"))
```

#### 2. Read in the Data

```{r}
#| label: read
#| include: true
#| eval: true
#| warning: false

ww2_veterans_raw <- read_csv(
  here::here("data/MakeoverMonday/2025/Living_WWII_Veterans_by_State_2025.csv")) |>
  clean_names()

### |-  Get state population data from US Census Bureau ----
# Source: US Census Bureau, Population Division
# Annual Estimates of the Resident Population: July 1, 2023
# Retrieved via tidycensus package
state_pop_census <- get_estimates(
  geography = "state",
  product = "population",
  vintage = 2023,
  year = 2023
  ) |>
  filter(variable == "POPESTIMATE") |>
  select(NAME, value) |>
  rename(state = NAME, population_2023 = value)
```

#### 3. Examine the Data

```{r}
#| label: examine
#| include: true
#| eval: true
#| results: 'hide'
#| warning: false

glimpse(ww2_veterans_raw)
skim_without_charts(ww2_veterans_raw) |> summary()
```

#### 4. Tidy Data

```{r}
#| label: tidy
#| warning: false

ww2_veterans_clean <- ww2_veterans_raw |>
  filter(state != "Island Areas & Foreign") |>
  left_join(state_pop_census, by = "state") |>
  mutate(
    veterans_per_100k = if_else(!is.na(population_2023),
      1e5 * living_wwii_veterans_2025 / population_2023,
      NA_real_
    ),
    region = case_when(
      state %in% c(
        "Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island",
        "Vermont", "New Jersey", "New York", "Pennsylvania"
      ) ~ "Northeast",
      state %in% c(
        "Illinois", "Indiana", "Iowa", "Kansas", "Michigan", "Minnesota", "Missouri",
        "Nebraska", "North Dakota", "Ohio", "South Dakota", "Wisconsin"
      ) ~ "Midwest",
      state %in% c(
        "Alabama", "Arkansas", "Delaware", "District of Columbia", "Florida", "Georgia",
        "Kentucky", "Louisiana", "Maryland", "Mississippi", "North Carolina", "Oklahoma",
        "South Carolina", "Tennessee", "Texas", "Virginia", "West Virginia"
      ) ~ "South",
      state %in% c(
        "Alaska", "Arizona", "California", "Colorado", "Hawaii", "Idaho", "Montana",
        "Nevada", "New Mexico", "Oregon", "Utah", "Washington", "Wyoming"
      ) ~ "West",
      state == "Puerto Rico" ~ "Territory",
      TRUE ~ "Other"
    )
  )

summary_stats <- ww2_veterans_clean |>
  summarise(
    total_veterans   = sum(living_wwii_veterans_2025, na.rm = TRUE),
    mean_per_state   = mean(living_wwii_veterans_2025, na.rm = TRUE),
    median_per_state = median(living_wwii_veterans_2025, na.rm = TRUE),
    mean_per_100k    = mean(veterans_per_100k, na.rm = TRUE)
  )

# Note: this is the unweighted mean of the state-level rates, not a pooled U.S. rate
national_mean_per_100k <- as.numeric(summary_stats$mean_per_100k)
national_median_count <- as.numeric(summary_stats$median_per_state)

### |-  panel 1 data ----
# A) Veterans per 100k vs US mean
panel_1_data <- ww2_veterans_clean |>
  filter(!is.na(veterans_per_100k)) |>
  mutate(
    diff_from_mean = veterans_per_100k - national_mean_per_100k,
    state = fct_reorder(state, diff_from_mean),
    above_avg = diff_from_mean > 0
  ) |>
  slice_max(order_by = abs(diff_from_mean), n = 20)

### |-  panel 2 data ----
# B) Top states by per-capita rate
panel_2_data <- ww2_veterans_clean |>
  filter(!is.na(veterans_per_100k)) |>
  arrange(desc(veterans_per_100k)) |>
  slice_head(n = 20) |>
  mutate(state = fct_reorder(state, veterans_per_100k))

### |-  panel 3 data ----
# C) Histogram + density with labels for states ≥ 3,000 veterans
big_states <- ww2_veterans_clean |>
  filter(living_wwii_veterans_2025 >= 3000) |>
  mutate(y_pos = 0.00003)

top5_share <- ww2_veterans_clean |>
  arrange(desc(living_wwii_veterans_2025)) |>
  slice_head(n = 5) |>
  summarise(share = sum(living_wwii_veterans_2025, na.rm = TRUE) /
    sum(ww2_veterans_clean$living_wwii_veterans_2025, na.rm = TRUE)) |>
  pull(share)

### |-  panel 4 data ----
# D) Box + jitter
panel_4_data <- ww2_veterans_clean |>
  filter(!is.na(veterans_per_100k)) |>
  group_by(region) |>
  mutate(
    regional_median = median(veterans_per_100k),
    n_states = n()
  ) |>
  ungroup() |>
  mutate(region = fct_reorder(region, regional_median, .desc = TRUE))

region_levels <- levels(panel_4_data$region)
n_map <- panel_4_data |>
  distinct(region, n_states) |>
  deframe()
label_map <- setNames(
  paste0(region_levels, "\n(n=", n_map[region_levels], ")"),
  region_levels
)
```

#### 5. Visualization Parameters

```{r}
#| label: params
#| include: true
#| warning: false

### |-  plot aesthetics ----
# Get base colors with custom palette
colors <- get_theme_colors(
  palette = list(
    below_avg = "#D97548",
    above_avg = "#4A7C8C",
    primary_accent = "#2B4C5E",
    secondary_accent = "#D97548",
    box_northeast = "#4A7C8C",
    box_west = "#8B9D57",
    box_south = "#D97548",
    box_midwest = "#6B8CAE",
    box_territory = "#999999",
    gray_dark = "#3D3D3D",
    gray_medium = "#9A9A9A",
    gray_light = "#E6E6E6"
  )
)   
 
### |-  titles and caption ----
title_text <- "Living WWII Veterans by State (2025)"

subtitle_text <- str_glue(
  "**{comma(summary_stats$total_veterans)}** veterans remain; ",
  "rates per 100k use U.S. Census 2023 population estimates"
)

# Create caption
caption_text <- create_mm_caption(
  mm_year = current_year,
  mm_week = current_week,
  source_text = "(1) National WWII Museum (2025), (2) U.S. Census Bureau (2023 Population Estimates via tidycensus, retrieved 2025-11-11). "
)

### |-  fonts ----
setup_fonts()
fonts <- get_font_families()

### |-  plot theme ----

# Start with base theme
base_theme <- create_base_theme(colors)

# Add weekly-specific theme elements
weekly_theme <- extend_weekly_theme(
  base_theme,
  theme(
    # Text styling
    plot.title = element_text(
      size = rel(1.6), family = fonts$title, face = "bold",
      color = colors$title, lineheight = 1.1, hjust = 0,
      margin = margin(t = 5, b = 10)
    ),
    plot.subtitle = element_markdown(
      size = rel(0.95), family = fonts$subtitle, face = "italic",
      color = alpha(colors$subtitle, 0.9), lineheight = 1.1,
      margin = margin(t = 0, b = 20)
    ),
    
    # Legend formatting
    legend.position = "none",
    legend.justification = "top",
    legend.margin = margin(l = 12, b = 5),
    legend.key.size = unit(0.8, "cm"),
    legend.box.margin = margin(b = 10),
    legend.title = element_text(face = "bold"),
    
    # Axis formatting
    axis.ticks.y = element_blank(),
    axis.ticks.x = element_line(color = "gray", linewidth = 0.5),
    axis.title.x = element_text(
      face = "bold", size = rel(0.85),
      margin = margin(t = 10)
    ),
    axis.title.y = element_text(
      face = "bold", size = rel(0.85),
      margin = margin(r = 10)
    ),
    axis.text.x = element_text(
      size = rel(0.85), family = fonts$subtitle,
      color = colors$text
    ),
    axis.text.y = element_text(
      size = rel(0.85), family = fonts$subtitle,
      color = colors$text
    ),
    
    # Grid lines
    panel.grid.minor = element_line(color = "#ecf0f1", linewidth = 0.2),
    panel.grid.major = element_line(color = "#ecf0f1", linewidth = 0.4),
    
    # Margin
    plot.margin = margin(20, 20, 20, 20)
  )
)

# Set theme
theme_set(weekly_theme)
```
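
The subtitle string carries Markdown (`**...**`), which renders as bold only because `plot.subtitle` is set to `element_markdown()` from ggtext rather than `element_text()`. The short sketch below shows what the glued string looks like before ggtext renders it. It hard-codes the total quoted in the post's subtitle instead of reading `summary_stats`, and the names `total_veterans` and `subtitle_demo` are placeholders for illustration.

```r
library(stringr)
library(scales)

# Total quoted in the post's subtitle, hard-coded for this illustration
total_veterans <- 45340

# str_glue() concatenates its arguments and evaluates the {} expression;
# comma() adds the thousands separator
subtitle_demo <- str_glue(
  "**{comma(total_veterans)}** veterans remain; ",
  "rates per 100k use U.S. Census 2023 population estimates"
)

print(subtitle_demo)
```

The printed value still contains the literal asterisks; they become bold text only once the string is drawn through `element_markdown()`.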

#### 6. Plot

```{r}
#| label: plot
#| warning: false

### |-  panel 1 plot ----
# A) Veterans per 100k vs US mean
panel_1 <- ggplot(panel_1_data, aes(diff_from_mean, state, fill = above_avg)) +
  # Geoms
  geom_col(width = 0.68) +
  geom_vline(xintercept = 0, linewidth = 0.8, color = colors$palette$gray_dark) +
  geom_text(aes(label = number(diff_from_mean, accuracy = 0.1)),
    hjust = ifelse(panel_1_data$diff_from_mean > 0, -0.15, 1.15),
    size = 2.8, color = colors$palette$gray_dark
  ) +
  # Scales
  scale_fill_manual(values = c(`TRUE` = colors$palette$above_avg, `FALSE` = colors$palette$below_avg)) +
  scale_x_continuous(
    labels = label_number(accuracy = 1),
    expand = expansion(mult = c(0.1, 0.12))
  ) +
  # Labs
  labs(
    title = "A. Veterans per 100k vs US Mean",
    subtitle = "Higher values indicate greater veteran concentration (†)",
    x = "Difference from US mean", y = NULL
  ) +
  # Theme
  theme(
    plot.margin = margin(5, 5, 5, 5),
    panel.grid.major.y = element_blank(),
    axis.text.y = element_text(margin = margin(r = 4))
  )

### |-  panel 2 plot ----
# B) Top states by per-capita rate
panel_2 <-
  ggplot(panel_2_data, aes(veterans_per_100k, state)) +
  # Geoms
  geom_segment(aes(x = 0, xend = veterans_per_100k, yend = state),
    color = colors$palette$gray_light, linewidth = 1.1
  ) +
  geom_point(size = 3.2, color = colors$palette$primary_accent) +
  geom_vline(
    xintercept = national_mean_per_100k, linetype = "dashed",
    color = colors$palette$gray_dark, linewidth = 0.6
  ) +
  geom_text(aes(label = number(veterans_per_100k, accuracy = 0.1)),
    nudge_x = 1.0, hjust = 0, size = 2.8, color = colors$palette$gray_dark
  ) +
  # Scales
  scale_x_continuous(
    labels = label_number(accuracy = 1),
    expand = expansion(mult = c(0.01, 0.15))
  ) +
  # Labs
  labs(
    title = "B. Top States by Veterans per 100k",
    subtitle = glue(
      "Each exceeds the national average of {number(national_mean_per_100k, accuracy = 0.1)} per 100k (†)"
    ),
    x = "Veterans per 100k residents", y = NULL
  ) +
  # Theme
  theme(
    plot.margin = margin(5, 5, 5, 5),
    panel.grid.major.y = element_blank(),
    axis.text.y = element_text(margin = margin(r = 4))
  )

### |-  panel 3 plot ----
# C) Histogram + density with labels for states ≥ 3,000 veterans
panel_3 <-
  ggplot(ww2_veterans_clean, aes(living_wwii_veterans_2025)) +
  # Geoms
  geom_histogram(aes(y = after_stat(density)),
    binwidth = 250, boundary = 0,
    fill = colors$palette$primary_accent, alpha = 0.72,
    color = "white", linewidth = 0.3
  ) +
  geom_density(color = colors$palette$secondary_accent, linewidth = 1.0) +
  geom_vline(
    xintercept = national_median_count, linetype = "dotted",
    color = colors$palette$gray_dark, linewidth = 0.6
  ) +
  geom_label_repel(
    data = big_states,
    aes(x = living_wwii_veterans_2025, y = y_pos, label = state),
    seed = 2025, size = 3, label.size = 0.25,
    fill = alpha("white", 0.9), color = colors$palette$gray_dark,
    direction = "y", nudge_y = 0.00002, min.segment.length = 0,
    box.padding = 0.25
  ) +
  # Annotate
  annotate("label",
    x = national_median_count, y = 0.0009, label = "State median",
    size = 2.7, label.size = 0, fill = alpha("white", 0.92),
    color = colors$palette$gray_dark
  ) +
  annotate("label",
    x = Inf, y = Inf, hjust = 1.02, vjust = 1.2,
    label = paste0("Top 5 hold ", percent(top5_share, accuracy = 1)),
    size = 3, label.size = 0, fill = alpha("white", 0.9),
    color = colors$palette$gray_dark
  ) +
  # Scales
  scale_x_continuous(
    labels = label_comma(),
    expand = expansion(mult = c(0.02, 0.02))
  ) +
  scale_y_continuous(labels = label_number(accuracy = 0.0001)) +
  # Labs
  labs(
    title = "C. Distribution of Veterans Across States",
    subtitle = glue(
      "Most states have fewer than 1,000; top 5 hold {percent(top5_share, accuracy = 1)} of all veterans"
    ),
    x = "Living WWII veterans (count)", y = "Density"
  ) +
  # Theme
  theme(plot.margin = margin(5, 5, 5, 5))

### |-  panel 4 plot ----
# D) Box + jitter
panel_4 <-
  ggplot(panel_4_data, aes(region, veterans_per_100k, color = region)) +
  # Geoms
  geom_boxplot(width = 0.58, alpha = 0.28, linewidth = 0.8, outlier.shape = NA) +
  geom_jitter(width = 0.15, size = 2.2, alpha = 0.75) +
  geom_hline(
    yintercept = national_mean_per_100k, linetype = "dashed",
    color = colors$palette$gray_dark, linewidth = 0.6
  ) +
  geom_label_repel(
    data = filter(panel_4_data, state == "New Hampshire"),
    aes(label = state),
    seed = 2025, size = 3, label.size = 0.25,
    fill = alpha("white", 0.9), color = colors$palette$gray_dark,
    nudge_y = 2, nudge_x = 0.2
  ) +
  # Scales
  scale_color_manual(values = c(
    "Northeast" = colors$palette$box_northeast,
    "West" = colors$palette$box_west,
    "South" = colors$palette$box_south,
    "Midwest" = colors$palette$box_midwest,
    "Territory" = colors$palette$box_territory
  ), guide = "none") +
  scale_x_discrete(limits = region_levels, labels = label_map[region_levels]) +
  coord_cartesian(ylim = c(0, NA)) +
  # Labs
  labs(
    title = "D. Regional Distribution of Veterans per 100k",
    subtitle = "Northeast has highest rates; South has many states but lower median (†)",
    x = "Region", y = "Veterans per 100k residents"
  ) +
  # Theme
  theme(
    plot.margin = margin(5, 5, 5, 5),
    axis.text.x = element_text(lineheight = 1.1)
  )

### |-  combined plot ----
combined_plots <- (panel_1 | panel_2) / (panel_3 | panel_4) +
  plot_layout(heights = c(1.2, 0.8), widths = c(1, 1))

combined_plots <- combined_plots +
  plot_annotation(
    title = title_text,
    subtitle = subtitle_text,
    caption = caption_text,
    # tag_levels = "A",
    theme = theme(
      plot.title = element_text(
        size = rel(2.4),
        family = fonts$title,
        face = "bold",
        color = colors$title,
        lineheight = 1.1,
        margin = margin(t = 5, b = 5)
      ),
      plot.subtitle = element_markdown(
        size = rel(1.2),
        family = fonts$subtitle,
        color = alpha(colors$subtitle, 0.9),
        lineheight = 1.2,
        margin = margin(t = 5, b = 15)
      ),
      plot.caption = element_markdown(
        size = rel(0.65),
        family = fonts$caption,
        color = colors$caption,
        hjust = 0,
        margin = margin(t = 10)
      )
    )
  )
```
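
The panels above depend on a few values computed earlier in the post (`veterans_per_100k`, `diff_from_mean`, `above_avg`, `top5_share`). As a rough illustration of how values like these can be derived, here is a minimal base-R sketch on a made-up six-row table. The data, the `veterans` and `population` column names, the `top_n_share()` helper, and the choice of an unweighted mean of state rates as the reference line are all assumptions for illustration, not the post's actual tidy step.

```r
# Toy data: every value here is invented for illustration only
toy <- data.frame(
  state      = c("A", "B", "C", "D", "E", "F"),
  veterans   = c(4000, 2500, 1500, 1000, 600, 400),
  population = c(40e6, 10e6, 5e6, 2e6, 3e6, 1e6)
)

# Population-adjusted rate: veterans per 100,000 residents
toy$veterans_per_100k <- toy$veterans / toy$population * 1e5

# Assumption: the reference line is the unweighted mean of the state rates
mean_per_100k <- mean(toy$veterans_per_100k)

# Signed gap for a diverging bar chart, plus the flag used for the fill
toy$diff_from_mean <- toy$veterans_per_100k - mean_per_100k
toy$above_avg <- toy$diff_from_mean > 0

# Hypothetical helper: share of the total held by the n largest counts
top_n_share <- function(x, n = 5) {
  largest <- sort(x, decreasing = TRUE)[seq_len(min(n, length(x)))]
  sum(largest) / sum(x)
}

top3_share <- top_n_share(toy$veterans, n = 3)
```

Note how state A has the largest raw count but the lowest rate, which is the contrast between counts and per-capita rates that the four panels are built around.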

#### 7. Save

```{r}
#| label: save
#| warning: false

### |-  plot image ----  
save_plot_patchwork(
  plot = combined_plots, 
  type = "makeovermonday", 
  year = current_year,
  week = current_week,
  width = 14, 
  height = 10
)
```
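
`save_plot_patchwork()` is one of the custom helpers, so the chunk above will not run without that module. A minimal stand-in, assuming only ggplot2 and patchwork are installed, is `ggplot2::ggsave()`, which accepts a patchwork object like any other plot. The placeholder panels, the output path, and the `dpi` value below are assumptions; the helper's real naming convention and defaults are not shown in this post.

```r
library(ggplot2)
library(patchwork)

# Two throwaway panels standing in for the four real ones
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
demo_plot <- (p1 | p2) + plot_annotation(title = "Demo layout")

# Hypothetical output path; the real helper builds its own file name
out_file <- file.path(tempdir(), "demo_patchwork.png")

ggsave(
  filename = out_file,
  plot     = demo_plot,
  width    = 14,   # same dimensions as the call above, in inches
  height   = 10,
  units    = "in",
  dpi      = 300   # assumed resolution
)
```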

#### 8. Session Info

::: {.callout-tip collapse="true"}
##### Expand for Session Info

```{r, echo = FALSE}
#| eval: true
#| warning: false

sessionInfo()
```
:::

#### 9. GitHub Repository

::: {.callout-tip collapse="true"}
##### Expand for GitHub Repo

The complete code for this analysis is available in `r create_link(project_file, repo_file)`.

For the full repository, `r create_link("click here", repo_main)`.
:::

#### 10. References

::: {.callout-tip collapse="true"}
##### Expand for References

1.  Data:
    -   Makeover Monday `r current_year` Week `r current_week`: `r create_link("Living WWII Veterans by State 2025", data_main)`

2.  Article:
    -   `r create_link("Living WWII Veterans by State 2025", data_secondary)`

3.  Citation:
    -   National WWII Museum. (2025). *WWII Veteran Statistics*. Retrieved from https://www.nationalww2museum.org/war/wwii-veteran-statistics
    -   U.S. Census Bureau. (2023). *Annual Estimates of the Resident Population*. Retrieved via tidycensus package.
:::

#### 11. Custom Functions Documentation

::: {.callout-note collapse="true"}
##### 📦 Custom Helper Functions

This analysis uses custom functions from my personal module library for efficiency and consistency across projects.

**Functions Used:**

-   **`fonts.R`**: `setup_fonts()`, `get_font_families()` - Font management with showtext
-   **`social_icons.R`**: `create_social_caption()` - Generates formatted social media captions
-   **`image_utils.R`**: `save_plot()` - Consistent plot saving with naming conventions
-   **`base_theme.R`**: `create_base_theme()`, `extend_weekly_theme()`, `get_theme_colors()` - Custom ggplot2 themes

**Why custom functions?**\
These utilities standardize theming, fonts, and output across all my data visualizations. The core analysis (data tidying and visualization logic) relies only on the tidyverse and other publicly available packages such as patchwork, ggrepel, ggtext, and scales.

**Source Code:**\
View all custom functions → [GitHub: R/utils](https://github.com/poncest/personal-website/tree/master/R)
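
For readers who would rather not open the repository, the general pattern behind the theme helpers can be approximated with plain ggplot2: build a base theme once, then layer per-project overrides on top with `+`. The functions `sketch_base_theme()` and `sketch_extend_theme()` below are hypothetical stand-ins that illustrate the pattern only; they are not the actual implementations of `create_base_theme()` or `extend_weekly_theme()`.

```r
library(ggplot2)

# Hypothetical stand-in: a base theme driven by a list of colors
sketch_base_theme <- function(colors) {
  theme_minimal(base_size = 12) +
    theme(
      plot.title = element_text(face = "bold", color = colors$title),
      axis.text  = element_text(color = colors$text)
    )
}

# Hypothetical stand-in: layer project-specific overrides on the base theme
sketch_extend_theme <- function(base_theme, overrides) {
  base_theme + overrides
}

# Assumed color list; the hex values are borrowed from the palette above
demo_colors <- list(title = "#2B4C5E", text = "#3D3D3D")

demo_theme <- sketch_extend_theme(
  sketch_base_theme(demo_colors),
  theme(panel.grid.minor = element_blank())
)

# Apply to a single plot, or globally with theme_set(demo_theme)
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  demo_theme
```
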
:::

© 2024 Steven Ponce
