Traffic Volume Distribution Analysis Across California Counties

Empirical Cumulative Distribution Function (ECDF) of Annual Average Daily Traffic

Categories: 30DayChartChallenge, Data Visualization, R Programming, 2025
Analyzing traffic volume distributions across California counties using Empirical Cumulative Distribution Functions (ECDF). This visualization reveals striking differences between urban and rural counties, providing insights into traffic patterns throughout the state highway network.
Author: Steven Ponce
Published: April 12, 2025

Figure 1: A line graph showing Empirical Cumulative Distribution Functions (ECDF) of Annual Average Daily Traffic across ten California counties. The x-axis shows traffic volume from 100 to over 100,000 vehicles on a logarithmic scale. Horizontal dashed lines mark the 25th, 50th, and 75th percentiles. Urban counties like Los Angeles (LA) and Orange (ORA) show curves shifted right, indicating higher traffic volumes. In contrast, rural counties like Tulare (TUL) show curves shifted left, indicating lower traffic volumes.
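
To make the reading of the figure concrete, here is a minimal base R sketch of what an ECDF computes. The two vectors are invented values, not taken from the dataset, and the names `rural` and `urban` are illustrative only.

```r
# Made-up AADT values for illustration only (not from the California data)
rural <- c(200, 450, 900, 1500, 3000)
urban <- c(8000, 20000, 45000, 90000, 150000)

# ecdf() returns a step function F(x) = share of observations <= x
F_rural <- ecdf(rural)
F_urban <- ecdf(urban)

# Share of observations at or below 10,000 vehicles per day
share_rural <- F_rural(10000)  # all five rural values are <= 10,000
share_urban <- F_urban(10000)  # only one of the five urban values is

# The median is the volume at which each curve crosses 0.5
median_rural <- quantile(rural, probs = 0.5, names = FALSE)
median_urban <- quantile(urban, probs = 0.5, names = FALSE)
```

A curve that sits further to the right reaches any given cumulative probability at a higher traffic volume, which is the pattern the figure shows for the urban counties.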

Steps to Create this Graphic

1. Load Packages & Setup

## 1. LOAD PACKAGES & SETUP ----
suppressPackageStartupMessages({
pacman::p_load(
  tidyverse,      # Easily Install and Load the 'Tidyverse'
  ggtext,         # Improved Text Rendering Support for 'ggplot2'
  showtext,       # Using Fonts More Easily in R Graphs
  janitor,        # Simple Tools for Examining and Cleaning Dirty Data
  skimr,          # Compact and Flexible Summaries of Data
  scales,         # Scale Functions for Visualization
  paletteer,      # Comprehensive Collection of Color Palettes
  lubridate,      # Make Dealing with Dates a Little Easier
  camcorder       # Record Your Plot History
  )
})

### |- figure size ----
gg_record(
    dir    = here::here("temp_plots"),
    device = "png",
    width  = 8,
    height = 8,
    units  = "in",
    dpi    = 320
)

# Source utility functions
suppressMessages(source(here::here("R/utils/fonts.R")))
source(here::here("R/utils/social_icons.R"))
source(here::here("R/utils/image_utils.R"))
source(here::here("R/themes/base_theme.R"))

2. Read in the Data

traffic_volumnes_raw <- read_csv(here::here(
  'data/30DayChartChallenge/2025/Traffic_Volumes_AADT.csv')
  ) |>
    clean_names()

3. Examine the Data

glimpse(traffic_volumnes_raw)
skim(traffic_volumnes_raw)

4. Tidy Data

### |- Tidy ----
traffic_volumes_tidy <- traffic_volumnes_raw |>        
  select(objectid, district, route, county, location_description, 
         back_aadt, ahead_aadt) |>
  pivot_longer(
    cols = c(back_aadt, ahead_aadt),
    names_to = "direction",
    values_to = "aadt"
  ) |>
  mutate(
    direction = case_when(
      direction == "back_aadt" ~ "Back",
      direction == "ahead_aadt" ~ "Ahead",
      TRUE ~ direction
    )
  ) |>
  filter(!is.na(aadt))

# County-level summaries
county_traffic <- traffic_volumes_tidy |>
  group_by(county) |>
  summarize(
    count = n(),
    median_aadt = median(aadt, na.rm = TRUE)
  ) |>
  arrange(desc(count)) |>
  slice_head(n = 10)  # Keep the 10 counties with the most observations

# Counties to include in the ECDF plot
top_counties <- county_traffic$county

# Data for the plot
ecdf_data <- traffic_volumes_tidy |>
  filter(county %in% top_counties)
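
Because the plot in step 6 uses a logarithmic x-axis, an AADT of zero has no finite position on it (`log10(0)` is `-Inf`). Whether the real data contains such values is not stated here, so the following is only a sketch of a check one might run first. The `toy` data frame and its values are invented for illustration.

```r
# Toy stand-in for the tidy data; all values are invented
toy <- data.frame(
  county = c("LA", "LA", "TUL", "TUL", "ORA"),
  aadt   = c(120000, 0, 850, 2300, 64000)
)

# Count observations that a log10 axis cannot place at a finite position
n_nonpositive <- sum(toy$aadt <= 0)

# Keep only strictly positive volumes before plotting on a log scale
toy_positive <- toy[toy$aadt > 0, ]

# log10() of the remaining values is finite
all_finite <- all(is.finite(log10(toy_positive$aadt)))
```

If `n_nonpositive` is zero on the real data, no filtering is needed and the pipeline above can stay as it is.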

5. Visualization Parameters

### |- plot aesthetics ---- 
colors <- get_theme_colors(
  palette = paletteer::paletteer_d(
    "ggprism::prism_dark2"
    )
  )

### |-  titles and caption ----
# text
title_text    <- str_wrap("Traffic Volume Distribution Analysis Across California Counties",
                          width = 50) 
subtitle_text <- str_wrap("Empirical Cumulative Distribution Function (ECDF) of Annual Average Daily Traffic", 
                          width = 90)

# Create caption
caption_text <- create_dcc_caption(
  dcc_year = 2025,
  dcc_day = 12,
  source_text =  "California AADT via data.gov" 
)

### |-  fonts ----
setup_fonts()
fonts <- get_font_families()

### |-  plot theme ----

# Start with base theme
base_theme <- create_base_theme(colors)

# Add weekly-specific theme elements
weekly_theme <- extend_weekly_theme(
  base_theme,
  theme(
    # Text styling 
    plot.title = element_text(face = "bold", family = fonts$title, size = rel(1.14), margin = margin(b = 10)),
    plot.subtitle = element_text(family = fonts$subtitle, color = colors$text, size = rel(0.78), margin = margin(b = 20)),
    
    # Axis elements
    axis.title = element_text(color = colors$text, size = rel(0.8)),
    axis.text = element_text(color = colors$text, size = rel(0.7)),
    axis.text.y = element_text(color = colors$text, size = rel(0.68)),
    
    axis.line.x = element_line(color = "gray50", linewidth = .2),

    # Grid elements
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),
 
    # Legend elements
    legend.position = "right",
    legend.title = element_text(family = fonts$text, size = rel(0.8)),
    legend.text = element_text(family = fonts$text, size = rel(0.7)),

    # Plot margins 
    plot.margin = margin(t = 10, r = 20, b = 10, l = 20)
  )
)

# Set theme
theme_set(weekly_theme)

6. Plot

### |-  Plot ----
p <- ecdf_data |>
  ggplot(aes(x = aadt, color = county)) +
  # Geoms
  stat_ecdf(geom = "step", linewidth = 1) +
  geom_hline(
    yintercept = c(0.25, 0.5, 0.75), linetype = "dashed", 
    color = "gray50", alpha = 0.7
    ) +
  # Annotate
  annotate(
    "text", x = min(traffic_volumes_tidy$aadt, na.rm = TRUE), 
    y = c(0.26, 0.51, 0.76), 
    label = c("25th", "50th", "75th"), 
    hjust = 0, size = 3, color = "gray30"
    ) +
  # Scales
  scale_x_log10(labels = scales::comma) +
  scale_color_manual(values = colors$palette) +
  # Labs
  labs(
    title = title_text,
    subtitle = subtitle_text,
    caption = caption_text,
    x = "Annual Average Daily Traffic (log scale)",
    y = "Cumulative Probability",
    color = "County"
  ) +
  # Theme
  theme(
    plot.title = element_text(
      size = rel(1.9),
      family = fonts$title,
      face = "bold",
      color = colors$title,
      margin = margin(t = 5, b = 5)
    ),
    plot.subtitle = element_text(
      size = rel(.85),
      family = fonts$subtitle,
      color = colors$subtitle,
      lineheight = 1.2,
      margin = margin(t = 5, b = 10)
    ),
    plot.caption = element_markdown(
      size = rel(0.6),
      family = fonts$caption,
      color = colors$caption,
      lineheight = 0.65,
      hjust = 0.5,
      halign = 0.5,
      margin = margin(t = 5, b = 15)
    )
  )
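
The dashed lines mark cumulative probabilities of 0.25, 0.5, and 0.75. The traffic volume at which a county's curve crosses each line is that county's quartile, and it can be computed directly. The sketch below uses base R and invented values. `type = 1` is chosen here because it is the inverse of the empirical distribution function, so it matches the step curve rather than interpolating as the default `type = 7` does.

```r
# Invented values for two counties; not taken from the dataset
toy <- data.frame(
  county = rep(c("LA", "TUL"), each = 5),
  aadt   = c(10000, 20000, 40000, 80000, 160000,
             500, 1000, 2000, 4000, 8000)
)

# One column per county, one row per percentile
quartiles <- sapply(
  split(toy$aadt, toy$county),
  quantile,
  probs = c(0.25, 0.5, 0.75),
  type = 1
)

quartiles
```

Reading across a row of `quartiles` gives the horizontal gap between the curves at that dashed line.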

7. Save

### |-  plot image ----  

save_plot(
  p, 
  type = "30daychartchallenge", 
  year = 2025, 
  day = 12, 
  width = 8, 
  height = 8
  )

8. Session Info

R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] here_1.0.1      camcorder_0.1.0 paletteer_1.6.0 scales_1.3.0   
 [5] skimr_2.1.5     janitor_2.2.0   showtext_0.9-7  showtextdb_3.0 
 [9] sysfonts_0.8.9  ggtext_0.1.2    lubridate_1.9.3 forcats_1.0.0  
[13] stringr_1.5.1   dplyr_1.1.4     purrr_1.0.2     readr_2.1.5    
[17] tidyr_1.3.1     tibble_3.2.1    ggplot2_3.5.1   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] gtable_0.3.6      xfun_0.49         htmlwidgets_1.6.4 tzdb_0.4.0       
 [5] vctrs_0.6.5       tools_4.4.0       generics_0.1.3    curl_6.0.0       
 [9] parallel_4.4.0    gifski_1.32.0-1   fansi_1.0.6       pacman_0.5.1     
[13] pkgconfig_2.0.3   lifecycle_1.0.4   farver_2.1.2      compiler_4.4.0   
[17] textshaping_0.4.0 munsell_0.5.1     repr_1.1.7        codetools_0.2-20 
[21] snakecase_0.11.1  htmltools_0.5.8.1 yaml_2.3.10       crayon_1.5.3     
[25] pillar_1.9.0      magick_2.8.5      commonmark_1.9.2  tidyselect_1.2.1 
[29] digest_0.6.37     stringi_1.8.4     rematch2_2.1.2    labeling_0.4.3   
[33] rsvg_2.6.1        rprojroot_2.0.4   fastmap_1.2.0     grid_4.4.0       
[37] colorspace_2.1-1  cli_3.6.3         magrittr_2.0.3    base64enc_0.1-3  
[41] utf8_1.2.4        withr_3.0.2       bit64_4.5.2       timechange_0.3.0 
[45] rmarkdown_2.29    bit_4.5.0         ragg_1.3.3        hms_1.1.3        
[49] evaluate_1.0.1    knitr_1.49        markdown_1.13     rlang_1.1.4      
[53] gridtext_0.1.5    Rcpp_1.0.13-1     glue_1.8.0        xml2_1.3.6       
[57] renv_1.0.3        vroom_1.6.5       svglite_2.1.3     rstudioapi_0.17.1
[61] jsonlite_1.8.9    R6_2.5.1          prismatic_1.1.2   systemfonts_1.1.0

9. GitHub Repository

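The complete code for this analysis is available in 30dcc_2025_12.qmd (https://github.com/poncest/personal-website/blob/master/data_visualizations/TidyTuesday/2025/30dcc_2025_12.qmd).

The full repository is at https://github.com/poncest/personal-website/.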

10. References

  1. Data Sources:
    • California Annual Average Daily Traffic Volumes, metadata updated November 27, 2024, data.gov (https://catalog.data.gov/dataset/traffic-volumes-aadt-ee8d6)