dplyr

Search Shortcut cmd + k | ctrl + k

Documentation

dplyr

Downloads 494this week

GitHub stars 5

Extension repository on GitHub

Extension descriptor (YAML)

R dplyr pipeline syntax support for DuckDB - transpiles dplyr verbs to SQL

Maintainer(s): mrchypark

Installing and Loading

INSTALL dplyr FROM community;
LOAD dplyr;

Example

-- Transform R dplyr pipeline syntax directly in DuckDB
-- First, create a sample table
CREATE TABLE iris AS SELECT * FROM (VALUES
  (5.1, 3.5, 1.4, 0.2, 'setosa'),
  (4.9, 3.0, 1.4, 0.2, 'setosa'),
  (7.0, 3.2, 4.7, 1.4, 'versicolor'),
  (6.4, 3.2, 4.5, 1.5, 'versicolor'),
  (6.3, 3.3, 6.0, 2.5, 'virginica'),
  (5.8, 2.7, 5.1, 1.9, 'virginica')
) AS t(sepal_length, sepal_width, petal_length, petal_width, species);

-- Run a pipeline directly as a statement (parser extension)
iris %>%
  filter(sepal_length > 5) %>%
  select(species, sepal_length, petal_length) %>%
  arrange(desc(sepal_length));
┌────────────┬──────────────┬──────────────┐
│  species   │ sepal_length │ petal_length │
├────────────┼──────────────┼──────────────┤
│ versicolor │          7.0 │          4.7 │
│ versicolor │          6.4 │          4.5 │
│ virginica  │          6.3 │          6.0 │
│ virginica  │          5.8 │          5.1 │
│ setosa     │          5.1 │          1.4 │
└────────────┴──────────────┴──────────────┘

-- Group and summarize with dplyr syntax
iris %>%
  group_by(species) %>%
  summarise(avg_sepal = mean(sepal_length), count = n());
  ┌───────────┬───────┐
  │ avg_sepal │ count │
  │  double   │ int64 │
  ├───────────┼───────┤
  │      6.05 │     2 │
  │       5.0 │     2 │
  │       6.7 │     2 │
  └───────────┴───────┘

-- Create derived columns with mutate
iris %>%
  mutate(sepal_ratio = sepal_length / sepal_width) %>%
  select(species, sepal_ratio) %>%
  arrange(sepal_ratio);

-- Embed a pipeline inside normal SQL using (| ... |)
SELECT species, COUNT(*) AS n
FROM (| iris %>% filter(sepal_length > 5) %>% select(species) |)
GROUP BY species
ORDER BY n DESC;

dplyr is a DuckDB extension that brings R's popular dplyr package syntax to DuckDB. It transpiles dplyr pipeline syntax (using the %>% pipe operator) directly to SQL, allowing R users to write familiar data manipulation code in DuckDB.

Supported Functions

Core Verbs

Helper Functions

Aggregation: mean, sum, min, max, n, count, median, mode
Window: row_number, rank, lead, lag, ntile
Math: abs, sqrt, round, floor, log, exp
String: tolower, toupper, substr, trimws
Logic: ifelse, is.na, coalesce

Key Features:

Native R dplyr syntax support with %>% pipe operator
Transpiles to optimized SQL at runtime
Works with any DuckDB table (pipelines must start with a table name)
Error messages include helpful diagnostics
Embed pipelines in normal SQL using (| ... |) (e.g., SELECT * FROM (| iris %>% filter(x > 0) |);)

Usage:

-- Using the dplyr() table function
SELECT * FROM dplyr('table_name %>% filter(x > 5) %>% select(a, b)');

-- Or write a pipeline directly as a statement (parser extension)
table_name %>% filter(x > 5) %>% select(a, b);

For more information, visit the GitHub repository.

Added Functions

function_name	function_type	description	comment	examples
dplyr	table	NULL	NULL