Compare the distribution of a target variable against another variable. Numerical variables are automatically split into quantiles. Customizable and tidyverse friendly.
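The quantile split mentioned above can be illustrated with base R. This is a minimal sketch, not the function's actual implementation: the data and the names `x`, `n_bins`, `cuts`, and `bins` are hypothetical, and `n_bins` only plays the role that the `breaks` argument is described as having.

```r
# Hypothetical data: 20 skewed numeric values (not taken from the package).
set.seed(1)
x <- round(rexp(20, rate = 0.1), 1)

# Assumed number of bins; stands in for the `breaks` argument.
n_bins <- 4

# Quantile cut points; unique() guards against repeated quantiles in tied data.
cuts <- unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1), na.rm = TRUE))

# Ordered factor with one level per quantile interval.
bins <- cut(x, breaks = cuts, include.lowest = TRUE, ordered_result = TRUE)

# Number of observations per interval.
table(bins)
```

Because the cut points come from quantiles rather than equal-width steps, each interval holds roughly the same number of observations, which is why the interval widths can differ a lot on skewed data.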


  type = 1,
  ref = TRUE,
  note = NA,
  top = 10,
  breaks = 10,
  na.rm = FALSE,
  force = "none",
  trim = 0,
  clean = FALSE,
  abc = FALSE,
  custom_colours = FALSE,
  plot = TRUE,
  chords = FALSE,
  save = FALSE,
  subdir = NA





...: Variables. Main (target variable) and secondary (values variable) to group by (if needed).


type: Integer. 1 for both plots, 2 for counter plot only, 3 for percentages plot only.


ref: Boolean. Show a reference line when the target has exactly two levels? Useful when the data is unbalanced (not 50/50).


note: Character. Caption for the plot.


top: Integer. Keep and plot only the n most frequent categories for categorical values.


breaks: Integer. Number of splits for numerical values.


na.rm: Boolean. Ignore NAs?


force: Character. Force a class on the values data. Choose between 'none', 'character', 'numeric', 'date'.


trim: Integer. Trim labels to the first n characters for categorical values (applies to both target and values).


clean: Boolean. Use cleanText() for categorical values (applies to both target and values).


abc: Boolean. Sort in alphabetical order?


custom_colours: Boolean. Use custom colours function?


plot: Boolean. Return a plot? If FALSE, a table with results is returned.


chords: Boolean. Use a chords plot?


save: Boolean. Save the output plot in the working directory?


subdir: Character. Subdirectory in which to save the plot.
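To make the effect of the top and trim arguments concrete, here is a rough base R sketch of the two operations as they are described above. It is an illustration under assumptions, not the package's code: the data and the names `values`, `top_n_levels`, and `trim_to` are hypothetical, and it assumes less frequent categories are simply dropped.

```r
# Hypothetical categorical values (not taken from the package).
values <- c("Southampton", "Cherbourg", "Southampton", "Queenstown",
            "Southampton", "Cherbourg", "Unknown")

top_n_levels <- 2  # stands in for `top`
trim_to <- 5       # stands in for `trim`

# Rank categories by frequency and keep the most frequent ones.
freq <- sort(table(values), decreasing = TRUE)
keep <- names(freq)[seq_len(min(top_n_levels, length(freq)))]

# Assumption: observations in other categories are filtered out.
filtered <- values[values %in% keep]

# Shorten each label to its first `trim_to` characters.
trimmed <- substr(filtered, 1, trim_to)

table(trimmed)
```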


When plot = TRUE, a plot combining two panels: counts grouped by cuts, and proportions grouped by the same cuts. When plot = FALSE, a data.frame with counts, percentages, and cumulative percentages. When type is set to 2 or 3, only the corresponding single plot is returned.
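The table described above, with counts, percentages, and cumulative percentages, can be approximated in base R. This sketch rests on two assumptions that the text does not confirm: that p is the share of each target level within its value group, and that pcum is the running total of p within that group. The data frame `df` and the object `res` are hypothetical.

```r
# Hypothetical data (not taken from the package): a binary target and a binned value.
df <- data.frame(
  targets = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE),
  value   = c("low", "low", "low", "high", "high", "low", "high", "high")
)

# Count each (value, targets) combination.
res <- as.data.frame(table(value = df$value, targets = df$targets),
                     responseName = "n", stringsAsFactors = FALSE)

# Most frequent combinations first.
res <- res[order(-res$n), ]

# Assumed meaning of p: percentage of each target level within its value group.
res$p <- round(100 * res$n / ave(res$n, res$value, FUN = sum), 1)

# Assumed meaning of pcum: running total of p within each value group.
res$pcum <- ave(res$p, res$value, FUN = cumsum)

res
```

Under these assumptions, the first row of each value group has pcum equal to p, and the last row of each group reaches 100.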

Sys.unsetenv("LARES_FONT") # Temporary
data(dft) # Titanic dataset

# Relation for categorical/categorical values
distr(dft, Survived, Sex)
#> Warning: All aesthetics have length 1, but the data has 4 rows.
#>  Please consider using `annotate()` or provide this layer with data containing
#>   a single row.

# Relation for categorical/numeric values
dft %>%
  distr(Survived, Fare, plot = FALSE) %>%
  head(10)
#> # A tibble: 10 × 5
#>    targets value            n     p  pcum
#>    <lgl>   <ord>        <int> <dbl> <dbl>
#>  1 FALSE   (7.9,8.1]       87  82.1  82.1
#>  2 FALSE   [0,7.5]         79  85.9  85.9
#>  3 TRUE    (78,5.1e+02]    66  75.9  75.9
#>  4 FALSE   (7.5,7.9]       61  70.1  70.1
#>  5 FALSE   (8.1,10]        60  76.9  76.9
#>  6 FALSE   (27,40]         57  62.6  62.6
#>  7 FALSE   (14,22]         51  58.0  58.0
#>  8 FALSE   (10,14]         48  57.1  57.1
#>  9 TRUE    (40,78]         47  52.8  52.8
#> 10 TRUE    (22,27]         46  51.7  51.7
# Sort values
dft %>% distr(Survived, Fare, abc = TRUE)
#> Warning: All aesthetics have length 1, but the data has 20 rows.
#>  Please consider using `annotate()` or provide this layer with data containing
#>   a single row.

# Fewer splits/breaks
dft %>% distr(Survived, Fare, abc = TRUE, breaks = 5)
#> Warning: All aesthetics have length 1, but the data has 10 rows.
#>  Please consider using `annotate()` or provide this layer with data containing
#>   a single row.

# Distribution of a single numerical variable
dft[dft$Fare < 20, ] %>% distr(Fare)

# Distribution of numerical/numerical
dft %>% distr(Fare, Age)

# Select only one of the two default plots of distr()
dft %>% distr(Survived, Age, type = 2)

dft %>% distr(Survived, Age, type = 3)
#> Warning: All aesthetics have length 1, but the data has 20 rows.
#>  Please consider using `annotate()` or provide this layer with data containing
#>   a single row.