Skip to contents

This function lets the user group, count, calculate percentages and cumulatives. It also plots results if needed. Tidyverse friendly.

Usage

freqs(
  df,
  ...,
  wt = NULL,
  rel = FALSE,
  results = TRUE,
  variable_name = NA,
  plot = FALSE,
  rm.na = FALSE,
  title = NA,
  subtitle = NA,
  top = 20,
  abc = FALSE,
  save = FALSE,
  subdir = NA
)

Arguments

df

Data.frame

...

Variables. Variables you wish to process. Order matters. If no variables are passed, the whole data.frame will be considered

wt

Variable, numeric. Weights.

rel

Boolean. Relative percentages (or absolute)?

results

Boolean. Return results in a dataframe?

variable_name

Character. Overwrite the main variable's name

plot

Boolean. Do you want to see a plot? Three variables tops.

rm.na

Boolean. Remove NA values in the plot? (not filtered for numerical output; use na.omit() or filter() if needed)

title

Character. Overwrite plot's title with.

subtitle

Character. Overwrite plot's subtitle with.

top

Integer. Filter and plot the most n frequent for categorical values. Set to NA to return all values

abc

Boolean. Do you wish to sort by alphabetical order?

save

Boolean. Save the output plot in our working directory

subdir

Character. Into which subdirectory do you wish to save the plot to?

Value

Plot when plot=TRUE and data.frame with grouped frequency results when plot=FALSE.

See also

Other Frequency: freqs_df(), freqs_list(), freqs_plot()

Other Exploratory: corr_cross(), corr_var(), crosstab(), df_str(), distr(), freqs_df(), freqs_list(), freqs_plot(), lasso_vars(), missingness(), plot_cats(), plot_df(), plot_nums(), tree_var()

Other Visualization: distr(), freqs_df(), freqs_list(), freqs_plot(), noPlot(), plot_chord(), plot_survey(), plot_timeline(), tree_var()

Examples

Sys.unsetenv("LARES_FONT") # Temporal
data(dft) # Titanic dataset

# How many survived?
dft %>% freqs(Survived)
#> # A tibble: 2 × 5
#>   Survived     n     p order  pcum
#>   <lgl>    <int> <dbl> <int> <dbl>
#> 1 FALSE      549  61.6     1  61.6
#> 2 TRUE       342  38.4     2 100  

# How many survived per Class?
dft %>% freqs(Pclass, Survived, abc = TRUE)
#> # A tibble: 6 × 6
#>   Pclass Survived     n     p order   pcum
#>   <fct>  <lgl>    <int> <dbl> <int>  <dbl>
#> 1 1      FALSE       80  8.98     1   8.98
#> 2 1      TRUE       136 15.3      2  24.2 
#> 3 2      FALSE       97 10.9      3  35.1 
#> 4 2      TRUE        87  9.76     4  44.9 
#> 5 3      FALSE      372 41.8      5  86.6 
#> 6 3      TRUE       119 13.4      6 100   

# How many survived per Class with relative percentages?
dft %>% freqs(Pclass, Survived, abc = TRUE, rel = TRUE)
#> # A tibble: 6 × 6
#> # Groups:   Pclass [3]
#>   Pclass Survived     n     p order  pcum
#>   <fct>  <lgl>    <int> <dbl> <int> <dbl>
#> 1 1      FALSE       80  37.0     1  37.0
#> 2 1      TRUE       136  63.0     2 100  
#> 3 2      FALSE       97  52.7     1  52.7
#> 4 2      TRUE        87  47.3     2 100  
#> 5 3      FALSE      372  75.8     1  75.8
#> 6 3      TRUE       119  24.2     2 100  

# Using a weighted feature
dft %>% freqs(Pclass, Survived, wt = Fare / 100)
#> # A tibble: 6 × 6
#>   Pclass Survived     n     p order  pcum
#>   <fct>  <lgl>    <dbl> <dbl> <int> <dbl>
#> 1 1      TRUE     130.  45.3      1  45.3
#> 2 1      FALSE     51.7 18.0      2  63.4
#> 3 3      FALSE     50.9 17.7      3  81.1
#> 4 2      TRUE      19.2  6.69     4  87.8
#> 5 2      FALSE     18.8  6.56     5  94.3
#> 6 3      TRUE      16.3  5.68     6 100  

# Let's check the results with plots:

#' # How many survived and see plot?
dft %>% freqs(Survived, plot = TRUE)


# How many survived per class?
dft %>% freqs(Survived, Pclass, plot = TRUE)


# Per class, how many survived?
dft %>% freqs(Pclass, Survived, plot = TRUE)


# Per sex and class, how many survived?
dft %>% freqs(Sex, Pclass, Survived, plot = TRUE)


# Frequency of tickets + Survived
dft %>% freqs(Survived, Ticket, plot = TRUE)
#> Slicing the top 20 (out of 681) values; use 'top' parameter to overrule.


# Frequency of tickets: top 10 only and order them alphabetically
dft %>% freqs(Ticket, plot = TRUE, top = 10, abc = TRUE)
#> Slicing the top 10 (out of 681) values; use 'top' parameter to overrule.