This function lets the user analyze data by visualizing the frequency of each value in each column of a whole data frame.

Usage

freqs_df(
  df,
  max = 0.9,
  min = 0,
  novar = TRUE,
  plot = FALSE,
  top = 30,
  quiet = FALSE,
  save = FALSE,
  subdir = NA
)

Arguments

df

Data.frame

max

Numeric. Upper variance threshold. Range: (0, 1]. Variables whose variance exceeds this threshold will be excluded

min

Numeric. Minimum variance threshold. Range: [0, 1). These values will be grouped into a high frequency (HF) value

novar

Boolean. Remove columns with no variance?

plot

Boolean. Do you want to see a plot? Three variables at most

top

Integer. Plot the most relevant variables (those with fewer categories)

quiet

Boolean. Keep quiet? (or show variable exclusions)

save

Boolean. Save the output plot in the working directory?

subdir

Character. Subdirectory in which to save the plot
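
To make the max and novar arguments more concrete, here is a minimal base R sketch. It rests on an assumption that this page does not confirm: that a column's "variance" is the share of distinct values among its rows. The helper distinct_share() and the toy data frame are hypothetical and are not part of the package.

```r
# Assumption (not confirmed by this page): "variance" of a column is
# number of distinct values / number of rows.
distinct_share <- function(df) {
  vapply(df, function(x) length(unique(x)) / length(x), numeric(1))
}

# Hypothetical toy data
toy <- data.frame(
  id = 1:10,                    # every value distinct -> share 1.0
  group = rep(c("a", "b"), 5),  # two distinct values  -> share 0.2
  const = rep("x", 10)          # a single value       -> share 0.1
)

shares <- distinct_share(toy)

# Columns that would exceed an upper threshold of 0.9
too_varied <- names(shares)[shares > 0.9]

# Columns holding a single distinct value, i.e. no variance
is_constant <- vapply(toy, function(x) length(unique(x)) == 1, logical(1))
no_variance <- names(toy)[is_constant]

# Columns left after both exclusions
kept <- setdiff(names(toy), c(too_varied, no_variance))
print(kept)
```

Under that assumption, an identifier column such as PassengerId in the example below has a share of 1 and is dropped by the default max = 0.9.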

Value

A plot when plot = TRUE; a data.frame with grouped frequency results when plot = FALSE.
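
The shape of the returned data.frame (variable, value, n, p, pcum) can be approximated with base R. The sketch below is not the package's implementation: freq_table() is a hypothetical helper that ignores the max, min and novar arguments and only shows how per-column counts, percentages and cumulative percentages fit together in long format.

```r
# Hypothetical helper, not part of the package: one row per
# (column, distinct value) with count, percentage and cumulative percentage.
freq_table <- function(df) {
  per_column <- lapply(names(df), function(col) {
    # Count each distinct value (NA included), most frequent first
    counts <- sort(
      table(as.character(df[[col]]), useNA = "ifany"),
      decreasing = TRUE
    )
    p_raw <- 100 * as.numeric(counts) / sum(counts)
    data.frame(
      variable = col,
      value = names(counts),
      n = as.integer(counts),
      p = round(p_raw, 2),
      pcum = round(cumsum(p_raw), 2),  # accumulates within the column
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, per_column)
}

# Hypothetical toy data
toy <- data.frame(
  sex = c("male", "male", "female", "male"),
  class = c(3, 1, 3, 3)
)
res <- freq_table(toy)
print(res)
```

All values are converted to character so that columns of different types can share a single value column, which matches the <chr> type shown in the example output.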

See also

Other Frequency: freqs(), freqs_list(), freqs_plot()

Other Exploratory: corr_cross(), corr_var(), crosstab(), df_str(), distr(), freqs(), freqs_list(), freqs_plot(), lasso_vars(), missingness(), plot_cats(), plot_df(), plot_nums(), tree_var()

Other Visualization: distr(), freqs(), freqs_list(), freqs_plot(), noPlot(), plot_chord(), plot_survey(), plot_timeline(), tree_var()

Examples

library(lares)
data(dft) # Titanic dataset
freqs_df(dft)
#> 1 variables with more than 0.9 variance exluded: 'PassengerId'
#> # A tibble: 1,191 × 5
#> # Groups:   variable [10]
#>    variable value        n     p  pcum
#>    <chr>    <chr>    <int> <dbl> <dbl>
#>  1 Cabin    ""         687  77.1  77.1
#>  2 Parch    "0"        678  76.1  76.1
#>  3 Embarked "S"        644  72.3  72.3
#>  4 SibSp    "0"        608  68.2  68.2
#>  5 Sex      "male"     577  64.8  64.8
#>  6 Survived "FALSE"    549  61.6  61.6
#>  7 Pclass   "3"        491  55.1  55.1
#>  8 Survived "TRUE"     342  38.4 100  
#>  9 Sex      "female"   314  35.2 100  
#> 10 Pclass   "1"        216  24.2  79.3
#> # ℹ 1,181 more rows
freqs_df(dft, plot = TRUE)
#> 1 variables with more than 0.9 variance exluded: 'PassengerId'