Visualize the frequency of elements in a list, list vector, or vector with comma-separated values. Detect which combinations and elements are the most frequent and what share of your total observations they represent. This is similar to UpSet plots, which may be used as an alternative to Venn diagrams.
Usage
freqs_list(
df,
var = NULL,
wt = NULL,
fx = "mean",
rm.na = FALSE,
min_elements = 1,
limit = 10,
limit_x = NA,
limit_y = NA,
tail = TRUE,
size = 10,
unique = TRUE,
abc = FALSE,
title = "",
plot = TRUE
)
Arguments
- df
Data.frame
- var
Variable. Variables you wish to process.
- wt
Variable, numeric. Select a numeric column to use in the colour scale; its values are aggregated (sum, mean...) for each of the combinations.
- fx
Character. Set operation: mean, sum
- rm.na
Boolean. Remove NA values from wt?
- min_elements
Integer. Exclude combinations with fewer than n elements
- limit, limit_x, limit_y
Integer. Show top n combinations (x) and/or elements (y). The rest will be grouped into a single element. Set argument to 0 to ignore.
limit_x/limit_y answer to limit's argument.
- tail
Boolean. Show tail grouped into "..." on the plots?
- size
Numeric. Text base size
- unique
Boolean. Treat a,b and b,a as the same combination (a,b = b,a)?
- abc
Boolean. Do you wish to sort by alphabetical order?
- title
Character. Overwrite the plot's title with this text.
- plot
Boolean. Plot viz? It will be generated in the output object either way.
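To make the unique, wt, fx and min_elements arguments more concrete, here is a minimal base R sketch of the underlying idea. It is not the package's implementation: the toy data, the column names combo and score, and the helper normalize_combo() are hypothetical and exist only for illustration.

```r
# Toy data: comma-separated combinations plus a numeric weight (made-up values)
toy <- data.frame(
  combo = c("a,b", "b,a", "a", "a,b,c", "c"),
  score = c(1, 3, 5, 2, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split a value into its elements and sort them, so that
# "a,b" and "b,a" collapse into one combination (the idea behind unique = TRUE)
normalize_combo <- function(x) {
  parts <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  paste(sort(parts), collapse = ",")
}
toy$key <- vapply(toy$combo, normalize_combo, character(1), USE.NAMES = FALSE)

# Frequency of each combination and its share of all observations
counts <- table(toy$key)
shares <- round(100 * prop.table(counts), 1)

# Mean of the weight column per combination (the idea behind wt with fx = "mean")
avg_score <- tapply(toy$score, toy$key, mean)

# Keep only combinations with at least two elements (similar to min_elements = 2)
n_elements <- lengths(strsplit(names(counts), ",", fixed = TRUE))
counts_min2 <- counts[n_elements >= 2]
```

In this sketch "a,b" and "b,a" are counted together, so the combination a,b accounts for two of the five observations, and counts_min2 drops the single-element combinations "a" and "c".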
See also
Other Frequency: freqs(), freqs_df(), freqs_plot()
Other Exploratory: corr_cross(), corr_var(), crosstab(), df_str(), distr(), freqs(), freqs_df(), freqs_plot(), lasso_vars(), missingness(), plot_cats(), plot_df(), plot_nums(), tree_var()
Other Visualization: distr(), freqs(), freqs_df(), freqs_plot(), noPlot(), plot_chord(), plot_survey(), plot_timeline(), tree_var()
Examples
if (FALSE) { # \dontrun{
df <- dplyr::starwars
head(df[, c(1, 4, 5, 12)], 10)
# Combinations of films each character appears in, stored in a list column
head(df$films, 2)
freqs_list(df, films)
# Skin colours in a comma-separated column
head(df$skin_color)
x <- freqs_list(df, skin_color, min_elements = 2, limit = 5, plot = FALSE)
# Inside "x" we'll have:
names(x)
# Using the 'wt' argument to add a continuous value metric
# to an already one-hot encoded dataset (and hide the tail)
csv <- "https://raw.githubusercontent.com/hms-dbmi/UpSetR/master/inst/extdata/movies.csv"
movies <- read.csv(csv, sep = ";")
head(movies)
freqs_list(movies,
  wt = AvgRating, min_elements = 2, tail = FALSE,
  title = "Movies\nMixed Genres\nRanking"
)
# So, please: no more Comedy+SciFi and more Drama+Horror films (based on ~50 movies)!
} # }