
This function runs a full correlation study and returns a ranking of the most highly correlated variable pairs found in a cross-table.


corr_cross(
  df,
  plot = TRUE,
  pvalue = TRUE,
  max_pvalue = 1,
  type = 1,
  max = 1,
  top = 20,
  local = 1,
  ignore = NULL,
  contains = NA,
  grid = TRUE,
  rm.na = FALSE,
  quiet = FALSE,
  ...
)



df
Dataframe. It doesn't matter if it has non-numerical columns: they will be filtered.

plot
Boolean. Show and return a plot?

pvalue
Boolean. Returns a list with correlations and statistical significance (p-value) for each value.

max_pvalue
Numeric. Filter non-significant variables. Range (0, 1].

type
Integer. Plot type. 1 is for overall rank. 2 is for local rank.

max
Numeric. Maximum correlation permitted (from 0 to 1).

top
Integer. Return top n results only. Only valid when type = 1. Set value to NA to use all cross-correlations.

local
Integer. Label top n local correlations. Only valid when type = 2.

ignore
Vector or character. Which columns should be ignored?

contains
Character vector. Filter cross-correlations to variables that contain certain strings (a match on any value is enough when a vector is used).

grid
Boolean. Separate into grids?

rm.na
Boolean. Remove NAs?

quiet
Boolean. Keep quiet? If not, show messages.

...
Additional parameters passed to corr().


Depending on the plot argument, returns either a data.frame (plot = FALSE) or a plot (plot = TRUE) with correlation and p-value results for every combination of features, arranged by descending absolute correlation value.
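To make the ranking idea concrete, here is a minimal base-R sketch of a pairwise correlation table sorted by descending absolute correlation, with a p-value filter. It is not the lares implementation: `rank_cross_corr` is a hypothetical helper name, it uses the built-in `mtcars` data, and it only looks at numeric columns (it does not reproduce the per-category columns such as `Cabin_OTHER` seen in the package output).

```r
# Base-R sketch only; `rank_cross_corr` is a hypothetical helper,
# not a function from the lares package.
rank_cross_corr <- function(df, top = 10, max_pvalue = 1) {
  # Keep numeric columns only
  num <- df[, sapply(df, is.numeric), drop = FALSE]
  # Every unordered pair of column names, one pair per row
  pairs <- t(combn(names(num), 2))
  res <- data.frame(key = pairs[, 1], mix = pairs[, 2],
                    stringsAsFactors = FALSE)
  # Pearson correlation test for each pair
  tests <- lapply(seq_len(nrow(res)), function(i) {
    cor.test(num[[res$key[i]]], num[[res$mix[i]]])
  })
  res$corr <- vapply(tests, function(ct) unname(ct$estimate), numeric(1))
  res$pvalue <- vapply(tests, function(ct) ct$p.value, numeric(1))
  # Drop pairs above the p-value threshold, then rank by |corr|
  res <- res[res$pvalue <= max_pvalue, , drop = FALSE]
  res <- res[order(-abs(res$corr)), , drop = FALSE]
  rownames(res) <- NULL
  # top = NA keeps every pair
  if (!is.na(top)) res <- head(res, top)
  res
}

ranked <- rank_cross_corr(mtcars, top = 5, max_pvalue = 0.05)
print(ranked)
```

Sorting on the absolute value keeps strong negative relationships next to strong positive ones, which is why the sign of `corr` alternates in a ranked table.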

See also

Other Correlations: corr(), corr_var()

Other Exploratory: corr_var(), crosstab(), df_str(), distr(), freqs(), freqs_df(), freqs_list(), freqs_plot(), lasso_vars(), missingness(), plot_cats(), plot_df(), plot_nums(), tree_var()


Sys.unsetenv("LARES_FONT") # Temporary
data(dft) # Titanic dataset

# Only data with no plot
corr_cross(dft, plot = FALSE, top = 10)
#> Returning only the top 10. You may override with the 'top' argument
#> # A tibble: 10 × 8
#> # Rowwise: 
#>    key           mix               corr    pvalue group1   cat1   group2 cat2   
#>    <chr>         <chr>            <dbl>     <dbl> <chr>    <chr>  <chr>  <chr>  
#>  1 Ticket_113781 Cabin_C22.C26    0.866 3.35e-269 Ticket   113781 Cabin  "C22.C…
#>  2 Pclass_1      Cabin_OTHER      0.795 4.58e-195 Pclass   1      Cabin  "OTHER"
#>  3 Pclass_1      Cabin_          -0.789 4.39e-190 Pclass   1      Cabin  ""     
#>  4 SibSp         Ticket_CA..2343  0.604 1.40e- 89 SibSp    SibSp  Ticket "CA..2…
#>  5 Fare          Pclass_1         0.592 2.87e- 85 Fare     Fare   Pclass "1"    
#>  6 SibSp         Ticket_OTHER    -0.571 3.37e- 78 SibSp    SibSp  Ticket "OTHER"
#>  7 Survived_TRUE Sex_male        -0.543 1.41e- 69 Survived TRUE   Sex    "male" 
#>  8 Pclass_3      Cabin_           0.539 2.25e- 68 Pclass   3      Cabin  ""     
#>  9 Pclass_3      Cabin_OTHER     -0.502 3.94e- 58 Pclass   3      Cabin  "OTHER"
#> 10 Fare          Cabin_          -0.482 4.85e- 53 Fare     Fare   Cabin  ""     

# Show only most relevant results filtered by pvalue
corr_cross(dft, rm.na = TRUE, max_pvalue = 0.05, top = 15)
#> Returning only the top 15. You may override with the 'top' argument

# Cross-Correlation for certain variables
corr_cross(dft, contains = c("Survived", "Fare"))
#> Returning only the top 20. You may override with the 'top' argument

# Cross-Correlation max values per category
corr_cross(dft, type = 2, top = NA)
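
When plot = FALSE, the result is a regular table, so it can be filtered further with ordinary subsetting. The sketch below assumes the column names shown in the printed output of the first example (key, mix, corr, pvalue). So that it runs without the package, a few of those rows are re-typed by hand into `res`; in practice `res` would be the object returned by corr_cross().

```r
# Rows copied from the printed output of the first example
# (only the key, mix, corr and pvalue columns are kept).
res <- data.frame(
  key = c("Ticket_113781", "Pclass_1", "Pclass_1", "SibSp", "Fare"),
  mix = c("Cabin_C22.C26", "Cabin_OTHER", "Cabin_", "Ticket_CA..2343", "Pclass_1"),
  corr = c(0.866, 0.795, -0.789, 0.604, 0.592),
  pvalue = c(3.35e-269, 4.58e-195, 4.39e-190, 1.40e-89, 2.87e-85),
  stringsAsFactors = FALSE
)

# Keep strong relationships (|corr| >= 0.7) that involve Pclass on either side
involves_pclass <- grepl("^Pclass", res$key) | grepl("^Pclass", res$mix)
strong <- res[abs(res$corr) >= 0.7 & involves_pclass, ]
print(strong)
```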