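removenacols() removes every column in which some or all values are NA.

removenarows() removes every row in which some or all values are NA.

numericalonly() returns only the numerical columns of a data.frame. In the example output below, two-level columns such as Survived and Sex are also returned, coded as 0/1.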

Usage

removenacols(df, all = TRUE, ignore = NULL)

removenarows(df, all = TRUE)

numericalonly(df, dropnacols = TRUE, logs = FALSE, natransform = NA)

Arguments

df

Data.frame

all

Boolean. If TRUE, remove only the rows (or columns) that contain nothing but NA values. If FALSE, remove every row (or column) that contains at least one NA.

ignore

Character vector. Names of columns to skip when checking for NAs.

dropnacols

Boolean. Drop columns with only NA values?

logs

Boolean. Calculate log(x)+1 for numerical columns?

natransform

String or number. Use "mean" or 0 to impute NA values. If set to NA (the default), no imputation is performed.
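
The difference between the two settings of `all` can be sketched in base R. This is only an illustration of the semantics described above, not the package's implementation; the `toy` data frame and every variable name are hypothetical.

```r
# Hypothetical toy data, not taken from the package.
toy <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, NA),
  c = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE wherever a value is NA.
na_mask <- is.na(toy)

# all = TRUE semantics: drop only what is entirely NA.
rows_all_true <- toy[rowSums(na_mask) < ncol(toy), , drop = FALSE]  # drops row 2
cols_all_true <- toy[, colSums(na_mask) < nrow(toy), drop = FALSE]  # drops column b

# all = FALSE semantics: drop anything with at least one NA.
rows_all_false <- toy[rowSums(na_mask) == 0, , drop = FALSE]  # drops every row here
cols_all_false <- toy[, colSums(na_mask) == 0, drop = FALSE]  # drops every column here

# Dropping all-NA columns first makes the stricter row filter usable.
cleaned <- cols_all_true[rowSums(is.na(cols_all_true)) == 0, , drop = FALSE]
print(cleaned)
```

Because column `b` is NA everywhere, the strict setting removes every row of `toy`. Removing all-NA columns first, then filtering rows, leaves rows 1 and 3 with columns `a` and `c`.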

Value

removenacols() returns a data.frame with the NA columns removed.

removenarows() returns a data.frame with the NA rows removed.

numericalonly() returns a data.frame containing only the numerical columns.

Examples

data(dft) # Titanic dataset
str(dft)
#> 'data.frame':	891 obs. of  11 variables:
#>  $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ Survived   : logi  FALSE TRUE TRUE TRUE FALSE FALSE ...
#>  $ Pclass     : Factor w/ 3 levels "1","2","3": 3 1 3 1 3 3 1 3 3 2 ...
#>  $ Sex        : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
#>  $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
#>  $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
#>  $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
#>  $ Ticket     : chr  "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
#>  $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
#>  $ Cabin      : Factor w/ 148 levels "","A10","A14",..: 1 83 1 57 1 1 131 1 1 1 ...
#>  $ Embarked   : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...
numericalonly(dft) %>% head()
#>   PassengerId Age SibSp Parch    Fare Survived Sex
#> 1           1  22     1     0  7.2500        0   1
#> 2           2  38     1     0 71.2833        1   0
#> 3           3  26     0     0  7.9250        1   0
#> 4           4  35     1     0 53.1000        1   0
#> 5           5  35     0     0  8.0500        0   1
#> 6           6  NA     0     0  8.4583        0   1
numericalonly(dft, natransform = "mean") %>% head()
#>   PassengerId      Age SibSp Parch    Fare Survived Sex
#> 1           1 22.00000     1     0  7.2500        0   1
#> 2           2 38.00000     1     0 71.2833        1   0
#> 3           3 26.00000     0     0  7.9250        1   0
#> 4           4 35.00000     1     0 53.1000        1   0
#> 5           5 35.00000     0     0  8.0500        0   1
#> 6           6 29.69912     0     0  8.4583        0   1
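
In the second output above, the NA in Age for row 6 has been replaced by the column mean. A minimal base R sketch of that kind of imputation follows. It assumes the documented behaviour of `natransform` and is not the package's own code; the helper `impute()`, its `how` argument, and the `toy` data are hypothetical.

```r
# Hypothetical toy data, not the dft dataset.
toy <- data.frame(id = c(1, 2, 3, 4), age = c(22, 38, NA, 26))

# Replace the NAs of one numeric vector with a fill value:
# the mean of its observed values, or a constant such as 0.
impute <- function(x, how = "mean") {
  if (!anyNA(x)) return(x)  # nothing to fill, return unchanged
  fill <- if (identical(how, "mean")) mean(x, na.rm = TRUE) else how
  x[is.na(x)] <- fill
  x
}

by_mean <- as.data.frame(lapply(toy, impute, how = "mean"))
by_zero <- as.data.frame(lapply(toy, impute, how = 0))

by_mean$age  # third value becomes (22 + 38 + 26) / 3
by_zero$age  # third value becomes 0
```

Mean imputation keeps the column mean unchanged but shrinks its variance, while filling with 0 shifts the mean; which one is appropriate depends on the analysis.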