This function lets the user remove all columns in which some or all values are NA.
This function lets the user remove all rows in which some or all values are NA.
Usage
removenacols(df, all = TRUE, ignore = NULL)
removenarows(df, all = TRUE)
numericalonly(df, dropnacols = TRUE, logs = FALSE, natransform = NA)
Arguments
- df
Data.frame
- all
Boolean. If TRUE, remove only rows (or columns) that contain ONLY NA values. If FALSE, remove rows (or columns) that contain at least one NA.
- ignore
Character vector. Column names to exclude from the NA check.
- dropnacols
Boolean. Drop columns with only NA values?
- logs
Boolean. Calculate log(x)+1 for numerical columns?
- natransform
Character or numeric. Use "mean" or 0 to impute NA values. If set to NA, no imputation is run.
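The difference between all = TRUE and all = FALSE can be sketched in base R. This is only an illustration of the semantics described above, not the package's implementation: the helper names drop_na_cols_sketch() and drop_na_rows_sketch() are hypothetical, and the handling of ignore (listed columns are never dropped) is an assumption based on the argument description.

```r
# Toy data: column "b" is entirely NA, row 3 is entirely NA
df <- data.frame(
  a = c(1, 2, NA),
  b = c(NA, NA, NA),
  c = c("x", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (base R only, not the package's own code)
drop_na_cols_sketch <- function(df, all = TRUE, ignore = NULL) {
  na_mat <- is.na(df)
  # all = TRUE: flag columns where every value is NA
  # all = FALSE: flag columns with at least one NA
  flagged <- if (all) colSums(na_mat) == nrow(df) else colSums(na_mat) > 0
  # Assumption: columns listed in `ignore` are never dropped
  flagged[colnames(df) %in% ignore] <- FALSE
  df[, !flagged, drop = FALSE]
}

# Hypothetical helper: same logic applied to rows
drop_na_rows_sketch <- function(df, all = TRUE) {
  na_mat <- is.na(df)
  flagged <- if (all) rowSums(na_mat) == ncol(df) else rowSums(na_mat) > 0
  df[!flagged, , drop = FALSE]
}

colnames(drop_na_cols_sketch(df))                             # only "b" is dropped
colnames(drop_na_cols_sketch(df, all = FALSE, ignore = "a"))  # only "a" is kept
nrow(drop_na_rows_sketch(df))                                 # only row 3 is dropped
```

With all = FALSE the filter is much stricter: in this toy data every row contains at least one NA (column "b" is empty), so the row version would return zero rows.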
Value
data.frame with removed columns.
data.frame with removed rows.
data.frame with all numerical columns selected.
See also
Other Data Wrangling: balance_data(), categ_reducer(), cleanText(), date_cuts(), date_feats(), file_name(), formatHTML(), holidays(), impute(), left(), normalize(), num_abbr(), ohe_commas(), ohse(), quants(), replaceall(), replacefactor(), textFeats(), textTokenizer(), vector2text(), year_month(), zerovar()
Examples
data(dft) # Titanic dataset
str(dft)
#> 'data.frame': 891 obs. of 11 variables:
#> $ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
#> $ Survived : logi FALSE TRUE TRUE TRUE FALSE FALSE ...
#> $ Pclass : Factor w/ 3 levels "1","2","3": 3 1 3 1 3 3 1 3 3 2 ...
#> $ Sex : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
#> $ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
#> $ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
#> $ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
#> $ Ticket : chr "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
#> $ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
#> $ Cabin : Factor w/ 148 levels "","A10","A14",..: 1 83 1 57 1 1 131 1 1 1 ...
#> $ Embarked : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...
numericalonly(dft) %>% head()
#>   PassengerId Age SibSp Parch    Fare Survived Sex
#> 1           1  22     1     0  7.2500        0   1
#> 2           2  38     1     0 71.2833        1   0
#> 3           3  26     0     0  7.9250        1   0
#> 4           4  35     1     0 53.1000        1   0
#> 5           5  35     0     0  8.0500        0   1
#> 6           6  NA     0     0  8.4583        0   1
numericalonly(dft, natransform = "mean") %>% head()
#>   PassengerId      Age SibSp Parch    Fare Survived Sex
#> 1           1 22.00000     1     0  7.2500        0   1
#> 2           2 38.00000     1     0 71.2833        1   0
#> 3           3 26.00000     0     0  7.9250        1   0
#> 4           4 35.00000     1     0 53.1000        1   0
#> 5           5 35.00000     0     0  8.0500        0   1
#> 6           6 29.69912     0     0  8.4583        0   1
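In the second example, the NA in Age for row 6 is replaced by the column mean. The idea behind natransform can be sketched in base R as follows. The helper impute_na_sketch() is hypothetical and is not the package's code; the toy vector holds only the six Age values printed above, so its mean differs from the 29.69912 obtained over the full dataset.

```r
# Hypothetical helper (base R only) mimicking the documented natransform options
impute_na_sketch <- function(x, natransform = NA) {
  # NA means "leave missing values untouched"
  if (is.na(natransform)) {
    return(x)
  }
  if (identical(natransform, "mean")) {
    # Replace NAs with the mean of the observed values
    x[is.na(x)] <- mean(x, na.rm = TRUE)
  } else if (isTRUE(natransform == 0)) {
    # Replace NAs with zero
    x[is.na(x)] <- 0
  }
  x
}

# First six Age values from the example output
age <- c(22, 38, 26, 35, 35, NA)

impute_na_sketch(age)                        # unchanged, NA kept
impute_na_sketch(age, natransform = "mean")  # NA replaced by the mean of the other five values
impute_na_sketch(age, natransform = 0)       # NA replaced by 0
```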