Clean text strings automatically

cleanText: Clean character strings automatically. Options to keep ASCII characters only, keep certain characters, lower caps, title format, are available.

cleanNames: Resulting names are unique and consist only of the _ character, numbers, and ASCII letters. Capitalization preferences can be specified using the lower parameter.

Usage

cleanText(
  text,
  spaces = TRUE,
  keep = "",
  lower = TRUE,
  ascii = TRUE,
  title = FALSE
)

cleanNames(df, num = "x", keep = "_", ...)

Arguments

text: Character Vector
spaces: Boolean. Keep spaces? If character input, spaces will be transformed into passed argument.
keep: Character. String (concatenated or as vector) with all characters that are accepted and should be kept, in addition to alphanumeric.
lower: Boolean. Transform all to lower case?
ascii: Boolean. Only ASCII characters?
title: Boolean. Transform to title format (upper case on first letters).
df: data.frame/tibble.
num: Add character before only-numeric names.
...: Additional parameters passed to cleanText().

Value

Character vector with transformed strings.

data.frame/tibble with transformed column names.

Details

Inspired by janitor::clean_names.

Examples

cleanText("Bernardo Lares 123")
#> [1] "bernardo lares 123"
cleanText("Bèrnärdo LáreS 123", lower = FALSE)
#> [1] "Bernardo LareS 123"
cleanText("Bernardo Lare$", spaces = ".", ascii = FALSE)
#> [1] "bernardo lare"
cleanText("\\@®ì÷å   %ñS  ..-X", spaces = FALSE)
#> [1] "riansx"
cleanText(c("maría", "€", "núñez_a."), title = TRUE)
#> [1] "Maria"  "Eur"    "Nuneza"
cleanText("29_Feb-92()#", keep = c("#", "_"), spaces = FALSE)
#> [1] "29_feb92#"

# For a data.frame directly:
df <- dft[1:5, 1:6] # Dummy data
colnames(df) <- c("ID.", "34", "x_2", "Num 123", "Nòn-äscì", "  white   Spaces  ")
print(df)
#>   ID.    34 x_2 Num 123 Nòn-äscì   white   Spaces  
#> 1   1 FALSE   3    male       22                  1
#> 2   2  TRUE   1  female       38                  1
#> 3   3  TRUE   3  female       26                  0
#> 4   4  TRUE   1  female       35                  1
#> 5   5 FALSE   3    male       35                  0
cleanNames(df)
#>   id   x34 x_2 num_123 nonasci white_spaces
#> 1  1 FALSE   3    male      22            1
#> 2  2  TRUE   1  female      38            1
#> 3  3  TRUE   3  female      26            0
#> 4  4  TRUE   1  female      35            1
#> 5  5 FALSE   3    male      35            0
cleanNames(df, lower = FALSE)
#>   ID   x34 x_2 Num_123 Nonasci white_Spaces
#> 1  1 FALSE   3    male      22            1
#> 2  2  TRUE   1  female      38            1
#> 3  3  TRUE   3  female      26            0
#> 4  4  TRUE   1  female      35            1
#> 5  5 FALSE   3    male      35            0

Usage

Arguments

Value

Details

See also

Examples