cleanText: Clean character strings automatically. Options are available to keep
only ASCII characters, keep specific additional characters, convert to lower case, or apply title format.
cleanNames: Clean the column names of a data.frame. Resulting names are unique and consist only of the _
character, numbers, and ASCII letters. Capitalization preferences can be
specified using the lower parameter.
Usage
cleanText(
text,
spaces = TRUE,
keep = "",
lower = TRUE,
ascii = TRUE,
title = FALSE
)
cleanNames(df, num = "x", keep = "_", ...)
Arguments
- text
Character Vector
- spaces
Boolean or character. Keep spaces? If a character string is passed, spaces are replaced with that string.
- keep
Character. Characters (as one concatenated string or as a vector) that are accepted and should be kept, in addition to alphanumeric characters.
- lower
Boolean. Transform all to lower case?
- ascii
Boolean. Only ASCII characters?
- title
Boolean. Transform to title format (upper case on first letters).
- df
data.frame/tibble.
- num
Character. Prefix added to names that contain only numbers.
- ...
Additional parameters passed to cleanText().
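The way the spaces, keep, lower, and title arguments combine can be approximated in base R. The sketch below is an illustration only, not the package's implementation: sketch_clean_text is a hypothetical helper, it handles ASCII input only (no transliteration of accented characters), and it assumes keep contains no regex-special characters such as "]", "^", or "-".

```r
# Hypothetical helper for illustration only; this is NOT lares::cleanText().
sketch_clean_text <- function(text, spaces = TRUE, keep = "",
                              lower = TRUE, title = FALSE) {
  # Drop everything except ASCII letters, digits, spaces, and `keep` characters.
  # Assumption: `keep` holds no regex-special characters.
  pattern <- paste0("[^A-Za-z0-9 ", paste(keep, collapse = ""), "]")
  out <- gsub(pattern, "", text)
  out <- trimws(gsub(" +", " ", out)) # collapse repeated spaces
  if (lower) out <- tolower(out)
  if (title) {
    # Upper-case the first letter of each word
    out <- gsub("\\b([a-z])", "\\U\\1", out, perl = TRUE)
  }
  if (is.character(spaces)) {
    out <- gsub(" ", spaces, out, fixed = TRUE) # replace spaces with the string
  } else if (!isTRUE(spaces)) {
    out <- gsub(" ", "", out, fixed = TRUE)     # spaces = FALSE removes them
  }
  out
}

sketch_clean_text("Bernardo Lares 123")
# returns "bernardo lares 123"
sketch_clean_text("29_Feb-92()#", keep = c("#", "_"), spaces = FALSE)
# returns "29_feb92#"
sketch_clean_text("Bernardo Lare$", spaces = ".")
# returns "bernardo.lare"
```

The real function additionally converts non-ASCII characters when ascii = TRUE, which this sketch deliberately leaves out because transliteration results depend on the platform.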
See also
Other Data Wrangling:
balance_data(),
categ_reducer(),
date_cuts(),
date_feats(),
file_name(),
formatHTML(),
holidays(),
impute(),
left(),
normalize(),
num_abbr(),
ohe_commas(),
ohse(),
quants(),
removenacols(),
replaceall(),
replacefactor(),
textFeats(),
textTokenizer(),
vector2text(),
year_month(),
zerovar()
Other Text Mining:
ngrams(),
remove_stopwords(),
replaceall(),
sentimentBreakdown(),
textCloud(),
textFeats(),
textTokenizer(),
topics_rake()
Examples
cleanText("Bernardo Lares 123")
#> [1] "bernardo lares 123"
cleanText("Bèrnärdo LáreS 123", lower = FALSE)
#> [1] "Bernardo LareS 123"
cleanText("Bernardo Lare$", spaces = ".", ascii = FALSE)
#> [1] "bernardo.lare"
cleanText("\\@®ì÷å %ñS ..-X", spaces = FALSE)
#> [1] "riansx"
cleanText(c("maría", "€", "núñez_a."), title = TRUE)
#> [1] "Maria" "Eur" "Nuneza"
cleanText("29_Feb-92()#", keep = c("#", "_"), spaces = FALSE)
#> [1] "29_feb92#"
# For a data.frame directly:
df <- dft[1:5, 1:6] # Dummy data
colnames(df) <- c("ID.", "34", "x_2", "Num 123", "Nòn-äscì", " white Spaces ")
print(df)
#>   ID.    34 x_2 Num 123 Nòn-äscì  white Spaces
#> 1   1 FALSE   3    male       22              1
#> 2   2  TRUE   1  female       38              1
#> 3   3  TRUE   3  female       26              0
#> 4   4  TRUE   1  female       35              1
#> 5   5 FALSE   3    male       35              0
cleanNames(df)
#>   id   x34 x_2 num_123 nonasci white_spaces
#> 1  1 FALSE   3    male      22            1
#> 2  2  TRUE   1  female      38            1
#> 3  3  TRUE   3  female      26            0
#> 4  4  TRUE   1  female      35            1
#> 5  5 FALSE   3    male      35            0
cleanNames(df, lower = FALSE)
#>   ID   x34 x_2 Num_123 Nonasci white_Spaces
#> 1  1 FALSE   3    male      22            1
#> 2  2  TRUE   1  female      38            1
#> 3  3  TRUE   3  female      26            0
#> 4  4  TRUE   1  female      35            1
#> 5  5 FALSE   3    male      35            0
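The name-cleaning rules shown above (spaces to underscores, a prefix for numeric-only names, unique results) can be approximated in base R. This is a rough sketch under stated assumptions, not the package's code: sketch_clean_names is a hypothetical helper, it works on ASCII names only, and the way lares itself disambiguates duplicate names may differ from make.unique().

```r
# Hypothetical helper for illustration only; this is NOT lares::cleanNames().
sketch_clean_names <- function(x, num = "x", keep = "_", lower = TRUE) {
  x <- trimws(x)                 # drop leading and trailing white space
  x <- gsub(" +", "_", x)        # inner spaces become underscores
  # Keep only ASCII letters, digits, and the characters in `keep`
  # (assumption: `keep` holds no regex-special characters)
  x <- gsub(paste0("[^A-Za-z0-9", keep, "]"), "", x)
  if (lower) x <- tolower(x)
  only_num <- grepl("^[0-9]+$", x)
  x[only_num] <- paste0(num, x[only_num]) # prefix numeric-only names
  make.unique(x, sep = "_")               # disambiguate duplicates
}

sketch_clean_names(c("ID.", "34", "x_2", "Num 123", " white Spaces "))
# returns c("id", "x34", "x_2", "num_123", "white_spaces")
sketch_clean_names(c("a b", "a_b"))
# returns c("a_b", "a_b_1")
```

Applied to a data.frame, the same idea would be names(df) <- sketch_clean_names(names(df)).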
