cleanText
: Clean character strings automatically. Options are available to keep only
ASCII characters, keep specific additional characters, convert to lower case, or apply title format.
cleanNames
: Clean the column names of a data.frame or tibble. Resulting names are unique
and consist only of the _ character, numbers, and ASCII letters.
Capitalization preferences can be specified using the lower parameter.
Usage
cleanText(
text,
spaces = TRUE,
keep = "",
lower = TRUE,
ascii = TRUE,
title = FALSE
)
cleanNames(df, num = "x", keep = "_", ...)
Arguments
- text
Character vector.
- spaces
Boolean or character. Keep spaces? If a character string is passed, spaces are replaced with that string.
- keep
Character. String (concatenated or as a vector) with all characters that are accepted and should be kept, in addition to alphanumeric characters.
- lower
Boolean. Transform all to lower case?
- ascii
Boolean. Only ASCII characters?
- title
Boolean. Transform to title format (upper case on first letters).
- df
data.frame/tibble.
- num
Character. Prefix added to names that contain only numbers.
- ...
Additional parameters passed to cleanText().
See also
Other Data Wrangling: balance_data(), categ_reducer(), date_cuts(), date_feats(), file_name(), formatHTML(), holidays(), impute(), left(), normalize(), num_abbr(), ohe_commas(), ohse(), quants(), removenacols(), replaceall(), replacefactor(), textFeats(), textTokenizer(), vector2text(), year_month(), zerovar()
Other Text Mining: ngrams(), remove_stopwords(), replaceall(), sentimentBreakdown(), textCloud(), textFeats(), textTokenizer(), topics_rake()
Examples
cleanText("Bernardo Lares 123")
#> [1] "bernardo lares 123"
cleanText("Bèrnärdo LáreS 123", lower = FALSE)
#> [1] "Bernardo LareS 123"
cleanText("Bernardo Lare$", spaces = ".", ascii = FALSE)
#> [1] "bernardo lare"
cleanText("\\@®ì÷å %ñS ..-X", spaces = FALSE)
#> [1] "riansx"
cleanText(c("maría", "€", "núñez_a."), title = TRUE)
#> [1] "Maria" "Eur" "Nuneza"
cleanText("29_Feb-92()#", keep = c("#", "_"), spaces = FALSE)
#> [1] "29_feb92#"
# For a data.frame directly:
df <- dft[1:5, 1:6] # Dummy data
colnames(df) <- c("ID.", "34", "x_2", "Num 123", "Nòn-äscì", " white Spaces ")
print(df)
#> ID. 34 x_2 Num 123 Nòn-äscì white Spaces
#> 1 1 FALSE 3 male 22 1
#> 2 2 TRUE 1 female 38 1
#> 3 3 TRUE 3 female 26 0
#> 4 4 TRUE 1 female 35 1
#> 5 5 FALSE 3 male 35 0
cleanNames(df)
#> id x34 x_2 num_123 nonasci white_spaces
#> 1 1 FALSE 3 male 22 1
#> 2 2 TRUE 1 female 38 1
#> 3 3 TRUE 3 female 26 0
#> 4 4 TRUE 1 female 35 1
#> 5 5 FALSE 3 male 35 0
cleanNames(df, lower = FALSE)
#> ID x34 x_2 Num_123 Nonasci white_Spaces
#> 1 1 FALSE 3 male 22 1
#> 2 2 TRUE 1 female 38 1
#> 3 3 TRUE 3 female 26 0
#> 4 4 TRUE 1 female 35 1
#> 5 5 FALSE 3 male 35 0
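The naming rules described for cleanNames (unique names, only _, numbers, and ASCII letters, a prefix for only-numeric names) can be approximated in base R. The sketch below is an illustration only: sketch_clean_names is a hypothetical helper, not the package's implementation, and it assumes ASCII input, so it drops non-ASCII characters instead of transliterating them as cleanText does.

```r
# Hypothetical helper (not part of the package): a rough base-R
# approximation of the naming rules described for cleanNames().
# Assumption: input is ASCII; accented characters would be dropped,
# not transliterated.
sketch_clean_names <- function(x, num = "x", lower = TRUE) {
  x <- trimws(x)                          # drop leading/trailing whitespace
  x <- gsub("[[:space:]]+", "_", x)       # runs of spaces become one "_"
  x <- gsub("[^A-Za-z0-9_]", "", x)       # keep ASCII letters, digits, "_"
  if (lower) x <- tolower(x)              # optional lower-casing
  only_num <- grepl("^[0-9]+$", x)        # names made of digits only
  x[only_num] <- paste0(num, x[only_num]) # prefix them with `num`
  make.unique(x, sep = "_")               # disambiguate duplicates
}

sketch_clean_names(c("ID.", "34", "x_2", "Num 123", " white Spaces ", "id"))
#> [1] "id"           "x34"          "x_2"          "num_123"      "white_spaces"
#> [6] "id_1"
```

The last element shows the uniqueness step: "ID." and "id" both reduce to "id", so the second one gets a numeric suffix. How the package itself resolves such collisions may differ.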