Removing punctuation in JavaScript is relatively easy, but removing accents while keeping the underlying letters is a bit more challenging. Either way, below are some minimalist functions that cover both cases.
How to remove accents in JavaScript
To simply remove accents and cedilla from a string and return the same string without the accents, we can use ES6's String.prototype.normalize method, followed by a String.prototype.replace:
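A minimal sketch of that combination could look like the following. The function name removeAccents is just an illustrative choice, not something built into the language.

```javascript
// Hypothetical helper name; the technique is normalize("NFD") followed by replace.
function removeAccents(text) {
  return text
    .normalize("NFD") // split each accented letter into base letter + combining mark(s)
    .replace(/[\u0300-\u036f]/g, ""); // drop the combining marks
}

console.log(removeAccents("Olá, coração! Crème brûlée"));
// "Ola, coracao! Creme brulee"
```

Punctuation and spaces are left untouched; only the diacritics are removed.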
Explanation
The normalize method was introduced in ES6 (ECMAScript 2015). It converts a string to a Unicode normalization form. Here we pass NFD (canonical decomposition), which splits each accented character into its base letter followed by one or more combining marks, each with its own Unicode code point.
To get a better idea of how this conversion to Unicode works, see below:
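As a small illustration, assuming we start from the precomposed character "é" (U+00E9), the decomposition can be inspected like this. The variable names are arbitrary.

```javascript
const composed = "\u00e9"; // "é" stored as a single precomposed code point
const decomposed = composed.normalize("NFD");

console.log(composed.length);   // 1
console.log(decomposed.length); // 2

// List the code points that make up the decomposed string.
const codePoints = Array.from(decomposed, (ch) =>
  "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);
console.log(codePoints); // ["U+0065", "U+0301"]
```

U+0065 is the plain letter "e" and U+0301 is the combining acute accent, which is exactly the kind of character the replace step targets.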
The replace call then removes every combining diacritical mark, that is, every character in the Unicode range \u0300-\u036F (the Combining Diacritical Marks block). The \uXXXX escapes inside the character class make it possible to express this range directly in a RegEx.
Removing all special characters in JavaScript
To remove accents and other special characters such as /?!(), use the same approach as above, but also strip everything that is not an unaccented letter or a digit.
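One possible way to write this, again with a made-up function name, is to add a second alternative to the regular expression:

```javascript
// Hypothetical helper: strips accents and every character that is not 0-9, a-z or A-Z.
function removeSpecialChars(text) {
  return text
    .normalize("NFD")
    .replace(/([\u0300-\u036f]|[^0-9a-zA-Z])/g, "");
}

console.log(removeSpecialChars("Olá, mundo! (Você?)"));
// "OlamundoVoce"
```

Note that spaces are removed too, because a space is neither a letter nor a digit.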
Explanation
To understand what happens here, I suggest reading the previous explanation, where I talk about Unicode and the normalize method.
The only addition, in this case, is a second alternative in the regex, using the form ([ group 1 ]|[ group 2 ]). Group 2 is the regular expression [^0-9a-zA-Z], which means: anything that is not (^) 0-9, a-z or A-Z is also removed.
If you don't want to remove spaces, just add \s to the negated character class, giving [^0-9a-zA-Z\s].
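A sketch of the space-preserving variant, under the same assumptions as before (the function name is only illustrative):

```javascript
// Hypothetical helper: like the previous version, but whitespace (\s) is kept.
function removeSpecialCharsKeepSpaces(text) {
  return text
    .normalize("NFD")
    .replace(/([\u0300-\u036f]|[^0-9a-zA-Z\s])/g, "");
}

console.log(removeSpecialCharsKeepSpaces("Olá, mundo! (Você?)"));
// "Ola mundo Voce"
```

Keep in mind that \s matches tabs and line breaks as well as ordinary spaces.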
Replacing special characters
Another common use case is removing the accents and then replacing the remaining special characters with some other character, e.g. "Any phrase" -> "Any-phrase".
There is a very good regular expression for replacing characters that are not plain letters or numbers, but on its own it also replaces accented letters rather than just stripping their accents.
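To illustrate the problem, here is a sketch that assumes the expression in question is the negated class [^0-9a-zA-Z] used earlier in this article. The variable names are arbitrary.

```javascript
// Assumption: the "replace everything that is not a letter or number" regex is [^0-9a-zA-Z].
const replaced = "Any phrase".replace(/[^0-9a-zA-Z]/g, "-");
console.log(replaced); // "Any-phrase"

// Accented letters are outside a-z/A-Z, so they are replaced as well.
const accented = "Caf\u00e9 com p\u00e3o".replace(/[^0-9a-zA-Z]/g, "-");
console.log(accented); // "Caf--com-p-o"
```

The second result shows why the accents need to be handled separately before this replacement runs.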
If we want to remove only the accents and then replace the other special characters, we first strip the accents as in the first example and only then apply the replacement.
You may also need to get rid of redundant hyphens, as when "This is a sentence!!!" turns into "This-is-a-sentence---".
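The two-step idea can be sketched as follows. The function name and the replacement parameter (defaulting to a hyphen) are illustrative assumptions.

```javascript
// Hypothetical helper: strip accents first, then swap remaining special characters.
function replaceSpecialChars(text, replacement = "-") {
  return text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // step 1: remove accents only
    .replace(/[^0-9a-zA-Z]/g, replacement); // step 2: replace everything else
}

console.log(replaceSpecialChars("Café com pão")); // "Cafe-com-pao"
console.log(replaceSpecialChars("Any phrase"));   // "Any-phrase"
```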
Here's a complete function that removes accents, replaces special characters with hyphens, also removing additional hyphens:
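One way such a function might look is shown below. The name cleanString is made up, and collapsing runs of special characters into a single hyphen plus trimming hyphens at both ends is my reading of "removing additional hyphens".

```javascript
// Hypothetical helper combining the three steps described above.
function cleanString(text) {
  return text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // remove accents
    .replace(/[^0-9a-zA-Z]+/g, "-") // each run of special characters becomes one hyphen
    .replace(/^-+|-+$/g, ""); // drop hyphens left at the start or end
}

console.log(cleanString("This is a sentence!!!")); // "This-is-a-sentence"
console.log(cleanString("  Olá --- mundo!  "));    // "Ola-mundo"
```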
If you want to use this same function to "slugify" a URL, just add toLowerCase() at the end and it's done!
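As a rough sketch, a slug helper built on the same steps could be written like this. The name slugify and the exact cleanup rules are assumptions for illustration.

```javascript
// Hypothetical slug helper: accent removal, hyphenation, cleanup, then lowercase.
function slugify(text) {
  return text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .replace(/[^0-9a-zA-Z]+/g, "-")
    .replace(/^-+|-+$/g, "")
    .toLowerCase();
}

console.log(slugify("Crème Brûlée: a Recipe!")); // "creme-brulee-a-recipe"
```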
I think I have covered the most common cases that come up when working with accents and special characters in JavaScript. I know it is an extra challenge, for those of us working with text in many languages other than English, that there are no built-in methods for dealing with these characters.