Regular Expressions in JavaScript: Ultimate Guide for Beginners

In this article I'll explain the main things you need to know about regular expressions (RegExp). The concepts are useful in JavaScript and carry over to most other programming languages.

First of all, what are regular expressions? A regular expression (RegExp or regex for short) is a pattern that describes a set of strings. Regular expressions can be used to find, extract, or replace information in a string. Almost all programming languages implement them.

If you, like me, have ever looked at regular expressions and understood absolutely nothing, I strongly recommend that you invest time and read this until the end, because I guarantee it's worth it!

Why learn Regex?

For years as a developer I was ignorant about this subject and ended up using only what was available on the internet. In a way, you're held hostage to whatever you can find on forums.

But when you finally learn it, you stop wasting time searching for ready-made expressions, and even more time trying to solve problems less efficiently, without regex.

Understanding regular expressions puts you at a different level as a programmer. I would say that a minority of programmers have at least an intermediate knowledge of regular expressions, as it is not such an easy thing to learn. But when you actually learn even the basics, you have an extremely useful tool in your hands.

Introduction to regular expressions

As I said, regular expressions are a way of working with strings to find, replace, or extract certain information.

You've probably seen the use of wildcards somewhere (e.g., *.jpg, *.png, http://site.com/*), to define files, extensions, or URLs. The easiest way to understand regular expressions is to think of these wildcards, but you can do much more complex operations, such as extracting an email from a text or validating a phone number.

Before we go on with examples, know that a regular expression is not easy to read, and it's not easy to write either, so don't be intimidated if it looks strange at first.

It's also not easy to memorize after studying it (you'll need to do exercises). I'll try to help as much as possible, because in the end a regular expression is often the only reasonable way to solve a programming challenge.

Starting a regular expression

In JavaScript, a regular expression is an object of class RegExp. We can define a regular expression in two ways:

Using the literal (most common) form:

const reg = /test/;

Or initializing the RegExp object:

const reg = new RegExp('test');

The literal syntax is more common, probably because it's simpler. Just as arrays and objects can be created with [] and {} or with new Array() and new Object(), the literal form is usually the preferred one. But there are cases where we need the RegExp constructor, for example when the pattern is built from a variable at runtime.
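
One case where the constructor is needed is a pattern that is only known at runtime. The snippet below is a minimal sketch of that idea; containsWord and its parameters are hypothetical names used only for illustration, and the sketch assumes the word contains no special regex characters.

```javascript
// Hypothetical helper: the pattern comes from a variable,
// so it cannot be written as a /literal/.
// Assumption: `word` contains no special regex characters.
function containsWord(text, word) {
  const reg = new RegExp(word);
  return reg.test(text);
}

console.log(containsWord('This is a test', 'test')); // true
console.log(containsWord('Nothing here', 'test')); // false
```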

In the example above, test is our regex pattern: it defines what the regular expression will look for in another string. Now, let's understand how this pattern works.

Methods that use regular expressions

Now we need to talk quickly about some methods that use regular expressions. They all belong to either the String object or the RegExp object itself.

Object	Method	Description
RegExp	exec	Searches the string and returns null or an array with the matched text and any captured groups. The position of the match is available in the array's index property. E.g. /pattern/.exec(string)
RegExp	test	Searches the string and returns true if the pattern is found, or false if it is not. E.g. /pattern/.test(string)
String	match	Searches the string for the pattern. Returns null or an array. E.g. 'test'.match(/pattern/)
String	replace	Searches the string for the pattern and replaces the match with another string. E.g. '-test-'.replace(/pattern/, '')
String	search	Searches the string for the pattern and returns the position of the match, or -1 if nothing is found. E.g. 'test'.search(/pattern/)
String	split	Splits the string wherever the pattern matches and returns an array of substrings. E.g. '-test-'.split(/pattern/)
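
To make the table concrete, here is a short sketch that runs each method against the same string. The sample text and the variable names are made up for illustration.

```javascript
const text = 'one test, two tests';

// RegExp methods
const found = /test/.exec(text);
console.log(found[0], found.index); // "test" 4
console.log(/test/.test(text)); // true

// String methods
console.log(text.match(/test/)[0]); // "test"
console.log(text.replace(/test/, 'exam')); // "one exam, two tests"
console.log(text.search(/test/)); // 4
console.log(text.split(/, /)); // [ "one test", "two tests" ]
```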

In this article, we'll explain how to create RegExp patterns, without going too deep into each method.

Important to understand

Regex is something you learn best with practice, so to understand how the patterns I'm going to describe here work, I recommend that you open your browser console and test the commands described below. Use Ctrl + Shift + J (Windows) or Cmd + Option + J (Mac).

You can change the patterns to see whether true or false is returned. That way you can follow along with me, little by little, and skip to the next part once a topic is clear.

Checking if a string contains a substring

As you've seen above, you can split, replace, or find patterns within a string using regular expressions. In this article, we'll mostly use the test method to explain how these patterns work.

Let's say you want to check if a string contains the word "word".

By setting /word/, we create a RegExp object with the pattern "word", using the literal form. Then, with the test method we can verify if the pattern exists in a string, returning true or false:

/word/.test('words'); // ✅
/word/.test('This sentence contains words'); // ✅
/word/.test('Sentence without clear meaning'); // ❌

Substring at the beginning or end of the string

Let's say you need to check if a word exists at the beginning or end of a string.

Operator ^ (start)

To validate if a string begins with a certain pattern, use the circumflex accent ^:

/^hi/.test('hi, how are you?'); // ✅
/^hi/.test('...hi, how are you?'); // ❌

Operator $ (end)

To validate if a string ends with a certain pattern, use the $ operator:

/end$/.test('The end'); // ✅
/end$/.test('The end. Not quite yet...'); // ❌

Combining start and end of a string

To check that the whole string matches a pattern, with nothing before or after it, you can use ^ and $ in the same expression:

/^hi$/.test('hi'); // ✅
/^hi$/.test('hi, how are you?'); // ❌
/^hi$/.test('hi, how are you? hi'); // ❌

Note the last example: although the string starts and ends with "hi", the pattern requires the start to be immediately followed by "hi" and then by the end, so only the string "hi" passes.

The most common scenario for these operators is when we have something in the middle and we know how the string must start or end.

Flags

We can determine how a regular expression should be interpreted by flags. You can use more than one flag in the same expression.

For now, we won't worry too much about that, but see below what each flag is used for:

g Global: the search returns all matches. If this flag is not used, only the first match is returned.
i Case-insensitive: the search ignores the difference between upper and lower case. If this flag is not used, the letters must match in case as well.
m Multiline: affects only the behavior of the ^ and $ operators. If this flag is not used, they match only the start and end of the whole string; if used, they also match the start and end of each line.
s Allows the character . to match the line break as well.
u Enables full Unicode support. You can search for emojis, for example.

The use of flags takes place at the initialization of the regular expression object, in the same way as taught at the beginning of the article. The literal form:

/test/i.test('Test'); // ✅

or by initializing the RegExp object with the flag as the second parameter:

new RegExp('test', 'i').test('Test'); // ✅
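
As a rough illustration of the g, i, m, and s flags described above, the sketch below runs a few patterns against a made-up three-line string.

```javascript
const text = 'Cat\ncat\nCAT';

// Without g, match returns only the first occurrence
console.log(text.match(/cat/i)[0]); // "Cat"

// g + i: every occurrence, ignoring case
console.log(text.match(/cat/gi)); // [ "Cat", "cat", "CAT" ]

// m: ^ and $ also match at the start and end of each line
console.log(/^cat$/.test(text)); // false
console.log(/^cat$/m.test(text)); // true

// s: the dot also matches the line break
console.log(/Cat.cat/.test(text)); // false
console.log(/Cat.cat/s.test(text)); // true
```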

Character sets

This is one of the most important topics and it's used in most regular expressions. You can use brackets to create sets of characters. These sets can be a range from "0" to "9", or from "a" to "z", etc.

We can use sets to check if a string contains numbers from 0-9. Say I want to extract a number (age) from a string:

/[0-9]+/.exec('Age: 22')[0]; // "22"

Take a look at other examples:

/[1-5]/ // 1, 2, 3, 4, 5
/[0-9]/ // 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
/[ab]/ // a, b
/[a-d]/ // a, b, c, d
/[a-z]/ // a, b, c, d ... z
/[A-Z]/ // A, B, C ... Z

We can also combine sets:

/[0-9a-zA-Z]/.test('1'); // ✅
/[0-9a-zA-Z]/.test('A'); // ✅
/[0-9a-zA-Z#]/.test('#'); // ✅
/[0-9a-zA-Z]/.test('#'); // ❌
/[0-9a-zA-Z]/.test('Á'); // ❌

Negating sets

The ^ operator, as explained above, marks the start of a string. However, when it's the first character inside a character set, it negates that set. Example:

// Test if the string contains something that's not a number
/[^0-9]/.test('Test'); // ✅
/[^0-9]/.test('012345test'); // ✅
/[^0-9]/.test('012345'); // ❌

// Test if the string contains something that's not a number and also not from a-z
/[^0-9a-z]/.test('012345test!'); // ✅
/[^0-9a-z]/.test('012345test'); // ❌
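
A negated set is also handy with replace. The sketch below combines [^0-9] with the g flag to strip everything that is not a digit; the phone number is a made-up sample value.

```javascript
const phone = '(555) 123-4567';

// Replace every character that is NOT a digit with an empty string
const digitsOnly = phone.replace(/[^0-9]/g, '');

console.log(digitsOnly); // "5551234567"
```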

Metacharacters

Metacharacters are another feature that is widely used in regular expressions. They abbreviate certain sets and specify some special matches. In most of them, the backslash gives an ordinary letter a special meaning (\d is a digit, not the letter "d"). In front of a character that is already special, it does the opposite and makes the character literal (\. is a plain dot). Let's see:

. Matches any character except a line break. To match a literal dot (.), escape it as \.
\d Matches a digit. It's the same as [0-9].
\D Matches a non-digit character. It's the same as [^0-9].
\w Matches any alphanumeric character or _. Equivalent to [0-9a-zA-Z_].
\W Matches any character that is neither alphanumeric nor _. Equivalent to [^0-9a-zA-Z_].
\s Matches a whitespace character, such as a space, tab, or line break.
\S Matches any character that is not whitespace.
\n Matches a line break.
\t Matches a tab.
\0 Matches the NUL character.
\p{x} Matches a Unicode character whose property passed in "x" is true. Requires the use of the "u" flag.
\P{x} Matches the opposite of \p{x}.
[^] Matches any character, including line breaks (unlike .).
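
A few of these metacharacters in action, as a small sketch against a made-up string (it contains a space and a tab):

```javascript
const line = 'Order 66:\tshipped';

console.log(line.match(/\d\d/)[0]); // "66"
console.log(line.match(/\w\w\w\w\w/)[0]); // "Order"
console.log(line.replace(/\s/g, '_')); // "Order_66:_shipped"

// The dot matches any character; \. matches only a literal dot
console.log(/3.5/.test('3x5')); // true
console.log(/3\.5/.test('3x5')); // false
console.log(/3\.5/.test('3.5')); // true
```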

Some mnemonics to facilitate memorization:

As you can see, when the metacharacter is capitalized it always corresponds to the opposite of the character in lower case.
"." can be remembered as an ellipsis (...) that stands for anything.
"d" can be remembered as an abbreviation for digits - only numbers.
"w" can be remembered as an abbreviation for word - it's often used together with quantifiers to match words. Remember that it also matches numbers and _.
"s" can be remembered as an abbreviation for space.
"\p" can be remembered as an abbreviation for property. It was added in ES2018, and its usage is covered in the section on Unicode properties.

Quantifiers

If you want to test whether a string matches a pattern zero or N times, you will need to use quantifiers. Check out all quantifiers:

A? Corresponds to zero or one "A". It doesn't make much sense to use it alone with the test method, because zero occurrences also count as a match and the result is always true, but you'll see an example below where it makes more sense. The ? can be considered an operator that marks something as optional.

/A?/.test(''); // ✅
/A?/.test('A'); // ✅
/A?/.test('AAA'); // ✅
/A?/.test('B'); // ✅

A* Corresponds to zero or more "A"s:

/A*/.test(''); // ✅
/A*/.test('A'); // ✅
/A*/.test('AAA'); // ✅
/A*/.test('B'); // ✅

A+ Corresponds to at least one "A":

/A+/.test(''); // ❌
/A+/.test('A'); // ✅
/A+/.test('AAA'); // ✅
/A+/.test('B'); // ❌

A{x} Corresponds to exactly x consecutive "A"s. Note that the "A"s must appear in sequence, and that a longer run also passes because it contains a run of x:

/A{3}/.test('AA'); // ❌
/A{3}/.test('AAA'); // ✅
/A{3}/.test('AAAAA'); // ✅
/A{3}/.test('B'); // ❌
/A{3}/.test('ABAA'); // ❌

A{x,y} Corresponds to between x and y consecutive "A"s:

/A{2,3}/.test('A'); // ❌
/A{2,3}/.test('AA'); // ✅
/A{2,3}/.test('AAA'); // ✅
/A{2,3}/.test('AAAAA'); // ✅
/A{2,3}/.test('ABA'); // ❌

A{x,} Corresponds to x or more consecutive "A"s:

/A{3,}/.test('A'); // ❌
/A{3,}/.test('AA'); // ❌
/A{3,}/.test('AAA'); // ✅
/A{3,}/.test('AAAAA'); // ✅
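
Because test only reports whether a match exists, it hides how much of the string a quantifier actually consumed. The sketch below uses match instead to show the matched text for some of the quantifiers above.

```javascript
// ? and * can match "nothing", which is why test() returned true for 'B'
console.log('B'.match(/A?/)[0]); // ""
console.log('B'.match(/A*/)[0]); // ""

// Quantifiers take as many characters as they are allowed to
console.log('AAA'.match(/A*/)[0]); // "AAA"
console.log('BAAB'.match(/A+/)[0]); // "AA"
console.log('AAAAA'.match(/A{2,3}/)[0]); // "AAA"
console.log('AAAAA'.match(/A{3,}/)[0]); // "AAAAA"
```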

Examples with quantifiers

Now that you have a general notion of sets, metacharacters, and quantifiers, let's try to combine them. Let's say you want to validate the format of an IPv4 address:

/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test('127.0.0.1'); // ✅
/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test('127.0.0'); // ❌
/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test('127.0.0.1000'); // ❌
/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test('localhost'); // ❌

Note that we use a backslash on the dot, because . is a special character, so we need to "escape" it.
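
Keep in mind that this pattern only checks the shape of the address: a value such as 999.999.999.999 also passes. One possible way to handle that, sketched below, is to let the regex check the shape and plain code check the range. isValidIPv4 is a hypothetical helper name, not a built-in function.

```javascript
// Hypothetical helper: regex for the shape, plain code for the 0-255 range
function isValidIPv4(value) {
  const shape = /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/;
  if (!shape.test(value)) return false;
  // Each of the four parts must be a number no greater than 255
  return value.split('.').every((part) => Number(part) <= 255);
}

console.log(isValidIPv4('127.0.0.1')); // true
console.log(isValidIPv4('999.999.999.999')); // false
console.log(isValidIPv4('127.0.0')); // false
```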

To check that a phone number starts with one or more 9s followed by exactly eight digits, ignoring other characters to simplify:

/^9+[0-9]{8}$/.test('999998888'); // ✅
/^9+[0-9]{8}$/.test('899998888'); // ❌
/^9+[0-9]{8}$/.test('99998888'); // ❌

To match an HTML span tag and its content:

/<span>.*<\/span>/.test('<span>Test</span>'); // ✅

Note that the forward slash must be escaped with a backslash so that it isn't read as the end of the regular expression. See below how these escapes work.

Escaping special characters

These are the characters that have a special meaning and need to be "escaped" with a backslash when you want to match them literally:

\ The backslash serves to escape other characters. Use \\ if you want to match "\".
/ Begins and ends a regular expression literal.
[ ] Defines a set.
{ } Defines a quantifier, and also wraps the property name in \p{x}.
( ) Defines a group.
? + * Quantifiers.
| OR operator.
. Wildcard.
^ Defines the start of a pattern and also serves to negate a set.
$ Defines the end of a pattern.

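It's usually not necessary to escape special characters in sets

Usually, when we want to match a special character literally, we escape it with \. Inside sets that's mostly unnecessary. For example, to match a digit from 0-9, a period, or a comma, you can use [0-9.,] without any problem. Outside of a set, the dot must be escaped: [0-9]\., matches a digit followed by a dot and a comma. The exceptions inside a set are ], \, ^ (when it's the first character), and - (when it sits between two characters), which still need a backslash to be matched literally.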
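
Escaping matters most when a pattern is built from text you don't control. The sketch below shows one common way to do it: a small helper that puts a backslash in front of every special character before the text is passed to the RegExp constructor. escapeRegExp is a hypothetical name for illustration, not a built-in function, and $& in the replacement stands for "the text that was matched".

```javascript
// Hypothetical helper: prefixes every special character with a backslash.
// In the replacement string, $& is the matched character itself.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = '1+1=2?';

// Unescaped, + and ? are read as quantifiers, so the literal text is not found
console.log(new RegExp(userInput).test('Is 1+1=2? Yes')); // false

// Escaped, every character is matched literally
console.log(new RegExp(escapeRegExp(userInput)).test('Is 1+1=2? Yes')); // true
```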

Groups

Using parentheses, you can create groups of patterns. Let's say you want to validate a string that may have two formats, "No 1234" or "1234":

/^(No\s)?[0-9]{4}$/.test('No 1234'); // ✅
/^(No\s)?[0-9]{4}$/.test('1234'); // ✅
/^(No\s)?[0-9]{4}$/.test('Number 1234'); // ❌

By grouping the string "No " and adding the quantifier "?" (optional), the whole group becomes optional.

When you want to add an optional character only, it's not necessary to create groups. For example, to check the plural on "dog" or "dogs".

/dogs?/.test('dog'); // ✅
/dogs?/.test('dogs'); // ✅

In the pattern above, "s?" matches an optional "s".

Another advantage of using groups is being able to simplify repeated patterns. For example:

/(ha){2,}/.test('hahahaha'); // ✅

The regular expression above matches a laugh of two or more "ha"s. Without groups, a quantifier can only repeat a single character, so there would be no way to repeat "ha" an arbitrary number of times.

By removing the group, the quantifier applies only to the preceding character:

/ha{2,}/.test('hahaha'); // ❌
/ha{2,}/.test('haaaaa'); // ✅

Capturing groups

One of the key advantages of using groups is that we can capture their content.

So far, we've used the test method to validate groups and other expressions, but if you use the exec or the String.match method, you can capture the contents of a group.

Say you want to read a string containing city and state, and you want to separate the city and state into two different matches:

/(.+) - ([\w]{2})/.exec('California - CA'); // [ "California - CA", "California", "CA" ]

First, we match any character (.) at least once (+), then the literal characters " - " as a delimiter, and then \w exactly 2 times to match the 2 letters of the state.

Note that the returned array holds the complete match at the first index, the first group at the second index, and so on.
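
Since the result is an ordinary array, the groups can be picked apart with destructuring. A small sketch follows, with variable names chosen only for illustration. Remember that exec returns null when nothing matches, so check the result before destructuring it.

```javascript
const pattern = /(.+) - (\w{2})/;

const result = pattern.exec('California - CA');
if (result !== null) {
  // First item: the complete match; then one item per group
  const [full, city, state] = result;
  console.log(full); // "California - CA"
  console.log(city); // "California"
  console.log(state); // "CA"
}

// No match: exec returns null instead of an array
console.log(pattern.exec('California')); // null
```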

Group reference + replacements

Groups are also very useful in replacements, where the replacement string can refer to what each group captured using $1, $2, $3, and so on.

Let's say you need to capture a numerical value in a string and present it in monetary value. You can do that pretty easily with regex:

'The value is 500'.replace(/(\d+)/, 'U$ $1'); // "The value is U$ 500"
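
The same idea works with several groups, and the references can appear in any order. As a sketch, the following rearranges a date from year-month-day to day/month/year; the sample dates are made up.

```javascript
const iso = '2019-03-05';

// $1 = year, $2 = month, $3 = day
const formatted = iso.replace(/(\d{4})-(\d{2})-(\d{2})/, '$3/$2/$1');
console.log(formatted); // "05/03/2019"

// With the g flag, every date in the text is rewritten
const text = 'From 2019-03-05 to 2019-04-10';
console.log(text.replace(/(\d{4})-(\d{2})-(\d{2})/g, '$3/$2/$1'));
// "From 05/03/2019 to 10/04/2019"
```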

Ignoring groups

When we define a group, its content is automatically captured and can be referenced with $1, $2, etc. If you want a group that is not captured, start it with the ?: syntax, as in (?: ... ). For example:

// Without ignoring
/(https?):\/\/(.*)/.exec('https://ricardometring.com'); // [ "https://ricardometring.com", "https", "ricardometring.com" ]
// Ignoring the protocol
/(?:https?):\/\/(.*)/.exec('https://ricardometring.com'); // [ "https://ricardometring.com", "ricardometring.com" ]

Named groups

Groups may also have names to facilitate string manipulation. This feature was added in the ES2018 version of JavaScript.

Let's say you want to know the day, month, and year of a date. Use the following syntax to name each group:

const date = /(?<month>\d{2})\/(?<day>\d{2})\/(?<year>\d{4})/.exec('03/05/2019');
date.groups.day; // "05"
date.groups.month; // "03"
date.groups.year; // "2019"
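
Named groups can also be used in replacements, where they are referenced as $<name> instead of $1, $2, and so on. A minimal sketch, reusing the same date pattern:

```javascript
const datePattern = /(?<month>\d{2})\/(?<day>\d{2})\/(?<year>\d{4})/;

// Reference each group by name in the replacement string
const isoDate = '03/05/2019'.replace(datePattern, '$<year>-$<month>-$<day>');
console.log(isoDate); // "2019-03-05"

// The groups object can be destructured like any other object
const { day, month, year } = datePattern.exec('03/05/2019').groups;
console.log(day, month, year); // "05" "03" "2019"
```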

Group reference within the pattern

As you've seen in group replacements, you can reference groups using $1, $2, and so on.

You can also reference the first group previously specified within the regular expression itself using \1. This can be a little hard to understand, so here's an example:

Imagine you need to test a string that contains either single or double quotes, and return its contents:

/['"](.*)['"]/.exec('<img src="test.jpg">'); // [ '"test.jpg"', 'test.jpg' ]

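Everything ok so far: the first group returned the string inside the double quotes. But the pattern is fragile. The greedy .* runs up to the last quote in the string, so a tag with two quoted attributes would come back as one single match. And if we switch to .*? (which matches as few characters as possible) so that it stops at the first closing quote, it stops at any quote character: for <img title="Ricardo's site"> the match would be cut off at the apostrophe.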

Instead, we need to encapsulate the quotes inside a group and use it as a reference when closing the quotes, so the expression won't close with the single quote in the middle of the string. E.g.

/(['"])(.*?)\1/.exec(`<img title="Ricardo's site">`); // [ `"Ricardo's site"`, `"`, `Ricardo's site` ]

Note that we used \1 to refer to the first group ['"], so no matter whether single or double quotes were used for the title attribute, the expression will only close when it matches with the character from the same group that opened it, in this case, ".

It's also possible to reference a group by name using \k<name>. Example:

/(?<quotes>['"])(.*?)\k<quotes>/.exec(`Company: "Ricardo's Pizzaria" LLC`); // [ `"Ricardo's Pizzaria"`, `"`, `Ricardo's Pizzaria` ]
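
Another place where a reference inside the pattern helps is checking that a closing HTML tag has the same name as the opening one. This is only a sketch for simple, non-nested tags without attributes; tagPattern is a name made up for the example.

```javascript
// \1 must repeat exactly what the first group (the tag name) captured
const tagPattern = /<(\w+)>.*<\/\1>/;

console.log(tagPattern.test('<b>bold</b>')); // true
console.log(tagPattern.test('<b>bold</i>')); // false
console.log(tagPattern.exec('<em>text</em>')[1]); // "em"
```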

Group nesting

Groups can be nested, just like function calls, using parentheses within parentheses. The order of the groups is defined from left to right, according to where each parenthesis opens. Let's say you want to read a string containing an <a> tag, and return the href attribute, the link contained in href, and the contents of the tag:

'<a href="https://ricardometring.com">Website</a>'.match(/<a (href="(.*)")>(.*)<\/a>/);
// [ '<a href="https://ricardometring.com">Website</a>', 'href="https://ricardometring.com"', 'https://ricardometring.com', 'Website' ]

OR operator (|)

If you want to choose between one pattern or another, use the | operator. E.g:

/whiskey|vodka/.test('whiskey'); // ✅
/whiskey|vodka/.test('vodka'); // ✅
/whiskey|vodka/.test('pinga'); // ❌

The OR operator also works within groups and between groups:

/(whiskey|vodka)|(pop|juice)/.test('whiskey'); // ✅
/(whiskey|vodka)|(pop|juice)/.test('pop'); // ✅
/(whiskey|vodka)|(pop|juice)/.test('pinga'); // ❌
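
Groups also limit how far the | operator reaches. Without a group, the alternatives are everything to the left and everything to the right of the |, including any ^ and $. The sketch below shows the difference with two made-up patterns.

```javascript
// Without a group: "starts with cat" OR "ends with dog"
console.log(/^cat|dog$/.test('catalog')); // true

// With a group: the whole string must be "cat" or "dog"
console.log(/^(cat|dog)$/.test('catalog')); // false
console.log(/^(cat|dog)$/.test('dog')); // true

// A typical use: accept only certain file extensions
console.log(/\.(jpg|png)$/.test('photo.png')); // true
console.log(/\.(jpg|png)$/.test('photo.pdf')); // false
```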

Lookahead

Lookahead literally means "look forward". It tests whether a position in the string is followed by a pattern, without including that pattern in the match. It's defined by ?=. Example:

To test whether the letter A is followed by the letter B:

/A(?=B)/.test('AB'); // ✅
/A(?=B)/.test('AZ'); // ❌
/A(?=B)/.test('BA'); // ❌

It's also possible to negate the above expression using ?!. Example:

/A(?!B)/.test('AB'); // ❌
/A(?!B)/.test('BA'); // ✅
/A(?!B)/.test('ABA'); // ✅ The first A is followed by B, but the last A is not
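
The pattern inside a lookahead is checked but is not part of the result, which becomes visible when match is used instead of test. The sketch below shows that, plus a common use of several lookaheads in a row. The password rule (at least 8 characters, one digit, one lowercase letter) is a made-up example, not a recommendation.

```javascript
// The lookahead requires "px" but leaves it out of the match
console.log('12px 5em'.match(/\d+(?=px)/)[0]); // "12"
console.log('12px 5em 30px'.match(/\d+(?=px)/g)); // [ "12", "30" ]

// Hypothetical rule: 8+ characters, at least one digit and one lowercase letter.
// Each lookahead starts checking from the beginning of the string.
const passwordRule = /^(?=.*\d)(?=.*[a-z]).{8,}$/;
console.log(passwordRule.test('abc12345')); // true
console.log(passwordRule.test('abcdefgh')); // false
console.log(passwordRule.test('a1')); // false
```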

Lookbehind

Lookbehind does a "look back." It works the same way as lookahead, only the other way around. Use ?<=:

/(?<=A)B/.test('ABC'); // ✅
/(?<=A)B/.test('BA'); // ❌

To negate lookbehind, use ?<!:

/(?<!A)B/.test('ABC'); // ❌
/(?<!A)B/.test('BA'); // ✅
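
As with lookahead, the text inside a lookbehind is required but not returned. A short sketch with a made-up sentence:

```javascript
const sentence = 'Room 42 costs $99';

// Digits that come right after a dollar sign (the $ is not in the result)
console.log(sentence.match(/(?<=\$)\d+/)[0]); // "99"

// First run of digits that does NOT come right after a dollar sign
console.log(sentence.match(/(?<!\$)\d+/)[0]); // "42"
```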

Interesting Unicode properties

Some properties recently added in the ES2018 version of JavaScript allow you to test the Unicode character properties of your string.

You can do this by using the metacharacter \p or \P (negated).

It's important to remember that since this is a Unicode metacharacter property, the "u" flag needs to be used when starting your regular expression.

To test if a Unicode string contains one or more uppercase characters:

/\p{Uppercase}+/u.test('UPPERCASE'); // ✅
/\p{Uppercase}+/u.test('Uppercase'); // ✅
/\p{Uppercase}+/u.test('lowercase'); // ❌

And here is the negated form of the above expressions, using \P. Note that "Uppercase" still returns true: although the string contains a capital letter, it also contains lowercase characters, which are exactly what \P{Uppercase} matches.

/\P{Uppercase}+/u.test('UPPERCASE'); // ❌
/\P{Uppercase}+/u.test('Uppercase'); // ✅
/\P{Uppercase}+/u.test('lowercase'); // ✅

Other examples using \p:

/^\p{Lowercase}$/u.test('a'); // ✅
/^\p{Lowercase}$/u.test('A'); // ❌
/^\p{Emoji}$/u.test('😀') // ✅
/^\p{Emoji}$/u.test('A') // ❌
/^\p{Script=Arabic}+$/u.test('صفيحة'); // ✅
/^\p{Script=Hebrew}+$/u.test('ואיה'); // ✅
/^[\p{Script=Latin}\s]+$/u.test('Party in the AP'); // ✅ (the set also allows whitespace, which is not Latin script)
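
Unicode properties also solve the problem of accented and non-Latin letters, which ranges like [a-zA-Z] leave out. The sketch below uses the general property \p{L} (any letter); the sample strings are made up.

```javascript
// [a-zA-Z] knows nothing about letters outside the basic Latin alphabet
console.log(/^[a-zA-Z]+$/.test('Straße')); // false

// \p{L} matches a letter from any script
console.log(/^\p{L}+$/u.test('Straße')); // true

// Extract every run of letters, whatever the script
console.log('Straße 你好 42'.match(/\p{L}+/gu)); // [ "Straße", "你好" ]
```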