Ken Ward's Java Script Tutorial ...

Regular Expression - Simple Usage

Regular expressions aren't really that difficult, even though they wouldn't win any beauty contests. We all know some of them, such as the "*" and the "?" which are used in Windows searches.

This page was originally written for the PHP tutorial and has been modified for JavaScript.

When using these in JavaScript, the regular expression is enclosed in forward slashes, and the whole is NOT enclosed in quotes.

Use this page to try out regular expressions in JavaScript. It is probably better to experiment with your own regular expressions than only to read about them. Try out the examples on this page in the Regular Expression Checker.

Usage

Note that a regular expression, such as /o/, is NOT a string, so it is not enclosed in quotes. When a user enters a regular expression in a form, the form value arrives as a string and must be changed into a regular expression object before it can be used. For example:

str="hello folks";

re="/o/"; //this is a string, so it won't work

re=eval(re); //change it to an object

check=str.replace(re,"a");

alert (check);
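As an alternative to eval, which runs whatever code the user typed, the built-in RegExp constructor can build the object from the pattern text. The following is only a minimal sketch: it assumes the form value holds the pattern without the surrounding slashes, and the name userPattern is made up for this illustration. It uses console.log rather than alert so that it can also be run outside a browser.

```javascript
// Hypothetical form value: the pattern text, without the slashes.
const userPattern = "o";

// new RegExp(pattern) builds the same object the literal /o/ would.
// A second argument, such as "g" or "i", would supply flags.
const reFromUser = new RegExp(userPattern);

const replaced = "hello folks".replace(reFromUser, "a");
console.log(replaced); // "hella folks"
```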

match

str="hello folks";

re=/o/;

check=str.match(re);

alert (check);

//result is "o"

//otherwise it returns null
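To see a little more of what match hands back, here is a short sketch. Without the g flag the result is an array whose first element is the matched text, and which also carries the position of the match in its index property. The variable names are chosen only for this example.

```javascript
const found = "hello folks".match(/o/);
console.log(found[0]);    // "o"
console.log(found.index); // 4

// No "z" in the string, so match returns null rather than an array.
const missing = "hello folks".match(/z/);
console.log(missing);     // null
```

Because a failed match gives null, it is worth checking the result before reading found[0].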

replace

str="hello folks";

re=/o/;

check=str.replace(re,"a");

alert (check);

// result is "hella folks"

search

search can be used to find the position of the first occurrence of a match:

str="hello folks";

re=/o/;

check=str.search(re);

document.write (check);

//result is 4
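As a small sketch of both outcomes: search gives the position of the first match, or -1 when the pattern is not found anywhere. The variable name text is made up for this example.

```javascript
const text = "hello folks";

const firstO = text.search(/o/);
console.log(firstO); // 4

// There is no "z", so search reports -1.
const noZ = text.search(/z/);
console.log(noZ);    // -1
```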

Flags (i, g, m)

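The i flag makes the search case-insensitive. The g flag makes the search global, finding all matches and not just the first. The m flag makes the search multi-line: ^ and $ then match at the start and end of every line, and not only at the start and end of the whole string.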

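/rea/ matches the "rea" in bread

/rea/i matches the "ReA" in bReAd

/rea/g matches the "rea" in the first and last words of bread bREAd bread

/rea/gi matches all three of "rea", "REA" and "rea" in bread bREAd bread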
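The flags can be tried out with a sketch like the one below. The sample strings and variable names are only for illustration; the two-line string is there to show what the m flag changes.

```javascript
const sample = "bread bREAd bread";

// g alone: every lower-case "rea".
const globalOnly = sample.match(/rea/g);
console.log(globalOnly);        // ["rea", "rea"]

// g and i together: every "rea", whatever the case.
const globalAnyCase = sample.match(/rea/gi);
console.log(globalAnyCase);     // ["rea", "REA", "rea"]

// m: ^ also matches just after a line break.
const twoLines = "one\ntwo";
const withoutM = /^two/.test(twoLines);
const withM = /^two/m.test(twoLines);
console.log(withoutM, withM);   // false true
```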

Period, Asterisk, Plus and Question Mark

The Period

The period (.) matches any single character (except a line break).

So "bread read ted ba a".match(/.b./) matches " ba": the space, the "b" and the "a". It doesn't match the "b" in bread, because there is no character before that "b". Each period must match exactly one character; it cannot match nothing.
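A short sketch, using strings made up for this example, shows that each period has to consume exactly one character:

```javascript
// One character, then "b", then one more character.
const dotHit = "rabbit".match(/.b./);
console.log(dotHit[0]);     // "abb"
console.log(dotHit.index);  // 1

// The only "b" in "bread" is the first character, so nothing precedes it.
const leadingB = /.b/.test("bread");
console.log(leadingB);      // false

// By default the period does not match a line break.
const acrossLines = /a.b/.test("a\nb");
console.log(acrossLines);   // false
```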

The Asterisk (*)

The asterisk (*) matches zero or more of the preceding item.

"bread read ted ba a".match(/b*d/) matches the "d" in bread. The "b" at the start of bread is not directly followed by a "d", so the first place the pattern fits is zero b's followed by the "d".

/cat*/g matches all the cats in "cat catamaran cater concatenate".

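A pattern cannot begin with a bare asterisk, and /* would in any case look like the start of a comment to JavaScript. To match a literal asterisk, escape it with a backslash and write /\*/. If the pattern is held in a string (as it is when it comes from a form), the backslash itself must be doubled: "\\*".

A literal question mark must be escaped in the same way: /\?/.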
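The sketch below tries the asterisk both as a quantifier and as a literal character. In a regular expression literal a single backslash is enough to escape it; the doubled backslash is only needed when the pattern is written as a string. The sample strings are made up for this example.

```javascript
// As a quantifier: "ca" followed by zero or more t's.
const zeroOrMoreT = "ca cat catt".match(/cat*/g);
console.log(zeroOrMoreT);      // ["ca", "cat", "catt"]

// As a literal asterisk inside a regular expression literal.
const literalStar = "2*3".match(/\*/)[0];
console.log(literalStar);      // "*"

// The same pattern built from a string needs the backslash doubled.
const fromString = new RegExp("\\*").test("2*3");
console.log(fromString);       // true
```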

The Plus (+)

The plus (+) matches one or more of the preceding character. It is like the asterisk (*), but must have at least one character present. The asterisk gives us these results:

str='ca at ct c cat catatat'
re=/c(at)*/g
re is: /c(at)*/g
str.match(/c(at)*/g) gives: "c,c,c,cat,catatat"

However, the plus gives us:

str='ca at ct c cat catatat'
re=/c(at)+/g
re is: /c(at)+/g
str.match(/c(at)+/g) gives: "cat,catatat"
str.search(/c(at)+/g) gives the index of the first occurrence: 11

That is, there must be at least one preceding item to give a match. Note that the parentheses in (at) before the plus mean that the whole of their contents is treated as one item. In this case, "at" must occur at least once for a match.
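To see what the grouping changes, this sketch runs the plus with and without it on a string made up for the purpose. Without the group, the plus applies only to the "t".

```javascript
const plusSample = "catt catat";

// No group: "ca" followed by one or more t's.
const plusOnT = plusSample.match(/cat+/g);
console.log(plusOnT);      // ["catt", "cat"]

// Grouped: "c" followed by one or more whole "at"s.
const plusOnGroup = plusSample.match(/c(at)+/g);
console.log(plusOnGroup);  // ["cat", "catat"]
```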

The Question Mark (?)

The question mark (?) makes the preceding item optional: it matches zero or one of it. Unlike the period, which must always match exactly one character, it can match nothing at all.

In summary, then, the asterisk (*) matches 0 or more of the preceding items, and the plus (+) matches 1 or more of the preceding items.

The question mark matches 0 or 1 of the preceding item, whilst the period (.) is not a quantifier at all and always matches exactly one character.

For instance

The question mark finds 5 c's. Here it makes the preceding dot optional, so a "c" matches whether or not there is a character before it:

str='ca at ct c cat catatat'
re=/.?c/g
re is: /.?c/g
str.match(/.?c/g) gives: "c, c, c, c, c"
str.search(/.?c/g) gives the index of the first occurrence: 0

Whilst the dot gives us (only 4 c's):

str='ca at ct c cat catatat'
re=/.c/g
re is: /.c/g
str.match(/.c/g) gives: " c, c, c, c"
str.search(/.c/g) gives the index of the first occurrence: 5

The question mark makes the preceding item optional (zero or one of it), but the dot must always match exactly one character, which is why /.c/ misses the "c" at the very start of the string.
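A common use of the question mark is to allow for an optional letter. The sketch below uses a spelling example made up for this page, and also shows a literal question mark being matched with a backslash in front of it.

```javascript
// The "u" may appear once or not at all.
const spellings = "color colour".match(/colou?r/g);
console.log(spellings);       // ["color", "colour"]

// Two u's are one too many for "zero or one".
const twoUs = /colou?r/.test("colouur");
console.log(twoUs);           // false

// A literal question mark is escaped with a backslash.
const whereIsMark = "why?".search(/\?/);
console.log(whereIsMark);     // 3
```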

The Beginning (^) and the End ($), and the Character Class ([])

Character Class ([])

Well, let's start at the end with the Character Class. A character class matches any one of the characters listed between the square brackets. For instance, /[0-9]/ matches any digit, and /[aeiou]/ matches any lower-case vowel. /[a-z]/ matches any lower-case letter, and /[A-Z]/ matches any upper-case letter.

Selecting strings beginning with ...

The caret (^) is used to match strings beginning with certain characters. So /^[A-Z]/ matches a capital letter at the beginning of the string. So this will match the B in Band0. If we add a dot (.) like this, /^[A-Z]./, it will match any string beginning with a capital letter that has any character after it. So this matches Ba in Band0.

/^[A-Z].+/ will match the whole word Band0.

Selecting strings ending with...

The dollar sign ($) is used to select matches at the end of a string. I suppose the dollar sign is used here because in the end you have to pay (!!!).

/[A-Z]$/ will match a string ending in a capital letter. So it matches the D in BanD, but does not match in Band.

The next example is one of beginning and ending with a particular pattern.

str='Bill has a coat'
re=/^[A-Z].+(at)$/g
re is: /^[A-Z].+(at)$/g
str.match(/^[A-Z].+(at)$/g) gives: "Bill has a coat"
str.search(/^[A-Z].+(at)$/g) gives the index of the first occurrence: 0
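The beginning and ending examples above can be checked with a sketch like this one. It uses the same patterns and the strings Band0, BanD and Band from the text; the variable names are only for illustration.

```javascript
// Anchored at the beginning of the string.
const firstLetter = "Band0".match(/^[A-Z]/)[0];
const firstTwo = "Band0".match(/^[A-Z]./)[0];
const wholeThing = "Band0".match(/^[A-Z].+/)[0];
console.log(firstLetter, firstTwo, wholeThing); // "B" "Ba" "Band0"

// Anchored at the end of the string.
const endsUpper = /[A-Z]$/.test("BanD");
const endsLower = /[A-Z]$/.test("Band");
console.log(endsUpper, endsLower);              // true false
```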

Selecting strings beginning with ... and ending with ...

/^[0-9].+[.]$/ will match the string 1band., but not band;.

Special characters become normal characters in a Character Class

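The full stop, like most special characters, is no longer a special character inside a Character Class. (The exceptions are the backslash, the closing square bracket, a caret at the start and a hyphen between two characters.) So in the example, the dot in [.] matches a dot only, whereas the dot in ".+" matches any character (and is a special character here).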

To continue with beginning and ending...

/^[A-Z].+[.]$/ will match a string beginning with a capital letter and ending with a full stop (.). So it will match "Band.", but not "Band;", or "band.".
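Here is a small sketch of the difference between a dot inside and outside a character class, using the pattern and the strings just mentioned. The name sentenceRe is made up for this example.

```javascript
const sentenceRe = /^[A-Z].+[.]$/;

console.log(sentenceRe.test("Band.")); // true
console.log(sentenceRe.test("Band;")); // false: does not end with a full stop
console.log(sentenceRe.test("band.")); // false: does not begin with a capital

// Inside the class the dot is literal; outside it matches any character.
const literalDot = /[.]/.test("abc");
const anyChar = /./.test("abc");
console.log(literalDot, anyChar);      // false true
```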

Matching a given number of times {2}

/(\.).{3}/ will match a string containing a period (.) followed by three characters. So it will match ".com" in ".com", or ".comm", but not ".co". It will also accept ".007". (Note the dot (.) has been escaped to make it a real dot and not a special character (\.))

/(\.)[a-zA-Z]{3}$/ will accept only strings ending with a dot and 3 letters, either lower- or upper-case.

/(\.)[a-zA-Z]{2,5}$/ will accept an ending consisting of letters beginning with a full stop (.) and having between 2 and 5 letters. So it will accept "my.co", "my.commm", but not "my.commmm".
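The counts in curly brackets can be tried out as follows. This sketch uses the patterns and strings from the text above; the variable names are only for illustration.

```javascript
// A dot followed by exactly three characters of any kind.
const dotThree = /(\.).{3}/;
console.log(".comm".match(dotThree)[0]); // ".com"
console.log(dotThree.test(".co"));       // false
console.log(dotThree.test(".007"));      // true

// A dot followed by 2 to 5 letters at the end of the string.
const ending = /(\.)[a-zA-Z]{2,5}$/;
console.log(ending.test("my.co"));       // true
console.log(ending.test("my.commm"));    // true  (5 letters)
console.log(ending.test("my.commmm"));   // false (6 letters)
```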

Saying what you don't want [^...

The caret (^) serves to say what the beginning should be, but inside a character class, it means that none of the characters in the class are allowed in a match.

/^[^aeiou]./ will select strings beginning with anything except a lower-case vowel, followed by at least one more character.

In the above example, the first caret says that this is a match for the beginning, and the one inside the character class says the contents are excluded.
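A quick sketch of the two carets at work, with test words made up for this example:

```javascript
const notVowelFirst = /^[^aeiou]./;

console.log(notVowelFirst.test("band"));  // true
console.log(notVowelFirst.test("apple")); // false: begins with a lower-case vowel
console.log(notVowelFirst.test("Apple")); // true: a capital A is not in the class
console.log(notVowelFirst.test("b"));     // false: the dot needs a second character
```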

The Pipes, or giving a choice of match

You can use alternatives in regular expressions. The sign to use is called a pipe (|). The following expression:

/^[a-z]|[0-9]/ matches strings that begin with a lower-case letter, or that contain a digit anywhere. The pipe (|) has the lowest precedence, so the caret applies only to the first alternative. To anchor both alternatives, group them: /^([a-z]|[0-9])/.

/(A|b)$/ matches any string ending with a capital A or a lower-case b.
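One thing worth checking with the pipe is how far each alternative reaches. The pipe splits the whole pattern unless parentheses limit it, so a caret written before the first alternative does not apply to the second. The sketch below compares the two forms; the names loose and grouped are made up for this example.

```javascript
// Without a group: "begins with a-z" OR "contains a digit anywhere".
const loose = /^[a-z]|[0-9]/;
console.log(loose.test("Band0"));    // true, because of the 0 at the end

// With a group: the caret applies to both alternatives.
const grouped = /^([a-z]|[0-9])/;
console.log(grouped.test("Band0"));  // false
console.log(grouped.test("band"));   // true
console.log(grouped.test("0band"));  // true

// The ending example from the text.
const endsAorB = /(A|b)$/;
console.log(endsAorB.test("club"));  // true
console.log(endsAorB.test("clubs")); // false
```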

Email Checker

The following is an example of an email checker. There are many such regular expressions, and none of them work with all email addresses.

/^[a-zA-Z0-9][a-zA-Z0-9-_\s]+@[a-zA-Z0-9-\s].+\.[a-zA-Z]{2,5}$/
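To get a feel for what this expression accepts and rejects, it can be wrapped in a small function and tried on a few addresses. This is only a sketch: the function name looksLikeEmail and the sample addresses are made up for this example, and the results show some of the limits mentioned above.

```javascript
// The email pattern from the text, unchanged.
const emailRe = /^[a-zA-Z0-9][a-zA-Z0-9-_\s]+@[a-zA-Z0-9-\s].+\.[a-zA-Z]{2,5}$/;

// Hypothetical helper: true if the text fits the pattern.
function looksLikeEmail(text) {
  return emailRe.test(text);
}

console.log(looksLikeEmail("ken@example.com"));      // true
console.log(looksLikeEmail("no-at-sign.com"));       // false
// Limits: a one-letter name before the @ is rejected,
// and a space before the @ is accepted because of the \s.
console.log(looksLikeEmail("a@example.com"));        // false
console.log(looksLikeEmail("ken ward@example.com")); // true
```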