Regular Expressions

7. Character Classes

The [] characters are used in regular expressions to define what is known as a character class. A character class lists a set of characters, any one of which may occur at that position in the text for a match to be found.

In its simplest form it takes the place of multiple "or" conditions, so for example the following two statements are equivalent:

var regexp = /a|b|c|d/;
var regexp2 = /[abcd]/;

In each case any one of the alternate characters will be considered a match.
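As a quick check of that equivalence, the following sketch runs both patterns against a handful of sample strings. The sample strings and the variable names are made up for illustration and are not part of the original examples.

```javascript
// Illustrative sample strings (assumptions, chosen only for this demo).
var samples = ["a", "d", "cab", "xyz", ""];

var alternation = /a|b|c|d/; // "or" conditions
var charClass = /[abcd]/;    // character class

// Record what each pattern reports for every sample.
var results = samples.map(function (s) {
  return {
    text: s,
    alternation: alternation.test(s),
    charClass: charClass.test(s)
  };
});

// Both columns should agree on every row.
console.log(results);
```

Neither pattern is anchored, so a string such as "cab" counts as a match because it contains at least one of the listed characters.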

There is a lot more to character classes than this though. We can substitute a character class for a single character in a longer expression like this:

var regexp = /[bcfh]at/i;

This regular expression will now match any of the following (the i flag means that capitalization is ignored as well):

  • bat
  • cat
  • fat
  • hat
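The list above can be tried out with a short sketch like the one below. The word list is an assumption added for the demonstration; it includes a couple of extra words to show where the pattern does and does not match.

```javascript
var regexp = /[bcfh]at/i;

// Hypothetical test words: mixed case, one non-match, one longer word.
var words = ["bat", "Cat", "FAT", "hat", "mat", "that"];

// Keep only the words in which the pattern is found.
var matched = words.filter(function (w) {
  return regexp.test(w);
});

console.log(matched); // ["bat", "Cat", "FAT", "hat", "that"]
```

Note that "that" is reported as a match because it contains "hat"; the expression is not anchored to the start or end of the text. "mat" is rejected because m is not in the class.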

We can also set up a negated character class, where any character is allowed except for the ones specified in the class. To do this we place a ^ at the front of the class definition like this:

var re = /[^cr]ot/;

Now the regular expression will match any three characters ending with 'ot' unless the first character is one of the two excluded ones. So hot, lot, not, and tot will be considered matches while cot and rot will not.
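A small sketch makes the behaviour of the negated class easier to see. The extra test words (Cot, 9ot, robot, ot) are assumptions added here to probe the edges; they do not come from the original examples.

```javascript
var re = /[^cr]ot/;

// Hypothetical words used to probe the negated class.
var words = ["hot", "lot", "not", "tot", "cot", "rot", "Cot", "9ot", "robot", "ot"];

var matched = words.filter(function (w) { return re.test(w); });
var rejected = words.filter(function (w) { return !re.test(w); });

console.log(matched);  // ["hot", "lot", "not", "tot", "Cot", "9ot", "robot"]
console.log(rejected); // ["cot", "rot", "ot"]
```

A few points follow from this. The negated class still has to match one character, so "ot" on its own is rejected. It excludes only the exact characters listed, so without the i flag an uppercase C is allowed, as is a digit. And since the expression is not anchored, "robot" matches because it contains "bot".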

Since the letters of the alphabet and the digits occur in a particular sequence, we can simplify a character class that covers such a sequence by placing a hyphen between the first and last characters in the range. The following definition is equivalent to the first two that we looked at on this page and is even shorter.

var regexp3 = /[a-d]/;
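The sketch below compares the range form with the fully listed form and shows that the same idea works for digits and for several ranges in one class. The digit and hexDigit patterns and the sample strings are illustrative assumptions rather than examples from the text above.

```javascript
var range = /[a-d]/;
var listed = /[abcd]/;

// Check each letter from a to g against both forms of the class.
var letters = "abcdefg".split("");
var sameResult = letters.every(function (ch) {
  return range.test(ch) === listed.test(ch);
});
console.log(sameResult); // true

// Ranges work for digits too, and one class can hold several ranges.
var digit = /[0-9]/;
var hexDigit = /[0-9a-f]/i;

console.log(digit.test("room 7"));  // true
console.log(digit.test("seven"));   // false
console.log(hexDigit.test("E"));    // true
console.log(hexDigit.test("G"));    // false
```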

Of course we can also escape the [ and ] characters with a backslash in order to test for them in the text, like this (which will match either [ or ]):

var re = /[\[\]]/;
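As a rough illustration, the escaped class can be used both to test for a bracket and, with the g flag added, to collect every bracket in a string. The sample strings and the allBrackets name are assumptions made for this sketch.

```javascript
var re = /[\[\]]/;

console.log(re.test("items[0]")); // true
console.log(re.test("items"));    // false

// The same class with the g flag, so match() returns every bracket found.
var allBrackets = "a[1][2]".match(/[\[\]]/g);
console.log(allBrackets); // ["[", "]", "[", "]"]
```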

