Regular Expressions

15. Back References

In the previous tutorial we introduced grouping, which lets us test for multiple occurrences of a sequence of characters as a unit. Groups are far more powerful than that. By default, the text matched by each group is also captured and stored in an array (called the backreference array), and we can refer to the entries in that array later in the regular expression to require an exact match of the captured text.

Suppose, for example, that we have the group /(\bth[ae]n\b)/, which matches either the word "than" or the word "then". The pattern alone does not tell us which of the two words was actually found. To refer to the exact text that was matched we need to use a backreference.
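As a minimal sketch of what this looks like inside a pattern, the following assumes we want to know whether the same word ("than" or "then") appears twice in a sentence. Inside the pattern, \1 refers to whatever the first group actually matched. The variable names and sample sentences are made up for illustration.

```javascript
// \1 must match exactly the same text that group 1 captured.
var sameWordTwice = /(\bth[ae]n\b).*\b\1\b/;

var repeated = "Now and then we win, and then we lose.";
var mixed = "Better late than never, but then again...";

console.log(sameWordTwice.test(repeated)); // true: "then" is followed by another "then"
console.log(sameWordTwice.test(mixed));    // false: "than" is never repeated, nor is "then"
```

Writing th[ae]n a second time instead of \1 would accept "than" followed by "then", which is not the exact match we are after.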

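Groups are numbered from left to right by the position of their opening parenthesis, and the text each one matched is stored in the corresponding entry of the backreference array. The single-digit references correspond to the first nine groups. Inside the regular expression itself you refer to those entries as \1 through \9, while in the replacement string passed to the string replace method you refer to them as $1 through $9.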
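To illustrate how numbered references line up with groups, here is a small sketch that swaps two captured words in a replacement string. The pattern name and the sample name "Smith, John" are assumptions chosen only for the example.

```javascript
// Two groups: the surname is captured as group 1, the first name as group 2.
var surnameFirst = /^(\w+),\s*(\w+)$/;

// In the replacement string, $1 and $2 stand for the captured text.
var swapped = "Smith, John".replace(surnameFirst, "$2 $1");

console.log(swapped); // "John Smith"
```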

One useful method that older JavaScript engines (those that predate ECMAScript 5) do not provide for strings is the ability to trim whitespace from the start and end of the string. We can easily create our own trim method by using a backreference and the string replace method like this:

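String.prototype.trim = function() {
  // Capture everything between the leading and trailing whitespace in group 1
  var re = /^\s*(.*?)\s*$/;
  // Replace the whole string with just the captured text
  return this.replace(re, "$1");
};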

Let's revise what we know of regular expressions by looking at this expression and seeing what each piece means.

  • ^ the start of the string
  • \s* zero or more whitespace characters (the leading whitespace)
  • (.*?) our group of characters that will be loaded into the backreference array
    . means any character except a line terminator such as a linefeed or carriage return,
    * means zero or more occurrences, and
    ? makes the * reluctant, so that it matches as few characters as possible and won't include trailing spaces
  • \s* zero or more whitespace characters (the trailing whitespace)
  • $ the end of the string
  • $1 a reference to the actual text that matched our group, allowing us to replace the whole string with just the part that matched the group
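The pieces listed above can be seen working together in the following sketch. It assumes a standalone function (the name trimWithBackreference is hypothetical) rather than a method on String.prototype, so that it can be tried without changing any built-in behaviour. The square brackets in the output are only there to make the remaining whitespace visible.

```javascript
// Hypothetical standalone version of the trim method shown above.
function trimWithBackreference(text) {
  var re = /^\s*(.*?)\s*$/;
  // "$1" is replaced by whatever the group (.*?) captured.
  return text.replace(re, "$1");
}

console.log("[" + trimWithBackreference("   hello world  ") + "]"); // [hello world]
console.log("[" + trimWithBackreference("no-padding") + "]");       // [no-padding]
```

Because . does not match line breaks, a string with a line break in the middle of its text does not match this pattern at all, and replace returns it unchanged.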

In this particular instance we are not using the group to test for repetitions of a sequence of characters. Instead we are using it to capture the exact text that matches the pattern it contains, so that we can use that text within our replace method. Had we not placed .*? within a group we could still have matched the string, but we would have had no way to reference the value that it matched.
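The difference the group makes can be observed directly with the exec method, which returns the whole match followed by one entry per captured group. This is only a sketch; the sample string and variable names are assumptions made for the example.

```javascript
var padded = "  some text  ";

// With the group: entry 0 is the whole match, entry 1 is the captured text.
var withGroup = /^\s*(.*?)\s*$/.exec(padded);
console.log(withGroup.length);          // 2
console.log("[" + withGroup[1] + "]");  // [some text]

// Without the group the pattern still matches, but nothing is captured.
var withoutGroup = /^\s*.*?\s*$/.exec(padded);
console.log(withoutGroup.length);          // 1
console.log("[" + withoutGroup[0] + "]");  // [  some text  ]
```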

©2014 About.com. All rights reserved.