JavaScript - String Matching

(2017-04-22)

Finding occurrences of a pattern in a text is a common problem that can be usually solved using String.prototype.match(). For example, given the DNA sequence AGCATGCTGCAGTCATGCTTAGGCTA, we can find GCT occurrences with the following code:

1
2
3
4
5
const dnaSequence = 'AGCATGCTGCAGTCATGCTTAGGCTA';
const dnaPattern = /GCT/g;
const occurrences = dnaSequence.match(dnaPattern);
console.log(occurrences); // ["GCT", "GCT", "GCT"]

As there are various string matching algorithms, I was curious to discover what happens under the hood when we call the method. According to ECMA-262 7.0 docs, this is what happens when we call .match() with a RegExp as argument:

  1. O ? RequireObjectCoercible(this);
  2. If regexp is not undefined or null, then:
    1. matcher ? GetMethod(regexp, @@match);
    2. If matcher is not undefined, then:
      1. Return ? Call(matcher, regexp, « O »);
  3. S ? ToString(O);
  4. rx ? RegExpCreate(regexp, undefined);
  5. Return ? Invoke(rx, @@match, « S »);

You may notice that the function match() is intentionally generic, thus does not require this to be a strin g and yeah, I know it sounds an alphabet soup. So let’s take our first example and analyze is step-by-step:

  1. RequireObjectCoercible will be called with the string’s value, which is in our case ‘AGCATGCTGCAGTCATGCTTAGGCTA’. This method returns an error if the value can not be converted to an object (undefined ou null);
  2. If regexp is not undefined or null, given that dnaPattern = /GCT/g, then:
    1. GetMethod will be called and access the value @@match of regexp, returning a function to be stored in matcher;
    2. If matcher is not undefined, then:
      1. Call will be called and then call the function matcher with regexp‘s value against O (our object), which will return an value, ["GCT", "GCT", "GCT"] in our case;

If our regular expression were undefined or null, the algorithm would have transformed the object this into a string, created a new regular expression and invocated it against the created string using @@match, which is a symbol that contains a method from a regular expression.

Final notes:

  • As the purpose of this post were to show a little about what happens behind JavaScript often-called magical methods, I used a certain level of abstraction when writing it. If there’s interest in learning more, maybe you should read the documentation;
  • If you have interest in String Matching, Introduction to Algorithms by Thomas Cormen has a dedicated chapter (32) on this topic;