We are investigating whether and how to build a parallel web browser. The SunSpider benchmark suggests that 30-50% of the time spent running JavaScript goes to String and RegExp (hereafter “stregexp”) methods. If so, this code is a prime candidate for optimization, including parallelization. Before parallelizing stregexp code, however, we wanted to confirm SunSpider's conclusion by profiling real web pages.
What we wanted to profile
For each workload, we measured:
- r: the time spent in code implementing RegExp
- s: the time spent in code implementing String
- t: the total execution time
The workloads were:
- SunSpider suite as a whole
- SunSpider regexp benchmark alone (regexp-dna)
- CNN and NY Times
- Slashdot and Fark
- Google Maps
- Google Mail
- A malicious regexp that causes exponential-time backtracking in JS
The “malicious regexp” was chosen to sanity-check the profiling code. It is based on an idea from a page by Russ Cox. Because it forces artificial, worst-case behavior in the stregexp code, the profile should attribute very close to 100% of total execution time to that code.
How we profiled stregexp code
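We wrapped the body of each String and RegExp method in Spidermonkey with timing code of the following form, accumulating the elapsed time in a per-function global counter:

```c
float64 start = gettime ();
/* ... original code ... */
globalTimings->regexpFooTime += (gettime () - start);
```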
Running each test was simple: we opened Firefox, loaded the web site(s) under test, and followed a sequence of steps intended to simulate a normal browsing experience. For example, when testing Google Maps, we searched for one location, zoomed in and out of the map, searched for another, panned around, searched for restaurants, then opened “information bubbles” for a few locations.
A note on the SunSpider suite: we ran it normally (loaded it and clicked “Start Now!”), but instead of using the performance SunSpider reports, we used our own profiling inside Spidermonkey. This means that we recorded stregexp performance across all of SunSpider's tests, not just its string and regexp ones.
… (to parse e-mails, for example), and not important in the application's steady state.
For each test, we show four graphs below. The first, “Time in regexp code vs. other JS,” plots the percentage 100r/t over time. Its x-axis is the index of the performance “sample” (described above) from which each y-axis value was calculated, so the x-axis roughly corresponds to time. The second graph, “Time spent in string code vs. everything else,” is similar to the first, except that it plots the percentage 100s/t over time.
The third graph, “Time spent in each regexp function,” breaks down the regexp time r shown in the first graph among the individual regexp functions over time. Its x-axis is the same as in the first two graphs. The fourth graph is similar to the third, but shows string methods.
One data series in the third graph, “bookkeeping,” deserves more explanation. It is the time spent in regexp code but outside the main regexp interpreter: setting up the memory allocator, preparing the match results to be returned to the caller, and so on. (For Spidermonkey geeks: regexp interpreter time is the time spent in ExecuteREBytecode(); bookkeeping is the time spent in js_ExecuteRegExp() minus that in ExecuteREBytecode().)
… filtered. Dave Mandelin suggested a fix to us, but we haven't implemented it yet.
(Apologies for the less-than-ideal layout of the graphs below. Blogger is limited in this regard.)
(Graphs not reproduced: Regexp-dna benchmark from SunSpider; CNN and NY Times; Slashdot and Fark.)
Regexp and string code does not appear to significantly affect the performance of real web sites. On every “real” page, regexp and string code accounted for no more than about five percent of total time. This suggests that optimizing this code will probably not make these pages much faster. A corollary is that the SunSpider benchmark appears to weight string code disproportionately heavily (~60%) compared to its use in the real world.
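Amdahl's law makes this point quantitative. If a fraction p of total execution time is spent in stregexp code and that code is made k times faster, the overall speedup S is bounded as follows; plugging in the roughly 5% observed on real pages and the ~60% SunSpider figure shows the contrast. This is a back-of-the-envelope bound that ignores any other effects of the change.

```latex
\begin{aligned}
S(k) &= \frac{1}{(1 - p) + p/k} \;<\; \frac{1}{1 - p} = \lim_{k \to \infty} S(k) \\
p = 0.05:&\quad \frac{1}{1 - 0.05} = \frac{1}{0.95} \approx 1.053 \quad \text{(at most about 5\% faster)} \\
p = 0.60:&\quad \frac{1}{1 - 0.60} = \frac{1}{0.40} = 2.5
\end{aligned}
```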
Among the string methods, String.replace() appears to be the most expensive on real pages; only Gmail behaved differently, with String.split() dominating. Roughly half of the time spent in String.replace() went to “bookkeeping.”