Wednesday, October 8, 2008

In Firefox, String and RegExp performance is not a bottleneck for real-world pages
And, SunSpider might weigh String and Regexp too high

(Thanks to collaborators Joel Galenson, David Mandelin, and Ras Bodik.)

We are investigating how and whether to build a parallel web browser. The SunSpider benchmark suggests that 30-50% of the time taken by JavaScript is spent in String and RegExp (hereafter stregexp) methods. If so, this code is a prime candidate for optimization, including parallelization. But before parallelizing stregexp code, we wanted to confirm SunSpider's conclusions by profiling real web pages.

The short story is: surprisingly, stregexp code takes less than about 5% of JavaScript execution time on real web pages, far less than the roughly 60% our data shows it occupying in SunSpider. The rest of this post is the long story: our profiling method, results, and more detailed conclusions.

What we wanted to profile


Our question was, “What percentage of JavaScript execution time is spent in Spidermonkey code implementing RegExp and String methods?” To answer this question, we needed to know

  • t: the total time spent interpreting JavaScript

  • r: the time spent in code implementing RegExp
    methods

  • s: the time spent in code implementing String
    methods


This data, along with a modest understanding of the Spidermonkey code, lets us compute the percentages of JavaScript time spent in the implementations of RegExp (100r/t) and String (100s/t) methods. We would have liked to simply compute a single stregexp time percentage as 100(r+s)/t, but more care was necessary: the JavaScript String class offers three public methods, replace, search, and match, that accept RegExp arguments, and as one would expect, their implementations are wrappers around RegExp code. To avoid double-counting that RegExp time, we needed more precision than just r and s, so we also timed the individual methods of RegExp and String (smatch, sreplace, and so forth).
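As a concrete illustration, here is a small sketch (in Python, with made-up numbers; this is not the authors' instrumentation) of how per-method timings combine into percentages without double-counting the RegExp time nested inside the String wrappers. The function name and the s_nested_regexp breakdown are hypothetical.

```python
def stregexp_percentages(t, r, s, s_nested_regexp):
    """t: total JS interpreter time; r: time in RegExp methods;
    s: time in String methods; s_nested_regexp: the portion of the
    RegExp time that was spent while servicing a String method
    (a hypothetical breakdown for illustration)."""
    regexp_pct = 100.0 * r / t
    string_pct = 100.0 * s / t
    # r and s overlap by the nested time, so subtract it once:
    combined_pct = 100.0 * (r + s - s_nested_regexp) / t
    return regexp_pct, string_pct, combined_pct

# e.g. 10 s of JS, 1 s in RegExp, 2 s in String, 0.5 s of overlap:
print(stregexp_percentages(10.0, 1.0, 2.0, 0.5))  # (10.0, 20.0, 25.0)
```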

We of course also had to decide on which JavaScripts to gather performance data. In addition to SunSpider, we chose a smattering of real-world sites intended to be representative of the typical browsing of one of the authors. The particular sites chosen are listed below.

    Tests
  • SunSpider suite as a whole

  • SunSpider regexp benchmark alone (regexp-dna)

  • CNN and NY Times

  • Slashdot and Fark

  • Google Maps

  • Google Mail

  • Malicious regexp that causes exponential-squared behavior in JS


The “malicious regexp” was selected to sanity-check the profiling code. It is based on an idea from an article by Russ Cox on regular expression matching. Because it causes artificial, worst-case behavior in the stregexp code, it should show that code taking very close to 100% of total execution time.
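To see why such a regexp is pathological, here is a sketch using Python's re module, which, like Spidermonkey's regexp engine, matches by backtracking (the exact constants differ between engines, and the backtracking_time helper is ours, not part of the profiling harness):

```python
import re
import time

def backtracking_time(n):
    """Time one match of the pattern a?^n a^n against n 'a's."""
    pattern = "a?" * n + "a" * n   # matches between n and 2n 'a's
    text = "a" * n                 # forces every optional to match empty
    start = time.perf_counter()
    assert re.fullmatch(pattern, text) is not None
    return time.perf_counter() - start

# The work roughly doubles with each increment of n; by n around 25-30
# a single match takes seconds on a typical machine.
for n in (5, 10, 15, 20):
    print(n, backtracking_time(n))
```

The match only succeeds once the engine discovers that every optional must consume nothing, and a backtracking engine finds that out by trying exponentially many combinations first.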

How we profiled stregexp code


The method is quite simple: we manually inserted timing probes both around the main JavaScript interpreter loop, to compute t above, and around the stregexp code, to compute r, s, etc. This looks like

void
regexpFoo ()
{
    float64 start = gettime ();

    /* ... original code ... */

    globalTimings->regexpFooTime += (gettime () - start);
}

We considered instrumenting JavaScript code directly, and also using a sampling profiler like gprof or sysprof, but rejected both. Automatically instrumenting JavaScript code is not as easy as it sounds; besides, JavaScript only provides millisecond-resolution timers, and that level of instrumentation slows down JavaScript considerably. Using a *prof profiler is better, but, unfortunately, the size of the Firefox codebase makes it difficult to extract detailed information about small modules like the JavaScript stregexp code. It is also impossible to profile only websites' JavaScript execution while ignoring the time taken by the Firefox user interface.

Running each test was simple: we opened Firefox, loaded the web site(s) under test, and followed a sequence of steps intended to simulate a normal browsing experience. For example, when testing Google Maps, we searched for one location, zoomed in and out of the map, searched for another, panned around, searched for restaurants, then opened “information bubbles” for a few locations.

We need to say a bit more about testing the SunSpider suite: we ran it normally (loaded it and clicked “Start Now!”), but instead of using SunSpider's reported performance numbers, we used our own profiling inside Spidermonkey. This means that we record stregexp performance over all of SunSpider's tests, not just its string and regexp ones.

While each test ran, our profiling code printed a sequence of accumulated performance data, once per call into the JavaScript interpreter. The numbers at each “sample” were the running totals t, r, s, etc., described above, so each sample is a snapshot of the percentage of time taken by stregexp code versus everything else, over all JavaScript executed up to that point. (In particular, from the last sample we can compute the total percentage of time stregexp code has taken while Firefox has been open.) Taking samples lets us show graphically how the importance of stregexp code changes over time; this is interesting because stregexp code might matter a great deal while a page loads (to parse e-mails, for example) and very little in the application's steady state.
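For concreteness, here is a small sketch (Python, with invented numbers; not the authors' tooling) of how such cumulative samples turn into percentages: the overall share comes from the final sample, and differencing adjacent samples recovers the per-interval behavior.

```python
# Each profiling "sample" is a running total, so the stregexp share over
# the whole run comes from the last sample, while differencing adjacent
# samples recovers per-interval behavior. Data below is hypothetical.
samples = [  # (total_js_time, stregexp_time), cumulative, in seconds
    (1.0, 0.6),   # page load: heavy string/regexp work
    (3.0, 0.7),
    (6.0, 0.8),   # steady state: stregexp share shrinking
]

overall_pct = 100.0 * samples[-1][1] / samples[-1][0]

interval_pcts = [
    100.0 * (s1 - s0) / (t1 - t0)
    for (t0, s0), (t1, s1) in zip(samples, samples[1:])
]

print(round(overall_pct, 1))              # 13.3 over the whole run
print([round(p, 1) for p in interval_pcts])  # [5.0, 3.3] per interval
```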

Profiling results


For each test, we show four graphs below. The first is “Time in regexp code vs. other JS.” This plots the percentage 100r/t over time. The x-axis on the graph is the particular performance “sample”, described above, from which the y-axis values were calculated (so the x-axis roughly corresponds to time). The second graph “Time spent in string code vs. everything else” is similar to the first, except that it shows how the percentage 100s/t changes over time.

The third graph, “Time spent in each regexp function”, breaks down the regexp time r shown in the first graph among the specific regexp functions over time. The x-axis is the same as in the first and second graphs. The fourth graph is similar to the third, but shows string methods.

It's worth explaining in more detail one data series in the third graph: “bookkeeping”. This is the time spent in regexp code but outside of the main regexp interpreter, including setting up the memory allocator, preparing the match results to be returned to the caller, etc. (For Spidermonkey geeks: regexp interpreter time is that spent in ExecuteREBytecode(); bookkeeping is the time spent in js_ExecuteRegExp() minus that in ExecuteREBytecode().)

You might notice that in the graphs of the “Malicious regexp” and “Regexp-DNA” results, there are many data points, even though both tests only make a single call to the JavaScript interpreter. This is because we didn't successfully filter all Firefox UI code from the results; only some calls were
filtered. Dave Mandelin suggested a fix to us, but we haven't implemented it yet.

(Apologies for the less-than-ideal layout of the graphs below. Blogger is limited in this regard.)


Malicious regexp

Regexp-dna benchmark from SunSpider

SunSpider

Google Mail

Google Maps

CNN and NY Times

Slashdot and Fark

Conclusions


It appears that regexp and string code does not particularly affect the performance of real web sites: across all the “real” pages, it accounted for no more than about five percent of total JavaScript time. This suggests that optimizing this code will not do much for those pages. A corollary is that the SunSpider benchmark appears to weigh string and regexp code disproportionately high (~60% in our measurements) compared with its use in the real world.

Of the string methods, String.replace() appears to be the most expensive in the real world; only Gmail behaved differently, with String.split() dominating. Of the time spent in String.replace(), approximately half was devoted to “bookkeeping time.”

Monday, September 3, 2007

Browsing Web 3.0 on 3.0 Watts: Why browsers will be parallel

Slides from presentation at Intel

"A man walks into a bar, asks for a cup and starts his browser." A cup is all he needs to turn a phone into a tablet computer.



We live in exciting times. What starts like a joke is actually the near future of personal computing. Phones, growing in compute power and I/O convenience, will eclipse laptops as predicted long ago by Bell's Law. Constantly in our hands, phones will reshape how we use computers. Thanks to the laptop, students no longer had to walk to a computer lab; thanks to the phone, they'll walk to a bar instead.

But isn't the handheld future already here? A web browser is where we spend more and more of our computer time, and the iPhone comes with a usable browser. So, don't the iPhone and its browser give us all we need to replace laptops?

No. Laptops are still more productive than phones because the best phone display is five times smaller than a common laptop display. And then there is browser speed: the iPhone browser is at least ten times slower than a desktop browser, too slow for rich Web 2.0 pages. When laptops replaced desktops, it was because they matched desktop productivity. We think the same will be true for handhelds. The key difference is that all a phone needs to displace the laptop is a laptop-quality browser experience.

The iPhone's browser speed has been blamed on the slow AT&T network, but as we show below, the blame falls on the phone's power-starved processor. Can't we just wait a few years for handheld processors to become faster? After all, laptop processors now run nearly as fast as desktop processors at a fraction of the power.

Unfortunately, the trend may not continue long enough, due to inherent physical limitations in semiconductor technology. The problem is the high power consumption and heat dissipation, the so-called "power wall."

The good news is that we can build a sufficiently high-performance handheld processor even with today's semiconductor technology, without further silicon shrinking. The bad news is that the high performance would have to be delivered by a many-core parallel processor, rather than by a single-threaded Pentium successor. The necessity of resorting to many cores is dictated by physics: many low-power cores provide more instructions per second per Watt than a single high-power core.

So, parallelism can break through the power wall and deliver many CPU cycles on low power. Unfortunately, all web browsers are written as sequential programs and will have to be rewritten to split their work onto the many cores. No one knows how to do it because the components of the browser have long been considered nearly inherently sequential. While scientific computing taught us how to write some parallel programs, these supercomputer lessons do not apply to the browser, motivating interesting new computer science research.

These slides, based on a talk we gave at Intel on August 16, 2007, describe why handhelds will eclipse laptops and why browsers will need to be parallel for this to happen. We also give some preliminary results from our research into building a parallel browser.

How did we become interested in browsers? We realized that browsers must be parallel while brainstorming about the vision of the Berkeley Par Lab. We identified a few great "killer applications" enabled by many-core processors, but all of these applications were either already parallel or contained components that naturally could be implemented in parallel, such as machine learning. In these applications, the revolutionary role of upcoming many-core processors was to (i) fit these applications into smaller form factors (on a desktop rather than on a supercomputer) or (ii) make them run in real time (rather than in batch mode).

Since all our killer applications appeared to be already parallel, we became curious whether the multicore revolution is going to have any impact at all on sequential software (it has been widely speculated that we would have to rewrite today's sequential software into parallel form). The web browser, predicted to be the new desktop, was the logical sequential application to examine.

Initially, it seemed that browsers run fast enough today and will continue to do so even as web pages get richer. However, measuring today's browsers on old laptops suggested that tomorrow's pages will make today's browsers unacceptably slow (because neither processors nor browsers will get much faster, while page complexity will keep increasing). Furthermore, Bell's Law's insistence that handhelds will soon play the role of laptops nailed it for us: given the CMOS roadmap, there was no way a sequential browser could run fast enough on a phone.

We have a long way to go, but a great goal. If we can build a parallel web browser, tomorrow's handhelds could provide as rich an experience as today's laptops. This would lead us to the next Bell “computer class.”