Thursday, December 15, 2011

Box2D as a Measure of Runtime Performance

For those unfamiliar with it, Box2D is a great 2D physics library written by Erin Catto, which is at the core of a large number of casual games on consoles and mobile devices. Angry Birds is one you might have heard of, but there are many, many others.

It's also not a simple library by any means. When porting Angry Birds to HTML5, we found that in some cases Box2D performance could be the limiting factor in the game's frame-rate (on the more complex levels). It turns out this little library is doing a lot of work under the hood. And the work it's doing isn't limited to any one tight loop or hotspot. Rather, its work is distributed all over the place -- matrix and vector math, creation of lots of small objects, and general object-oriented logic distributed over a complex code base.

Motivation

Box2d makes a great general benchmark -- it's a bottleneck on real-world apps, and there's no one thing that a VM or compiler can optimize to "fix it". It also has the nice property that it's been ported to lots of different platforms -- the original is written in C++, and it's been ported to ActionScript, Java, Javascript, and several other systems. So I took it upon myself to put together a little benchmark that measures the time it takes to simulate one frame of a reasonably complex Box2D world.

The goal of this little experiment is not to add fuel to the flames of the Internet's already-tiresome "Compiler and VM Wars" -- rather, my intention is to get some hard data on what behavior real-world performance-sensitive code can actually expect to see in practice on various platforms. Measuring the performance of virtual machines in isolation is particularly tricky, but this benchmark has the nice property that, if a VM or compiler improves it, then real-world problems are actually solved in the wild, and everyone wins.

The Platforms

My intention is to measure the performance of various platforms, not particular Box2D ports. The ports themselves necessarily vary somewhat from one-another, but they should still be broadly equivalent. The reason Javascript VMs are represented four times in this list is that I wanted to make sure that we compared Javascript VMs, at their best, to the JVM, NaCl, and native code.

  • Native : This is the standard Box2D code, compiled via gcc or clang/llvm (the latter on my test machine, as described below).
  • NaCl : The same code, compiled via the NaCl SDK's custom gcc build, and running within Chrome.
  • Java : The JRE (1.6), as currently shipped by Apple on Mac OS 10.6.
  • Box2D-web : The hand-written Javascript Box2D port, on various browsers.
  • Emscripten : The original C++ code, compiled via Emscripten to Javascript.
  • Mandreel : The original C++ code, compiled via Mandreel to Javascript.
  • GwtBox2D : The Java port, compiled via GWT to Javascript.

The Test

World

Picking the right world structure for this kind of benchmark is a bit tricky, because it needs to have the following properties:

  • A high per-frame running time.
  • Not settle quickly: simulations that settle eventually stop doing work, as the physics engine skips calculations for settled objects.
  • Stable: Subtle variations in the behavior of floating-point math can cause the behavior on different VMs to diverge badly, invalidating comparisons.

I eventually settled on a simple pyramid of boxes with 40 boxes at the base, for a total of about 800 boxes. I manually verified that this simulation is stable on all the systems tested, and that it doesn't settle within the number of frames simulated for each test. It takes at least 3-4ms to simulate in native code, and only gets slower from there, so the running time is sufficient to avoid problems with timer resolution.

Code

All the test code is available on GitHub. Some of it requires a bit of care and feeding to get running (especially the C++ version), but should allow you to reproduce these results if so desired. There are also copies of the Mandreel and Emscripten Javascript output checked in, so you won't have to go through the pain of building those yourself.

Test System

I used my MacBook Pro 2.53 GHz Intel Core i5 as a test machine. It seems a fairly middle-of-the road machine (maybe a bit on the fast side for a laptop). As always, your mileage may vary.

Data

The raw data I collected is in the following spreadsheet. I ran each test several times, to mitigate spikes and hiccups that might be caused by other activity, and to give each system a fair chance. Each run warms up over 64 frames, and then runs for 256 frames -- I've confirmed that the simulation is stable over this duration on all the platforms under test.

First, let's look at all the results together, on a log-scale graph:

The first thing you'll notice is that there are three clusters of results, centered roughly around 5ms, 10ms, and 100ms. These clusters are associated with native code, the JVM, and Javascript VMs.

Now let's compare the best of the best of each of these three groups.

This is similar to the first graph, except that we've removed the noise of NaCl (which is within 20-30% of raw native performance) and all but the best of the various Javascript approaches. This looks a bit better for Javascript, clocking in at a mean of around 50ms (interestingly, this number is achieved by the Mandreel C++ cross-compiler running on Chrome 17; more on this below).

Analysis

Looking at the last graph above, I believe the most important observation is that, while Javascript implementations have improved drastically in recent years, the best result achieved by any Javascript implementation is still more than an order-of-magnitude slower than native code. This is perhaps an unsurprising result to many, but some have begun suggesting that modern Javascript VMs are "nearing native performance". While this may be true for some workloads, they still have a long way to go in this case.

This also demonstrates that native code compiled via NaCl stays within 20-30% of the performance of un-sandboxed native code, which is in line with what I've been told to expect.

Finally, it's somewhat interesting to see that the JVM is running 3x slower than native code. I've been told, anecdotally, that OpenJDK on Mac OS X is producing numbers closer to 6ms/frame, which would be ~50-60% slower than native, but I need to confirm this. I don't think it can be proven, but I suspect the JVM's performance can be taken as a rough lower-bound on what one can expect from any dynamic VM. Thus, Javascript VM implementors can take the JVM's performance as a reasonable target to shoot for.

Appendix: Javascript Implementations and Compilers

While it wasn't a primary goal of these benchmarks, they do give some interesting data about Javascript VMs, and the various compilers that use Javascript as a target language.

The following is a graph of the Javascript VMs in Chrome, Safari, Firefox, and Opera (IE9's Chakra VM is not yet included because it seems that Box2D-Web is using Javascript properties, which it doesn't support yet).

First off, all the VMs tested are well within 3x of each other, which is wonderful, because wildly variant performance across browsers would make it exceedingly difficult to depend upon them for any heavy lifting. V8 and JSCore are quite close to one-another, but JSCore has an edge in variance. It's not immediately obvious what's causing this, but GC pauses are a likely culprit given the regularly-periodic spikes we see in this graph.

Now we move to the various Javascript compilers. Here we have Box2D-Web (representing "raw" Javascript), Emscripten, Mandreel, and GWT (Closure Compiler should give performance roughly in line with "raw" Javacript on any modern Javascript VM).

The real shocker here is that Mandreel outperforms all the others fairly consistently, given that it's translating C++ to Javascript (!). Note that the Emscripten results are not as good as they could be -- the author is currently finishing up a new optimizer. I also believe the GWT compiler should be doing a better job than it is; there are a couple likely culprits in the compiled output involving static initializers and array initialization overhead. It should be possible to optimize these out and improve the results.

Note: See the update below about Emscripten performance

Caveats

As with all benchmarks, especially ones as fuzzy as this, there are a lot of caveats.

The code's not identical

These are all ports of the same C++ source code, so by their nature they must vary from one another. It may be the case that a particular port is unfairly negatively biased, because of particular idioms used in the code. If you suspect this is the case, please say so and preferably offer a patch to the maintainer. These aren't being used in the same way as, e.g., the V8 or Kraken benchmarks, so it's entirely fair to optimize the code to get better numbers.

I may have made mistakes

I suppose this goes without saying, but there are a lot of knobs to be tweaked here, and there could easily be something sub-optimal in my configuration, makefiles, or what-have you. If you notice anything amiss, please say so and I'll try to address it.

This is just one machine

As described above, I ran these tests on my MacBook Pro. The relative performance of these tests might not be the same on different machines.

Update

Based on feedback from comments and elsewhere, I've updated a few of the numbers in the linked spreadsheet, along with the graphs.

  • Emscripten: I got an updated build with a better compiler, and the numbers are now much more in line with the other Javascript implementations.
  • Java: I updated the JVM implementation to use a faster sin()/cos() implementation (there was a flag for this in the JBox2D code that I simply missed. It's no about 2.5x the speed of the native implementation. I also wasn't entirely clear about what I meant by the "JVM" -- so to be clear, this means the standard Mac OS X JDK 1.6 server JVM (there is no standard client JVM on the Mac).
None of this changes the core conclusions in any substantive way, however.

Thursday, September 16, 2010

An Ugly Bug in the IE9 Beta

Hot on the heels of my very happy discovery that IE9 finally plugs its leaks, I've found a subtle-but-important bug in the IE9 beta. Bear with me, as it's a little tricky to explain.

The Good News

IE9 finally implements the standard HTML5 DOM element interfaces, which will make many things simpler. Further good news: IE9 includes a nice debugger you can use to explore these interfaces. As I understand it, the IE team has cleaned up all the bizarre old COM bindings that have been giving developers fits for years. So when you inspect an element in their nifty new debugger, you get something like this:

document.body   {...}                     [Object, HTMLBodyElement]
- accessKey     ""                        String
- appendChild   function appendChild(...  Object, (Function)
...

The Bad News

This is beautiful, and matches your expectations of the interfaces quite nicely. But then I discovered this little gem:

elem:           {...}                     DispHTMLImg
- [Events]
- [Expandos]
- [Methods]
- accessKey     ""                        String
...

What on earth is this? It sure looks like an IDispatch interface to an element -- but I thought we weren't supposed to be seeing that sort of thing anymore. But if you resolve properties on the object using the Javascript VM, most of them resolve the same way, so no harm done, right?

Not so fast. When digging into a bug in my code, I kept running into this bizarre situation where elements didn't seem to be comparing properly. Specifically, I got into a situation where (ElemA == ElemB) and (ElemB != ElemA). These were two different elements, so they shouldn't have been equal to one another anyway, but the asymmetric equality relation was a really big surprise!

As you might have guessed, one of these two elements was an HTMLElement, while the other was a DispHTMLDivElement. Ok, if one of them is a Disp interface to an element and the other is a native DOM host object, you can imagine how the comparison might get screwed up (I'm going on the assumption that IE didn't expect to have those Disp objects exposed at all). Which begs the question of how I got that reference in the first place.

When I tried to reproduce the bug in isolation, everything seemed to work fine -- no Disp references in sight. I finally tracked it down to the fact that my code was running in an iframe, while the DOM elements themselves were in the outer frame (this is a not-uncommon technique for isolating code). Specifically, it seems to be triggered by the following situation:

Outer page:
  <div id='target'>...</div>

IFrame:
  <script>
  var target = parent.document.getElementById('target');
  target.onclick = function(evt) {
    // both 'evt' and 'elem' will be Disp interfaces
    var elem = evt.currentTarget;
  };
  </script>

So it appears that something's going wrong when marshalling the event object from one frame to the other. And once you get one of these funky Disp objects, all references you get from it will be Disp objects as well. Which opens you to these comparison failures.

A couple of caveats

I'm assuming that the "Disp" part of these objects' names refers to IDispatch, but if that's not correct it doesn't really change much. Also, you may have noticed that I used the == comparison operator above -- it turns out that === behaves as expected. However, there's no good reason to use === when comparing two objects.

A possible explanation

If I understand IE's architecture correctly, older versions appeared to use DCOM for cross-frame communication. If I'm correct about this (and it's still the case in IE9), then it may be that something just went wrong in the marshalling of references from one frame to another (hence my assumption that "Disp" means "IDispatch").

Does This Really Matter?

Yes. It might seem really subtle, but these are the kinds of bugs that can take hours or days to track down when something goes wrong (and for which the fix is non-obvious at best). And while putting your code in an iframe might seem like a slightly odd thing to do, there are very good reasons for it under some circumstances (I'll have more to say on precisely why this is important in a follow-up post).

Repro

I've posted a relatively simple reproduction case here. It's a little screwy, because it's a case hoisted out of a much more complex app, but it should illustrate the issue reasonably well.

IE9: Memory Leaks Finally Declared Dead

It is with great pleasure that I can finally declare the infamous, painful, long-standing, never fixed IE memory leak bug fixed! With the release of IE9, I have verified that every leak pattern I'm aware of is fixed. It's been a long-time coming, but I'm starting to feel more confident that IE9 can be reasonably called part of the "modern web" -- the web that is sufficiently powerful to support complex applications, and not just lightly scripted documents.

One caveat: Do be aware that your "standard" pages need to explicitly request "IE9 Standards" mode, using either an HTTP response header or a meta tag like the following:

<meta http-equiv='X-UA-Compatible' content='IE=9'/>

Failure to do so, in addition to giving you all the old crufty bugs and quirks in previous IE versions, will continue to leak memory, presumably because it is using the DLLs from the old rendering engine.

Now perhaps I can finally stop writing about this stupid bug!

Friday, April 30, 2010

Trying out the new Wave element

Google Wave just added a new "web element" that makes it easy to embed a wave in a blog or article, so I thought I'd try it out. If you're one of the two people who actually reads this, give it a shot. If you don't have a Wave account, you won't be able to edit -- if you need an invite, ping me at jgw@pobox.com.

Friday, April 02, 2010

Quake II in HTML5 -- What does this really mean?

Now that we've finally been able to push our port of Quake II to the browser public, it's time to discuss the questions "what's the point?" and "what does this mean for the future?".

Let me begin with a tweet I saw last night, which neatly summarizes a very salient point:

Not sure if the best endorsement of JS engine speed in 2010 is ports of games from 1997...

http://twitter.com/tberman/status/11446377136

Well said. We should be setting the bar higher than this. The choice of Quake II was mainly predicated on the fact that a Java port already existed, and this was just a 20% project (more like -5%, actually -- nights and weekends for the most part). I'm pretty certain Quake III would have ported just as easily (perhaps more easily, as it was written specifically for hardware acceleration and likely leans on the hardware a little more).

So then there's the fact that it's running at frame rates about a third of what's possible on the same hardware in C (or on the JVM, for that matter). There are a few reasons for this:

  • There are inefficiencies still to be worked out in WebGL implementations, especially the expensive frame buffer read-back on Chrome.
  • There are things being done in Javascript that really ought to be done in vertex shaders; this especially applies to things like model animation interpolation, which is a nasty hot spot in the Javascript code that could easily be pushed to the shader.
  • There are some things, such as dealing with packed binary data structures, that are incredibly inefficient in Javascript (and I mean something like 100x slower than JVM or C code). This can be largely mitigated through better APIs, such as the Khronos Group's Typed Arrays spec.
  • This code is fairly idiosyncratic, having been ported from straight C, and exercises some code generation paths in the GWT compiler that could be better optimized (using lots of small arrays, and lots of static methods, still needs some work).

I would be willing to hazard a guess that we could easily get another 30% out of the frame rate with relatively minor tweaks. If the game were written from the ground up to work with web technologies, it would likely be twice as fast on any reasonable metric (frame rate, startup time, etc.). That's an extremely rough estimate, but I'd be willing to bet on it.

So back to our original question. What's the point? What this code currently proves is that it's feasible to build a "real" game on WebGL, including complex game logic, collision detection, and so forth. It also did an excellent job exposing weak spots in the current specs and implementations, information we can and will use to go improve them.

Now if I were starting with a plain C code base, I would most likely prefer to port it using a technology like NaCl -- that's what it's for. But while it's not feasible to actually ship games on WebGL yet (it will be a while before it's widely deployed enough for that), one can envision a world where game developers can deploy their games as easily as we currently deploy web pages, and users can play them just as easily. And that's a pretty damned compelling world to imagine.

Friday, September 11, 2009

Javascript Variables, Continued

In my previous post, I addressed the question of the speed of variable access, as originally explored by Mike Wilcox here. In the comment section of his blog, he posted a cogent response to my concerns that I failed to address at the time (about a month ago). Sorry about that, Mike -- I blame the 10-month-old girl who consumes most of my time and attention lately :)

Terminology

A commenter on that post appears to take serious offense with my choice of terminology, so let's start by clearing that up:

  • Array Notation: What I am referring to here is the array-like syntax for accessing properties on Javascript objects. I am fully aware that there is a difference between accessing string-keyed and integer-keyed properties in this way. But it is not uncommon to refer to this as array-access notation. Call it what you will, but I suspect anyone reading this will have little trouble understanding what I'm referring to.

  • Globals and Fields: Perhaps I'm letting my Java-centric view of the world poke through a bit here, but I thought it was fairly clear from the examples I gave. I was using the term "global" to refer to a globally-scoped variable defined with an explicit "var" statement. The term "field" I used somewhat loosely to refer to an explicitly-defined property on an object. Again, sorry if this wasn't clear from context. Henceforth, I'll refer to the latter as "properties", which is more accurate in a Javascript context.

Now allow me a moment to clarify speicifically why I bring up the question of "array access" notation vs. "dot" notation (call them what you will). The problem with using them interchangeably in performance tests is that while logically they both refer to properties on an object, we have no reason to believe that their performance will be identical in all Javascript engines. In addition, there is good reason to suspect that accessing globals via window[name] may be significantly different from simply accessing them via an unadorned name expression.

Mike's Concerns

From Mike's comment:

I think we were going for slightly different goals, and that makes your tests very complimentary to mine.

This is an entirely fair point. Let me attempt to justify the particular forms of variable access that I chose to test. In a nutshell, I wanted to test the precise form of usage that I see in much, if not all, idiomatic Javascript, and that we produce from the GWT compiler. This involves global variables that are declared explicitly, as in:

var g0 = 0, g1 = 'foo', ...;

And objects whose fields (or properties, if you prefer), are explicitly declared and initialized (usually in a constructor function, but the expression form below is common as well):

{
  f0: 0,
  f1: 'foo',
  ...
}

These correspond to static and instance variables, respectively, in Java source code.

One thing to note, and I’m pretty sure neither of us did, is that the JS engine may possibly resolve those variables in the code block before the code is executed. Keeping them in functions (and not in the global scope) should prevent that.

setLVars = function(){ var a0=0;var a1=1;var a2=2;var a3=3;var a4=4; … 5000 }

I believe this is referring to the above test of initializing local variables, and it doesn't sound inplausible. In an attempt to work around this possibility, the "locals" test I added separates variable declarations from their assignments, and uses the same random values as all the other tests to initialize them.

A word on closures

When I added the test for variable access "via closure", it was to test a structure I've seen used a lot to isolate the "global" namespaces used by separate chunks of Javascript code. This pattern tends to look something like this:

(function() {
  var g0, g1, ...;

  function init() {
    // Do stuff with g0, g1, ...
  }

  init();
)();

This is a really useful pattern when you want to keep separately-loaded scripts from stepping on each-others' toes, and I thought it worthwhile to see if using such a structure imposed any significant overhead on access to globals.

You may note that my results for closures are much more flattering on some browsers than before. This is because I made a bad typo in the test and failed to declare the variables properly. This led to them being declared implicitly in the global scope.

Test Code

It also appears that both my dear commenter and Mike Wilcox himself had some difficulty reproducing my results with the simple snippets I provided, so I am publishing the test page here. I've made a few changes in the process:

  1. I have changed references to "fields" to refer to "properties" for clarity.
  2. I added a test for "locals" that should be self-explanatory.
  3. I fixed a bug in the "closures" case that was skewing the results fairly badly.

If anyone notices anything amiss with this code, please do let me know. I am aware that the tests conflate read and write times, and that the accumulator code makes their absolute values irrelevant. What I'm attempting to tease out here is only the relative costs of defining variables in each scope.

Results

(Note that for simplicity's sake, I've simplified these results to only cover Safari 4, IE8, Firefox 3.5, and Chrome 2)

MacBook Pro 2.4 GHz Core 2 Duo:

  • Safari 4:

    • Globals: 11ms
    • Properties: 24ms
    • Locals: 12ms
    • Closures: 11ms
  • Firefox 3.5:

    • Globals: 44ms
    • Properties: 47ms
    • Locals: 14ms
    • Closures: 91ms

VMWare on aforementioned Mac:

  • IE8:

    • Globals: 31ms
    • Properties: 47ms
    • Locals: 15ms
    • Closures: 31ms
  • Chrome 2:

    • Globals: 28ms
    • Properties: 30ms
    • Locals: 5ms
    • Closures: 6ms
  • Firefox 3.5:

    • Globals: 32ms
    • Properties: 35ms
    • Locals: 16ms
    • Closures: 69ms

Again, please note that the results for closures are rather different than in my previous post, because of the aforementioned bug in my original test.

Regardless, though, the most clear patterns that emerge here are as follows: - In none of these cases are properties on a local object faster to access than globally-scoped variables. - Locals are almost invariably faster than (or the same speed as) global or property accesses, which is not terribly surprising. - Accessing variables via closure can be as fast as accessing variables in the global scope, but on Chrome it's actually faster, while on Firefox 3.5 it's about twice as slow.

Monday, August 10, 2009

Where should I define Javascript variables?

I recently stumbled across this article by Mike Wilcox at SitePen, which suggests that defining Javascript variables in the global scope causes reads and writes to them to be much slower than if they are defined as fields on an object. While Javascript VMs never fail to surprise in their strange and disparate behavior, something didn't quite smell right about this, so I decided to dig in a bit further.

As I understand it, the investigation was triggered by Mike noticing a large number of global variables in the Google Docs graphics editor. The variables in question are defined something like this:

var rs=parseFloat;
var us="shape",vs="activeElement", ...

In all likelihood, a combination of normal static variables, interned strings, and such. In order to test the hypothesis that defining these variables in the global scope could be a performance problem, he created a test that looks something like this:

// Assume that v[i] is a randomly-created variable name.
obj = {};
setLocals = function(){
  for(var i=0;i<v.length;i++){
    obj[v[i]] = true;
  }
}

setGlobals = function(){
  for(var i=0;i<v.length;i++){
    window[v[i]] = true;
  }
}

Tests for reads follow basically the same form. You can see the results in the linked article, but the upshot is that accessing "globals" in this manner is anywhere from 1x to 10x slower than the equivalent object field access.

There are two main problems with this test:

  • It always accesses globals via the window object explicitly.
  • It doesn't explicitly declare the variables in either scope.
    • This conflates creation with assignment.

Let's rewrite these tests to be closer to normal, idiomatic Javascript usage. We will declare global variables explicitly, and reference them implicitly. We will also explicitly declare the object fields explicitly, for as close an apples-to-apples comparison as possible. Note that the random variables, as well as the accumulator and return value, are probably-unnecessary attempts to normalize for any static optimization tricks that newer Javascript VMs might be playing.

// Allocate a bunch of random values to be used later in assignments.
var r0 = Math.random(), r1 = Math.random(), ... ;

// Test assignment and reading of global variables.
var g0 = 0, g1 = 0, ... ;
function globals() {
  var a = 0;
  g0 = r0; g1 = r1; ... ;
  a += g0; a += g1; ... ;
  return a;
}

// Test assignment and reading of fields on a local object.
var obj = {f0 : 0, f1: 0, ... };
function locals() {
  var a = 0, o = obj;
  o.f0 = r0; o.f1 = r1; ... ;
  a += o.f0; a += o.f1; ... ;
  return a;
}

In addition, we'll test the the performance of the increasingly common pattern of scoping variables via closure.

// Test assignment and reading of locals via closure.
function closures() {
  var c0 = 0; c1 = 1; ... ;

  return function() {
    var a = 0;
    c0 = r0; c1 = r1; ... ;
    a += c0; a += c1; ... ;
    return a;
  };
}

This is all far from "real-world" Javascript, but it's a lot closer than referencing everything by name, using the array notation. I generated the above code such that there were 1000 variables, assignments, and reads. I then averaged the times over 100 runs, getting the following results:

MacBook Pro 2.4 GHz Core 2 Duo:

  • Safari 4:

    • Globals: 15ms
    • Fields: 30ms
    • Closures: 30ms
  • Firefox 3.0.12:

    • Globals: 60ms
    • Fields: 43ms
    • Closures: 78ms
  • Firefox 3.5.2:

    • Globals: 43ms
    • Fields: 46ms
    • Closures: 52ms

VMWare on aforementioned Mac:

  • IE8:

    • Globals: 31ms
    • Fields: 47ms
    • Closures: 47ms
  • Chrome 2:

    • Globals: 31ms
    • Fields: 47ms
    • Closures: 47ms
  • Firefox 3.0.8:

    • Globals: 35ms
    • Fields: 29ms
    • Closures: 68ms

The absolute values aren't as important as the relative values for each browser. The most notable pattern is that globals and fields don't perform all that differently from one-another, and referencing variables via closure is almost invariably slower. This is a random smattering of browsers, and doubtless others will perform differently. But most importantly, there's no clear-cut advantage to either one.

This has limited implications for hand-written code; you're almost always going to want to write whatever makes the most sense to you, because the differences aren't all that great either way. It does have strong implications for tools (like GWT) that generate Javascript. Such tools have a high degree of latitude in how they define variables, and these small effects can be amplified for large programs.

I'd say the moral of this story is that you must be careful to measure precisely what you think you're measuring, especially in as complex and unpredictable a doman as Javascript VMs. I can understand how Mike arrived at this point -- in any sane world, (window[name]) would be the same as (name), since they do precisely the same thing. But we're not in a sane world here, are we?

Sunday, July 12, 2009

Memory Leaks in IE8

Now that IE8's out, it seems I get to revisit this topic once again, which is getting quite tedious. When Microsoft first began touting IE8 features, I noticed a couple of pages pointing out that they had done a great deal of work to "mitigate" memory leaks in IE. The word "mitigate" sounds a bit fishy, as the source of the problem is pretty fundamental to the design of the COM interface that their script engine uses to access the DOM and other native objects.

As you may recall, IE7 contained a rough attempt to solve this problem by walking the DOM on unload, cleaning up leaks on any elements still there. This helped somewhat, but left many common leak patterns unresolved (in particular, any element removed from the DOM could still leak easily).

From my tests, IE8 appears to have resolved all of the most common leak patterns (as described in the two IE8 links above). In particular, I can't uncover a single leak that doesn't at least get cleaned up on unload. This is good news for IE users, because under most circumstances it means that the browser won't get slow and bloated over time.

How IE8 leaks

With some cursory testing, however, I have uncovered at least one pattern that still leaks memory on IE8. Consider the following code (which you can run here):

// This approach hangs a massive js object from a dynamically created DOM
// element that is attached to the DOM, then removed. This pattern leaks
// memory on IE8 (in "IE8 Standards" mode).
function spew() {
  // Create a new div and hang it on the body.
  var elem = document.createElement('div');
  document.body.appendChild(elem);

  // Hang a *really* big-ass javascript object from it.
  var reallyBigAss = {};
  for (var i = 0; i < 5; ++i) {
    reallyBigAss[i] = createBigAssObject();
  }
  elem.__expando = reallyBigAss;

  // Complete the circular reference.
  // Comment out the following line, & the leaks disappear.
  elem.__expando.__elem = elem;

  // Remove it from the DOM. The element should become garbage as soon as
  // this function returns.
  elem.parentElement.removeChild(elem);

  // Just to give it a fighting chance to collect this garbage.
  CollectGarbage();
}

function createBigAssObject() {
  var o = {};
  for (var i = 0; i < 100000; ++i) {
    o[i] = 'blah';
  }
  return o;
}

This will leak at runtime on IE8. It does get cleaned up when the page is unloaded, but it can still be a serious problem for long-running pages (complex Ajax applications, for example).

This particular example is admittedly somewhat contrived, but it is actually isomorphic to a common use pattern:

  • Create an element.
  • Attach it to the DOM.
  • Create some event handlers that result in circular refs.
  • At some point in the future, after the user's done with it, remove it.

Sounds like a popup, a menu bar, or just about any interactive element that gets created and removed in an application, n'est-ce pas?

What should I do about it?

Honestly, I would advise that you continue to do whatever you always have done. Most Java[script] libraries (GWT, Dojo, jQuery, Prototype, etc.) already have code in place to clean up these sorts of leaks, and they will continue to work as advertised (I've personally checked GWT for leaks on IE8). It is unfortunate that we have to continue doing these things, because they have a non-trivial performance cost; and although it's taken a while, WebKit and Gecko seemed to have finally nailed their own memory leak issues.

Aside: Drip is dead

I wrote Drip some years back in order to help track down memory leaks on Internet Explorer. I incorrectly assumed that it would be useful for a year or two, as the problem would eventually be dealt with by Microsoft.

Well, they did finally deal with the problem, primarily by building their own memory leak detector. The good news is that it works quite well, and is probably much more comprehensive than Drip ever was (and I haven't had much time to maintain it). The first caveat I would add is that you almost invariably want to change it to report "actual leaks" -- I don't find the IE6 and IE7 options to be useful in practice. The really bad news is that it isn't useful on IE8 -- it will install, but doesn't catch any actual leaks, as far as I can tell.

Oct 28, 2009 update: Keilaron kindly points out that Microsoft's memory-leak detector article has been updated, and I've updated the above link accordingly. The linked article still claims that "In IE8 the problem is completely mitigated.", which appears to still be an overstatement at best.

GWT, JavaScript, and the Correct Level of Abstraction

[note: I originally posted this last December on the "Unofficial GWT Blog" that Scott Blum and I created, but neither one of us has had much time to maintain it, so I'm copying it back here to keep everything in one place.]

John Resig seems to have stirred up a bit of a hornet's nest with a recent post on JavaScript language abstractions. I don't want to wade into the details of the argument; I think Francisco over at cappuccino.org does a great job of explaining the value of language abstractions that I largely agree with.

However, I've noticed a number of misunderstandings about GWT in this and other articles, and I feel some clarification is in order.

Why Java?

First and foremost, let me clarify yet again the reasons behind the decision to use Java as a source language. Allow me to quote from the "Making GWT Better" document:

Why does GWT support the Java programming language instead of language X? In a word, tools. There are lots of good Java tools. That's the entire explanation. It isn't that we don't like language X or that we think the Java programming language is somehow superior. We just like the tools.

This point has perhaps been beaten to death, but there are a lot of good tools for Java, whether you love the language itself or not. Code completion, automated refactoring, static error detection, code coverage, testing, and so forth, are pretty hard to give up once you're used to them.

But wait, there's more. Let's talk about compiler optimizations a bit. I'll quote Ray Cromwell, from the comments on Resig's article:

... Aggressive compiler optimizations, that happen before code is sent down the wire. This is a problem that no amount of trace-jitting will solve. Far from being bloated, as I demonstrated in my GWT Query presentation at Google I/O, GWT can produce code an order of magnitude smaller than most selector libraries in specific circumstances. GWT can prune dead code far better than any JS optimizer, and can obfuscate code to reduce size in a way that helps maximum GZIP/DEFLATE algorithms.

What Ray's referring to here is the fact that a statically-typed language like Java allows you to perform absolutely correct optimizations far in excess of what is achievable without type information (There's a lot more to say on this topic, which I'll save for a future post).

Why is this so important for JavaScript output? In a word (or two), code size. Code that grows without bound is a serious problem for Javascript applications, and static analysis gives us a lot of leverage over the problem. The GWT compiler can determine precisely which classes, methods, and fields are actually used and aggressively prune everything else. And this pruning feeds back into the iterative compilation process, allowing still more optimizations (e.g., type-tightening, devirtualization, and so forth) to be performed.

Indeed, you can take this even further, with a concept we refer to as runAsync(). With static whole-program analysis, it is possible to automatically break a program into optimal fragments at user-defined cut points. This is still experimental, but the preliminary results look pretty good.

To put all this in concrete terms, check out this great example on Ray's blog showing the following transformation:

  public class MyArray implements Iterable<string> {  
    private String[] items = {"foo", "bar", "baz"};  

    public Iterator<string> iterator() {  
      return new StringArrayIterator(items);  
    }  

    private class StringArrayIterator implements Iterator<string> {  
      private String[] items;  
      private int index;  

      public StringArrayIterator(String[] items) {  
        this.items = items;  
        this.index = 0;  
      }  

      public boolean hasNext() {  
        return index < items.length;  
      }  

      public String next() {  
        return items[index++];  
      }  

      public void remove() {  
        throw new UnsupportedOperationException();  
      }  
    }  
  }  

  void iterate() {
    MyArray m = new MyArray();  
    for(String s : m)   
      Window.alert(s);  
  }

iterate() becomes

  function $iterate(m){  
    var s, s$iterator;  
    for (s$iterator = $MyArray$StringArrayIterator(new MyArray$StringArrayIterator(), m.items);  
       s$iterator.index < s$iterator.items.length;) {  
      s = s$iterator.items[s$iterator.index++];  
      $wnd.alert(s);  
    }  
  }  

which becomes something like

  function x(a){var b,c;for(c=y(new z(),a.a);c.b<c.a.length;){b=c.a[c.b++];$wnd.alert(b);}}

The point of going into all this detail about tools and optimization is that choosing Java as a source language gives GWT leverage that would have been provably impossible in JavaScript. It's most emphatically not about loving or hating any given language, or providing Java programmers with a way to avoid JavaScript -- it's a pragmatic decision based upon specific, measurable benefits.

I've seen lots of assertions that various languages are either "higher level" or "lower level" than others, without any clear definition of what metric is being used to justify these statements. For example, "JavaScript is the assembly language of the web" or "Java is a low-level language because it doesn't have closures and dynamic typing". These sorts of arguments are pointless at best, and at times simply disingenuous. What matters is not some ill-defined notion of a language's "level of abstraction", but rather what you can actually achieve with it.

JavaScript Interop

So what if I need to write or use existing Javascript code? Most everyone seems to finally be aware that it's possible to write JavaScript directly in your Java source using JavaScript Native Interface methods, like so:

  // Java method declaration...
  native String flipName(String name) /*-{
    // ...implemented with JavaScript
    var re = /(\w+)\s(\w+)/;
    return name.replace(re, '$2, $1');
  }-*/;

What doesn't seem to be clear is just how fundamental a feature this is. I've seen various people suggest that this is somehow "circumventing" the "normal" way of doing things -- and while it's true that you lose some of the benefits of Java type-checking, optimization, and debugging in these methods, they're also the foundation upon which all the GWT libraries are built. And there's no particular reason they have to be short snippets, either. They can be complex methods and classes that reach back into Java code, pass exceptions around, and so forth.

What about calling into existing JavaScript libraries, or exposing GWT classes to JavaScript code? For the former, check out Sanjay Jivan's SmartGWT library, which wraps the massive Isomorphic SmartClient library. For the latter, have a look at Ray Cromwell's GWT-Exporter gwtexporter library.

It's also worth noting that this functionality, combined with Overlay Types makes it really easy to efficiently parse and operate on JSON structures. Once again, you get a side-benefit: once you've written the overlay type for a particular JSON type, everyone using it gets code completion and documentation for free. For example:

  var customerJsonData = [
    { "FirstName" : "Jimmy", "LastName" : "Webber" },
    { "FirstName" : "Alan",  "LastName" : "Dayal" },
    { "FirstName" : "Keanu", "LastName" : "Spoon" },
    { "FirstName" : "Emily", "LastName" : "Rudnick" }
  ];

  class Customer extends JavaScriptObject {
    protected Customer() { } 

    public final native String getFirstName() /*-{ return this.FirstName; }-*/;
    public final native String getLastName()  /*-{ return this.LastName;  }-*/;
  }

which means, as a user of this class, if I get a Customer type from somewhere, I don't have to ask what fields and methods are available. The IDE can simply tell me.

HTML DOM Integration

But don't GWT's abstractions keep you from just working with DOM elements? Again, no. It has always been possible to work with the DOM from GWT (how else do you think all those widgets get implemented?), but as of 1.5, it's gotten a lot easier. See this Google IO presentation for details. Here's a taste, though:

  TableElement table = ...;
  String s = table.getRows().getItem(0).getCells().getItem(0).getInnerText();

Which of course translates to:

  var s = table.rows[0].cells[0].innerText;

With the important addition that as you type "table.[ctrl-space]", the IDE can actually tell you that "getRows()" is an option. Given that TextAreaElement alone has nearly 100 methods, that starts to be pretty useful.

The statement "You are likely to never see a DOM object or pieces of the native JavaScript language" is actually only as true as you want it to be. Most JavaScript frameworks provide some sort of higher-level abstraction than simple DOM elements, and GWT is no exception, but you should pick the right level of abstraction for the task at hand.

Independently Useful Parts

Finally, we have the question of whether (GWT === GWT's libraries). This is another common misconception -- GWT looks interesting, but I don't like the way their widget library works. That's like saying you like JavaScript, but not the way Prototype (or whatever) works.

GWT "eats its own dogfood" almost all the way down. That is to say, there's practically no module you can't replace (including the JRE). Like the DOM and JRE modules, but not all the widgetry? Write your own widgets. Don't like any of the modules? Replace the whole bloody thing! The many problems of building browser-based UIs are not yet well-solved in the state of the art, so it's highly unlikely that GWT's widget library represents a "perfect" solution. For a completely different take on how a widget library should work, have a look at the Ext-GWT library.

To Sum Up

I hope I've managed to convey the pragmatic goals that underlie GWT's design decisions, and the methods we've used to achieve them. There are still a million things we'd like to build and/or change, but I feel like it's off to a pretty good start. Nothing would make me happier than if we could stop arguing about abstract notions of what language is "right" and focus on practical goals and metrics.