Thursday, September 16, 2010

An Ugly Bug in the IE9 Beta

Hot on the heels of my very happy discovery that IE9 finally plugs its leaks, I've found a subtle-but-important bug in the IE9 beta. Bear with me, as it's a little tricky to explain.

The Good News

IE9 finally implements the standard HTML5 DOM element interfaces, which will make many things simpler. Further good news: IE9 includes a nice debugger you can use to explore these interfaces. As I understand it, the IE team has cleaned up all the bizarre old COM bindings that have been giving developers fits for years. So when you inspect an element in their nifty new debugger, you get something like this:

document.body   {...}                     [Object, HTMLBodyElement]
- accessKey     ""                        String
- appendChild   function appendChild(...  Object, (Function)
...

The Bad News

This is beautiful, and matches your expectations of the interfaces quite nicely. But then I discovered this little gem:

elem:           {...}                     DispHTMLImg
- [Events]
- [Expandos]
- [Methods]
- accessKey     ""                        String
...

What on earth is this? It sure looks like an IDispatch interface to an element -- but I thought we weren't supposed to be seeing that sort of thing anymore. But if you resolve properties on the object using the JavaScript VM, most of them resolve the same way, so no harm done, right?

Not so fast. When digging into a bug in my code, I kept running into a bizarre situation where elements didn't seem to be comparing properly. Specifically, I found that (ElemA == ElemB) while (ElemB != ElemA). These were two different elements, so they shouldn't have been equal to one another anyway, but the asymmetric equality relation was a really big surprise!

As you might have guessed, one of these two elements was an HTMLElement, while the other was a DispHTMLDivElement. OK, if one of them is a Disp interface to an element and the other is a native DOM host object, you can imagine how the comparison might get screwed up (I'm going on the assumption that IE didn't expect to have those Disp objects exposed at all). Which raises the question of how I got that reference in the first place.

When I tried to reproduce the bug in isolation, everything seemed to work fine -- no Disp references in sight. I finally tracked it down to the fact that my code was running in an iframe, while the DOM elements themselves were in the outer frame (this is a not-uncommon technique for isolating code). Specifically, it seems to be triggered by the following situation:

Outer page:
  <div id='target'>...</div>

IFrame:
  <script>
  var target = parent.document.getElementById('target');
  target.onclick = function(evt) {
    // both 'evt' and 'elem' will be Disp interfaces
    var elem = evt.currentTarget;
  };
  </script>

So it appears that something's going wrong when marshalling the event object from one frame to the other. And once you get one of these funky Disp objects, all references you get from it will be Disp objects as well. Which opens you to these comparison failures.
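One way to spot this while debugging is to run the comparison in both directions and see whether the answers agree. The following is only a minimal sketch: `describeEquality` is a hypothetical helper name of my own, not part of any browser API, and it deliberately uses the loose == operator because that is the one that misbehaved.

```javascript
// Hypothetical debugging helper (not a browser API): compares two references
// in both directions and reports whether loose equality is symmetric.
function describeEquality(a, b) {
  var forward = (a == b);   // intentionally loose equality
  var backward = (b == a);  // same comparison, operands swapped
  return {
    forward: forward,
    backward: backward,
    symmetric: forward === backward,
    strict: a === b
  };
}

// Possible use inside the handler from the example above (browser only):
//   target.onclick = function(evt) {
//     var report = describeEquality(evt.currentTarget, target);
//     if (!report.symmetric) { /* record the two references for inspection */ }
//   };
```

For ordinary objects `symmetric` should always be true, so a false value points at something unusual about one of the two references.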

A couple of caveats

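I'm assuming that the "Disp" part of these objects' names refers to IDispatch, but if that's not correct it doesn't really change much. Also, you may have noticed that I used the == comparison operator above -- it turns out that === behaves as expected. However, there should be no need to reach for === when comparing two objects: when both operands are objects, == and === are specified to give the same answer, since both simply test whether the two references point to the same object.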
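Since === behaved as expected in my testing, code that looks up an element in a list can sidestep the problem by sticking to strict identity. A rough sketch, assuming nothing beyond plain JavaScript; `indexOfSame` is a hypothetical name introduced here for illustration:

```javascript
// Hypothetical helper: finds an element in an array-like list using strict
// identity (===) only, so the result never depends on how == is evaluated.
function indexOfSame(list, elem) {
  for (var i = 0; i < list.length; i++) {
    if (list[i] === elem) {
      return i;
    }
  }
  return -1; // not found
}
```

This is a workaround rather than a fix: it only avoids the operator that showed the asymmetry.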

A possible explanation

If I understand IE's architecture correctly, older versions appeared to use DCOM for cross-frame communication. If I'm correct about this (and it's still the case in IE9), then it may be that something just went wrong in the marshalling of references from one frame to another (hence my assumption that "Disp" means "IDispatch").

Does This Really Matter?

Yes. It might seem really subtle, but these are the kinds of bugs that can take hours or days to track down when something goes wrong (and for which the fix is non-obvious at best). And while putting your code in an iframe might seem like a slightly odd thing to do, there are very good reasons for it under some circumstances (I'll have more to say on precisely why this is important in a follow-up post).

Repro

I've posted a relatively simple reproduction case here. It's a little screwy, because it's a case hoisted out of a much more complex app, but it should illustrate the issue reasonably well.

IE9: Memory Leaks Finally Declared Dead

It is with great pleasure that I can finally declare the infamous, painful, long-standing, never-fixed IE memory leak bug fixed! With the release of IE9, I have verified that every leak pattern I'm aware of is fixed. It's been a long time coming, but I'm starting to feel more confident that IE9 can be reasonably called part of the "modern web" -- the web that is sufficiently powerful to support complex applications, and not just lightly scripted documents.

One caveat: Do be aware that your "standard" pages need to explicitly request "IE9 Standards" mode, using either an HTTP response header or a meta tag like the following:

<meta http-equiv='X-UA-Compatible' content='IE=9'/>

If you don't, you'll not only get all the old crufty bugs and quirks of previous IE versions, but the page will also continue to leak memory, presumably because it is using the DLLs from the old rendering engine.
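For the HTTP response header route, the header carries the same name and value as the meta tag. Here is a small sketch of setting it from server-side JavaScript. The helper name `requestIE9Standards` is hypothetical, and the only assumption about `response` is that it exposes a `setHeader(name, value)` method, as Node's `http.ServerResponse` does.

```javascript
// Header name and value mirror the meta tag shown above.
var IE9_HEADER_NAME = 'X-UA-Compatible';
var IE9_HEADER_VALUE = 'IE=9';

// Hypothetical helper: `response` is assumed to be any object exposing
// setHeader(name, value), for example Node's http.ServerResponse.
function requestIE9Standards(response) {
  response.setHeader(IE9_HEADER_NAME, IE9_HEADER_VALUE);
  return response;
}

// Possible use with Node's built-in http module:
//   require('http').createServer(function(req, res) {
//     requestIE9Standards(res);
//     res.end('<!DOCTYPE html><title>hello</title>');
//   });
```

Any server or framework that lets you set response headers should work the same way; only the header name and value matter.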

Now perhaps I can finally stop writing about this stupid bug!

Friday, April 30, 2010

Trying out the new Wave element

Google Wave just added a new "web element" that makes it easy to embed a wave in a blog or article, so I thought I'd try it out. If you're one of the two people who actually read this, give it a shot. If you don't have a Wave account, you won't be able to edit -- if you need an invite, ping me at jgw@pobox.com.

Friday, April 02, 2010

Quake II in HTML5 -- What does this really mean?

Now that we've finally been able to push our port of Quake II to the browser public, it's time to discuss the questions "what's the point?" and "what does this mean for the future?".

Let me begin with a tweet I saw last night, which neatly summarizes a very salient point:

Not sure if the best endorsement of JS engine speed in 2010 is ports of games from 1997...

http://twitter.com/tberman/status/11446377136

Well said. We should be setting the bar higher than this. The choice of Quake II was mainly predicated on the fact that a Java port already existed, and this was just a 20% project (more like -5%, actually -- nights and weekends for the most part). I'm pretty certain Quake III would have ported just as easily (perhaps more easily, as it was written specifically for hardware acceleration and likely leans on the hardware a little more).

So then there's the fact that it's running at frame rates about a third of what's possible on the same hardware in C (or on the JVM, for that matter). There are a few reasons for this:

  • There are inefficiencies still to be worked out in WebGL implementations, especially the expensive frame buffer read-back on Chrome.
  • There are things being done in JavaScript that really ought to be done in vertex shaders; this especially applies to things like model animation interpolation, which is a nasty hot spot in the JavaScript code that could easily be pushed to the shader.
  • There are some things, such as dealing with packed binary data structures, that are incredibly inefficient in JavaScript (and I mean something like 100x slower than JVM or C code). This can be largely mitigated through better APIs, such as the Khronos Group's Typed Arrays spec.
  • This code is fairly idiosyncratic, having been ported from straight C, and exercises some code generation paths in the GWT compiler that could be better optimized (its heavy use of small arrays and static methods still needs some work).
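To make the packed-binary-data point concrete, here is a rough sketch of reading and writing a fixed-layout record with the typed array APIs (`ArrayBuffer` and `DataView`). The record layout and the names `RECORD_SIZE`, `readRecord`, and `writeRecord` are invented for illustration; this is not the actual Quake II data format or code from the port.

```javascript
// Hypothetical packed record, 16 bytes, little-endian:
//   offset 0:  float32 x
//   offset 4:  float32 y
//   offset 8:  float32 z
//   offset 12: uint16  lightIndex
//   offset 14: uint16  flags
var RECORD_SIZE = 16;

// Reads the record at position `index` directly out of the buffer.
function readRecord(buffer, index) {
  var view = new DataView(buffer, index * RECORD_SIZE, RECORD_SIZE);
  return {
    x: view.getFloat32(0, true),
    y: view.getFloat32(4, true),
    z: view.getFloat32(8, true),
    lightIndex: view.getUint16(12, true),
    flags: view.getUint16(14, true)
  };
}

// Writes `rec` into the buffer at position `index`, using the same layout.
function writeRecord(buffer, index, rec) {
  var view = new DataView(buffer, index * RECORD_SIZE, RECORD_SIZE);
  view.setFloat32(0, rec.x, true);
  view.setFloat32(4, rec.y, true);
  view.setFloat32(8, rec.z, true);
  view.setUint16(12, rec.lightIndex, true);
  view.setUint16(14, rec.flags, true);
}
```

The idea is that the bytes stay in one contiguous buffer and the API does the decoding, rather than script code reassembling numbers one byte at a time.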

I would be willing to hazard a guess that we could easily get another 30% out of the frame rate with relatively minor tweaks. If the game were written from the ground up to work with web technologies, it would likely be twice as fast on any reasonable metric (frame rate, startup time, etc.). That's an extremely rough estimate, but I'd be willing to bet on it.

So back to our original question. What's the point? What this code currently proves is that it's feasible to build a "real" game on WebGL, including complex game logic, collision detection, and so forth. It also did an excellent job exposing weak spots in the current specs and implementations, information we can and will use to go improve them.

Now if I were starting with a plain C code base, I would most likely prefer to port it using a technology like NaCl -- that's what it's for. But while it's not feasible to actually ship games on WebGL yet (it will be a while before it's widely deployed enough for that), one can envision a world where game developers can deploy their games as easily as we currently deploy web pages, and users can play them just as easily. And that's a pretty damned compelling world to imagine.