Inception! JS-in-JS!

4 December 2014

Apparently this November has been the gloomiest in a while, and that certainly may have slowed down my on-going quest to bring arrow functions to V8. Though arrow functions deserve a write-up themselves, my musings today are about a side quest that started in September, when I had the chance to visit enchanting Berlin one more time, this time as a speaker at JSConf EU.

So what does a guy hacking on JavaScript engines talk about in a conference which is not a lair of compiler neckbeards? Of course it had to be someting related to the implementation of JavaScript engines, but it should have something to do with what can be done with JavaScript. It turns out that V8 has large-ish parts implemented in a subset of JavaScript (and SpiderMonkey; and JavaScriptCore, a bit too!). Enter “Inception: JavaScript-in-JavaScript — where language features are implemented in the language itself any similarity with the movie is a mere coincidence)”.

Intermission

Provided that the video of my talk at JSConf EU is now online, you may want to go watch it now (as an option, with the slides in another window) and come back afterwards to continue reading here.

Did you see the video of the talk? Good, let’s recap a bit before continuing.

Engines use JavaScript, too

I made some quick numbers (with CLOC) to give you an idea of how much major engines are using JavaScript themselves. This is the number of thousands of lines of code (kLOC) of the engines themselves, how much of it is JavaScript, and its percentage over the total (excluding the test suites):

Engine	Total	JS	%
JavaScriptCore	269	1	0.3
SpiderMonkey	457	18	3.9
V8	532	23	4.3

Both V8 and SpiderMonkey have a good deal of the runtime library implemented in JavaScript, relying on the JIT to produce machine code to provide a good performance. JavaScriptCore might have an edge in cases where it is not possible to use the JIT and its C++ implementation can be faster when using the interpreter (iOS, anyone?). In the case of V8 the choice of going JavaScript is clear, because it always compiles to machine code.

Most of the features introduced in the runtime library by the ES6 specification can be implemented without needing to touch low-level C++ code. Or do we?

Live-coding %TypedArray%.prototype.forEach()

For my talk in Berlin I wanted to show how accessible it is to use JavaScript itself as an entry point to the guts of V8. What would be better than picking something new from the spec, and live-coding it, let’s say the .forEach() method for typed arrays? Because of course, coding while delivering a talk cannot ever go bad.

As a first approach, let’s write down an initial version of .forEach(), in the very same way as we would if we were writing a polyfill:

function TypedArrayForEach(cb, thisArg) {
    if (thisArg === undefined)
        thisArg = this;
    for (var i = 0; i < this.length; i++)
        cb.call(thisArg, this[i]);
}

But, in which file exactly should this go, and how is it made available in the runtime library? It turns out that there is a bunch of .js files under src/, one of them conspicuously named typedarray.js, prompting us to go ahead and write the above function in the file, plus the following initialization code inside its —also aptly named— SetupTypedArrays() function, right at the end of the macro definition:

function SetupTypedArrays() {
macro SETUP_TYPED_ARRAY(ARRAY_ID, NAME, ELEMENT_SIZE)
    // ...

    // Monkey-patch .forEach() into the prototype
    global.NAME.prototype.forEach = TypedArrayForEach;
endmacro

TYPED_ARRAYS(SETUP_TYPED_ARRAY)
}

Wait! What, macros? Yes, my friends, but before crying out in despair, rebuild V8 and run the freshly built d8 passing --harmony-arrays:

% ./out/x64.debug/d8 --harmony-arrays
V8 version 3.31.0 (candidate) [console: readline]
d8> Int8Array.prototype.forEach
function forEach() { [native code] }
d8>

It is alive!

Standards are hard

At this point we have a working polyfill-style implementation patched in the prototypes of typed arrays, and it works. Yet, it does not quite work as it should. To begin with, the function should throw TypeError if called on an object that is not a typed array, or if the callback is not callable. The spec also mentions that the .length property of the function should be 1 (that is: it is declared as having a single argument), and so on.

You see, formally specifying a programming language is a complex task because all those iffy details have to be figured out. Not to forget about the interactions with other parts of the language, or the existing code bases. In the particular case of JavaScript the backwards compatibility adds a good deal of complexity to the formal semantics, and wading through the hundreds of pages of the ES6 specification can be a daunting task. We may feel compelled to define things differently.

How standards are born, courtesy of XKCD.

Personally, I would eradicate one of the modes, keep just sloppy or strict, but not both. Only that then it would not be JavaScript anymore. Think about all that legacy code that lurks on the depths of the Internet, wouldn’t it be nice that it continues to work?

That leads to yet another subtlety: the value passed as thisArg (available as this in the callback function) is expected to be wrapped into an object only for sloppy mode functions — but not for strict mode ones, because strict mode is, well, stricter. While .call() handles this, it is not particularly fast (more on this later), so we may end up needing a way to know which functions are in sloppy mode, do the wrapping ourselves (when needed) and arrange the calls in a more efficient way. We have to assume that there will be code in the wild relying in this behavior. If only we could knew which functions are sloppy!

A better solution

As a matter of fact, the JavaScript engines already know which functions are strict and which ones not. We just cannot get access to that information directly. Fortunately V8 provides a way in which JavaScript code can invoke C++ runtime functions: the functions declared using the RUNTIME_FUNCTION() macro can be invoked from JavaScript, by prefixing their names with %, and if you take a peek into src/runtime/, there are plenty.

To begin with, there is %_CallFunction(), which is used pervasively in the implementation of the runtime library instead of .call(). Using it is an order of magnitude faster, we will be keeping it in our utility belt. Then, there is %IsSloppyModeFunction(), which can be used to find out the “sloppyness” of our callback functions.

But wait! Do you remeber that I mentioned macros above? Take a look at src/macros.py, go, now. Did you see something else interesting in there? Plenty of utility macros, used in the implementation of the runtime library, live there. We can use SHOULD_CREATE_WRAPPER() instead. Also, there are a number of utility functions in src/runtime.js, most annotated with sections of the EcmaScript specification because they do implement algorithms as written in the spec. Using those makes easier to make sure our functions follow all the intrincacies of the spec.

Let’s give them a try:

// Do not add a "thisArg" parameter, to make
// TypedArrayForEach.length==1 as per the spec.
function TypedArrayForEach(cb /* thisArg */) {
    // IsTypedArray(), from src/runtime/runtime-typedarray.cc
    // MakeTypeError(), from src/messages.js
    if (!%IsTypedArray(this))
        throw MakeTypeError('not_typed_array', []);

    // IS_SPEC_FUNCTION(), from src/macros.py
    if (!IS_SPEC_FUNCTION(cb))
        throw MakeTypeError('called_non_callable', [cb]));

    var wrap = false;
    var thisArg = this;
    if (arguments.length > 1) {
        thisArg = arguments[1];
        // SHOULD_CREATE_WRAPPER(), from src/macros.py
        wrap = SHOULD_CREATE_WRAPPER(cb, thisArg);
    }

    for (var i = 0; i < this.length; i++) {
        // ToObject(), from src/runtime.js
        %_CallFunction(wrap ? ToObject(thisArg) : thisArg,
                       this[i], i, this, cb);
    }
}

Now that is much closer to the version that went into V8 (which has better debugging support on top), and it covers every the corner cases in the spec. Rejoice!

When in Rome…

Needless to say, there is no documentation for most of the helper functions used to implement the runtime library. The best is to look around and see how functions similar to the one we want to implement are done, while keeping a copy of the spec around for reference.

Those are a couple of things one can find by scouring the the existing code of the runtime library. First: for typed arrays, the macro in SetupTypedArrays() installs a copy of the functions for each typed array kind, effectively duplicating the code. While this may seem gratuitious, each copy of the function is very likely to be used only on typed arrays in which elements are a certain type. This helps the compiler to generate better code for each of the types.

Second: Runtime functions with the %_ prefix in their names (like %_CallFunction()) do not have their C++ counterpart in src/runtime/. Those are inline runtime functions, for which the code is emitted directly by the code generation phase of the compiler. They are real fast, use them whenever possible!

After coming back from Berlin, I have been implementing the new typed array methods, and the plan is to continue tackling them one at a time.

As mentioned, %TypedArray%.prototype.forEach() is already in V8 (remember to use the --harmony-arrays flag), and so is %TypedArray%.of(), which I think is very handy to create typed arrays without needing to manually assign to each index:

var pow2s = Uint8Array.of(1, 2, 4, 16, 32, 64, 128, 256);

Apart from my occassional announcements (and rants) on Twitter, you can also follow the overall progress by subscribing to issue #3578 in the V8 bug tracker.

Last but not least, a mention to the folks at Bloomberg, who are sponsoring our ongoing work implementing ES6 features. It is great that they are able to invest in improving the web platform not just for themselves, but for everybody else as well. Thanks!

perezdecastro.org