The potential for performance wins by baking modules into memory in Node.js

In a presentation at Yahoo's front-end engineers' conference earlier this year, Dav Glass demonstrated a performance gain by building a custom Node binary that bakes his own modules into memory in the same way Node bakes in the core classes.  I already discussed one aspect of his presentation: whether it's valuable for front-end engineers using Node to have access to the toolkits familiar to them from their client-side work.  But there was a brief aside in his presentation where he showed a neat trick that gave a notable performance improvement.

To start with, we must recognize that Node's core packages are baked into the binary.  The require function has a mode where it can resolve a requested module from one that's already baked into memory.  During the build process, Node converts some of the JavaScript files for the core packages (e.g. "http") into C files, which are then compiled into the binary.  This simplifies deploying Node onto servers because you have fewer files to install, and it also turns out to be a performance enhancement.
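As a small illustration of that built-in resolution mode, here is a sketch for a current Node release.  The `builtinModules` list did not exist in the Node versions of the time of the talk, so treat this as a modern equivalent rather than what Dav showed.  The variable names are mine.

```javascript
// Core modules are compiled into the node binary, so require can
// resolve them by bare name without consulting the file system.
const { builtinModules } = require('module');

// 'http' appears in the list of modules baked into this binary.
const httpIsBuiltin = builtinModules.includes('http');

// For a built-in module, require.resolve returns the bare name
// instead of an absolute path to a file on disk.
const resolvedHttp = require.resolve('http');

console.log(httpIsBuiltin); // true
console.log(resolvedHttp);  // http
```

A module loaded from disk would instead resolve to an absolute file path, which is the file system search that baking modules into the binary avoids.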

What Dav showed was a performance gain from baking his own module sources into a custom Node binary.  However, I'm not sure how appropriate his optimization is for general use on Node.

The test he showed repeatedly created YUI instances and used Y.use to load the needed YUI modules.  Using the normal Node binary this was pretty fast, but he constructed a custom Node binary that had some (perhaps all) of the modules already baked into memory.  With that binary, performance jumped dramatically.

What I'm not sure about is this: Node's require already caches loaded modules in memory.  If you require('xyzzy') more than once, the second time around the module will already be in memory and Node won't fetch it from disk again.  In his talk, Dav Glass claimed that the performance improvement came from not having to grope around the file system for module files: with module source baked into memory, there's no need to search the file system, doing readdirs and stats along the way.
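The caching behavior is easy to check for yourself.  The following is a minimal sketch, not anything from the talk: it writes a throwaway module to a temporary directory (the file name xyzzy.js and the global counter xyzzyLoads are made up for illustration), requires it twice, and confirms that the module body ran only once.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a throwaway module on disk. The name 'xyzzy.js' and the
// global counter 'xyzzyLoads' are hypothetical, for illustration only.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'require-cache-'));
const modulePath = path.join(dir, 'xyzzy.js');
fs.writeFileSync(
  modulePath,
  'global.xyzzyLoads = (global.xyzzyLoads || 0) + 1;\n' +
  'module.exports = { name: "xyzzy" };\n'
);

const first = require(modulePath);  // located on disk, read, and executed
const second = require(modulePath); // served from require.cache

// Both calls return the very same exports object...
const sameObject = first === second;
// ...and the module body was executed exactly once.
const loadCount = global.xyzzyLoads;
// The cache is keyed by the resolved file name.
const isCached = require.resolve(modulePath) in require.cache;

console.log(sameObject, loadCount, isCached);

// Clean up the temporary files.
fs.unlinkSync(modulePath);
fs.rmdirSync(dir);
```

If the second require went back to disk and re-ran the file, the counter would read 2 and the two results would be different objects.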

Because Node's require function already caches the module in memory, Node only has to search the file system once to resolve a module request.  It's clear from the presentation, though, that Dav Glass was talking about modules loaded using the Y.use method rather than through Node's require function.

http://yuilibrary.com/theater/davglass/f2esummit2011-glass/