Findings while looking into implementing simultaneous multi-user editing on top of Node.js

I'm thinking about building something that lets multiple people edit the same document at the same time.  I thought I'd seen a demo using something like backbone.js, socket.io, or websockets.  But that was months ago and I can't find it now.  Searching turned up some interesting pointers, but nothing that provides a solid starting point.  Instead I found warnings that this is HARD stuff - as in CompSciHard - including one toolkit written by an ex-Google-Wave engineer who said it took them 2 years to write Wave, and that rewriting it today would take almost as long because it's such a hard thing to do.

Having used Google Docs and Wave for a bit of multi-user document editing, I find the task seems so simple from the user's side.  However, after reading up on some of the available libraries, I begin to see why it's hard.  The server has to maintain an object for each document, and clients and server need a protocol for communicating changes to that document.  Because there are multiple clients, each could be trying to change the document at the same time, so the object model has to work out where each edit occurred and which edit wins when changes overlap.  Additionally, every client has to be notified of every edit, in a way that prevents collisions and confusion.
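
To make that bookkeeping concrete, here is a minimal sketch in TypeScript of what a per-document server object might look like.  Everything in it is an assumption made for illustration: the names SharedDocument, InsertEdit and baseVersion are made up, nothing here comes from Derby, ShareJS or Etherpad, and it only handles inserts.  It also uses the crudest possible conflict policy: an edit made against a stale version is simply rejected, and the client would have to refetch and try again.

```typescript
// Hypothetical message shape: a client asks to insert text at a position and
// states which document version it was looking at when it made the edit.
interface InsertEdit {
  clientId: string;
  baseVersion: number;
  position: number;
  text: string;
}

// Called for every other client when an edit has been accepted.
type EditListener = (edit: InsertEdit, newVersion: number) => void;

// One instance per document, held in server memory.
class SharedDocument {
  private content = "";
  private version = 0;
  private listeners = new Map<string, EditListener>();

  subscribe(clientId: string, listener: EditListener): void {
    this.listeners.set(clientId, listener);
  }

  // Returns true if the edit was applied, false if it was rejected because it
  // was based on an out-of-date version or pointed outside the document.
  submit(edit: InsertEdit): boolean {
    if (edit.baseVersion !== this.version) {
      return false;
    }
    if (edit.position < 0 || edit.position > this.content.length) {
      return false;
    }
    this.content =
      this.content.slice(0, edit.position) +
      edit.text +
      this.content.slice(edit.position);
    this.version += 1;

    // Notify everyone except the client that sent the edit.
    const newVersion = this.version;
    this.listeners.forEach((listener, clientId) => {
      if (clientId !== edit.clientId) {
        listener(edit, newVersion);
      }
    });
    return true;
  }

  snapshot(): { content: string; version: number } {
    return { content: this.content, version: this.version };
  }
}
```

With this policy, two clients that both edit version 0 cannot both succeed: whichever edit reaches the server second is refused.  That is exactly the behaviour a real collaborative editor has to improve on, and it is where the hard part starts.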

The rest of this is probably TL;DR .. so the short story is ...

  • There are HARD problems here (as in CompSciHard)
  • There isn't a simple library to just pick up and bolt into an application
  • The leading library (DerbyJS) is entering a phase of massive rewrite
  • The other leading library (Meteor) is widely regarded as interesting, but it's at odds with the rest of the Node.js ecosystem, so I'm ignoring that project

Let's get on with the things I found:-

Getting started with Meteor and Derby on your own server:

Did I say that the server side would be written with Node.js?  I must have missed that, though the title of this blog should be a giveaway.  The two principal toolkits I found were Meteor and DerbyJS.  That blog post goes over setting up and kicking the tires of both.

I was able to quickly dismiss Meteor from consideration despite it having a fairly active community behind it.  Why?  While it runs on Node.js, they implement the thing with Fibers and are eschewing the asynchronous aspect of Node.js.  The Meteor site even has a long discussion saying that Meteor is better because synchronous, in-line code is much easier to read than asynchronous code with a zillion callbacks.

Specifically:

Meteor gathers all your JavaScript files, excluding anything under the client and public subdirectories, and loads them into a Node.js server instance inside a fiber.

And:

In Meteor, your server code runs in a single thread per request, not in the asynchronous callback style typical of Node. We find the linear execution model a better fit for the typical server code in a Meteor application.

Granted, there is a point to that line of reasoning, but if that's how they feel about things then why are they implementing on Node.js?  Sorry .. but this platform is about asynchronous code.  Further, one of the things NOT implemented for Node.js 0.10.x was anything in the vicinity of Isolates or Fibers.  While fiber may be an important part of a healthy diet, the Node.js community seems to be shunning fibers as a programming model.

http://derbyjs.com/ is immediately more compatible with the Node.js environment because it can be hosted on top of Express.  Cool.

The Derby project has this to say about itself:

"Derby eliminates the tedium of wiring together a server, server templating engine, CSS compiler, script packager, minifier, client MVC framework, client JavaScript library, client templating and/or bindings engine, client history library, realtime transport, ORM, and database. It eliminates the complexity of keeping state synchronized among models and views, clients and servers, multiple windows, multiple users, and models and databases.

At the same time, it plays well with others. Derby is built on top of popular libraries, including Node.js, Express, Socket.IO, Browserify, Stylus, LESS, UglifyJS, MongoDB, and soon other popular databases and datastores. These libraries can also be used directly. The data synchronization layer, Racer, can be used separately. Other client libraries, such as jQuery, and other Node.js modules from npm work just as well along with Derby."

The blog post linked above does a walk-through of running an example DerbyJS application.  However the overall state of DerbyJS examples is really poor.  There is a github repository of them at https://github.com/codeparty/derby-examples but the examples weren't terribly useful. 

I did find a very useful blog post:-  Derby.js – Working with Views, Models, and Bindings  This gave enough insight into how Derby Models and Views work to get started writing some code.

The library lets you write a simple model description and a simple view, and so long as you follow certain conventions it wires everything up for you without requiring additional coding.

Then, browsing through the DerbyJS Google Group I found this question:- Can you make Google Docs like functionality with derby?  Essentially what I wanted to develop was an extremely simplified "Google Docs functionality."  The answer?  The DerbyJS team replied saying:

Not at the moment, but we're reworking the core so it can in the future.

And

For now, you should check out ShareJS if you want to implement collaborative text editing: http://sharejs.org/

But before I get into that, I do want to mention this thing that I found:  https://github.com/addyosmani/todomvc  It's a group project to implement the same example application in multiple frameworks - specifically, a simple TODO application.  They've developed examples for 2-3 dozen frameworks and it's quite a useful starting point for understanding.  The DerbyJS example is not in the main set of examples, so go hunting for it.

https://github.com/josephg/ShareJS bills itself as supplying collaborative editing in any application.  It is directly the sort of thing required for this project, and the API looks clean and simple to use.  The example server is a little convoluted to follow but I believe that's just a matter of a few hours of playing with the code to see how it ticks.

The ShareJS website starts with this question:

You’re writing a web app. Your app contains data that users edit. Your users should be able to use your app from multiple computers if they need to. Sometimes you want multiple users to view & edit the same data.  How do you make that work, without the data going out of sync and without losing anything?

The author claims the answer is Operational Transformation (OT), describing it as:

OT is a class of algorithms that do multi-site realtime concurrency. OT is like realtime git. It works with any amount of lag (from zero to an extended holiday). It lets users make live, concurrent edits with low bandwidth. OT gives you eventual consistency between multiple users without retries, without errors and without any data being overwritten.
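
To get a feel for what "transformation" means here, this is a toy sketch in TypeScript of the textbook case: two users insert text into the same string at the same time.  It is not ShareJS's algorithm or API, and the names (Insert, applyInsert, transformInsert, site) are made up for illustration.  It handles only inserts, and it breaks ties between inserts at the same position by comparing an assumed per-user site id.

```typescript
// A single insert operation. `site` is a hypothetical per-user id, used only
// to break ties when two users insert at the same position.
interface Insert {
  position: number;
  text: string;
  site: string;
}

// Apply an insert to a document string.
function applyInsert(doc: string, op: Insert): string {
  return doc.slice(0, op.position) + op.text + doc.slice(op.position);
}

// Rewrite `op` so that it can be applied AFTER `other` has already been
// applied, while still landing where its author intended.
function transformInsert(op: Insert, other: Insert): Insert {
  const otherGoesFirst =
    other.position < op.position ||
    (other.position === op.position && other.site < op.site);
  if (otherGoesFirst) {
    // `other` added text before our insertion point, so shift right.
    return {
      position: op.position + other.text.length,
      text: op.text,
      site: op.site,
    };
  }
  return op;
}
```

Suppose the document is "abc", user a inserts "X" at position 1 and user b inserts "Y" at position 1.  User a applies its own edit and then transformInsert(b, a); user b applies its own edit and then transformInsert(a, b).  Both end up with the same text even though they applied the edits in opposite orders.  Deletes, more than two users, and undo are where the academic papers come in.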

And:

Unfortunately, implementing OT sucks. There's a million algorithms with different tradeoffs, mostly trapped in academic papers.

And:

I am an ex Google Wave engineer. Wave took 2 years to write and if we rewrote it today, it would take almost as long to write a second time. (What??)

At this point I was thinking - okay, just how big of a task have I bitten off for myself?  He does describe ShareJS as a small, simple server written in about 4k lines of CoffeeScript.  And the demo code looks okay, though as I said it'll take some time to really grok.

But then I noticed the author's blog, started reading, and found this post:-  It's time to rewrite ShareJS!  To cut to the chase - the guy was hired by Lever, the team behind DerbyJS, and they want to rewrite DerbyJS and Racer to base them on some stuff in ShareJS.  Along the way they want to do a massive rewrite of ShareJS.

Soooo... while ShareJS looks like a cool system that's directly useful to this project, I can't afford to build the application on a toolkit whose API is about to change out from under it.

FWIW the Derby team wrote about hiring Joseph Gentle here:- Getting Derby ready for prime time.  They have a lot of really nice things to say about this and they portray an ambition for the combination of DerbyJS, Racer and ShareJS to be a powerful stack for developing real time web applications.  However, it's also clear that they'll be in a period of massive change to the three libraries.

Etherpad (http://etherpad.org/ and https://github.com/ether/etherpad-lite) is already an implementation of this idea.  It's even written in Node.js.  But it's a little difficult for me to get my head around it in the timeframe I have to study it.  And it doesn't seem to be written so I could extract some pieces/parts to build a much simpler application.

Next-generation JavaScript frameworks (https://gist.github.com/clarle/3396225) is the starting point for an overview of several frameworks ...

Rant: Backbone, Angular, Meteor, Derby (https://gist.github.com/lefnire/4454814) is another comparison piece ... makes some of the points I made above about Meteor, says great things about Derby and Backbone