Why are programmers dead set against my concurrent Node.js runtime?

victorNicollet · on April 12, 2016

Does your concurrent runtime guarantee that scripts will be executed as if they were running on a single-threaded runtime ?

Consider a naive function that queries several asynchronous data sources and constructs an array of all distinct responses:

    function distinct(sources, next) {

      var left = sources.length;
      var result = [];

      for (var i = 0; i < sources.length; ++i)
         sources[i].query(process);

      function process(data){
         if (result.indexOf(data) < 0) result.push(data);
         if (--left == 0) next(result);
       }
    }

This code can only work if each call to `process()` is executed atomically, start-to-finish, without another call touching `result` or `left`. In any language but JS, this would be an obvious race condition. But JS guarantees that there will be no concurrent access, and plenty of third-party code relies (perhaps unknowingly) on this serial execution property.

Yes, this could have been implemented as a join, followed by duplicate elimination from the array of results, but it wasn't, and you'll find plenty of JS code which reinvents the wheel in a similar way. Even worse: you can't be sure that a given JS plugin or dependency _doesn't_ rely on single-threaded evaluation without an in-depth code review.

I'm not saying that your concurrent runtime is a bad idea. In fact, I think it's an absolutely great idea, but I also know that it's an insanely _hard_ thing to implement, because you need to reliably detect whether two pieces of code are allowed to run in parallel, or if there's a non-obvious dependency between them that would prevent it (e.g. thread A is about to read memory cell X, is there any way for thread B to write to memory cell X before it yields to the scheduler ?)

eveningcoffee · on April 12, 2016

This code can only work if each call to `process()` is executed atomically, start-to-finish, without another call touching `result` or `left`. In any language but JS, this would be an obvious race condition. But JS guarantees that there will be no concurrent access, and plenty of third-party code relies (perhaps unknowingly) on this serial execution property.

Wow, this is very strong guaranty. Thanks for this insight.

I do not write code in JS (only small spinets when necessary), so I do not have deep knowledge in it, but it would be really interesting to know where this property is described or what books should one read today to get into JS programming.

victorNicollet · on April 12, 2016

Not an easy read, but the ECMAScript specification is very interesting if you have the time:

http://www.ecma-international.org/ecma-262/6.0/index.html#se...

> At any point in time, there is at most one execution context that is actually executing code.

> Evaluation of code by the running execution context may be suspended at various points defined within this specification. Once the running execution context has been suspended a different execution context may become the running execution context and commence evaluating its code.

Suspending the evaluation context is triggered by executing certain constructs, some of which are obvious (calling a function suspends the caller context until that function returns) and some of which are complex (there's a nice game of context switching whenever generators are involved). The general idea is that the evaluation context decides to be suspended, rather than forced to do so by anything else (like a thread scheduler).

nostrademons · on April 12, 2016

Hard to say without knowing what you mean by "if two tasks try to access a variable concurrently it gets marked atomic and they contend for access." People who have worked in depth with multithreaded programs know that there be dragons here, and so unless you've proved your code correct for all cases, the assumption will be that it's broken. Do you handle the case that Thriqon brings up safely, for example? Do you take all locks in a consistent well-defined order? (If you don't, you're inviting deadlock.) What counts as a variable? If you mutate an array element, do you lock the whole array or just that one object?

These issues are why Python has a GIL and why Java has separate "synchronized(monitor) { ... }" blocks, and Java programmers still get it wrong all the time.

PhilWright · on April 12, 2016

People need a compelling reason to switch from the tried and tested to something new and unproven. So you need a compelling feature/benefit. But I cannot see what it is.

Is your approach going to significantly improve performance? Doubtful. You really need to prove this.

Is your approach going to make it easier/faster to write code? Definitely not, multithreaded code is many times harder to get right than single threaded code.

Is your approach going to be more robust? Definitely not, deadlocks, race conditions etc. are a nightmare to fix.

tracker1 · on April 12, 2016

To your last point, this is worse when it's a runtime environment for other people's code... Upstream bugs are extra painful, even in open-source, as getting them fixed and out the door can take time, and your custom branch/fork in an interim may have other bugs, and a "final" solution may not match your own.

Not that it's impossible, but trying to keep certain bits separated is a good thing... I'd love to see proper coroutines/csp in node/js... WebWorkers is similar, but not quite the same, though probably as close as we're likely to see.

voodooattack · on April 12, 2016

Try writing a raytracer using a single thread, now try again with multiple threads working in tandem.

Trust me, I tried.

PhilWright · on April 12, 2016

If you need that level of performance then JavaScript is the wrong tool for the job.

Node is mostly for request/response scenarios such as a web server or REST API server etc.

voodooattack · on April 12, 2016

That's exactly what I'm trying to change. JavaScript should be the right tool for that level of performance. It has JIT compilers, now it needs concurrency, and it'll perform with the best of them.

dragonbonheur · on April 12, 2016

Javascript has mostly web programmers as a community. I'm not sure they have the same mindset as people who normally try to wring out the most performance from their machines. Even QBasic programmers knew how to turn to machine code for more performance. Nowadays the culture is "megahertz are cheap". That is the kind of culture that leads to e.g. Microsoft making an IOT operating system for the PI that weighs half a gigabyte and does not do much. Or people thinking that a 1 Ghz system with 512MB of RAM is too slow and is only good for browsing the Web and a 300 MHz system with 128MB of RAM is useless nowadays. These were once called workstations back in the day.

If you're trying to push Javascript to its limits and make it comparable to C++, good for you. Just remember that JS has many flaws.If you're going down that road I'd much prefer you give the world a Turbo javascript that comes with an IDE like Turbo C and the requisite documentation that all fit on a single 1.44MB disk and which can compile to directly executable code that doesn't need any runtime.

bsou · on April 12, 2016

For that specific problem you could use the upcoming addition to JS of SIMD for data parallelism while still in single threaded JS.

eveningcoffee · on April 12, 2016

You say that "if two tasks try to access a variable concurrently it gets marked atomic and they contend for access."

This is actually not sufficient. As victorNicollet pointed out, JS gives very strong guaranties when executing the code.

Let say

  function swap(a, b) {
    var swap;
    swap = a.amount;
    a.amount = b.amount;
    b.amount = swap.
  }

If you just mark a and b atomic, the code above will fail. You have to run it in transaction (see software transactional memory) or use locks.

If you use STM, then you can abstract away locks and have control over the lock taking order so you can avoid dead locks.

For example Clojure language is using STM, but they assume that the variables used in transaction are immutable, as they only manage the control over the references.

mizchief2 · on April 12, 2016

First of all, if you enjoy this project keep at it. What you learn in the process will be more valuable to you than any advise you get from the naysayers.

However, don't expect everyone to jump on it. The original purpose of Node was to provide a simple way of handling I/O in a non-blocking fashioned without having to deal with the headache of true multi-threaded programming. So you are trying to re-write the engine to do the exact opposite of what it was created for.