The security model of WebAssembly

wyldfire · on July 26, 2018

For some reason I assumed this might include specifics about spectre mitigations.

Heading off on that tangent: the mitigation for browsers was to turn off the time reference or to downsample the resolution, right? What if they do the work to sync up with a remote reference clock? Including characterization of the channel's jitter/RTT/all that cleverness? Could they have high enough precision to deduce speculative execution? Or were there other mitigations that I'm not recalling?

(apologies for the tangent, none of this post is wasm specific)

uluyol · on July 26, 2018

Chrome is now using more processes to leverage OS-level spectre protections. I think they will re-enable high precision timers.

Source: https://security.googleblog.com/2018/07/mitigating-spectre-w...

extrapickles · on July 26, 2018

Don’t forget that a tight loop with a counter can be a good source of time. Some of the mitigations against crypomining would work where they throttle the tab if it consumes too much CPU.

sillysaurus3 · on July 26, 2018

One interesting aspect of webassembly is that it's feasible to do some advanced program-specific architectures.

Consider a website whose img src references are not actually strings, but instead are pointers to memory addresses. Your application works like this: When you attempt to dereference the pointer, a page fault fires, which causes the underlying system to load the image from the server. When the loading is complete, your program resumes.

Big deal. But this opens up another interesting avenue: It's very easy to return a different image while you're waiting for the original to load. Meaning there is no async/await -- no waiting at all. You just write code in the natural, blocking way. No callback hell or promise chains.

There are other interesting applications of this too. When you dereference the pointer to the image, you know the user's viewport size. Meaning you know how large the image should be. Therefore the underlying system can automatically request a perfectly-sized image from the server -- or have the server construct it on the fly, then stream it to the client.

You never have to deal with any of this complexity yourself.

The most important area where this type of design is applicable is gaming. You want to stream the exact texture miplevels you need, at an ~infinite level of detail. If you approach a wall, you want to see lots of high resolution wall textures. But when you move away from the wall, that texture data should be freed. The above architecture handles this type of concern automatically.

sbjs · on July 26, 2018

I really don’t see how any of this isn’t already possible in JS? It can be done cleanly to. Take your first example:

    img.src = placeholderFor(viewport);
    loadRealImage().then(src => img.src = src);
    // carry on

sillysaurus3 · on July 26, 2018

Now, what do you do when you want to return from that function after loadRealImage() finishes?

  function loadImage(src) {
    loadRealImage(src).then(img => /*... return img from loadImage ...?*/);
  }

"Just make it an async function." Well yes, but then the async nature contaminates the rest of your code. You have to make sure to set up a toplevel try-catch for every one of your async chains, for example.

What do you do when you want to have an async generator? That is, you want to both await on something, and yield N values from the function. JS makes that difficult.

There are all kinds of limitations like this, and everyone has their own favorite hacks around them. But they're hacks, not unification. And yes, hacks can be effective, but a unified framework can be tactical.

sbjs · on July 26, 2018

So... to solve this... you want to use what's basically goto?

sillysaurus3 · on July 26, 2018

In this case, I'd like to write blocking-style code.

longjmp, however, is a crucial missing facility in JS. Emacs relies on it heavily in its design -- it's how catch / throw work, and it's why you can do things like

   (catch 'foo
    (map (fn (x)
           (if (= x 42) (throw 'foo x)))
      values))

Without longjmp, you can't do that. Just as you can't in JS.

So what? Well, that means you can't write emacs, because you're limited to what JS provides you. An entire class of software is beyond your ability to write, because you cannot provide the same features that other runtimes give.

This gets me started on the lack of any kind of reasonable error definitions in JS. In elisp, you define errors. Imagine you want to write some code that parses some parens -- it turns "(a b (c))" into ["a", "b", ["c"]]. What do you do when your program encounters "(a b" and then the end of the string? Throw a scan error!

Not in JS. It's considered poor manners to throw errors to the people using your library. Worse, it's a pain in the ass for users to catch and respond to errors. If you use someone's library, you usually don't expect to have to wrap it in a try-catch. And the code is massive:

try { operation } catch (e) { if (e instanceof ScanError) { do something else } }

Contrast that with elisp:

  (condition-case nil
    operation
    (scan-error do something else))

There's no contest. It's way easier to write the latter than to use the tools JS gives you. But it's a cultural difference, and culture is slow to change.

That's why webassembly at least gives an escape hatch.

millstone · on July 26, 2018

> Well, that means you can't write emacs, because you're limited to what JS provides you

Well JS does provide this via exceptions.

Second it's totally crazy that emacs depends on longjmp, which is an insane decades-old wart that ought to just quietly die.

Third...does web assembly even support longjmp? I'm pretty sure it does not.

> That's why webassembly at least gives an escape hatch.

Escape hatch from...writing five lines instead of three?

sillysaurus3 · on July 26, 2018

  function foo() {
    return doSomething();
  }

  async function doSomething() {
    ...
    throw ScanError();
  }

Spot the bug? That's an async function. Every JS programmer worth their salt will tell you how many times they've been annoyed to discover a missing await, and that the promise is failing mostly-silently (or worse, it works by accident until it doesn't, since that code will work fine most of the time).

Well JS does provide this via exceptions.

Second it's totally crazy that emacs depends on longjmp

It's not crazy.

  function bar() {
    [1,2,3].forEach(x => {
      if (x == 2) /* return from bar...? */
    }
  }

Why can't you write this code? I mean you "can":

  function bar() {
    let tag = [];
    try {
      [1,2,3].forEach(x => {
        if (x == 2) { tag.value = x; throw tag; }
      }
    } catch (e) {
      if (e === tag) {
        return e.value;
      }
  }

But holy crap that's terrible. EDIT: I also forgot to rethrow the error, showing just how easy it is to screw up.

Just write it as a for loop, then. Well sure, except for the hundreds of libraries that don't support that style. What do you do when you want to return prematurely from the iterator functions you pass to them? Now you can't just make a for loop unless they provide Symbol.iterator. And 9 times out of 10 they give you a promise, meaning you're forced to convert your code to async style.

It's a huge mess, and longjmp saves you a lot of headaches in disciplined situations. Every tool has its place, and it's strange to argue that a computer should be able to do less, not more.

The goal is to save you time in the long run. And those 5 lines better be exactly right, or you'll waste a lot.

gear54rus · on July 26, 2018

just use one of myriad array methods available, like "find" for this case

"can't write some class of software" is just wrong, is all

gpderetta · on July 26, 2018

I'm prettty sure you can implement a elisp (or any language with continuations) interpreter/compiler without longjump in the host language.

sillysaurus3 · on July 26, 2018

Yes, but it can’t easily interop with the host language. Elisp saves the stack, which you can’t do in JS. That means you can’t cross JS -> elisp boundaries with your longjmp without some pretty awful hacks.

gpderetta · on July 26, 2018

You cannot longjmp across 'native' stackframes (which mightor might not be a big limitation), but otherwise interop should be fine. The native interpreter stack is always around to be used to run native function calls.

nawgszy · on July 26, 2018

>yield N values from the function

const { all, the, values, you, want } = await multiValuedReturn();

?

sbjs · on July 26, 2018

But sillysaurus3 has a valid complaint about this line of code, that the very next line is completely blocked until multiValuedReturn() completes. A better solution is to not use `await` here, and instead use .then and Promise.all to control execution of several callbacks, while running other code synchronously.

I think the problem sillysaurus3 is facing is that he wants complete convenience of the kind that async/await bring, meaning being able to execute code regardless if it finishes now or later, but always pretending that it finishes now. That's just not possible, and for good reason: some things happen now, and some things happen later, and if you need ultra-fine-grained control over what happens when, then you need to be very explicit about what's happening and how and when. It's messy and ugly, and we're working towards cleaning that up, with async/await being a great step forward, but this is inherent to the concepts of sync/async control flow, and the assembly solution is much more of a hack to solve this (if I even understand the solution correctly) than Promise/async/await/etc.

sillysaurus3 · on July 26, 2018

That's just not possible, and for good reason: some things happen now, and some things happen later, and if you need ultra-fine-grained control over what happens when, then you need to be very explicit about what's happening and how and when.

Green threads. No silver bullet, but sometimes you upgrade your pistol.

JS needs greenthreads. There's no reason you shouldn't just spin up a thread for every blocking context. This is arguably what async/await already does, but you don't have control over the toplevel loop. Point out where your while (userIsOnWebsite) { ... } loop is. :)

Having a toplevel loop is very important for simplicity -- but even moreso for what you mention: having ultra fine-grained control, and not constantly fighting with the underlying scheduler / ecosystem.

meowface · on July 26, 2018

I've seen a lot of debate over this, but I find greenthreads so much simpler to work with and reason about, on top of keeping my code clean and completely interoperable with my synchronous code.

This is off-topic from the current JavaScript discussion, but even though Python introduced "first-class" async support a while ago, I still exclusively use gevent [1] for all of my concurrency needs. It provides full greenthread support for Python. It's extremely performant on top of being simple and clean to integrate. In contrast, Python's asyncio and async/await feels needlessly verbose and tortuous, often with full rewrites or replacements of libraries required to actually take advantage of the features (e.g. https://github.com/requests/requests vs. https://github.com/aio-libs/aiohttp). With gevent, I can just put whatever code I want into a greenthread and get instant asynchronicity (excluding hiccups with a few 3rd party libraries that have network I/O in native extensions, which is rare).

The one downside is it achieves its magic with standard library monkeypatching, which is pretty hideous, but it's done very seamlessly. Developers are able to write the exact same code with or without monkeypatching and not worry about what it's doing. I've never encountered an issue with the monkeypatching when using any standard or 3rd party library.

[1] http://www.gevent.org/

millstone · on July 26, 2018

How does gevent handle spawning kernel threads? Last time I tried Go I found that it spawned unlimited kernel threads. For example, a thousand goroutines calling stat() would result in as many kernel threads.

I ended up tracking down which goroutines were likely to spawn kernel threads (gross global reasoning), and applied a rate-limiter to that set.

meowface · on July 26, 2018

Unfortunately, Python is still bound by the GIL, so to my understanding, gevent will never spawn any kernel threads. This effectively means there is no true parallelism possible; context switches between greenthreads pretty much only occur when waiting on I/O. I believe this is the case for all other Python concurrency libraries as well, including the threading library in Python's standard lib.

Most people, including myself, use gevent for I/O bound applications (where it excels), so this usually doesn't pose any issues. For CPU bound tasks where performance is important, I'd probably use something other than Python (likely Go, personally).

euroclydon · on July 26, 2018

> Big deal. But this opens up another interesting avenue: It's very easy to return a different image while you're waiting for the original to load. Meaning there is no async/await -- no waiting at all. You just write code in the natural, blocking way. No callback hell or promise chains.

I guess. The renderer asks for an image and you provide one. Then you want to provide one again, but this is like following a customer out of your store after the transaction to give him a different product. It’s not exactly some straight up procedural code.

marcosdumay · on July 26, 2018

It's really asking from an abstraction layer with opaque capabilities... Something like the mobile frameworks or Haskell's MTL-style IO.

AgentME · on July 26, 2018

>[dynamic images in general]

Canvas elements already exist.

>But this opens up another interesting avenue: It's very easy to return a different image while you're waiting for the original to load. Meaning there is no async/await -- no waiting at all. You just write code in the natural, blocking way. No callback hell or promise chains.

Can't you use canvas elements from web workers? Or you could post messages to the UI thread telling it to edit the canvas, which should get you the same effect.

None of this seems specific to WebAssembly.

NegativeLatency · on July 26, 2018

> You never have to deal with any of this complexity yourself.

Even if you're using a bunch of saas apps to do this stuff it's still added complexity over a regular web page.

brian_cloutier · on July 26, 2018

Could you give another example of how this would be used? It seems like a cool idea, but I think I'm too stupid to quite understand why yet.

> When you attempt to dereference the pointer, a page fault fires, which causes the underlying system to load the image from the server. When the loading is complete, your program resumes.

This first half doesn't sound very exciting, this is exactly how your program already works when it communicates with the OS. It's also how co-routines work, this space has already been pretty well explored.

> When you dereference the pointer to the image, you know the user's viewport size. Meaning you know how large the image should be. Therefore the underlying system can automatically request a perfectly-sized image from the server

This part seems pretty cool, it's not currently easy to reconfigure your OS and have it change the answers it gives you (except in some obvious cases, like changing the file your program is going to read() ).

I like the idea of "dynamically reconfiguring the syscalls", but I'm not sure what I would use it for.

sillysaurus3 · on July 26, 2018

I like the idea of "dynamically reconfiguring the syscalls", but I'm not sure what I would use it for.

Exactly this. Little things, like why can't you just write `fs.readFileSync(<server URL>)`? That's all you want to do. You don't want to deal with issuing a fetch call and setting up handlers and so on. That should be abstracted from you. Ad everyone has their own ways of doing this, and you just search for whatever JS lib happens to tickle you that day. It's the opposite of a unified engine model.

Why does that matter? Because the easier your code is to write, the faster you can write. And the larger the systems you can create, individually. Since system value scales inversely proportional to the number of people working on it, that means you alone can build something in a few days that would normally take a team of programmers quite a lot longer.

Think of it this way. Why bother writing React? There were existing solutions. Yet we all saw what happens when you ignore standard ways of doing things and push for something better.

Two specific examples: You want to write blocking code, no callbacks and no awaits, and you want greenthreads. The reason you want this is because your codebase becomes exponentially smaller. And the only way to accomplish this in JS is to essentially write your own programming language on top of JS. Not really a transpiler, but more like an entire scheduler with heap allocated memory and page tables.

If that sounds crazy, just imagine how crazy it would've sounded to try to mix HTML inside of javascript before JSX.

The central theme here is simplicity and generality. HN's codebase proves what a single dev can achieve when you focus exclusively on these two goals.

EDIT: Some design inspiration: https://www.youtube.com/watch?v=Ox2H3kUQByo&t=1340s

This is one of the few games where Lisp is used, and they show how and why it enables them to do easily what other games struggle to.

greggman · on July 26, 2018

I'm not following you at all. What does loading images have to do with WebAssembly? How will you get webassembly to block based on loading an image? Every API in the browser works on events.