One reason was scale... The improvements the team saw were staggering. They went from running 15 servers with 15 instances (virtual servers) on each physical machine, to just four instances that can handle double the traffic. The capacity estimate is based on load testing the team has done.
Though Rails is not known for high performance, this sentence sets off a warning in my mind that something must have been terribly wrong with their Rails implementation.
They're comparing an evented app (what they built in node.js) to a non-evented app (Rails). It likely has less to do with Rails being slow and more to do with managing connections intelligently, doing things asynchronously, etc.
I'm a 5+ year Rails developer, and this doesn't actually surprise me. Ruby and Rails are just incredibly slow in my experience unless you cache everything, which can greatly increase complexity. Right now I'm struggling with the creation of 2000 ActiveRecord objects taking 10 seconds, which is absolutely ridiculous. And that's down from 30 seconds, which was happening because require was being called for each of those object instantiations by a third-party library. A 20-second gain from removing 2000 calls to require, on Heroku: something is wrong there.
The JSON generated from all that data is converted to Backbone objects instantaneously in the browser, and the few database calls are instantaneous; the time is all spent in Ruby. I will probably need to create custom lightweight objects just because Ruby is so darn slow. Much of the time is spent in gsub resolving paths for Paperclip attachments; work that would take most languages a few milliseconds can take 10 seconds in Ruby. I know there are many ways to optimize this, but I shouldn't have to at this point.
I find that with Ruby I constantly run into performance bottlenecks and have to resort to optimizations I would not need on other platforms.
Yours is the typical answer from Rails fans. But what could be so wrong with their Rails implementation that it was that slow, assuming they are reasonably competent (a valid assumption, I think, given their successful port to node)?
I find that with Ruby I constantly run into performance bottlenecks and have to resort to optimizations I would not need on other platforms.
Agreed. Ruby tends to be more likely to be CPU-bound than other languages, which demands a deeper understanding of Ruby itself. For example, something as simple as string concatenation, '+' vs '<<', can easily trip up lots of people ('+' allocates a new string each time, while '<<' appends in place).
Yours is the typical answer from Rails fans.
I don't quite like the tone of that statement. Although I've been working with Rails at work for the past five years, I don't consider myself a Rails fanboy. I consider it a valuable tool that brings high productivity, and I am quite aware of its strengths and weaknesses.
But what could be so wrong with their Rails implementation that it was that slow, assuming they are reasonably competent (a valid assumption, I think, given their successful port to node)?
We all know that scaling up Rails is doable, though not an easy task. To handle web-scale traffic like LinkedIn's in Rails, you cannot go very far following the traditional ActiveRecord way. Very likely, you will introduce a high-performance middleware, say in Java/C/C#, to prepare the data for rendering. For the frontend, you have jQuery or whatever JavaScript library you like. You just leave Rails to handle routing. Some people may question whether it is even necessary to have Rails in this architecture, but I think Rails can still make a case in such scenarios, which is basically what GitHub does.
I think you answered the question. You scale Rails by slowly replacing it. The problem with their architecture is that they hadn't done that yet. Node is handling their web scale traffic as-is.
Comparing Rails and node.js isn't really a fair fight. Rails has a ton of shiny bits that bare node simply doesn't. Comparing the two on speed alone isn't going to be very productive.
Maybe I didn't make myself clear. My point was that their Rails implementation may not have been well written, i.e., not architected properly from the beginning. You cannot simply throw together a few tables and scaffolds and hope it scales. That may work for small apps, but definitely not for web-scale apps like this.
To be fair, Node.js has its own quirks. Besides the callback-maintenance issues, I mentioned in another post that Node.js is not as scalable as a lot of people present it to be.
"One reason was scale... The improvements the team saw were staggering. They went from running 15 servers with 15 instances (virtual servers) on each physical machine, to just four instances that can handle double the traffic. The capacity estimate is based on load testing the team has done."
Nice to see an example of Node in the wild. It's so fun and easy to develop in Node, that it sometimes feels like a toy.
I was responding to jinushaun's comment on how Node.js is easy to develop on. CoffeeScript compiles to JavaScript. Give http://jashkenas.github.com/coffee-script/ a good read and you'll convert too!
From github:
CoffeeScript:
number = -42 if opposite
Javascript:
if (opposite) number = -42;
How is coffeescript an advantage in this example? The javascript seems more readable to me as it more closely follows regular english grammar.
CoffeeScript:
square = (x) -> x * x
Javascript:
square = function(x) {
return x * x;
}
In this example, again the JavaScript seems clearer to me. The function actually has the word function in the declaration; I like that. Essentially CoffeeScript took out the words 'function' and 'return' and replaced curly braces with other symbols.
Inline conditionals and the function keyword? Of all the features that CoffeeScript offers, it's kind of odd to focus on those minor details of taste and habit.
You can write it either way in CS. Anyway, saying that CS is a huge improvement over JS doesn't imply that every single feature of CS is better than its JS counterpart.
My comment was prompted by the trend for at least one HNer to promote CoffeeScript in comments on an article related (however tangentially) to JavaScript.
I don't see anything wrong with it. CoffeeScript is to JavaScript as C is to Assembly: far superior. Being able to excel at Assembly doesn't mean you have to. Programmers should dedicate their time to solving real-world problems instead of fighting against their tools.
I'm happy for CoffeeScript to exist, to be able to play with it, and to see its syntax inform the work on JS.next. It has a way to go before I'd be happy to use it in a production environment.
While this is an issue, it's a pretty minor one from my own (limited) experience using CoffeeScript. It's extremely easy to relate the JS errors to your CS code, since the JS generated is so readable. It's just a little tedious to have to take a look at the generated JS, but not that bad really. Try it out for some node.js stuff and see for yourself.
Also, the issue is being fixed for Chrome/Firefox, which is good enough for dev work.
CoffeeScript generates very readable JS code. I haven't seen it do much funky stuff. It mostly just adds ";" after statements, gets rid of the "variables are global unless you explicitly declare them local" thing, and so on.
That's surprising -- my impression has been that one tradeoff with Node.js vs. frameworks like Rails & Django is a lot more work to implement functionality they ship with out of the box -- it works at a much lower level.
It also tends to be slower going for a while as you get accustomed to the non-procedural approach.
I'm one of the LinkedIn engineers who rewrote the mobile server in node. One of our biggest concerns going into this was the amount of infrastructure work we might have to develop to get us back to par with what we had with rails. We were pleasantly surprised by how much was already there. The Express framework, in particular, was very nice in terms of features and quality (it's basically Sinatra for node). For most of what we needed, we were able to find a node module that could do the job.
We were able to develop this quickly for a couple of reasons. One was that we already knew rails. We used a lot of rails patterns throughout our project; any rails dev who looked at our directory structure would feel at home. Another reason was that we had a good understanding of our domain. We were able to anticipate many features and tasks that would have been expensive (time-wise) to implement later. Another reason is that we had 2-week release cycles with demo-able features at the end. This kept things moving at a good pace.
I'm confused -- isn't Node.js's Ruby equivalent EventMachine? Why are they comparing Node.js, an asynchronous I/O library, with an MVC web framework? I don't think this is a fair comparison unless they tell us the MVC framework their Node.js app is using and the server stack their Rails app was using.
Depends on the traffic you need to handle. I've kept an eye on Node.js for some time. So far, among the failures involving Node.js are Plurk and SyncPad: Plurk switched to JBoss Netty, while SyncPad used Erlang instead. Though that was almost a year ago, things may have improved.
Not to bad-mouth Node.js here. Just to give some counter examples to anybody who wants to try Node.js in production.
It seems that running multiple instances of node.js and using a more up-to-date version would help (maybe).
Anyway, I've always thought that the lack of threading support is definitely a feature rather than a shortcoming. Threading is hard, and average developers should avoid it (unless you are doing scientific computing). If you need to scale, eventually you will need multiple servers anyway.
Do you mind explaining a little bit more about why you are so sold on JBoss Netty?
I don't mean to hijack the Node thread, but I'm curious, since I use Java day-to-day and would love to hear more about non-web-app-framework activities in Java.
We were using TIBCO with the JMS API to process data as part of a batch job. The batch job would send a message to a JMS queue; the listener would then process the message and insert it into a db. We replaced that system with Netty-based RPC servers. Now the batch job makes async RPC calls to the Netty RPC server. The overall system is several times faster, and the error handling is much more elegant. Not to mention that not having to deal with our enterprise infrastructure (TIBCO team) guys is an added bonus.
EventMachine was definitely on the table, but a big reason we chose node was that it's asynchronous all the way through. Node modules tend to be asynchronous by default.
In other evented frameworks, it's hard to make sure that all of your code is asynchronous (especially when you use existing libraries). If your server runs into a slow, synchronous function at a critical time, it might come to a screeching halt and then fall over.
This is pretty cool to see that LinkedIn used node.js for their mobile interface for pretty much the exact same reasons I used node for the last mobile interface I built. To echo the article's sentiment, node works really well when you are interfacing with a bunch of other services.
Can you explain a little bit more about node's ability to interface with services?
Let's say I have a few services: Service A that does Invoicing, Service B that does Payments, and the system has to communicate with 2-3 web services as well. Say there is a need to create some sort of "portal"-ish solution (mobile or not). In this situation, what would be the advantage of using Node?
Would you mind sharing when Node is not the right choice as well?
I'm interested to know more about Node :)
UPDATE: Thank you for the replies.
FYI, I do understand the concept of node.js in terms of the async callback and the argument of client-side and server-side use the same code.
I'm not a heavy Rails user (more of a Java/Python guy), but when it comes to creating a typical CRUD web app, wouldn't Rails be more productive?
The advantage is that Node uses async/non-blocking IO.
For example, say your portal needs to collect info from three sources: A, B, and C. In a synchronous system, you would have to request service A, wait for a response, then request B, wait, and so on. This means your request ties up the system for all that time, unable to serve other requests without extra threads or processes.
In Node, and other async systems, when your code calls for I/O, such as a service request, a callback is registered for the result. Instead of having to wait for the result, the system becomes available for other actions, like other requests. So in this case you could request services A, B, and C all at once, and while you are waiting for the responses, the system can handle other requests. When any of those services completes, it calls back into your code so you can handle the result and give a response to the client.
So the advantage is that instead of a request taking A + B + C + extra time, it can take max(A, B, C) + extra time to serve the request, while serving other requests concurrently.
Node is not the only way to achieve this, many async systems exist like Tornado for python, EventMachine for ruby, and many others. But the JavaScript in node can be particularly fun to work with especially if you are also doing the front-end JavaScript, as it pretty much brings the context-switching in your brain to almost nothing.
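The max(A, B, C) claim above can be sketched with plain callbacks. This is not anyone's production code: `fetchA`/`fetchB`/`fetchC` are hypothetical stand-ins for the three services, simulated here with setTimeout. Because all three calls are fired before any of them completes, total latency is bounded by the slowest call rather than the sum:

```javascript
// Hypothetical async "services", simulated with setTimeout delays.
function fetchA(cb) { setTimeout(function () { cb('invoices'); }, 30); }
function fetchB(cb) { setTimeout(function () { cb('payments'); }, 50); }
function fetchC(cb) { setTimeout(function () { cb('profile'); }, 20); }

// Fire all three at once; respond when the last one finishes.
function portalRequest(done) {
  var results = {};
  var pending = 3;
  function collect(key) {
    return function (value) {
      results[key] = value;
      if (--pending === 0) done(results); // all callbacks are in
    };
  }
  fetchA(collect('a'));
  fetchB(collect('b'));
  fetchC(collect('c'));
}
```

With the delays above, `portalRequest` completes in roughly the time of the slowest call (~50ms) rather than the sum (~100ms), and the event loop remains free to serve other requests in the meantime.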
Great explanation--the light bulb finally went on for me regarding node.js.
Regarding the other framework event models (esp. EventMachine), is it beneficial or wise to use node for the I/O intensive stuff and another framework (e.g. rails) for the rest? Or does it make more sense to stay in the Ruby world? I'm sure there are lots of caveats to this question but it would be nice to see how node.js can integrate with existing web frameworks rather than trying to build everything with node.
I've had this question myself - how much of the benefit of Node is because of Node, how much is because of JS/V8, and how much is because of evented programming generally?
For instance, if you switch from Rails to EventMachine, will you get most of the benefits of Node, or will you still be bogged down by Ruby's speed and resource consumption?
A lot of it is thanks to V8 I think.
But an issue commonly raised with EventMachine (and gevent etc. in Python) is that many common libraries contain blocking code that you can't get rid of, whereas Node modules have all been written to avoid blocking at all costs.
Here's one thing that never made sense to me: let's say a request comes in and data needs to be collected from 3 different sources, A, B, and C, before a response can be given back to the client. You say it would take max(A, B, C) to retrieve all the data. Consider this pretty standard snippet:
function get(source, callback) {
  var result = getData(source);
  callback(result);
}

get(A, function(result) {
  get(B, function(result) {
    get(C, function(result) {
      // CREATE RESPONSE WITH A, B, AND C HERE
    });
  });
});
Wouldn't you have to wait for the data to be retrieved before running the next callback, resulting in a time of A + B + C anyway? Am I missing something about the way to retrieve data from multiple sources? I don't see how max(A, B, C) is possible while still knowing for sure when all the data has been collected.
This snippet has basically turned async requests into sequential ones. One idea is to make requests A, B, and C concurrently, parse the data, then have a handler that waits for all three to complete and combines them into a single response, as opposed to cascading the callbacks.
I understand that, but I feel like the majority of the JavaScript async code I've seen has been in this format. By chance, do you have an example of a handler that would wait for multiple async requests to complete?
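One common pattern, sketched here from scratch (in practice, libraries such as async.js ship a helper like this), is a counter-based join: start every task immediately and invoke the final callback once the last one reports in. None of this is from the article; it's just an illustration of the technique:

```javascript
// Run an array of async tasks concurrently; call done(results)
// once every task has called back. Results keep the tasks' order,
// not the order of completion.
function parallel(tasks, done) {
  var results = new Array(tasks.length);
  var remaining = tasks.length;
  tasks.forEach(function (task, i) {
    task(function (result) {
      results[i] = result;
      if (--remaining === 0) done(results);
    });
  });
}
```

Rewriting the nested snippet as `parallel([getA, getB, getC], respond)` starts all three requests up front, so the elapsed time is max(A, B, C) — provided the gets themselves are genuinely non-blocking.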
There are two distinct issues here: (a) throughput and (b) concurrency, i.e., how many concurrent incoming requests you can handle. Evented I/O frameworks help you maximize concurrency.
The solution synchronous systems use is creating more processes or threads. The problem is that you either end up wasting resources (as with one process per request) or have to figure out how to share those resources concurrently without shooting yourself in the foot (and thread-safe programming isn't always easy).
I'll try to dumb it down; not perfect, but here goes. Node.JS is just an async framework that runs on V8. A good use case (as you can see in the article) is handling web requests, just like Rails.
Services are software built to handle requests. You can build both consumers and providers of requests using Node.JS.
Tried to at least give you some small idea to get you started. The explanation is NOT perfect by any means; in fact, I think I may have downplayed it. Either way, read the docs.
If you're talking to a bunch of different services, you don't want blocking I/O. Node.js is particularly good at non-blocking I/O, so it's great for gluing together things from external services.
(as always with Node, you can achieve the same benefit using nonblocking I/O frameworks for other languages, e.g. Twisted or gevent in Python, EventMachine in Ruby, etc)
Hehe, came back 4 hours later to see that quite a few people have answered your question :) In regards to doing a typical CRUD app: I don't think there's been a huge amount of work in that area yet. Geddy (http://geddyjs.org/) is a Rails-esque framework, but it looks a bit stale, considering it's only had 5 commits since the beginning of the year. I never used it, though, so it might just be that it's rather feature-complete and the developer is being cautious about pushing to the master repo on github.
I haven't used Node, but my understanding is it is event-driven and excels at asynchronous requests. So in your example, Node could asynchronously communicate with the services and raise events when each is complete. Obviously hitting each service simultaneously speeds things up dramatically, as well as being able to do other things while waiting.
> No. You can spin up multiple processes and handle the load with (for example) cluster[1].
Which is what I speculate they're doing, since they say they're running just four instances. Probably they have servers capable of running four threads at the same time.
You can, but one reason nginx isn't used as much is that it does not fully support HTTP 1.1.
As a result, for example, although it's possible, it's not trivial to proxy WebSocket connections, which are commonly used with node.js apps. Streaming uploads/downloads have similar challenges.
> does not fully support HTTP 1.1. As a result, for example, although it's possible, it's not trivial to proxy WebSocket connections
Non sequitur. The WebSocket protocol is not HTTP/1.1.
WebSocket does have an HTTP-like handshake as a hack, but it does not produce the required behavior in spec-compliant HTTP/1.1 clients. E.g., RFC 2616 requires the connection to be closed after Upgrade is sent; WebSocket needs it open. HTTP requires proxies that see Connection: Upgrade to remove the Upgrade header; WebSocket clients require this header to be present.
So basically a fully HTTP/1.1 compliant proxy is unable to proxy WebSocket connections by design.
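For reference, the handshake in question is an ordinary-looking HTTP exchange carrying Upgrade headers (simplified here; the version-specific Sec-WebSocket-* headers are omitted):

```
GET /socket HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
```

Since Connection names Upgrade as a hop-by-hop header, a strictly compliant HTTP/1.1 proxy strips it before forwarding, which is exactly the conflict at issue.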
If you need only fast downstream communication, you can use Server-Sent Events though.
Could someone explain the following: "Connections are all stored locally, also for speed and so if you’re offline, you can still access them."? I'm confused as to how a connection to a remote resource can be accessed offline. Is all of the data cached locally? Further down they mention "We don’t use the browser’s caching system" so I'm assuming they have custom built the cache.
They're likely storing everything (a JSON file with all connection info) in LocalStorage on the mobile device. That info can be viewed/edited without a data connection, then synced the next time they're online. This is a hybrid app (as opposed to a web app or native app) that uses some native elements and some web-app elements, so it's possible they're using native storage (which should be more performant and reliable).
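The pattern being guessed at here can be sketched in a few lines. To be clear, this is speculation about their approach, not their code: in a browser the storage would be `window.localStorage`; the fallback object below is a stand-in shim with the same getItem/setItem API so the sketch is self-contained:

```javascript
// Use real localStorage when available; otherwise fall back to an
// in-memory shim with the same getItem/setItem interface.
var storage = (typeof localStorage !== 'undefined') ? localStorage : {
  _data: {},
  getItem: function (k) { return (k in this._data) ? this._data[k] : null; },
  setItem: function (k, v) { this._data[k] = String(v); }
};

// Cache the connection list as JSON so it survives going offline.
function saveConnections(connections) {
  storage.setItem('connections', JSON.stringify(connections));
}

function loadConnections() {
  var raw = storage.getItem('connections');
  return raw ? JSON.parse(raw) : []; // empty list when nothing cached yet
}
```

On a failed network request (i.e., offline), the app can render `loadConnections()` immediately and queue any edits to sync once a connection returns.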
I'd love to see someone put together a guide to Node and the current javascript world for the Rails developer. I've looked at Node a few times, but it's so much lower-level than Rails...I'd think it would be more valid to compare it to Rack. What about the other components of a Rails app? What should you use as an ORM? What about views? What about routing? Etc, etc.
Here's a good list of available modules https://github.com/joyent/node/wiki/modules. If you get involved in the community you'll quickly discover which ones are widely used in your area of concern. I've played with express http://expressjs.com/ routing and views with the jade https://github.com/visionmedia/jade view engine. I found them quite easy to learn. When I got into node I also got into MongoDB http://www.mongodb.org/ so I haven't played with any ORM tools because there was no mapping of objects to a relation db needed, no mapping at all really since I pretty much keep everything in JSON format from front to back.
Yes, synchronous code is the norm for many. In Java, which is where I'm from, this is due to a limitation of the JEE spec. While some servers do provide thread pools (JBoss and WebSphere), that behavior is beyond the spec and breaks the ability to switch to any container you like.
Some of the issues Ruby/Rails faces are due to both the language and the framework. It is not uncommon for JEE apps to pull back 10k objects, show 10 to the user, and pitch the rest. ORMs like Hibernate tend to be fairly quick, especially when tuned, so synchronous code is not that problematic.
This might start to change with the advent of Scala in JVM on the server. The actors it provides MIGHT make Oracle rethink the spec a bit. But I doubt it.