
Part of the lesson here is that if you're doing MongoDB on EC2, you should have more than enough RAM for your working set. EBS is pretty bad underlying IO for databases, so you should treat your EBS volumes more as relatively cold storage.

This is the primary reason we're moving the bulk of our database ops to real hardware with real arrays (and Fusion IO cards for the cool kids). We have a direct connect to Amazon and actual IO performance... it's great.



> Part of the lesson here is that if you're doing MongoDB on EC2, you should have more than enough RAM for your working set.

We had more than enough RAM for our working set. Unfortunately, due to MongoDB's poor memory management and non-counting B-trees, even our hot data would sometimes be purged from memory in favor of cold, unused data, causing serious performance degradation.
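A toy model of the complaint above (pure Python, nothing like MongoDB's actual internals): a recency-only cache keeps no frequency information, so a single pass over cold data can evict a hot working set. "Non-counting" here means eviction never considers how often a page has been hit.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU page cache: eviction is purely recency-based,
    with no notion of how often a page has been hit."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def touch(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)        # refresh recency
        else:
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # evict least-recently used
            self.pages[page] = True

cache = LRUCache(capacity=4)

# Hot working set, accessed repeatedly -- fits comfortably in the cache.
for _ in range(100):
    for hot in ("h1", "h2", "h3"):
        cache.touch(hot)

# A single scan over cold data pushes the hot pages out anyway.
for cold in ("c1", "c2", "c3", "c4"):
    cache.touch(cold)

print("h1" in cache.pages)  # False: hot page evicted by one cold pass
```

A frequency-aware ("counting") eviction policy would have kept h1-h3 resident, since they were hit hundreds of times and the cold pages only once.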


I understand your point, but the performance issues still stem from poor IO performance on Amazon EBS. The more we use it, the more we find it to be the source of most people's woes.

If you have solid (even reasonable) IO, then moving things in and out of working memory is not painful. We have some customers on non-EBS spindles with very large working sets (compared to memory) ... faulting 400-500 times per second, and they hardly notice any slowdown.
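A back-of-envelope calculation (the latency figures below are illustrative assumptions, not measurements) shows why the same fault rate can be fine on one backend and crippling on another: what matters is fault rate times per-fault latency, and whether the array has enough parallel spindles to absorb the demand.

```python
# How much IO time 400-500 faults/sec demands per wall-clock second,
# under assumed (illustrative) random-read latencies.
faults_per_sec = 450                      # mid-point of the range above

latencies = {
    "local spindles (~8 ms/read)": 0.008,
    "congested EBS (~50 ms/read)": 0.050,
}

for name, seconds_per_fault in latencies.items():
    busy = faults_per_sec * seconds_per_fault
    print(f"{name}: {busy:.1f} disk-seconds needed per wall-clock second")
```

3.6 disk-seconds per second is absorbable by a handful of parallel spindles in an array; 22.5 is not, which is one way to read the difference between the two environments.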

I think your suggestions are legit, but faulting performance has just as much to do with IO congestion. That applies to insert/update performance as well.


We are using Mongo on EC2, and RAID 10 across 6 EBS drives outperforms ephemeral disks when the dataset won't fit in RAM in a raw upsert scenario (our actual data, loading in historical data). The use of mmap, relying on the OS to page the appropriate portions in and out, is painful, particularly because we end up with a lot of document moves (our padding factor varies between 1.8 and 1.9, and with our dataset, padding with a large field on insert and clearing it on update was less performant than just accepting the upserts and moves).
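The move behavior described above can be sketched as follows. This is a simplified model of mmap-era record allocation with made-up sizes, not MongoDB's actual allocator: each document gets padded space (size times the padding factor), and an update that outgrows its allocation forces a "move" (rewrite elsewhere, plus index updates and dirtied pages).

```python
def simulate_updates(initial_size, growth_per_update, padding_factor, n_updates):
    """Count document moves under padded allocation with steady growth."""
    allocated = initial_size * padding_factor
    size = initial_size
    moves = 0
    for _ in range(n_updates):
        size += growth_per_update
        if size > allocated:
            moves += 1                          # document no longer fits in place
            allocated = size * padding_factor   # re-allocate with fresh padding
    return moves

# With a 1.8 padding factor, steadily growing documents still move periodically.
print(simulate_updates(initial_size=1000, growth_per_update=100,
                       padding_factor=1.8, n_updates=50))  # -> 2
```

Each move gets more expensive as the document grows, and on slow EBS the extra random writes compound the pain the commenter describes.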

There are really only two knobs to turn on Mongo: RAM and disk speed. Our particular cluster doesn't have enough RAM for the dataset to fit in memory, but could double its performance (or more) if each key range were mmapped individually rather than the entire datastore the shard is responsible for, just because of how the OS manages pages. We haven't broken down and implemented it yet, but given the performance vs. cost tradeoffs, we may have to pretty soon.
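The per-key-range mapping idea above can be illustrated with the standard `mmap` module (a hypothetical sketch, not how mongod is built): map only the file region a key range lives in, rather than the whole data file. Offsets must be a multiple of the platform's allocation granularity.

```python
import mmap
import os
import tempfile

page = mmap.ALLOCATIONGRANULARITY

# Stand-in for an 8-"page" data file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * (page * 8))
    path = f.name

with open(path, "r+b") as f:
    # Map only pages 2-3 of the file: the OS tracks residency for this
    # small window, independent of activity elsewhere in the file.
    window = mmap.mmap(f.fileno(), length=page * 2, offset=page * 2)
    window[0] = ord("y")                 # write through the mapping
    window.flush()
    window.close()

with open(path, "rb") as f:
    f.seek(page * 2)
    readback = f.read(1)
os.unlink(path)
print(readback)                          # b'y'
```

With one giant mapping, the kernel sees a single region and a scan anywhere in it churns the same pool of pages; many small mappings at least give you per-range control over what gets mapped at all.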


> but the performance issues still stem from poor IO performance on Amazon EBS

But the point is it shouldn't need to do that I/O in the first place.


I'm not sure why people are using EBS with their databases. If you already have replication properly set up, what does it buy you except for performance problems?

Chris Westin, of 10gen, blogged about this a while ago: https://www.bookofbrilliantthings.com/blog/what-is-amazon-eb...

In fairness, though, 10gen's official stance is to use EBS. I think that's a mistake, though perhaps they recommend it for the extra safety.


The big thing here is cost.

> If you put the above rules together, you can see that the minimum MySQL deployment is four servers: two in each of two colos...

The ideal scenario is to have 4 "fully equipped" nodes, 2 in each data center. Since only the primary takes writes, that means having 3 pieces of expensive "by the hour" hardware sitting around doing basically nothing (and paying $4-5k per machine for MongoDB licenses).

In that scenario you can have everything on instance store and live with 4 copies on volatile storage.

Of course, no start-up wants to commit that many resources to a project. It's far cheaper just to use EBS and assume that the data there is "safe". Is it bad practice? Would I avoid EBS like the plague? You bet!

But it's definitely cheaper and that's hard to beat.



