Hacker News

More generally, the point is that if the minimum time to send a unit of data between two locations is t, then t is also the shortest possible interval in which a transaction can pass from one location to the other.

That puts an upper (EDIT: corrected from lower) bound on the throughput of many types of operations, e.g. if you need to guarantee strict ordering of transactions (EDIT: in the worst case, where transactions are being issued from multiple nodes; in the best case you can optimize in a variety of ways).

So while we won't have to deal with interplanetary latencies, it still matters greatly for throughput and consistency guarantees.



Even assuming you meant "upper bound" instead of "lower bound", this isn't strictly true in theory. To achieve high throughput with large latency between nodes, all you have to do is group transactions into large batches and reach consensus on how they should be ordered (for example, you can run paxos to decide what a batch consists of, and then use any method you like to order within the batch). Running paxos at interplanetary latencies would obviously take some time, but since you can have multiple rounds in progress at once (or increase the size of each batch to be much larger than the latency), throughput isn't limited by latency. In practice, most people care a lot about latency, and achieving the desired latency puts a limit on throughput. See the Calvin DB paper for a more thorough explanation.
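A back-of-the-envelope sketch of the batching argument (all numbers are made up for illustration; the point is only that throughput scales with batch size and pipelining while per-transaction latency does not improve):

```python
# Hypothetical numbers: throughput of a batched, pipelined consensus
# protocol is set by batch size and rounds in flight, not by latency.

one_way_latency_s = 180.0        # e.g. roughly Earth-Mars at closest approach
round_trip_s = 2 * one_way_latency_s
batch_size = 1_000_000           # transactions agreed on per consensus round
rounds_in_flight = 10            # independent consensus instances pipelined

# A batch commits every (round_trip_s / rounds_in_flight) seconds on average,
# so throughput grows with batching/pipelining...
commits_per_second = batch_size * rounds_in_flight / round_trip_s

# ...but each individual transaction still waits at least one round trip.
latency_per_txn_s = round_trip_s

print(f"{commits_per_second:,.0f} txn/s")    # prints "27,778 txn/s"
print(f"{latency_per_txn_s:.0f} s latency")  # prints "360 s latency"
```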


Yes, upper bound. Lower bound on time it takes for the transaction to be globally available.

You're right that there are operations whose throughput you can increase, and I was imprecise: you can indeed order transactions "after the fact", but that's not very interesting. The scenario I had in mind was one where clients talking to different nodes issue transactions that depend on the same items of data.

In that case you either need to obtain a lock, in which case even in the best case you need to wait for the latency interval (plus a margin for the largest possible clock skew) to learn whether the other side wants a lock on the same object, or you need to optimistically fire off transactions. But the best case there is that you get your transaction in just before the remote side starts operating on the object, they happen to fire off another transaction just after, and so on. In reality there would be slowdowns.
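A minimal sketch of the lock-wait bound, using illustrative numbers (the latency and skew values are assumptions, not measurements):

```python
# Sketch: the best-case wait to safely acquire a lock on a contended
# object shared by two distant nodes is bounded below by the latency
# plus a margin for the worst-case clock skew between the nodes.

one_way_latency_s = 0.08   # e.g. ~80 ms one-way between two datacenters
max_clock_skew_s = 0.005   # assumed worst-case skew between the two nodes

# You must wait long enough to hear whether the remote node also wants
# the lock before you can safely proceed.
min_lock_wait_s = one_way_latency_s + max_clock_skew_s

# That wait caps the rate of strictly ordered transactions on a single
# contended object issued from both nodes.
max_contended_txn_per_s = 1.0 / min_lock_wait_s

print(f"{min_lock_wait_s * 1000:.0f} ms minimum wait")       # prints "85 ms minimum wait"
print(f"{max_contended_txn_per_s:.1f} txn/s on that object") # prints "11.8 txn/s on that object"
```

With uncontended objects this cost disappears on average, which is the point made below: you can do much better on average, but you can't guarantee better than the inter-node latency.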

If you're dealing with uncontended objects, you can do much better on average, but you can't guarantee better than the latency between nodes.

There are certainly tons of special cases where you can optimize.



