We are actually working on a storage service called RESTBase (https://www.mediawiki.org/wiki/RESTBase). RESTBase has pluggable table storage backends, starting with Cassandra. Other backends will let it scale down to small installations.
The medium-term goal is to store all revisions as HTML, so that views and HTML-based saves can become storage-bound operations with low double-digit ms latencies. This will mostly eliminate the remaining 2-4% of cache misses, and should get us closer to actual DC fail-over without completely cold caches.
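The read path this implies can be sketched in a few lines: serve rendered HTML straight from storage, and fall back to parsing only on a miss. This is purely illustrative (the class and method names are made up, and a `Map` stands in for the Cassandra-backed table); it is not RESTBase's actual API.

```javascript
// Sketch of a storage-bound read path for rendered HTML revisions.
// All names here are illustrative, not RESTBase's actual API.
class RevisionStore {
  constructor(render) {
    this.store = new Map(); // stands in for a Cassandra-backed table
    this.render = render;   // expensive wikitext -> HTML conversion
  }

  // Return stored HTML for a revision, rendering and persisting on a miss.
  getHtml(revId) {
    if (this.store.has(revId)) {
      return this.store.get(revId); // fast path: storage-bound read
    }
    const html = this.render(revId); // slow path: parse, then persist
    this.store.set(revId, html);
    return html;
  }
}
```

Once all revisions are pre-rendered into the store, almost every request takes the fast path, which is what makes low double-digit ms latencies plausible.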
There is still a large amount of work to be done until all the puzzle pieces for this are in place, but we are hard at work to make it a reality. A major one, the bidirectional conversion between wikitext and HTML in Parsoid (https://www.mediawiki.org/wiki/Parsoid), is already powering VisualEditor and a bunch of other projects (https://www.mediawiki.org/wiki/Parsoid/Users). Watch this space ;)
Thanks for letting me know... I had some exposure to C* while at GoDaddy, and it would seem to be a great fit for what you are talking about...
GD is using it to store user-generated content, and it works very well; most responses (lookup and processing) were consistently sub-20ms under heavy load... (I'm hoping they write a DNS service that does something similar).
They started with MySQL. Replacing that with MariaDB is a whole lot simpler (not to mention, less risky!) than replacing it with something completely different like Cassandra.
MariaDB is MySQL-compatible. Had they gone with Cassandra, they would have had to replace the PHP-based MediaWiki software with something else. So not only the database but the entire software stack would have had to be replaced as well.
Also, speaking from a few years' personal experience with HBase and Cassandra, support for non-Java languages on these two NoSQL databases is limited (though it is getting better).
I worked on a project that used C* with node.js and it worked out very well... a cluster of node servers with minimal processing in front of a cluster of C* servers... very fast response times under really heavy load.
Though node.js tends to be very flexible in terms of wrapping a friendlier interface around a less friendly one.
Why? The vast majority of the requests they serve are cached at multiple layers above the DB. Switching to an entirely different DB would require rewriting a lot of code, steepen the learning curve for contributors (MediaWiki is FOSS), and limit its use to large sites (MediaWiki is widely used by small sites).
I'm not familiar enough with their data structure to comment, but the concern would seem to be write performance... if they have to shard an RDBMS like My/MariaDB anyway, then it's not much harder to change databases at that point... not to mention the improvements in distributed reads on a cache miss...
I understand that... but I would have assumed they would have changed over time. Plenty of platforms have migrated backends to handle much larger load (Twitter, Facebook, etc.).
Wikimedia's resources are over two orders of magnitude smaller than those of the other top-10 web sites. It is only fairly recently that they have had more than a handful of paid developers. And they have a huge amount of data they would have to convert. In fact, I think they would have been up to it, but there simply were more pressing concerns.