
Elasticsearch is a mess. It's so full of historical warts.

One major problem is that none of their documentation is actually reference documentation -- if you look for the formal schema (for things like mappings and the query DSL), the list of endpoints and their allowed parameters, the full list of settings etc., you won't find them listed anywhere. For example, do "keyword" mappings support the "enabled" property? What does the "index_options" setting actually do when combined with the "index" setting? Hard to tell any of this without trying them out. Turns out "dynamic_templates" mappings accept any combination of the above and will never complain about invalid combinations, whereas property mappings do. The whole environment variable vs Java property mess that you mention also exists.
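To make the complaint concrete, here is a minimal Python sketch of the kind of check a published schema would make trivial. The allowlist `ALLOWED_PARAMS`, the functions, and the field names are all hypothetical: the allowlist is hand-written for illustration, deliberately incomplete, and not an official schema. It walks both "properties" and "dynamic_templates" and treats "enabled" as an object-only parameter, so it flags "enabled" on a "keyword" field inside a dynamic template, the kind of combination described above as going unreported.

```python
# Partial, hand-written allowlist of mapping parameters per field type.
# ASSUMPTION: illustrative only, not an official or complete schema.
ALLOWED_PARAMS = {
    "keyword": {"type", "index", "index_options", "doc_values", "store",
                "ignore_above", "null_value", "normalizer", "fields", "copy_to"},
    "text": {"type", "index", "index_options", "analyzer", "search_analyzer",
             "store", "norms", "fields", "copy_to"},
    "object": {"type", "enabled", "dynamic", "properties"},
}


def check_field(path, spec, problems):
    """Report parameters not in the allowlist for this field's type, recursing into sub-properties."""
    ftype = spec.get("type", "object" if "properties" in spec else None)
    allowed = ALLOWED_PARAMS.get(ftype)
    if allowed is None:
        problems.append(f"{path}: type {ftype!r} not covered by this sketch")
        return
    for key in spec:
        if key not in allowed:
            problems.append(f"{path}: {key!r} is not listed for type {ftype!r}")
    for child, child_spec in spec.get("properties", {}).items():
        check_field(f"{path}.{child}", child_spec, problems)


def check_mapping(mapping):
    """Apply the same rules to explicit properties and to dynamic template mappings."""
    problems = []
    for name, spec in mapping.get("properties", {}).items():
        check_field(name, spec, problems)
    for entry in mapping.get("dynamic_templates", []):
        for template_name, template in entry.items():
            check_field(f"dynamic_templates.{template_name}", template["mapping"], problems)
    return problems


# Hypothetical mapping body; field and template names are made up.
mapping = {
    "dynamic_templates": [
        {"strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": {"type": "keyword", "enabled": False},
        }}
    ],
    "properties": {
        "sku": {"type": "keyword", "index_options": "docs"},
        "payload": {"type": "object", "enabled": False},
    },
}

for problem in check_mapping(mapping):
    print(problem)
```

The point of the sketch is that both code paths go through one rule table; the whole complaint is that no such table is published.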

They do deserve credit for trying to clean it up. The last few releases have been pretty brutal in how they've been deprecating (and later removing) legacy features and tightening the semantics, the newest and most dramatic of which is the deprecation of multiple type mappings per index. And they've been pretty good at explaining what's going to happen. So the warts are getting fewer. On the flip side, you have to follow the release notes religiously if you want to keep up to speed, since each release now tends to remove a bunch of features or add strict validation where there previously was none, and it becomes harder to upgrade. (If you want an important bug fix that hasn't been backported, things could get expensive.)

It's interesting how the Elasticsearch team let their focus be derailed by this new industry obsession with analytics and logs. It's not something ES was originally built for, and it turned out to be good at it mostly by accident. It's not terrible at it, but Elasticsearch shines the most at its original purpose, as a content index with rich full-text search capabilities. (Areas where it works less well include scaling edge cases such as high-cardinality aggregation buckets and high numbers of unique field names.) I wish they had instead worked on things like joins and on fixing the need for the "nested" object type, which is a ridiculous hack, but since those things aren't needed for analytics/logs, they haven't happened.
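For context on why "nested" exists at all: Elasticsearch's documentation describes arrays of plain `object` fields as being flattened into one value list per field path, so the link between values inside the same array element is lost. The pure-Python simulation below shows the resulting false match and what a `nested` query prevents. All names and data are made up, and this is a model of the behavior, not how Lucene actually stores documents. `NESTED_QUERY` is the request-body shape of a real `nested` query over the same hypothetical `comments` field.

```python
# Request body for a `nested` query over a hypothetical "comments" field
# (mapped as "type": "nested"); both terms must match within one comment.
NESTED_QUERY = {
    "query": {"nested": {
        "path": "comments",
        "query": {"bool": {"must": [
            {"term": {"comments.author": "alice"}},
            {"term": {"comments.stars": 1}},
        ]}},
    }}
}


def flatten(doc):
    """Model plain object indexing: each leaf path collects all its values in one list."""
    flat = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:
                walk(item, path)
        else:
            flat.setdefault(path, []).append(value)

    walk(doc, "")
    return flat


def matches_flat(doc, conditions):
    """AND of term conditions against the flattened document."""
    flat = flatten(doc)
    return all(value in flat.get(path, []) for path, value in conditions.items())


def matches_nested(doc, array_path, conditions):
    """What `nested` buys you: all conditions must hold in the same array element."""
    return any(
        all(element.get(field) == value for field, value in conditions.items())
        for element in doc.get(array_path, [])
    )


post = {
    "title": "ES tips",
    "comments": [
        {"author": "alice", "stars": 5},
        {"author": "bob", "stars": 1},
    ],
}

# No comment by alice has 1 star, yet the flattened form matches.
print(matches_flat(post, {"comments.author": "alice", "comments.stars": 1}))  # True
print(matches_nested(post, "comments", {"author": "alice", "stars": 1}))      # False
```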

(Pet peeve time: One problem that rarely gets mentioned is that Elasticsearch's "eventually consistent" model has two parts. There's the part where replicas may be out of sync with primaries, but there's also the problem that on each individual node, index operations don't become visible to queries right away, not until the next segment "refresh", which by default happens every second. There's no API to ask about the refresh state, so right now the only way for a write followed by a read to be consistent is to ask the write to wait for refresh (or force a refresh), which is the opposite of what you want; the wait should be on read, not write. Given that ES now assigns sequence numbers to operations on each shard, I'm surprised they haven't tied those numbers to refreshes so you can ask which sequence number the index is currently "at".)
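In today's API the write-side workarounds are the `refresh=wait_for` and `refresh=true` parameters on index requests. The toy model below sketches the read-side alternative instead: the writer keeps the sequence number it got back, and the reader blocks until a refresh has covered it. `ToyShard` and `search_at_least` are hypothetical and are not Elasticsearch APIs; replicas, segments, and the translog are ignored.

```python
import threading


class ToyShard:
    """Toy model of one shard copy (not Elasticsearch code): every write gets
    a sequence number, but searches only see writes covered by the last refresh."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_seq_no = 0
        self._pending = {}           # indexed, not yet searchable
        self._searchable = {}        # visible to searches
        self.refreshed_seq_no = -1   # highest seq_no covered by the last refresh

    def index(self, doc_id, doc):
        with self._cond:
            seq_no = self._next_seq_no
            self._next_seq_no += 1
            self._pending[doc_id] = doc
            return seq_no

    def refresh(self):
        # In Elasticsearch this runs periodically (index.refresh_interval, 1s by default).
        with self._cond:
            self._searchable.update(self._pending)
            self._pending.clear()
            self.refreshed_seq_no = self._next_seq_no - 1
            self._cond.notify_all()

    def search(self, doc_id):
        # Current behavior: whatever the last refresh made visible.
        with self._cond:
            return self._searchable.get(doc_id)

    def search_at_least(self, doc_id, seq_no, timeout=5.0):
        # HYPOTHETICAL read-side wait: block until a refresh has covered seq_no.
        with self._cond:
            if not self._cond.wait_for(lambda: self.refreshed_seq_no >= seq_no, timeout):
                raise TimeoutError(f"seq_no {seq_no} not searchable after {timeout}s")
            return self._searchable.get(doc_id)


shard = ToyShard()
seq_no = shard.index("doc-1", {"title": "hello"})
print(shard.search("doc-1"))                      # None: not refreshed yet
threading.Timer(0.1, shard.refresh).start()       # the periodic refresh fires later
print(shard.search_at_least("doc-1", seq_no))     # waits, then {'title': 'hello'}
```

The writer returns immediately; only a reader that actually needs read-your-writes pays for the wait.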

So I think Elasticsearch is definitely ripe for disruption. I don't know of anything else that is able to compete at the moment, at least not in a single package; Solr isn't really in the same league.



One of my primary grievances with ES is that all security is a (paid) add-on. TLS, and even the most basic authentication, don’t come out of the box. I would really expect that from a modern product. (Yeah, I know Search Guard exists, but it’s still an add-on.)


It’s surprising when someone expects consistent operations from Elastic when it’s built on something that offers none (Lucene).

At least Solr doesn’t pretend to be something it isn’t (a database).


>At least Solr doesn’t pretend to be something it isn’t (a database).

I worked on implementing Elasticsearch at my company, and one of the things they state clearly is that Elasticsearch should NEVER be used as the source of truth (primary database).


I think I understand where the OP was coming from.

Elasticsearch + Kibana often gets positioned and used as an open source alternative to Splunk. In that context it effectively operates as a primary database, since the source logs are often transient.


Performance is also a black box. Super fast on small datasets, but at scale... you'd better hope you can pay for that platinum support contract, and be prepared to give up fancy features like collapse.
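For readers who haven't used it, field collapsing returns only the top hit for each distinct value of a field. The Python sketch below imitates that result on an in-memory list of hits; the field name `user_id` and the hits are hypothetical, and the sketch says nothing about how Elasticsearch executes collapsing across shards or why it gets expensive at scale. `COLLAPSED_SEARCH` shows the request-body shape using the real `collapse` parameter.

```python
# Request body for a collapsed search (field names are hypothetical).
COLLAPSED_SEARCH = {
    "query": {"match": {"message": "timeout"}},
    "collapse": {"field": "user_id"},
}


def collapse(hits, field):
    """Keep the best-scoring hit for each distinct value of `field`,
    in descending score order (a local imitation, not Elasticsearch code)."""
    seen = set()
    collapsed = []
    for hit in sorted(hits, key=lambda h: h["_score"], reverse=True):
        key = hit["_source"].get(field)
        if key in seen:
            continue  # a better-scoring hit for this key was already kept
        seen.add(key)
        collapsed.append(hit)
    return collapsed


hits = [
    {"_id": "1", "_score": 3.0, "_source": {"user_id": "a"}},
    {"_id": "2", "_score": 1.5, "_source": {"user_id": "a"}},
    {"_id": "3", "_score": 2.0, "_source": {"user_id": "b"}},
    {"_id": "4", "_score": 1.0, "_source": {"user_id": "c"}},
]
print([h["_id"] for h in collapse(hits, "user_id")])  # ['1', '3', '4']
```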


But scale is literally why people use ES, right? It practically expects a cluster out of the gate; I felt like I was using it wrong even trying to run only one instance on one machine.


Well, as a NoSQL search engine, yes. But its sexy features come with heavy performance penalties.


Solr is pretty amazing and, for me, more accessible and easier to use.



