For HPC code that targets Intel CPUs, do you recommend a compiler like the Intel oneAPI C compiler over LLVM or GCC? I'd like to know before I start profiling or invest time in reading the manual for a specific compiler.
Besides data structures based on B+ trees (ropes), what do you think about the Relaxed Radix Balanced vector tree[0]? It is cache-friendly and immutable as well.
Looks pretty neat, thanks for the link. From a quick look, it seems like the performance characteristics will be pretty similar to a xi-style rope (it seems extremely unlikely to me that the time spent finding the child is significant, so the radix trick can't save much time), and it's a bit more complex. That said, I do expect it to perform nicely for read-only access to large documents, which is definitely an important use case.
It would make a fun project for somebody to implement it and compare the performance. I'd certainly take the PR for it if the performance was better :)
[edit: followup] The rope implementation in xi has an additional heuristic that tries not to split lines across leaf boundaries (i.e. most leaves should end in a newline). It also has a hard constraint of never splitting a Unicode codepoint. Thus, leaves and subtrees hold an unpredictable number of elements (as opposed to a clean power of two when full), and I think that pretty much invalidates using the radix to select the child.
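To make the difference concrete, here is a rough sketch of the two ways of picking a child. It is not taken from xi or from any RRB implementation; the names (BITS, radix_child, sized_child) and the branching factor of 32 are assumptions for illustration only. With full power-of-two nodes the child slot is just a few bits of the index, while uneven children need a search over cumulative sizes.

```python
from bisect import bisect_right

BITS = 5                  # assumed branching factor of 32 (2**5)
MASK = (1 << BITS) - 1


def radix_child(index, level):
    """Child slot at `level` of a full radix tree (level 0 = leaf level).

    Only valid when every subtree below holds exactly 32**level elements.
    """
    return (index >> (BITS * level)) & MASK


def sized_child(index, cumulative_sizes):
    """Child slot when children hold uneven numbers of elements.

    cumulative_sizes[i] is the total element count of children 0..i.
    Returns (slot, index relative to the start of that child).
    """
    slot = bisect_right(cumulative_sizes, index)
    start = cumulative_sizes[slot - 1] if slot else 0
    return slot, index - start
```

For example, with children holding 4, 5 and 6 elements, sized_child(10, [4, 9, 15]) lands in the third child at offset 1. The search is over at most one node's worth of children, which is why I'd expect its cost to be small next to the cache misses of walking the tree.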
As a quick answer, the name comes from being able to recover data when some of it is "erased".
The only way to durably store data so that it survives a hardware failure (e.g. a drive dying) is to store more than one copy. Full replicas are the simplest way to do this, but they carry a relatively high overhead (e.g. store 1GB of data with 3x replication and you are actually storing 3GB). Erasure codes are a way to effectively store fractional replicas, so you only use something like 1.5x or 1.7x the size of the original data.
Erasure codes are great when you've got a lot of data and you need high durability but don't want to pay for the storage space required for full replicas.
Why don't we always use erasure codes for everything? EC isn't great when you've got small bits of data, and since there's a bit of math involved in reading and writing the EC data, EC has higher latency than simple replicas.
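As a toy illustration of the idea, here is about the simplest erasure code there is: two data fragments plus one XOR parity fragment. This is only a sketch under that assumption, not the scheme any particular storage system uses (production systems typically use something like Reed-Solomon with more fragments), and the function names are made up for the example. It stores 1.5x the original data and survives the loss of any one of the three fragments.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data):
    """Split data into two halves and append one XOR parity fragment."""
    if len(data) % 2:
        data += b"\0"  # pad to even length; a real system would record the true length
    half = len(data) // 2
    d0, d1 = data[:half], data[half:]
    return [d0, d1, xor_bytes(d0, d1)]


def recover(fragments):
    """Rebuild the single erased fragment (marked None) from the other two."""
    missing = fragments.index(None)
    a, b = (f for f in fragments if f is not None)
    repaired = list(fragments)
    repaired[missing] = xor_bytes(a, b)  # XOR of any two fragments yields the third
    return repaired


fragments = encode(b"hello world!")
fragments[1] = None                      # simulate a dead drive
restored = recover(fragments)
original = restored[0] + restored[1]
```

Note that the read path after a failure has to fetch two fragments and do the XOR before it can return anything, which is the extra latency mentioned above, and that for a tiny object the per-fragment bookkeeping can easily outweigh the space saved.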