>> I really don't see how any of this is relevant to the original point of having "log" files.
> Maybe you're not aware: every major database keeps a commit log, and writes it to a file called a log file (sometimes "write-ahead log file").
> You do realize that SQLite can't just corrupt your database because you ran two SQLites concurrently, right?
Sure, but, so? I don't know SQLite's WAL logging code in detail, but I do know PostgreSQL's fairly intimately, and I don't see how such an interface[1] would be relevant for WAL logging. Such logs usually have checksums and pointers to previous records in their format. For those to be correct, each writing process needs to know about the previous records (or at least where they start). Thus you need locking and coordination in userspace anyway; kernel-level append mechanics aren't that interesting.
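To make that concrete, here's a minimal sketch in C of why the checksum and back-pointer force the coordination into userspace. It's not PostgreSQL's actual record layout, and crc32c() is just an assumed helper:

  #include <stdint.h>
  #include <stddef.h>
  #include <pthread.h>

  typedef struct WalRecordHdr {
      uint32_t total_len;   /* header + payload length */
      uint64_t xl_prev;     /* file offset of the previous record */
      uint32_t crc;         /* checksum over payload + header (crc field zeroed) */
  } WalRecordHdr;

  /* Assumed helper, e.g. a CRC-32C implementation. */
  uint32_t crc32c(const void *buf, size_t len, uint32_t seed);

  static pthread_mutex_t insert_lock = PTHREAD_MUTEX_INITIALIZER;
  static uint64_t insert_pos = 0;   /* next free offset in the WAL */
  static uint64_t prev_pos   = 0;   /* offset of the last record */

  /* Reserve space for a record, fill in back-pointer and checksum. */
  uint64_t wal_reserve(WalRecordHdr *hdr, const void *payload, size_t len)
  {
      uint64_t pos;

      pthread_mutex_lock(&insert_lock);
      pos = insert_pos;
      hdr->total_len = sizeof(*hdr) + len;
      hdr->xl_prev   = prev_pos;    /* depends on the previous writer */
      hdr->crc       = 0;
      hdr->crc       = crc32c(hdr, sizeof(*hdr), crc32c(payload, len, 0));
      prev_pos   = pos;
      insert_pos = pos + hdr->total_len;
      pthread_mutex_unlock(&insert_lock);

      return pos;                   /* caller writes hdr + payload here */
  }

Even if the kernel appended the bytes for you atomically, xl_prev and the CRC have to be computed before the write, so the writers have to serialize on the insert position themselves.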
In addition to that, if you care about performance, you'll want to pre-allocate the WAL files and possibly re-use them after a checkpoint. On many filesystems, overwriting existing blocks is a lot more efficient than allocating new ones: it avoids fs-internal metadata journaling, avoids fragmentation, etc. With pre-allocated files you can then use fdatasync() instead of fsync(), which can be a considerable performance benefit in our experience.
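Roughly, the pattern looks like this (a sketch, assuming a 16MB segment size, with most error handling trimmed):

  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>

  #define WAL_SEG_SIZE (16 * 1024 * 1024)

  /* Create and zero-fill a segment up front, paying the allocation and
   * metadata-journaling cost once.  (A real implementation would also
   * fsync the containing directory.) */
  int wal_create_segment(const char *path)
  {
      char zeros[8192];
      int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
      if (fd < 0)
          return -1;

      memset(zeros, 0, sizeof(zeros));
      for (off_t off = 0; off < WAL_SEG_SIZE; off += sizeof(zeros)) {
          if (write(fd, zeros, sizeof(zeros)) != (ssize_t) sizeof(zeros)) {
              close(fd);
              return -1;
          }
      }
      if (fsync(fd) < 0) {
          close(fd);
          return -1;
      }
      return fd;
  }

  /* Later writes only overwrite already-allocated blocks and don't
   * change the file size, so fdatasync() is enough. */
  int wal_write(int fd, off_t off, const void *buf, size_t len)
  {
      if (pwrite(fd, buf, len, off) != (ssize_t) len)
          return -1;
      return fdatasync(fd);
  }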
There are things that would make it easier to write correct and efficient journaling code, but imo they're not what you were talking about. Being able to query, and actually get guarantees about, which sizes of writes are atomic, for example; without that you need rather expensive workarounds like WAL-logging full page contents after checkpoints, or double-write buffers.
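For illustration, the full-page-write workaround boils down to roughly this (a sketch of the idea, not PostgreSQL's actual code; the wal_log_* and page_get_lsn helpers are hypothetical):

  #include <stdint.h>
  #include <stddef.h>

  #define PAGE_SIZE 8192

  typedef uint64_t Lsn;

  /* Hypothetical helpers assumed to exist elsewhere. */
  void wal_log_full_page(const uint8_t page[PAGE_SIZE]);
  void wal_log_delta(const void *delta, size_t len);
  Lsn  page_get_lsn(const uint8_t page[PAGE_SIZE]);

  void log_page_change(uint8_t page[PAGE_SIZE],
                       const void *delta, size_t delta_len,
                       Lsn last_checkpoint_redo)
  {
      if (page_get_lsn(page) <= last_checkpoint_redo) {
          /* First modification since the checkpoint: a torn 8kB write
           * couldn't be repaired from a delta alone, so log the whole
           * page image. */
          wal_log_full_page(page);
      } else {
          wal_log_delta(delta, delta_len);
      }
  }

That's a lot of extra WAL volume that could be avoided if you could actually query which write sizes are atomic.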
A proper asynchronous fsync()/fdatasync() would also be rather useful.
[1]
> I'm basically talking about compare and swap, except instead of compare and swap its compare and append