I am not a Windows guy but I (with help) managed to get IOCP working for this in a basic prototype. Will share publicly soon. I also sketched out an IoRing version (if you are interested in helping debug and flesh that out let me know!).
Main learnings: the IOCP version can't do asynchronous flush! Which we want. The IoRing version can! But it can't do scatter/gather AKA vector I/O yet! Which is an essential feature for buffer pool implementation. So actually I am basically waiting for IoRing to add support for that before taking it too seriously (I can see they are working on it because the ops are present in an enum, it's just that the build functions are missing).
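For anyone who hasn't met scatter/gather I/O, here is a minimal sketch of why a buffer pool wants it. This is plain POSIX preadv() via Python's os module on a Unix-like system, not IoRing and not PostgreSQL code; the function name, the demo, and the use of an 8 kB block size are assumptions made for illustration. The point is that one call fills several separately allocated buffers from one contiguous range of the file.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed here; matches PostgreSQL's default block size


def read_blocks_vectored(fd, first_block, frames):
    """Fill each frame with one consecutive file block using a single preadv() call.

    Returns the number of bytes read, which can be short at end of file.
    """
    return os.preadv(fd, frames, first_block * BLOCK_SIZE)


def demo():
    # Build a 4-block file where block n is filled with the byte value n.
    with tempfile.TemporaryFile() as f:
        for n in range(4):
            f.write(bytes([n]) * BLOCK_SIZE)
        f.flush()
        # Three separately allocated frames stand in for scattered buffer pool slots.
        frames = [bytearray(BLOCK_SIZE) for _ in range(3)]
        nread = read_blocks_vectored(f.fileno(), 1, frames)
        # Return the byte count and the first byte of each frame.
        return nread, [frame[0] for frame in frames]


if __name__ == "__main__":
    print(demo())
```

Without a vectored read, the same transfer needs either one system call per block or one big read into a contiguous staging buffer followed by copies into the individual frames.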
So my guess is that in a year or so we should be able to run all PostgreSQL disk I/O through IoRing on Windows. Maybe?
Another complication is that it really wants to be multithreaded (consuming completions for IOs started in another process requires a lot of hoop jumping, I made it work but...). This will resolve itself naturally with ongoing work to make PostgreSQL multithreaded.
The next problem is that IoRing doesn't support sockets! So in future work on async networking (early prototypes exist) we will likely also need IOCP for that part.
BTW I have patches for PostgreSQL AIO on FreeBSD, which I will propose for v19. It works pretty well! I was trying to keep out of Andres's way for the core architectural stuff and basic features, i.e. I didn't want to overload the pipes with confusing new topics for v18 :-)
I had various Psion models in the 90s. Several of the people involved in the design of those things went to do modern-ish devices with Psion series 5-style keyboards, including the Cosmo Communicator. I was curious enough to back the Astro Slide 5G project, which attempted to "reverse the clamshell" to make a device with Psion series 5-style keyboard that closes to resemble a modern standard all-screen rectangular slab. It appears to have gone horribly wrong during final manufacturing at a Chinese factory that closed down during COVID, with thousands of IndieGoGo backers not having received their Astro Slide devices after several years (I am one). A shame. Anyway this article shows the Psion 5 and the Communicator side-by-side. That's a way out of date device now running ancient Android on a slow CPU, and sadly the refresh project seems to have died on the vine :-(
I feel your pain. I'm in the same boat as you (backed the astro slide - didn't get one).
I used the original Planet Computers device with psion-like keyboard, the Gemini, as my daily driver for a few years. The keyboard was no gimmick - it really felt like a quasi mini-laptop. A brilliant device, even considering the weak cpu (as compared to similarly priced phones)
Such a shame ...
(btw, had a psion 2, psion 3, psion 5 & sharp zaurus back in the day, so you can say I'm a sucker for these things)
I suppose this means that they still have a firm intention to produce eventually, hopefully even with an updated chipset given that the original parts are lost ("ODM is effectively unwilling to release remaining produced stock and the pre-purchased chipsets"!) Obviously you know all that, fellow backer, but just in case anyone else is interested in this debacle:
After it was dropped from PostgreSQL, a team from IBM showed up on the mailing list (see one of my other answers for link), so perhaps that is now going to happen! I think it's a case that they were using it, but didn't realise that the project was on the verge of dropping it for years due to lack of interested maintainer & resources. Open source is funny like that. Deleting it was certainly one way to get their attention.
I've been able to debug PostgreSQL issues reported by [closed] Solaris users by booting illumos inside a virtual machine on my laptop in minutes...
I want to keep AIX support, for, well, mainly irrational nostalgic reasons -- PostgreSQL used to run on pretty much the entire Unix family tree, and I wrote lots of code on AIX for a decade. But they make it hard. I don't know why an OS vendor wouldn't make an image of an OS available to developers as conveniently as possible (it is even possible to boot recent AIX versions on QEMU if you have the patience... the hypothesis is that they might have done work to make that possible, 'cause it didn't work in earlier versions; but you can't get an image of the OS, so shrug).
The easiest legal way to get an AIX box is via IBM Cloud. You can get a tiny AIX box (s922 in Dallas, 0.25 cores, 2GB RAM, 10GB disk) for about US$50 a month. About US$8/month of that is AIX licensing and the rest is the hardware cost. Still a lot cheaper than IBM i licensing, which is US$350/month even for such a low-end system, or IBM Z (which is around US$1600/month for a low-end system).
IBM Cloud does offer some free credits – at the moment US$1000 free VPC credits – but I'm not sure if you can use them on AIX (which isn't part of their main VPC offering, POWER is a separate offering), and even if you can, they aren't going to last forever.
If one is an open source developer doing this on one's own time, as opposed to a business – why spend US$50/month to get an AIX environment? Maybe, if a user of the project really wants AIX support, they could donate that to the project.
We (the denizens of pgsql-hackers@postgresql.org) didn't see any AIX users or developers or other interested parties in our community... There was a group in France, but they disbanded... However, after AIX was dropped in the PostgreSQL 17 development cycle, suddenly we heard from IBM who run products on it. So perhaps it will make a comeback in 18? That was the latest idea, anyway, let's see.
POSTGRES was developed on SunOS (and its close cousin 4.3BSD or 4.2?). It's not so hard to support Solaris, as it was so influential that Linux uses a lot of similar things, including for example the ELF format and linker details, the sort of thing that developers have to maintain. So the burden is lower than (say) AIX. There is also someone from the Solaris community who feeds and waters a modern Solaris build farm animal (buildfarm.postgresql.org). Same goes for illumos (= forked from OpenSolaris, like Solaris 10), people run build farm animals and help test occasionally as required etc, which makes a big difference. That's my take, anyway (speaking as someone who works on PostgreSQL portability).
This is fantastic work, thanks. Hmm, what else... let's see... Xenix also really, really wants to be free! What a magnificent piece of forgotten computer history it is. https://en.wikipedia.org/wiki/Xenix
Once support hits in Linux, a little app of mine[0] will support block cloning for its "roll forward" operation, where all previous snapshots are preserved, but a particular snapshot is rolled forward to the live dataset. Right now, data is simply diff copied in chunks. When this support hits, there will be no need to copy any data. Blocks written to the live dataset can just be references to the underlying snapshot blocks, and no extra space will need to be used.
What does it mean to roll forward? I read the linked GitHub page and I don't get what is happening.
> Roll forward to a previous ZFS snapshot, instead of rolling back (this avoids destroying interstitial snapshots):
sudo httm --roll-forward=rpool/scratch@snap_2023-04-01-15:26:06_httmSnapFileMount
[sudo] password for kimono:
httm took a pre-execution snapshot named: rpool/scratch@snap_pre_2023-04-01-15:27:38_httmSnapRollForward
...
httm roll forward completed successfully.
httm took a post-execution snapshot named: rpool/scratch@snap_post_2023-04-01-15:28:40_:snap_2023-04-01-15:26:06_httmSnapFileMount:_httmSnapRollForward
--roll-forward="snap_name"
traditionally 'zfs rollback' is a destructive operation, whereas httm roll-forward is non-destructive. httm will copy only files and their attributes that have changed since a specified snapshot, from that snapshot, to its live dataset. httm will also take two precautionary snapshots, one before and one after the copy. Should the roll forward fail for any reason, httm will roll back to the pre-execution state. Note: This is a ZFS only option which requires super user privileges.
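To make the description above concrete, here is a rough Python sketch of the "copy only what changed, from the snapshot, to the live dataset" idea. It is not httm's implementation: the function name and the directory-walking approach are assumptions, it compares file contents byte by byte, it takes no precautionary snapshots, and it does not remove files that exist only in the live tree.

```python
import filecmp
import os
import shutil


def roll_forward(snapshot_dir, live_dir):
    """Copy files that are missing from, or differ in, the live tree out of the snapshot.

    Returns the sorted relative paths that were restored.
    """
    restored = []
    for root, _dirs, files in os.walk(snapshot_dir):
        rel_root = os.path.relpath(root, snapshot_dir)
        target_root = os.path.normpath(os.path.join(live_dir, rel_root))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_root, name)
            # shallow=False forces a content comparison rather than trusting
            # size and mtime alone; this is an assumption of the sketch.
            if not os.path.exists(dst) or not filecmp.cmp(src, dst, shallow=False):
                shutil.copy2(src, dst)  # copy2 also preserves timestamps and mode
                restored.append(os.path.normpath(os.path.join(rel_root, name)))
    return sorted(restored)
```

On a ZFS system the snapshot side would typically be a path under the dataset's hidden .zfs/snapshot directory, which is read-only, so nothing in the snapshot itself is ever modified and no other snapshot is destroyed.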
I might also add that 'zfs rollback' is a destructive operation because it destroys the snapshots between the current live version of the filesystem and the rollback target snapshot (the 'interstitial' snapshots). Imagine you have ransomware installed and you need to roll back, but you also want to view the ransomware's operations through those snapshots for forensic purposes. With a roll forward, you can do that.
It's also faster than a checksummed rsync, because it makes its determination based on the underlying ZFS checksums, and more accurate than a non-checksummed rsync.
This is a relatively minor feature re: httm. I recommend installing and playing around with it a bit.
What I don't understand is: aren't zfs snapshots writable, like in btrfs?
If I wanted to rollback the live filesystem into a previous snapshot, why couldn't I just start writing into the snapshot instead? (Or create another snapshot that is a clone of the old one, and write into it)
> What I don't understand is: aren't zfs snapshots writable, like in btrfs?
ZFS snapshots, following the historic meaning of "snapshot", are read-only. ZFS supports cloning of a read-only snapshot to a writable volume/file system.
Btrfs is actually the one 'corrupting' the already-accepted nomenclature of snapshots meaning a read-only copy of the data.
I would assume the etymology of the file system concept of a "snapshot" derives from photography, where something is frozen at a particular moment of time:
> In computer systems, a snapshot is the state of a system at a particular point in time. The term was coined as an analogy to that in photography. […] To avoid downtime, high-availability systems may instead perform the backup on a snapshot—a read-only copy of the data set frozen at a point in time—and allow applications to continue writing to their data. Most snapshot implementations are efficient and can create snapshots in O(1).