Hacker News | tee-es-gee's comments

It looks like pgBackRest will continue: multiple companies are stepping up with sponsorships. I'm mentioning this in case anyone is making plans to move away; it's probably worth waiting a bit for things to settle.


I do think that, as service providers, we now have a new "attack vector" to worry about. Up to now, having an API that deletes the whole volume, including backups, might have been acceptable, because users generally won't take such a destructive action via the API, or if they do, they likely understand the consequences. Or at the very least, they don't complain if they do it without reading the docs carefully enough.

But now agents are overly eager to solve the problem and can be quite resourceful in finding an API that lets them "start from a clean slate" to fix it.


> Up to now, having an API that deletes the whole volume, including backups, might have been acceptable

It was never acceptable. Major service providers figured this out a long time ago and added all sorts of guardrails long before LLMs. Other providers will learn from their own mistakes, or not.


> Up to now, having an API that deletes the whole volume, including backups, might have been acceptable,

So? I have those too; the difference is that:

1. The API is ACL'ed up the wazoo to ensure only a superuser can do it.

2. The purging of data is scheduled for 24h into the future while the unlinking is done immediately.

3. I don't advertise the API as suitable for agent interaction.
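For illustration, guardrails 1 and 2 might look something like the minimal in-memory sketch below. Everything in it (`Volume`, `VolumeStore`, `PURGE_DELAY`, `run_purger`) is hypothetical and not any provider's real API; it only shows the shape of "unlink now, purge later, superuser only".

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical grace period between unlinking and purging (24h, as above).
PURGE_DELAY = timedelta(hours=24)


@dataclass
class Volume:
    name: str
    attached: bool = True                # still linked to the user's project
    purge_at: Optional[datetime] = None  # earliest time the data may be destroyed


@dataclass
class VolumeStore:
    volumes: Dict[str, Volume] = field(default_factory=dict)

    def delete(self, name: str, caller_is_superuser: bool, now: datetime) -> None:
        # Guardrail 1: only a superuser may request deletion at all.
        if not caller_is_superuser:
            raise PermissionError("only a superuser can delete volumes")
        vol = self.volumes[name]
        # Guardrail 2: unlink immediately, but only schedule the purge.
        vol.attached = False
        vol.purge_at = now + PURGE_DELAY

    def undelete(self, name: str, now: datetime) -> bool:
        # Restore is possible while the grace period has not expired.
        vol = self.volumes[name]
        if vol.purge_at is not None and now < vol.purge_at:
            vol.attached = True
            vol.purge_at = None
            return True
        return False

    def run_purger(self, now: datetime) -> List[str]:
        # Periodic job: destroy only volumes whose grace period has expired.
        expired = [n for n, v in self.volumes.items()
                   if v.purge_at is not None and now >= v.purge_at]
        for n in expired:
            del self.volumes[n]
        return expired
```

The point of splitting unlink from purge is that an over-eager caller (human or agent) gets the "clean slate" it asked for immediately, while the operator keeps a 24h window to undo it.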


It's a great source of schadenfreude, though; I love watching vibecoders get their shit nuked.


I agree and hope this is the case for anything serious enough. I also don't see this changing any time soon.

There are ways to give safe access to the data, at least read-only, that don't involve production risk and don't sacrifice privacy. One example is database branches with anonymization: instead of accessing the prod/staging db, the agent creates a branch and has read/write access to that.

(disclaimer: I work at Xata, where we offer copy-on-write branches for Postgres, and the agent use-cases are the most popular right now)


Looks interesting! Do you have ClickBench results or similar?

> Everything in core, no extensions. HTTP(S), S3 (anonymous public reads), Avro, Excel, Arrow, and SQLite read through the same core binary - no separate install/load step.

That is not so good for an embedded database, though; it opens up security concerns, because the network readers (HTTP(S), S3) can't be left out of the binary.


I will follow this one for sure. There are a few more companies with the extremely ambitious goal of "a better AWS", and I am interested in the various strategies they take to approach that goal incrementally.

A service offering VMs for $20 is a long way from AWS, but I see how it makes sense as a first step. AWS also started with EC2, but in a completely different environment with no competition.


Nice article! From what I understand, you computed the bloom values in the application layer, right? Would https://www.postgresql.org/docs/current/bloom.html have worked as well?
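For comparison, an application-layer bloom signature could be as small as the sketch below. The hash choice (BLAKE2b salted with an index to simulate k hash functions), the 64-bit width, and k=3 are my assumptions, not what the article does. The Postgres `bloom` module instead builds a signature per row inside an index (`CREATE INDEX ... USING bloom (...)` with `length`/`colN` options), and it ships operator classes only for `int4` and `text`, which may be the deciding factor.

```python
import hashlib
from typing import Iterable


def bloom_signature(values: Iterable[object], m_bits: int = 64, k: int = 3) -> int:
    # Set k bit positions per value in an m_bits-wide integer.
    sig = 0
    for v in values:
        for i in range(k):
            # Salt with the hash index i to simulate k independent hash functions.
            digest = hashlib.blake2b(f"{i}:{v}".encode(), digest_size=8).digest()
            sig |= 1 << (int.from_bytes(digest, "big") % m_bits)
    return sig


def may_contain(sig: int, query_values: Iterable[object], m_bits: int = 64, k: int = 3) -> bool:
    # No false negatives; false positives are possible, so matches must be rechecked.
    q = bloom_signature(query_values, m_bits, k)
    return (sig & q) == q
```

Stored as a `bigint` column, such a signature can be filtered with a bitwise AND in SQL, but unlike the `bloom` index it won't be used by the planner to skip rows on its own.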


We have an overview on how it works here: https://xata.io/blog/open-source-postgres-branching-copy-on-...


> You don’t need anything but vanilla pg and a supported file system to do it anymore; just clone the database using a template and a newish version of Postgres.

Are you referring to `file_copy_method = clone` from Postgres 18? For example: https://boringsql.com/posts/instant-database-clones/

I think the key limitation is:

> The source database can't have any active connections during cloning. This is a PostgreSQL limitation, not a filesystem one.
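As a rough sketch of that workflow: first check that nobody else is connected to the template, then clone it with `STRATEGY = FILE_COPY`. The helper names are hypothetical, the `%s` placeholder assumes a psycopg-style driver, and `file_copy_method = clone` is assumed to already be configured on the server (Postgres 18).

```python
from typing import Tuple


def quote_ident(name: str) -> str:
    # Postgres identifier quoting: wrap in double quotes, double embedded quotes.
    return '"' + name.replace('"', '""') + '"'


def clone_plan(source_db: str, target_db: str) -> Tuple[Tuple[str, Tuple[str]], str]:
    # Step 1: count other sessions on the template database. Run this from a
    # different database (e.g. "postgres"), since your own connection counts too.
    check = (
        "SELECT count(*) FROM pg_stat_activity "
        "WHERE datname = %s AND pid <> pg_backend_pid()",
        (source_db,),
    )
    # Step 2: copy the template file by file. With file_copy_method = clone
    # the copy uses filesystem-level cloning instead of a byte-for-byte copy.
    # CREATE DATABASE cannot run inside a transaction block, so use autocommit.
    create = (
        f"CREATE DATABASE {quote_ident(target_db)} "
        f"TEMPLATE {quote_ident(source_db)} STRATEGY = FILE_COPY"
    )
    return check, create
```

If the count is non-zero, `CREATE DATABASE` would fail, which is exactly the active-connections limitation quoted above.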


Yeah, that's the one. My use case is largely for local development, so the active connections thing isn't a limiter for me.


Xata is open source btw (open core): https://github.com/xataio/xata


Came here to say this :) Anyone using Xata here?


For context for others, I think you are referring to this blog post: https://xata.io/blog/open-source-postgres-branching-copy-on-... (in particular the "The key is in the storage system" section), right?

What I'm saying there is that if you run Postgres on top of a local ZFS volume, the Postgres instances for the child branches need to be on the same server, so you are limited in how many branches you can create. One or two are fine, but if you want a branch per PR, that will likely not work.

If you separate the compute from storage via the network, this problem goes away.


ZFS snapshots can be transmitted over the network, with some diff-only and deduplication gains if the remote destination has an older instance of the same ZFS filesystem. It’s not perfect, and the worst case is still a full copy, but the tooling and efficiency wins for the ordinary case are battle-tested and capable.
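The "diff-only when the remote has an older instance, full copy otherwise" decision can be sketched as below. `plan_send` is hypothetical and only builds the `zfs send` argv; its output would be piped into `zfs receive` on the remote (e.g. over ssh), which expects its newest snapshot to match the incremental base unless `-F` is used to roll back. Matching snapshots by name is a simplification; a careful tool would compare the `guid` property.

```python
from typing import List, Optional, Sequence


def plan_send(dataset: str, local_snaps: Sequence[str],
              remote_snaps: Sequence[str]) -> Optional[List[str]]:
    # local_snaps is ordered oldest to newest; we want to ship the newest one.
    latest = local_snaps[-1]
    remote = set(remote_snaps)
    # Newest local snapshot that the destination also has (matched by name only).
    common = next((s for s in reversed(local_snaps) if s in remote), None)
    if common == latest:
        return None  # destination is already up to date
    if common is None:
        # Worst case: no shared base, so send a full stream.
        return ["zfs", "send", f"{dataset}@{latest}"]
    # Ordinary case: send only the blocks changed between common and latest.
    return ["zfs", "send", "-i", f"{dataset}@{common}", f"{dataset}@{latest}"]
```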


Yes, for sure, and stuff like this is really useful when rebalancing storage nodes, for example.

My point is that for the use case of offering a Postgres service with CoW branching as a key feature, you can't really escape some form of separation of storage and compute.

Btw, I don't really want to talk too much about it yet, but our proprietary storage engine (Xatastor) is basically ZFS exposed over NVMe-oF. We'll announce it in a couple of weeks, with a detailed technical blog post on the pros and cons.


Yes, that's what I'm referring to.

You're still making that assumption in this comment: why does my second (cloned) database need a separate Postgres instance? One Postgres server can host multiple databases.


Got it, yes, I've seen in the other comment that you're referring to the new Postgres 18 feature. If that works for you in local dev, so much the better :)

