Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This is really interesting, but a basic part I don't understand: What would it actually look like to run this on a live dataset?

If I understand correctly: you'd run the recursive query, it produces results for every step, effectively showing you the progression of output over time, and then once it hits "present day", it completes and stops?

How would you generate results going forward? I.E. A minute elapses after the results return, do you have to re-run the whole query for all time?



Good question. I touched upon this in the conclusion. Basically, if you run this in a streaming SQL database, such as Materialize, then you would get a true online system which doesn't restart from scratch.


In a non-streaming db What would prevent you from storing the result set and just using the last iteration to calculate the next?


This is definitely a possibility. What I meant to say is that the implementation in the blog post doesn't support that.


Makes sense, just an insert instead of a select, I used a similar approach on newton-raphson to implement XIRR in SQL and it worked well.


i’m trying to learn here so please pardon my ignorance. wouldn’t a pre-aggregated set affect the new aggregate result? i suppose you could store avg, sum, count and then add the new value(s) to sum, new count to count, and recalculate average from that. or even just avg and count and then re-aggregate as ((avg*count)+new values)/(count + new values count) but i didn’t know if there’s a better way to process new values into a set of data that’s already been aggregated


Yep that would be what I would do - effectively a functional approach where no memory is required besides the current set and iteration.

A big part of materializing datasets for performance is finding the right grain that is both easy enough to calculate and also can do nice things like be resumeable for snapshots.


Did you try running it using DuckDB?


DuckDB is what I used in the blog post. Re-running this query simply recomputes everything from the start. I didn't store intermediary that would allow starting off from where the query stopped. But it's possible!


Does this work with Postgres?


Postgres has excellent support for WITH RECURSIVE, so I see no reason why it wouldn't. However, as I answered elsewhere, you would need to set some stateful stuff up if you don't want the query to start from scratch when you re-run it.


I for one would welcome a second blogpost with the stateful example. I’ve been doing running totals (basically balances based on adding financial tx per account) but had to do it in some client code because I couldn’t figure out how to do a stateful resume in Sql




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: