One of the Hydro creators here. Ballista (and the ecosystem around Arrow and Parquet) are much more focused on analytical query processing, whereas Hydro brings concepts from the query-processing world to the implementation of distributed systems. Our goal isn't to execute a SQL query, but rather to treat your distributed systems code (e.g. a microservice implementation) as if it were a SQL query. Integration with Arrow and Parquet is definitely on our roadmap, though!
One of the creators of Hydro here. Yeah, one way to think about Hydro is that it brings the dataflow, query optimization, and distributed execution ideas from databases and data science to programming distributed systems. We are focused on running latency-critical, long-running services this way, though, rather than individual queries. The kinds of things we have implemented in Hydro include a key-value store and the Paxos protocol, but these compile down to dataflow just like a Spark or SQL query does!
There is a nice article by David Patterson (who used to direct the lab and won the Turing Award) on why Berkeley changes the name and scope of the lab every five years: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-... . Unfortunately, there's no single name for the lab that spans the five-year boundaries, so people just say "RISELab" or "AMPLab", etc.
> Good Commandment 3. Thou shalt limit the duration of a center. ...
> To hit home runs, it’s wise to have many at bats. ...
> It’s hard to predict information technology trends much longer than five years. ...
> US Graduate student lifetimes are about five years. ...
> You need a decade after a center finishes to judge if it was a home run. Just 8 of the 12 centers in Table I are old enough, and only 3 of them (RISC, RAID, and the Network of Workstations center) could be considered home runs. If slugging .375 is good, then I’m glad that I had many 5-year centers rather than fewer long ones.
A researchy perspective: Datalog was invented to extend relational algebra with recursion. Since it started out as an academic tool, people have been studying recursion-specific optimizations for it for decades, so it is extremely well suited to recursive use cases, e.g. iterative graph algorithms. Using Datalog for network algorithms won the thesis award in databases almost 20 years ago: https://boonloo.cis.upenn.edu/papers/boon_interview.pdf .
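For anyone curious what a "recursion-specific optimization" looks like, here is a rough sketch of semi-naive evaluation, one of the classic ones, applied to a transitive-closure program. The rule names (path, edge) and the Python function are purely illustrative assumptions, not taken from the linked paper or any particular engine. The idea is that each round only joins the facts that were newly derived in the previous round, instead of recomputing the join over the whole relation every time.

```python
from collections import defaultdict

# Semi-naive evaluation of a hypothetical Datalog program:
#   path(X, Y) :- edge(X, Y).
#   path(X, Z) :- path(X, Y), edge(Y, Z).
def transitive_closure(edges):
    """Return every (x, z) pair derivable by the two rules above."""
    # Index edges by source so the join path(X, Y), edge(Y, Z) is a dict lookup.
    succ = defaultdict(set)
    for x, y in edges:
        succ[x].add(y)

    path = set(edges)   # base rule: every edge is a path
    delta = set(edges)  # facts that were new in the latest round
    while delta:
        # Only the *new* path facts are joined with edge in each round.
        derived = {(x, z) for (x, y) in delta for z in succ[y]}
        delta = derived - path  # keep only genuinely new facts
        path |= delta
    return path
```

For example, `transitive_closure({(1, 2), (2, 3)})` yields `{(1, 2), (2, 3), (1, 3)}`. Because the recursive rule here is linear (one recursive atom in the body), joining just the delta is enough to reach the same result as naive re-evaluation.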
Agreed, my understanding is that Datalog has a distinct (though related) lineage that emerged directly from Prolog (i.e. logic programming, not relational algebra / database theory). Skimming the introduction of "Horn Clauses and the Fixpoint Query Hierarchy" (1982) seems to confirm this: https://dl.acm.org/doi/pdf/10.1145/588111.588137
Edit: this presentation describes things differently, but it doesn't sound quite right to me: "Chandra and Harel, 1982: Studied the expressive power of logic programs without function symbols on relational databases" https://www.dbai.tuwien.ac.at/datalog2.0/slides/Kolaitis.pdf
Yeah, I'm not up on the Prolog history side of things. My info is based on the Wikipedia article for Fixed Point Logic: "Least fixed-point logic was first studied systematically by Yiannis N. Moschovakis in 1974,[1] and it was introduced to computer scientists in 1979, when Alfred Aho and Jeffrey Ullman suggested fixed-point logic as an expressive database query language.[2]" [2] = Universality of Data Retrieval Languages: https://dl.acm.org/doi/10.1145/567752.567763
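To make "least fixed point" a bit more concrete: a recursive query can be read as a monotone operator on sets of facts, and its answer is the smallest set that the operator maps to itself. Over a finite database you can reach it by starting from the empty set and applying the operator until nothing changes. The sketch below is only an illustration under that assumption; the reach/source/edge program and the helper names are hypothetical, not from the cited papers.

```python
# Naive least-fixed-point iteration: start from the empty set and apply a
# monotone operator until the result stops changing (terminates on finite data).
def least_fixed_point(step):
    current = frozenset()
    while True:
        nxt = frozenset(step(current))
        if nxt == current:
            return current
        current = nxt

# Hypothetical example program: nodes reachable from a set of sources.
#   reach(X) :- source(X).
#   reach(Y) :- reach(X), edge(X, Y).
edges = {("a", "b"), ("b", "c"), ("d", "e")}
sources = {"a"}

def reach_step(reach):
    # One application of both rules to the current estimate of reach.
    return sources | {y for (x, y) in edges if x in reach}

print(sorted(least_fixed_point(reach_step)))  # ['a', 'b', 'c']
```

The iterates here are {} → {a} → {a, b} → {a, b, c}, after which applying the rules again adds nothing, so {a, b, c} is the least fixed point.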
Yeah, if the OP can give a reference, I'd be very interested, too. I've searched for the "original" reference for Datalog because I wanted to cite it, but I couldn't find anything like that.
I have a sneaking suspicion that "function-free Prolog" is as old as ordinary Prolog, and that "Datalog", as an idea separate from Prolog and used as a database language, was born in the database community, but like the OP, I have no reference for this.