The short answer, as far as I’m aware, is that no one really knows. The longer answer is that we have a lot of partial answers that, in my mind, basically boil down to: model architectures draw a walk through the high-dimensional vector space of concepts, and we’ve tuned them to land on the right answer. The fact that they do so consistently says something about how we encode logic in language and about the effectiveness of these embedding/latent spaces.
You mention in the post that there are design differences between Hegel/Hypothesis and QuickCheck, partly due to attitude differences between Python/non-Haskell programmers and Haskell programmers. As someone coming from the Haskell world (though by no means considering Haskell a perfect language), could you expand on what kinds of differences these are?
So I think a short list of the big API differences is something like:
* Hypothesis/Hegel are very much focused on using test assertions rather than a single property that can be true or false. This naturally drives a style that is much more like "normal" testing, but also has the advantage that you can distinguish between different types of failing test. We don't go too hard on this, but both Hegel and Hypothesis will report multiple distinct failures if your test can fail in multiple ways.
* Hegel/Hypothesis's data generation, and how it interacts with testing, is much more flexible and essentially fully imperative. You can generate whatever data you like wherever in your test you like, freely interleaving data generation and test execution.
* QuickCheck is very much type-first, with explicit generators as an afterthought. I think this is mostly a mistake even in Haskell, but in languages where "just wrap your thing in a newtype and define a custom implementation for it" will get you a "did you just tell me to go fuck myself?" response, it's a nonstarter. Hegel is generator-first: you can get the default generator for a type if you want, but that's mostly a convenience function, with the assumption that you're going to want a real generator specification at some point soon.
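To make the "generator-first" point concrete, here's a toy sketch in plain Python (my own illustration of the idea, not the actual Hypothesis API): generators are ordinary values you build and transform explicitly, and the per-type default is just a lookup convenience on top of them.

```python
import random

# Toy sketch of "generator first": a generator is an explicit value
# you construct and refine, not something derived from a type.
class Generator:
    def __init__(self, produce):
        self.produce = produce  # produce(rng) -> value

    def map(self, f):
        # Refine a generator by transforming its output; no newtype
        # wrapping or custom typeclass instance required.
        return Generator(lambda rng: f(self.produce(rng)))

integers = Generator(lambda rng: rng.randint(-100, 100))
evens = integers.map(lambda n: n * 2)

# The type-based default is just a dictionary lookup over explicit
# generators, i.e. a convenience, not the foundation.
DEFAULTS = {int: integers}

def from_type(t):
    return DEFAULTS[t]

rng = random.Random(0)
assert from_type(int) is integers
assert evens.produce(rng) % 2 == 0
```

In the type-first style the arrow points the other way: the type determines the generator, and refining it means inventing a new type.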
From an implementation point of view, the thing that enables the big conveniences is that Hypothesis has a uniform underlying representation of test cases and does all its operations on them. This means you get:
* Test caching (if you rerun a failing test, it will immediately fail in the same way with the previously shrunk example)
* Validity guarantees on shrinking (your shrunk test case will always be one your generators could have produced. It's a huge footgun in QuickCheck that you can shrink to an invalid test case)
* Automatically improving the quality of your generators, never having to write your own shrinkers, and a whole bunch of other quality of life improvements that the universal representation lets us implement once and users don't have to care about.
The validity thing in particular is a huge pain point for a lot of users of PBT, and is what drove a lot of the core Hypothesis model to make sure that this problem could never happen.
The test caching is because I personally hated rerunning tests and not knowing whether it was just a coincidence that they were passing this time or whether the test case had changed.
It is a faithful translation of the original Dutch. Dutch is structurally very similar to English so this type of nuance carries over pretty much intact.
Dutch: “Dat was niet enkel onzorgvuldig, het was fout.”
English: “That was not just careless—it was wrong.”
I’d say the only difference is the em dash.
Whether you consider it proof of AI is up to y’all.
I mean that whatever signal you think you get from the sentence structure was not introduced by (potentially) automated translation. The sentence structure is precisely the same in the original Dutch.
His non-apology apology even follows a familiar pattern: I wrote it myself but just used AI for some help, and it inserted false quotes! Bad tech! But I have now learned my lesson!
Very similar to what a rector recently wrote when she got busted giving an AI-generated inaugural speech in her new university job.
None of it is true, of course. These people are just sorry they got caught.
Oh, I had completely missed that he’d done this in a Dutch paper too. All I’d heard of it previously was that his articles had been pulled by the Irish Independent, so I was expecting the apology to be in English.
I’m tempted to agree, but this is a case where I think there’s more human than AI. Maybe he used LLMs for a bit, and changed parts of it. Maybe he is patient zero for LLM speak?
The Senate is, while not the whole story, a significant part of the reason the government constantly fails to do what is either the desire of the people or what's in their interests. I wouldn't lament losing the Senate.
The US Senate is designed to check and balance the House of Representatives. But that often deadlocks Congress as a whole, meaning it can no longer balance the other two branches.
Back when Congress could still get things done, it delegated a lot of power to the Executive. That worked OK for a while, but eventually a "unitary executive" appropriated even more power, and the Legislature is now powerless to prevent it.
Unpopular opinion: deadlock is fine. Most legislation is bad. What really matters is the budget. And the rule that failing to pass a budget can automatically force an election avoids the absurd US "shutdown" that isn't a shutdown.
This is now my second favorite idea, after a nationwide ban of first past the post voting schemes.
My third (previously second) is outlawing political parties. The problem with that one is it would be really difficult to implement in a way that doesn't run afoul of freedom of association and freedom of speech. Probably worth figuring out though.
Voting system reform would probably mitigate the worst aspects of political parties.
Egypt after ousting Mubarak held an election where a third of seats were reserved for independents. Most winning candidates were just Muslim Brotherhood affiliated. I suspect the military interim government did that deliberately to justify their later coup.
This is where the intra-party coalitions become important. Every party of significant size has them. Labour is effectively a coalition between a rightwing faction (New Labour/Blue Labour) and everyone else who is more leftwing. The internal and external debate is the question: should they focus "right" (immigrant and queerbashing, welfare cuts) to appease the right wing of the party and try to pick up Reform/Conservative voters, or focus "left" on their base and people who are switching to Green?
On the other hand, voting needs to mean something. If voting doesn't mean anything, because the whole system is held in a vice grip by a sclerotic institution playing power games with itself, then the broader system eventually collapses.
My personal opinion is that Mitch McConnell's intransigence and unwillingness to do anything, lest Obama get credit for it, led directly to an increased desire for a "strongman".
The Senate was, from the start, fundamentally a compromise in favor of the slave-owning oligarchy. You only have to look at free and slave states being admitted in pairs, to preserve the status quo of slavery, to see how that went.
The Senate produces a rather disproportionate democracy, in which the votes of a small number of people in small states take on outsized significance compared to the votes of a large number of people in populous states.
That still does nothing to refute the parent's complaint about democracy. Lopsided representation is still representation (as opposed to a council of nobles or military generals or whoever).
Also the thing you're objecting to is literally the entire point of the senate from day one. It was intended to give less populous states an equal voice in contrast to the house of representatives. Unfortunately history happened and the house of representatives hasn't been proportional for a long time. But if you're going to complain about something it should probably be the latter rather than the former.
It does not seem fair to say that frequentists do not update their beliefs based on new evidence. That does not accurately capture the difference between Bayesians and frequentists (or anyone else).
I don't like the way it's written, but what they are talking about is completeness in the sense of "Dedekind completeness": given any two sets A and B with every element of A below every element of B, there is some number which is simultaneously an upper bound for A and a lower bound for B.
Note that this fails for the rationals: e.g., if we let A be the rationals below sqrt(2) and B be the rationals above sqrt(2).
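You can make that failure concrete with exact rational arithmetic (the `witness` helper here is my own illustration, not something from the thread): for any positive rational c, f(c) = (2c + 2)/(c + 2) lies strictly between c and sqrt(2), on the same side of sqrt(2) as c. So no rational c can separate A from B: if c² < 2 then c sits in A with larger elements of A above it, and if c² > 2 then c sits in B with smaller elements of B below it. (And c² = 2 is impossible for a rational.)

```python
from fractions import Fraction

def witness(c):
    # f(c)^2 - 2 = 2(c^2 - 2)/(c + 2)^2 has the same sign as c^2 - 2,
    # and f(c) - c = (2 - c^2)/(c + 2), so f(c) is strictly between
    # c and sqrt(2) for any positive rational c.
    return (2 * c + 2) / (c + 2)

c = Fraction(1)              # c^2 < 2, so c is in A
w = witness(c)               # w = 4/3
assert c < w and w * w < 2   # a bigger element of A: c is no upper bound

c = Fraction(3, 2)           # c^2 > 2, so c is in B
w = witness(c)               # w = 10/7
assert w < c and w * w > 2   # a smaller element of B: c is no lower bound
```

Iterating `witness` gives the classic sequence of rational approximations converging to sqrt(2) from either side, which is one way to see why the "gap" is genuinely unfillable within the rationals.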
In school, we talked about “Dedekind cuts” but we never formalized the definition. Kind of disappointed now because your explanation is very simple and elegant.
Note the following comment by Jerry Ling: "The effect goes away if you search properly using the original submission date instead of the most recent submission date. By using most recent submission date, your analysis is biased because we’re so close to the beginning of 2026 so ofc we will see a peak that’s just people who have recently modified their submission."
The last-modified-date effect is even more important, because it can be used to support whatever the latest fad is, without needing to adapt data or arguments to the specifics of that fad.
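The bias is easy to reproduce with toy data (hypothetical records, not the actual arXiv dataset): counting by latest-version date piles recent revisions of old papers onto the current period, while counting by original submission date does not.

```python
from collections import Counter

# Hypothetical papers: (year_first_submitted, year_last_modified).
papers = [
    (2019, 2025), (2020, 2025), (2021, 2021),
    (2022, 2025), (2023, 2023), (2024, 2025),
]

by_first = Counter(first for first, _ in papers)
by_latest = Counter(latest for _, latest in papers)

# Old papers revised recently all collapse into the latest year,
# manufacturing a "peak" that says nothing about new submissions.
assert by_latest[2025] == 4
assert by_first[2025] == 0
```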
Not paying attention on the train, even in 2025 girliepop-influencer-Instagram-latte-art New York, is not the smartest. You're probably better off during rush hour, but being aware of your surroundings is never a bad idea, even in "safe" New York.