Regardless, published papers aren't an authoritative source of truth. Just a note to your friends "hey I did some cool stuff I want to tell you about!"
Sure it's slightly more reviewed than a GitHub repo, but it's not an end all be all.
> The vast majority of Linux kernel performance improvement patches probably have way less of a real world impact than this.
Unlikely, given that the multiplier on every kernel improvement is far higher than "times jq is run in some pipeline". Even a 0.1% improvement in the kernel probably has far, far higher impact than this.
It's not just that - anything working with analog signals benefits hugely from not living inside the complete EM interference nightmare of the computer case.
I see AI pass the Turing test all the time, since humans are constantly being falsely accused of being an AI.
That doesn't mean the AI got good, just that humans are mistaking other humans for AI, which is a form of passing the test.
The adversarial version with humans involved is actually easier to pass for this reason - because real, actual humans wouldn't pass your non-adversarial version.
I've seen a fair number of cases where someone swears up and down not to be using AI to generate responses, but there's no good reason to believe it (except perhaps specifically for the messages where that claim is made).
This includes times that someone basically disappeared from e.g. Stack Overflow at some point before the release of ChatGPT, having written a bunch of posts that barely demonstrate functional literacy or comprehension of English; and then came back afterward posting long messages with impeccable grammar and spelling in textbook "LLM house style".
It's not just patterns like "not just X, but Y", but also deeper patterns and a kind of narrative cadence. Sure it's also mimicking something real, but usually it's a mismatch between the insightfulness of the content and the quality of the delivery. It feels like chewing on empty calories, it's missing the intentionality and the edge of being human. I guess you need to read a lot of LLM output to get a feel for this beyond the surface level pattern matching.
I wonder whether AI house style is the result of the people training it having no sense of writing style or some kind of technical limitation.
With AI, there is no sense of the level of emphasis matching the meaning of the text, or a long-range dramatic arc - everything is a revelation, like somebody who can only speak in TED talks. Everything is extremely earnest, very important, and presented using the same five flashy language hacks.
It was a joke. But also, my use of "not X but Y" is not rhetorical but declarative. The whole point is that what many of us are talking about is not simply these surface patterns, but how they are used and how the narrative rhythm of the sentences and paragraphs goes.
I don't think there's any definitive way to check, but for me one of the biggest tells that a long piece of writing was LLM generated is that it will hardly say anything given how many words are in it.
(well that and the "it's not just x, it's y!" pattern they seem to love)
But it's also often a shoehorned, artificial contrast that doesn't really make sense. The Y is often not such a different thing from the X that it would merit an actual "not just X but Y" claim. Or the Y is a vague subjective term, or some kind of fancy-word-dropping. It's strong styling but little content, similar to politician CYA talk. I don't think it's necessarily a tech limitation, more an effect of deliberate post-training to be middle-of-the-road, nonoffensive, and nonopinionated.
In one study, GPT-4.5 was judged to be human 73% of the time, which means that the actual human was judged to be human only 27% of the time. More human than human, as Tyrell would say.
Edit: folks, the standard Turing test involves a computer and a human, and then a judge communicating with both and giving a verdict about which one is the human. The percentages for the two entities being judged will add up to exactly 100%. That's how this test was conducted. Please don't assume I'm a moron.
The implication would be that GPT-4.5 was not judged to be human 27% of the time. You can't determine how often humans were judged correctly as humans from that data point.
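The disagreement above comes down to study design, and a small simulation makes it concrete (the 73% and 90% rates here are hypothetical, chosen only for illustration):

```python
import random

random.seed(0)
N = 1000

# Forced-choice paired design: in each trial the judge sees one AI and one
# human transcript and must pick exactly one as "the human". Picking one
# entity necessarily means not picking the other, so the two rates sum to
# 100% of trials by construction.
ai_picked = sum(random.random() < 0.73 for _ in range(N))
human_picked = N - ai_picked
assert ai_picked + human_picked == N

# Independent-judgment design: each transcript is judged human/AI on its
# own. Here the two rates are free to take any values and need not sum
# to 100%.
ai_judged_human = sum(random.random() < 0.73 for _ in range(N))
human_judged_human = sum(random.random() < 0.90 for _ in range(N))
```

So whether "GPT-4.5 judged human 73% of the time" implies "the human judged human 27% of the time" depends entirely on which of these two designs the study used.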
No one should have nuclear weapons; we ought to have robust policy, institutions, and vigilance to prevent their proliferation and use.
Computerized vehicles ought to be strictly regulated in terms of how computers may affect the physical operation of the car, such that a reasonable standard of safety can be ensured beyond the usual risk one takes when hopping into a motor vehicle. The fact that a hacker can possibly kill people by rooting an infotainment system is a symptom of a general disregard for security in design, and we continue to ignore it for engineering expediency.
But if they read your paper closely enough to invite you to give a talk, they were probably far enough along toward independently inventing it that they would have done so anyway, and wanted to chat with someone who was already doing the thing they were doing. Good ideas tend to reveal themselves to anyone who is aware of the problem.
To be clear, I am not claiming they stole an idea. They have done significant independent research. However, a specific part regarding the treatment of rotation with bias correction relates to prior work, and it would be appropriate to have that recognized.
Microeconometrics tends to be quite rigorous and easy to validate.
They won't hold up to physics levels of rigor, of course - probably closer to the level of rigor of medical studies.
David Card, Gary Becker, McFadden, etc.
Rigor is also... there's something about letting perfect be the enemy of the good.
If no one will apply math unless you can 100% reliably reproduce controlled experiments in a lab, the only thing left is people just talking about dialectics.
The challenge is how to get as much rigor as possible.
For instance, David Card saw New Jersey increase its minimum wage. You generally can't truly conduct large-scale controlled social experiments, but he saw this as an interesting opportunity.
He looked at the NJ/PA area around Philadelphia as a roughly unified labor market, half of which had just had its minimum wage increased. He treated this as a "natural" experiment, with PA as the control group and NJ as the treatment group, to investigate what happened to the labor market when the minimum wage rose. Having a major metro area split down the middle allowed a lot of other concerns to be factored out, since the only difference was which side of the river you happened to be on.
He had lots of other studies along those lines, finding ways to get controlled-experiment-like behavior where one can't do a true controlled experiment - getting as close, and as rigorous, as possible.
Is that as ideal as a laboratory experiment? Hell no. But it's way closer than just arguing dialectics.
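The Card-style comparison described above is the classic difference-in-differences setup, and the arithmetic fits in a few lines. All numbers below are hypothetical, invented purely to illustrate the mechanics:

```python
# Difference-in-differences sketch of the NJ/PA natural experiment.
# Hypothetical average employment per store, before and after the change.
nj_before, nj_after = 20.4, 21.0   # NJ: treated (minimum wage raised)
pa_before, pa_after = 23.3, 21.2   # PA: control (no change)

# PA's change estimates the trend shared by the whole regional labor
# market; subtracting it from NJ's change isolates the part of NJ's
# change attributable to the minimum-wage increase.
did = (nj_after - nj_before) - (pa_after - pa_before)
print(round(did, 2))
```

The key assumption is "parallel trends": absent the policy change, both sides of the river would have moved together - which is exactly why a metro area split down the middle is such an attractive setting.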