
In deep ML, people are pretty familiar with the bitter lesson and don't want to waste time on this.


The bitter lesson is about not trying to encode impossible-to-formalize conceptual knowledge; it is not a mandate to forgo data efficiency and scale the model up to ever higher parameter counts.

If we followed this logic, we'd be training LLMs on character-level UTF-32 input and letting them figure everything out by themselves, at the cost of contexts and parameter counts two orders of magnitude larger.


Converting from RGB to YUV is absolutely subject to the bitter lesson, because it takes a representation that we've seen work for some classical methods and hard-codes that knowledge into the AI, which could easily learn (and will anyway) a more useful representation for itself.

> LLMs on character-level UTF-32 and just letting it figure everything out by itself, while needing two orders of magnitude bigger contexts and parameter counts.

This was tried extensively, and honestly it is probably still too early to proclaim the demise of this approach. It's also a completely different situation: you're conflating a representation that literally changes the number of forward passes you have to do (i.e., the amount of computation, which is what the bitter lesson is about) with one that would, at most, require stacking on a few extra layers.
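To make the computation gap concrete, here's a rough back-of-the-envelope sketch (the ~4 characters per subword token is a common rule of thumb, not a measurement from any particular tokenizer):

```python
# Sketch: character-level (or UTF-32) input multiplies sequence length
# versus subword tokens, and self-attention cost grows quadratically
# with sequence length.

text = "The bitter lesson is about scaling computation, not hand-coded features."

n_chars = len(text)                             # one position per character
n_utf32_bytes = len(text.encode("utf-32-le"))   # 4 bytes per code point
n_tokens_approx = max(1, n_chars // 4)          # rule of thumb: ~4 chars/token

print(n_chars, n_utf32_bytes, n_tokens_approx)
# Relative quadratic attention cost (characters vs. tokens), ignoring constants:
print((n_chars / n_tokens_approx) ** 2)
```

The point is only the scaling: a character-level sequence is a few times longer, and attention cost over it grows with the square of that factor.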

A better example for your point (imo) would be audio recognition, where we pre-transform from wave amplitudes into a log mel spectrogram before ingestion by the model. I think this will ultimately fall to the bitter lesson as well, though.
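For reference, the hand-coded front end in question is itself only a short fixed computation. A minimal NumPy sketch with toy parameters (not any particular library's implementation; real pipelines differ in windowing, normalization, and mel-scale convention):

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(wave, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Frame the signal and apply a Hann window
    frames = np.lib.stride_tricks.sliding_window_view(wave, n_fft)[::hop]
    window = np.hanning(n_fft)
    spec = np.abs(np.fft.rfft(frames * window, axis=-1)) ** 2  # power spectrum

    # Triangular mel filterbank, equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge

    return np.log(spec @ fb.T + 1e-10)  # shape: (frames, n_mels)

# Toy usage: one second of a 440 Hz tone
t = np.arange(16000) / 16000.0
mels = log_mel_spectrogram(np.sin(2 * np.pi * 440 * t))
print(mels.shape)
```

A fixed filterbank like this is exactly the kind of prior a big enough model could learn in its first layers from raw waveforms.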

Another key difference is that you are proposing to take methods that already work and inject more classical knowledge into them. It is often the case that you'll see an intermediate fusion of deep and classical methods, but not when you already have working, fully deep methods.


Heck, why even go that far? Given how much text we have in scanned books, just feed the model the scans and let it dedicate a bunch of layers to learning OCR.


Or, given the number of unscanned books, just give it the controls of a book scanner, the books, and probably some robot arms, then let it figure out the scanning in some layers first. Shouldn't be that hard.


RGB->YUV is literally an affine transform, of course it falls to the bitter lesson.
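For concreteness, here's the BT.601 version of that transform as a single matrix multiply, i.e. one linear layer's worth of capacity (a NumPy sketch; exact coefficients vary by standard, and the digital YCbCr variant adds an offset):

```python
import numpy as np

# BT.601 RGB -> YUV conversion matrix
M = np.array([
    [ 0.299,    0.587,    0.114  ],   # Y (luma)
    [-0.14713, -0.28886,  0.436  ],   # U
    [ 0.615,   -0.51499, -0.10001],   # V
])

def rgb_to_yuv(rgb):
    """rgb: (..., 3) array with channels in [0, 1]."""
    return rgb @ M.T

yuv = rgb_to_yuv(np.array([1.0, 1.0, 1.0]))  # white
print(yuv)  # Y ~ 1, U ~ 0, V ~ 0
```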


Does it? Because I'm not sure the model has any intrinsic incentive to learn to follow how human perception works.


Right... but I don't see how that means that it doesn't fall to the bitter lesson.

The bitter lesson is not saying that the model will always relearn the same representation as the one that has been useful to humans in the past, merely that the model will learn a better representation for the task at hand than the one hand-coded by humans.

If the model could easily learn the representation that's useful to humans, then it falls to the bitter lesson: at minimum the model could follow our path (it's just an affine transformation to learn), and more probably it will learn a very different (and better) representation for itself.
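A minimal sketch of the "it's just an affine transformation to learn" claim, assuming the BT.601 coefficients: a single linear map, fit by least squares on random pixels, recovers the RGB-to-YUV matrix to machine precision.

```python
import numpy as np

# BT.601 RGB -> YUV matrix (the representation "useful to humans")
M = np.array([
    [ 0.299,    0.587,    0.114  ],
    [-0.14713, -0.28886,  0.436  ],
    [ 0.615,   -0.51499, -0.10001],
])

rng = np.random.default_rng(0)
X = rng.random((1000, 3))   # random RGB "pixels" in [0, 1]
Y = X @ M.T                 # their YUV targets

# One linear layer fit by least squares recovers the transform
M_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(M_hat.T, M))  # True
```

If one closed-form solve suffices, gradient descent in an overparameterized network will have no trouble either, which is why hand-coding the transform buys so little.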


This will absolutely be the case N doublings of Moore's law from here. Tokens are information loss.


Information loss, or the result of useful computation? VAEs exist after all.


LLMs can't reason about spelling, e.g. asking for a sentence which contains no letter "a"; and can also struggle with rhyming, etc. The most obvious explanation is that they never 'see' the underlying letters/spelling, only tokens.
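A toy illustration with a made-up vocabulary (not any real tokenizer): after greedy longest-match tokenization, the model receives opaque integer IDs, so individual letters never appear in its input at all.

```python
# Hypothetical subword vocabulary; IDs are arbitrary.
vocab = {"straw": 17, "berry": 42, " contains": 7, " no": 99, " letter": 3}

def encode(text):
    """Greedy longest-match tokenization over the toy vocabulary."""
    ids, rest = [], text
    while rest:
        for piece, tid in sorted(vocab.items(), key=lambda kv: -len(kv[0])):
            if rest.startswith(piece):
                ids.append(tid)
                rest = rest[len(piece):]
                break
        else:
            raise ValueError(f"cannot tokenize: {rest!r}")
    return ids

ids = encode("strawberry contains no letter")
print(ids)  # [17, 42, 7, 99, 3] -- no character-level information survives
```

Counting the letter "a" in that sentence requires knowledge about the spelling of each token, which the model only acquires indirectly from training data.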


Keep in mind Moore's law is coming to its end.


Been hearing that for half my adult life. People were 100% sure that multicore in 2005 meant manufacturers were officially signalling the end of frequency scaling, and that it was time to invest in auto-parallelizable code.

I don't think the claim is wrong, but looking at it with fresh eyes, we do keep finding ways to do things we couldn't a couple of years ago: an open mind on hardware and more focus on software are keeping deep innovation cycles going.


There are limits to growth[1]. God-like tech utopia isn't and won't be real.

1: https://www.clubofrome.org/publication/the-limits-to-growth/


Leaving aside that we're still far from hitting the limits to growth outlined in that book, and that we can exceed those limits to growth by expanding outside of Earth, what does a book about physical limitations on agriculture and industry have to do with limitations on computing efficiency? There is of course some fundamental limit to computing efficiency, but for all we know we could be many orders of magnitude away from hitting it.


The original study has been revisited, and its projections have held up so far. An analysis: https://medium.com/@CollapseSurvival/overshoot-why-its-alrea... Humanity likely won't ever be able to settle permanently outside Earth.


Do I need to repeat myself? What do limits on agriculture have to do with limits on computing?


^ the equivalent of an ideological salesman ringing my doorbell. Absolutely nothing to do with anything I said.


We've clearly fallen behind the exponential curve on clock speed. But the great thing is we can parallelize transformers, so it's not as big of a deal.


If you want to play by the bitter lesson, why don't you just feed the raw JPEG bits into your neural network?



