The bitter lesson is about not trying to encode impossible-to-formalize conceptual knowledge; it is not a mandate to abandon data efficiency and scale the model to ever higher parameter counts.
If we followed this logic, we'd be training LLMs on character-level UTF-32 and just letting them figure everything out by themselves, while needing contexts and parameter counts two orders of magnitude larger.
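For a rough sense of scale (the ~4 characters/token BPE rate below is a back-of-envelope assumption, not a measured figure):

```python
# Back-of-envelope cost of character-level vs. BPE-tokenized input.
# Assumption (not from any benchmark): English BPE averages ~4 chars/token.
text = "The bitter lesson is about general methods that leverage computation."
n_chars = len(text)                          # sequence length at character level
n_tokens = max(1, n_chars // 4)              # assumed BPE sequence length
utf32_bytes = len(text.encode("utf-32-le"))  # 4 bytes per code point

# Self-attention cost grows quadratically with sequence length:
attention_blowup = (n_chars / n_tokens) ** 2
print(n_chars, n_tokens, utf32_bytes, round(attention_blowup, 1))
```

So character-level input alone multiplies attention cost by roughly an order of magnitude before you account for the extra capacity needed to relearn tokenization internally.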
Converting from RGB to YUV is absolutely subject to the bitter lesson: it takes a representation we have seen work for some classical methods and hard-codes that knowledge into the AI, which could easily learn (and will anyway) a more useful representation for itself.
> LLMs on character-level UTF-32 and just letting it figure everything out by itself, while needing two orders of magnitude bigger contexts and parameter counts.
This was tried extensively, and honestly it is probably still too early to proclaim the demise of this approach. It's also a completely different case: you're conflating a representation that literally changes the number of forward passes you have to do (i.e. the amount of computation, which is what the bitter lesson is about) with one that would, at most, require stacking on a few extra layers.
A better example for your point (imo) would be audio recognition, where we pre-transform wave amplitudes into a log mel spectrogram for ingestion by the model. I think this will ultimately fall to the bitter lesson as well, though.
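To make the example concrete, that hand-designed front end is roughly the following (a minimal numpy sketch; the frame sizes and the mel formula follow common convention, not any specific toolkit):

```python
import numpy as np

def log_mel_spectrogram(x, sr=16000, n_fft=400, hop=160, n_mels=40):
    """Wave amplitudes -> log-mel features, the classic hand-designed front end."""
    # Frame the signal and apply a Hann window.
    frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
    window = np.hanning(n_fft)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames * window, axis=-1)) ** 2
    # Triangular mel filterbank (mel scale: 2595 * log10(1 + f/700)).
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)   # rising edge
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)   # falling edge
    return np.log(power @ fb.T + 1e-10)
```

Every constant in there (frame length, hop, mel warping, the log) is baked-in human knowledge about speech, which is exactly the kind of thing an end-to-end model could in principle learn from raw samples.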
Another key difference is that you are proposing taking methods that already work and injecting more classical knowledge into them. It is often the case that you'll have an intermediary fusion of deep and classical methods, but not once you already have working fully-deep methods.
Heck, why even go that far? Given how much text we have in scanned books, just feed it scans of the books and let it dedicate a bunch of layers to learning OCR.
Or, given the number of unscanned books, just give it the controls of a book scanner, the books, and probably some robot arms. Then let it figure out the scanning in some layers first. Shouldn't be that hard.
Right... but I don't see how that means that it doesn't fall to the bitter lesson.
The bitter lesson is not saying that the model will always relearn the same representation that has been useful to humans in the past, merely that it will learn a better representation for the task at hand than the one hand-coded by humans.
If the model can easily learn the representation useful to humans, then hand-coding it will fall to the bitter lesson: at minimum the model can follow our path (it's just an affine transformation to learn), and more probably it will learn very different (and better) representations for itself.
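Concretely, RGB→YUV is a single fixed matrix (the BT.601 coefficients below), so any first linear layer can represent it exactly:

```python
import numpy as np

# RGB -> YUV (BT.601 coefficients) is an affine map: yuv = M @ rgb.
# A single linear layer can represent this exactly, which is why
# hand-coding the conversion buys the network essentially nothing.
M = np.array([
    [ 0.299,     0.587,     0.114   ],   # Y  (luma)
    [-0.14713,  -0.28886,   0.436   ],   # U  (blue-difference chroma)
    [ 0.615,    -0.51499,  -0.10001 ],   # V  (red-difference chroma)
])

def rgb_to_yuv(rgb):
    """rgb: array of shape (..., 3) with components in [0, 1]."""
    return rgb @ M.T

print(rgb_to_yuv(np.array([1.0, 1.0, 1.0])))  # pure white: Y = 1, U and V ~ 0
```

(The digital YCbCr variant adds a constant offset, which is still affine and still trivially learnable.)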
LLMs struggle to reason about spelling (e.g. asking for a sentence that contains no letter "a") and can also struggle with rhyming, etc. The most obvious explanation is that they never 'see' the underlying letters/spelling, only tokens.
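A toy illustration of why (the vocabulary and IDs here are made up, but the shape of the problem matches real BPE tokenizers):

```python
# Hypothetical BPE-style vocabulary; real tokenizers are the same in
# spirit: the model consumes opaque integer IDs, not characters.
vocab = {"straw": 4721, "berry": 1893}

def tokenize(word):
    # Greedy longest-match split against the toy vocabulary.
    ids, rest = [], word
    while rest:
        for piece in sorted(vocab, key=len, reverse=True):
            if rest.startswith(piece):
                ids.append(vocab[piece])
                rest = rest[len(piece):]
                break
        else:
            raise ValueError(f"cannot tokenize {rest!r}")
    return ids

print(tokenize("strawberry"))  # [4721, 1893]: nothing exposes the letters inside
```

To answer "how many r's are in strawberry?", the model has to have memorized the spelling of the pieces behind IDs 4721 and 1893; it can't just read it off the input.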
Been hearing that for half my adult life. People were 100% sure that multicore in 2005 meant manufacturers were officially signalling the end of single-threaded scaling, and that it was time to invest in auto-parallelizable code.
I don't think it's wrong, but looking at it through a child's eyes, we do keep finding ways to do things we couldn't a couple of years ago: an open mind on hardware and more focus on software are keeping the deep innovation cycles going.
Leaving aside that we're still far from hitting the limits to growth outlined in that book, and that we can exceed those limits to growth by expanding outside of Earth, what does a book about physical limitations on agriculture and industry have to do with limitations on computing efficiency? There is of course some fundamental limit to computing efficiency, but for all we know we could be many orders of magnitude away from hitting it.
We've clearly fallen behind the exponential curve on clock speed. But the great thing is we can parallelize transformers, so it's not as big of a deal.