Q-Transformer (qtransformer.github.io)
238 points by jonbaer on Nov 30, 2023 | 63 comments


Being implemented as we speak by the always impressive LucidRains [1].

[1]: https://github.com/lucidrains/q-transformer


It's kinda funny how Lucidrains came back to make numerous commits to this repo following Q*.


Apparently this guy, like a bunch of others (e.g. https://github.com/ggerganov/ggml), is implementing transformers from papers for people who want them. Pretty cool.


The additional link is indeed very useful; however, with one exception, it is limited to inference, i.e. code for evaluating a model using somebody else’s trained weights. The original link has code for training (which can also be used for inference, of course). Inference is super important, so it makes a lot of sense to put effort into making it work well on all kinds of hardware; the papers typically focus on training and only then on evaluating/validating a new model.


I’m glad to see people in the open source community taking the initiative to implement these tools for the general public. It’s one thing to understand the high-level concepts in a paper, but another entirely to implement them efficiently. Some people just really have a knack for reading academic papers and implementing those ideas in software.


Lucidrains also created the much-missed EpicMafia, which still doesn’t have a good replacement after shutting down. Exceptionally skilled person!


This came out a while ago, but I can see why it's been reposted. It's another approach to searching the solution space with rewards and transformer language models, like XoT and others, as I pointed out in my last comment: https://news.ycombinator.com/item?id=38471388


It’s probably being reposted because people think it has to do with Q*


It's clear we're searching for the god algorithm of AI, just like physicists are searching for theory of everything. Are transformers the answer though?


It seems more likely that there will be multiple avenues to AGI, all with their strengths and weaknesses. But perhaps the "God AI" will be a multifaceted model composed of many different models acting in unison.


I could see something like the "modularity of mind" model of human consciousness, where multiple approaches are working on "subconscious" solutions to a given problem in parallel, with a top layer deciding which is appropriate at the moment.

My human brain doesn't use the same algorithm for learning to play a song on a piano as learning to play a new board game. I'm not an AI person, but it seems reasonable to imagine we'd have different "modules" to apply as needed.

AlphaGo probably sucks at conversation. ChatGPT can't play Go. The part of my brain writing this couldn't throw a baseball. The physics engine that lets me throw a baseball couldn't write this. Is there a reason we'd want or need one specific AI approach to be universally applicable?


The Bicameral Mind


Exactly, just as the true god has seven aspects [0].

[0] https://en.wikipedia.org/wiki/Themes_in_A_Song_of_Ice_and_Fi...


On the other hand, most of the original gods were parts of polytheistic pantheons. Maybe a bunch of models representing different identities and biases would be more useful: they could argue amongst themselves, presenting a fuller point of view, and users could become familiar with their particular perspectives.


Most likely there will be more than one specimen claiming to be AGI, from different groups, and they will be hard to compare.


There's really no God algorithm needed, just something good enough to assist with research of the next tier of hardware, energy, and code for AI.


"Algorithm" is too simple a framing, but yes - what does the system look like?

I think the transformer architecture, or something very similar with eventually-on-policy time-series forecasting in a Markov decision process, is actually the right answer, and it is what I have been trying to make progress on for a long time [1].

[1] https://kemendo.com/research/streaminference.html


That plus some Monte Carlo search as in AlphaZero makes a very strong (although computationally heavy) contender for 'alive' AI.
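For reference, the selection rule at the heart of AlphaZero's Monte Carlo tree search is PUCT, which trades a node's mean value off against a prior-weighted exploration bonus. A minimal sketch (the constants, action names, and numbers are illustrative, not AlphaZero's actual implementation):

```python
import math

def puct_select(stats, priors, c_puct=1.5):
    """Pick an action by PUCT score, as in AlphaZero-style MCTS.

    stats:  {action: (visit_count, total_value)}
    priors: {action: policy-network probability}
    """
    n_total = sum(n for n, _ in stats.values())
    def score(a):
        n, w = stats[a]
        q = w / n if n else 0.0                                 # exploit: mean value
        u = c_puct * priors[a] * math.sqrt(n_total) / (1 + n)   # explore: under-visited, high-prior
        return q + u
    return max(stats, key=score)

# A node where "right" is less visited but has a higher prior:
stats = {"left": (10, 6.0), "right": (2, 1.5)}
priors = {"left": 0.4, "right": 0.6}
print(puct_select(stats, priors))  # -> right
```

The 'computationally heavy' part is that this selection runs at every node of every simulated rollout, on top of a network evaluation per leaf.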


After these past few years of progress on multivariate stream forecasting, I really think it's going to be different implementations of the same basic streaming-data forecasting architecture.


We're searching for an efficient algorithm that leads to AGI. Given sufficient time and compute, I'm sure that we could get there with existing stuff, by accident, and we wouldn't realize it before moving on to the next thing... and there'd be a poor orphan AGI, lost in a Git repo, waiting for runtime.


"Given sufficient time and compute" covers up a lot, though. The ultimate God AGI that would be created through that sort of process would take the form of a large room filled with a whole lot of monkeys and typewriters.


No, it’s a much simpler architecture. Anything that evolved can’t require a really complicated niche structure.

The difficulty is the scale. Every synapse of a neuron is effectively a neuron itself, and every synapse acts on the synapses around it. So before you’ve even got to the neuron as a whole you’ve already got the equivalent of thousands of neurons and logic gates. Then the final result gets passed on to thousands more neurons.

I don’t know how you would recreate such complexity in programming. It’s not just the scale, it’s the flexibility of the structure.


There's this nice new paper ("Simplifying Transformer Blocks" https://arxiv.org/abs/2311.01906) about getting rid of all the unnecessary parts of a transformer. On their dataset, they manage to get rid of the layernorm, the skip connection, the value matrix, and the projection matrix.
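For intuition, here is a toy NumPy sketch (my own illustration, not the paper's code) of what attention looks like once the value and output-projection matrices are dropped: the attention weights mix the raw inputs directly.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def simplified_attention(x, w_q, w_k):
    # Queries and keys are learned projections; "values" are the raw
    # inputs (no W_V), and the result is used as-is (no output W_O).
    q, k = x @ w_q, x @ w_k
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return attn @ x  # attention weights mix the input tokens themselves

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))          # 4 tokens, model dim 8
w_q = rng.normal(size=(8, 8)) * 0.1
w_k = rng.normal(size=(8, 8)) * 0.1
out = simplified_attention(x, w_q, w_k)
print(out.shape)  # (4, 8)
```

The paper's actual blocks also restructure the skip connection and layernorm; this only shows the value-free/projection-free attention part.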


As it stands, each token processed by a transformer requires a constant amount of computation and energy. For an AGI system, this would imply the ability to solve problems of any complexity with a fixed amount of energy. But if this were true, it would essentially mean that P equals NP, a major theoretical breakthrough in computational complexity theory. IMO we are still missing something.
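For context on the constant-compute-per-token claim, a common back-of-the-envelope rule (the 7B parameter count below is an illustrative choice, not from the thread):

```python
# Rule of thumb for decoder-only transformers: a forward pass costs
# roughly 2 FLOPs per parameter per generated token, regardless of how
# "hard" the token is -- which is the parent comment's point.
params = 7e9                      # e.g. a 7B-parameter model (illustrative)
flops_per_token = 2 * params
print(f"{flops_per_token:.1e} FLOPs per token")  # 1.4e+10 FLOPs per token
```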


If they were we'd stop searching already, no?


Theory of everything is supposed to be beautiful.

Theory of AI is just going to be some weird network with a shit-ton of compute power, where the latter is more important to the outcome than the former.


No, not at all. Just looking for the next step of many, stacking S-curves atop each other.


33% success rate for opening a drawer?


That's a great start. Opening a drawer is a pretty hard problem.


Correct - as a reference for the OC, this is known as Moravec's paradox (https://en.wikipedia.org/wiki/Moravec%27s_paradox).


At least to me, the key point of the article:

>Encoded in the large, highly evolved sensory and motor portions of the human brain is a billion years of experience about the nature of the world and how to survive in it. The deliberate process we call reasoning is, I believe, the thinnest veneer of human thought, effective only because it is supported by this much older and much more powerful, though usually unconscious, sensorimotor knowledge. We are all prodigious olympians in perceptual and motor areas, so good that we make the difficult look easy. Abstract thought, though, is a new trick, perhaps less than 100 thousand years old. We have not yet mastered it. It is not all that intrinsically difficult; it just seems so when we do it.


I'm not convinced. There's a lot of creative thinking going on in evolutionary anthropology, but the proofs are meager.


Indeed: we have machines doing novel work in organic synthesis but none yet to empty a dishwasher.



Could you point me to the timestamp where it empties a dishwasher?


Interacting with the environment is the next frontier. See AI driving...


The other robot project posted here yesterday opened drawers at around 80%, though: https://news.ycombinator.com/item?id=38453047 And they did it in many different homes.


My robot can open drawers at near 100%, but it takes the form of a stick of dynamite.

Still working on closing the drawers afterwards, though...


Perhaps you could automate rebuilding the drawers with some IKEA/TaskRabbit APIs.


They should probably optimize for minimal friction force. If you're pulling the handle not entirely in the direction of movement, you're not doing it right.


How does this stack up to diffusion policy learning?

https://diffusion-policy.cs.columbia.edu/

Looking at the figures and videos, it seems... worse? I'm sort of surprised they didn't compare against it, but I guess they're trying to limit the discussion to purely reinforcement-learning methods.

EDIT: Ah, I see this is an older result that was published concurrently with the diffusion policy paper, so it's likely the authors didn't know about it in time to add the extra comparisons.


Just to be clear, the "scalable" part here means that, assuming a dataset of human demonstrations is available, it can learn in a closed loop better than before? In other words, am I right in understanding that the mapping from abstract actions to concrete steps in the state space is provided at a low-enough cost?


Like in workflow-guided exploration (https://arxiv.org/pdf/1802.08802.pdf), for example: if you said "Forward the email," the bot needs to sample from a set of actions, which in this case would be DOM elements and the JS APIs to call to click the right button to forward the email. This would be harder/more interesting to test. Can it learn to forward an email in the Firefox desktop browser and then do it on an iPhone?

It feels like once opening a drawer is learnt, the bot has a (world?) model of what all drawers might look like, so it can open different drawers in any world. But this might not generalize to web interfaces and, more specifically, to how to do those actions on those interfaces.

Not to take away from this paper's scope and achievements.


Funny how flies get on board with the latest trend these days.

First it was the LK-99 hype, now it is Q. Chasing hype like flies to a light bulb.


I really wish I could understand this. I can halfway get it loaded into my brain and then...


Could one of the Roomba companies now make those things recognize objects and avoid them, instead of using dead reckoning or RF-based obstacle avoidance?


Worth noting that this came from DeepMind in Sep-Oct of this year, so it's not related to OpenAI's "unfortunate" leak [0] of Q*.

[0] https://www.theverge.com/2023/11/29/23982046/sam-altman-inte...


I wouldn't go that far. Immediately after Q* leaked, Jimmy Apples pointed to this thread on Twitter: https://twitter.com/polynoamial/status/1676971508911198209

It's Noam Brown's (formerly of Meta AI) "I'm joining OpenAI" thread from back in June, where he talks about how he wants to bring some of the work from AlphaGo Zero etc. into OpenAI's AGI architecture.

It looks like this Q-Transformer work shares a lot of the same lineage as what Q* is supposed to be. This is Google's attempt at mushing Q-learning into transformer land, which is probably what Q* is as well. It's not the same thing, for sure, but they are at least sibling bodies of work.
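For readers who haven't touched the Q-learning half of that lineage: its core is the Bellman backup, which Q-Transformer scales up by replacing the Q-table with a transformer over observation/action histories. A tabular toy (the environment and all names are my own illustration, not from either paper):

```python
import numpy as np

# Toy tabular Q-learning on a 1-D corridor: states 0..4, reward at state 4.
n_states, n_actions = 5, 2      # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9
rng = np.random.default_rng(0)

for _ in range(500):
    s = 0
    while s != 4:
        # Epsilon-greedy: explore 20% of the time, else act greedily.
        a = rng.integers(n_actions) if rng.random() < 0.2 else int(Q[s].argmax())
        s_next = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == 4 else 0.0
        # Bellman backup: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))  # greedy policy heads right in every non-terminal state
```

Swapping the `Q` table for a learned function of the whole history is the part where transformers come in.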


Also, it is not Amazon Q, which was presented yesterday:

https://aws.amazon.com/de/q/


Also it's not Q from Star Trek, or Q stars which are an exotic form of matter immediately preceding the singularity of a black hole.


Also, it's not Android Q.


Or James Bond’s Q.


or Q.


To save you guys some time, this is a 2 month old paper and not made by OpenAI. It could be related to Q* (I don't know enough to determine that) but it's not OpenAI's Q*.


Operation reclaim the letter Q within a reasonable objective reality.


What's the letter?

What's the letter?

What's the letter?

The letter of the day is...

Q!


James Bond's secret weapon?


Is Q the new hottest letter right now?


To make this strand of discussion slightly more interesting, I hereby present you a page [1] with the history of Q-tips. Apparently, the "Q" stands for Quality :)

[1] https://www.qtips.com/about/


This does seem pretty similar to Q-learning. So yes, things-based-on-Qs are hot right now.


Q is nothing without the *


You're the Q to my *!


I love q so much.



