No it’s not.
It is if you redefine AGI to mean the thing that’s already here, although to get away with that it helps if at the same time you overestimate what LLMs are capable of doing.
On the whole, I think I prefer “we figured out how to do this, so it’s not actually AI” over crappy thinkpieces like this.
lol calling it 1% of the way to AGI is probably delusionally generous. It has effectively nothing in common with it.
I am reminded of the saying: climbing a mountain is not 1% of the way to outer space
Oh, for fuck’s sake… no. It isn’t. And I find myself pondering whether or not the article’s authors are themselves sapient.
I kind of regret learning ML sometimes. Being one of the 10 people per km2 who understand how it works is so annoying. It’s just a fancy mirror ffs, stop making weird faces at it you baboons!
The best part is it’s not even that complicated of a thing conceptually. Like you don’t need to study it to kind of understand the idea and some of its limitations.
Do you really understand how it works? What would you call a neural network with mirror neurons primed to react to certain stimuli patterns as the network gets trained… a mirror, or a baboon?
ANNs don’t have “mirror neurons” lol
What do you call a neuron “that reacts both when a particular action is performed and when it is only observed”? Current LLMs are made out exclusively of mirror neurons, since their output (what they perform) is the same action as their input (what they observe).
I can’t even parse what you mean when you say their input is the same as their output, that would imply they don’t transform their input, which would defeat their purpose. This is nonsense.
Lol nice way to say you don’t understand shit about it :D
I’m afraid that’s not correct, but clearly this discussion is over.
deleted by creator
My standard for AGI is that it's able to do a low-level human work-from-home job.
If it needs me to pre-chew and check every single step, then it can still be a smart tool, but it's definitely not intelligent.
If it needs me to pre-chew and check every single step, then it can still be a smart tool, but it's definitely not intelligent.
If this is the standard for AGI, I’m not 100% convinced that every human meets the standard for intelligence either. Anyone who’s ever done team projects will have experience of someone who cannot complete a simple task without extensive pre-chewing and checking in on every step.
It’s going to end up like that thing with the bear-proof bins, isn’t it? The overlap between the smartest LLMs and dumbest humans is going to be bigger than one might think, even if the LLM never achieves true general intelligence or self-awareness. Bears aren’t sapient either, but it doesn’t stop them being more intelligent than some tourists.
That's why I said low-level, and it does not need to be perfect either. Not all my colleagues are on the bright side, but all of them remain employed without someone else sitting next to them every single minute. Also, a huge part of human intelligence is in personal strengths. They may be bad at task A, but when it comes to talking in an empathic way or analyzing sports, they're suddenly pros. This ability is what defines "general" intelligence versus narrow intelligence, which is supposed to do one job only.
I work with GPT-4 for my job, and while it is very useful, the moment you poke it with deeper questions it becomes clear it has absolutely no idea what it's doing or what is going on. You cannot trust anything it says, and it's often a frustrating experience rolling the regenerate button until it gives a valid answer.
deleted by creator
In such a context, intelligence is much more relative. Same thing with animals. There is also a big difference in AI not having a proper body yet.
There are a number of low-level jobs that can be done by both children and animals, for instance a service dog. They are both capable, intelligent creatures.
In the past, children started to work in factories the moment they could stand on their own legs.
It would be near impossible for a two-year-old or a dog to do a work-from-home assignment, but for an AI this should by far be an advantageous situation, because it's trained on computer data and does not need to spend so much of its "brain" learning to move and go potty.
deleted by creator
It is relevant because "intelligence" is a collection of multiple things. The first kinds of intelligence a living creature learns are all physical. If you instinctively pull your hand away when it touches fire, that's already a kind of intelligence. Learning to understand and act on bodily needs to survive is a bigger example.
The first steps towards emotional intelligence start with the physical comfort of the womb and the hugs received as a baby.
Every sentient creature we have ever known starts as an autonomous body. A child without a body does not exist.
How are small children different from smart animals?
It takes humans a while to develop our thinking goo; before that, we're barely able to survive.
(Wow. That's really a bad article. And even though the author managed to ramble on for quite a few pages, they somehow completely failed to address the interesting and well-discussed arguments.)
[Edit: I disagree -strongly- with the article]
We've discussed this in June 2022, after the Google engineer Blake Lemoine claimed his company's artificial intelligence chatbot LaMDA was a self-aware person. We've discussed both intelligence and consciousness.
And my -personal- impression is: if you use ChatGPT for the first time, it blows you away. It was a 'living in the future' moment for me, and I see how you'd write an excited article about it. But once you've used it for a few days, you'll see that every 6th grade teacher can distinguish whether homework assignments were done by a sentient being or an LLM. And ChatGPT isn't really useful for too many tasks. Drafting things, coming up with creative ideas or giving something the final touch, yes. But definitely limited and not something 'general'. I'd say it does some of my tasks so badly, it's going to be years before we can talk about 'general' intelligence.
deleted by creator
Sorry, it was probably more me having a bad day. I was a bit grumpy that day because I didn't have that much sleep.
I'm seeing lots of …let's say… uninformed articles about AI. People usually anthropomorphise language models. (Because they do the thing they're supposed to do very well, that is: write text that sounds like text.) People bring in other (unrelated) concepts. But generally, evidence doesn't support their claims. Like with the 'consciousness' in that case with Lemoine last year. Maybe I get annoyed too easily. But my claim is, it is very important not to spread confusion about AI.
I didn’t see the article was written by two high profile AI researchers. I’m going to bookmark it because it has lots of good references to papers and articles in it.
But I have to disagree on almost every conclusion in it:
- They begin by claiming that fixing (all) current flaws like hallucinations would mean superintelligence, without backing that up at all.
- The next paragraph is titled as if they'd now define AGI, but they just broaden the tasks narrow AI can do. I'd agree it's impressive what can be done with current AI tech. But you'd need to show me the distinguishing factors and prove AI is past them. The way they do it just makes it a wide variant of narrow AI. (And I'd argue it's not that wide at all, compared to the things a human does every day.)
- I think their example showcasing emergent abilities of ML is flawed. When doing arithmetic, there is a sharp threshold where you don't just memorize numbers and results, but get a grasp of numbers and how the decimal system works, understand the concept of addition, and push past memorizing multiplication tables. I'd argue it's not gradual like they claim. I get that this couldn't be backed up by studying current models. But it could well be the case that the models are still too small, or that you'd need to teach them maths in a more effective way than just feeding them the words of every book on earth and Wikipedia.
- The story on AI history is fascinating: how people first tried to build AI with formal reasoning and semantic networks, constructed vast connected knowledge databases, went through two "AI winters", and nowadays we just dump the internet into an LLM and that's the approach that works.
What I would like to have been part of that article:
- How can we make sure we're not anthropomorphizing, but that it's really the thing itself that has general intelligence?
- What are some quantitative and qualitative measurements for AGI? How does the current state-of-the-art AI perform on these metrics? They address that in the section "Metrics", but they just criticise current metrics and say it passed the bar exam etc. What are the implications? What are some proper metrics to back up a claim like theirs? They just did away with the metrics. What are they basing their conclusion on, then?
- If defining general intelligence is difficult: what is the lower bound for AGI? What is considered the upper bound at which we're sure it's AGI?
- What about the lack of a state of mind in transformer models? It is trained and then it is the way it is, until OpenAI improves it a few months later and incorporates new information into the next iteration. But it's unable to transition into a new state while running. It gets some tokens as input, calculates, and then produces output. There is no internal state that could save something or change. This is one of the main points ruling out consciousness. But it also limits the tasks it can do at all, doesn't it? It now needs prior knowledge, or to fit every bit of information into the context window, or to retrieve it somehow, for example with a vector database. The authors mention "in-context learning" early on, but it's not clear if that does it for every task and at what scale. Without more information or new scientific advancements, I doubt it. Most importantly:
- It can't learn anything while running. It can't 'remember'. This is a requirement per the definition of AGI. Aren't intelligent entities supposed to be able to learn?
- Are there tasks that can't be done by transformer models? One example I read about: they are feed-forward models; there is nothing recurrent in them. The example task is that you want to write a joke. You first need to come up with the pun and then write the build-up to it, but when you tell it, you tell the build-up first and then the pun. A transformer model starts writing at the beginning and only comes up with the pun once it gets to that point in the text. Are there many real-world tasks for intelligence that inherently require you to think the other way round, backwards? Can you map them so you can tackle them with forwards-thinking? If not, transformer models are unable to do those tasks, hence they're not AGI. But still, there are tasks similar to the joke example that LLMs obviously do better than you'd expect.
- Are we talking about LLMs or agents? An LLM embedded in a larger project can do more: for example, have the text fed back in, do reasoning and then give a final answer, store/remember information in a vector database, or be instructed to fact-check its output and rephrase it after providing its own critique. But from the article it's completely unclear what they're talking about. It seems like they only refer to a plain LLM like ChatGPT.
And my personal experience doesn't align with the premise either. The article wants to tell me we're already at AGI. I've fooled around with ChatGPT and had lots of fun with the smaller Llama models at home. But I completely fail to have them do really useful tasks from my everyday life. They do constrained and narrowed-down tasks like drafting an email or text, or doing the final touches. Exactly like I'd expect from narrow AI. And I always need to intervene and give them the correct direction. It's my intelligence, and me guiding ChatGPT, that makes the result usable. And still it gets facts wrong often while wording them in a way that sounds good. I sometimes see people use summary bots here, or use an LLM to summarize a paper for a Lemmy post. More often than not, the result is riddled with inaccuracies and false information, like someone who didn't understand the paper but had to hand in something for their assignment. That's why I don't understand the conclusion of the article. I don't see AGI around me.
I really don't like confusion being spread about AI. I think it is going to have a large impact on our lives, but people need to know the facts. Currently some people fear for their jobs, some are afraid of an impending doom… the robot apocalypse. Other people hype it to quite some levels, and investors eagerly throw hundreds of millions of dollars at anything that has 'AI' in its name. And yet other people aren't aware of the limitations and the false information they spread by using it as a tool. I don't think this is healthy.
To end on a positive note: current LLMs are very useful and I'm glad we have them. I can make them do useful stuff. But I need to constrain them and have them work on a well-defined and specific task to make it useful. Exactly like I'd expect from narrow AI. Emergent abilities are a thing. An LLM isn't just autocompleting text; there are concepts and models of real-world facts inside. I think researchers will tackle issues like the 'hallucinations' and make AI way smarter and more useful. Some people predict AGI to be in reach within the next decade or so.
More references:
- Wikipedia: AGI
- Paper: Sparks of Artificial General Intelligence: Early experiments with GPT-4
- Article: Artificial general intelligence: Are we close, and does it even make sense to try?
- Paper: Deanthropomorphising NLP: Can a Language Model Be Conscious? (Especially section 7)
- YouTube: What does “Intelligence” mean anyway?
Have you seen this paper?
Abstract:
While large language models (LLMs) have demonstrated impressive performance on a range of decision-making tasks, they rely on simple acting processes and fall short of broad deployment as autonomous agents. We introduce LATS (Language Agent Tree Search), a general framework that synergizes the capabilities of LLMs in planning, acting, and reasoning. Drawing inspiration from Monte Carlo tree search in model-based reinforcement learning, LATS employs LLMs as agents, value functions, and optimizers, repurposing their latent strengths for enhanced decision-making. What is crucial in this method is the use of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism that moves beyond the limitations of existing techniques. Our experimental evaluation across diverse domains, such as programming, HotPotQA, and WebShop, illustrates the applicability of LATS for both reasoning and acting. In particular, LATS achieves 94.4% for programming on HumanEval with GPT-4 and an average score of 75.9 for web browsing on WebShop with GPT-3.5, demonstrating the effectiveness and generality of our method.
I think we can’t really get the most out of current LLMs because of how much they cost to run. Once we can get speeds up and costs down, they’ll be able to do more impressive things.
deleted by creator
“You can use them for all kinds of tasks” - so would you say they’re generally intelligent? As in they aren’t an expert system?
deleted by creator
From my understanding of them, calling over-glorified chatbots and LLMs like GPT or Claude AGI would be like me calling a preschool finger painting a masterclass work of art. Though I can't say I'm anywhere near an expert, so definitely take what I say with a major grain of salt.
What these AI chatbots and LLMs can do is sometimes impressive, but that's all I can say about them. Intelligence is definitely not their strong suit when half of the time you'll ask for a summary of a well-known and loved TV show only for it to just make up anything that sounds right.
LLMs are not chatbots, they're models. ChatGPT/Claude/Bard are chatbots which use LLMs as part of their implementation. I would argue in favor of the article because, while they aren't particularly intelligent, they are general-purpose and exhibit some level of intelligence, and thus qualify as "general intelligence". Compare this to the opposite, an expert system like a chess computer. You can't even begin to ask a chess computer to explain what a SQL statement does; the question doesn't even make sense. But LLMs are capable of being applied to virtually any task which can be transcribed. Even if they aren't particularly good, compared to GPT-2, which read more like a Markov chain, they at least attempt to complete the task, and are often correct.
LLMs are capable of being applied to virtually any task which can be transcribed
Where “transcribed” means using any set of tokens, be it extracted from human written languages, emojis, pieces of images, audio elements, spatial positions, or any other thing in existence that can be divided and represented by tokens.
PS: actually… why “in existence”? Why not throw in some “customizable tokens” into an LLM, for it to come up with whatever meaning it fancies for them?
(deleted original because I got token embeddings and the embedding dimensions mixed up, essentially assuming a new token would use the “extreme option”).
There’s a lot of papers which propose adding new tokens to elicit some behavior or another, though I haven’t seen them catch on for some reason. A new token would mean adding a new trainable static vector which would initially be something nonsensical, and you would want to retrain it on a comparably sized corpus. This is a bit speculative, but I think the introduction of a token totally orthogonal to the original (something like eg smell, which has no textual analog) would require compressing some of the dimensions to make room for that subspace, otherwise it would have a form of synesthesia, relating that token to the original neighboring subspaces. If it was just a new token still within the original domain though, you could get a good enough initial approximation by a linear combination of existing token embeddings - eg a monkey with a hat emoji comes out, you add tokens for monkey emoji + hat emoji, then finetune it.
Most extreme option, you could increase the embedding dimensionality so the original subspaces are unaffected and the new tokens can take up those new dimensions. This is extreme because it means resizing every matrix in the model, which even for smaller models would be many thousands of parameters, and the performance would tank until it got a lot more retraining.
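For the non-extreme option, here's roughly what that linear-combination initialization could look like with the Hugging Face transformers library. This is just my back-of-the-envelope sketch, assuming a GPT-2-style model; the <monkey_with_hat> token is made up for illustration.

```python
# Sketch: add one new token and seed its embedding with a linear combination
# of existing token embeddings, then finetune. Assumes a GPT-2-style model;
# the token name is hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Register the new token and grow the embedding matrix by one row.
tokenizer.add_tokens(["<monkey_with_hat>"])
model.resize_token_embeddings(len(tokenizer))
new_id = tokenizer.convert_tokens_to_ids("<monkey_with_hat>")

with torch.no_grad():
    emb = model.get_input_embeddings().weight   # shape: (vocab_size, d_model)
    monkey_ids = tokenizer.encode("monkey")     # existing sub-token ids
    hat_ids = tokenizer.encode("hat")
    # Initial approximation: average of the constituent tokens' embeddings.
    emb[new_id] = emb[monkey_ids + hat_ids].mean(dim=0)

# From here you'd finetune on text that actually uses <monkey_with_hat>,
# so the embedding stops being a crude average.
```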
Imo this article is mainly playing semantics. Even if they are right and this is seen as "the beginning" one day, the current LLMs only perform well in very narrow tasks; everywhere else they are sub-par to humans. Unfortunately, that will not stop many companies from using them for shit they cannot handle, just to have it blow up in society's face later. That would be the much more interesting talking point: in which areas can we see companies jumping the gun, and what will be the problems and dangers that arise from it?
The fact that they can perform at all in essentially any task means they’re general intelligences. For comparison, the opposite of a general intelligence is an expert system, like a chess computer. You can’t even begin to ask a chess computer to classify the valence of a tweet, the question doesn’t make sense.
I think people (including myself until reading the article) have confused AGI to mean “as smart as humans” or even as “artificial person”, neither of which is actually true when you break down the term. What is general intelligence if not applicability to a broad range of tasks?
lol. just lol. And maybe some more lolling for good measure
Actually a really interesting article which makes me rethink my position somewhat. I guess I’ve unintentionally been promoting LLMs as AGI since GPT-3.5 - the problem is just with our definitions and how loose they are. People hear “AGI” and assume it would look and act like an AI in a movie, but if we break down the phrase, what is general intelligence if not applicability to most domains?
This very moment I’m working on a library for creating “semantic functions”, which lets you easily use an LLM almost like a semantic processor. You say
await infer(f"List the names in this text: {text}")
and it just does it. What most of the hype has ignored with LLMs is that they are not chatbots. They are causal autoregressive models of the joint probabilities of how language evolves over time, which is to say they can be used to build chatbots, but that's the first and least interesting application. So yeah, I guess it's been AGI this whole time and I just didn't realize it, because they aren't people, and I had assumed AGI implied personhood (which it doesn't).
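For reference, a stripped-down sketch of such an infer helper (not my actual library, just the idea) could look like this, using the official openai Python client as a stand-in backend and an example model name:

```python
# Minimal "semantic function" helper: one prompt in, one completion out.
# The backend (openai client) and model name are stand-ins, not a specific library.
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def infer(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Treat the LLM as a semantic processor for a single instruction."""
    response = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep the output as deterministic as possible
    )
    return response.choices[0].message.content

# Usage, as above:
#   names = await infer(f"List the names in this text: {text}")
```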
I’m not sure how the tech is progressing, but ChatGPT was completely dysfunctional as an expert system, if the AI field still cares about those. You can adapt the Chinese Room problem to whether a model actually has applicability outside of a particular domain (say, anything requiring guessing words on probabilities, or stabilising a robot).
Another problem is that probabilistic reasoning requires data. Just because a particular problem solving approach is very good at guessing words based on a huge amount of data from a generalist corpus, doesn’t mean it’s good at guessing in areas where data is poor. Could you comment on whether LLMs have good applicability as expert systems in, say, medicine? Especially obscure diseases, or heterogeneous neurological conditions (or both like in bipolar disorders and schizophrenia-related disorders)?
LLMs are not expert systems, unless you characterize them as expert systems in language which is fair enough. My point is that they’re applicable to a wide variety of tasks which makes them general intelligences, as opposed to an expert system which by definition can only do a handful of tasks.
If you wanted to use an LLM as an expert system (I guess in the sense of an "expert" in that task, rather than a system which literally can't do anything else), I would say they currently struggle with that. Bare foundation models don't seem to have the sort of self-awareness or metacognitive capabilities that would be required to restrain them to their given task, and arguably never will, because they necessarily can only "think" on one "level", which is the predicted text. To get that sort of ability you need cognitive architectures, of which chatbot implementations like ChatGPT are a very simple version. If you want to learn more about what I mean, the most promising idea I've seen is the ACE framework. Frameworks like this can allow the system to automatically look up an obscure disease based on the embedding distance to a particular query, so even if you give it a disease which only appears in the literature after its training cut-off date, it knows this disease exists (and is a likely candidate) by virtue of it appearing in its prompt. Something like "You are an expert in diseases yadda yadda. The symptoms of the patient are x y z. This reminds you of these diseases: X (symptoms 1), Y (symptoms 2), etc. What is your diagnosis?" Then you could feed the answer of this question to a critical prompting, and repeat until it reports no issues with the diagnosis. You can even make it "learn" by using LoRA, or by keeping notes it writes to itself.
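To make that retrieve-then-critique loop concrete, here's a toy outline. It is not the ACE framework or any real medical system; the nearest_diseases and infer helpers are hypothetical stand-ins for a vector-store lookup and an LLM call.

```python
# Toy sketch of retrieval plus self-critique for diagnosis support.
# nearest_diseases and infer are hypothetical callables supplied by the caller.
from typing import Callable, Sequence

def diagnose(
    symptoms: str,
    nearest_diseases: Callable[[str, int], Sequence[str]],  # vector-store lookup
    infer: Callable[[str], str],                             # LLM call
    max_rounds: int = 3,
) -> str:
    # Retrieval: pull candidate diseases by embedding distance, so even diseases
    # unseen during training can appear in the prompt.
    candidates = nearest_diseases(symptoms, 5)
    diagnosis = infer(
        "You are an expert in diseases. "
        f"The patient's symptoms are: {symptoms}. "
        f"These remind you of: {', '.join(candidates)}. "
        "What is your diagnosis, and why?"
    )
    # Critique loop: ask the model to find problems with its own answer and
    # revise until it reports none (or we run out of rounds).
    for _ in range(max_rounds):
        critique = infer(
            f"Symptoms: {symptoms}\nDiagnosis: {diagnosis}\n"
            "List any problems with this diagnosis, or reply NONE."
        )
        if critique.strip().upper().startswith("NONE"):
            break
        diagnosis = infer(
            f"Symptoms: {symptoms}\nPrevious diagnosis: {diagnosis}\n"
            f"Critique: {critique}\nGive a revised diagnosis."
        )
    return diagnosis
```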
As for poorer data distributions, the magic of large language models (before which we just had “language models”) is that we’ve found that the larger we make them, and the more (high quality) data we feed them, the more intelligent and general they become. For instance, training them on multiple languages other than English somehow allows them to make more robust generalizations even just within English. There are a few papers I can recall which talk about a “phase transition” which happens during training where beforehand, the model seems to be literally memorizing its corpus, and afterwards (to anthropomorphize a bit) it suddenly “gets” it and that memorization is compressed into generalized understanding. This is why LLMs are applicable to more than just what they’ve been taught - you can eg give them rules to follow within the conversation which they’ve never seen before, and they are able to maintain that higher-order abstraction because of that rich generalization. This is also a major reason open source models, particularly quantizations and distillations, are so successful; the models they’re based on did the hard work of extracting higher-order semantic/geometric relations, and now making the model smaller has minimal impact on performance.
I wonder how many people here actually looked at the article. They’re arguing that ability to do things not specifically trained on is a more natural benchmark of the transition from traditional algorithm to intelligence than human-level performance. Honestly, it’s an interesting point; aliens would not be using human-level performance as a benchmark so it must be subjective to us.
I guess the point I have an issue with here is 'ability to do things not specifically trained on'. LLMs are still doing just that, and often incorrectly: they basically just try to guess the next words based on a huge dataset they trained on. You can't actually teach it anything new, or to put it better, it can't actually derive conclusions by itself and improve in such a way. It is not actually intelligent, it's just freakishly good at guessing.
Heck, sometimes someone comes to me and asks if some system can solve something they just thought of. Sometimes, albeit very rarely, it just works perfectly, no code changes required.
Not going to argue that my code is artificial intelligence, but huge AI models obviously have higher odds of getting something random correct, just because they correlate.
You can't actually teach it anything new, or to put it better, it can't actually derive conclusions by itself and improve in such a way
That is true, at least after training. They don’t have any long-term memory. Short term you can teach them simple games, though.
Of course, this always goes into Chinese room territory. Is simply replicating intelligent behavior not enough to be equivalent to it? I like to remind people we’re just a chemical reaction ourselves, according to all our science.
It’s actually false. You can’t teach them long-term, but within the course of a conversation they can be taught new rules and behaviors. There’s dozens if not hundreds of papers on it.
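As a toy example of that in-conversation teaching (the rule is made up, and the openai client here is just a stand-in backend):

```python
# A made-up rule is stated once in the conversation; the model is expected
# to apply it afterwards without any retraining (in-context learning).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    {"role": "system", "content": "Play a game with the user and follow their rules exactly."},
    {"role": "user", "content": "New rule: whenever I name an animal, reply with its name "
                                "spelled backwards, followed by how many letters it has. Ready?"},
    {"role": "assistant", "content": "Ready."},
    {"role": "user", "content": "Penguin"},
]

reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(reply.choices[0].message.content)  # expected something like: "niugneP, 7 letters"
```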
Yeah, that’s what I said.
You’re right, apologies. Skimmed too hard
Yup, been there!
Comparing current LLMs to the ENIAC is thought-provoking; I understand the eagerness to extrapolate in that direction. That being said, I don't think progress will be linear or even logarithmic. The current pattern of computing and technological advancement has become:
- Initial introduction or release
- Major hype and influx of greed money. <- we are here
- Failure to live up to the hype, resulting in the tech becoming a punchline and gobs of money lost
- Renaissance of the tech as its true potential is eventually realized, which doesn’t match the original hype but ends up very useful
- Iteration and improvement with no clear “done” or “achieved” milestone, it just becomes part of society
OMG! Let’s give LLMs human rights!
Suddenly corporations have to pay them a wage and medical care. Brilliant.
You may say that jokingly, but at some point if the tech keeps improving, that may be the only way the world continues to exist without destabilizing. OpenAI already says* that their end goal is to make the world powered by a form of universal basic income by having AI do most jobs. Having the AI be paid on task completion and distributing that accumulated wealth, removing a portion to cover maintenance, would be one method of doing so.
*That said, the words of a potential megacorporation aren't really to be trusted, and the whole thing would have massive issues of "how do you distribute the money" and "what am I giving up in terms of personal safety and privacy". Having to make an account with a specific AI company and provide all your governmental identification to receive those funds, for example, would be terrible.
More likely to conclude that humans don’t need human rights if they’re replaceable by LLMs