A while back I discussed my opinion on the stampede to bring AI into the kitchen, living room, workplace, and bedroom.
My opinion isn’t all that different now, though I’m a bit better informed. I believe I referred to ChatGPT (and others) as a stochastic parrot: a device that can reliably generate convincing language but does not understand the meaning of the language it processes.
ChatGPT is not, exactly, such a device. It is a language prediction mechanism in the sense that, given its broad training, it can respond to a statement (or “prompt”) in the way its model of the language suggests a response should look. That is quite a bit more sophisticated than a stochastic parrot. I am not saying large language models (LLMs) are conscious or understand the language they generate in the human sense of that word. I am saying they are extremely adept and sophisticated at mapping how language works.
This map is not just at the sentence or word level (though ChatGPT maps that, too); it is much broader and deeper.
Consider a regular visual map, like the ones we used on road trips when I was a child: a piece of paper with imagery and a coordinate grid. If you have the right x/y (east/west, north/south) coordinates, you can find a city. That’s a two-dimensional map. You can find a three-dimensional map of the solar system here. However, since you also have to tell that map when you’re looking at the solar system, it is, in effect, a four-dimensional map. With those four coordinates, you can place a planet in space and time.
However, to push that metaphor further, each of the planets influences the others, and they’re all influenced by the sun; light comes from the sun, for example. If we added that to the map, it might come in as a fifth dimension, a separate coordinate. You can see how this gets complex pretty quickly. While computers can work with multidimensional math easily, humans have more trouble. We want to actually understand such things, and we’re pretty much embedded in a four-dimensional brain.
Physicists came up with the idea of phase space: a space in which every relevant physical quantity gets its own dimension, and a given state of the system is a single set of coordinates in those dimensions. It’s not hard to see the utility of such a description, and it’s not much of a leap to think something like that level of complexity would be needed to describe all the intricacies of language. That’s not all of what goes on inside ChatGPT, but a complex multi-dimensional representation of language is part of it.
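To make the idea concrete, here is a toy sketch of my own (nothing from the physics literature beyond the definition itself): a single particle’s state bundles its position and momentum into one point in a six-dimensional space, and every additional particle adds six more dimensions.

```python
# Toy sketch (my own illustration): one particle's state as a single point
# in a six-dimensional phase space.
def phase_point(x, y, z, px, py, pz):
    """Position (x, y, z) plus momentum (px, py, pz) = one coordinate set."""
    return (x, y, z, px, py, pz)

# Two particles would need twelve coordinates, three particles eighteen, and so on.
state = phase_point(0.0, 0.0, 0.0, 1.0, 0.0, 0.0)
print(len(state))  # 6 dimensions for a single particle
```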
There’s a good description of how LLMs work here. I’m not going to reproduce it. Timothy Lee and Sean Trott, the authors, did a much better job than I ever could, so you should go read it. Wikipedia has a broad discussion of LLMs here.
In place of what I described as phase space, LLMs use word vectors. A word vector represents a word mathematically in all its dimensions. As with the coordinate systems I was describing before, and with phase space, you can compute something approximating the distance between words within that coordinate system. If you look up the x/y coordinates of Washington, DC, and Baltimore, MD, it’s not a difficult mathematical operation to determine that they’re close to one another. Similarly, comparing Baltimore and Boston, MA, you can tell Boston is farther from Baltimore than Washington is. Seattle is farther away yet.
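As a quick illustration (my own rough numbers, treating latitude and longitude as flat x/y coordinates, which is crude but good enough to show the ordering):

```python
import math

# Approximate latitude/longitude, used here as simple x/y coordinates.
cities = {
    "Washington, DC": (38.9, -77.0),
    "Baltimore, MD":  (39.3, -76.6),
    "Boston, MA":     (42.4, -71.1),
    "Seattle, WA":    (47.6, -122.3),
}

def distance(a, b):
    """Plain Euclidean distance between two city coordinates."""
    (x1, y1), (x2, y2) = cities[a], cities[b]
    return math.hypot(x2 - x1, y2 - y1)

print(distance("Baltimore, MD", "Washington, DC"))  # smallest
print(distance("Baltimore, MD", "Boston, MA"))      # larger
print(distance("Baltimore, MD", "Seattle, WA"))     # largest
```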
Using word vectors, you can map which words are close to one another in use and which are far apart, or build nets describing sets of words. Computers don’t balk at large numbers of dimensions; LLMs use hundreds or thousands. GPT-3 uses over twelve thousand. I suspect the number of dimensions will only increase as more computing power is applied to training.
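Here is a toy version of that idea: hand-made three-dimensional “word vectors” instead of GPT-3’s twelve-thousand-plus, with cosine similarity (a common choice) as the measure of closeness.

```python
import math

# Hand-made toy vectors, purely for illustration.
vectors = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.7, 0.2],
    "apple":  [0.1, 0.2, 0.9],
    "orange": [0.2, 0.1, 0.8],
}

def cosine_similarity(a, b):
    """Closeness of two vectors, independent of their length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(word):
    """Find the closest other word in the toy vector space."""
    return max((w for w in vectors if w != word),
               key=lambda w: cosine_similarity(vectors[word], vectors[w]))

print(nearest("king"))   # queen
print(nearest("apple"))  # orange
```

The values in real word vectors aren’t hand-made like these, of course; they come out of training.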
Ah, yes. Training.
One of the interesting things about LLMs is that they are not programmed per se. They analyze reams of material on their own—the training. Again, Lee and Trott discuss this in digestible detail. I’m not going to go into it here.
The next level up from the raw word vectors is word prediction. Language has syntax and grammar; it is organized into meaningful packets. So is computer code. When an LLM is trained on a language, that underlying structure becomes embedded in the dimensional relationships among the word vectors. During processing, the LLM feeds its input through transformers: each transformer turns its input into a more meaningful representation, and that result can be fed into another transformer. The final result is the predicted output for the input prompt. (Again, refer to Lee and Trott.)
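Here is a drastically simplified sketch of the shape of that computation. It is my own illustration, not GPT’s actual architecture: untrained random numbers stand in for what real training would refine, and a simple average stands in for attention.

```python
import math
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "mat"]
DIM = 8
EMBED = {w: [random.gauss(0, 1) for _ in range(DIM)] for w in VOCAB}

# Each "layer" here is just a random linear map; real transformer layers use
# attention and feed-forward blocks, with weights refined by training.
LAYERS = [[[random.gauss(0, 0.3) for _ in range(DIM)] for _ in range(DIM)]
          for _ in range(2)]

def apply_layer(vec, layer):
    """Multiply the current representation by one layer's weight matrix."""
    return [sum(w * x for w, x in zip(row, vec)) for row in layer]

def predict_next(prompt_words):
    # Crude stand-in for attention: average the prompt's word vectors.
    vec = [sum(EMBED[w][i] for w in prompt_words) / len(prompt_words)
           for i in range(DIM)]
    for layer in LAYERS:          # each layer transforms the representation
        vec = apply_layer(vec, layer)
    # Score every vocabulary word against the final representation,
    # then turn the scores into probabilities (a softmax).
    scores = {w: sum(a * b for a, b in zip(vec, EMBED[w])) for w in VOCAB}
    total = sum(math.exp(s) for s in scores.values())
    return {w: math.exp(s) / total for w, s in scores.items()}

print(predict_next(["the", "cat"]))  # a probability for every candidate next word
```

With random weights the probabilities are meaningless; the point is only the structure: prompt in, stacked transformations, predicted next word out.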
It should come as no surprise to anyone that a powerful engine such as an LLM, trained on a great deal of language input, would return sophisticated language as a result.
But there’s more to this—and this is the important bit.
It was recently reported (July, 2023) that GPT-3 had scored highly on tests of reasoning by analogy. Reasoning by analogy is something the SAT and other college entrance exams test. You’ve seen this kind of question. The test presents evidence and asks you to infer the rule. (Given the letter sequences abcde, eabcd, deabc, what is the rule? Or, what is the next entry in the sequence? That’s reasoning by analogy.)
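For that particular example the rule is a rotation: the last letter moves to the front at each step. A few lines of code make the point:

```python
def rotate_right(s):
    """Move the last character to the front."""
    return s[-1] + s[:-1]

seq = ["abcde"]
for _ in range(3):
    seq.append(rotate_right(seq[-1]))

print(seq)  # ['abcde', 'eabcd', 'deabc', 'cdeab'] -- 'cdeab' is the next entry
```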
Now, it’s interesting that GPT-3 failed to extract the analogy from text but did extract it from prompts. Is this evidence of analogical reasoning?
For my part, and the article I linked to above suggests the same, I think it indicates that reasoning by analogy is embedded in the language the LLM was trained on. By this I’m suggesting that what we are seeing is not intelligence in the LLM but intelligence embedded in the training material, as represented by the LLM.
I think this is borne out by LLMs being used to determine protein folding. A team at Meta reasoned that, though the “L” in LLM stands for language, what the model really operates on is just packets of data that can be analyzed. Reconfiguring the LLM to take protein chemistry as its training input instead of language seemed a productive path.
It is, of course, far more complex than I have stated. My point is that the LLM produced plausible protein sequences in much the way it produces plausible language constructs. Remember that an LLM’s capacity to retain data is far greater than any human’s, and its ability to analyze multi-dimensional space is far superior. By representing the data in a form the LLM can digest, the inherent intelligence imbued in protein systems, the product of a billion years of evolution, becomes amenable to analysis.
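The essential move, as I understand it, is simply treating a protein sequence as another stream of tokens. A hedged sketch (not Meta’s actual pipeline, just the idea):

```python
# The twenty standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def tokenize_protein(sequence):
    """Treat each residue as a 'word': map it to a token id, like text."""
    tokens = []
    for residue in sequence.upper():
        if residue not in AMINO_ACIDS:
            raise ValueError(f"not a standard amino acid: {residue}")
        tokens.append(AMINO_ACIDS.index(residue))
    return tokens

# A short fragment, purely for illustration; the token ids would feed the same
# kind of prediction machinery sketched above for words.
print(tokenize_protein("MKTAYIAK"))
```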
This does not mean the LLM understands protein chemistry any more than it understands language, in the human sense. What it does is analyze the underlying structure beneath the training material and, given a catalytic prompt, present plausible combinations that conform to that underlying structure.
Given this point of view, LLMs become more interesting. They are not conscious. They do not comprehend what they are doing. But they are potentially powerful tools to express the underlying sophistication and intelligence of the material they’re given.
That lack of comprehension shows up regularly—anyone using an LLM without oversight is just asking for trouble. AtomicBlender—a YouTube channel I follow—asked ChatGPT to design a nuclear reactor of the future and it gave surprising—if impractical—results.
I think that is going to be the main problem with using LLMs: the output must be verified. In the case of protein chemistry, a candidate solution would have to be verified experimentally and shown not to violate existing patents. LLM-produced original material cannot be taken at face value.
LLMs are a powerful tool. The current situation reminds me of a line from Larry Niven’s Protector: intelligence is a tool that is not always used intelligently.