Monday, April 3, 2023

Chatbots vs Actual Intelligence


Everybody else is weighing in on the chatbot revolution. I guess it’s my turn. 

 


 

ChatGPT, and the other “AI” tools that have been showing up lately, do not conform to what I would call actual intelligence. They are extremely good predictive models of human-generated text. Their ability to sound nearly human should not be surprising—they’ve been trained on the largest single repository of human-generated text in history. Their abilities were fine-tuned with help from human beings. The training reward structure was generated by more human beings. Policy updates of the system are evaluated using mechanisms that were—you guessed it—based on human input.
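To make “predictive model of text” concrete, here is a toy sketch in Python. It has nothing to do with ChatGPT’s actual machinery, which is a huge transformer network shaped by human feedback; this is only a bigram model that predicts the next word from counts of what followed what in a tiny, made-up sample of text.

```python
from collections import defaultdict, Counter
import random

# Toy illustration only: a bigram "next word" predictor built from counts.
# Real systems like ChatGPT use far larger models and far more text, but the
# core job is the same: given the text so far, predict a plausible next token.

corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Count which word tends to follow which.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word(word):
    """Sample a likely next word from the observed counts."""
    counts = following.get(word)
    if not counts:
        return None
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate a short continuation from the statistics alone.
word, output = "the", ["the"]
for _ in range(8):
    word = next_word(word)
    if word is None:
        break
    output.append(word)
print(" ".join(output))
```

Even this trivial counting scheme produces strings that sound vaguely like its training text, which is the point: sounding like the training text is exactly what a predictive model is built to do.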

 

Given all of that, is it a surprise that it kind of sounds like a human being? That when we ask it a question, it returns an answer that resembles a human response?

 

Add in the way that humans project humanity onto anything with the remotest similarity—a face on a piece of toast, for example—and the reaction to ChatGPT is no surprise.

 

I am not saying that ChatGPT—and Bard and the others—is not a lovely piece of work. I’m saying it’s a sophisticated chatbot, where the term “chat” is defined in the broadest possible manner. In this context, “chat” includes narrative, essays, poems, and computer code. In short, whatever ChatGPT was trained on.

 

We like to consider ourselves sentient: capable of feeling. Sentience is the ability to directly experience the world, from joy to suffering. Humans layer intellect and consciousness on top of that. Not all animals have those two additional qualities, but I will not itemize that here.

 

For humans, all three of these qualities are mixed together. It’s hard for us to understand how a cat or a dog functions without somehow imbuing it with our own humanity. We forget that a dog, for example, is not a human being and treat it as if it were. The last common ancestor of Carnivora, the order containing dogs and cats, and the primates predates the end of the dinosaurs. Yet both groups clearly have the ability to feel and to suffer. That much is common between humans and their pets.

 

It’s quite possible that neurological sophistication beyond a certain point requires something internal that experiences the world. Certainly it’s not just the province of mammals: the stem group that evolved into birds split from the stem group that evolved into mammals far earlier than carnivorans split from primates. Yet it’s fairly clear that birds can suffer.

 

Birds can think, too, but with a different mechanism. Mammals use the neocortex for intelligence, and the neocortex evolved after that split, so bird intelligence must rely on a different part of the brain to achieve similar effects.

 

I would go so far as to say cephalopods share the same quality of personal experience. They are clearly intelligent, yet cephalopods diverged from vertebrates before brains evolved.

 

None of that applies to ChatGPT.

 

ChatGPT’s underlying “hardware” is a neural net, an architecture inspired by a simplified model of the networks of neurons in biological systems. It is excellent at learning and responding to patterns—which, in a large sense, is what ChatGPT is doing. The underlying ChatGPT structure resembles a slice of a brain, not a brain in and of itself.
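For a sense of how simplified that model is, here is a sketch of a single artificial “neuron”: a weighted sum of inputs pushed through a nonlinearity. The weights below are arbitrary numbers chosen for illustration; a real network learns billions of them.

```python
import math

# Toy artificial "neuron": a weighted sum of inputs passed through a
# nonlinearity. Networks like the one underlying ChatGPT stack enormous
# numbers of units along these general lines; the point here is only how
# simplified the model is compared with a biological neuron.

def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, squashed by a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

# Example: three inputs with arbitrary weights.
print(neuron([0.5, -1.0, 2.0], [0.8, 0.2, -0.4], bias=0.1))
```

Stack enough of these and you get a superb pattern matcher. You do not obviously get something that experiences anything.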

 

It is possible that a more sophisticated system—one that resembles more than just a slice of a brain—might actually have sentience. It’s certainly possible with sufficient computing power and understanding that a vertebrate brain could be simulated in a machine. Whole SF novels have been written on the idea. But at that point, it’s a brain, not a machine. It would have personal experience because we simulated it to do so. I don’t think that’s necessarily a good thing—a brain in a box would have the ability to suffer.

 

A more interesting experiment would be not to simulate a vertebrate brain but to use a cephalopod model. Cephalopods are very smart and appear to have personal experience, yet their brains do not resemble vertebrate brains at all. From such a simulation, we might get some understanding of how personal experience originates.

 

But it’s not ChatGPT.

 

One area where ChatGPT becomes interesting is not how well it performs but the intricate ways in which it fails. It generates language but it cannot think—it has no sense of the meaning of what it generates. A paper submitted in January goes so far as to suggest that we are asking language to represent thought, and that it is insufficient to the task.

 

Years ago, Alan Turing suggested the imitation game (also called the Turing Test) as a measure of machine intelligence. The test consisted of a researcher holding a text conversation with two other parties, one of which was a machine. If the researcher could not tell which was which, the machine would be deemed as intelligent as a human.

 

ChatGPT can satisfy this test for a significant stretch of conversation. However, we know from its mistakes that it is not equivalent to human intelligence. This suggests that the original Turing Test, with its dependence on language, cannot accurately evaluate the “humanness” of the machine. (This is further discussed here.)

 

In this respect, ChatGPT has no more sentience than a crescent wrench—one of the world’s most useful tools.

 

I suspect that meaning derives from sentience. That is, without direct personal experience, attributing meaning to words is suspect. (Even with it, the attribution is shaky, since personal experience is unique to each individual.) The phrase “I want to be free” cannot be evaluated without knowing the personal experience of the person who utters it. Going further, can any such utterance be understood without knowing something of the speaker’s experience? What does “I want to be free” mean from a human in China? In Seattle? In 2023? In 1862? For that matter, what does it mean when spoken by a dog or a cat? Or ChatGPT?

 

Regardless, ChatGPT—and its relatives—do present us with some interesting problems. For one, ChatGPT produces narrative without any verification. Yet these chatbots are being built into systems that we rely on, and fabricated material is a bad thing there.

 

Consider search—the reason Google and Bing exist. How many times have we looked for a search term, picked the most likely link, and then found that the term does not appear anywhere on the site? That checking is verification. Frustrating, but necessary.
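That kind of check is easy to make mechanical. Here is a minimal sketch, with a placeholder URL and term, of the “does the page actually contain what I searched for?” test; real verification of a chatbot’s claims would take far more than a substring match.

```python
import urllib.request

# Minimal sketch of the "does the page actually contain what I searched for?"
# check described above. The URL and term are placeholders; real verification
# of a chatbot's output would need much more than a substring match.

def page_contains(url, term, timeout=10):
    """Fetch a page and report whether the term appears in its raw text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
    return term.lower() in body.lower()

if __name__ == "__main__":
    print(page_contains("https://example.com", "Example Domain"))
```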

 

ChatGPT presents compelling narrative that may or may not be true, may or may not be entirely fabricated. Including it in a search tool is absurd. Unless, of course, you do not want that search tool to be usable.

 

There are incredibly intelligent AI systems out there in medical diagnosis, protein analysis, and other computationally difficult arenas. They deliver results that are then verified prior to use. Narrative chatbots need such mechanisms.

 

Another interesting issue is how well ChatGPT generates usable material. It has, for example, passed several tests, including the bar exam, AP Biology, and the SAT. It has written passable software code. It has written essays—with the caveat about verification noted above.

 

When I read this, my first thought was that these tests had failed in their purpose: to evaluate a human’s knowledge of the subject. If a machine that cannot think can pass them, they reveal no human ability other than the taking of tests—something that has been argued for some time.

 

Like the paper above, which dissociates language from thought, we need to separate the two. The ability to use language thoughtfully is valuable. The ability to use language well without thought is less so.

 
