We asked GPT-3, a popular and highly capable artificial intelligence language system, which tool it would be more likely to use to fan the coals of a barbecue: a paper map or a stone. It chose the stone.
If you need to smooth the wrinkles in your skirt, would it be better to use a warm thermos or a hairpin? GPT-3 suggested the hairpin. And if you need to cover your hair for work at a fast-food restaurant, which would work better – a paper sandwich wrapper or a hamburger bun? GPT-3 went with the hamburger bun.
Most people would pick the other option in each case. GPT-3 chooses differently because it does not understand language the way humans do.
One of us is a psychology researcher who, twenty years ago, presented a series of scenarios like those above to test the understanding of a computer language model of that time. The model could not reliably choose between rocks and maps for fanning coals, whereas humans did so effortlessly.
The other of us is a doctoral student in cognitive science who was part of the research team that recently tested GPT-3 with the same scenarios. Although GPT-3 performed better than the older model, it was still less accurate than humans. It got all three of the scenarios above wrong.
GPT-3, the engine that powered the initial release of ChatGPT, learns about language from a trillion instances of text by noting which words tend to follow which other words. The strong statistical regularities in word sequences allow it to learn a great deal about language, and that sequential knowledge often helps ChatGPT generate sensible sentences, poems, essays and even computer code.
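To make "noting which words tend to follow which other words" concrete, here is a minimal Python sketch of the simplest possible version of the idea: a bigram counter. It is not how GPT-3 works; GPT-3 is a large neural network trained on vastly more text. The function names `train_bigrams` and `predict_next` and the tiny corpus are invented here purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, how often each other word follows it."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Tiny made-up corpus, for illustration only.
corpus = (
    "fan the coals with a map . "
    "fan the coals with a newspaper . "
    "fan the coals with a map ."
)

model = train_bigrams(corpus)
print(predict_next(model, "a"))      # map
print(predict_next(model, "coals"))  # with
print(predict_next(model, "stone"))  # None (never seen in the corpus)
```

The sketch picks "map" after "a" only because that pairing is most frequent in its text, not because it knows anything about what maps or coals are.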
GPT-3 is extremely good at learning the patterns of how words follow one another in human language, but it has no idea what any of those words mean to a human being. Nor could it.
Humans evolved with bodies that must operate in the physical and social worlds to get things done. Language is a powerful tool that helps people do this. GPT-3, by contrast, is a software system that predicts the next word; it does not need to accomplish anything in the real world with those predictions.
The meaning of a word or phrase is closely tied to the human body. Our abilities to act, to perceive and to have emotions empower our thinking. For example, a person’s understanding of a “paper sandwich wrapper” includes its appearance, its texture, its weight and, consequently, how it can be used to wrap a sandwich.
That understanding also includes the many other uses the wrapper offers, such as scrunching it into a ball for a game of basketball or using it to cover one’s hair.
All of these uses arise from the nature of human bodies and needs: people have hands that can fold paper, a head of hair that is about the same size as a sandwich wrapper, and a need to hold a job and therefore to follow rules such as covering one’s hair. In other words, people understand how to use things in ways that are not captured in language-use statistics.
GPT-3, its successor GPT-4, and related systems such as Bard, Chinchilla and LLaMA do not have bodies, so they cannot determine on their own which objects can be folded, or the many other properties that psychologist J.J. Gibson called affordances. Given a person’s hands and arms, a paper map affords fanning a flame, and a thermos affords rolling out wrinkles.
Without arms and hands, let alone the need to wear unwrinkled clothes for a job, GPT-3 cannot work out these affordances. It can only imitate that knowledge if it has encountered something similar in the online text it was trained on.
Can an AI built on a large language model ever understand language the way humans do? In our opinion, not without a humanlike body, senses, purposes and ways of living.
GPT-4 was trained on images as well as text, allowing it to learn statistical relationships between words and pixels. We cannot run our original analysis on GPT-4, because it does not report the probabilities it assigns to its word choices. Even so, when we asked GPT-4 the three questions above, it answered them correctly, possibly because of its larger size and its visual input.
You can still trip the model up by constructing examples it likely hasn’t encountered before and seeing how it responds. For instance, GPT-4 says that a cup with its bottom removed would be better for holding water than a lightbulb with its bottom removed, even though a bottomless cup cannot hold water at all.
A model with access to images might be something like a child who learns about language – and the world – from the television: It’s easier than learning from the radio, but humanlike understanding will require the crucial opportunity to interact with the world.
Recent research has taken this approach, training language models to generate physics simulations, interact with physical environments, and even generate robotic action plans. Embodied language understanding might still be a long way off, but these kinds of multisensory interactive projects are crucial steps on the way there.
ChatGPT is a fascinating tool that will be used for both good and bad purposes. But do not be fooled into believing that it understands the sentences it generates, let alone that it is conscious.
As AI continues to evolve, incorporating embodied cognition into language models like ChatGPT could pave the way for more advanced and nuanced language understanding capabilities. It’s an exciting area of research that has the potential to bridge the gap between AI-generated language and human communication, leading to more meaningful and contextually appropriate interactions between humans and AI.