The scrutiny of GPT-4, the most advanced AI language model yet, is in full swing, and its myriad flaws, including dishonesty, bias and bigotry, and a lack of common sense, are the same ones observed in earlier models. Sadly, it still commits many of the same errors.
Although these large language models boast impressive abilities, such as building webpages, planning holidays, and imitating William Faulkner's prose, one of their shortcomings has been largely overlooked: their poor recall. The models require tremendous amounts of energy to run, yet they have the memory of a goldfish.
If you ask ChatGPT, "What color is the sky on a sunny, cloudless day?" it generates an answer by predicting the next words, likely something such as "The sky on a sunny, cloudless day is usually a deep blue." If you then ask, "How about on an overcast day?", ChatGPT interprets the question as a continuation of the previous one and understands that you are asking about the color of an overcast sky.
ChatGPT's ability to remember and interpret earlier inputs lets it hold something resembling a human conversation rather than merely dispensing one-off replies like a souped-up Magic 8 Ball.
The issue is that ChatGPT and other large language models have poor memory. Each time the model generates a reply, it can only consider a limited amount of text, referred to as the context window.
ChatGPT's context window is roughly 4,000 words, a limit long enough that the average user never notices it but too short for many complex tasks: the model cannot summarize a book, review a major coding project, or search your Google Drive, because the material won't fit in its window. (Technically, context windows are measured in tokens rather than words; a token is a chunk of text that is often, but not always, a whole word.)
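To make the word-versus-token distinction concrete, here is a minimal sketch using OpenAI's open-source tiktoken tokenizer. The example sentence and the comments are illustrative assumptions, not a description of OpenAI's internal pipeline.

```python
# Count words versus tokens for a short prompt. Illustrative only; this is
# how text is split before a model ever sees it, not OpenAI's internals.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the encoding used by GPT-3.5/GPT-4-era models

text = "What color is the sky on a sunny, cloudless day?"
tokens = enc.encode(text)

print(len(text.split()), "words")   # 10 words
print(len(tokens), "tokens")        # typically a few more tokens than words
```

Because punctuation and rare words get split into extra tokens, a "4,000-word" window is really a fixed token budget that a long conversation can quietly exhaust.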
Try it yourself: tell ChatGPT your name, paste roughly 5,000 words of nonsense into the box, and then ask what your name is. No matter how explicitly you stated it, the model will not remember; your introduction has simply fallen out of its context window.
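A toy model of a context window makes the failure obvious: a buffer that keeps only the most recent words and silently discards everything older. This sketch is purely illustrative and assumes a word-based window rather than OpenAI's token-based one.

```python
# Why the name falls out of memory: once the conversation exceeds the window,
# only the most recent words survive. Not OpenAI's code; a toy illustration.
WINDOW_SIZE = 4_000  # rough word budget, standing in for ChatGPT's ~4,000-word window

def visible_context(conversation_words, window=WINDOW_SIZE):
    """Return only the words the model can still 'see': the most recent ones."""
    return conversation_words[-window:]

conversation = ["My", "name", "is", "Alice", "."]
conversation += ["blah"] * 5_000          # 5,000 words of filler text
conversation += ["What", "is", "my", "name", "?"]

context = visible_context(conversation)
print("Alice" in context)  # False: the introduction was pushed out of the window
```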
GPT-4, OpenAI's newest model, has a context window of approximately 8,000 words, the equivalent of about an hour of face-to-face conversation. A more powerful version of the program, not yet available to the general public, can process up to 32,000 words.
Raphael Millière, a Columbia University philosopher who studies artificial intelligence and cognitive science, told me this is the most impressive memory yet achieved by a transformer, the type of neural network on which all of today's most capable large language models are based.
OpenAI evidently treated expanding the context window as a priority, assigning an entire team to the challenge. How that team pulled it off is unknown, because the company has disclosed almost nothing about GPT-4's inner workings.
When I asked, OpenAI declined to make any members of the context-window team available for an interview. The technical report accompanying the new model justifies this secrecy by citing the competitive landscape and the safety implications of large-scale AI.
GPT-4 may have a better short-term memory, but it still cannot retain information from one session to the next. No matter how large engineers make the context window, the model starts every new conversation from a blank slate, with no recollection of anything you told it before.
Even extending the context window is no simple task. Millière told me that as engineers expand it, the computation required to run the model, and therefore its cost of operation, grows exponentially. Solving that is hard enough without also tackling the separate problem of long-term memory.
Alex Dimakis, a computer scientist at the University of Texas at Austin and a co-director of the Institute for Foundations of Machine Learning, told me that a machine's total memory capacity is another obstacle: no existing computer could handle, say, a million-word context window.
Some AI developers have devised workarounds that effectively stretch a model's context window, for instance by programming it to keep a running summary of each conversation.
Say the model has a 4,000-word context window and a conversation runs to 5,000 words. The model might store a 100-word summary of the first 1,100 words and then keep that summary, plus the most recent 3,900 words, in its window.
As the dialogue grows, the model keeps revising its summary, an ingenious if stopgap solution. By the time 10,000 words have been exchanged, that 100-word summary has to stand in for the first 6,100 of them, and it will inevitably drop many details.
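A rough sketch of how such a rolling-summary workaround might be wired up is below. The summarizer is a placeholder (a real system would call a language model to compress the older text), and the window and summary sizes simply mirror the numbers above.

```python
# Rolling-summary memory: keep a short summary of old turns plus the most
# recent words, so the total always fits the context window. Illustrative only.
WINDOW = 4_000   # words the model can see at once
SUMMARY = 100    # words reserved for the running summary

def summarize(words, length=SUMMARY):
    """Placeholder: a real implementation would ask an LLM to write the summary."""
    return words[:length]  # hypothetical stand-in

def build_context(conversation_words):
    recent_budget = WINDOW - SUMMARY              # 3,900 most recent words
    if len(conversation_words) <= WINDOW:
        return conversation_words
    older = conversation_words[:-recent_budget]   # everything that no longer fits
    recent = conversation_words[-recent_budget:]
    return summarize(older) + recent              # 100-word summary + 3,900 recent words

for total in (5_000, 10_000):
    words = [f"w{i}" for i in range(total)]
    context = build_context(words)
    compressed = total - (WINDOW - SUMMARY)       # words the summary must cover
    print(f"{total} words -> context of {len(context)}, summary covering the first {compressed}")
```

Running it reproduces the arithmetic in the text: at 5,000 words the summary must cover the first 1,100, and at 10,000 words the first 6,100, which is why detail inevitably gets lost.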
Resolving the rebooting issue, Dimakis suggested, may require a more drastic design change than the intricate short-term-memory fixes other engineers are pursuing, perhaps even abandoning the transformer architecture on which every GPT model has been built. Simply enlarging the context window won't suffice.
The root of the problem is not so much recollection as discernment. People triage their experiences: we tend to remember what is significant and ignore the torrent of trivial information that washes over us every day. Large language models cannot make that distinction.
Lacking any capacity for triage, they have no way to tell what is important from what isn't. As Dimakis put it, "A transformer preserves everything; it regards everything as vital." The problem is not that large language models can't remember; it's that they can't decide what to forget.
The limits of GPT-4's memory do not mean the model is incapable of complex tasks or of taking in new information within a conversation. It remains a highly sophisticated language model whose responses are increasingly difficult to distinguish from a human's, and its development marks another milestone in AI and machine learning; the advances that follow will be worth watching.
Source: The Atlantic