LLMs as explained by The Big Lebowski
AI does not start and stop at your convenience. —Walter Sobchak (probably)
Since the public release of OpenAI’s ChatGPT nearly two years ago, it has been impossible to open LinkedIn, X, or any business news outlet without seeing some mention of Artificial Intelligence or Large Language Models (LLMs). Technologists, politicians, and business leaders agree that many facets of human life will be affected by advances in this technology. Companies and individuals are increasingly using LLMs to accomplish tasks faster than they ever could, with far-reaching commercial, social, and political ramifications.
Despite the hype, only a small percentage of people actually understand what an LLM is and how these models really work. Though not an expert, I decided the best way to share what I know on the topic is through the beloved 1998 film The Big Lebowski. For those who have not seen it, the film’s tension stems from a case of mistaken identity. Prior knowledge of The Big Lebowski is unnecessary to read this article but, since it is an all-time classic, I suggest you read the rest of this article, figure out how concerned you should be about LLMs, and then go watch it.
Data and Compression
Let’s start with some basics. The models that power Artificial Intelligence are algorithms: mathematical representations designed to detect patterns by training on data sets. These data sets can contain all kinds of information in different formats. The data can come already tagged/labeled by humans, so that the model sees correct input and output examples and learns to answer new but similar questions (i.e. Supervised Learning). The model can also be given lots of raw data and an objective to pursue, without explicit instructions on how to accomplish the task (i.e. Unsupervised Learning). Hybrid models use both labeled and unlabeled data. In any case, the primary factors impacting a model's effectiveness are the quantity and quality of the data on which it is trained. Even with finely tuned parameters (the weights the model learns from the training data), without enough high-quality data, any model will be useless.
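To make the distinction concrete, here is a minimal sketch of both approaches using scikit-learn; the four data points and their labels are invented purely for illustration:

# Minimal sketch: supervised vs. unsupervised learning.
# The data is made up for illustration: two features per "document".
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]  # four data points
y = ["sports", "sports", "politics", "politics"]       # human-provided labels

# Supervised: the model learns from labeled input/output pairs,
# then answers a new but similar question.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.85, 0.15]]))  # -> ['sports']

# Unsupervised: the model gets raw data and an objective
# (here: find 2 clusters) with no labels or explicit instructions.
km = KMeans(n_clusters=2, n_init="auto").fit(X)
print(km.labels_)  # e.g. [0 0 1 1] -- groups discovered without labels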
ChatGPT and other language-prompt interfaces are powered by LLMs. LLMs differ from traditional models in their size (some contain upwards of 70 billion parameters) and their UI/UX design (which enables output based on conversational text prompts). LLMs are first trained on massive sets of internet data that are compressed into these many parameters. This forms the basis of the neural network1 that tries to predict the next word in a sequence. While this part is largely unsupervised, the models would not be very useful at this point. To improve performance, OpenAI, Google, and other leading companies have humans manually train and improve the models. This process is called Fine-Tuning: rather than swapping out the compressed internet data, the model continues training on a smaller, curated set of human-supervised responses, adjusting the weights it learned during pre-training.
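To give a feel for what “predict the next word in a sequence” means, here is a toy bigram model in plain Python. Real LLMs learn billions of parameters rather than counting word pairs, so treat this as a sketch of the objective, not the method:

from collections import Counter, defaultdict

# Toy "training data" standing in for an internet-scale corpus.
corpus = "the dude abides the dude bowls the dude drinks white russians".split()

# Count which word follows which: the simplest possible next-word predictor.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    # Return the most frequent follower seen during training.
    return following[word].most_common(1)[0][0]

print(predict_next("the"))   # -> 'dude'
print(predict_next("dude"))  # -> 'abides' (ties broken by first occurrence)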
Over time, if successful, the LLM should be able to provide accurate responses to increasingly difficult questions. This is the basic process for getting an LLM into a useful state. Sounds simple, right? Not so fast; there are a few more concepts you need to understand before we get to The Big Lebowski. The first of these is Word Vectors.
Word Vectors and Association
A Word Vector represents a word using a series of numbers. These appear as a set of coordinates (e.g. 0.0041, 0.1441, -0.9782, etc.): the closer the association between two words, the closer their coordinates will be. This differs from human language, where the letters of a word alone reveal nothing about its relationship to other words.
In English, two words that are closely associated are pizza and pasta. Though they start and finish with the same letters, you could not determine any link between them from the letters alone. Otherwise, logic would dictate a similar association with words such as panda, panorama, or propaganda2. In LLMs, a Word Vector's numbers are not static; they change as the model determines a closer association between two words (moving their numbers closer together). With this, LLMs can more easily predict the next word in a sentence based on the most closely associated Word Vectors. As such, these vectors provide important information about the relationships between different words.
(Image source: Hariom Gautam, Medium article “Word Embedding: Basics”)
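To make the “closer numbers” idea concrete, here is a minimal sketch with made-up three-dimensional vectors (real models use hundreds or thousands of dimensions); cosine similarity measures how closely two vectors point in the same direction:

import numpy as np

# Invented 3-D word vectors, purely for illustration.
vectors = {
    "pizza": np.array([0.90, 0.80, 0.10]),
    "pasta": np.array([0.85, 0.75, 0.15]),
    "panda": np.array([0.05, 0.10, 0.95]),
}

def cosine_similarity(a, b):
    # ~1.0 = pointing the same way (closely associated); ~0 = unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors["pizza"], vectors["pasta"]))  # high (~1.0)
print(cosine_similarity(vectors["pizza"], vectors["panda"]))  # low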
A complicating factor here is that words often have multiple meanings and interpretations depending on their context and use.
A word's denotation is the dictionary definition of the word, whereas its connotation is what the word implies.
A word on its own will normally be interpreted somewhere close to its denotation, whereas the situation, or the words used before or after it, can create an entirely different meaning because of the connotation. As an example, if somebody called out “Dude!”, you would interpret that this person is calling out to some unspecified man. If they said “The bowling lane is ready for The Dude and his friends!”, you would realize that Dude is a specific person’s nickname. You need to understand both the denotation and the connotation of words, which is why context is vital for humans and LLMs alike when trying to understand and execute requests.
While Word Vectors depict relationships between words, on their own they have limited applications because they lack context. Lack of context is what drives most of the trouble The Dude encounters in The Big Lebowski. LLMs solve the missing-context problem by using something called a “Transformer.”
Transformers
A transformer is a layer that adds information by extracting the context around each word, helping to clarify the meaning of the words used in a prompt.
LLMs are built with many such layers. The first few start by identifying the syntax of the sentence to iron out words with multiple meanings. Later layers increasingly build upon the understanding of the previous ones to work out precisely what the sentence is trying to say. The most recent LLMs stack dozens of these transformer layers (GPT-3, for example, has 96), which is why they can understand your prompts even when you have typos, imperfect grammar, or incorrect sentence structure.
(Image source: Timothy B. Lee and Sean Trott, Substack article “LLMs explained”)
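You can watch context resolve an ambiguous word with a small masked-language model from the Hugging Face transformers library; the bert-base-uncased model and the prompt below are just illustrative choices, and the model downloads on first run:

# Requires: pip install transformers torch
from transformers import pipeline

# A small masked-language model fills in the blank using the surrounding context.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The surrounding words push the model toward a context-appropriate guess.
for guess in unmasker("The bowling lane is ready for the [MASK] and his friends.")[:3]:
    print(guess["token_str"], round(guess["score"], 3))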
Now to The Big Lebowski.
The film’s protagonist is named Jeffrey Lebowski. He does not, however, go by this name; he prefers to be called “The Dude.” This is unknown to the many characters he encounters throughout the film, who are after money from a different Jeffrey Lebowski (the Big Lebowski), whom they know as the wealthy husband of Bunny Lebowski. Without giving too much away, The Dude lives a very quiet, laid-back lifestyle that largely involves bowling with his friends and drinking White Russians3. He is not wealthy; he is unemployed and behind on his rent.
One day, he comes home to find two thugs waiting in his apartment to collect money on behalf of their boss, noted pornographer/loan shark Jackie Treehorn. They say they are there to collect money from Lebowski: since his name is Lebowski, and his alleged “wife” said he would be good for it, they want the money.
Where's the money, Lebowski? (Video Clip)
The Dude immediately points out that his apartment does not look like a place where a woman would live, and he is not wearing a wedding ring. The thugs realize their mistake and leave angrily; The Dude sits bewildered, processing the fact that he just watched a grown man pee on his rug while another stuffed his head down a toilet.
What clearly happened here was that Jackie Treehorn gave his thugs the instruction to “Collect the money Bunny owes me from her husband, Jeffrey Lebowski.”
If an LLM were processing this, the first few transformer layers would help identify that Jeffrey Lebowski refers to a specific human being.
It would have looked something like this: word prompt (What is Jeffrey Lebowski?) —> translate the words into numerical Word Vectors (0.3541, -0.9141, 0.2141, etc.) —> run the sequence through a transformer to provide additional context (Jeffrey Lebowski is a proper noun).
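As a crude sketch of that chain (the vectors are invented, and simple neighbor-averaging stands in for what a real transformer layer does with attention):

import numpy as np

# Step 1: word prompt -> tokens
tokens = "what is jeffrey lebowski".split()

# Step 2: tokens -> numerical word vectors (invented values for illustration)
rng = np.random.default_rng(42)
embed = {t: rng.normal(size=4) for t in tokens}
vectors = [embed[t] for t in tokens]

# Step 3: add context. A real transformer uses attention; here each word
# simply averages in its neighbors' vectors as a crude stand-in.
contextualized = [
    np.mean(vectors[max(0, i - 1): i + 2], axis=0) for i in range(len(vectors))
]
print(contextualized[2])  # "jeffrey" now carries information from "is" and "lebowski"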
Having established that Lebowski is a person, and with the rest of the prompt mandating that they retrieve money from said individual, everything seemed simple. However, as the scene unfolded, this was clearly insufficient to accomplish Treehorn’s task.
With the help of transformers, their probability of success would have been higher had they realized4:
Lebowski is married to Bunny
Lebowski is very wealthy
There is more than one Jeffrey Lebowski in the Greater Los Angeles area
The Big Lebowski is confined to a wheelchair
The Jeffrey Lebowski whose apartment they broke into goes by “The Dude”
With this information, they would not have confused The Dude with the Big Lebowski.
Now that the first few layers of transformers have clarified that the person they need to ask for money is the wealthy Big Lebowski, there is the practical matter of how. Transformers take a given prompt and then produce a response, which happens in two steps.
First, the model evaluates the words given and looks around for other words with relevant context to help it make sense of them. This is the attention step.
Next, each word evaluates the information gathered in the previous attention steps to try to predict the next word. This is the feed-forward step.
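Here is a compact numpy sketch of those two steps inside one transformer layer: scaled dot-product self-attention (each word gathering context from the others), followed by a small feed-forward network applied to each word. All the weights are random placeholders that a trained model would have learned:

import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 5, 8                  # 5 words, 8-dimensional vectors
x = rng.normal(size=(seq_len, d))  # word vectors for the prompt

# --- Attention step: each word looks at every other word for context ---
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))  # random stand-ins
Q, K, V = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / np.sqrt(d)      # how relevant is each word to each other word
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
attended = weights @ V             # each word becomes a context-weighted mix

# --- Feed-forward step: each word processes what it gathered ---
W1, W2 = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))
out = np.maximum(0, attended @ W1) @ W2  # small ReLU network, applied per word

print(out.shape)  # (5, 8): same shape, but each vector is now context-aware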
With the right data set and a properly fine-tuned model, simply prompting “Lebowski” instead of “The Dude” would have given the model the clue that the prompter was looking for the Big Lebowski. This would have been worked out in the attention step, since the information that The Dude does not go by his legal name and is not wealthy would be available within the network. Even if the thugs could not access the information within the network, after receiving the instructions they could have realized that Treehorn meant the Big Lebowski (the feed-forward step).
Note that Treehorn said he wanted the money that Bunny owed him and to get it from her husband. Nothing about the way The Dude lived suggested that he could afford to settle the debt. The thugs should have immediately realized that somebody wealthy would live in a nicer house, in a nicer area, or at least with Bunny. By paying attention to all the information they were given and discovered up until The Dude arrived, they should have realized that they had the wrong guy.
The Dude did not have enough information at the start of the film to understand what was happening. His neural network did not know Jackie Treehorn, the other Lebowskis, Karl Hungus and the Nihilists, or their relationships with one another. He acquired more information and learned after each interaction with these characters. When Karl Hungus and the Nihilists claim to have kidnapped Bunny and threaten to kill her unless her husband pays her ransom, The Dude believes them. The Nihilists knew she had simply gone out of town without telling the Big Lebowski, so they took advantage of the situation. Like The Dude, they falsely assumed that the Big Lebowski had a lot of money and would pay the ransom. When The Dude learns from Maude Lebowski that her father does not have any money, he realizes that the Big Lebowski was using the situation to embezzle money from his foundation. Maude fulfilled the role of the transformer by helping The Dude make sense of the information, which eventually helped him realize what the Big Lebowski had done. His process of acquiring information, learning, and adjusting based on newly acquired prompts and information could be seen as similar to how an LLM works.
LLMs need so much more data and context to be useful because, compared to humans, AI is still not that intelligent. In a framework popularized by the late Daniel Kahneman, human thinking can be divided into two systems:
System 1 is quick, instinctive thinking that can be retrieved within a few seconds.
System 2 is slower, more complex thinking that takes much more reflection.
System 2 allows inferences based on information or context that might not be immediately evident. Humans have the luxury of being able to use both systems, whereas LLMs can only use System 1. To compensate, they need incredible numbers of parameters, transformers, and other tools to produce useful results. Despite having incomplete information, Treehorn’s thugs did not take advantage of their System 2 thinking, which led them to urinate on an innocent man’s rug.
To conclude: LLMs work by training on large data sets and are then fine-tuned with the help of manual human intervention. Using word vectors and transformers, LLMs compensate for their lack of human intelligence through prompt identification and association. These models have produced better outputs because of the increasing number of parameters and transformer layers in each new version, giving them more information and context.
The Big Lebowski is the perfect film to show the difficulty machines (i.e. thugs) have in accomplishing simple requests when missing relevant information and context.
Well, that’s just, like, my opinion, man. What’s yours?
1. A Neural Network is the web of interconnected nodes, organized in layers, that an AI model uses to process data and learn patterns.
2. As far as I know, pizza and pasta are in no way linked to pandas, unless you have found yourself at a Panda Express & Domino's drive-thru late at night.
3. A White Russian is a cocktail made with vodka, coffee liqueur, and cream, served with ice in an old-fashioned glass.
4. This assumes that this information is all available in their neural network.