
LLM Concepts — My notes

@Anil's Notes

--

One evening last week, my 9-year-old daughter came running towards me, excited about something new she had learned from her friends at school. She had heard about ‘ChatGPT’ and was eager to know more about it. I was pleasantly surprised by her curiosity. I sat her down, gave her a demo of the ChatGPT app, and tried to explain it as simply as possible by comparing the computer to a human brain. Still, I unconsciously used terms like LLMs and GPT, and struggled to explain these language-model concepts to my child. Later, I realized I needed to capture and understand these concepts better, so I began reading more and taking notes to serve as a reference for myself and others in the future.

Please note that I am very curious about LLMs and still learning; I am dumping my notes here with some fine-tuning, so excuse me if something doesn’t sound right.

LLMs (Large Language Models)

Large language models are machine learning models designed to process and analyze text. They are trained on massive amounts of data and learn the patterns in language, allowing them to generate human-like responses to the queries we give them.

LLMs share similarities with the human brain.

Both LLMs and the human brain learn from experience and adapt their responses accordingly. LLMs are AI models trained on vast amounts of text data, while the human brain is a biological organ that learns from sensory experiences and memories. Both can process information and generate responses: the brain uses neurological processes, and LLMs use algorithms.

However, the human brain surpasses LLMs in complexity and capability. It encompasses consciousness, emotions, a broad range of cognitive functions, a capacity for deep understanding, creativity, intuition, and complex decision-making that no AI model can match.

GPT (Generative Pretrained Transformer)

GPT is an LLM (large language model) designed to generate text. It is trained on a huge amount of data and generates text by predicting what comes next in a sequence of words. This prediction allows it to create human-like text. However, it cannot understand the content it’s generating the way humans do.

GPT models have been around for a while

  • GPT-1 (2018): OpenAI released GPT-1, an LLM with 117 million parameters trained on the BookCorpus dataset (a collection of books from Smashwords.com).
  • GPT-2 (2019): OpenAI released GPT-2, a more powerful version of GPT-1, with 1.5 billion parameters trained on diverse datasets collected from the internet.
  • GPT-3 (2020): OpenAI released GPT-3, an LLM significantly larger and more powerful than GPT-2. It was trained on 570 GB of text and had 175 billion parameters.
  • GPT-3.5 (2022): In between, OpenAI released Codex in 2021, another LLM, which powers GitHub Copilot. In 2022, OpenAI released the first version of ChatGPT, which used an upgraded model, GPT-3.5.
  • GPT-3.5-Turbo model (2023): The GPT-3.5-Turbo model offers GPT-3.5’s capabilities more cost-effectively. It was tuned with an optimization technique known as reinforcement learning from human feedback (RLHF), improving accuracy.

RLHF is an approach in which human AI trainers interact with the model and provide feedback. The feedback collected is used to build a reward model, and over time the model optimizes toward responses that human trainers rate more highly (a minimal sketch follows this list). It is similar to training a pet: initially, you show the pet the correct behavior and reward it when it behaves correctly. Over time, the pet learns to repeat the behaviors that lead to rewards. Similarly, AI models learn to produce outputs that receive higher ratings from human trainers.

  • GPT-4 (2023): OpenAI released a paid subscription for GPT-4 in March 2023. It is more accurate in its responses than previous models. GPT-4 can handle “longer context,” with up to 25,000 words of input, can generate complex code to build websites, games, etc., and also uses RLHF (reinforcement learning from human feedback).
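
To make the reward-model idea concrete, here is a minimal, illustrative sketch in Python. The reward_score function and its toy word weights are hypothetical stand-ins for a real neural network trained on preference data; only the shape of the pairwise loss (push the human-preferred response above the rejected one) reflects how reward modeling works in practice.

```python
import math

# Illustrative RLHF reward-modeling sketch. reward_score() and its toy
# weights are hypothetical stand-ins for a real neural network.

def reward_score(response: str, weights: dict[str, float]) -> float:
    """Toy reward: sum of per-word weights (stand-in for a learned model)."""
    return sum(weights.get(word, 0.0) for word in response.split())

def preference_loss(chosen: str, rejected: str, weights: dict[str, float]) -> float:
    """Pairwise loss, -log(sigmoid(margin)): low when the human-preferred
    response scores above the rejected one, high otherwise."""
    margin = reward_score(chosen, weights) - reward_score(rejected, weights)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A human trainer preferred the first answer over the second; training
# would adjust the weights to keep this loss small across many comparisons.
weights = {"helpful": 0.8, "rude": -1.2}
print(preference_loss("a helpful answer", "a rude answer", weights))  # ~0.13
```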

What are parameters? In machine learning, “parameters” are the internal variables a model learns during training, which help it make predictions or decisions based on the given data.
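
A toy example may help. In the two-parameter model below, w and b are the parameters; training nudges them until the predictions fit the data. GPT-3 works on the same principle, just with 175 billion such values. The data and learning rate here are made up for illustration.

```python
# In y = w * x + b, the values w and b are the "parameters": internal
# variables the model learns from data. The data below follows y = 2x + 1.
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = 0.0, 0.0   # parameters start at arbitrary values
lr = 0.05         # learning rate

for _ in range(2000):  # plain gradient descent on mean squared error
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0, learned from the data
```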

How are GPT-3 and GPT-4 models different?

  1. Parameters: GPT-3 has 175 billion parameters, while GPT-4 is reported to have close to a trillion; these are utilized to generate more accurate results.
  2. Dataset size: GPT-3 was trained on about 570 GB of filtered text; OpenAI has not disclosed GPT-4’s dataset size, but it is believed to be substantially larger.
  3. Features: GPT-3 can accomplish NLP tasks such as answering questions, translation, and summarization. It can produce human-like text and generate code, articles, stories, and poems from basic prompts. GPT-4 can handle even more challenging tasks, such as writing essays and articles and generating art, and it performs better than previous models.

ChatGPT (OpenAI)

ChatGPT (a web application and, currently, an iOS app) is a conversational AI application of the GPT model developed by OpenAI. It is used to generate conversational responses in a chat-based interface.

LLM architecture concepts:

Word and sentence embeddings: How does a machine understand and process language?

The idea of an LLM is for a computer to understand language. Unlike humans, who understand and process language as words and sentences, computers process numbers. Embeddings are a way to map words and sentences to numbers, which a computer implements using a neural network.

Word embedding is a technique used in LLMs to represent the meaning of words in a numerical format, which can then be processed by the AI model. This is achieved by mapping words to vectors (arrays of numbers) in a high-dimensional space, where words with similar meanings sit closer together.

Here is the best explanation I have ever read on word and sentence embeddings.
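
As a minimal sketch of the idea, the hand-made 3-dimensional vectors below stand in for real embeddings (which have hundreds or thousands of learned dimensions). Words with similar meanings get vectors pointing in similar directions, which cosine similarity measures.

```python
import math

# Toy word embeddings: the vectors are made up for illustration; a real
# model learns them from data in a much higher-dimensional space.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Score near 1 when two vectors point in similar directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high, ~0.99
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # lower, ~0.30
```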

Transformer models: How does a machine generate text?

The transformer architecture enables a model to process and generate human-like text, and it relies on self-attention and feed-forward neural networks. It can carry the context of the text, and it guesses the next word in a sentence. The neural network mimics what it has previously seen in its training data and guesses based on probabilities, i.e., suggestion or prediction (with context).

If you ask a transformer model to “write a story,” it keeps the original context and iterates until it completes the story in context.

  1. The transformer first generates the word “Once.”
  2. Then it sets up the context again (“Write a story and start with Once”), writes “upon” next, then “a time,” and so on, continuing with the previous context until the story is complete.

This is magical compared to earlier neural networks, which could only predict the next word and couldn’t stay aligned with the previous context.
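
The loop below sketches this generation process under heavy simplification: the “model” is a hand-written table of next-word probabilities, whereas a real transformer computes those probabilities from the full context using self-attention. The mechanics of appending one word at a time and re-reading the context are the same.

```python
import random

# Hypothetical next-word probabilities keyed by the last three words of
# context; a real transformer computes these from the entire input.
next_word_probs = {
    ("write", "a", "story"):   {"once": 0.9, "the": 0.1},
    ("a", "story", "once"):    {"upon": 0.95, "more": 0.05},
    ("story", "once", "upon"): {"a": 1.0},
    ("once", "upon", "a"):     {"time": 1.0},
}

def generate(prompt: list[str], steps: int) -> list[str]:
    words = list(prompt)
    for _ in range(steps):
        context = tuple(words[-3:])        # re-read the context every step
        probs = next_word_probs.get(context)
        if not probs:                      # no known continuation: stop
            break
        choices, weights = zip(*probs.items())
        words.append(random.choices(choices, weights=weights)[0])
    return words

print(" ".join(generate(["write", "a", "story"], steps=4)))
# e.g. "write a story once upon a time"
```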

Transformer models:

  1. Have a large number of parameters that influence the next generation step
  2. Are trained on a lot of data
  3. Carry context, which makes them powerful

Combined, word embeddings and transformers enhance LLMs by providing a mechanism to represent and process language effectively. Word embeddings enable LLMs to understand the meaning of individual words, while transformers allow LLMs to analyze and generate text by considering the broader context and relationships between words in a sentence or document. Together, these components enable LLMs to generate more coherent, context-aware, and semantically meaningful responses, improving language understanding and generation capabilities.

Semantic Search

Here is a quick overview of Keyword and Lexical Search before details on Semantic Search.

Keyword Search:

Keyword search typically refers to a search method where specific keywords or terms are used as queries to retrieve relevant documents or information. In a keyword search, the focus is primarily on finding documents containing the specified keywords. The search results are often based on the occurrence of those keywords and may not consider the context or meaning of the keywords in the document.

Example:

Query: “Apple iPhone”

Results: Search results include documents or web pages containing the keywords “apple” and “iPhone,” where the emphasis is on finding documents that match the keywords.

Lexical Search:

Lexical search, on the other hand, goes beyond the specific keywords and also considers the surrounding lexical context, such as synonyms, related terms, or variations of the keywords. The aim is to retrieve documents or information that are lexically related or semantically similar to the specified keywords. Lexical search considers the meaning, context, and relationships between words to provide more comprehensive and relevant search results.

Example:

Query: “Smartphone”

Results: The results go beyond exact keyword matches, considering related terms, synonyms, and variations. The results may include docs and pages containing terms like “mobile phone” and “cell phone” in addition to “smartphone.”
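
A small sketch can make the contrast concrete. The synonym table below is a hand-made stand-in for the expansions a real lexical search engine would derive from a thesaurus or a query-expansion model.

```python
docs = [
    "Best mobile phone deals this week",
    "Apple iPhone review",
    "How to repot a houseplant",
]

# Hand-made synonym table (a stand-in for real query expansion).
SYNONYMS = {"smartphone": {"smartphone", "mobile phone", "cell phone", "iphone"}}

def keyword_search(query: str) -> list[str]:
    """Keyword search: only documents containing the literal query string."""
    return [d for d in docs if query.lower() in d.lower()]

def lexical_search(query: str) -> list[str]:
    """Lexical search: also match synonyms and variations of the query."""
    terms = SYNONYMS.get(query.lower(), {query.lower()})
    return [d for d in docs if any(t in d.lower() for t in terms)]

print(keyword_search("smartphone"))  # [] because there is no literal match
print(lexical_search("smartphone"))  # matches the "mobile phone" and "iPhone" docs
```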

Semantic Search:

Semantic search is a more advanced approach to information retrieval that tries to understand the meaning and intent behind a user’s search query and the content of the documents being searched. It goes beyond matching keywords and considers the context, intent, and relationships between words. Semantic search aims to understand the user’s query and its intent, and it analyzes the content of documents to deliver results that align with the user’s needs. This involves capturing the context, understanding synonyms, handling ambiguous queries, and identifying concepts or entities related to the search query.

How can Semantic Search engines (machines) understand the meaning and context of the query? Embeddings — That's correct!

  • It uses text/sentence embeddings to turn the query into a vector (a list of numbers), then uses similarity to find, among the responses, the vector most similar to the query’s vector.
  • It outputs the response corresponding to that most similar vector.

Example:

Query: “Best smartphone with capabilities such as 8MP camera”

Results: Semantic Search understands the user intent and context of the query. The search engine would analyze the query and content, and the results may include specific smartphone models, reviews, and articles discussing details about the camera.
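
Below is a minimal sketch of those steps: embed the query and every document, then rank documents by cosine similarity. The embed function here is a toy word-counting trick, not a real model; in practice you would call a sentence-embedding model instead.

```python
import math

def embed(text: str) -> list[float]:
    """Toy embedding: counts of a few hand-picked concept words (illustrative)."""
    concepts = ["camera", "phone", "battery", "plant"]
    words = text.lower().split()
    return [float(sum(c in w for w in words)) for c in concepts]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a)) or 1.0
    norm_b = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (norm_a * norm_b)

docs = [
    "Review: this phone has an excellent 8MP camera",
    "Top battery tips for your phone",
    "Watering schedule for indoor plants",
]

query = "best smartphone camera"
ranked = sorted(docs, key=lambda d: cosine(embed(query), embed(d)), reverse=True)
print(ranked[0])  # the camera review ranks first
```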

How may LLMs help Semantic Search?

One way is RAG (Retrieval-Augmented Generation). RAG is a model architecture that combines retrieval-based and generation-based approaches for semantic search. It leverages both the benefits of traditional retrieval models and the power of large language models (LLMs) like GPT.

The RAG model uses a retrieval component to identify and retrieve relevant documents based on the user’s query. This retrieval component can employ keyword matching, vector similarity, or semantic matching to retrieve candidate documents. Once the candidate documents are retrieved, a generation component based on an LLM, such as GPT, is asked to generate informative and contextually relevant responses based on them. The LLM can utilize the retrieved information to provide coherent, detailed responses that align with the user’s query.

The key idea behind RAG is to combine the accuracy and relevance of retrieval models with the language understanding and generation capabilities of LLMs. By incorporating retrieval at the initial stage, RAG helps narrow the search results to highly relevant documents, reducing the burden on the LLM and improving efficiency. The LLM then focuses on generating responses grounded in the context present in the retrieved documents.
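
Here is a high-level sketch of that flow under stated assumptions: retrieve uses naive word overlap, and call_llm is a placeholder; in a real system, retrieval would use embedding similarity against a vector store, and call_llm would call an actual LLM API.

```python
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Retrieval step: rank documents by word overlap with the query
    (real systems use embedding or semantic similarity instead)."""
    q = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return "[LLM answer grounded in the retrieved context]"

def rag_answer(query: str, documents: list[str]) -> str:
    context = "\n".join(retrieve(query, documents))   # 1. retrieval component
    prompt = (                                        # 2. generation component
        f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)

docs = [
    "The 8MP camera on this phone performs well in low light.",
    "Houseplants need indirect sunlight.",
]
print(rag_answer("Which phone camera is good in low light?", docs))
```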

I’ve written a recent article on the RAG model, and here is a link to it.

I will stick around ….

--

@Anil's Notes

Thoughts I add here are my personal notes and learnings only