About a year ago, the introduction of Large Language Models (LLMs) such as GPT changed the way we use AI in day-to-day solutions. A considerable number of businesses, large and small alike, are using AI to build systems that help them deliver better solutions. However, there is still a lot of confusion, and many challenges, around how to apply Large Language Models to these scenarios.

Over the last year I have been working with numerous businesses across different industry verticals to conceptualize, design and build solutions using GPT models. In this blog, I will share my experience as we look at the top myths and facts about GPT models and how we can use them effectively to solve business requirements.

Before we start, let me introduce the concept of Large Language Models, or GPT models. Large Language Models, or GPT (Generative Pre-trained Transformer) models, are AI models that have been trained with a massive number of parameters (in the billions or trillions) and massive datasets so that they can provide human-like, contextual answers.

Should I train GPT models with my data?

MYTH: I can train GPT models with my data, and they will answer questions just like ChatGPT does. Hurray!!

FACT: I suppose this is the most common question, and the biggest source of confusion, that businesses are struggling with around GPT models. Traditionally, AI models were trained on your own data and could then answer based on the datasets provided to them. In the case of GPT or LLMs, however, as the name suggests, the models are already trained on immense datasets and billions of parameters, so no training is required. Below are some of the specifications for the GPT models available to date.

| Models | Parameters | Training data |
| --- | --- | --- |
| GPT-3 & 3.5 | 175 billion parameters | More than 500 GB |
| GPT-4 | More than 1.7 trillion parameters | A lot of data 🙂 |

Courtesy: Wikipedia

Considering the size of the data and the number of parameters GPT has been trained on, adding a dataset of any corporate size is unlikely to have a meaningful impact on the model, and the training itself would consume a massive amount of compute power. Hence, in the GPT world, training your own model is not really practical.

So, the next question is: what can I do? The answer lies in the questions below.

What is the difference between Prompts, Tokens and Context?

MYTH: There is no real myth on this one. Many businesses still don't understand these concepts, so I will describe them below.

FACT: As we saw in the question above, GPT is trained on a huge amount of data, so if we need GPT to have the context of our question, we can provide information to GPT along with the question. The information provided is measured in Tokens, the question together with any context we provide is called the Prompt, and the information we supply with the question or conversation is the Context.

In other words, GPT processes information as Tokens, provided through Prompts, within a specified Context. The definitions of Prompts, Tokens and Context, and how they relate to each other, are below.

Tokens: A token is fundamentally the base unit in which GPT accepts information into the model. It is not a letter or necessarily a whole word; it is a chunk of characters, often a word or part of a word (roughly four characters of English text on average). Tokens are counted for both the input and the output of GPT models; the total is the sum of the input information and the output answer.
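To see how text turns into tokens (and how many of them a piece of text costs), here is a minimal sketch using the open-source tiktoken library that OpenAI publishes for its models; the sample sentence is just an illustration.

```python
# pip install tiktoken
import tiktoken

# Load the tokenizer used by the GPT-4 family of models
encoding = tiktoken.encoding_for_model("gpt-4")

text = "GPT models read text as tokens, not individual letters."
tokens = encoding.encode(text)

print(tokens)                    # the integer token IDs the model actually sees
print(len(tokens))               # how many tokens this text consumes
print(encoding.decode(tokens))   # decoding the tokens returns the original text
```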

Context: Since GPT models cannot be trained on your data, we can provide information to the model along with the question. This information is called the Context. The Context can be any information relevant to the question, the problem at hand, or supplementary information that GPT can use to construct the answer.

Prompts: The Prompt is the input to the GPT model, including the Context. The Prompt is the sum of all information provided to GPT during a conversation; it contains everything GPT uses to produce an answer.

In other words,

Prompt = User Question + Conversation History + Contextual information/Background information provided + System Messages

Prompts play a key role in the calculation of total tokens. We will discuss this in more detail in a future blog.
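To make this concrete, here is a minimal sketch of how those pieces map onto a chat-style API call using the OpenAI Python SDK; the model name, context string, history and question are purely illustrative placeholders.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Illustrative placeholders for each part of the prompt
system_message = {"role": "system", "content": "You are a helpful support assistant."}
history = [
    {"role": "user", "content": "Hi, I need help with my account."},
    {"role": "assistant", "content": "Sure, what would you like to know?"},
]
context = "Support phone lines are open 9am-5pm AEST, Monday to Friday."
question = "When can I call support?"

# Prompt = system message + conversation history + contextual information + user question
messages = [system_message] + history + [
    {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
]

response = client.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)
print(response.usage.total_tokens)  # tokens counted across both the prompt and the answer
```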

Should I then just Fine-tune GPT or LLMs to use my data without training?

MYTH: If training is not possible, maybe I should fine-tune the GPT or LLM models to use my data. That will solve my problem, right?!

FACT: The simple answer to this question is no. Fine-tuning is not meant to address the problem of using your own datasets with GPT models. Fine-tuning helps with a different part of working with GPT or LLMs: the way GPT answers questions for your business. We call it giving the GPT model a personality. We will look at fine-tuning in more detail in upcoming blogs and discuss whether or not to fine-tune.
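For a sense of why fine-tuning shapes personality rather than knowledge, here is a minimal sketch (the assistant name and conversation are made up) of the kind of example-conversation training data, written out as JSONL, that a chat fine-tuning job typically consumes: each example demonstrates a tone and answering style, not a new fact base.

```python
import json

# One made-up training example in the chat fine-tuning format; real training data
# would contain many such conversations demonstrating the desired tone and style.
example = {
    "messages": [
        {"role": "system", "content": "You are Contoso's cheerful support assistant."},
        {"role": "user", "content": "Where is my order?"},
        {"role": "assistant", "content": "Happy to help! Could you share your order number so I can look it up?"},
    ]
}

with open("training_data.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```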

So how do I let GPT use my data?

MYTH: If I cannot fine-tune to let GPT use my data, then maybe I can filter the data surfaced to GPT.

FACT: Filtering is indeed the way to let GPT surface your data, but the system doing the filtering has to be capable enough to understand the context of the question so that it can provide the right amount of information to GPT in the Prompt. There are various systems available for this; for example, one approach Microsoft recommends is to use Azure AI Search to solve this problem.
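As a rough sketch of that pattern, the snippet below uses the azure-search-documents SDK to retrieve the few most relevant documents for a question and then passes them to an Azure OpenAI deployment as context; the endpoint, keys, index name, field name and deployment name are all placeholders you would replace with your own.

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

# Placeholders - substitute your own service details
search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="corporate-docs",
    credential=AzureKeyCredential("<search-api-key>"),
)
openai_client = AzureOpenAI(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com",
    api_key="<openai-api-key>",
    api_version="2024-02-01",
)

question = "What is our parental leave policy?"

# 1. Filter/retrieve: fetch only the handful of documents relevant to the question
results = search_client.search(search_text=question, top=3)
context = "\n\n".join(doc["content"] for doc in results)  # assumes the index has a 'content' field

# 2. Ground the prompt: pass the retrieved text as context alongside the question
response = openai_client.chat.completions.create(
    model="<gpt-deployment-name>",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```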

The goal is to keep the context small enough to stay within GPT token limits without compromising the quality of the data provided to GPT in the Context and the Prompt. There are various engineering techniques, collectively termed Prompt Engineering, that help with building the context and the prompt effectively and with managing prompts. We will look at some of these techniques in an upcoming blog.

GPT can respond to all my questions, right?

MYTH: We assume that GPT has been trained with all the data available on the internet, so it knows everything. But that is not the case.

FACT: GPT has been trained on such a massive amount of data that it gives the impression it knows everything. In reality it does not. What it is very good at is producing a fluent, confident-sounding answer to almost any question, even when it does not actually have the facts. When such an answer is not grounded in real information, it is called Hallucination.

Hallucination is one of the main concerns businesses have about GPT solutions: the answer seems right but is not factually correct. Let me share an example below.

In a typical hallucinated response, the answer looks right but is out of context (for example, the answer might include an element such as "Day" that was never mentioned in the question or prompt), so it is not the answer you are looking for. This is where we need to craft a prompt that sets the context and asks the model to answer accordingly. In other words, we must tell GPT what to consider and what to ignore. Below is an example of the same.
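Here is an illustrative sketch of such a prompt (the scenario and wording are made up): the system message spells out exactly what the model should consider and what it must ignore.

```python
# Illustrative only - the scenario, shift times and wording are placeholders.
messages = [
    {
        "role": "system",
        "content": (
            "You are a rostering assistant. "
            "Consider ONLY the shift times listed in the context below. "
            "Ignore any other knowledge you have about rosters, and do not add "
            "details (such as days of the week) that are not in the context."
        ),
    },
    {
        "role": "user",
        "content": "Context:\nShift A: 06:00-14:00\nShift B: 14:00-22:00\n\nQuestion: When does Shift B start?",
    },
]
```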

Another tip is to give GPT the option to say "No" or "I don't know" when it does not have the information it needs, instead of generating an answer from incomplete or scarce data. To help with this, we can also tune the model parameters exposed by the platform, for example Temperature and Top P in Azure OpenAI.
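As a minimal sketch of how that looks with Azure OpenAI (the endpoint, key and deployment name are placeholders), the call below adds an explicit "say you don't know" instruction and keeps temperature and top_p low so the model stays close to the supplied context:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com",
    api_key="<openai-api-key>",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="<gpt-deployment-name>",  # the name of your Azure OpenAI deployment
    messages=[
        {"role": "system", "content": "Answer only from the provided context. If the context does not contain the answer, reply exactly: I don't know."},
        {"role": "user", "content": "Context:\n<retrieved documents>\n\nQuestion: <user question>"},
    ],
    temperature=0.0,  # low temperature -> more deterministic, less creative answers
    top_p=0.1,        # sample only from the most likely tokens
)
print(response.choices[0].message.content)
```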

These are obviously only a few of the tips for tackling hallucination. I will detail more steps and other approaches in another blog.

GPT considers all new information available on the Internet

MYTH: GPT is a knowledge base that holds all the information you need to answer a question, or it is even directly connected to the internet.

FACT: It is true that GPT models, especially GPT-4, have been trained on a massive amount of data available on the web, but they do not actively pull data when we ask for an answer. The model might have some context about the information we are looking for, but not all the answers. So how do we get it that information?

The approach is to use your own data ingestion engine, such as the Bing Search API, to pull data from publicly available content and store it in a search solution such as Azure AI Search. That store can then be used to automatically provide context for the question that needs to be answered.
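As a rough sketch of that ingestion path (the query, keys, index name and document schema are all placeholders), the snippet below calls the Bing Web Search API for a topic and pushes the results into an Azure AI Search index, where they can later be retrieved as context for GPT:

```python
import requests
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# 1. Pull publicly available content via the Bing Web Search API
bing_response = requests.get(
    "https://api.bing.microsoft.com/v7.0/search",
    headers={"Ocp-Apim-Subscription-Key": "<bing-api-key>"},
    params={"q": "latest Azure OpenAI announcements", "count": 10},
)
web_pages = bing_response.json().get("webPages", {}).get("value", [])

# 2. Store the results in an Azure AI Search index
#    (assumes an index with id, title, url and content fields)
search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="web-content",
    credential=AzureKeyCredential("<search-api-key>"),
)
documents = [
    {"id": str(i), "title": page["name"], "url": page["url"], "content": page["snippet"]}
    for i, page in enumerate(web_pages)
]
search_client.upload_documents(documents=documents)
```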

Conclusion

In this blog, we looked at a few of the myths and facts about GPT solutions, but there are many more best practices and approaches for building effective solutions with GPT models. I will share more strategies and show you how to build solutions with GPT in the upcoming blog series.

Happy GPTing!!
