Have you ever considered creating a chatbot? It might seem like a straightforward task, but, in reality, it involves more complexities than you might initially anticipate. The central challenge lies in enabling the AI to respond meaningfully to the user’s queries, which often requires context. Let’s say a user’s question revolves around their company’s HR policies. Here, having just general knowledge isn’t enough.
What Is an LLM (Large Language Model)?
One of the critical components is a Large Language Model, also known as an LLM. If you’re familiar with GPT-4, the model that powers ChatGPT, then you’ve already had a brush with an LLM. These models are trained on extensive data, enabling them to mimic human responses to particular questions. This enables powerful new user interfaces where we can not only receive generated text, but also have a natural back-and-forth conversation. These iterative interactions make LLMs particularly useful for exercises such as brainstorming, planning or even programming.
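To make that back-and-forth concrete, here is a minimal sketch using one provider’s Python SDK (OpenAI’s, in this case) as an example; the model name and prompt wording are placeholders, and any chat-style API that accepts a list of role/content messages works the same way.

```python
# A minimal sketch of an iterative conversation with a chat-style LLM.
# Assumes the OPENAI_API_KEY environment variable is set; "gpt-4o" is an
# example model name, not a recommendation.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "user", "content": "Help me brainstorm names for an HR chatbot."}
]
response = client.chat.completions.create(model="gpt-4o", messages=messages)
answer = response.choices[0].message.content

# Feeding the reply back in, plus a follow-up question, is what makes the
# conversation iterative rather than a one-shot text generation.
messages.append({"role": "assistant", "content": answer})
messages.append({"role": "user", "content": "Shorten your top three suggestions to one word each."})
response = client.chat.completions.create(model="gpt-4o", messages=messages)
```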
These responses are based on the general knowledge the model has been trained on, and, at times, these models can produce “hallucinations.” Hallucinations are bits of information that sound correct on the surface but are actually false or made up. Luckily, there are several ways to handle the hallucination problem.
What is Prompt Engineering?
At times, we might want to guide the model on how to respond appropriately to the user, and this is where “prompts” come into play. Essentially, prompts are instructions guiding the model to answer in the expected manner. Common instructions include “If you don’t know the answer, say ‘I don’t know,’” and “Think about the problem step by step.” It may seem trivial, but these simple instructions have been shown to increase accuracy.
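Here is a rough sketch of what that looks like in practice. The exact wording is up to you, and the build_messages helper is purely illustrative.

```python
# A simple prompt template: the system message carries the standing
# instructions, the user message carries the actual question.
SYSTEM_PROMPT = """You are an HR assistant.
Answer using only the information provided to you.
If you don't know the answer, say "I don't know."
Think about the problem step by step before answering."""

def build_messages(question: str) -> list[dict]:
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

print(build_messages("How many vacation days do I get?"))
```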
The prompt is also where we can add additional information using Retrieval Augmented Generation (RAG). In simple terms, RAG is the process where we take the “right” answer and put it into the prompt. This allows the LLM to generate an answer that includes the right information, and it has a dramatic impact on the quality of the response.
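A minimal sketch of that idea follows: retrieved passages are pasted into the prompt so the model answers from them. The retrieve() function here is a hypothetical stand-in for the vector search described in the next section, and the handbook text is invented for the example.

```python
def retrieve(question: str) -> list[str]:
    # Placeholder: a real system would query a vector database here.
    return ["Employees accrue 1.5 vacation days per month (Handbook, section 4.2)."]

def build_rag_prompt(question: str) -> str:
    # Paste the retrieved passages into the prompt as "context".
    context = "\n".join(retrieve(question))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

print(build_rag_prompt("How many vacation days do I get per year?"))
```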
What Do Embeddings, Vectors and Vector Databases Have To Do With LLMs?
The core of RAG lies in a technique called “embeddings,” which allows us to convert text content into mathematical structures called “vectors.” If you took linear algebra in school, you may remember that vectors can be compared, giving us a simple way to determine how closely related two pieces of text are to each other. Additionally, with vector databases, we can store these vectors for easy retrieval and comparison. In our HR example, we can transform our HR data (such as employee handbooks, policies, etc.) into vectors. When the user asks their HR question, we can also convert it into a vector and, using a little math magic (and our vector database), we can find the pieces of text that best match the user’s query. We then insert these matching passages into the prompt, giving the LLM the information it needs to answer the user’s query effectively.
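The “math magic” is usually cosine similarity between vectors. The tiny three-dimensional vectors below stand in for real embeddings, which typically have hundreds or thousands of dimensions, and the numbers are made up for illustration.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 means the vectors point the same way; values near 0 mean unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

handbook_chunk = np.array([0.9, 0.1, 0.3])   # "Vacation policy: 18 days per year..."
unrelated_chunk = np.array([0.1, 0.8, 0.2])  # "Parking passes are issued by facilities..."
question = np.array([0.8, 0.2, 0.4])         # "How much vacation do I get?"

print(cosine_similarity(question, handbook_chunk))   # higher score -> better match
print(cosine_similarity(question, unrelated_chunk))  # lower score -> worse match
```

A vector database does this comparison at scale, returning the closest stored chunks for a given query vector.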
What Is a Cognitive Architecture like ReAct?
At this point, you essentially have a functional chatbot. However, it can be further improved by incorporating a cognitive architecture. This involves chaining multiple prompts and LLM responses together to create an even more accurate answer. ReAct (short for Reason-Act) is one such architecture. It uses a prompt to first ask the LLM to reason out the answer; that result is then included in the next prompt, which asks the LLM to act on that information. This takes prompt instructions such as “think about the problem step by step” to the next level by introducing actual steps. In more advanced implementations, this back-and-forth may happen several times.
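A stripped-down sketch of that two-step chain is below. The call_llm parameter is a hypothetical stand-in for whichever chat API you use, and real ReAct implementations add tool calls and repeat the loop.

```python
from typing import Callable

def react_answer(question: str, call_llm: Callable[[str], str]) -> str:
    # Step 1 (Reason): ask the model to think through the problem without answering.
    reasoning = call_llm(
        "Think about the problem step by step, but do not answer yet.\n"
        f"Question: {question}"
    )
    # Step 2 (Act): feed that reasoning back and ask for the final answer.
    return call_llm(
        f"Here is your earlier reasoning:\n{reasoning}\n\n"
        f"Now act on that reasoning and answer the question: {question}"
    )
```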
Cognitive architectures open up the possibility of more advanced capabilities, such as the chatbot running code or using tools. All of this leads to better responses, albeit at the cost of additional processing and time.
What Is Fine-Tuning?
The final component to consider is fine-tuning your model. Fine-tuning is a process that makes the LLM more purpose-built for our situation. It involves feeding additional information to your LLM to tailor its responses, which is beneficial if you’re seeking to reduce hallucinations or adapt the output to a highly specific format. This sounds perfect, but it comes at a cost: once fine-tuned, models often become less suitable for general functions. Additionally, fine-tuning can be expensive and require a lot of data.
A good rule of thumb is to use fine-tuning when you want to control the structure of the response. Imagine that, instead of answering general HR questions, we tasked the AI bot with filling out detailed HR legal documents. We would fine-tune the model by giving it many examples of these specific legal documents so that it learned and understood their structure, greatly improving its ability to produce the right format. We’d still use RAG to provide the correct information for the document in most circumstances; in general, RAG remains the best approach for providing the right information to the user and reducing hallucinations.
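In practice, fine-tuning data is usually just a large file of example inputs and outputs. The JSONL “messages” layout below is one common format (your provider’s docs will specify the exact schema), and the document text is invented for the example.

```python
# A sketch of fine-tuning data: many example pairs showing the exact
# document structure we want the model to learn.
import json

training_examples = [
    {
        "messages": [
            {"role": "user", "content": "Draft a severance agreement for employee #1042."},
            {"role": "assistant", "content": "SEVERANCE AGREEMENT\n1. Parties...\n2. Term...\n3. Consideration..."},
        ]
    },
    # ...hundreds more examples with the same structure
]

with open("finetune.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")
```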
How Do You Become Ready for AI?
LLMs and AI are already transforming how we work. With this understanding of how a chatbot works, you’re ready to start preparing your company for the AI revolution. If you’re ready to find your path to AI, don’t hesitate to contact us today!