
RAG Chatbot using LangChain and OpenAI


In this blog post, we will explore the process of building a sophisticated chatbot by leveraging the capabilities of Retrieval-Augmented Generation (RAG), alongside powerful tools and platforms such as OpenAI, DeepLake, and LangChain. We will delve into each component, discussing how RAG enhances the chatbot's ability to provide relevant and context-aware responses, and how OpenAI's models can be utilized to generate natural language interactions. Additionally, we will cover the role of DeepLake in managing and retrieving data efficiently, and how LangChain can facilitate the smooth integration of these technologies. By the end of this tutorial, you will have a comprehensive understanding of how to create a fully functional chatbot that intelligently interacts with users.

The components we are utilizing include:

  1. RAG (Retrieval-Augmented Generation): RAG is an innovative approach that combines the strengths of both retrieval-based and generative models. It allows a system to access and retrieve relevant information from a vast database while also generating coherent and contextually appropriate responses. This methodology enhances the quality of responses by grounding them in real-world data, making them more accurate and informative.

  2. LangChain: LangChain is a framework designed to facilitate the development of applications powered by language models. It provides tools and components for managing the workflows involved in natural language processing tasks, making it easier to build and scale applications that require understanding and generating human language.

  3. OpenAI: OpenAI is a leading artificial intelligence research organization known for developing advanced AI models, including the GPT series. These models are capable of generating human-like text and understanding complex language structures, enabling a wide range of applications from chatbots to content creation and beyond.

  4. DeepLake: DeepLake is a data management platform optimized for AI workflows. It enables efficient storage, retrieval, and processing of large datasets, particularly for deep learning applications. With its focus on performance and scalability, DeepLake helps streamline the data handling process crucial for training and deploying AI models.

  5. FastAPI: FastAPI is a modern web framework designed for building APIs quickly and efficiently, leveraging Python’s asynchronous programming capabilities. Its intuitive interface and robust features make it ideal for developing high-performance applications that can handle a large number of concurrent users seamlessly. FastAPI not only simplifies the process of creating endpoints but also provides automatic validation and interactive documentation, enhancing the development experience and ensuring that applications are both reliable and easy to use.

Let’s dive deeper into the specifics of this ChatBot.

The flow will look as shown below.

I utilized Visual Studio Code to develop this project. To ensure that the Python libraries I install do not interfere with the base Python installation on my system, I created a virtual environment.

To begin our project, let's set up a folder that will contain all the necessary files. Inside this folder, create a blank file named app.py, which will serve as the main application file where we'll write our code. Additionally, create another file called requirements.txt. This file will be used to list all the dependencies and packages that our application will need as we develop it further. We will incrementally add entries to both files throughout this blog series, so they start off empty for now.

Once you have created the project in Visual Studio Code, open the integrated terminal. In the terminal, execute the command

python -m venv .venv

This command sets up a new virtual environment within a folder named .venv in my project directory. By doing this, I can manage my project dependencies more effectively and keep them isolated from other projects or the global Python installation.

As you work on your project, you will observe that a folder named .venv has been created within the project directory. This folder serves as a virtual environment, isolating your project’s dependencies and settings from other projects you may have on your system.

To activate the virtual environment, you need to run the following command in your terminal:

source .venv/bin/activate

This command instructs your shell to execute the activate script located in the bin directory of the .venv virtual environment. Once you run this command, your terminal session will switch to using the isolated Python environment created, allowing you to manage dependencies and packages separately from your global Python installation.

Next, we will install all the necessary libraries required for this project by utilizing the requirements.txt file. This file contains a list of all the dependencies along with their specific versions, ensuring that we have a consistent environment for our project. To install these libraries, we'll use a package manager like pip, which will read the requirements.txt file and automatically download and install each listed library, allowing us to set up our project efficiently.

pip install -r requirements.txt

The requirements.txt file will look like this:

fastapi[standard]==0.113.0
langchain
langchain-community
beautifulsoup4
langchain-openai
langchain-deeplake
tiktoken
pydantic
unstructured
selenium

FastAPI

After completing the initial setup, the next step is to create the foundational structure for your FastAPI application. This involves establishing the basic framework that will support your API's functionality, including defining the directory structure, setting up configuration files, and initializing the main application instance. This skeleton will serve as the groundwork for implementing endpoints, integrating middleware, and managing dependencies as you develop your project further.

Add the following code to your blank app.py file:

from fastapi import FastAPI

app = FastAPI()


@app.get("/chat/")
async def chat(query: str):
    return {query}

Run it

fastapi dev app.py

 ╭────────── FastAPI CLI - Development mode ───────────╮
 │                                                     │
 │  Serving at: http://127.0.0.1:8000                  │
 │                                                     │
 │  API docs: http://127.0.0.1:8000/docs               │
 │                                                     │
 │  Running in development mode, for production use:   │
 │                                                     │
 │  fastapi run                                        │
 │                                                     │
 ╰─────────────────────────────────────────────────────╯

INFO:     Will watch for changes in these directories: ['/home/user/code/awesomeapp']
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [2248755] using WatchFiles
INFO:     Started server process [2248757]
INFO:     Waiting for application startup.
INFO:     Application startup complete.

The endpoint can be tested through the URL:

http://127.0.0.1:8000/chat/?query=what is latest in AI

Now we have a Python-based REST API running, which we can extend to act as a chatbot.

Let's add imports to our app.py file for all the libraries we will need for the project.

from fastapi import FastAPI
import os

from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_deeplake.vectorstores import DeeplakeVectorStore
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import SeleniumURLLoader
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.prompts import PromptTemplate

OpenAI

OpenAI is an American artificial intelligence (AI) research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary OpenAI Limited Partnership. OpenAI conducts AI research with the declared intention of promoting and developing friendly AI. OpenAI systems run on an Azure-based supercomputing platform from Microsoft.

The OpenAI API is powered by a diverse set of models with different capabilities and price points.

In order to utilize OpenAI's features, we need to obtain an API key from OpenAI. Once we have the key, make sure to set it as an environment variable named OPENAI_API_KEY. This step is essential for allowing your application to securely authenticate and connect to OpenAI's services.

For simplicity I am keeping the key in the app.py file, but for production deployments it should live in a secure configuration source, since the key is sensitive.


os.environ["OPENAI_API_KEY"] = 'PUT KEY HERE'
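A safer pattern is to read the key from the environment rather than hard-coding it. Here is a minimal sketch; `get_openai_key` is a hypothetical helper, not part of any library:

```python
import os

# Hypothetical helper: fetch the API key from the environment and fail fast
# if it is missing, instead of embedding the secret in source code.
def get_openai_key() -> str:
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key
```

With this in place, the key is supplied at deployment time (for example via the shell or a secrets manager) and never committed to the repository.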

LangChain

LangChain is a framework for developing applications powered by large language models (LLMs).

The LangChain ecosystem is split into different packages, which allow you to choose exactly which pieces of functionality to install.

Ecosystem packages

All packages in the LangChain ecosystem depend on langchain-core, which contains base classes and abstractions that other packages use. The dependency graph below shows how the different packages are related. A directed arrow indicates that the source package depends on the target package.

For our project we will need following packages which were already installed through requirements.txt file.

langchain
langchain-community
langchain-openai
langchain-deeplake

WebBaseLoader

We will use the article at https://techcrunch.com/2025/05/20/google-rolls-out-project-mariner-its-web-browsing-ai-agent/ as our knowledge base.

The LangChain WebBaseLoader class loads all text from HTML web pages into a document format that we can use downstream.

To bypass SSL verification errors during fetching, you can set the "verify" option:

loader.requests_kwargs = {'verify':False}

loader_multiple_pages = WebBaseLoader("https://techcrunch.com/2025/05/20/google-rolls-out-project-mariner-its-web-browsing-ai-agent/")
loader_multiple_pages.requests_kwargs = {'verify':False}
docs_not_splitted = loader_multiple_pages.load()

The CharacterTextSplitter is a component of the Langchain library that specializes in breaking down text into smaller segments based on character count. This functionality is particularly useful for processing large bodies of text, as it allows for more manageable chunks that can be easily handled in various applications. By defining a specific character limit, the CharacterTextSplitter ensures that text is divided in a consistent manner, making it easier to analyze, manipulate, or feed into other systems or models.

text_splitter=CharacterTextSplitter(chunk_size=1000,chunk_overlap=0)
docs=text_splitter.split_documents(docs_not_splitted)

OpenAIEmbeddings offers a feature designed specifically for converting text into numerical embeddings, which are useful for various machine learning tasks. These embeddings capture the semantic meaning of the text, allowing for efficient comparison, clustering, and retrieval of information, by transforming words and sentences into a format that can be easily processed by algorithms.

embeddings=OpenAIEmbeddings(model='text-embedding-ada-002')
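Similarity between embeddings is usually measured with cosine similarity. A minimal sketch with toy 3-dimensional vectors (real text-embedding-ada-002 vectors have 1536 dimensions, so these numbers are purely illustrative):

```python
import math

# Cosine similarity: dot product of the vectors divided by the product
# of their magnitudes. Identical directions score 1.0; unrelated or
# opposing directions score lower.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v1 = [0.1, 0.3, 0.5]
v2 = [0.1, 0.3, 0.5]
v3 = [0.5, -0.2, 0.0]

print(cosine_similarity(v1, v2))  # identical vectors score close to 1.0
print(cosine_similarity(v1, v3))  # dissimilar vectors score lower
```

This is the comparison the vector store performs internally when we later call similarity_search.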

DeepLake Vector Database

A vector store stores embedded data and performs similarity search. The DeepLake vector database serves as a robust solution for storing and managing embeddings. This specialized database is designed to efficiently handle high-dimensional data, enabling users to retrieve, search, and analyze embeddings effectively. With features that support scalability and performance optimization, DeepLake facilitates the organization of complex datasets, making it an ideal choice for applications in machine learning, natural language processing, and other data-intensive fields. By leveraging DeepLake, users can ensure that their embeddings are securely stored while maintaining easy access for various analytical tasks.

Create a local dataset

To begin, we set up a local dataset by creating a directory at the path ./my_deeplake/. Once our directory is established, we populate it with the data that we want to use for our project. After creating and filling our dataset, we can initiate a similarity search. It’s important to note that when using the Deeplake+LangChain integration, it operates with Deep Lake datasets in the background.


db = DeeplakeVectorStore(dataset_path="./my_deeplake/", embedding_function=embeddings, overwrite=True)
ids = db.add_documents(docs)

Prompt templates

Prompt templates serve as a vital tool in transforming unrefined user information into a structured format that can be effectively utilized by a Large Language Model (LLM). In this particular scenario, the initial data provided by the user consists of a simple message. This raw input, while informative, lacks the organization and clarity needed for optimal processing. The purpose of the prompt templates is to refine and format this input, enabling the LLM to interpret, understand, and generate relevant responses or outcomes based on the user’s request. By applying these templates, we enhance the communication between the user and the model, ensuring that the LLM can work with well-defined prompts that capture the user's intent more accurately.

# create prompt template
template = """You are an exceptional customer support chatbot that gently answers questions.

You know the following context information.

{chunks_formatted}

Answer the following question from a customer using the context; if the answer is not found there, answer from the LLM's own knowledge.

Question: {query}

Answer:"""

prompt = PromptTemplate(
    input_variables=["chunks_formatted", "query"],
    template=template,
)

RAG

The prompt template is designed to enhance the interaction with a language model by seamlessly incorporating relevant information retrieved from a Vector database, which is integral to the Retrieval-Augmented Generation (RAG) framework. This process involves sourcing relevant content based on a user's initial input and then merging that content with the user's query. The result is a well-structured and contextually rich query that is fed into the language model, enabling it to generate more accurate and meaningful responses. This approach not only improves the quality of the output but also ensures that the generated content is directly aligned with the user's intent and needs.

The {chunks_formatted} placeholder in the template above is where relevant information from the knowledge base is injected, augmenting the user's query. This process is designed to optimize the retrieval of accurate and pertinent information, ensuring that the responses provided align closely with the user's inquiry. By augmenting the query in this manner, we increase the chances of obtaining more precise and contextually appropriate answers.
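Conceptually, this augmentation step is just string substitution. A minimal sketch with a shortened template and hypothetical retrieved chunks (the real code uses prompt.format on the full PromptTemplate):

```python
# Shortened stand-in for the full prompt template used in the app.
template = (
    "You know the following context information.\n"
    "{chunks_formatted}\n"
    "Question: {query}\n"
    "Answer:"
)

# Hypothetical chunks, as would be returned by a similarity search.
retrieved_chunks = [
    "Project Mariner is an experimental AI agent that browses websites.",
    "It was announced during Google I/O.",
]

# Merge the retrieved context and the user's question into one prompt.
prompt_formatted = template.format(
    chunks_formatted="\n\n".join(retrieved_chunks),
    query="What is Project Mariner?",
)

print(prompt_formatted)
```

The resulting string is what actually gets sent to the language model, so the model answers with the knowledge-base content in front of it.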

Query dataset

docs = db.similarity_search(query)
retrieved_chunks = [doc.page_content for doc in docs]

OpenAI

At this stage, we are ready to create an instance of our chat model object using ChatOpenAI. Once the model is instantiated, we can proceed to generate chat completions. This involves utilizing the capabilities of the model to produce coherent and contextually relevant responses based on the input it receives.

# fill the prompt template with the retrieved chunks and the user's query
prompt_formatted = prompt.format(chunks_formatted="\n\n".join(retrieved_chunks), query=query)

model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
messages = model.invoke(prompt_formatted)

Send the response back to the caller through API return.

return {messages.content}

Test the App

Run the server with

fastapi dev app.py

Call the Url in browser

http://127.0.0.1:8000/chat/?query=what is latest in AI

We see the output being augmented from the website we used as our knowledge base:

["The latest in AI is Google's Project Mariner, an experimental AI agent that browses and uses websites. It was announced during Google I/O 2025 and has been updated to take on nearly a dozen tasks at a time. U.S. subscribers to Google's AI Ultra plan will have access to Project Mariner, with support for more countries coming soon. Additionally, Project Mariner's capabilities will be available through the Gemini API and Vertex AI for developers to build applications powered by the agent."]

Deactivate the Virtual Environment

Once we are done working on the project, we can deactivate the virtual environment. This way, when you run python, it won't use the isolated environment and the packages installed there.

deactivate
