Building a Local Knowledge Base Q&A Assistant with LangChain and Ollama Mistral
Sometimes you need an AI assistant that can answer questions based on your own documents, not the public internet — and you want it to run entirely locally for privacy and control. In this project, I built a Q&A chatbot that uses:
a. LangChain for chaining LLMs with document retrieval
b. Ollama running the Mistral model locally
c. Vector database (Chroma) for semantic search
d. Streamlit for a simple chat UI
Why This Approach?
Cloud-based LLMs are powerful, but local setups have several advantages:
a. Privacy-first — No data leaves your machine
b. Custom knowledge base — You control exactly what the AI can reference
c. Offline capability — Works without an internet connection
d. Lower cost — No API subscription fees
Project Architecture
The project has four main components:
- Document ingestion — Load and chunk your documents for storage
- Vector database — Store document embeddings for semantic search
- LangChain RetrievalQA — Connect the LLM to your vector database
- Chat interface — A Streamlit UI for asking questions
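One possible way to lay these components out on disk (the file names here are illustrative, except chat.py, which the Streamlit app imports in Step 4):
- knowledge/ holds the source documents
- ingest.py covers Steps 1 and 2 (loading, chunking, embedding)
- vectorstore/ is where Chroma persists the embeddings
- chat.py covers Step 3 and exposes a get_qa_chain() helper
- app.py is the Streamlit UI from Step 4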
Step 1: Ingesting Documents
We load documents from a knowledge/ folder and split them into small, overlapping chunks so the retriever can match relevant passages and each chunk fits comfortably in the model's context window.
from langchain_unstructured import UnstructuredLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load one document from the knowledge/ folder
loader = UnstructuredLoader("knowledge/my_document.pdf")
documents = loader.load()

# Split into overlapping 500-character chunks so context carries across boundaries
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)
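The snippet above loads a single PDF. If knowledge/ holds several files, you can loop over the folder before splitting. A minimal sketch, assuming the relevant unstructured extras are installed for the file types involved:
from pathlib import Path

from langchain_unstructured import UnstructuredLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load every file in knowledge/ and collect the resulting documents
documents = []
for path in Path("knowledge").glob("*"):
    if path.is_file():
        documents.extend(UnstructuredLoader(str(path)).load())

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)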
Step 2: Embedding and Vector Storage
We use HuggingFace embeddings and Chroma to store and retrieve our document chunks.
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
# Small, fast sentence-transformer model for embedding the chunks
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Persist the Chroma collection to disk so it can be reloaded later
vectorstore = Chroma(
    persist_directory="vectorstore",
    embedding_function=embedding_model,
    collection_name="hr_policies",
)
vectorstore.add_documents(chunks)
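Before wiring in the LLM, it helps to sanity-check retrieval on its own. A quick check with a made-up query (the question string is just an example):
# Fetch the three chunks most similar to a sample question
hits = vectorstore.similarity_search("How many vacation days do employees get?", k=3)
for doc in hits:
    print(doc.metadata.get("source"), "->", doc.page_content[:120])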
Step 3: Connecting Ollama Mistral via LangChain
The retrieval chain combines a retriever with the Ollama Mistral LLM to answer user questions.
from langchain.chains import RetrievalQA
from langchain_ollama import OllamaLLM
# Talk to the locally running Ollama server; the model must be pulled first (ollama pull mistral)
llm = OllamaLLM(model="mistral")
retriever = vectorstore.as_retriever()
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
)
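At this point the chain already works from a plain Python script; a quick test with an example question:
# Ask a question directly, without any UI
response = qa_chain.invoke({"query": "What is the parental leave policy?"})
print(response["result"])
If you also want to display which chunks the answer was grounded in, pass return_source_documents=True to from_chain_type and read response["source_documents"].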
Step 4: Streamlit Chat UI
Finally, a lightweight Streamlit app lets users ask questions and see the AI’s responses.
import streamlit as st
from chat import get_qa_chain
st.title("Knowledge Base Q&A Assistant")
query = st.text_input("Ask a question")
if query:
    # Build the chain and run the user's question through it
    qa_chain = get_qa_chain()
    response = qa_chain.invoke({"query": query})
    st.write("### Assistant:")
    st.write(response["result"])
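The app imports get_qa_chain from a chat.py module that isn't shown above. One way it could look, assuming the vector store from Step 2 has already been built and persisted (the structure simply mirrors Steps 2 and 3, so treat this as a sketch rather than the exact module):
# chat.py - rebuilds the retrieval chain on top of the persisted vector store
from langchain.chains import RetrievalQA
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_ollama import OllamaLLM

def get_qa_chain():
    # Reopen the persisted Chroma collection with the same embedding model
    embedding_model = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    vectorstore = Chroma(
        persist_directory="vectorstore",
        embedding_function=embedding_model,
        collection_name="hr_policies",
    )
    # Point the chain at the local Mistral model served by Ollama
    llm = OllamaLLM(model="mistral")
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=vectorstore.as_retriever(),
    )
With the documents ingested and chat.py in place, the UI starts with streamlit run app.py (assuming the Streamlit script above is saved as app.py).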
Example use cases:
- HR policy assistants for internal company docs
- Technical documentation Q&A bots
- Customer support automation
- Research assistants for specific fields
Happy Coding 💻