Building a Local Knowledge Base Q&A Assistant with LangChain and Ollama Mistral
Sometimes you need an AI assistant that can answer questions based on your own documents, not the public internet — and you want it to run entirely locally for privacy and control. In this project, I built a Q&A chatbot that uses:
a. LangChain for chaining LLMs with document retrieval
b. Ollama running the Mistral model locally
c. Vector database (Chroma) for semantic search
d. Streamlit for a simple chat UI
Why This Approach?
Cloud-based LLMs are powerful, but local setups have several advantages:
a. Privacy-first — No data leaves your machine
b. Custom knowledge base — You control exactly what the AI can reference
c. Offline capability — Works without an internet connection
d. Lower cost — No API subscription fees
Project Architecture
The project has four main components:
- Document ingestion — Load and chunk your documents for storage
- Vector database — Store document embeddings for semantic search
- LangChain RetrievalQA — Connect the LLM to your vector database
- Chat interface — A Streamlit UI for asking questions
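One possible way to lay these components out on disk (the file names here are illustrative, except chat.py, which the Streamlit app imports in Step 4):
- knowledge/ holds the source documents
- ingest.py covers Steps 1 and 2 (loading, chunking, embedding)
- vectorstore/ is where Chroma persists the embeddings
- chat.py covers Step 3 and exposes a get_qa_chain() helper
- app.py is the Streamlit UI from Step 4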
Step 1: Ingesting Documents
We load documents from a knowledge/ folder and split them into small, overlapping chunks so the retriever can match relevant passages and each chunk fits comfortably in the model's context window.
from langchain_unstructured import UnstructuredLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load one document from the knowledge/ folder
loader = UnstructuredLoader("knowledge/my_document.pdf")
documents = loader.load()

# Split into overlapping 500-character chunks so context carries across boundaries
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)
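The snippet above loads a single PDF. If knowledge/ holds several files, you can loop over the folder before splitting. A minimal sketch, assuming the relevant unstructured extras are installed for the file types involved:
from pathlib import Path

from langchain_unstructured import UnstructuredLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load every file in knowledge/ and collect the resulting documents
documents = []
for path in Path("knowledge").glob("*"):
    if path.is_file():
        documents.extend(UnstructuredLoader(str(path)).load())

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)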
Step 2: Embedding and Vector Storage
We use HuggingFace embeddings and Chroma to store and retrieve our document chunks.
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
# Small, fast sentence-transformer model for embedding the chunks
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Persist the Chroma collection to disk so it can be reloaded later
vectorstore = Chroma(
    persist_directory="vectorstore",
    embedding_function=embedding_model,
    collection_name="hr_policies",
)
vectorstore.add_documents(chunks)
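Before wiring in the LLM, it helps to sanity-check retrieval on its own. A quick check with a made-up query (the question string is just an example):
# Fetch the three chunks most similar to a sample question
hits = vectorstore.similarity_search("How many vacation days do employees get?", k=3)
for doc in hits:
    print(doc.metadata.get("source"), "->", doc.page_content[:120])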
Step 3: Connecting Ollama Mistral via LangChain
The retrieval chain combines a retriever with the Ollama Mistral LLM to answer user questions.
from langchain.chains import RetrievalQA
from langchain_ollama import OllamaLLM
# Talk to the locally running Ollama server; the model must be pulled first (ollama pull mistral)
llm = OllamaLLM(model="mistral")
retriever = vectorstore.as_retriever()
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
)
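At this point the chain already works from a plain Python script; a quick test with an example question:
# Ask a question directly, without any UI
response = qa_chain.invoke({"query": "What is the parental leave policy?"})
print(response["result"])
If you also want to display which chunks the answer was grounded in, pass return_source_documents=True to from_chain_type and read response["source_documents"].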
Step 4: Streamlit Chat UI
Finally, a lightweight Streamlit app lets users ask questions and see the AI’s responses.
import streamlit as st
from chat import get_qa_chain
st.title("Knowledge Base Q&A Assistant")
query = st.text_input("Ask a question")
if query:
    # Build the chain and run the user's question through it
    qa_chain = get_qa_chain()
    response = qa_chain.invoke({"query": query})
    st.write("### Assistant:")
    st.write(response["result"])
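The app imports get_qa_chain from a chat.py module that isn't shown above. One way it could look, assuming the vector store from Step 2 has already been built and persisted (the structure simply mirrors Steps 2 and 3, so treat this as a sketch rather than the exact module):
# chat.py - rebuilds the retrieval chain on top of the persisted vector store
from langchain.chains import RetrievalQA
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_ollama import OllamaLLM

def get_qa_chain():
    # Reopen the persisted Chroma collection with the same embedding model
    embedding_model = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    vectorstore = Chroma(
        persist_directory="vectorstore",
        embedding_function=embedding_model,
        collection_name="hr_policies",
    )
    # Point the chain at the local Mistral model served by Ollama
    llm = OllamaLLM(model="mistral")
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=vectorstore.as_retriever(),
    )
With the documents ingested and chat.py in place, the UI starts with streamlit run app.py (assuming the Streamlit script above is saved as app.py).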
Example use cases:
- HR policy assistants for internal company docs
- Technical documentation Q&A bots
- Customer support automation
- Research assistants for specific fields
Happy Coding 💻