RAG-Powered Local LLM Assistant

Your own private AI assistant with retrieval-augmented memory, running entirely on your own hardware so your data never leaves your network

Published on Jan 03, 2025

Reading time: 5 minutes.


Built with: Docker Compose, Ollama, Open WebUI, Chroma, PostgreSQL, Prometheus, and Grafana

What makes this special?

  • Complete Privacy: Your AI assistant runs entirely on your local infrastructure - no data leaves your network
  • Persistent Memory: Chroma vector database remembers everything you teach it, creating a truly personalized AI
  • Local LLM Power: Ollama runs powerful language models locally, so no internet connection is needed for AI responses
  • Your Data, Your Control: All documents, conversations, and embeddings stored securely on your own storage
  • Zero Subscription Costs: Open-source stack with no ongoing fees or usage limits
  • Self-Hosted Excellence: Full control over your AI assistant’s capabilities and data retention

How does it work?

  • Smart Document Processing: Upload your documents and the stack chunks, embeds, and indexes them in Chroma for semantic search
  • Intelligent Memory: Your AI remembers past conversations and can reference your uploaded documents
  • Local Processing: Everything runs on your hardware - documents, conversations, and AI responses stay private
  • Easy Setup: One-command deployment with Docker Compose for hassle-free installation
  • Model Flexibility: Choose from various open-source LLM models that run entirely offline
  • Real-time Monitoring: Track your AI’s performance and resource usage with built-in dashboards

What you need

  • Docker & Docker Compose installed
  • 8GB+ RAM (for running AI models locally)
  • Basic familiarity with Docker commands
  • Your own documents to create a personalized AI knowledge base

Source Code

https://github.com/Lforlinux/Opensource-LLM-RAG-Stack

How to deploy the infrastructure

Quick Start Deployment

# Clone the repository
git clone https://github.com/Lforlinux/Opensource-LLM-RAG-Stack.git
cd Opensource-LLM-RAG-Stack

# Quick start (includes model setup)
./start.sh

# Or manual setup:
# Start all services (includes Ollama)
docker-compose up -d

# Set up Ollama with a model
./scripts/setup-ollama.sh
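
Once the services are up, a quick sanity check confirms everything is responding; the ports follow the compose file shown below, so adjust if you have remapped them:

# Confirm all containers are running
docker-compose ps

# Ollama should list any pulled models
curl http://localhost:11434/api/tags

# Chroma heartbeat (API path varies by Chroma release)
curl -s http://localhost:8000/api/v2/heartbeat

# Open WebUI should answer on port 3000
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000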

Docker Compose Architecture

version: "3.8"

services:
  ollama:
    image: ollama/ollama:latest
    ports: ["11434:11434"]
    volumes:
      - ollama-data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_ORIGINS=*

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports: ["3000:8080"]
    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434
      - VECTOR_DB=chroma
      - DATABASE_URL=postgresql://user:password@postgres:5432/chatdb

  chroma:
    image: ghcr.io/chroma-core/chroma:latest
    ports: ["8000:8000"]
    environment:
      - CHROMA_DB_IMPL=duckdb+parquet

  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: chatdb

# Named volume referenced by the ollama service above; without this
# top-level declaration, docker-compose rejects the file.
# (Prometheus and Grafana services are omitted from this excerpt.)
volumes:
  ollama-data:

Architecture

(Diagram: RAG LLM Stack architecture)

RAG Stack Components

  • Open WebUI: User interface for chat interactions and document management
  • Ollama: Containerized LLM inference engine with model management
  • Chroma: Vector database for semantic search and embeddings storage
  • PostgreSQL: Relational database for chat history and document metadata
  • Prometheus: Metrics collection and monitoring
  • Grafana: Visualization and dashboard management

Key Features

RAG Implementation

  • Document Processing: Automatic chunking and embedding generation (see the sketch below)
  • Semantic Search: Vector similarity search for relevant context retrieval
  • Context Augmentation: Dynamic prompt enhancement with retrieved information
  • Chat History: Persistent conversation management with PostgreSQL
  • Model Management: Easy model switching and versioning with Docker volumes
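
As a rough illustration of the embedding step, you can exercise Ollama's embeddings endpoint directly; nomic-embed-text is an example model choice here, not something the stack requires:

# Pull an embedding model (example) and generate a test embedding
docker-compose exec ollama ollama pull nomic-embed-text
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "What is retrieval-augmented generation?"}'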

Monitoring & Observability

  • Service Health: Real-time monitoring of all stack components
  • Performance Metrics: Request rates, response times, and resource usage
  • Database Monitoring: PostgreSQL performance and query optimization
  • Vector DB Metrics: Chroma collection health and search performance
  • Grafana Dashboards: Pre-configured dashboards for comprehensive monitoring

Database Schema & Architecture

PostgreSQL Schema

-- Enable UUID generation used by the DEFAULT clauses below
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Chat Sessions Management
CREATE TABLE chat_sessions (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    user_id VARCHAR(255) NOT NULL,
    session_name VARCHAR(255),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Message Storage
CREATE TABLE chat_messages (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    session_id UUID REFERENCES chat_sessions(id),
    role VARCHAR(50) CHECK (role IN ('user', 'assistant', 'system')),
    content TEXT NOT NULL,
    token_count INTEGER DEFAULT 0
);

-- RAG Document Storage
CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    title VARCHAR(500),
    content TEXT NOT NULL,
    source VARCHAR(500),
    embedding_id VARCHAR(255), -- Chroma reference
    metadata JSONB DEFAULT '{}'::jsonb
);

-- Performance Indexes
CREATE INDEX idx_documents_content_gin ON documents 
USING gin(to_tsvector('english', content));
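
With the GIN index in place, keyword search over stored documents can be tested directly; a minimal sketch using the compose service and credentials shown above:

# Full-text search against the documents table
docker-compose exec postgres psql -U user -d chatdb -c \
  "SELECT title FROM documents WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'vector database');"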

RAG Implementation Guide

Document Upload & Processing

  1. Access Open WebUI: Navigate to http://localhost:3000
  2. Upload Documents: Support for PDF, TXT, and other formats
  3. Automatic Processing: System chunks documents and generates embeddings
  4. Vector Storage: Embeddings stored in Chroma for semantic search (verified below)
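
To verify that uploads actually landed in the vector store, you can query Chroma over HTTP. The exact path differs across Chroma releases (newer builds nest collections under tenant and database routes), so treat this v1-style call as an assumption:

# List Chroma collections after an upload (v1-style path; adjust per version)
curl -s http://localhost:8000/api/v1/collections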

Query with RAG

  1. User Query: Ask questions in Open WebUI interface
  2. Context Retrieval: System retrieves relevant chunks from Chroma
  3. Prompt Augmentation: Retrieved context enhances user prompts
  4. LLM Generation: Ollama generates responses using the augmented context (sketched below)
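
Open WebUI drives this pipeline for you, but the final generation step can be reproduced against Ollama's API; the model name and prompt below are placeholders:

# Generate a response from a context-augmented prompt (model name is an example)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Context: <retrieved chunks go here>\n\nQuestion: What does the document say about backups?",
  "stream": false
}'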

Monitoring & Observability

Prometheus Metrics

  • Service Health: up{job=~"prometheus|postgres_exporter"} (queried below)
  • Database Performance: PostgreSQL exporter metrics
  • Request Rates: HTTP request monitoring
  • Resource Usage: Container and system metrics
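
These metrics can also be pulled through Prometheus's HTTP API; this assumes Prometheus is exposed on its default port 9090, so check your compose mapping:

# Query service health via the Prometheus HTTP API (-g disables curl URL globbing)
curl -sg 'http://localhost:9090/api/v1/query?query=up{job=~"prometheus|postgres_exporter"}'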

Grafana Dashboards

(Screenshot: Grafana monitoring dashboard)

  • RAG Stack Overview: Service health and performance
  • Database Metrics: PostgreSQL performance monitoring
  • System Resources: CPU, memory, and disk usage
  • Request Analytics: API call patterns and response times

Production Deployment

Environment Configuration

# Production environment variables
export POSTGRES_PASSWORD=secure_password
export GRAFANA_ADMIN_PASSWORD=secure_admin_password
export OLLAMA_API_BASE_URL=https://your-ollama-instance.com
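
Note that these exports only take effect if the compose file references them via ${VAR} substitution; the excerpt earlier hardcodes credentials, so this is an assumption about the production variant. You can verify the rendered configuration:

# Confirm that variable substitution resolved as expected
docker-compose config | grep -i password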

Scaling Considerations

  • Horizontal Scaling: Multiple Ollama instances behind load balancer
  • Database Scaling: PostgreSQL read replicas for query performance
  • Vector DB Scaling: Chroma clustering for high availability
  • Monitoring: Prometheus federation for multi-instance monitoring

Security Best Practices

Infrastructure Security

  • Network Isolation: Container network security and service isolation
  • Environment Configuration: Secure environment variable management (example below)
  • Data Encryption: Encryption at rest and in transit
  • Access Control: Proper authentication and authorization
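
As a small, concrete example of the environment-variable point above, credentials can be generated rather than hand-typed (assumes openssl is available):

# Generate a random database password instead of committing one to the repo
export POSTGRES_PASSWORD="$(openssl rand -base64 24)"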

Data Protection

  • Backup Strategy: Automated backup for all persistent data
  • Data Privacy: No sensitive data logging in production
  • Secure Communication: HTTPS/TLS for all service communications
  • Container Security: Regular image updates and vulnerability scanning

Troubleshooting

Common Issues

  1. RAG Not Working - Document Upload Issues

    # Check Chroma connection
    curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/api/v2/heartbeat
    
  2. Database Connection Issues

    # Check PostgreSQL status
    docker-compose logs postgres
    
    # Verify database initialization
    docker-compose exec postgres psql -U user -d chatdb -c "\dt"
    
  3. Model Loading Problems

    # Check Ollama service
    docker-compose logs ollama
    
    # Verify model availability
    curl http://localhost:11434/api/tags
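
    # If no models are listed, pull one manually (model name is just an example)
    docker-compose exec ollama ollama pull llama3.2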
    

Future Enhancements

Planned Features

  • Multi-Model Support: Support for multiple LLM providers
  • Advanced RAG: Hybrid search with keyword and semantic matching
  • API Integration: RESTful API for external system integration
  • Multi-Tenant Support: Isolated environments for different users

Technical Improvements

  • High Availability: Multi-instance deployment with load balancing
  • Performance Optimization: Query optimization and caching strategies
  • Security Hardening: Enhanced authentication and authorization
  • Monitoring Enhancement: Advanced alerting and anomaly detection

Contributing

Development Setup

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/your-feature
  3. Make changes and test locally
  4. Commit changes: git commit -m "Add your feature"
  5. Push to branch: git push origin feature/your-feature
  6. Create Pull Request

Code Standards

  • Docker: Container optimization and security best practices
  • Database: PostgreSQL performance and schema optimization
  • Monitoring: Prometheus metrics and Grafana dashboard standards
  • Documentation: Clear setup and troubleshooting guides

Conclusion

This OpenSource LLM RAG Stack project demonstrates enterprise-grade AI infrastructure practices, showcasing:

  • Production-Ready RAG System with comprehensive monitoring
  • Containerized Microservices architecture for scalability
  • Vector Database Integration for semantic search capabilities
  • Observability with Prometheus and Grafana monitoring
  • Enterprise DevOps practices with Infrastructure as Code

The project serves as both a functional RAG system and a comprehensive example of modern AI infrastructure, making it an excellent addition to any AI/ML engineer’s portfolio.

Source Code: https://github.com/Lforlinux/Opensource-LLM-RAG-Stack