Docker, Monitoring, and Scaling
From Development to Production: Build Once, Deploy Everywhere
You've built the pieces. A robust backend with RAD-System patterns, a beautiful Angular frontend, orchestrated workflows, and flexible LLM provider switching. Now comes the infrastructure: how do you reliably deploy, scale, and monitor this system in production?
This article is the bridge from architecture to operations: Docker, Docker Compose, environment parity, health checks, monitoring, and the scaling strategies that let you handle 10 users or 10,000.
The Deployment Philosophy: Artifact-Based Distribution
The RAG System follows a strict principle: build once, deploy everywhere.
┌─────────────────────────────────────┐
│ Development Environment             │
│   npm run build                     │
│   npm test                          │
│   docker build                      │
└──────────────┬──────────────────────┘
               │
               ▼
      ┌────────────────┐
      │  Docker Image  │
      │  (immutable)   │
      └────────┬───────┘
               │
     ┌─────────┴──┬────────────┐
     ▼            ▼            ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│   Dev    │ │ Staging  │ │   Prod   │
│   env    │ │   env    │ │   env    │
│ (custom) │ │(mirrored)│ │ (locked) │
└──────────┘ └──────────┘ └──────────┘
Same artifact everywhere. Only environment variables and secrets differ.
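In Compose terms, this can look like the following sketch (the override file name and registry path are illustrative, not from the repo): every environment runs the identical image and only swaps its `env_file`.

```yaml
# docker-compose.prod.yml — hypothetical per-environment override
services:
  backend:
    image: myregistry.com/rag-backend:v1.2.0   # same immutable image in every environment
    env_file:
      - .env.prod   # the only thing that changes between dev, staging, and prod
```

Launched with `docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d`, the override is layered onto the base file, so the image reference never varies between environments.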
Docker Compose Architecture
The system's Docker Compose orchestrates all services:
# Core services
services:
  # Qdrant: Vector database
  qdrant:
    image: qdrant/qdrant:v1.16
    ports: ["6333:6333", "6334:6334"]
    volumes: ["./Volumes/qdrant:/qdrant/storage"]
    networks: ["rag_network"]

  # PostgreSQL: Relational data
  postgresdb:
    image: postgres:16-alpine
    ports: ["5432:5432"]
    volumes: ["./Volumes/postgres:/var/lib/postgresql/data"]
    networks: ["rag_network"]

  # Redis: Caching & sessions
  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]
    volumes: ["./Volumes/redis:/data"]
    networks: ["rag_network"]

  # n8n: Workflow orchestration
  n8n:
    build: "./Images/n8n"
    ports: ["5678:5678"]
    volumes: ["./Volumes/n8n:/home/node/.n8n"]
    networks: ["rag_network"]

  # FastAPI: NLP services
  fastapi:
    build: "./Images/fastapi"
    ports: ["8000:8000"]
    environment:
      - EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
      - MAX_WORKERS=4
    networks: ["rag_network"]

  # NestJS Backend
  backend:
    build: "./Images/backend"
    ports: ["3000:3000"]
    environment:
      - DATABASE_URL=postgres://user:pass@postgresdb:5432/rag
      - REDIS_URL=redis://:password@redis:6379
      - N8N_BASE_URL=http://n8n:5678
    networks: ["rag_network"]
    depends_on:
      - postgresdb
      - redis
      - n8n

  # Angular Frontend (dev server)
  frontend:
    build: "./Images/frontend"
    ports: ["4200:4200"]
    networks: ["rag_network"]

networks:
  rag_network:
    driver: bridge
Key principles:
- No external databases: Everything self-contained
- Bind mounts: Data persists across restarts via filesystem volumes
- Network isolation: Services communicate via Docker network
- Environment variables: Configuration external to images
The Build Process
The DevOps/builder/ directory contains production-grade build scripts:
# Build backend
./DevOps/builder/build-be.sh dist
# Build frontend
./DevOps/builder/build-fe.sh prod
# Build Docker images
docker-compose build
# Or build specific services
docker-compose build backend fastapi
Backend Build Pipeline
#!/bin/bash
# 1. Compile TypeScript
npm run build
# 2. Prune dev dependencies
npm ci --only=production
# 3. Copy to Docker volume
cp -r dist node_modules ./Volumes/backend/app/
# 4. Create versioned tarball
tar czf ./update/backend-v1.0.0.tar.gz -C ./Volumes/backend .
echo "✅ Backend built: backend-v1.0.0.tar.gz"
Why tarballs?
- Versioning: Each release is an immutable artifact
- Rollback: Quick revert to previous version
- Offline deployment: No need for npm registry during deployment
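The rollback path falls out of this naturally: re-extract the previous tarball over the app volume. A hypothetical sketch follows; the paths, version numbers, and directory layout are illustrative, not taken from the repo.

```shell
#!/bin/bash
# Hypothetical rollback sketch: restore a previous versioned artifact.
set -euo pipefail

rollback() {
  local app_dir="$1" artifact="$2"
  rm -rf "$app_dir"            # wipe the currently deployed build
  mkdir -p "$app_dir"
  tar xzf "$artifact" -C "$app_dir"
  echo "Rolled back to $(basename "$artifact")"
}

# Self-contained demo: fabricate a fake v0.9.0 artifact, then roll back to it
work=$(mktemp -d)
mkdir -p "$work/build"
echo "v0.9.0" > "$work/build/VERSION"
tar czf "$work/backend-v0.9.0.tar.gz" -C "$work/build" .

rollback "$work/app" "$work/backend-v0.9.0.tar.gz"
cat "$work/app/VERSION"   # → v0.9.0
```

Because every release is an immutable tarball, "revert" is just "extract a different file" — no npm registry, no rebuild.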
Frontend Build Pipeline
#!/bin/bash
# 1. Install dependencies
npm ci
# 2. Build production bundle (optimized)
ng build --configuration production --output-hashing all
# 3. Copy to Docker volume
cp -r dist/rag-fe ./Volumes/frontend/html/
# 4. Compress
tar czf ./update/frontend-v1.0.0.tar.gz -C ./Volumes/frontend .
echo "✅ Frontend built: frontend-v1.0.0.tar.gz"
Environment Configuration
Each environment (dev, staging, prod) has its own .env file:
.env.dev:
NODE_ENV=development
DEBUG=true
DATABASE_URL=postgres://dev:dev@localhost:5432/rag_dev
REDIS_URL=redis://localhost:6379
JWT_SECRET=dev-secret-no-security
LOG_LEVEL=debug
.env.prod:
NODE_ENV=production
DEBUG=false
DATABASE_URL=postgres://prod_user:${SECURE_DB_PASS}@db-prod.example.com:5432/rag
REDIS_URL=redis://:${REDIS_PASSWORD}@redis-prod:6379
JWT_SECRET=${PROD_JWT_SECRET} # From vault
LOG_LEVEL=warn
RATE_LIMIT=1000 # Requests per minute
Loading order (first found wins):
- Command-line arguments
- Environment variables
- .env file
- RagConfigService (database)
- Defaults in code
This allows:
- Local overrides for debugging
- Secrets management via environment
- Database-stored dynamic config
- Safe fallbacks
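The "first found wins" order above can be sketched as a small resolver. This is an illustrative shell version — the key names and `.env` path are assumptions, and a real RagConfigService lookup would slot in before the code default.

```shell
#!/bin/bash
# Sketch of "first found wins" config resolution.
set -euo pipefail

ENV_FILE="${ENV_FILE:-.env}"

resolve() {
  local key="$1" default="$2" cli_value="${3:-}"
  if [ -n "$cli_value" ]; then echo "$cli_value"; return; fi      # 1. Command-line argument
  if [ -n "${!key:-}" ]; then echo "${!key}"; return; fi          # 2. Environment variable
  if [ -f "$ENV_FILE" ] && grep -q "^${key}=" "$ENV_FILE"; then   # 3. .env file
    grep "^${key}=" "$ENV_FILE" | head -n1 | cut -d= -f2-
    return
  fi
  echo "$default"                                                 # 4. Default in code
}

resolve LOG_LEVEL info                    # → info  (nothing set, default wins)
LOG_LEVEL=debug resolve LOG_LEVEL info    # → debug (environment beats the default)
resolve LOG_LEVEL info warn               # → warn  (CLI argument beats everything)
```

The same precedence lets you override a single value locally (`LOG_LEVEL=debug npm start`) without touching any file.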
Volume Management: Data Persistence
Bind mounts map filesystem directories directly into containers:
services:
  qdrant:
    volumes:
      - ./Volumes/qdrant:/qdrant/storage   # Host path → Container path
  postgres:
    volumes:
      - ./Volumes/postgres:/var/lib/postgresql/data
  redis:
    volumes:
      - ./Volumes/redis:/data
Why bind mounts? Direct filesystem access simplifies backups, makes volumes portable, and keeps data structured and visible.
Backup Strategy:
#!/bin/bash
# Backup all volumes
BACKUP_DIR="./backups/$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BACKUP_DIR"
echo "Backing up volumes to $BACKUP_DIR..."

# Qdrant: trigger a full storage snapshot via its HTTP API
curl -s -X POST http://localhost:6333/snapshots

# PostgreSQL
docker exec rag_postgres pg_dump \
  -U postgres rag > "$BACKUP_DIR/postgres.sql"

# Redis: SAVE blocks until dump.rdb is fully written
# (BGSAVE is asynchronous and could be copied mid-write)
docker exec rag_redis redis-cli SAVE
docker cp rag_redis:/data/dump.rdb "$BACKUP_DIR/redis.rdb"

# Tar everything
tar czf "./backups/backup-$(date +%Y%m%d).tar.gz" "$BACKUP_DIR"
echo "✅ Backup complete: $BACKUP_DIR"
Health Checks and Auto-Healing
Every container declares a health check:
services:
  backend:
    image: rag-backend:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  fastapi:
    image: rag-fastapi:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 5s
      retries: 2
Health endpoints (declared in each service):
@Get('/health')
@Public() // No auth required
async getHealth(): Promise<{ status: string; checks: any }> {
  const services = {
    database: await this.checkDatabase(),
    redis: await this.checkRedis(),
    n8n: await this.checkN8n()
  };
  // Only the service checks decide overall health; uptime is informational
  const allHealthy = Object.values(services)
    .every(check => check.status === 'ok');
  return {
    status: allHealthy ? 'healthy' : 'degraded',
    checks: { ...services, uptime: process.uptime() }
  };
}
A failed health check marks the container unhealthy; paired with a restart policy or an orchestrator (Docker Swarm, Kubernetes), the container is then replaced automatically.
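On its own, the Docker engine only flips the container's status to `unhealthy`. One common way to turn that status into an actual restart — an assumption, not part of this stack — is the community autoheal sidecar:

```yaml
services:
  backend:
    restart: unless-stopped       # restarts the process if it crashes or exits
    labels:
      - autoheal=true             # opt this container into autoheal's watch list
  autoheal:
    image: willfarrell/autoheal   # restarts containers whose health check fails
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
```

In Swarm or Kubernetes the orchestrator handles this natively, so the sidecar is only needed for plain Compose setups.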
Networking and Service Discovery
Services communicate by name:
// Backend connecting to other services
const POSTGRES_URL = process.env.DATABASE_URL; // postgres://postgresdb:5432/rag
const REDIS_URL = process.env.REDIS_URL; // redis://redis:6379
const N8N_URL = process.env.N8N_BASE_URL; // http://n8n:5678
const FASTAPI_URL = process.env.FASTAPI_URL; // http://fastapi:8000
Docker's internal DNS resolves these names to container IPs automatically.
Port mapping:
Local Network → Docker Bridge → Container Port
localhost:3000 → 172.17.0.2:3000 → backend:3000
localhost:5432 → 172.17.0.3:5432 → postgres:5432
Scaling Strategies
Horizontal Scaling: Multiple Workers
For the compute-heavy FastAPI service:
services:
  fastapi:
    image: rag-fastapi:latest
    # Don't pin one host port per copy: with replicas, scale via
    # `docker compose up -d --scale fastapi=3` and let the reverse
    # proxy reach every replica through the service name
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: "0.5"
          memory: 1G
nginx distributes incoming requests across the workers:
Client Request
      ↓
nginx (reverse proxy)
      ├→ FastAPI replica 1
      ├→ FastAPI replica 2
      └→ FastAPI replica 3
nginx configuration:
upstream fastapi_pool {
    least_conn;  # Send each request to the least-loaded worker
    # Docker's DNS resolves "fastapi" to every running replica
    server fastapi:8000 max_fails=3 fail_timeout=30s;
}

server {
    listen 8000;
    location / {
        proxy_pass http://fastapi_pool;
        proxy_set_header Host $host;
        proxy_connect_timeout 30s;
        proxy_read_timeout 60s;
    }
}
Vertical Scaling: Resource Limits
Docker resource constraints:
services:
  backend:
    deploy:
      resources:
        limits:
          cpus: "1"          # Max 1 CPU core
          memory: 512M       # Max 512MB RAM
        reservations:
          cpus: "0.5"        # Guaranteed 0.5 core
          memory: 256M       # Guaranteed 256MB RAM
Monitor resource usage:
# Check container resource consumption
docker stats
# Output:
# CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O
# rag_backend 2.5% 195MiB / 512MiB 38% 3.2MB / 1.5MB
# rag_fastapi 15% 420MiB / 1GiB 42% 8.6MB / 4.2MB
# rag_postgres 8.2% 285MiB / 2GiB 14% 12MB / 8MB
Database Scaling
PostgreSQL replication (for reads):
# Illustrative sketch: the official postgres image needs manual replication
# setup; turnkey variables of this shape come from images like bitnami/postgresql
services:
  postgres-primary:
    image: postgres:16
    environment:
      - POSTGRES_REPLICATION_MODE=master
  postgres-replica:
    image: postgres:16
    environment:
      - POSTGRES_REPLICATION_MODE=slave
      - POSTGRES_MASTER_SERVICE=postgres-primary
Connection pooling (reduce connection overhead):
services:
  pgbouncer:
    image: edoburu/pgbouncer   # community build; there is no bare "pgbouncer" image
    environment:
      - DATABASES_HOST=postgres-primary
      - POOL_MODE=transaction      # Server connections released between transactions
      - DEFAULT_POOL_SIZE=25
      - MAX_CLIENT_CONN=1000
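For the pooler to matter, the backend must connect through it rather than straight to Postgres. A sketch (6432 is PgBouncer's default listen port; the credentials are placeholders):

```yaml
services:
  backend:
    environment:
      # Point the app at PgBouncer instead of postgres-primary
      - DATABASE_URL=postgres://user:pass@pgbouncer:6432/rag
```

With `POOL_MODE=transaction`, 1000 client connections can share just 25 real server connections, since each one is held only for the duration of a transaction.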
Qdrant Clustering
For production scale, use Qdrant cluster:
services:
  qdrant-node1:
    image: qdrant/qdrant:latest
    environment:
      - QDRANT_URI_PATH=/qdrant
      - QDRANT__CLUSTER__ENABLED=true
      - QDRANT__CLUSTER__CONSENSUS__TICK_PERIOD_MS=100
  qdrant-node2:
    image: qdrant/qdrant:latest
    environment:
      - QDRANT_URI_PATH=/qdrant
      - QDRANT__CLUSTER__ENABLED=true
      - QDRANT_BOOTSTRAP__CLUSTER__INITIAL_PEER_TIMEOUT_MS=30000
Monitoring and Logging
Application Health Dashboard
Real-time metrics are exposed at /metrics (Prometheus format):
@Get('/metrics')
@Public()
async getMetrics(): Promise<string> {
  const metrics = {
    'rag_query_duration_seconds': this.queryDurations,
    'rag_embedding_errors_total': this.embeddingErrors,
    'rag_database_connection_pool_size': this.dbPool.size,
    'rag_n8n_workflow_executions_total': this.n8nExecutions,
    'rag_llm_provider_availability': this.providerHealth
  };
  return formatPrometheus(metrics);
}
Prometheus scrapes /metrics every 30 seconds:
global:
  scrape_interval: 30s

scrape_configs:
  - job_name: 'rag-backend'
    static_configs:
      - targets: ['localhost:3000']
  - job_name: 'rag-fastapi'
    static_configs:
      - targets: ['localhost:8000']
Logging Aggregation
All services log to stdout (Docker best practice):
logger.info('Query executed', {
  userId: user.id,
  duration: 1234,
  tokens: { input: 50, output: 100 }
});
Docker captures logs:
# View backend logs
docker logs rag_backend --follow --tail 100
# View all service logs
docker-compose logs --follow
Centralized logging (ELK stack):
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0
  logstash:
    image: docker.elastic.co/logstash/logstash:8.0.0
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
  kibana:
    image: docker.elastic.co/kibana/kibana:8.0.0
    ports:
      - "5601:5601"
Logstash pipes Docker logs to Elasticsearch:
Docker Container → stdout → Logstash → Elasticsearch → Kibana Dashboard
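A minimal `logstash.conf` for that pipeline might look like this — an assumption, on the premise that containers ship logs via Docker's `gelf` log driver to port 12201:

```
input {
  gelf {
    port => 12201          # Docker's gelf log driver sends here
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
  }
}
```

Kibana then queries Elasticsearch directly; Logstash never talks to it.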
Alerting Rules
Alert when services degrade:
# prometheus-alerts.yml
groups:
- name: rag-system
rules:
- alert: HighQueryLatency
expr: histogram_quantile(0.95, rag_query_duration_seconds) > 5
for: 5m
annotations:
summary: "95th percentile query latency > 5s"
- alert: EmbeddingServiceDown
expr: up{job="rag-fastapi"} == 0
for: 1m
annotations:
summary: "FastAPI embedding service is down"
- alert: DatabaseConnectionPoolExhausted
expr: rag_database_connection_pool_usage > 0.9
for: 2m
annotations:
summary: "Database connection pool >90% full"
Deployment Workflow
Development → Staging → Production
# 1. Tag release
git tag v1.2.0
# 2. Build artifacts
./DevOps/builder/build-be.sh dist
./DevOps/builder/build-fe.sh prod
docker-compose build
# 3. Push to registry
docker tag rag-backend:latest myregistry.com/rag-backend:v1.2.0
docker push myregistry.com/rag-backend:v1.2.0
# 4. Deploy to staging
ssh staging-server
cd /apps/rag-system
docker-compose pull
docker-compose up -d
# 5. Run tests
./scripts/smoke-tests.sh staging
# 6. Deploy to production
ssh prod-server
cd /apps/rag-system
docker-compose pull
docker-compose up -d
# 7. Verify health
curl prod.example.com/health
Blue-Green Deployment
Keep two production versions running:
# Current: Blue (v1.2.0)
# New: Green (v1.3.0)
# Deploy to green
docker-compose -f docker-compose-green.yml pull
docker-compose -f docker-compose-green.yml up -d
# Test green endpoints
./scripts/smoke-tests.sh green
# Switch traffic: point nginx's upstream (producer) at green, then reload
systemctl reload nginx

# If issues appear, point the upstream back at blue and reload again

# Once green is verified, tear down the old blue stack
docker-compose -f docker-compose-blue.yml down
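The traffic switch itself is a one-line change in the nginx upstream — a sketch, with the upstream name taken from the text above and the ports purely illustrative:

```nginx
# nginx.conf — flip production traffic from blue to green
upstream producer {
    server 127.0.0.1:8100;   # green (v1.3.0); change back to 8000 for blue
}
```

Because `systemctl reload nginx` applies the new config without dropping in-flight connections, the cutover (and any rollback) is effectively zero-downtime.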
Production Deployment Checklist
Security
☐ All secrets in environment/vault (not in .env)
☐ HTTPS/TLS enabled
☐ JWT secrets strong (random, unique)
☐ Database password strong
☐ Redis password set
☐ n8n basic auth enabled
☐ Firewall restricts ports
Infrastructure
☐ Resource limits set (CPU, memory)
☐ Health checks configured
☐ Storage backup automated (daily)
☐ Load balancer in front
☐ DNS resolution working
☐ Monitoring and alerts active
Operations
☐ Logging aggregation working
☐ Error tracking (Sentry) configured
☐ On-call rotation established
☐ Incident response plan documented
☐ Runbook for common issues
☐ Change log maintained
Performance
☐ Database indexes created
☐ Redis cache configured
☐ Query N+1 problems resolved
☐ API rate limiting active
☐ CDN configured (for frontend)
Compliance
☐ Data retention policy enforced
☐ Audit logs enabled
☐ GDPR compliance verified
☐ Regular backups tested
The Complete Picture
From this article:
- Artifact-based distribution: Build once, deploy everywhere
- Docker Compose: Self-contained, reproducible environments
- Health checks: Automatic healing when services fail
- Scaling strategies: Both horizontal and vertical
- Monitoring & logging: Real-time visibility
- Backup & disaster recovery: Data protection
- Blue-green deployment: Zero-downtime updates
Looking Ahead
This concludes the RAG System blog series. You've journeyed through:
- Architecture & Privacy - The "why?" behind design decisions
- n8n Workflows - Orchestrating complex operations
- Vector Embeddings - Semantic search at scale
- FastAPI NLP - Python's NLP powerhouse
- NestJS Backend - Enterprise-grade APIs with RAD-System
- Angular Frontend - Beautiful, responsive chat interface
- Provider Switching - LLM flexibility without lock-in
- DevOps & Scaling - Taking systems to production
---
Key Takeaways:
✅ Artifact-based builds = Reproducible, reliable deployments
✅ Docker Compose = Simplified infrastructure
✅ Health checks = Self-healing systems
✅ Resource limits = Stable, predictable performance
✅ Monitoring = Visibility into system health
✅ Scaling strategies = Room to grow
You now have a production-ready RAG system that's:
- Scalable: Add workers, replicas, or resources as needed
- Resilient: Automatic recovery from failures
- Flexible: Swap components without redeployment
- Observable: Full visibility into operations
- Maintainable: Clear separation of concerns
- Cost-effective: Run locally or cloud, free or expensive models
The system is yours to deploy, modify, and scale. Build with it. Learn from it. And most importantly: ship it.
---
GitHub:
- RAD System (open-source): Github source code
- RAG System (source not published): Github RAG System Overview
---
Thank you for reading this series. Whether you're deploying your first RAG system or optimizing the tenth, I hope this architecture provides a solid foundation. The future of AI isn't just about models—it's about systems that are understandable, maintainable, and yours.
Build well. 🚀