AI Infrastructure Implementation Plan¶

Version: 1.0
Date: November 27, 2025
Branch: feature/ai-infrastructure
Duration: 5-8 weeks
Status: Planning Complete - Ready for Implementation

Table of Contents¶

Overview
Architecture Design
Component Adaptation Strategy
Implementation Timeline
Technical Requirements
API Specifications
Data Flow
Testing Strategy
Deployment Plan

Overview¶

Phase 3 adapts the proven Personal AI Companion architecture to HOMEPOT's device monitoring and analysis needs. The approach is expected to reuse roughly 80% of the existing code, cutting the estimated development time from 6-12 months to 5-8 weeks while keeping all data private through local LLM inference.

Goals¶

Implement AI-powered device analysis using local LLMs (Ollama)
Enable natural language queries about device status and history
Provide anomaly detection and predictive insights
Generate automated device health reports
Maintain 100% data privacy (no external API calls)

Success Metrics¶

Natural language query response time < 2 seconds
Device pattern matching accuracy > 85%
Anomaly detection precision > 80%
System operates fully offline (no internet dependency)
Memory usage < 8GB RAM for LLM inference

Architecture Design¶

High-Level Architecture¶

┌─────────────────────────────────────────────────────────────┐
│                      HOMEPOT Frontend                       │ 
│               (React + Dashboard Components)                │ 
└────────────────────┬────────────────────────────────────────┘
                     │ HTTPS/REST
┌────────────────────▼────────────────────────────────────────┐
│                  HOMEPOT Backend API                        │
│              (FastAPI + PostgreSQL + TimescaleDB)           │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │           AI Service Module (NEW)                    │   │
│  │  ┌────────────────────────────────────────────────┐  │   │
│  │  │  AI Router (/api/ai/*)                         │  │   │
│  │  │  - /query    (natural language queries)        │  │   │
│  │  │  - /analyze  (device analysis)                 │  │   │
│  │  │  - /predict  (anomaly prediction)              │  │   │
│  │  │  - /report   (health report generation)        │  │   │
│  │  │  - /status   (Ollama health check)             │  │   │
│  │  └────────────────────────────────────────────────┘  │   │
│  │                                                      │   │
│  │  ┌────────────────────────────────────────────────┐  │   │
│  │  │  LLM Service (llm.py)                          │  │   │
│  │  │  - Ollama integration                          │  │   │
│  │  │  - Prompt engineering                          │  │   │
│  │  │  - Response generation                         │  │   │
│  │  └────────────────────────────────────────────────┘  │   │
│  │                                                      │   │
│  │  ┌────────────────────────────────────────────────┐  │   │
│  │  │  Device Memory (device_memory.py)              │  │   │
│  │  │  - ChromaDB integration                        │  │   │
│  │  │  - Vector embeddings (SentenceTransformer)     │  │   │
│  │  │  - Semantic pattern matching                   │  │   │
│  │  └────────────────────────────────────────────────┘  │   │
│  │                                                      │   │
│  │  ┌────────────────────────────────────────────────┐  │   │
│  │  │  Event Store (event_store.py)                  │  │   │
│  │  │  - Recent device events caching                │  │   │
│  │  │  - PostgreSQL query optimization               │  │   │
│  │  └────────────────────────────────────────────────┘  │   │
│  │                                                      │   │
│  │  ┌────────────────────────────────────────────────┐  │   │
│  │  │  Anomaly Detection (anomaly_detection.py)      │  │   │
│  │  │  - Device health scoring                       │  │   │
│  │  │  - Pattern deviation detection                 │  │   │
│  │  │  - Alert severity classification               │  │   │
│  │  └────────────────────────────────────────────────┘  │   │
│  │                                                      │   │
│  │  ┌────────────────────────────────────────────────┐  │   │
│  │  │  Analysis Modes (analysis_modes.py)            │  │   │
│  │  │  - Maintenance mode                            │  │   │
│  │  │  - Predictive analysis mode                    │  │   │
│  │  │  - Executive reporting mode                    │  │   │
│  │  └────────────────────────────────────────────────┘  │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                   PostgreSQL + TimescaleDB                  │
│          (Device metrics, events, audit logs)               │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                  Ollama (Local LLM Service)                 │
│                  http://localhost:11434                     │
│          Models: Llama 3.2 3B, Mistral 7B, Phi 3.5          │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│            ChromaDB (Vector Database - Local)               │
│              ./chroma_db (persistent storage)               │
│        Collections: device_patterns, incident_history       │
└─────────────────────────────────────────────────────────────┘

Integration Strategy¶

Monolithic Integration (Native Python)
- AI service as a module within the existing HOMEPOT backend
- Shared database connection pool
- Unified authentication/authorization
- Single Python process deployment
- Pros: Simpler deployment, shared context, easier testing, no Docker complexity
- Cons: Larger memory footprint, coupled release cycles

Decision: We'll implement the AI service as a native Python module within the existing HOMEPOT backend, avoiding Docker containerization to eliminate authentication complications and infrastructure dependencies.

Component Adaptation Strategy¶

1. Chat Memory → Event Store¶

Personal AI Companion (memory_store.py):

# Stores conversation history
memory = [
    {"user": "How are you?", "ai": "I'm doing well!"},
    {"user": "What did we discuss?", "ai": "We talked about..."}
]

HOMEPOT Adaptation (event_store.py):

# Stores recent device events for context
recent_events = [
    {"device_id": "sensor_1234", "event": "temperature_spike", 
     "value": 85.2, "timestamp": "2025-11-27T10:30:00"},
    {"device_id": "sensor_1234", "event": "normal_operation", 
     "value": 72.1, "timestamp": "2025-11-27T10:45:00"}
]

Implementation Tasks:
- Create event_store.py with PostgreSQL integration
- Implement caching layer (last 100 events per device)
- Add event summarization for LLM context
- Create cleanup job for old cached events
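The caching and summarization tasks above could be sketched as follows. This is a minimal in-memory illustration, not the actual event_store.py: the class name EventCache and its methods are hypothetical, and the PostgreSQL integration is left out. It assumes events have the shape shown in the recent_events example.

```python
from collections import defaultdict, deque

MAX_EVENTS_PER_DEVICE = 100  # cache size named in the task list


class EventCache:
    """In-memory cache of the most recent events per device (hypothetical helper)."""

    def __init__(self, max_events: int = MAX_EVENTS_PER_DEVICE) -> None:
        # deque(maxlen=...) drops the oldest event once the cap is reached
        self._events = defaultdict(lambda: deque(maxlen=max_events))

    def add(self, event: dict) -> None:
        self._events[event["device_id"]].append(event)

    def recent(self, device_id: str, limit: int = 10) -> list:
        """Return up to `limit` newest events, oldest first (limit must be positive)."""
        events = self._events.get(device_id, ())
        return list(events)[-limit:]

    def summarize(self, device_id: str, limit: int = 10) -> str:
        """Render recent events as compact lines suitable for an LLM prompt."""
        return "\n".join(
            f'{e["timestamp"]} {e["event"]} value={e["value"]}'
            for e in self.recent(device_id, limit)
        )
```

A bounded deque keeps the per-device cap without a separate eviction step; the cleanup job in the task list would still be needed to drop devices that stop reporting.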

2. Vector Memory → Device Pattern Database¶

Personal AI Companion (vector_memory.py):

# Stores conversation summaries as embeddings
save_summary("User discussed career goals and work-life balance")
relevant_memories = get_relevant_memories("tell me about my goals")
# Returns semantically similar past conversations

HOMEPOT Adaptation (device_memory.py):

# Stores device incident patterns as embeddings
save_incident_pattern(
    "Sensor 1234 temperature spike to 85°C followed by cooling system activation"
)
similar_incidents = get_similar_patterns("sensor temperature anomaly")
# Returns past incidents with similar patterns

Implementation Tasks:
- Create device_memory.py with ChromaDB integration
- Define embedding model (SentenceTransformer 'all-mpnet-base-v2')
- Implement incident pattern storage
- Create similarity search API
- Add pattern categorization (temperature, connectivity, power, etc.)
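To make the similarity search concrete, the sketch below ranks stored patterns by cosine similarity in plain Python. It is a toy stand-in for the ChromaDB collection, not its API: PatternIndex is a hypothetical name, and the three-dimensional vectors in the usage note stand in for real 768-dimensional SentenceTransformer embeddings.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PatternIndex:
    """Toy stand-in for the device_patterns collection (hypothetical helper)."""

    def __init__(self) -> None:
        self._items = []  # list of (description, embedding) pairs

    def add(self, description: str, embedding: list) -> None:
        self._items.append((description, embedding))

    def search(self, query_embedding: list, top_k: int = 3) -> list:
        """Return the top_k (description, score) pairs, best match first."""
        scored = [
            (description, cosine_similarity(query_embedding, embedding))
            for description, embedding in self._items
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

For example, after adding "temperature spike" with [1.0, 0.0, 0.0] and "connectivity loss" with [0.0, 1.0, 0.0], a query vector of [0.9, 0.1, 0.0] ranks the temperature pattern first.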

3. Sentiment Analysis → Anomaly Detection¶

Personal AI Companion (sentiment.py):

# Analyzes emotional tone
from textblob import TextBlob

def analyse_sentiment(text: str) -> str:
    polarity = TextBlob(text).sentiment.polarity  # ranges from -1.0 to 1.0
    if polarity > 0.2:
        return "positive"
    if polarity < -0.2:
        return "negative"
    return "neutral"

HOMEPOT Adaptation (anomaly_detection.py):

# Analyzes device health
def analyze_device_health(metrics: dict) -> dict:
    health_score = calculate_health_score(metrics)
    if health_score > 0.8:
        status = "healthy"
    elif health_score < 0.4:
        status = "critical"
    else:
        status = "warning"
    return {"status": status, "score": health_score}

Implementation Tasks:
- Create anomaly_detection.py
- Implement health scoring algorithm (based on metrics deviation)
- Add threshold configuration per device type
- Create anomaly logging to PostgreSQL
- Implement severity classification (info, warning, critical)
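The plan does not specify the scoring algorithm, so the following is only one possible deviation-based sketch. It assumes each metric has a baseline value and a tolerance (both hypothetical, carried in a MetricBaseline dataclass introduced here for illustration) and averages the capped relative deviations. The thresholds in classify mirror the 0.8 and 0.4 values used in this plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricBaseline:
    expected: float   # normal operating value (assumed to be known per device type)
    tolerance: float  # deviation at which the metric counts as fully unhealthy


def calculate_health_score(metrics: dict, baselines: dict) -> float:
    """Return a score in [0, 1]; 1.0 means every metric sits on its baseline."""
    penalties = []
    for name, baseline in baselines.items():
        if name not in metrics:
            continue  # missing metrics are skipped rather than penalized
        deviation = abs(metrics[name] - baseline.expected) / baseline.tolerance
        penalties.append(min(deviation, 1.0))
    if not penalties:
        return 1.0
    return 1.0 - sum(penalties) / len(penalties)


def classify(score: float, healthy: float = 0.8, warning: float = 0.4) -> str:
    """Map a score to a status using the plan's thresholds."""
    if score > healthy:
        return "healthy"
    if score < warning:
        return "critical"
    return "warning"
```

With a temperature baseline of 72.0 and a tolerance of 20.0, a reading of 82.0 scores 0.5 and is classified as "warning". Averaging is a deliberate simplification; a weighted or worst-metric score may suit some device types better.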

4. Personas → Analysis Modes¶

Personal AI Companion (persona.py):

personas = {
    "supportive": "You are a thoughtful, supportive companion...",
    "coach": "You are a motivating coach...",
    "therapist": "You are a calm therapist..."
}

HOMEPOT Adaptation (analysis_modes.py):

analysis_modes = {
    "maintenance": "You are a technical systems analyst...",
    "predictive": "You are a predictive maintenance expert...",
    "executive": "You are an executive reporting assistant..."
}

Mode Specifications:

Maintenance Mode (Default)
   - Focus: Technical details, troubleshooting steps
   - Audience: System administrators, technicians
   - Output: Detailed metrics, root cause analysis, fix recommendations
Predictive Mode
   - Focus: Trend analysis, failure prediction
   - Audience: Maintenance planners
   - Output: Risk scores, maintenance schedules, cost estimates
Executive Mode
   - Focus: High-level summaries, business impact
   - Audience: Management, decision-makers
   - Output: KPIs, uptime stats, cost analysis, strategic recommendations

Implementation Tasks:
- Create analysis_modes.py with mode definitions
- Implement mode switching API
- Create prompt templates for each mode
- Add mode persistence (per user preference)
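A prompt template per mode might be assembled as in the sketch below. The function name build_prompt and the prompt layout are assumptions for illustration; the system prompts are kept truncated exactly as they appear in the mode definitions above.

```python
ANALYSIS_MODES = {
    "maintenance": "You are a technical systems analyst...",
    "predictive": "You are a predictive maintenance expert...",
    "executive": "You are an executive reporting assistant...",
}
DEFAULT_MODE = "maintenance"


def build_prompt(question: str, context: str, mode: str = DEFAULT_MODE) -> str:
    """Combine the mode's system prompt, device context, and the user question."""
    if mode not in ANALYSIS_MODES:
        raise ValueError(f"Unknown analysis mode: {mode!r}")
    return (
        f"{ANALYSIS_MODES[mode]}\n\n"
        f"Device context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Rejecting unknown modes early keeps a typo in the mode parameter from silently falling back to a different audience.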

5. Summarization → Device Status Reports¶

Personal AI Companion:

# Summarizes recent conversation
@app.get("/summarise")
def summarise_conversation():
    summary = generate_response(
        "Summarise this conversation: [chat history]"
    )
    return {"summary": summary}

HOMEPOT Adaptation:

# Generates device health report
@app.get("/api/ai/report/{device_id}")
async def generate_device_report(device_id: str, period: str = "24h"):
    events = get_device_events(device_id, period)
    metrics = get_device_metrics(device_id, period)

    report = generate_response(
        f"Generate a health report for device {device_id}. "
        f"Events: {events}. Metrics: {metrics}."
    )
    return {"report": report, "device_id": device_id}

Report Types:
- Daily Health Report: 24h overview (automated, scheduled)
- Incident Report: Triggered by critical events
- Weekly Summary: 7-day trends and recommendations
- Custom Analysis: User-requested time range

Implementation Tasks:
- Create report generation endpoints
- Implement scheduled report generation (Celery/APScheduler)
- Add report storage to PostgreSQL
- Create report email notifications
- Build report dashboard UI component

6. Reflection → Predictive Insights¶

Personal AI Companion:

@app.get("/reflect")
def reflect_on_user():
    # Analyzes past conversations and sentiment trends
    reflection = generate_response(
        "Reflect on the user's recent activity and emotional tone"
    )
    return {"reflection": reflection}

HOMEPOT Adaptation:

@app.get("/api/ai/insights/{site_id}")
async def generate_site_insights(site_id: str):
    # Analyzes historical patterns and predicts issues
    devices = get_site_devices(site_id)
    patterns = get_historical_patterns(site_id, days=30)

    insights = generate_response(
        f"Analyze patterns for site {site_id} and predict potential issues. "
        f"Historical data: {patterns}"
    )
    return {"insights": insights, "site_id": site_id}

Implementation Tasks:
- Create predictive insights endpoint
- Implement pattern analysis (30-day rolling window)
- Add trend detection algorithms
- Create insight visualization components
- Implement insight action tracking (was action taken? was it effective?)

Implementation Timeline¶

Sprint 1: Foundation (Week 1-2)¶

Objectives:
- Set up Ollama and ChromaDB
- Create AI service module structure
- Implement basic LLM integration

Deliverables:
- Ollama installed and running (localhost:11434)
- ChromaDB persistent storage configured (./chroma_db)
- backend/src/homepot/ai/ module created
- llm.py: Basic Ollama integration
- config.yaml: AI service configuration
- /api/ai/status endpoint (health check)
- Unit tests for LLM service

Technical Setup:

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull models
ollama pull llama3.2:3b
ollama pull mistral:7b
ollama pull phi3.5:3.8b

# Install Python dependencies
pip install chromadb sentence-transformers textblob pyyaml

Sprint 2: Core Components (Week 3-4) - COMPLETED¶

Objectives:
- Implement device memory and event storage
- Create anomaly detection module
- Build analysis modes

Deliverables:
- event_store.py: Device event caching (Implemented)
- device_memory.py: Vector database integration (Implemented)
- anomaly_detection.py: Health scoring (Implemented)
- analysis_modes.py: Mode definitions (Implemented)
- Database integration for Event Store (Implemented, uses device_metrics)
- Integration tests (Implemented)

Database Schema:

-- AI-related tables
CREATE TABLE ai_incident_patterns (
    id SERIAL PRIMARY KEY,
    device_id INTEGER REFERENCES devices(id),
    pattern_description TEXT NOT NULL,
    embedding VECTOR(768),  -- all-mpnet-base-v2 output dimension; the VECTOR type requires the pgvector extension
    created_at TIMESTAMPTZ DEFAULT NOW(),
    severity VARCHAR(20)
);

CREATE TABLE ai_anomaly_logs (
    id SERIAL PRIMARY KEY,
    device_id INTEGER REFERENCES devices(id),
    health_score FLOAT NOT NULL,
    status VARCHAR(20) NOT NULL,  -- healthy, warning, critical
    metrics JSONB NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE ai_reports (
    id SERIAL PRIMARY KEY,
    report_type VARCHAR(50) NOT NULL,  -- daily, incident, weekly
    site_id INTEGER REFERENCES sites(id),
    device_id INTEGER REFERENCES devices(id),
    content TEXT NOT NULL,
    generated_at TIMESTAMPTZ DEFAULT NOW(),
    mode VARCHAR(20) NOT NULL  -- maintenance, predictive, executive
);

Sprint 3: API Endpoints (Week 5)¶

Objectives:
- Build natural language query API
- Implement device analysis endpoints
- Create report generation

Deliverables:
- /api/ai/query: Natural language queries
- /api/ai/analyze/{device_id}: Device analysis
- /api/ai/predict/{device_id}: Anomaly prediction
- /api/ai/report/{device_id}: Report generation
- /api/ai/insights/{site_id}: Site-wide insights
- API documentation (OpenAPI/Swagger)
- Integration tests with mocked LLM responses

Sprint 4: Testing & Optimization (Week 6-7)¶

Objectives:
- End-to-end testing
- Performance optimization
- Memory usage optimization

Deliverables:
- E2E test suite (pytest)
- Performance benchmarks (query response time)
- Memory profiling and optimization
- LLM response caching
- Error handling and retry logic
- Load testing (concurrent queries)

Performance Targets:
- Query response time < 2 seconds (95th percentile)
- Memory usage < 8GB (Ollama + ChromaDB)
- Concurrent queries: 10+ simultaneous users
- Embedding generation < 100ms per document
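The LLM response caching deliverable could take roughly the following shape. This is a sketch under assumptions: TTLCache is a hypothetical class, the defaults mirror the ttl (300 seconds) and max_size (1000) values from the AI service configuration, and eviction simply drops the oldest insertion.

```python
import time
from collections import OrderedDict
from typing import Optional


class TTLCache:
    """Small TTL and size-bounded cache for LLM responses (hypothetical helper)."""

    def __init__(self, ttl: float = 300.0, max_size: int = 1000, clock=time.monotonic) -> None:
        self._ttl = ttl
        self._max_size = max_size
        self._clock = clock  # injectable so tests can control time
        self._entries = OrderedDict()  # key -> (stored_at, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired entries are removed on access
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock(), value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the oldest insertion
```

The cache key would need to include everything that changes the answer, such as the question text, the analysis mode, and the model name; that choice is left open here.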

Sprint 5: Integration & Documentation (Week 8)¶

Objectives:
- Frontend integration
- Documentation
- Deployment preparation

Deliverables:
- Frontend AI query component
- Device analysis dashboard widget
- Report viewer component
- Developer documentation
- User guide
- Deployment scripts
- CI/CD pipeline updates

Technical Requirements¶

Infrastructure¶

Server Requirements:
- CPU: 4+ cores (8+ recommended for better performance)
- RAM: 16GB minimum (32GB recommended)
   - HOMEPOT Backend: 2-4GB
   - PostgreSQL: 2-4GB
   - Ollama (LLM): 6-8GB
   - ChromaDB: 1-2GB
   - System: 2GB
- Storage: 50GB+ SSD
   - Ollama models: ~8.3GB (Llama 3.2 3B + Mistral 7B + Phi 3.5, per the sizes listed under Model Selection)
   - ChromaDB: Growing (estimate 1GB per 100K embeddings)
   - PostgreSQL: Existing + AI tables
- GPU: Optional but recommended (NVIDIA GPU with 8GB+ VRAM for faster inference)

Software Requirements:
- OS: Linux (Ubuntu 22.04+ recommended) or macOS
- Python: 3.11+
- Ollama: Latest stable release
- PostgreSQL: 14+ (existing HOMEPOT database)
- pip/venv: For Python package management

Python Dependencies¶

Add to backend/requirements.txt:

# AI Infrastructure
chromadb>=0.4.22
sentence-transformers>=2.2.2
textblob>=0.17.1
pyyaml>=6.0.1
transformers>=4.36.0
torch>=2.1.0
numpy>=1.24.0

# Scheduling (for automated reports)
apscheduler>=3.10.4

Model Selection¶

Recommended Models:

Llama 3.2 3B (Default)
   - Size: 2GB
   - Speed: Fast (~40 tokens/sec on CPU)
   - Quality: Excellent for technical queries
   - Use case: General device queries, analysis
Mistral 7B
   - Size: 4.1GB
   - Speed: Moderate (~20 tokens/sec on CPU)
   - Quality: Best accuracy
   - Use case: Complex analysis, executive reports
Phi 3.5 3.8B
   - Size: 2.2GB
   - Speed: Very fast (~50 tokens/sec on CPU)
   - Quality: Good for simple queries
   - Use case: Quick status checks, simple summaries

Model Configuration (backend/src/homepot/ai/config.yaml):

ai_service:
  default_model: "llama3.2:3b"
  models:
    fast: "phi3.5:3.8b"
    balanced: "llama3.2:3b"
    accurate: "mistral:7b"

  ollama:
    url: "http://localhost:11434"
    timeout: 30
    max_retries: 3

  chromadb:
    path: "./chroma_db"
    collection_name: "device_patterns"

  embedding:
    model: "all-mpnet-base-v2"
    dimension: 768

  anomaly_detection:
    health_thresholds:
      healthy: 0.8
      warning: 0.4
      critical: 0.0

  caching:
    enabled: true
    ttl: 300  # 5 minutes
    max_size: 1000
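The max_retries setting in the ollama section implies a retry wrapper around LLM calls. A minimal sketch with exponential backoff follows; call_with_retries is a hypothetical helper, base_delay is an illustrative value that does not appear in the configuration, and the set of retried exceptions is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on ConnectionError/TimeoutError with exponential backoff.

    max_retries mirrors the config value; base_delay is an illustrative default.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except (ConnectionError, TimeoutError):
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")
```

Passing the sleep function as a parameter lets unit tests record the delays instead of waiting for them.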

API Specifications¶

1. Natural Language Query¶

Endpoint: POST /api/ai/query

Request:

{
  "question": "What's the status of all sensors in Building A?",
  "mode": "maintenance",  // optional: maintenance, predictive, executive
  "model": "llama3.2:3b"  // optional override
}

Response:

{
  "answer": "Building A has 12 active sensors. 11 are operating normally...",
  "relevant_devices": [
    {"id": "sensor_1234", "name": "Temperature Sensor - Room 101", "status": "normal"},
    {"id": "sensor_5678", "name": "Temperature Sensor - Room 102", "status": "warning"}
  ],
  "mode": "maintenance",
  "response_time_ms": 1847,
  "model_used": "llama3.2:3b"
}

2. Device Analysis¶

Endpoint: GET /api/ai/analyze/{device_id}

Query Parameters:
- period: Time range, one of 1h, 24h, 7d, 30d (default: 24h)
- mode: Analysis mode (default: maintenance)
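The period strings have to be turned into a time range before querying metrics. A small parser might look like the sketch below; parse_period is a hypothetical helper, and accepting any number of hours or days (rather than only the four listed values) is an assumption.

```python
import re
from datetime import timedelta

_UNITS = {"h": "hours", "d": "days"}


def parse_period(period: str = "24h") -> timedelta:
    """Convert a period string such as '1h', '24h', '7d', or '30d' to a timedelta."""
    match = re.fullmatch(r"(\d+)([hd])", period)
    if match is None:
        raise ValueError(f"Unsupported period: {period!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```

The endpoint can then compute the query window as the current time minus parse_period(period) and return a validation error for anything the parser rejects.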

Response:

{
  "device_id": "sensor_1234",
  "device_name": "Temperature Sensor - Server Room",
  "analysis": {
    "summary": "Device operating normally with occasional temperature spikes...",
    "health_score": 0.82,
    "status": "healthy",
    "anomalies_detected": 2,
    "similar_past_incidents": [
      {
        "date": "2025-11-15",
        "description": "Similar temperature spike pattern",
        "resolution": "Cooling system adjusted",
        "similarity_score": 0.91
      }
    ],
    "recommendations": [
      "Monitor cooling system performance",
      "Schedule preventive maintenance in 2 weeks"
    ]
  },
  "metrics": {
    "avg_temperature": 72.3,
    "max_temperature": 85.2,
    "uptime_percentage": 99.8
  },
  "period": "24h",
  "generated_at": "2025-11-27T14:30:00Z"
}

3. Anomaly Prediction¶

Endpoint: GET /api/ai/predict/{device_id}

Response:

{
  "device_id": "sensor_1234",
  "predictions": [
    {
      "prediction": "Temperature spike likely in next 6 hours",
      "confidence": 0.78,
      "reasoning": "Historical pattern shows temperature increase during afternoon peak hours",
      "recommended_action": "Verify cooling system operational status",
      "severity": "warning"
    }
  ],
  "risk_score": 0.65,
  "forecast_period": "24h",
  "generated_at": "2025-11-27T14:30:00Z"
}

4. Device Report Generation¶

Endpoint: GET /api/ai/report/{device_id}

Query Parameters:
- period: Time range, one of 24h, 7d, 30d (default: 24h)
- report_type: daily, incident, or weekly (default: daily)
- mode: maintenance, predictive, or executive (default: maintenance)

Response:

{
  "report_id": "rpt_20251127_1234",
  "device_id": "sensor_1234",
  "report_type": "daily",
  "mode": "executive",
  "content": {
    "title": "Daily Health Report - Temperature Sensor (Server Room)",
    "summary": "Device maintained 99.8% uptime with 2 minor anomalies detected...",
    "key_metrics": {
      "uptime": "99.8%",
      "avg_temperature": "72.3°C",
      "events_logged": 156,
      "anomalies": 2
    },
    "incidents": [
      {
        "timestamp": "2025-11-27T10:30:00Z",
        "description": "Temperature spike to 85.2°C",
        "resolution": "Automatic cooling adjustment",
        "impact": "None"
      }
    ],
    "recommendations": [
      "Schedule preventive maintenance in 2 weeks",
      "Consider upgrading cooling capacity for peak periods"
    ],
    "cost_impact": "$0 downtime cost, $150 estimated preventive maintenance"
  },
  "generated_at": "2025-11-27T14:30:00Z"
}

5. Site-Wide Insights¶

Endpoint: GET /api/ai/insights/{site_id}

Response:

{
  "site_id": "site_123",
  "site_name": "Building A",
  "insights": {
    "overall_health": "Good",
    "health_score": 0.87,
    "total_devices": 45,
    "devices_healthy": 42,
    "devices_warning": 2,
    "devices_critical": 1,
    "key_findings": [
      "HVAC system showing signs of reduced efficiency",
      "3 sensors approaching end of warranty period",
      "Network connectivity stable across all devices"
    ],
    "predicted_issues": [
      {
        "description": "HVAC compressor may require maintenance within 30 days",
        "confidence": 0.72,
        "estimated_cost": "$800-$1200"
      }
    ],
    "cost_analysis": {
      "current_month_downtime_cost": "$0",
      "prevented_issues_value": "$2400",
      "recommended_maintenance_budget": "$1500"
    }
  },
  "generated_at": "2025-11-27T14:30:00Z"
}

6. Ollama Health Check¶

Endpoint: GET /api/ai/status

Response:

{
  "ollama_status": "running",
  "ollama_url": "http://localhost:11434",
  "available_models": [
    {"name": "llama3.2:3b", "size": "2.0GB", "status": "ready"},
    {"name": "mistral:7b", "size": "4.1GB", "status": "ready"},
    {"name": "phi3.5:3.8b", "size": "2.2GB", "status": "ready"}
  ],
  "chromadb_status": "connected",
  "chromadb_collections": 2,
  "total_embeddings": 15847,
  "memory_usage": {
    "ollama": "6.2GB",
    "chromadb": "1.4GB"
  },
  "timestamp": "2025-11-27T14:30:00Z"
}
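The Ollama portion of this response could be gathered from Ollama's GET /api/tags endpoint, which lists locally available models. The sketch below is an assumption about how the status handler might do that, not the actual implementation: ollama_status and fetch_json are hypothetical names, and the ChromaDB and memory fields are omitted.

```python
import json
from typing import Callable
from urllib.request import urlopen


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def ollama_status(
    base_url: str = "http://localhost:11434",
    fetch: Callable[[str], dict] = fetch_json,
) -> dict:
    """Report whether Ollama answers and which models it has pulled."""
    try:
        payload = fetch(f"{base_url}/api/tags")
    except OSError:  # URLError, refused connections, and timeouts are OSError subclasses
        return {"ollama_status": "unreachable", "ollama_url": base_url, "available_models": []}
    models = [
        {"name": m["name"], "size": f'{m["size"] / 1e9:.1f}GB'}  # size is reported in bytes
        for m in payload.get("models", [])
    ]
    return {"ollama_status": "running", "ollama_url": base_url, "available_models": models}
```

The fetch parameter exists so tests can substitute a stub and exercise both the running and unreachable paths without a live Ollama instance.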

Data Flow¶

Natural Language Query Flow¶

1. User Query
2. POST /api/ai/query
3. Parse query + extract intent
4. Fetch relevant device data (PostgreSQL)
5. Retrieve similar past patterns (ChromaDB vector search)
6. Build context prompt:
   - User question
   - Current device data
   - Relevant historical patterns
   - Analysis mode (maintenance/predictive/executive)
7. Send to Ollama LLM
8. Generate response
9. Post-process (extract device references, format)
10. Return structured response to frontend
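Steps 4 to 10 of this flow can be sketched as a single function with its dependencies injected. Everything named here (answer_query and its three callable parameters) is hypothetical, and the prompt layout is only one possible arrangement of the context listed in step 6.

```python
from typing import Callable


def answer_query(
    question: str,
    mode: str,
    fetch_device_data: Callable[[str], str],
    search_patterns: Callable[[str], list],
    generate: Callable[[str], str],
) -> dict:
    """Run steps 4-10 of the query flow with each dependency passed in."""
    device_data = fetch_device_data(question)  # step 4: PostgreSQL lookup
    patterns = search_patterns(question)       # step 5: ChromaDB vector search
    prompt = "\n".join([                       # step 6: build the context prompt
        f"Mode: {mode}",
        f"Current device data: {device_data}",
        "Relevant historical patterns:",
        *[f"- {p}" for p in patterns],
        f"Question: {question}",
    ])
    answer = generate(prompt)                  # steps 7-8: Ollama call
    # steps 9-10: minimal post-processing, then the structured response
    return {"answer": answer.strip(), "mode": mode}
```

Injecting the database, vector store, and LLM as callables keeps the orchestration testable with stubs, which matches the plan's intent to run integration tests against mocked LLM responses.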

Anomaly Detection Flow¶

1. Device Metric Update (every 1 minute)
2. Calculate health score:
   - Compare to baseline
   - Check threshold violations
   - Analyze trend deviation
3. If anomaly detected:
4. Log to ai_anomaly_logs table
5. Generate incident description
6. Create embedding (SentenceTransformer)
7. Store in ChromaDB (device_patterns collection)
8. Search for similar past incidents
9. If similar incident found:
   - Retrieve past resolution
   - Generate recommendation
10. Send alert notification
11. Update frontend dashboard

Predictive Analysis Flow¶

1. Scheduled Task (hourly)
2. For each device:
3. Fetch 30-day historical metrics
4. Extract patterns (time-series analysis)
5. Query ChromaDB for similar progression patterns
6. Build prediction prompt:
   - Current metrics
   - Historical trend
   - Similar past failures (if any)
7. Send to Ollama LLM
8. Parse prediction + confidence score
9. If high-risk prediction:
   - Log to database
   - Generate alert
   - Create recommended action
10. Update predictive maintenance schedule
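Step 8 depends on getting a machine-readable prediction out of free-form LLM text. The sketch below assumes the prompt asks the model to answer with a JSON object containing a confidence field between 0 and 1; parse_prediction, is_high_risk, and the 0.7 cutoff are illustrative and not specified by the plan.

```python
import json
from typing import Optional

HIGH_RISK_THRESHOLD = 0.7  # illustrative cutoff, not specified in the plan


def parse_prediction(raw: str) -> Optional[dict]:
    """Extract a prediction object from LLM output; return None if it is unusable."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None  # no JSON object in the response
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    confidence = data.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        return None  # missing or out-of-range confidence
    return data


def is_high_risk(prediction: dict, threshold: float = HIGH_RISK_THRESHOLD) -> bool:
    """Step 9: decide whether the prediction should be logged and alerted on."""
    return prediction["confidence"] >= threshold
```

Returning None for malformed output lets the scheduler skip a device for one cycle rather than raise alerts from text it could not parse.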

Testing Strategy¶

Unit Tests¶

Coverage Target: 80%+

Test Files:

backend/tests/ai/
├── test_llm.py                    # Ollama integration
├── test_device_memory.py          # ChromaDB operations
├── test_event_store.py            # Event caching
├── test_anomaly_detection.py      # Health scoring
├── test_analysis_modes.py         # Mode switching
└── test_api_endpoints.py          # API routes

Key Test Cases:
- LLM response generation with mocked Ollama
- Embedding generation and vector search
- Health score calculation accuracy
- Event caching and retrieval
- Mode prompt formatting
- Error handling (Ollama down, ChromaDB unavailable)

Integration Tests¶

Test Scenarios:

1. End-to-End Query Flow
   - User query → LLM response → structured output
   - Verify device data retrieval
   - Verify vector search integration
2. Anomaly Detection Pipeline
   - Simulate metric update → anomaly detection → alert generation
   - Verify pattern storage in ChromaDB
   - Verify similar incident retrieval
3. Report Generation
   - Generate daily report → verify content structure
   - Test different modes (maintenance, predictive, executive)
   - Verify report storage
4. Performance Tests
   - Query response time < 2s (95th percentile)
   - Concurrent query handling (10+ users)
   - Memory usage monitoring

Load Testing¶

Tools: Locust or Apache JMeter

Test Scenarios:
- 10 concurrent users querying devices
- 100 requests/minute sustained load
- Spike test: 50 concurrent queries

Metrics to Monitor:
- Response time percentiles (p50, p95, p99)
- Error rate
- Ollama queue length
- Memory usage
- Database connection pool usage
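The response time percentiles can be computed from raw samples with the standard library. This is a sketch: latency_percentiles and meets_target are hypothetical helpers, and the "inclusive" interpolation method is one reasonable choice among several percentile definitions.

```python
from statistics import quantiles


def latency_percentiles(samples_ms: list) -> dict:
    """Return p50/p95/p99 from response-time samples (needs at least two samples)."""
    cuts = quantiles(samples_ms, n=100, method="inclusive")  # 99 cut points
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def meets_target(samples_ms: list, p95_target_ms: float = 2000.0) -> bool:
    """Check the plan's target of under 2 seconds at the 95th percentile."""
    return latency_percentiles(samples_ms)["p95"] < p95_target_ms
```

Load testing tools such as Locust report these percentiles themselves; a helper like this is mainly useful for asserting the target inside the pytest performance suite.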

Deployment Plan¶

Development Environment¶

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Start Ollama service (in background; skip if the installer already started it)
ollama serve &

# 3. Pull models (requires the Ollama server to be running)
ollama pull llama3.2:3b
ollama pull mistral:7b
ollama pull phi3.5:3.8b

# 4. Activate virtual environment
cd /home/mghorbani/workspace/homepot-client
source .venv/bin/activate

# 5. Install Python dependencies
cd backend
pip install -r requirements.txt

# 6. Set environment variables
export OLLAMA_URL="http://localhost:11434"
export CHROMADB_PATH="./chroma_db"
export AI_ENABLED="true"

# 7. Run database migrations
alembic upgrade head

# 8. Initialize ChromaDB
python -m homepot.ai.init_chromadb

# 9. Start backend with AI service
uvicorn homepot.main:app --reload --host 0.0.0.0 --port 8000

Production Deployment (Native)¶

Systemd Service Setup:

Create /etc/systemd/system/homepot-ollama.service:

[Unit]
Description=Ollama LLM Service for HOMEPOT
After=network.target

[Service]
Type=simple
User=homepot
WorkingDirectory=/opt/homepot
Environment="OLLAMA_HOST=127.0.0.1:11434"
Environment="OLLAMA_MODELS=/opt/homepot/ollama_models"
ExecStart=/usr/local/bin/ollama serve
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Create /etc/systemd/system/homepot-backend.service:

[Unit]
Description=HOMEPOT Backend API with AI Service
After=network.target postgresql.service homepot-ollama.service
Requires=postgresql.service homepot-ollama.service

[Service]
Type=simple
User=homepot
WorkingDirectory=/opt/homepot/backend
Environment="PYTHONPATH=/opt/homepot/backend/src"
Environment="DATABASE_URL=postgresql://homepot_user:password@localhost:5432/homepot_db"
Environment="OLLAMA_URL=http://localhost:11434"
Environment="CHROMADB_PATH=/opt/homepot/chroma_db"
Environment="AI_ENABLED=true"
ExecStart=/opt/homepot/venv/bin/uvicorn homepot.main:app --host 0.0.0.0 --port 8000
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Deployment Steps:

# 1. Create deployment directory structure
sudo mkdir -p /opt/homepot/{backend,ollama_models,chroma_db,logs}
sudo useradd -r -s /bin/false homepot
sudo chown -R homepot:homepot /opt/homepot

# 2. Install Ollama system-wide
curl -fsSL https://ollama.com/install.sh | sh
# If the installer registered its own ollama.service, stop it so that
# homepot-ollama can bind port 11434
sudo systemctl disable --now ollama 2>/dev/null || true

# 3. Deploy application code
sudo cp -r /home/mghorbani/workspace/homepot-client/* /opt/homepot/
sudo chown -R homepot:homepot /opt/homepot

# 4. Set up Python virtual environment
cd /opt/homepot
sudo -u homepot python3.11 -m venv venv
sudo -u homepot /opt/homepot/venv/bin/pip install -r backend/requirements.txt

# 5. Run database migrations
sudo -u homepot /opt/homepot/venv/bin/alembic -c /opt/homepot/backend/alembic.ini upgrade head

# 6. Initialize ChromaDB (PYTHONPATH matches the backend unit file)
sudo -u homepot env PYTHONPATH=/opt/homepot/backend/src /opt/homepot/venv/bin/python -m homepot.ai.init_chromadb

# 7. Install the systemd services and start Ollama
sudo systemctl daemon-reload
sudo systemctl enable homepot-ollama homepot-backend
sudo systemctl start homepot-ollama
sleep 5  # Wait for Ollama to initialize

# 8. Pull required models (ollama pull talks to the running server,
#    which stores them under OLLAMA_MODELS)
ollama pull llama3.2:3b
ollama pull mistral:7b
ollama pull phi3.5:3.8b

# 9. Start the backend
sudo systemctl start homepot-backend

# 10. Verify services
sudo systemctl status homepot-ollama
sudo systemctl status homepot-backend

# 11. Test AI service
curl http://localhost:8000/api/ai/status

Monitoring & Maintenance¶

Metrics to Monitor:
- Ollama response time
- ChromaDB query performance
- Memory usage (Ollama + ChromaDB)
- LLM request queue length
- Error rate
- Cache hit rate

Logging:

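# Structured logging for AI operations (helper names are illustrative)
import logging

logger = logging.getLogger("homepot.ai")


def log_llm_request(query: str, model: str, user_id: str, response_time: float) -> None:
    # Log every LLM request; the query is truncated to 100 characters
    logger.info("LLM query: %s...", query[:100], extra={
        "model": model,
        "user_id": user_id,
        "response_time_ms": response_time,
    })


def log_anomaly(device_id: str, health_score: float, threshold: float, severity: str) -> None:
    # Log every anomaly detection with the values that triggered it
    logger.warning("Anomaly detected: %s", device_id, extra={
        "health_score": health_score,
        "threshold": threshold,
        "severity": severity,
    })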

Maintenance Tasks:
- Weekly ChromaDB backup (tar -czf chroma_backup.tar.gz ./chroma_db)
- Monthly model update check (ollama list and ollama pull <model>)
- Quarterly embedding regeneration (if embedding model upgraded)
- Daily cleanup of old cached events
- Service health monitoring (systemctl status homepot-*)
- Log rotation for Ollama and backend logs
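The weekly ChromaDB backup can also be scripted in Python so that each archive gets a timestamped name instead of overwriting the previous one. The sketch below is an assumption about how that might look: backup_chromadb is a hypothetical helper, and it presumes nothing is writing to the directory while the archive is created.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def backup_chromadb(source: Path, backup_dir: Path) -> Path:
    """Archive the ChromaDB directory into a timestamped .tar.gz and return its path."""
    if not source.is_dir():
        raise FileNotFoundError(f"ChromaDB directory not found: {source}")
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = backup_dir / f"chroma_backup_{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory name
        tar.add(source, arcname=source.name)
    return archive
```

A call such as backup_chromadb(Path("/opt/homepot/chroma_db"), Path("/opt/homepot/backups")) could be run from the same scheduler that generates the automated reports; the backups directory is a hypothetical location.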

Next Steps¶

Review this plan with the development team
Set up infrastructure (Ollama, ChromaDB)
Create feature branch (DONE: feature/ai-infrastructure)
Sprint 1 kickoff - Foundation setup
Weekly progress reviews and plan adjustments

References¶

Document Version: 1.0
Last Updated: November 27, 2025
Author: HOMEPOT Development Team
Status: Ready for Implementation