
# The Sanctum Chronicler

A Dockerized Python MVP for an AI stream assistant that monitors Twitch chat, gently guides conversation, stores stream events, and exports a post-stream markdown ledger.

## Overview

**The Sanctum Chronicler** is an intelligent assistant designed to enhance live streaming by:
- Monitoring real-time chat and stream activity
- Maintaining a warm, non-intrusive presence during streams
- Flagging suspicious content and spam patterns
- Archiving discussion highlights and clip candidates
- Generating post-stream ledgers and blog ideas

The system uses a multi-mode agent architecture, where different "personas" handle different aspects of stream management:
- **Hearthkeeper** - Gently prompts chat when it's quiet
- **Steward** - Responds thoughtfully to engagement
- **Warden** - Detects suspicious content and spam
- **Librarian** - Archives important discussion
- **Scribe** - Compiles post-stream ledgers

## Architecture

### Project Structure

```
sanctum-agent/
├── app/
│   ├── main.py                 # FastAPI application
│   ├── config.py               # Configuration (pydantic-settings)
│   ├── twitch/
│   │   ├── eventsub.py        # Twitch EventSub client (stub)
│   │   └── chat.py            # Chat message handling (stub)
│   ├── agent/
│   │   ├── orchestrator.py    # Main agent orchestrator
│   │   ├── policies.py        # Behavior policies
│   │   └── modes/
│   │       ├── hearthkeeper.py
│   │       ├── steward.py
│   │       ├── warden.py
│   │       ├── librarian.py
│   │       └── scribe.py
│   ├── memory/
│   │   ├── database.py        # Async SQLAlchemy setup
│   │   ├── models.py          # Database models
│   │   └── repository.py      # Data access layer
│   ├── llm/
│   │   ├── client.py          # Pluggable LLM client
│   │   └── prompts.py         # Prompt templates
│   └── exports/
│       └── markdown.py        # Markdown ledger generation
├── exports/                    # Generated ledgers
├── data/                       # Local data storage
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── .env.example
└── README.md
```

### Tech Stack

- **Python 3.12** - Core language
- **FastAPI** - REST API framework
- **SQLAlchemy + asyncpg** - Async database ORM
- **PostgreSQL** - Primary data store
- **Docker & Docker Compose** - Containerization
- **Pydantic** - Configuration and validation

### Key Design Patterns

**Agent Modes:** Each mode operates independently but shares access to:
- The LLM client for text generation
- The database repository for persistence
- Shared policies for behavior control
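
As a rough illustration of this pattern, the sketch below shows independent modes receiving the same shared dependencies. The names `ModeContext`, `AgentMode`, `handle`, and `dispatch`, and the two toy modes, are hypothetical and are not taken from `app/agent/`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class ModeContext:
    """Hypothetical bundle of the dependencies every mode can reach."""
    llm: object = None          # stands in for the LLM client
    repository: object = None   # stands in for the database repository
    policies: dict = field(default_factory=dict)


class AgentMode(Protocol):
    """Assumed interface: each mode exposes a name and an async handler."""
    name: str

    async def handle(self, message: str, ctx: ModeContext) -> Optional[str]:
        ...


class EchoSteward:
    """Toy mode: replies only when the message mentions the bot."""
    name = "steward"

    async def handle(self, message: str, ctx: ModeContext) -> Optional[str]:
        if "@chronicler" in message.lower():
            return "Thanks for the mention!"
        return None


class QuietWarden:
    """Toy mode: flags messages matching a pattern held in shared policies."""
    name = "warden"

    async def handle(self, message: str, ctx: ModeContext) -> Optional[str]:
        patterns = ctx.policies.get("suspicious_patterns", [])
        if any(pattern in message.lower() for pattern in patterns):
            return "flagged"
        return None


async def dispatch(message: str, modes: list, ctx: ModeContext) -> dict:
    """Run every mode concurrently and keep only the non-empty results."""
    results = await asyncio.gather(*(m.handle(message, ctx) for m in modes))
    return {m.name: r for m, r in zip(modes, results) if r is not None}
```

Because each mode only talks to the shared context, a mode can be added or removed without touching the others.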

**Policies:** Encapsulate decision logic:
- `ChatActivityPolicy` - Tracks inactivity periods
- `ResponseSuppression` - Avoids speaking during active chat
- `SuspiciousContentPolicy` - Pattern matching for spam/scams

**Async Architecture:** All I/O operations are non-blocking:
- Database queries use `asyncpg`
- FastAPI endpoints handle concurrent requests
- LLM calls are awaited, so a real provider API can be integrated later without blocking other work
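
To make the non-blocking idea concrete, here is a small sketch in which a database write and an LLM call run concurrently. `save_message` and `generate_reply` are hypothetical stand-ins that simulate I/O with `asyncio.sleep`; they are not the project's functions.

```python
import asyncio


async def save_message(text: str) -> str:
    """Stand-in for an async database write."""
    await asyncio.sleep(0.05)  # simulated I/O wait
    return f"saved:{text}"


async def generate_reply(text: str) -> str:
    """Stand-in for an async LLM request."""
    await asyncio.sleep(0.05)  # simulated I/O wait
    return f"reply:{text}"


async def process(text: str) -> tuple[str, str]:
    """Start both operations together instead of one after the other."""
    saved, reply = await asyncio.gather(save_message(text), generate_reply(text))
    return saved, reply
```

While one coroutine waits on I/O, the event loop is free to advance the other, which is what keeps the API responsive under concurrent requests.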

## Setup & Quick Start

### Prerequisites

- Docker & Docker Compose
- Python 3.12 (for local development)
- PostgreSQL 16 (or use Docker)

### 1. Clone Repository

```bash
# Clone the repository first, then enter the project directory
cd ws-sanctum-chronicler
```

### 2. Configure Environment

```bash
cp .env.example .env
# Edit .env with your settings (Twitch tokens, LLM provider, etc.)
```

### 3. Start Services

```bash
docker-compose up --build
```

The API will be available at `http://localhost:8000`.

### 4. Test the API

**Health Check:**
```bash
curl http://localhost:8000/health
```

**Start Session:**
```bash
curl -X POST http://localhost:8000/admin/session/start \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "channel_name=example_channel"
```

**Send Test Message:**
```bash
curl -X POST http://localhost:8000/admin/test-message \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "session_id=<SESSION_ID>&username=test_user&message=Hello stream!"
```

**Get Ledger:**
```bash
curl "http://localhost:8000/admin/ledger?session_id=<SESSION_ID>"
```

**End Session:**
```bash
curl -X POST http://localhost:8000/admin/session/end \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "session_id=<SESSION_ID>"
```
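
The same calls can be scripted from Python. The sketch below mirrors the curl commands above using only the standard library; the helper names `form_post`, `ledger_get`, and `send` are made up for this example, and the response format is not documented here, so `send` simply returns the raw body.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"


def form_post(path: str, **fields: str) -> Request:
    """Build a form-encoded POST request, like `curl -X POST -d ...`."""
    body = urlencode(fields).encode("ascii")
    return Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def ledger_get(session_id: str) -> Request:
    """Build the GET request for the markdown ledger."""
    query = urlencode({"session_id": session_id})
    return Request(f"{BASE_URL}/admin/ledger?{query}")


def send(request: Request) -> str:
    """Send a request and return the response body (needs a running server)."""
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

For example, `send(form_post("/admin/session/start", channel_name="example_channel"))` corresponds to the "Start Session" curl command. `urlencode` also takes care of escaping spaces and punctuation in the message text.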

## API Endpoints

### Health & Status

- `GET /health` - Application health check

### Session Management

- `POST /admin/session/start?channel_name=<name>` - Start stream session
- `POST /admin/session/end?session_id=<id>` - End stream session

### Testing & Admin

- `POST /admin/test-message?session_id=<id>&username=<user>&message=<msg>` - Send test message
- `GET /admin/ledger?session_id=<id>` - Retrieve markdown ledger
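
The exact ledger layout is not specified in this README. As a loose sketch of what markdown ledger generation could look like, the function below renders a session summary; `render_ledger`, its parameters, and the section titles are all assumptions, not the output of `app/exports/markdown.py`.

```python
from datetime import datetime, timezone


def render_ledger(
    channel: str,
    started_at: datetime,
    messages: list[tuple[str, str]],
    flagged: list[str],
) -> str:
    """Render a minimal markdown ledger from (username, text) pairs."""
    lines = [
        f"# Stream Ledger: {channel}",
        "",
        f"- Started: {started_at.isoformat()}",
        f"- Messages: {len(messages)}",
        "",
        "## Chat Highlights",
        "",
    ]
    lines += [f"- **{user}**: {text}" for user, text in messages]
    # Only add the flagged section when there is something to report.
    if flagged:
        lines += ["", "## Flagged Messages", ""]
        lines += [f"- {text}" for text in flagged]
    return "\n".join(lines) + "\n"
```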

## Configuration

All settings are loaded from environment variables (see `.env.example`):

### Application

- `APP_NAME` - Application display name
- `APP_ENV` - Environment (development/production)
- `DEBUG` - Enable debug logging

### Database

- `DATABASE_URL` - PostgreSQL connection string
- `DB_PASSWORD` - Database password (for docker-compose)

### Twitch (Optional - Stubs Present)

- `TWITCH_CLIENT_ID` - Twitch OAuth client ID
- `TWITCH_CLIENT_SECRET` - Twitch OAuth secret
- `TWITCH_BOT_USERNAME` - Bot username
- `TWITCH_CHANNEL_NAME` - Channel to monitor

### LLM

- `LLM_PROVIDER` - Provider: `openai`, `ollama`, `lm_studio`, or empty for mock
- `LLM_BASE_URL` - API endpoint (for local providers)
- `LLM_API_KEY` - API key (if needed)
- `LLM_MODEL` - Model identifier (default: gpt-3.5-turbo)

### Export

- `EXPORT_PATH` - Directory for ledger exports
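
The project loads these values with pydantic-settings in `app/config.py`. To show the idea without that dependency, the sketch below swaps in a plain dataclass that reads a subset of the same variables. The class, the helper names, and every default except the `LLM_MODEL` default are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SettingsSketch:
    app_name: str
    app_env: str
    debug: bool
    database_url: str
    llm_provider: str
    llm_model: str
    export_path: str


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings from environment variables."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> SettingsSketch:
    """Read settings from a mapping (the process environment by default)."""
    return SettingsSketch(
        app_name=env.get("APP_NAME", "The Sanctum Chronicler"),  # assumed default
        app_env=env.get("APP_ENV", "development"),               # assumed default
        debug=_as_bool(env.get("DEBUG", "false")),               # assumed default
        database_url=env.get("DATABASE_URL", ""),                # assumed default
        llm_provider=env.get("LLM_PROVIDER", ""),                # empty selects mock mode
        llm_model=env.get("LLM_MODEL", "gpt-3.5-turbo"),         # default stated above
        export_path=env.get("EXPORT_PATH", "exports"),           # assumed default
    )
```

Passing a dictionary instead of `os.environ` makes the loader easy to exercise in tests, for example `load_settings({"LLM_PROVIDER": "ollama"})`.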

## Agent Policies

### Chat Activity Policy

- **Inactivity Threshold:** 15 minutes
- **Hearthkeeper Activation:** Sends gentle prompt when no messages for 15+ minutes
- **Human Override:** Hearthkeeper stays silent if chat is active (5+ messages/minute)
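
A minimal sketch of the inactivity check, assuming the policy only needs the timestamp of the most recent message. The class name and methods are hypothetical, and staying quiet before the first message is an assumption of this sketch rather than documented behavior.

```python
from datetime import datetime, timedelta
from typing import Optional


class ChatActivityPolicySketch:
    """Tracks the last message time and reports when chat has gone quiet."""

    def __init__(self, inactivity_threshold: timedelta = timedelta(minutes=15)) -> None:
        self.inactivity_threshold = inactivity_threshold
        self.last_message_at: Optional[datetime] = None

    def record_message(self, at: datetime) -> None:
        """Remember when the latest chat message arrived."""
        self.last_message_at = at

    def should_prompt(self, now: datetime) -> bool:
        """Return True once the quiet period reaches the threshold."""
        if self.last_message_at is None:
            return False  # assumption: no prompt before any message is seen
        return now - self.last_message_at >= self.inactivity_threshold
```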

### Response Suppression

- **Active Chat Threshold:** 5 messages per minute
- **Behavior:** Agent suppresses responses when humans are actively talking
- **Rationale:** Respects human conversation and avoids noise
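
One way to measure "messages per minute" is a sliding window over recent timestamps. The sketch below takes that approach; the class and method names are hypothetical, and the real `ResponseSuppression` policy may count activity differently.

```python
from collections import deque
from datetime import datetime, timedelta


class ResponseSuppressionSketch:
    """Suppresses agent replies while the recent message rate is high."""

    def __init__(self, max_messages: int = 5, window: timedelta = timedelta(minutes=1)) -> None:
        self.max_messages = max_messages
        self.window = window
        self._timestamps: deque = deque()

    def record_message(self, at: datetime) -> None:
        """Record the arrival time of a human chat message."""
        self._timestamps.append(at)

    def should_suppress(self, now: datetime) -> bool:
        """Return True when the window holds at least `max_messages` messages."""
        cutoff = now - self.window
        # Drop timestamps that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.max_messages
```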

### Suspicious Content Detection

**Patterns Detected:**
- "join our discord", "discord.gg" (growth spam)
- "grow your channel", "easy money" (scams)
- Multiple URLs (spam)
- Common scam keywords

**Actions:** Warden flags suspicious messages for review; it does not delete them automatically.
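
The checks above can be sketched as simple phrase and URL matching. The phrase list is taken from the patterns listed in this section; the function name, the URL regex, and the "more than one URL" cutoff are assumptions, and the "common scam keywords" set is not reproduced because it is not spelled out here.

```python
import re

# Phrases named in the list above.
SPAM_PHRASES = ("join our discord", "discord.gg", "grow your channel", "easy money")

# Assumed URL matcher: anything starting with http:// or https://.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def suspicious_reasons(message: str, max_urls: int = 1) -> list[str]:
    """Return the reasons a message looks suspicious (empty list if none)."""
    lowered = message.lower()
    reasons = [f"phrase: {phrase}" for phrase in SPAM_PHRASES if phrase in lowered]
    if len(URL_PATTERN.findall(message)) > max_urls:
        reasons.append("multiple URLs")
    return reasons
```

Returning reasons rather than a bare boolean fits the flag-only behavior: the reasons can be stored with the flag for a human moderator to review.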

## Database Schema

### StreamSession
- `id` (UUID) - Primary key
- `channel_name` - Twitch channel
- `started_at` - Session start time
- `ended_at` - Session end time (null if active)
- `theme` - Stream theme
- `is_active` - Boolean flag

### ChatMessage
- `id` (UUID)
- `session_id` - Reference to session
- `username` - Message author
- `content` - Message text
- `timestamp` - Message time
- `is_bot`, `is_moderator` - Flags

### AgentAction
- `id` (UUID)
- `session_id` - Reference to session
- `action_type` - RESPONSE, FLAG_SUSPICIOUS, ARCHIVE_CLIP, etc.
- `mode` - Which agent mode took action
- `triggered_by_message_id` - Message that triggered action
- `description` - Action details

### ClipCandidate
- `id` (UUID)
- `session_id`, `message_id`
- `reason` - Why it's clip-worthy

### BlogSeed
- `id` (UUID)
- `session_id`
- `topic`, `description`
- `related_messages` - JSON array
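
To show how the first two tables relate, here is a sketch using plain dataclasses in place of the SQLAlchemy models in `app/memory/models.py`. Field names follow the schema above; the class names, defaults, and the `end()` helper are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class StreamSessionRecord:
    channel_name: str
    theme: Optional[str] = None
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    started_at: datetime = field(default_factory=_utc_now)
    ended_at: Optional[datetime] = None  # stays None while the session is active
    is_active: bool = True

    def end(self) -> None:
        """Mark the session as finished."""
        self.ended_at = _utc_now()
        self.is_active = False


@dataclass
class ChatMessageRecord:
    session_id: uuid.UUID  # references StreamSessionRecord.id
    username: str
    content: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    timestamp: datetime = field(default_factory=_utc_now)
    is_bot: bool = False
    is_moderator: bool = False
```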

## LLM Integration

The system includes a pluggable LLM client that currently:
- Generates mock responses when no provider is configured
- Provides placeholder methods for OpenAI, Ollama, and LM Studio integration (not yet implemented)

**Current Mock Behavior:**
- Returns deterministic responses based on keywords
- Useful for testing without API costs

**Implementing Real Providers:**

See `app/llm/client.py` for TODO comments marking where to integrate:
- `_generate_openai()` - OpenAI API calls
- `_generate_ollama()` - Ollama local API
- `_generate_lm_studio()` - LM Studio API
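
The sketch below illustrates the pluggable shape described here: mock replies when no provider is set, and one method per provider otherwise. The `_generate_*` method names come from the list above; the class name, the `generate` entry point, and the keyword table are assumptions and may not match `app/llm/client.py`.

```python
import asyncio


class PluggableLLMClientSketch:
    """Routes prompts to a mock unless a provider is configured."""

    # Hypothetical keyword table; the project's mock replies may differ.
    KEYWORD_REPLIES = {
        "hello": "Welcome to the sanctum!",
        "clip": "That moment sounds worth saving.",
    }
    DEFAULT_REPLY = "The Chronicler is listening."

    def __init__(self, provider: str = "") -> None:
        self.provider = provider

    async def generate(self, prompt: str) -> str:
        """Pick the backend based on the configured provider."""
        if not self.provider:
            return self._generate_mock(prompt)
        providers = {
            "openai": self._generate_openai,
            "ollama": self._generate_ollama,
            "lm_studio": self._generate_lm_studio,
        }
        if self.provider not in providers:
            raise ValueError(f"Unknown LLM provider: {self.provider!r}")
        return await providers[self.provider](prompt)

    def _generate_mock(self, prompt: str) -> str:
        """Deterministic reply chosen by the first matching keyword."""
        lowered = prompt.lower()
        for keyword, reply in self.KEYWORD_REPLIES.items():
            if keyword in lowered:
                return reply
        return self.DEFAULT_REPLY

    async def _generate_openai(self, prompt: str) -> str:
        raise NotImplementedError("OpenAI integration is not implemented yet")

    async def _generate_ollama(self, prompt: str) -> str:
        raise NotImplementedError("Ollama integration is not implemented yet")

    async def _generate_lm_studio(self, prompt: str) -> str:
        raise NotImplementedError("LM Studio integration is not implemented yet")
```

With this layout, adding a provider means filling in one method while callers keep using `generate`.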

## Current Limitations

This is **scaffolding, not production code**:

- Twitch EventSub connection is a stub (see TODO comments)
- Chat sending is not implemented
- LLM providers are not integrated yet (mock mode works)
- No real OAuth flow for Twitch
- Database tables are created automatically at startup; there are no versioned migrations
- No rate limiting on endpoints
- No authentication/authorization
- Ledger export is basic markdown (no formatting options)

## Next Implementation Steps

### Phase 1: Core Features (Recommended)

1. **Implement real Twitch integration:**
   - Implement EventSub WebSocket connection in `app/twitch/eventsub.py`
   - Implement send chat message API in `app/twitch/chat.py`
   - Add OAuth token exchange flow

2. **Integrate real LLM provider:**
   - Choose provider (e.g., Ollama for self-hosted)
   - Implement `_generate_ollama()` or `_generate_openai()`
   - Test with actual model

3. **Enhance agent modes:**
   - Refine Hearthkeeper timing logic
   - Implement Steward mention detection
   - Expand Warden pattern library
   - Complete Librarian topic extraction

### Phase 2: User Experience

1. **Add UI/Dashboard:**
   - Stream monitoring view
   - Ledger generation UI
   - Settings panel

2. **Improve exports:**
   - Configurable markdown templates
   - JSON export option
   - Email distribution

3. **Add persistence:**
   - Session history
   - Settings storage per channel
   - Analytics dashboard

### Phase 3: Production Readiness

1. **Testing:**
   - Unit tests for policies
   - Integration tests for agent modes
   - E2E tests for full flows

2. **DevOps:**
   - Database migrations (Alembic)
   - Logging aggregation
   - Monitoring/alerting

3. **Performance:**
   - Rate limiting
   - Caching for repeated LLM calls
   - Message deduplication

## Development

### Local Setup (Without Docker)

```bash
# Create virtual environment
python3.12 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Create .env file
cp .env.example .env

# No separate migration step: tables are created automatically on app startup
# Start the app
python -m uvicorn app.main:app --reload
```

### Database Access

```bash
# Connect to running PostgreSQL
docker-compose exec sanctum-db psql -U sanctum -d sanctum

-- The following lines are entered at the psql prompt, not in the shell
-- View tables
\dt

-- Query sessions
SELECT id, channel_name, started_at FROM stream_sessions;
```

### Logging

The application uses Python's standard logging. Configure in `app/main.py`:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
```
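
If the log level should follow the `DEBUG` setting, something like the sketch below could be used. `configure_logging` and the log format are assumptions for illustration; `app/main.py` may wire this differently.

```python
import logging


def configure_logging(debug: bool) -> int:
    """Configure root logging and return the level that was applied."""
    level = logging.DEBUG if debug else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers configured earlier
    )
    return level
```

Calling `configure_logging(True)` enables debug output; passing `False` keeps the log at `INFO`.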

## Contributing

When adding features:
- Maintain async/await patterns throughout
- Add type hints to all functions
- Include docstrings with purpose and TODO comments for future work
- Keep modes independent of one another; share behavior through the LLM client, repository, and policies

## License

(Add your license here)

## Support

For issues or questions:
1. Check TODO comments in relevant files
2. Review the architecture overview
3. File an issue with reproduction steps