runbook init

2026-05-09 09:36:21 -05:00
parent a3994a1121
commit e8bb27270e
4 changed files with 618 additions and 5 deletions

@@ -0,0 +1,584 @@
# Fractional Insight CIO
## Operational Runbook — Daily & Weekly Administration
### DigitalOcean / Docker / Nextcloud / Fastmail Pilot Environment
---
# 1. Purpose
This operational runbook defines the recurring administrative tasks required to safely operate and maintain the client pilot environment hosted on DigitalOcean infrastructure using:
- Nextcloud
- Fastmail
- Docker containers
- Linux server administration
- Reverse proxy / SSL management
- Backup and recovery validation
- Security and compliance oversight
The goal of this runbook is to:
- Reduce operational risk
- Reduce exposure to liability
- Detect security incidents early
- Ensure recoverability of client data
- Maintain stable uptime and user access
- Establish evidence of reasonable administrative diligence
This document assumes:
- Remote users
- No internal IT staff
- Small pilot deployment
- Shared responsibility model between consultant and client
- MFA enforcement in both Fastmail and Nextcloud
---
# 2. Operational Philosophy
The environment should be treated as:
- A business collaboration platform
- A controlled data environment
- A security-sensitive system
- A system requiring documented administrative oversight
Because the platform contains:
- Client communications
- Potential confidential documents
- Shared file repositories
- User credentials
- Internet-exposed services
…administration must prioritize:
1. Security first
2. Recoverability second
3. Stability third
4. Convenience last
---
# 3. Daily Operational Tasks
---
## 3.1 Morning Health Check
### Frequency
Daily (business days)
### Estimated Time
10-15 minutes
### Objective
Confirm that all core systems are operational before users begin work.
### Tasks
#### Infrastructure
- Verify droplet online in DigitalOcean
- Verify CPU/RAM/disk usage within normal thresholds
- Verify disk utilization below 80%
- Verify Docker daemon operational
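
The 80% disk threshold can be checked with a short script. This is a minimal sketch, not existing tooling: the function names `disk_used_pct` and `check_disk` are illustrative, and it assumes the data lives on the filesystem mounted at `/`.

```bash
# Sketch: flag a filesystem that has reached the runbook's 80% threshold.
# Function names and the mount point in the usage line are assumptions.

# Print the used percentage (integer, no % sign) for a mount point.
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Compare a usage figure against a limit; non-zero exit means "too full".
check_disk() {
    local used="$1" limit="${2:-80}"
    if [ "$used" -ge "$limit" ]; then
        echo "WARN: disk at ${used}% (limit ${limit}%)"
        return 1
    fi
    echo "OK: disk at ${used}% (limit ${limit}%)"
}

# Usage: check_disk "$(disk_used_pct /)" 80
```

Keeping the comparison separate from the `df` call makes it easy to reuse the same check for a data volume mounted elsewhere.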
#### Services
- Verify Nextcloud web login functional
- Verify Fastmail operational status
- Verify SSL certificates valid
- Verify reverse proxy routing functional
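
Certificate validity can be turned into a number of days remaining. The sketch below assumes GNU `date` (for `-d`) and the `openssl` CLI; the host `cloud.example.com` and the function name `days_until` are placeholders, not values from this environment.

```bash
# Sketch: days remaining until a certificate's notAfter date.
# Assumes GNU date; the hostname in the usage lines is hypothetical.

# $1 = expiry date string as printed by `openssl x509 -enddate`
# $2 = optional "now" as a Unix epoch (defaults to the current time)
days_until() {
    local expiry_epoch now_epoch
    expiry_epoch="$(date -u -d "$1" +%s)" || return 2
    now_epoch="${2:-$(date -u +%s)}"
    echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# Usage:
#   not_after="$(openssl s_client -connect cloud.example.com:443 \
#       -servername cloud.example.com </dev/null 2>/dev/null \
#       | openssl x509 -noout -enddate | cut -d= -f2)"
#   days_until "$not_after"
```

A result under roughly 14 days is a reasonable point to investigate why automatic renewal has not run, although the exact cutoff is a judgment call.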
#### Containers
Check:
- Nextcloud container
- Database container
- Redis container
- Reverse proxy container
Example:
```bash
docker ps
```
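
Reading `docker ps` output by eye works, but a missing container is easy to overlook. The following sketch prints any expected container that is not running. The container names in the usage line are hypothetical and should be replaced with the names used in this deployment.

```bash
# Sketch: report which expected containers are absent from `docker ps`.
# The container names in the usage lines are hypothetical.

# Reads running container names on stdin (one per line) and prints every
# expected name, passed as an argument, that is not among them.
missing_containers() {
    local running name
    running="$(cat)"
    for name in "$@"; do
        if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
            echo "$name"
        fi
    done
}

# Usage:
#   docker ps --format '{{.Names}}' \
#     | missing_containers nextcloud-app nextcloud-db nextcloud-redis proxy
```

Empty output means all four container roles listed above are up.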
#### External Access Test
Validate:
- HTTPS access
- File upload/download
- Login functionality
#### Email
Send/receive test email through Fastmail test account.
### Deliverable
- Daily operational log entry
---
## 3.2 Security Event Review
### Frequency
Daily
### Estimated Time
10 minutes
### Objective
Identify suspicious activity before escalation.
### Tasks
#### Review:
- Failed login attempts
- MFA failures
- New device logins
- Suspicious IP addresses
- Excessive upload activity
- Unexpected admin actions
#### Check:
- Nextcloud security warnings
- Linux auth logs
- Docker errors
- Reverse proxy logs
Example:
```bash
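# Show messages of priority "err" and worse from the current boot,
# with explanatory catalog text where available
sudo journalctl -p 3 -xb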
```
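
For the failed-login review, it can help to group SSH failures by source address. This is a sketch under assumptions: it matches the standard OpenSSH "Failed password" message, and the usage line assumes the SSH unit is named `ssh`, as on Debian and Ubuntu. The function name is illustrative.

```bash
# Sketch: count failed SSH password attempts per source IP.
# Assumes OpenSSH's standard "Failed password ... from <ip> port <n>" lines.

# Reads log lines on stdin; prints "<count> <ip>", most frequent first.
failed_logins_by_ip() {
    grep 'Failed password' \
        | sed -n 's/.* from \([^ ]*\) port .*/\1/p' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}

# Usage:
#   sudo journalctl -u ssh --since today --no-pager | failed_logins_by_ip
```

Key-based failures and Nextcloud login failures are logged differently and are not covered by this pattern.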
### Escalation Triggers
Immediate escalation if:
- Multiple failed admin logins
- MFA bypass suspicion
- Unknown admin account
- Malware/ransomware indicators
- Unexpected outbound traffic
### Deliverable
- Security review noted in operational log
---
## 3.3 Backup Verification
### Frequency
Daily
### Estimated Time
5-10 minutes
### Objective
Verify backups completed successfully.
### Tasks
#### Verify:
- Scheduled backup job completed
- Backup storage reachable
- Backup size reasonable
- No corruption warnings
- Snapshot success in DigitalOcean
#### Validate:
- Latest backup timestamp
- Database dump presence
- File archive generation
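
The timestamp and size checks above can be scripted. The sketch below only confirms that a recent, non-trivially sized file exists; it does not prove the backup is restorable. The directory path in the usage line and both function names are assumptions.

```bash
# Sketch: confirm a backup directory holds a recent file of plausible size.
# The path in the usage line is hypothetical.

# $1 = directory, $2 = max age in minutes (default 1440 = 24 h),
# $3 = minimum size in bytes (default 1). Prints matching files.
recent_backups() {
    local dir="$1" max_age_min="${2:-1440}" min_bytes="${3:-1}"
    find "$dir" -type f -mmin "-${max_age_min}" -size "+$((min_bytes - 1))c"
}

check_backup() {
    if [ -n "$(recent_backups "$@")" ]; then
        echo "OK: recent backup found in $1"
    else
        echo "FAIL: no recent backup of the expected size in $1"
        return 1
    fi
}

# Usage: check_backup /srv/backups/nextcloud 1440 1048576
```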
### Important
A backup that has not been validated should be treated as nonexistent.
### Deliverable
- Backup verification entry in operational log
---
## 3.4 User Administration Review
### Frequency
Daily
### Estimated Time
5-10 minutes
### Objective
Ensure user/account integrity.
### Tasks
#### Review:
- New user requests
- Disabled users
- Terminated personnel
- Permission changes
- Shared folder permissions
- Public links
#### Verify:
- No orphaned admin accounts
- MFA enabled for all admins
- Least-privilege principles maintained
### High-Risk Areas
- Shared folders with external access
- Public upload links
- Administrative delegation
### Deliverable
- Access review note
---
## 3.5 Incident Queue Review
### Frequency
Daily
### Estimated Time
5-15 minutes
### Objective
Identify unresolved operational or security issues.
### Tasks
Review:
- User tickets
- Error reports
- Sync failures
- Email delivery issues
- Storage complaints
- Permission problems
### Deliverable
- Updated incident tracking
---
# 4. Weekly Operational Tasks
---
## 4.1 Operating System Updates
### Frequency
Weekly
### Estimated Time
30-60 minutes
### Objective
Maintain security posture and system stability.
### Tasks
#### Linux Updates
```bash
sudo apt update
sudo apt upgrade
```
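
Before applying upgrades, a simulated run shows what would change. The sketch below summarizes the output of `apt-get -s upgrade`. Treating any origin containing `-security` as a security update is an assumption based on Debian and Ubuntu archive naming, and the function name is illustrative.

```bash
# Sketch: summarize a simulated upgrade before applying it.
# The "-security" origin match is an assumption about archive naming.

# Reads `apt-get -s upgrade` output on stdin and counts "Inst" lines.
summarize_upgrades() {
    awk '/^Inst / { total++; if ($0 ~ /-security/) sec++ }
         END { printf "%d pending, %d from a security pocket\n", total, sec }'
}

# Usage:
#   sudo apt update && apt-get -s upgrade | summarize_upgrades
```

Recording this summary in the patch log gives a before-state for each weekly window.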
#### Docker
- Update container images
- Rebuild containers if necessary
- Remove unused images
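
The three Docker steps above might be scripted as follows. This sketch assumes the stack is managed with Docker Compose v2 from a single project directory. The path `/opt/nextcloud` and the `DRY_RUN` switch are illustrative, not part of the existing setup.

```bash
# Sketch: pull newer images, recreate containers, prune dangling images.
# Assumes Docker Compose v2; the project path in the usage is hypothetical.

# $1 = compose project directory. Set DRY_RUN=1 to print the commands
# instead of running them.
update_stack() {
    local project_dir="$1" run=""
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"
    $run docker compose --project-directory "$project_dir" pull &&
    $run docker compose --project-directory "$project_dir" up -d &&
    $run docker image prune -f
}

# Usage:
#   DRY_RUN=1 update_stack /opt/nextcloud   # preview
#   update_stack /opt/nextcloud             # apply
```

Each step runs only if the previous one succeeded, so a failed pull leaves the running containers untouched.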
#### Validate:
- Nextcloud functionality after updates
- Database connectivity
- Reverse proxy operation
### Important
Do NOT apply major-version upgrades during business hours.
### Deliverable
- Patch log
- Change log entry
---
## 4.2 Nextcloud Maintenance Review
### Frequency
Weekly
### Estimated Time
20-30 minutes
### Tasks
#### Review:
- Security warnings
- Integrity check results
- App updates
- Background jobs
- Storage consumption
#### Validate:
- Cron jobs functioning
- File scanning healthy
- No database corruption warnings
#### Execute
```bash
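# occ must run as the web server user; www-data is the default in the
# official Nextcloud image (adjust if this deployment uses another image)
docker exec -it -u www-data nextcloud-app php occ status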
```
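
The `occ status` output can also be checked automatically. The sketch below assumes the default plain-text output, with lines such as `maintenance: false`, and that `occ` runs as `www-data`, as in the official image. The function name is illustrative.

```bash
# Sketch: succeed only if Nextcloud reports a healthy status.
# Assumes the plain-text `occ status` format ("- key: value" lines).

# Reads `occ status` output on stdin.
occ_status_ok() {
    local status
    status="$(cat)"
    printf '%s\n' "$status" | grep -q 'installed: true' &&
    printf '%s\n' "$status" | grep -q 'maintenance: false' &&
    printf '%s\n' "$status" | grep -q 'needsDbUpgrade: false'
}

# Usage:
#   docker exec -u www-data nextcloud-app php occ status \
#     | occ_status_ok || echo "Nextcloud needs attention"
```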
### Deliverable
- Weekly maintenance report
---
## 4.3 Backup Restore Test
### Frequency
Weekly
### Estimated Time
30-60 minutes
### Objective
Prove recoverability.
### Tasks
Restore:
- Single file
- Database dump
- User folder sample
### Verify:
- File integrity
- Permissions
- Recovery speed
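
File integrity after a restore can be verified by comparing checksums against a known-good copy. This is a minimal sketch assuming `sha256sum` from GNU coreutils. The function names and the paths in the usage line are hypothetical.

```bash
# Sketch: compare a restored file against a reference copy by SHA-256.
# Paths in the usage line are hypothetical.

# Print only the hex digest of a file.
file_sha256() {
    sha256sum "$1" | cut -d' ' -f1
}

# $1 = reference file, $2 = restored file. Non-zero exit on mismatch
# or if either file cannot be read.
verify_restore() {
    local original="$1" restored="$2" a b
    a="$(file_sha256 "$original")"
    b="$(file_sha256 "$restored")"
    if [ -n "$a" ] && [ "$a" = "$b" ]; then
        echo "MATCH: $restored"
    else
        echo "MISMATCH: $restored"
        return 1
    fi
}

# Usage: verify_restore /data/live/report.pdf /tmp/restore/report.pdf
```

The output line can be pasted directly into the restore validation report.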
### Critical Principle
Without regular restore tests there is no evidence that backups are usable, which substantially increases liability exposure.
### Deliverable
- Restore validation report
---
## 4.4 Security Audit Review
### Frequency
Weekly
### Estimated Time
30 minutes
### Tasks
#### Review:
- Admin accounts
- Group memberships
- External shares
- Public links
- Expired accounts
- MFA compliance
#### Validate:
- SSL certificate expiration dates
- Firewall rules
- SSH access
- Root login disabled
- Fail2Ban status (if implemented)
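
The "root login disabled" check can be read from the effective SSH daemon configuration instead of the config file, which avoids missing an override in an included file. The sketch below parses `sshd -T` output. The function name is illustrative, and it deliberately treats `prohibit-password` as not disabled.

```bash
# Sketch: succeed only if the effective sshd config has PermitRootLogin no.
# `sshd -T` prints the effective settings with lowercase keys.

# Reads `sshd -T` output on stdin.
root_login_disabled() {
    awk '$1 == "permitrootlogin" { value = $2 }
         END { exit !(value == "no") }'
}

# Usage:
#   sudo sshd -T | root_login_disabled && echo "root SSH login disabled"
```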
### Deliverable
- Weekly security audit checklist
---
## 4.5 Capacity and Performance Review
### Frequency
Weekly
### Estimated Time
20-30 minutes
### Tasks
#### Analyze:
- Storage growth
- User growth
- Bandwidth usage
- CPU/RAM trends
- Database size growth
#### Evaluate:
- Need for droplet resize
- Need for archive policies
- Need for retention changes
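
Storage growth can be turned into a rough time-to-full estimate, which helps decide when a droplet resize is due. The sketch below assumes linear growth between two samples, which is a simplification. The function name and the figures in the usage line are illustrative.

```bash
# Sketch: estimate days until a disk fills, assuming linear growth.
# All figures in the usage line are made up for illustration.

# $1 = usage at the earlier sample (GB), $2 = usage now (GB),
# $3 = days between samples, $4 = total capacity (GB)
days_until_full() {
    local earlier_gb="$1" now_gb="$2" days="$3" capacity_gb="$4"
    local growth=$(( now_gb - earlier_gb ))
    if [ "$growth" -le 0 ]; then
        echo "no growth"
        return 0
    fi
    echo $(( (capacity_gb - now_gb) * days / growth ))
}

# Usage: days_until_full 40 47 7 160   # 40 GB a week ago, 47 GB now
```

Because the 80% threshold applies well before the disk is full, passing 80% of the real capacity as the fourth argument gives the date that matters operationally.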
### Deliverable
- Capacity trend notes
---
## 4.6 Documentation and Change Log
### Frequency
Weekly
### Estimated Time
15-20 minutes
### Objective
Maintain defensible operational records.
### Tasks
Document:
- Changes made
- Accounts added/removed
- Incidents
- Security events
- Backup issues
- Maintenance performed
### Important
Operational documentation is part of liability protection.
If a breach occurs, documented operational diligence matters significantly.
### Deliverable
- Weekly operational summary
---
# 5. Monthly Administrative Tasks
---
## 5.1 Full Disaster Recovery Exercise
### Estimated Time
2-4 hours
### Tasks
Simulate:
- Server loss
- Container rebuild
- Restore from backup
- DNS validation
- SSL restoration
---
## 5.2 User Access Certification
### Estimated Time
30-60 minutes
### Tasks
Review with client:
- Active users
- Admin privileges
- External sharing
- Terminated employees
---
## 5.3 Security Policy Review
### Estimated Time
30 minutes
### Tasks
Review:
- MFA compliance
- Password standards
- Administrative access
- Training completion
---
# 6. Estimated Operational Effort
| Activity | Estimated Time |
|---|---|
| Daily Operations | 35-60 min/day |
| Weekly Maintenance | 2-4 hrs/week |
| Monthly DR/Security | 3-6 hrs/month |
---
# 7. Recommended Retainer Guidance
For a pilot of this size:
| Service Level | Estimated Monthly Hours |
|---|---|
| Minimal Reactive Support | 8-10 hrs |
| Recommended Operational Support | 15-20 hrs |
| Security-Conscious Managed Support | 25-35 hrs |
Given the recent discussions around:
- liability
- data protection
- backup validation
- MFA enforcement
- user training
- documented diligence
…the “Recommended Operational Support” tier is likely the minimum responsible posture.
---
# 8. Key Risk Areas to Monitor
The largest liability exposure areas are:
## Administrative Misconfiguration
- Incorrect sharing permissions
- Public links
- Excessive admin rights
## Backup Failure
- Silent backup corruption
- Unverified restores
## Credential Compromise
- Weak passwords
- MFA disabled
- Phishing
## Delayed Patching
- Unpatched Nextcloud vulnerabilities
- Docker/container CVEs
- Linux exploits
## User Behavior
- Unsafe uploads
- Credential reuse
- Local machine compromise
## Lack of Documentation
- No operational evidence
- No audit trail
- Undefined responsibilities
---
# 9. Strong Recommendations
## Require:
- MFA for all users
- Mandatory admin training
- Signed acceptable use/security acknowledgment
- Principle of least privilege
## Strongly Recommended:
- Centralized logging
- Automated monitoring alerts
- Offsite backups
- Written incident response plan
- Cyber liability / E&O insurance
## Avoid:
- Shared admin accounts
- Permanent public links
- Unrestricted upload folders
- Direct root SSH access
- Unmanaged personal devices for administrators