Fractional Insight CIO
Operational Runbook — Daily & Weekly Administration
DigitalOcean / Docker / Nextcloud / Fastmail Pilot Environment
1. Purpose
This runbook defines the recurring administrative tasks required to operate and maintain the client pilot environment safely. The environment is hosted on DigitalOcean, and the tasks cover:
- Nextcloud
- Fastmail
- Docker containers
- Linux server administration
- Reverse proxy / SSL management
- Backup and recovery validation
- Security and compliance oversight
The goal of this runbook is to:
- Reduce operational risk
- Reduce exposure to liability
- Detect security incidents early
- Ensure recoverability of client data
- Maintain stable uptime and user access
- Establish evidence of reasonable administrative diligence
This document assumes:
- Remote users
- No internal IT staff
- Small pilot deployment
- Shared responsibility model between consultant and client
- MFA enforcement in both Fastmail and Nextcloud
2. Operational Philosophy
The environment should be treated as:
- A business collaboration platform
- A controlled data environment
- A security-sensitive system
- A system requiring documented administrative oversight
Because the platform contains:
- Client communications
- Potential confidential documents
- Shared file repositories
- User credentials
- Internet-exposed services
…administration must prioritize:
- Security first
- Recoverability second
- Stability third
- Convenience last
3. Daily Operational Tasks
3.1 Morning Health Check
Frequency
Daily (business days)
Estimated Time
10–15 minutes
Objective
Confirm that all core systems are operational before users begin work.
Tasks
Infrastructure
- Verify droplet online in DigitalOcean
- Verify CPU and RAM usage within normal thresholds
- Verify disk utilization below 80%
- Verify Docker daemon operational
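The 80% disk threshold above can be checked mechanically. The following is a minimal bash sketch, not the environment's actual tooling; the function names disk_pct and disk_status are illustrative assumptions.

```shell
# disk_pct MOUNT: print the used percentage (integer, no % sign) for MOUNT.
disk_pct() {
    # -P forces POSIX single-line output; field 5 is the capacity, e.g. "42%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_status PCT [THRESHOLD]: print OK when PCT is below THRESHOLD, else WARN.
# THRESHOLD defaults to 80, the limit stated in this runbook.
disk_status() {
    local pct="$1" threshold="${2:-80}"
    if [ "$pct" -lt "$threshold" ]; then
        echo "OK"
    else
        echo "WARN"
    fi
}
```

For example, `disk_status "$(disk_pct /)"` classifies the root filesystem. Other mount points, such as a separate data volume, would need their own call.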
Services
- Verify Nextcloud web login functional
- Verify Fastmail operational status
- Verify SSL certificates valid
- Verify reverse proxy routing functional
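Certificate validity is easier to track as a number of days remaining. The sketch below assumes GNU date and takes the notAfter value printed by `openssl x509 -noout -enddate`. The function name days_until_expiry and the host cloud.example.com are hypothetical placeholders.

```shell
# days_until_expiry NOT_AFTER [NOW_EPOCH]
# NOT_AFTER is a date string such as "Jun  1 12:00:00 2030 GMT".
# NOW_EPOCH defaults to the current time; passing it makes the result repeatable.
days_until_expiry() {
    local not_after="$1" now="${2:-$(date +%s)}" end
    # GNU date parses the openssl notAfter format directly.
    end="$(date -u -d "$not_after" +%s)" || return 2
    echo $(( (end - now) / 86400 ))
}

# Possible usage against a live host (hypothetical hostname):
#   not_after="$(echo | openssl s_client -connect cloud.example.com:443 \
#       -servername cloud.example.com 2>/dev/null \
#       | openssl x509 -noout -enddate | cut -d= -f2)"
#   days_until_expiry "$not_after"
```

A low result, for instance under 14 days, would suggest that automatic renewal is not working and should be investigated.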
Containers
Check:
- Nextcloud container
- Database container
- Redis container
- Reverse proxy container
Example:
docker ps
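Reading `docker ps` output by eye makes it easy to miss a stopped container. As a sketch, the expected names can be compared against the running set. Only nextcloud-app appears in this runbook; the other container names in the usage note are assumptions and must be replaced with the real ones.

```shell
# missing_containers RUNNING EXPECTED...
# RUNNING is a newline-separated list of container names, for example from:
#   docker ps --format '{{.Names}}'
# Prints each EXPECTED name that is not in RUNNING, one per line.
missing_containers() {
    local running="$1"
    shift
    local name
    for name in "$@"; do
        # -x matches the whole line, -F treats the name as a literal string.
        if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
            echo "$name"
        fi
    done
}
```

A possible call is `missing_containers "$(docker ps --format '{{.Names}}')" nextcloud-app nextcloud-db nextcloud-redis proxy`. Empty output means every expected container is running.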
External Access Test
Validate:
- HTTPS access
- File upload/download
- Login functionality
Send and receive a test email through the Fastmail test account.
Deliverable
- Daily operational log entry
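Log entries are more useful as evidence when they share one format. The following is one possible sketch that appends a tab-separated line with a UTC timestamp. The function name log_entry, the field layout, and the log path in the usage note are assumptions, not an established convention of this environment.

```shell
# log_entry FILE CHECK RESULT [NOTE]
# Appends one line: UTC timestamp, check name, result, optional note (tab-separated).
log_entry() {
    local file="$1" check="$2" result="$3" note="${4:-}"
    printf '%s\t%s\t%s\t%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$check" "$result" "$note" >> "$file"
}
```

For example: `log_entry /var/log/ops/daily.log morning-health-check OK "all containers up"`. The same function could record the security, backup, and access review entries described later in this section.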
3.2 Security Event Review
Frequency
Daily
Estimated Time
10 minutes
Objective
Identify suspicious activity before it escalates into an incident.
Tasks
Review:
- Failed login attempts
- MFA failures
- New device logins
- Suspicious IP addresses
- Excessive upload activity
- Unexpected admin actions
Check:
- Nextcloud security warnings
- Linux auth logs
- Docker errors
- Reverse proxy logs
Example:
sudo journalctl -p 3 -xb
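The journalctl command above shows errors from the current boot, but it does not summarize failed logins. As a sketch, failed SSH password attempts can be counted per source IP. This assumes the standard OpenSSH "Failed password ... from IP port N" log line; the function name failed_ssh_by_ip is illustrative.

```shell
# failed_ssh_by_ip: read sshd log lines on stdin and print "COUNT IP",
# highest count first.
failed_ssh_by_ip() {
    awk '/Failed password/ {
             # Scan from the end so a username of "from" cannot confuse the match.
             for (i = NF - 1; i >= 1; i--)
                 if ($i == "from") { print $(i + 1); break }
         }' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

A possible invocation on Debian or Ubuntu, where the unit is usually named ssh, is `sudo journalctl -u ssh --since today --no-pager | failed_ssh_by_ip`. Key-based failures and Nextcloud login failures are logged differently and are not covered by this sketch.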
Escalation Triggers
Escalate immediately if any of the following is observed:
- Multiple failed admin logins
- MFA bypass suspicion
- Unknown admin account
- Malware/ransomware indicators
- Unexpected outbound traffic
Deliverable
- Security review noted in operational log
3.3 Backup Verification
Frequency
Daily
Estimated Time
5–10 minutes
Objective
Verify backups completed successfully.
Tasks
Verify:
- Scheduled backup job completed
- Backup storage reachable
- Backup size reasonable
- No corruption warnings
- Snapshot success in DigitalOcean
Validate:
- Latest backup timestamp
- Database dump presence
- File archive generation
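The timestamp and presence checks above can be scripted. The sketch below treats a backup directory as fresh when it holds at least one non-empty file modified within a time window. It assumes GNU find; the function name backup_is_fresh and the directory in the usage note are hypothetical.

```shell
# backup_is_fresh DIR [MAX_AGE_MIN]
# Succeeds if DIR contains at least one non-empty regular file modified
# within the last MAX_AGE_MIN minutes (default 1440, i.e. 24 hours).
backup_is_fresh() {
    local dir="$1" max_age_min="${2:-1440}"
    # -mmin -N: modified less than N minutes ago; -size +0: larger than zero blocks.
    [ -n "$(find "$dir" -type f -mmin "-$max_age_min" -size +0 -print 2>/dev/null | head -n 1)" ]
}
```

For example: `backup_is_fresh /var/backups/nextcloud || echo "no recent backup found"`. This only proves that a recent, non-empty file exists. It does not prove the backup can be restored, which is what the weekly restore test is for.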
Important
A backup that has not been validated should be treated as nonexistent.
Deliverable
- Backup verification entry in operational log
3.4 User Administration Review
Frequency
Daily
Estimated Time
5–10 minutes
Objective
Ensure user/account integrity.
Tasks
Review:
- New user requests
- Disabled users
- Terminated personnel
- Permission changes
- Shared folder permissions
- Public links
Verify:
- No orphaned admin accounts
- MFA enabled for all admins
- Least-privilege principles maintained
High-Risk Areas
- Shared folders with external access
- Public upload links
- Administrative delegation
Deliverable
- Access review note
3.5 Incident Queue Review
Frequency
Daily
Estimated Time
5–15 minutes
Objective
Identify unresolved operational or security issues.
Tasks
Review:
- User tickets
- Error reports
- Sync failures
- Email delivery issues
- Storage complaints
- Permission problems
Deliverable
- Updated incident tracking
4. Weekly Operational Tasks
4.1 Operating System Updates
Frequency
Weekly
Estimated Time
30–60 minutes
Objective
Maintain security posture and system stability.
Tasks
Linux Updates
sudo apt update
sudo apt upgrade
Docker
- Update container images
- Rebuild containers if necessary
- Remove unused images
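If the containers are managed with Docker Compose, which this runbook does not state and is assumed here, the three Docker steps could be sketched as follows. The helper names run and update_stack and the project directory are hypothetical. The DRY_RUN switch prints the commands instead of executing them, so the sequence can be reviewed first.

```shell
# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# update_stack COMPOSE_DIR: pull newer images, recreate changed containers,
# then remove dangling images. Stops at the first failing step.
update_stack() {
    local compose_dir="$1"
    run docker compose --project-directory "$compose_dir" pull &&
    run docker compose --project-directory "$compose_dir" up -d &&
    # Without -a, prune removes only dangling (untagged) images.
    run docker image prune -f
}
```

For example, `DRY_RUN=1 update_stack /opt/nextcloud` prints the three commands without touching the running stack. Nextcloud major-version image changes still need the manual review described under Important.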
Validate:
- Nextcloud functionality after updates
- Database connectivity
- Reverse proxy operation
Important
Do NOT apply major-version upgrades during business hours.
Deliverable
- Patch log
- Change log entry
4.2 Nextcloud Maintenance Review
Frequency
Weekly
Estimated Time
20–30 minutes
Tasks
Review:
- Security warnings
- Integrity check results
- App updates
- Background jobs
- Storage consumption
Validate:
- Cron jobs functioning
- File scanning healthy
- No database corruption warnings
Execute
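docker exec -u www-data nextcloud-app php occ status
(Run occ as the web server user, which is www-data in the official Nextcloud image. Omitting -it lets the same command run from cron or a script.)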
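The status output can be checked automatically instead of read by eye. The sketch below assumes the plain-text `occ status` format, with lines such as "- maintenance: false", and the function name occ_status_healthy is illustrative.

```shell
# occ_status_healthy: read `occ status` output on stdin. Succeeds only if the
# instance is installed, not in maintenance mode, and needs no database upgrade.
occ_status_healthy() {
    local status
    status="$(cat)"
    printf '%s\n' "$status" | grep -q 'installed: true' &&
    printf '%s\n' "$status" | grep -q 'maintenance: false' &&
    printf '%s\n' "$status" | grep -q 'needsDbUpgrade: false'
}
```

A possible call, assuming the official image where occ runs as www-data, is `docker exec -u www-data nextcloud-app php occ status | occ_status_healthy || echo "Nextcloud needs attention"`.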
Deliverable
- Weekly maintenance report
4.3 Backup Restore Test
Frequency
Weekly
Estimated Time
30–60 minutes
Objective
Prove recoverability.
Tasks
Restore:
- Single file
- Database dump
- User folder sample
Verify:
- File integrity
- Permissions
- Recovery speed
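File integrity after a restore can be verified by comparing checksums of the original and the restored copy. This is a minimal sketch using sha256sum from GNU coreutils; the function name same_content and the paths in the usage note are hypothetical.

```shell
# same_content ORIGINAL RESTORED
# Succeeds if both files exist and have identical SHA-256 digests.
# Returns 2 if either file is missing, 1 if the digests differ.
same_content() {
    [ -f "$1" ] && [ -f "$2" ] || return 2
    local a b
    a="$(sha256sum < "$1" | cut -d ' ' -f 1)"
    b="$(sha256sum < "$2" | cut -d ' ' -f 1)"
    [ "$a" = "$b" ]
}
```

For example: `same_content /data/user/files/report.pdf /tmp/restore-test/report.pdf && echo "restore verified"`. The comparison is only meaningful if the original has not changed since the backup was taken.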
Critical Principle
If restore testing is not performed, liability exposure increases substantially.
Deliverable
- Restore validation report
4.4 Security Audit Review
Frequency
Weekly
Estimated Time
30 minutes
Tasks
Review:
- Admin accounts
- Group memberships
- External shares
- Public links
- Expired accounts
- MFA compliance
Validate:
- SSL certificate expiration dates
- Firewall rules
- SSH access
- Root login disabled
- Fail2Ban status (if implemented)
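The "Root login disabled" item can be validated against the effective SSH configuration rather than the config file alone. The sketch below reads the output of `sshd -T`, which prints the settings actually in force; the function name root_login_disabled is illustrative.

```shell
# root_login_disabled: read `sshd -T` output on stdin.
# Succeeds only if the effective PermitRootLogin value is "no".
root_login_disabled() {
    awk 'tolower($1) == "permitrootlogin" { value = tolower($2) }
         END { exit (value == "no" ? 0 : 1) }'
}
```

A possible call is `sudo sshd -T | root_login_disabled || echo "root SSH login is not fully disabled"`. Values such as prohibit-password still allow key-based root login, so this sketch treats them as failures.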
Deliverable
- Weekly security audit checklist
4.5 Capacity and Performance Review
Frequency
Weekly
Estimated Time
20–30 minutes
Tasks
Analyze:
- Storage growth
- User growth
- Bandwidth usage
- CPU/RAM trends
- Database size growth
Evaluate:
- Need for droplet resize
- Need for archive policies
- Need for retention changes
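Storage growth can be turned into a rough resize deadline by linear extrapolation from two measurements. This is a simplifying sketch only: real growth is rarely linear, and the function name days_until_full is hypothetical. All sizes must use the same unit, for example GiB.

```shell
# days_until_full USED_PREV USED_NOW TOTAL DAYS_BETWEEN
# Prints the estimated whole days until the volume is full, assuming the
# growth between the two measurements continues at the same rate.
# Prints "no-growth" when usage did not increase.
days_until_full() {
    local prev="$1" curr="$2" total="$3" days="$4"
    local growth=$(( curr - prev ))
    if [ "$growth" -le 0 ]; then
        echo "no-growth"
        return 0
    fi
    echo $(( (total - curr) * days / growth ))
}
```

For example, a volume that grew from 40 to 44 GiB in 7 days on an 80 GiB disk gives `days_until_full 40 44 80 7`, which prints 63. In practice the droplet should be resized well before that point, since the daily check treats 80% utilization as the warning level.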
Deliverable
- Capacity trend notes
4.6 Documentation and Change Log
Frequency
Weekly
Estimated Time
15–20 minutes
Objective
Maintain defensible operational records.
Tasks
Document:
- Changes made
- Accounts added/removed
- Incidents
- Security events
- Backup issues
- Maintenance performed
Important
Operational documentation is part of liability protection.
If a breach occurs, documented operational diligence matters significantly.
Deliverable
- Weekly operational summary
5. Monthly Administrative Tasks
5.1 Full Disaster Recovery Exercise
Estimated Time
2–4 hours
Tasks
Simulate:
- Server loss
- Container rebuild
- Restore from backup
- DNS validation
- SSL restoration
5.2 User Access Certification
Estimated Time
30–60 minutes
Tasks
Review with client:
- Active users
- Admin privileges
- External sharing
- Terminated employees
5.3 Security Policy Review
Estimated Time
30 minutes
Tasks
Review:
- MFA compliance
- Password standards
- Administrative access
- Training completion
6. Estimated Operational Effort
| Activity | Estimated Time |
|---|---|
| Daily Operations | 35–60 min/day |
| Weekly Maintenance | 2–4 hrs/week |
| Monthly DR/Security | 3–6 hrs/month |
7. Recommended Retainer Guidance
For a pilot of this size:
| Service Level | Estimated Monthly Hours |
|---|---|
| Minimal Reactive Support | 8–10 hrs |
| Recommended Operational Support | 15–20 hrs |
| Security-Conscious Managed Support | 25–35 hrs |
Given the recent discussions around:
- liability
- data protection
- backup validation
- MFA enforcement
- user training
- documented diligence
…the “Recommended Operational Support” tier is likely the minimum responsible posture.
8. Key Risk Areas to Monitor
The largest liability exposure areas are:
Administrative Misconfiguration
- Incorrect sharing permissions
- Public links
- Excessive admin rights
Backup Failure
- Silent backup corruption
- Unverified restores
Credential Compromise
- Weak passwords
- MFA disabled
- Phishing
Delayed Patching
- Unpatched Nextcloud vulnerabilities
- Docker/container CVEs
- Linux exploits
User Behavior
- Unsafe uploads
- Credential reuse
- Local machine compromise
Lack of Documentation
- No operational evidence
- No audit trail
- Undefined responsibilities
9. Strong Recommendations
Require:
- MFA for all users
- Mandatory admin training
- Signed acceptable use/security acknowledgment
- Principle of least privilege
Strongly Recommended:
- Centralized logging
- Automated monitoring alerts
- Offsite backups
- Written incident response plan
- Cyber liability / E&O insurance
Avoid:
- Shared admin accounts
- Permanent public links
- Unrestricted upload folders
- Direct root SSH access
- Unmanaged personal devices for administrators