# Fractional Insight CIO

## Operational Runbook — Daily & Weekly Administration

### DigitalOcean / Docker / Nextcloud / Fastmail Pilot Environment

---

# 1. Purpose

This operational runbook defines the recurring administrative tasks required to operate and maintain, safely, the client pilot environment hosted on DigitalOcean infrastructure. It covers:

- Nextcloud
- Fastmail
- Docker containers
- Linux server administration
- Reverse proxy / SSL management
- Backup and recovery validation
- Security and compliance oversight

The goal of this runbook is to:

- Reduce operational risk
- Reduce exposure to liability
- Detect security incidents early
- Ensure recoverability of client data
- Maintain stable uptime and user access
- Establish evidence of reasonable administrative diligence

This document assumes:

- Remote users
- No internal IT staff
- Small pilot deployment
- Shared responsibility model between consultant and client
- MFA enforcement in both Fastmail and Nextcloud

---

# 2. Operational Philosophy

The environment should be treated as:

- A business collaboration platform
- A controlled data environment
- A security-sensitive system
- A system requiring documented administrative oversight

Because the platform contains:

- Client communications
- Potentially confidential documents
- Shared file repositories
- User credentials
- Internet-exposed services

…administration must prioritize:

1. Security first
2. Recoverability second
3. Stability third
4. Convenience last

---

# 3. Daily Operational Tasks

---

## 3.1 Morning Health Check

### Frequency
Daily (business days)

### Estimated Time
10–15 minutes

### Objective
Confirm that all core systems are operational before users begin work.

### Tasks

#### Infrastructure
- Verify droplet online in DigitalOcean
- Verify CPU/RAM usage within normal thresholds
- Verify disk utilization below 80%
- Verify Docker daemon operational
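
You can check the disk threshold above from the command line instead of reading it off the DigitalOcean dashboard. The following is a minimal sketch. It assumes POSIX-format `df -P` output on the droplet, and the function name `disk_over_threshold` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each filesystem at or above a usage threshold (default 80%).
# Stdin: POSIX-format `df -P` output (column 5 = capacity, column 6 = mount point).
# Mount points that contain spaces are not handled by this simple parser.
disk_over_threshold() {
  local threshold="${1:-80}"
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)   # "85%" -> "85"
    if (use + 0 >= t + 0) print $6 " " use "%"
  }'
}

# Usage: any output means a filesystem needs attention.
#   df -P | disk_over_threshold 80
```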

#### Services
- Verify Nextcloud web login functional
- Verify Fastmail operational status
- Verify SSL certificates valid and not close to expiry
- Verify reverse proxy routing functional

#### Containers
Check:
- Nextcloud container
- Database container
- Redis container
- Reverse proxy container

Example:
```bash
# List containers with their status (the status includes health where a healthcheck is defined)
docker ps --format 'table {{.Names}}\t{{.Status}}'
```
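
`docker ps` shows what is running, but not what is missing. The minimal sketch below compares the running container names against the expected set. The function name is hypothetical. So are the four container names in the usage comment: only `nextcloud-app` appears elsewhere in this runbook, and the real names should come from the deployment's compose file.

```bash
#!/usr/bin/env bash
# Print each expected container name that is missing from the running list on stdin.
# Stdin: one name per line, e.g. from `docker ps --format '{{.Names}}'`.
# Returns 1 if any expected container is not running, 0 otherwise.
missing_containers() {
  local running name missing=0
  running=$(cat)
  for name in "$@"; do
    # -x: match the whole line; -F: treat the name as a fixed string, not a regex
    if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
      echo "MISSING: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (container names are hypothetical; take the real ones from the compose file):
#   docker ps --format '{{.Names}}' \
#     | missing_containers nextcloud-app nextcloud-db nextcloud-redis nextcloud-proxy
```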

#### External Access Test
Validate:
- HTTPS access
- File upload/download
- Login functionality
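
You can check HTTPS reachability and basic application state from outside the server through Nextcloud's `status.php` endpoint. It returns a small JSON document that includes `installed` and `maintenance` flags. The following is a minimal sketch that assumes `curl` is available. The function name is hypothetical, and `cloud.example.com` is a placeholder for the pilot's hostname. It does not replace the manual login and upload/download test, which exercises authentication and storage.

```bash
#!/usr/bin/env bash
# Succeed if the status.php JSON on stdin reports an installed instance that is
# not in maintenance mode. Parsed with grep to avoid depending on jq.
nextcloud_status_ok() {
  local json
  json=$(cat)
  printf '%s' "$json" | grep -Eq '"installed": *true' || return 1
  printf '%s' "$json" | grep -Eq '"maintenance": *false' || return 1
}

# Usage (hostname is a placeholder). curl verifies the TLS certificate by default,
# so an expired or untrusted certificate also makes this check fail.
#   curl -fsS --max-time 10 https://cloud.example.com/status.php \
#     | nextcloud_status_ok && echo "Nextcloud OK" || echo "Nextcloud check FAILED"
```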

#### Email
Send and receive a test message using the Fastmail test account.

### Deliverable
- Daily operational log entry
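
A consistent entry format makes the log easier to audit later. The following is a minimal sketch of an append-only, tab-separated log. The file path, the field order, and the `oplog` function name are hypothetical conventions, not a required format:

```bash
#!/usr/bin/env bash
# Append one timestamped entry to the operational log.
# Fields (tab-separated): UTC timestamp, operator, task, status, free-text note.
OPLOG_FILE="${OPLOG_FILE:-$HOME/ops/operational-log.tsv}"   # hypothetical location

oplog() {
  if [ "$#" -lt 3 ]; then
    echo "usage: oplog TASK STATUS NOTE" >&2
    return 2
  fi
  mkdir -p "$(dirname "$OPLOG_FILE")"
  printf '%s\t%s\t%s\t%s\t%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${USER:-unknown}" "$1" "$2" "$3" >> "$OPLOG_FILE"
}

# Usage:
#   oplog "3.1 health check" OK "all containers up; disk 41%"
```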

---

## 3.2 Security Event Review

### Frequency
Daily

### Estimated Time
10 minutes

### Objective
Identify suspicious activity before it escalates into an incident.

### Tasks

#### Review:
- Failed login attempts
- MFA failures
- New device logins
- Suspicious IP addresses
- Excessive upload activity
- Unexpected admin actions

#### Check:
- Nextcloud security warnings
- Linux auth logs
- Docker errors
- Reverse proxy logs

Example:
```bash
# Error-priority (and more severe) journal entries from the current boot, with explanatory text
sudo journalctl -p 3 -xb
```
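
The journal filter above shows only error-level messages. sshd typically logs failed password attempts at a lower (info) priority, so they may not appear there. The minimal sketch below counts failed password attempts per source IPv4 address. It assumes OpenSSH's usual `Failed password ... from <ip>` wording, and the function name is hypothetical. Servers that allow only key-based SSH log failures with different messages, which this sketch does not match.

```bash
#!/usr/bin/env bash
# Count "Failed password" lines per source IPv4 address, highest count first.
# Stdin: sshd log lines, e.g. from /var/log/auth.log or the systemd journal.
# Output: "<count> <ip>" per line.
failed_ssh_by_ip() {
  grep 'Failed password' \
    | grep -oE 'from ([0-9]{1,3}\.){3}[0-9]{1,3}' \
    | awk '{ n[$2]++ } END { for (ip in n) print n[ip], ip }' \
    | sort -rn
}

# Usage (the unit is "ssh" on Ubuntu; some distributions name it "sshd"):
#   sudo journalctl -u ssh --since "24 hours ago" | failed_ssh_by_ip | head -n 10
```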

### Escalation Triggers
Immediate escalation if:
- Multiple failed admin logins
- MFA bypass suspicion
- Unknown admin account
- Malware/ransomware indicators
- Unexpected outbound traffic

### Deliverable
- Security review noted in operational log

---

## 3.3 Backup Verification

### Frequency
Daily

### Estimated Time
5–10 minutes

### Objective
Verify that backups completed successfully.

### Tasks

#### Verify:
- Scheduled backup job completed
- Backup storage reachable
- Backup size consistent with recent backups (no sudden drop or spike)
- No corruption warnings
- DigitalOcean snapshot completed successfully

#### Validate:
- Latest backup timestamp
- Database dump presence
- File archive generation
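
Automating the timestamp check makes a silently stopped backup job noticeable. The following is a minimal sketch. It assumes backups land as files under a local or mounted directory and that GNU `find` is available. The path, the 26-hour default (a daily job plus some slack), and the function name are all hypothetical. The check proves only that something recent and non-empty exists, not that it can be restored.

```bash
#!/usr/bin/env bash
# Succeed if the newest non-empty file under DIR is younger than MAX_AGE_HOURS.
# Zero-byte files are ignored so an empty dump does not count as a backup.
# Requires GNU find (-printf).
backup_is_fresh() {
  local dir="$1" max_age_hours="${2:-26}" newest age_hours
  newest=$(find "$dir" -type f -size +0c -printf '%T@\n' 2>/dev/null | sort -n | tail -n 1)
  if [ -z "$newest" ]; then
    echo "no non-empty backup files under $dir"
    return 1
  fi
  # %T@ is seconds since the epoch with a fractional part; drop the fraction
  age_hours=$(( ( $(date +%s) - ${newest%.*} ) / 3600 ))
  echo "newest backup is ${age_hours}h old"
  [ "$age_hours" -lt "$max_age_hours" ]
}

# Usage (path is hypothetical):
#   backup_is_fresh /mnt/backups/nextcloud 26 || echo "ESCALATE: backup stale or missing"
```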

### Important
A backup that has not been validated should be treated as nonexistent.

### Deliverable
- Backup verification entry in operational log

---

## 3.4 User Administration Review

### Frequency
Daily

### Estimated Time
5–10 minutes

### Objective
Confirm that user accounts and permissions remain accurate and authorized.

### Tasks

#### Review:
- New user requests
- Disabled users
- Terminated personnel
- Permission changes
- Shared folder permissions
- Public links

#### Verify:
- No orphaned admin accounts
- MFA enabled for all admins
- Least-privilege principles maintained
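
You can do much of this review with Nextcloud's `occ` command-line tool. The following is a minimal sketch. It assumes the official Nextcloud image, where `occ` must run as `www-data`, the owner of `config/config.php`. It also assumes the `nextcloud-app` container name used in section 4.2. The helper names are hypothetical, and you pass the administrator account names in by hand; the sketch does not parse them from `occ` output.

```bash
#!/usr/bin/env bash
# Container name as used in section 4.2; override if the compose file differs.
NC_CONTAINER="${NC_CONTAINER:-nextcloud-app}"

# Run occ inside the container as www-data, which owns config/config.php in the
# official image (occ refuses to run as a different user).
occ() {
  docker exec -u www-data "$NC_CONTAINER" php occ "$@"
}

# Read-only listing of all accounts and all group memberships (including "admin").
access_review() {
  occ user:list
  occ group:list
}

# Print the two-factor authentication state of each account given as an argument.
admin_mfa_report() {
  local uid
  for uid in "$@"; do
    occ twofactorauth:state "$uid"
  done
}

# Usage (account names are hypothetical):
#   access_review
#   admin_mfa_report admin alice
```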

### High-Risk Areas
- Shared folders with external access
- Public upload links
- Administrative delegation

### Deliverable
- Access review note

---

## 3.5 Incident Queue Review

### Frequency
Daily

### Estimated Time
5–15 minutes

### Objective
Identify unresolved operational or security issues.

### Tasks

Review:
- User tickets
- Error reports
- Sync failures
- Email delivery issues
- Storage complaints
- Permission problems

### Deliverable
- Updated incident tracking

---

# 4. Weekly Operational Tasks

---

## 4.1 Operating System Updates

### Frequency
Weekly

### Estimated Time
30–60 minutes

### Objective
Maintain security posture and system stability.

### Tasks

#### Linux Updates
```bash
# Refresh package lists, review pending upgrades, then apply them
sudo apt update
apt list --upgradable
sudo apt upgrade
```
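
Kernel and core library updates generally take effect only after a reboot. On Ubuntu, the update tooling creates `/var/run/reboot-required` when a reboot is pending. It also writes `/var/run/reboot-required.pkgs`, which lists the packages responsible. The following is a minimal sketch of a check to run after `apt upgrade`; the function name is hypothetical:

```bash
#!/usr/bin/env bash
# Report whether package updates left a reboot pending (Ubuntu flag file).
# The flag path can be overridden, which also makes the function easy to test.
reboot_pending() {
  local flag="${1:-/var/run/reboot-required}"
  if [ -f "$flag" ]; then
    echo "Reboot required"
    # The companion .pkgs file, when present, lists the packages that asked for it
    if [ -f "${flag}.pkgs" ]; then
      sed 's/^/  package: /' "${flag}.pkgs"
    fi
    return 0
  fi
  echo "No reboot required"
  return 1
}

# Usage: schedule the reboot outside business hours.
#   reboot_pending && echo "Plan a maintenance-window reboot"
```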

#### Docker
- Update container images
- Rebuild containers if necessary
- Remove unused images
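
With Docker Compose, the three steps above correspond to standard Compose and Docker commands. The following is a minimal sketch. It assumes the stack is defined in a single compose file (the path below is hypothetical) and that a current backup or droplet snapshot exists before it runs. Pin the Nextcloud image to a major-version tag rather than `latest`, so that a routine pull cannot become an unplanned major upgrade. Nextcloud does not support skipping major versions, so step major upgrades one release at a time.

```bash
#!/usr/bin/env bash
# Compose file location is hypothetical; adjust to the deployment.
NC_COMPOSE_FILE="${NC_COMPOSE_FILE:-/opt/nextcloud/docker-compose.yml}"

update_containers() {
  # Fetch newer images for the tags referenced in the compose file
  docker compose -f "$NC_COMPOSE_FILE" pull || return 1
  # Recreate only the containers whose image or configuration changed
  docker compose -f "$NC_COMPOSE_FILE" up -d || return 1
  # Remove dangling images left behind by the update (add -a to remove all unused images)
  docker image prune -f
}

# Usage:
#   update_containers
```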

#### Validate:
- Nextcloud functionality after updates
- Database connectivity
- Reverse proxy operation

### Important
Do NOT apply major-version upgrades during business hours.

### Deliverable
- Patch log
- Change log entry

---

## 4.2 Nextcloud Maintenance Review

### Frequency
Weekly

### Estimated Time
20–30 minutes

### Tasks

#### Review:
- Security warnings
- Integrity check results
- App updates
- Background jobs
- Storage consumption

#### Validate:
- Cron jobs functioning
- File scanning healthy
- No database corruption warnings

#### Execute:
```bash
# occ must run as the owner of config/config.php (www-data in the official image)
docker exec -u www-data nextcloud-app php occ status
```
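
The review and validation items above map onto a handful of read-only `occ` commands. The following is a minimal sketch, assuming the official Nextcloud image and the `nextcloud-app` container name; the function names are hypothetical. `app:update --showonly` lists available app updates without installing them. The `backgroundjobs_mode` value shows whether Nextcloud is configured to use system cron.

```bash
#!/usr/bin/env bash
# Container name as used in the command above; override if the deployment differs.
NC_CONTAINER="${NC_CONTAINER:-nextcloud-app}"

# Run occ as www-data (assumes the official image, where www-data owns config.php)
occ() {
  docker exec -u www-data "$NC_CONTAINER" php occ "$@"
}

# Read-only weekly review; each command only reports, nothing is changed.
weekly_nextcloud_review() {
  occ status                                    # version, installed, maintenance mode
  occ check                                     # server environment/dependency checks
  occ integrity:check-core                      # core file signature check
  occ app:update --showonly                     # list app updates without installing
  occ config:app:get core backgroundjobs_mode   # "cron" when system cron is in use
}

# Usage:
#   weekly_nextcloud_review
```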

### Deliverable
- Weekly maintenance report

---

## 4.3 Backup Restore Test

### Frequency
Weekly

### Estimated Time
30–60 minutes

### Objective
Prove recoverability.

### Tasks

Restore:
- Single file
- Database dump
- User folder sample

#### Verify:
- File integrity
- Permissions
|

### Critical Principle
Without regular restore tests there is no evidence that data can actually be recovered, and liability exposure increases substantially.

### Deliverable
- Restore validation report

---

## 4.4 Security Audit Review

### Frequency
Weekly

### Estimated Time
30 minutes

### Tasks

#### Review:
- Admin accounts
- Group memberships
- External shares
- Public links
- Expired accounts
- MFA compliance

#### Validate:
- SSL certificate expiration dates
- Firewall rules
- SSH access
- Root login disabled
- Fail2Ban status (if implemented)
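
Two of these checks are easy to script. The following is a minimal sketch that assumes `openssl` and GNU `date` on the administrator's machine. It also needs root on the server for `sshd -T`, which prints the effective SSH daemon configuration with lowercase keys. The function names and `cloud.example.com` are hypothetical. `prohibit-password` (the modern OpenSSH default) still allows key-based root login, so this check treats it as not disabled.

```bash
#!/usr/bin/env bash
# Whole days left before a certificate expires.
# $1: the "notAfter=..." line printed by `openssl x509 -noout -enddate`
# $2: optional "now" in Unix seconds (defaults to the current time)
# Requires GNU date for -d parsing.
cert_days_left() {
  local end now
  end=$(date -d "${1#notAfter=}" +%s) || return 1
  now="${2:-$(date +%s)}"
  echo $(( (end - now) / 86400 ))
}

# Succeed only if the effective sshd settings on stdin disable root login outright.
# `sshd -T` prints one lowercase "keyword value" pair per line.
root_login_disabled() {
  grep -qx 'permitrootlogin no'
}

# Usage (hostname is a placeholder; sshd -T needs root on the server):
#   enddate=$(echo | openssl s_client -connect cloud.example.com:443 \
#       -servername cloud.example.com 2>/dev/null | openssl x509 -noout -enddate)
#   cert_days_left "$enddate"
#   sudo sshd -T | root_login_disabled && echo "root login disabled"
```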

### Deliverable
- Weekly security audit checklist

---

## 4.5 Capacity and Performance Review

### Frequency
Weekly

### Estimated Time
20–30 minutes

### Tasks

#### Analyze:
- Storage growth
- User growth
- Bandwidth usage
- CPU/RAM trends
- Database size growth
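
Trend analysis needs regular data points. The minimal sketch below appends one dated usage sample per run to a CSV file, for example from a daily cron job, so that growth can be compared week over week. The file path, columns, and function names are hypothetical. It records filesystem usage only, not database size or bandwidth.

```bash
#!/usr/bin/env bash
# Append one "date,mount,used_kb,size_kb" sample for a filesystem to a CSV file.
# Intended to run once a day (e.g. from cron) so weekly reviews have a trend to read.
record_capacity() {
  local csv="$1" mount="${2:-/}"
  # df -P -k: one line per filesystem, sizes in 1024-byte blocks (column 2 = size, 3 = used)
  df -P -k "$mount" | awk -v d="$(date +%F)" -v m="$mount" 'NR == 2 { print d "," m "," $3 "," $2 }' >> "$csv"
}

# Growth in used space between the first and last samples of a CSV, in MiB.
capacity_growth_mib() {
  awk -F, 'NR == 1 { first = $3 } { last = $3 } END { printf "%d\n", (last - first) / 1024 }' "$1"
}

# Usage (file path is hypothetical):
#   record_capacity /var/log/ops/capacity.csv /
#   capacity_growth_mib /var/log/ops/capacity.csv
```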

#### Evaluate:
- Need for droplet resize
- Need for archive policies
- Need for retention changes

### Deliverable
- Capacity trend notes

---

## 4.6 Documentation and Change Log

### Frequency
Weekly

### Estimated Time
15–20 minutes

### Objective
Maintain defensible operational records.

### Tasks

Document:
- Changes made
- Accounts added/removed
- Incidents
- Security events
- Backup issues
- Maintenance performed

### Important
Operational documentation is part of liability protection.

If a breach occurs, documented operational diligence matters significantly.

### Deliverable
- Weekly operational summary

---

# 5. Monthly Administrative Tasks

---

## 5.1 Full Disaster Recovery Exercise

### Estimated Time
2–4 hours

### Tasks
Simulate:
- Server loss
- Container rebuild
- Restore from backup
- DNS validation
- SSL restoration

---

## 5.2 User Access Certification

### Estimated Time
30–60 minutes

### Tasks
Review with client:
- Active users
- Admin privileges
- External sharing
- Terminated employees

---

## 5.3 Security Policy Review

### Estimated Time
30 minutes

### Tasks
Review:
- MFA compliance
- Password standards
- Administrative access
- Training completion

---

# 6. Estimated Operational Effort

| Activity | Estimated Time |
|---|---|
| Daily Operations | 35–60 min/day |
| Weekly Maintenance | 2–4 hrs/week |
| Monthly DR/Security | 3–6 hrs/month |
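
Converting the table to a monthly figure makes it directly comparable with the retainer tiers in Section 7. The rough calculation below assumes that every task is performed at its stated frequency, with about 21 business days and 52/12 ≈ 4.33 weeks per month. Both conversion factors are assumptions, not part of the estimates above. On these assumptions, the full schedule comes to roughly 24 to 44 hours per month. That is in the range of the highest retainer tier or above it. The lower tiers would therefore imply automating, batching, or performing some tasks less often.

```math
\begin{aligned}
\text{Daily:}\quad & (35 \text{ to } 60\ \text{min}) \times 21 = 735 \text{ to } 1260\ \text{min} \approx 12.3 \text{ to } 21\ \text{h} \\
\text{Weekly:}\quad & (2 \text{ to } 4\ \text{h}) \times 4.33 \approx 8.7 \text{ to } 17.3\ \text{h} \\
\text{Monthly:}\quad & 3 \text{ to } 6\ \text{h} \\
\text{Total:}\quad & \approx 24 \text{ to } 44\ \text{h per month}
\end{aligned}
```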

---

# 7. Recommended Retainer Guidance

For a pilot of this size:

| Service Level | Estimated Monthly Hours |
|---|---|
| Minimal Reactive Support | 8–10 hrs |
| Recommended Operational Support | 15–20 hrs |
| Security-Conscious Managed Support | 25–35 hrs |

Given the recent discussions around:
- Liability
- Data protection
- Backup validation
- MFA enforcement
- User training
- Documented diligence

…the “Recommended Operational Support” tier is likely the minimum responsible posture.

---

# 8. Key Risk Areas to Monitor

The largest liability exposure areas are:

## Administrative Misconfiguration
- Incorrect sharing permissions
- Public links
- Excessive admin rights

## Backup Failure
- Silent backup corruption
- Unverified restores

## Credential Compromise
- Weak passwords
- MFA disabled
- Phishing

## Delayed Patching
- Unpatched Nextcloud vulnerabilities
- Docker/container CVEs
- Linux exploits

## User Behavior
- Unsafe uploads
- Credential reuse
- Local machine compromise

## Lack of Documentation
- No operational evidence
- No audit trail
- Undefined responsibilities

---

# 9. Strong Recommendations

## Require:
- MFA for all users
- Mandatory admin training
- Signed acceptable use/security acknowledgment
- Principle of least privilege

## Strongly Recommended:
- Centralized logging
- Automated monitoring alerts
- Offsite backups
- Written incident response plan
- Cyber liability / E&O insurance

## Avoid:
- Shared admin accounts
- Permanent public links
- Unrestricted upload folders
- Direct root SSH access
- Unmanaged personal devices for administrators