runbook init
Runbooks/Implementation Runbook.md
Runbooks/System Admin Runbook.md
# Fractional Insight CIO
## Operational Runbook — Daily & Weekly Administration
### DigitalOcean / Docker / Nextcloud / Fastmail Pilot Environment

---

# 1. Purpose

This operational runbook defines the recurring administrative tasks required to safely operate and maintain the client pilot environment hosted on DigitalOcean infrastructure using:

- Nextcloud
- Fastmail
- Docker containers
- Linux server administration
- Reverse proxy / SSL management
- Backup and recovery validation
- Security and compliance oversight

The goal of this runbook is to:

- Reduce operational risk
- Reduce exposure to liability
- Detect security incidents early
- Ensure recoverability of client data
- Maintain stable uptime and user access
- Establish evidence of reasonable administrative diligence

This document assumes:

- Remote users
- No internal IT staff
- Small pilot deployment
- Shared responsibility model between consultant and client
- MFA enforcement in both Fastmail and Nextcloud

---

# 2. Operational Philosophy

The environment should be treated as:

- A business collaboration platform
- A controlled data environment
- A security-sensitive system
- A system requiring documented administrative oversight

Because the platform contains:

- Client communications
- Potential confidential documents
- Shared file repositories
- User credentials
- Internet-exposed services

…administration must prioritize:

1. Security first
2. Recoverability second
3. Stability third
4. Convenience last

---

# 3. Daily Operational Tasks

---

## 3.1 Morning Health Check

### Frequency
Daily (business days)

### Estimated Time
10–15 minutes

### Objective
Confirm that all core systems are operational before users begin work.

### Tasks

#### Infrastructure
- Verify droplet online in DigitalOcean
- Verify CPU/RAM/disk usage within normal thresholds
- Verify disk utilization below 80%
- Verify Docker daemon operational
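
The disk threshold above can be checked from the shell. This is a minimal sketch, assuming a POSIX shell on the droplet; the function name `check_disk_usage` is illustrative, and the 80% default simply mirrors the limit in the list.

```bash
# Hypothetical helper: report whether a filesystem is over a usage threshold.
# Usage: check_disk_usage <mount_point> [threshold_percent]
check_disk_usage() {
    mount_point="${1:-/}"
    threshold="${2:-80}"   # 80% mirrors the limit stated in this runbook
    # df -P prints one POSIX-format line per filesystem; field 5 is the use %.
    used=$(df -P "$mount_point" | awk 'NR == 2 { gsub("%", "", $5); print $5 }')
    if [ "$used" -ge "$threshold" ]; then
        echo "ALERT: $mount_point at ${used}% (threshold ${threshold}%)"
        return 1
    fi
    echo "OK: $mount_point at ${used}%"
}
```

`check_disk_usage /` covers the root filesystem; pass another mount point for any attached volume. On a systemd host, `systemctl is-active docker` is one way to cover the daemon check.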

#### Services
- Verify Nextcloud web login functional
- Verify Fastmail operational status
- Verify SSL certificates valid
- Verify reverse proxy routing functional

#### Containers
Check:
- Nextcloud container
- Database container
- Redis container
- Reverse proxy container

Example:
```bash
# List running containers with their status and uptime
docker ps
```
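
To turn the container list into a pass/fail check, the running names can be compared against an expected set. A rough sketch: apart from `nextcloud-app`, which appears later in this runbook, the container names are placeholders and should be replaced with the names `docker ps` actually reports.

```bash
# Placeholder names; substitute the names shown by `docker ps`.
EXPECTED_CONTAINERS="nextcloud-app nextcloud-db nextcloud-redis reverse-proxy"

# Reads running container names on stdin (one per line) and prints every
# expected name that is missing. Returns non-zero if anything is missing.
missing_containers() {
    running=$(cat)
    rc=0
    for name in $EXPECTED_CONTAINERS; do
        # -x: whole-line match, -F: treat the name as a fixed string
        if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
            echo "MISSING: $name"
            rc=1
        fi
    done
    return $rc
}

# Typical use on the droplet:
#   docker ps --format '{{.Names}}' | missing_containers
```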

#### External Access Test
Validate:
- HTTPS access
- File upload/download
- Login functionality
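
Part of this test can be scripted against Nextcloud's `/status.php` endpoint, which returns a small JSON document. The sketch below only inspects that JSON; the hostname in the usage comment is a placeholder, and login and upload/download still need a manual or separately scripted check.

```bash
# Reads the JSON body of Nextcloud's /status.php on stdin and reports health.
nextcloud_status_ok() {
    body=$(cat)
    case "$body" in
        *'"installed":true'*) ;;
        *) echo "FAIL: not installed or unexpected response"; return 1 ;;
    esac
    case "$body" in
        *'"maintenance":false'*) echo "OK: Nextcloud is serving requests" ;;
        *) echo "FAIL: maintenance mode or unexpected response"; return 1 ;;
    esac
}

# Usage (cloud.example.com is a placeholder hostname):
#   curl -fsS --max-time 10 https://cloud.example.com/status.php | nextcloud_status_ok
```

Because `curl -f` fails on HTTP errors and on TLS problems, a passing result also indicates that HTTPS and reverse proxy routing are working from wherever the check runs.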

#### Email
Send and receive a test email through the Fastmail test account.

### Deliverable
- Daily operational log entry

---

## 3.2 Security Event Review

### Frequency
Daily

### Estimated Time
10 minutes

### Objective
Identify suspicious activity before it escalates.

### Tasks

#### Review:
- Failed login attempts
- MFA failures
- New device logins
- Suspicious IP addresses
- Excessive upload activity
- Unexpected admin actions

#### Check:
- Nextcloud security warnings
- Linux auth logs
- Docker errors
- Reverse proxy logs

Example:
```bash
# Show error-level (priority 3) and more severe messages from the current boot
sudo journalctl -p 3 -xb
```
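
For the failed-login review, SSH log lines can be summarized per source address. A minimal sketch, assuming the standard OpenSSH "Failed password ... from <ip>" message format; the function name is illustrative.

```bash
# Reads sshd log lines on stdin and prints "<count> <ip>" for each source
# address with failed password attempts, busiest first.
failed_logins_by_ip() {
    grep 'Failed password' \
        | awk '{ for (i = 1; i < NF; i++) if ($i == "from") print $(i + 1) }' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}

# Usage on a host that writes /var/log/auth.log:
#   sudo grep sshd /var/log/auth.log | failed_logins_by_ip
# Usage on a journald host (the unit may be named ssh or sshd):
#   sudo journalctl -u ssh --since today | failed_logins_by_ip
```

An address with a high count against an admin account lines up with the "multiple failed admin logins" escalation trigger.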

### Escalation Triggers
Immediate escalation if:
- Multiple failed admin logins
- MFA bypass suspicion
- Unknown admin account
- Malware/ransomware indicators
- Unexpected outbound traffic

### Deliverable
- Security review noted in operational log

---

## 3.3 Backup Verification

### Frequency
Daily

### Estimated Time
5–10 minutes

### Objective
Verify backups completed successfully.

### Tasks

#### Verify:
- Scheduled backup job completed
- Backup storage reachable
- Backup size reasonable
- No corruption warnings
- Snapshot success in DigitalOcean

#### Validate:
- Latest backup timestamp
- Database dump presence
- File archive generation
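
The timestamp, presence, and size checks can be combined in one helper. This is a sketch under assumptions: the 26-hour window and the minimum size are illustrative thresholds, the path in the usage comment is hypothetical, and `find -mmin` requires GNU or BusyBox `find` (standard on Linux).

```bash
# Succeeds if <file> exists, was modified within <max_age_hours>, and is at
# least <min_bytes> in size. Default thresholds are illustrative assumptions.
backup_is_fresh() {
    file="$1"
    max_age_hours="${2:-26}"
    min_bytes="${3:-1}"
    if [ ! -f "$file" ]; then
        echo "FAIL: $file not found"; return 1
    fi
    # find prints the path only if it was modified inside the window.
    recent=$(find "$file" -mmin "-$((max_age_hours * 60))")
    if [ -z "$recent" ]; then
        echo "FAIL: $file is older than ${max_age_hours}h"; return 1
    fi
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$size" -lt "$min_bytes" ]; then
        echo "FAIL: $file is only ${size} bytes"; return 1
    fi
    echo "OK: $file (${size} bytes)"
}

# Usage (hypothetical path and 1 MB minimum):
#   backup_is_fresh /backups/nextcloud-db.sql.gz 26 1000000
```

This confirms that a recent, non-trivial file exists. It does not prove the backup is restorable, which is what the weekly restore test is for.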

### Important
A backup that has not been validated should be treated as nonexistent.

### Deliverable
- Backup verification entry in operational log

---

## 3.4 User Administration Review

### Frequency
Daily

### Estimated Time
5–10 minutes

### Objective
Ensure user/account integrity.

### Tasks

#### Review:
- New user requests
- Disabled users
- Terminated personnel
- Permission changes
- Shared folder permissions
- Public links

#### Verify:
- No orphaned admin accounts
- MFA enabled for all admins
- Least-privilege principles maintained
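
One way to catch orphaned or unexpected admin accounts is to compare the current admin list against a list the client has approved. A minimal sketch: both input files (one account name per line) are assumptions, and how the current list is produced is left open. Nextcloud's `occ group:list` prints group membership and is one possible source.

```bash
# Prints any account in <actual_file> that is absent from <approved_file>.
# Both files hold one account name per line. Returns non-zero on a mismatch.
unapproved_admins() {
    approved_file="$1"
    actual_file="$2"
    # -v -x -F -f: drop every line that exactly matches an approved name.
    extra=$(grep -vxFf "$approved_file" "$actual_file")
    if [ -n "$extra" ]; then
        # $extra is left unquoted on purpose so each name gets its own line.
        printf 'UNAPPROVED: %s\n' $extra
        return 1
    fi
    echo "OK: all admin accounts are on the approved list"
}
```

Keeping the approved list under version control gives the access review note something concrete to reference.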

### High-Risk Areas
- Shared folders with external access
- Public upload links
- Administrative delegation

### Deliverable
- Access review note

---

## 3.5 Incident Queue Review

### Frequency
Daily

### Estimated Time
5–15 minutes

### Objective
Identify unresolved operational or security issues.

### Tasks

Review:
- User tickets
- Error reports
- Sync failures
- Email delivery issues
- Storage complaints
- Permission problems

### Deliverable
- Updated incident tracking

---

# 4. Weekly Operational Tasks

---

## 4.1 Operating System Updates

### Frequency
Weekly

### Estimated Time
30–60 minutes

### Objective
Maintain security posture and system stability.

### Tasks

#### Linux Updates
```bash
# Refresh package lists, then apply available upgrades
sudo apt update
sudo apt upgrade
```

#### Docker
- Update container images
- Rebuild containers if necessary
- Remove unused images
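
The Docker steps might look like the following, assuming the stack is managed with Docker Compose (this runbook does not say how the containers were created). The `run` wrapper and the `DRY_RUN` variable are illustrative; by default the commands are only printed so the sequence can be reviewed first.

```bash
# Set DRY_RUN=0 to execute; by default the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: update_containers <directory containing the Compose file>
update_containers() {
    compose_dir="${1:-.}"
    (
        cd "$compose_dir" &&
        run docker compose pull &&      # fetch newer images
        run docker compose up -d &&     # recreate containers whose image changed
        run docker image prune -f       # remove dangling images only
    )
}
```

If image tags in the Compose file are not pinned, `pull` can bring in a new major version, so the business-hours rule for major upgrades applies here as well.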

#### Validate:
- Nextcloud functionality after updates
- Database connectivity
- Reverse proxy operation

### Important
Do NOT apply major-version upgrades during business hours.

### Deliverable
- Patch log
- Change log entry

---

## 4.2 Nextcloud Maintenance Review

### Frequency
Weekly

### Estimated Time
20–30 minutes

### Tasks

#### Review:
- Security warnings
- Integrity check results
- App updates
- Background jobs
- Storage consumption

#### Validate:
- Cron jobs functioning
- File scanning healthy
- No database corruption warnings

#### Execute
```bash
# occ must run as the web server user (www-data in the official Nextcloud image)
docker exec -u www-data nextcloud-app php occ status
```
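
The "cron jobs functioning" item can be checked by comparing the time of the last background job run with the current time. A sketch under assumptions: the 15-minute tolerance is illustrative, and the `occ` call in the usage comment assumes Nextcloud stores the last run as the `lastcron` value of the `core` app, which should be confirmed on the installed version.

```bash
# Succeeds if the given Unix timestamp is no older than max_age_seconds.
# Usage: cron_is_recent <last_run_epoch> [max_age_seconds] [now_epoch]
cron_is_recent() {
    last_run="$1"
    max_age="${2:-900}"        # 15 minutes; an assumed tolerance
    now="${3:-$(date +%s)}"    # third argument exists only to allow testing
    age=$((now - last_run))
    if [ "$age" -le "$max_age" ]; then
        echo "OK: background jobs last ran ${age}s ago"
    else
        echo "STALE: background jobs last ran ${age}s ago"
        return 1
    fi
}

# Usage (assumed config key, see note above):
#   last=$(docker exec -u www-data nextcloud-app php occ config:app:get core lastcron)
#   cron_is_recent "$last"
```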

### Deliverable
- Weekly maintenance report

---

## 4.3 Backup Restore Test

### Frequency
Weekly

### Estimated Time
30–60 minutes

### Objective
Prove recoverability.

### Tasks

Restore:
- Single file
- Database dump
- User folder sample

#### Verify:
- File integrity
- Permissions
- Recovery speed
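
File integrity after a restore can be checked by comparing checksums. A minimal sketch using `sha256sum` from GNU coreutils; the function name is illustrative. The comparison is only meaningful if the reference copy has not changed since the backup was taken.

```bash
# Compares SHA-256 digests of a reference file and its restored copy.
# Returns 0 on match, 1 on mismatch, 2 if either file is missing.
verify_restore() {
    original="$1"
    restored="$2"
    if [ ! -f "$original" ] || [ ! -f "$restored" ]; then
        echo "FAIL: missing file"; return 2
    fi
    sum_a=$(sha256sum < "$original" | awk '{ print $1 }')
    sum_b=$(sha256sum < "$restored" | awk '{ print $1 }')
    if [ "$sum_a" = "$sum_b" ]; then
        echo "MATCH: $restored"
    else
        echo "MISMATCH: $restored"; return 1
    fi
}
```

Prefixing the restore command with `time` gives a simple figure for the recovery speed line of the report.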

### Critical Principle
If restore testing is not performed, liability exposure increases substantially.

### Deliverable
- Restore validation report

---

## 4.4 Security Audit Review

### Frequency
Weekly

### Estimated Time
30 minutes

### Tasks

#### Review:
- Admin accounts
- Group memberships
- External shares
- Public links
- Expired accounts
- MFA compliance

#### Validate:
- SSL certificate expiration dates
- Firewall rules
- SSH access
- Root login disabled
- Fail2Ban status (if implemented)
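
Certificate expiry can be expressed as days remaining, which is easier to track week to week. A sketch assuming GNU `date` (standard on Linux) and the date format printed by `openssl x509 -enddate`; the hostname in the usage comment is a placeholder.

```bash
# Prints whole days between now and an OpenSSL-style notAfter date such as
# "Jan 11 00:00:00 2030 GMT". Requires GNU date for the -d option.
# Usage: days_until_expiry <not_after> [now_epoch]
days_until_expiry() {
    not_after="$1"
    now="${2:-$(date +%s)}"    # second argument exists only to allow testing
    end=$(date -d "$not_after" +%s) || return 2
    echo $(( (end - now) / 86400 ))
}

# Usage (cloud.example.com is a placeholder):
#   not_after=$(openssl s_client -connect cloud.example.com:443 \
#       -servername cloud.example.com </dev/null 2>/dev/null \
#       | openssl x509 -noout -enddate | cut -d= -f2)
#   days_until_expiry "$not_after"
```

For the root login item, `sudo sshd -T | grep -i permitrootlogin` prints the effective setting on an OpenSSH server.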

### Deliverable
- Weekly security audit checklist

---

## 4.5 Capacity and Performance Review

### Frequency
Weekly

### Estimated Time
20–30 minutes

### Tasks

#### Analyze:
- Storage growth
- User growth
- Bandwidth usage
- CPU/RAM trends
- Database size growth

#### Evaluate:
- Need for droplet resize
- Need for archive policies
- Need for retention changes

### Deliverable
- Capacity trend notes

---

## 4.6 Documentation and Change Log

### Frequency
Weekly

### Estimated Time
15–20 minutes

### Objective
Maintain defensible operational records.

### Tasks

Document:
- Changes made
- Accounts added/removed
- Incidents
- Security events
- Backup issues
- Maintenance performed
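
A consistent, timestamped log format makes these records easier to produce and to search later. A minimal sketch: the tab-separated layout, the `OPS_LOG` variable, and the default file name are all assumptions, not an established convention of this environment.

```bash
# Appends one tab-separated line to the operational log:
#   UTC timestamp, operator, category, free-text note
# Usage: log_entry <category> <note...>
log_entry() {
    log_file="${OPS_LOG:-./operations.log}"
    category="$1"    # e.g. change, account, incident, security, backup
    shift
    printf '%s\t%s\t%s\t%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${USER:-unknown}" "$category" "$*" \
        >> "$log_file"
}

# Example:
#   log_entry backup "Nightly dump verified, restore test passed"
```

An append-only text file is easy to keep under version control or copy offsite alongside the backups.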

### Important
Operational documentation is part of liability protection.

If a breach occurs, documented operational diligence matters significantly.

### Deliverable
- Weekly operational summary

---

# 5. Monthly Administrative Tasks

---

## 5.1 Full Disaster Recovery Exercise

### Estimated Time
2–4 hours

### Tasks
Simulate:
- Server loss
- Container rebuild
- Restore from backup
- DNS validation
- SSL restoration

---

## 5.2 User Access Certification

### Estimated Time
30–60 minutes

### Tasks
Review with client:
- Active users
- Admin privileges
- External sharing
- Terminated employees

---

## 5.3 Security Policy Review

### Estimated Time
30 minutes

### Tasks
Review:
- MFA compliance
- Password standards
- Administrative access
- Training completion

---

# 6. Estimated Operational Effort

| Activity | Estimated Time |
|---|---|
| Daily Operations | 35–60 min/day |
| Weekly Maintenance | 2–4 hrs/week |
| Monthly DR/Security | 3–6 hrs/month |
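
For comparison with monthly figures, the three rows can be rolled up to hours per month. The sketch below assumes 21 business days and 52/12 weeks per month; neither figure comes from this runbook.

```bash
# Rolls the table's ranges up to hours per month.
# Assumptions (not from the runbook): 21 business days, 52/12 weeks per month.
monthly_effort() {
    awk 'BEGIN {
        days = 21; weeks = 52 / 12
        low  = 35 * days / 60 + 2 * weeks + 3   # daily + weekly + monthly minimums
        high = 60 * days / 60 + 4 * weeks + 6   # daily + weekly + monthly maximums
        printf "%.1f-%.1f hrs/month\n", low, high
    }'
}
```

Under those assumptions `monthly_effort` prints `23.9-44.3 hrs/month`, that is, roughly 24 to 44 hours if every task in the schedule is performed in full.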

---

# 7. Recommended Retainer Guidance

For a pilot of this size:

| Service Level | Estimated Monthly Hours |
|---|---|
| Minimal Reactive Support | 8–10 hrs |
| Recommended Operational Support | 15–20 hrs |
| Security-Conscious Managed Support | 25–35 hrs |

Given the recent discussions around:

- liability
- data protection
- backup validation
- MFA enforcement
- user training
- documented diligence

…the “Recommended Operational Support” tier is likely the minimum responsible posture.

---

# 8. Key Risk Areas to Monitor

The largest liability exposure areas are:

## Administrative Misconfiguration
- Incorrect sharing permissions
- Public links
- Excessive admin rights

## Backup Failure
- Silent backup corruption
- Unverified restores

## Credential Compromise
- Weak passwords
- MFA disabled
- Phishing

## Delayed Patching
- Unpatched Nextcloud vulnerabilities
- Docker/container CVEs
- Linux exploits

## User Behavior
- Unsafe uploads
- Credential reuse
- Local machine compromise

## Lack of Documentation
- No operational evidence
- No audit trail
- Undefined responsibilities

---

# 9. Strong Recommendations

## Require:
- MFA for all users
- Mandatory admin training
- Signed acceptable use/security acknowledgment
- Principle of least privilege

## Strongly Recommended:
- Centralized logging
- Automated monitoring alerts
- Offsite backups
- Written incident response plan
- Cyber liability / E&O insurance

## Avoid:
- Shared admin accounts
- Permanent public links
- Unrestricted upload folders
- Direct root SSH access
- Unmanaged personal devices for administrators