docs: convert ci docs to the in-repo wiki/ standard + fix stale ECS facts

Adopt the team wiki convention (in-repo wiki/ folder, plain markdown) used in
tinqs/studio. Convert DEVOPS.md + PLAN.md and the heavy parts of README.md
into cross-linked wiki pages: Home, Architecture, DevOps-Reference,
Operations, Roadmap. Root README slimmed to a repo intro pointing at wiki/.

Corrects stale topology while converting:
- ECS cluster tinqs-git / EFS tinqs-git-repos retired 2026-06-05; platform now
  the standalone EC2 box tinqs-prod-gitea (ALB tinqs-git, ECR image, RDS).
- Records this session's fixes: deploy-label dry-run route, runner-name
  collisions, arikigame IAM bucket, and template deploy repointed ECS→EC2/SSM.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 20:43:05 +01:00
parent 4076cf67b7
commit 33f967e42e
8 changed files with 377 additions and 295 deletions
@@ -0,0 +1,33 @@
# Roadmap
[← Home](README.md) · [Architecture](Architecture.md) · [DevOps Reference](DevOps-Reference.md) · [Operations](Operations.md)
## Done
- [x] Composite actions: `checkout`, `setup-go`, `setup-node`, `setup-aws`
- [x] Lambda dispatcher with Spot instance routing by `runs-on` label
- [x] Ephemeral runners (one job, self-terminate)
- [x] Git auth for private repos (`url.<base>.insteadOf`)
- [x] DynamoDB run tracking + cleanup cron
- [x] Runner image Dockerfiles: base, go, node, docker, deploy, godot
- [x] Zombie runner incident resolved (25 May 2026)
- [x] `deploy`-label jobs routed through Spot; they previously hit a dead Lambda dry-run route (07 Jun 2026)
- [x] Unique Spot runner names per dispatch (07 Jun 2026)
- [x] Template deploy repointed off deleted ECS → EC2 via SSM (07 Jun 2026)
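
The label routing and unique-name items above can be illustrated with a minimal sketch. This is not the dispatcher's actual code: the `LABEL_TO_IMAGE` table, the function names, and the `spot-` name prefix are all hypothetical, and the image names are simply taken from the Dockerfile list above.

```python
import uuid

# Hypothetical mapping from a `runs-on` label to a runner image.
# The image names mirror the Dockerfiles listed above; the table itself is an assumption.
LABEL_TO_IMAGE = {
    "base": "base",
    "go": "go",
    "node": "node",
    "docker": "docker",
    "deploy": "deploy",
    "godot": "godot",
}


def pick_image(runs_on):
    """Return the image for the first recognised label, or None if none match."""
    for label in runs_on:
        image = LABEL_TO_IMAGE.get(label)
        if image is not None:
            return image
    return None


def runner_name(image, suffix=None):
    """Build a runner name; a random suffix keeps names unique per dispatch."""
    suffix = suffix or uuid.uuid4().hex[:8]
    return f"spot-{image}-{suffix}"  # "spot-" prefix is illustrative only


def plan_dispatch(runs_on):
    """Describe the ephemeral runner that would be launched for a job."""
    image = pick_image(runs_on)
    if image is None:
        # Fail loudly rather than fall through to a route that does nothing.
        raise ValueError(f"no runner image for labels {runs_on!r}")
    return {"image": image, "name": runner_name(image), "ephemeral": True}
```

Calling `plan_dispatch(["deploy"])` twice yields the same image but two different names, which is the property the runner-name fix is after. Raising on an unknown label is one possible design choice; it avoids a job silently landing on a route that does nothing.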
## Next
| Priority | Task | Impact |
|----------|------|--------|
| P1 | Pre-warm Go module + build cache in the AMI | 30s build time |
| P1 | Automate AMI build (Packer or script) | Repeatable, no manual SSH |
| P2 | Internal DNS for git clones | Faster than public HTTPS |
| P2 | CloudWatch agent on the runner AMI | Persistent logs after instance death |
| P3 | `tinqs/ci/deploy-s3` action | S3 sync + CloudFront invalidation wrapper |
| P3 | `tinqs/ci/deploy-ssm` action | Reusable SSM-to-prod deploy (generalise the template-deploy step) |
| P3 | `tinqs/ci/notify` action | Post build status to GChat |
## Watch / cleanup
- **Repo size**: `tinqs/studio` now commits the arikigame site assets (~75 MB) as regular files because the CI `checkout` action does not run `git lfs pull`. If this grows, add `git lfs pull` to the checkout action first, then LFS-track `web/arikigame/public/img/**`.
- **DEVOPS doc drift**: keep this wiki current whenever the AWS topology changes (the ECS→EC2 move went undocumented for two days and broke deploys).
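
For the repo-size item, a small check could flag when the asset tree has grown enough to justify the LFS move. The sketch below is an assumption, not an existing script in this repo: the function names and the budget are hypothetical, and the budget value is a placeholder to be chosen by the team.

```python
from pathlib import Path


def tree_size_bytes(root):
    """Sum the sizes of all regular files under `root`, recursively."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def over_budget(root, budget_mb):
    """Return True when the tree under `root` exceeds `budget_mb` megabytes."""
    return tree_size_bytes(root) > budget_mb * 1024 * 1024


if __name__ == "__main__":
    # Path comes from the note above; the 100 MB budget is a hypothetical placeholder.
    assets = "web/arikigame/public/img"
    if Path(assets).is_dir() and over_budget(assets, 100):
        print(f"{assets} exceeds budget; consider LFS-tracking it")
```

Run from the root of a `tinqs/studio` checkout, this prints a warning only once the directory passes the chosen budget. It measures the working tree, not repository history, so it understates the cost of assets that were committed and later replaced.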