Architecture
Push → Gitea webhook → Lambda (tinqs-ci-dispatch) → EC2 Spot → act_runner → job → self-terminate
Runners are ephemeral: one Spot instance per job, self-terminating on completion. Private-repo clones are authenticated via git config url.insteadOf injected in the runner user-data.
Key design decisions
- Ephemeral Spot instances (not Fargate, not persistent runners): cheapest, cleanest, no state to manage.
- `--ephemeral` on `act_runner register`: the runner exits after one job, triggering `shutdown -h now`, and the instance terminates. Without this, runners pile up as zombies (see the 25 May 2026 incident in Operations).
- No local action cache: `act_runner` uses go-git internally, which ignores `~/.gitconfig`. The `url.insteadOf` trick only works for the `git` binary (used by the `checkout` action), so action repos are cloned fresh each run. This is why `tinqs/ci` must stay public.
- `tinqs.com`: Gitea's `ROOT_URL` is `tinqs.com`. The old `git.tinqs.com` subdomain is retired.
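To make the `url.insteadOf` rewrite concrete, here is a minimal Go sketch that builds the `git config` arguments for it. The function name `insteadOfArgs` and the choice to carry the token as the URL username are assumptions for illustration; the real user-data may lay out the credential differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// insteadOfArgs builds the argument list for a `git config` call that
// rewrites plain https://<host>/ URLs to a token-authenticated form.
// Assumption: the token is carried as the URL username.
func insteadOfArgs(host, token string) []string {
	authed := url.URL{Scheme: "https", User: url.User(token), Host: host, Path: "/"}
	plain := url.URL{Scheme: "https", Host: host, Path: "/"}
	return []string{
		"config", "--global",
		"url." + authed.String() + ".insteadOf",
		plain.String(),
	}
}

func main() {
	// "example-token" is a placeholder, not a real credential.
	for _, a := range insteadOfArgs("tinqs.com", "example-token") {
		fmt.Println(a)
	}
}
```

Only the `git` binary reads this setting, which is why the rewrite helps the `checkout` action but not the action clones that `act_runner` performs through go-git.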
Composite actions
Bash-only composite actions (no Node.js runtime). Resolve via @v1 (the main branch).
| Action | What it does |
|---|---|
| `tinqs/ci/checkout@v1` | Clone a repo from tinqs.com (sparse checkout, depth control, token auth) |
| `tinqs/ci/setup-go@v1` | Install Go (skips if pre-baked in the AMI) |
| `tinqs/ci/setup-node@v1` | Install Node.js + pnpm (skips if pre-baked) |
| `tinqs/ci/setup-aws@v1` | Install AWS CLI + optional ECR login |

    steps:
      - uses: tinqs/ci/checkout@v1
        with:
          sparse: 'cmd/tstudio'
      - uses: tinqs/ci/setup-go@v1
      - uses: tinqs/ci/setup-aws@v1
        with:
          ecr-login: 'true'

Dispatcher (Lambda)
orchestrator/dispatch/main.go receives Gitea push webhooks, fetches .gitea/workflows/*.yml via the Gitea API, evaluates triggers (branch + path filters), reads each matched workflow's runs-on label, and launches a Spot instance with that label. Run state is tracked in DynamoDB.
Routing by label (labelToSpot map in main.go):
| Label | Instance | Use |
|---|---|---|
| `go` | t3.small | Go builds (tstudio, proxy, docgen) |
| `docker` | t3.medium | Docker image builds (platform, bot) |
| `deploy` | t3.micro | S3 sync, CloudFront invalidation, SSM template deploy |
| `node` | t3.medium | Frontend builds |
| `godot` | t3.medium | Game exports (future) |
runs-on: host is skipped by the dispatcher (it's for a standing registered runner, not Spot).
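As a rough sketch of the routing above: the map name `labelToSpot` comes from the text, while the value type (a plain instance-type string), the `route` helper, and the choice to skip unknown labels are assumptions made for illustration.

```go
package main

import "fmt"

// labelToSpot mirrors the routing table. Storing just the instance type as a
// string is an assumption; the real map may carry more per-label settings.
var labelToSpot = map[string]string{
	"go":     "t3.small",
	"docker": "t3.medium",
	"deploy": "t3.micro",
	"node":   "t3.medium",
	"godot":  "t3.medium",
}

// route returns the Spot instance type for a runs-on label. ok is false when
// no Spot instance should be launched: the label is "host" (served by a
// standing registered runner) or, by assumption here, an unknown label.
func route(label string) (instanceType string, ok bool) {
	if label == "host" {
		return "", false
	}
	instanceType, ok = labelToSpot[label]
	return instanceType, ok
}

func main() {
	for _, label := range []string{"deploy", "go", "host"} {
		it, ok := route(label)
		fmt.Printf("label=%q instance=%q launch=%v\n", label, it, ok)
	}
}
```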
Fixed 2026-06-07: `deploy`-labelled jobs used to route to a separate executor Lambda (`tinqs-ci-exec`) that was deleted 26 May, so they silently hit a `[DRY RUN] Would invoke executor` no-op and never ran. They now fall through to the normal Spot path like every other label. A second bug was also fixed: runner names derived from `runID[:12]` collided across same-commit deploys, so names now use the full sanitised runID.
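The runner-name collision can be illustrated with a small sketch. The runID format used here (commit SHA, a slash, then the workflow file), the `runner-` prefix, and the sanitising rule are all hypothetical; the point is only that truncating to 12 characters discards whatever distinguishes two runs of the same commit.

```go
package main

import (
	"fmt"
	"regexp"
)

// unsafeChars matches every run of characters outside [A-Za-z0-9-].
var unsafeChars = regexp.MustCompile(`[^a-zA-Z0-9-]+`)

// sanitise replaces each run of unsafe characters with a single dash.
func sanitise(runID string) string {
	return unsafeChars.ReplaceAllString(runID, "-")
}

// truncatedName reproduces the old scheme: only the first 12 characters of
// the runID contribute to the runner name.
func truncatedName(runID string) string {
	if len(runID) > 12 {
		runID = runID[:12]
	}
	return "runner-" + sanitise(runID)
}

// fullName reflects the fixed scheme: the whole sanitised runID is used.
func fullName(runID string) string {
	return "runner-" + sanitise(runID)
}

func main() {
	// Two hypothetical runs triggered by the same commit.
	a := "3f9c2ab17d04e5aa/deploy-site.yml"
	b := "3f9c2ab17d04e5aa/deploy-docs.yml"

	fmt.Println(truncatedName(a) == truncatedName(b)) // true: names collide
	fmt.Println(fullName(a) == fullName(b))           // false: names differ
}
```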
Runner lifecycle (user-data)
boot → git auth config (url.insteadOf with GITEA_TOKEN)
→ act_runner register --ephemeral --labels <label>:host
→ act_runner daemon (blocks until job completes)
→ EXIT trap → shutdown -h now → instance terminates
Runner images
Dockerfiles in images/ — lean, purpose-built. Push to ECR with images/build-all.sh v1.
| Image | Contents |
|---|---|
| `base` | Alpine + git + AWS CLI + SSH |
| `go` | base + Go |
| `node` | base + Node + pnpm |
| `docker` | docker:dind + Go + AWS CLI |
| `deploy` | base only (lightest) |
| `godot` | base + headless Godot |
Note: the live Spot runners boot from a pre-baked AMI (`RUNNER_AMI`, with Go/Node/Docker/act_runner installed), not these container images. The images exist for purpose-built runner variants; the AMI is the fast path.