ci/wiki/Architecture.md
ozan 33f967e42e docs: convert ci docs to the in-repo wiki/ standard + fix stale ECS facts
Adopt the team wiki convention (in-repo wiki/ folder, plain markdown) used in
tinqs/studio. Convert DEVOPS.md + PLAN.md and the heavy parts of README.md
into cross-linked wiki pages: Home, Architecture, DevOps-Reference,
Operations, Roadmap. Root README slimmed to a repo intro pointing at wiki/.

Corrects stale topology while converting:
- ECS cluster tinqs-git / EFS tinqs-git-repos retired 2026-06-05; platform now
  the standalone EC2 box tinqs-prod-gitea (ALB tinqs-git, ECR image, RDS).
- Records this session's fixes: deploy-label dry-run route, runner-name
  collisions, arikigame IAM bucket, and template deploy repointed ECS→EC2/SSM.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 20:43:05 +01:00


# Architecture
[← Home](README.md) · [DevOps Reference](DevOps-Reference.md) · [Operations](Operations.md) · [Roadmap](Roadmap.md)
```
Push → Gitea webhook → Lambda (tinqs-ci-dispatch) → EC2 Spot → act_runner → job → self-terminate
```
Runners are **ephemeral**: one Spot instance per job, self-terminating on completion. Private-repo clones are authenticated via `git config url.insteadOf` injected in the runner user-data.
## Key design decisions
- **Ephemeral Spot instances** (not Fargate, not persistent runners) — cheapest, cleanest, no state to manage.
- **`--ephemeral` on `act_runner register`** — the runner exits after one job, triggering `shutdown -h now` → the instance terminates. Without this, runners pile up as zombies (see the 25 May 2026 incident in [Operations](Operations.md)).
- **No local action cache** — `act_runner` uses go-git internally, which ignores `~/.gitconfig`. The `url.insteadOf` trick only works for the `git` binary (used by the `checkout` action), so action repos are cloned fresh each run. This is why `tinqs/ci` must stay public.
- **`tinqs.com`** — Gitea's `ROOT_URL` is `tinqs.com`. The old `git.tinqs.com` subdomain is retired.
## Composite actions
Bash-only composite actions (no Node.js runtime required). Reference them as `@v1`, which tracks the main branch.
| Action | What it does |
|--------|-------------|
| `tinqs/ci/checkout@v1` | Clone a repo from tinqs.com (sparse checkout, depth control, token auth) |
| `tinqs/ci/setup-go@v1` | Install Go (skips if pre-baked in the AMI) |
| `tinqs/ci/setup-node@v1` | Install Node.js + pnpm (skips if pre-baked) |
| `tinqs/ci/setup-aws@v1` | Install AWS CLI + optional ECR login |
```yaml
steps:
  - uses: tinqs/ci/checkout@v1
    with:
      sparse: 'cmd/tstudio'
  - uses: tinqs/ci/setup-go@v1
  - uses: tinqs/ci/setup-aws@v1
    with:
      ecr-login: 'true'
```
## Dispatcher (Lambda)
`orchestrator/dispatch/main.go` receives Gitea push webhooks, fetches `.gitea/workflows/*.yml` via the Gitea API, evaluates triggers (branch + path filters), reads each matched workflow's `runs-on` label, and launches a Spot instance with that label. Run state is tracked in DynamoDB.
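
The trigger evaluation described above can be sketched roughly as follows. This is a simplified illustration, not the code in `main.go`: the `Trigger` type, `ShouldRun`, and `matchGlob` are hypothetical names, and the glob handling (Go's `path.Match` plus a trailing `/**`) is an assumption that only approximates Gitea's workflow filter syntax.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Trigger is a hypothetical, simplified view of a workflow's push filters.
type Trigger struct {
	Branches []string // empty means "any branch"
	Paths    []string // empty means "any path"
}

// matchGlob handles path.Match patterns plus a trailing "/**",
// which here means "this directory or anything beneath it".
func matchGlob(pattern, name string) bool {
	if strings.HasSuffix(pattern, "/**") {
		prefix := strings.TrimSuffix(pattern, "/**")
		return name == prefix || strings.HasPrefix(name, prefix+"/")
	}
	ok, err := path.Match(pattern, name)
	return err == nil && ok
}

// matchAny reports whether name matches at least one pattern.
func matchAny(patterns []string, name string) bool {
	for _, p := range patterns {
		if matchGlob(p, name) {
			return true
		}
	}
	return false
}

// ShouldRun reports whether a push to ref that touched the changed
// files satisfies both the branch filter and the path filter.
func ShouldRun(tr Trigger, ref string, changed []string) bool {
	branch := strings.TrimPrefix(ref, "refs/heads/")
	if len(tr.Branches) > 0 && !matchAny(tr.Branches, branch) {
		return false
	}
	if len(tr.Paths) == 0 {
		return true
	}
	for _, f := range changed {
		if matchAny(tr.Paths, f) {
			return true
		}
	}
	return false
}

func main() {
	tr := Trigger{Branches: []string{"main"}, Paths: []string{"cmd/tstudio/**"}}
	fmt.Println(ShouldRun(tr, "refs/heads/main", []string{"cmd/tstudio/main.go"})) // true
	fmt.Println(ShouldRun(tr, "refs/heads/main", []string{"README.md"}))           // false
	fmt.Println(ShouldRun(tr, "refs/heads/dev", []string{"cmd/tstudio/main.go"}))  // false
}
```

Both filters must pass: a push to a non-matching branch is rejected before any path is inspected, and an empty filter list is treated as "match everything".
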
Routing by label (`labelToSpot` map in `main.go`):
| Label | Instance | Use |
|-------|----------|-----|
| `go` | t3.small | Go builds (tstudio, proxy, docgen) |
| `docker` | t3.medium | Docker image builds (platform, bot) |
| `deploy` | t3.micro | S3 sync, CloudFront invalidation, SSM template deploy |
| `node` | t3.medium | Frontend builds |
| `godot` | t3.medium | Game exports (future) |
`runs-on: host` is skipped by the dispatcher (it's for a standing registered runner, not Spot).
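
As a minimal sketch of the routing above, the table can be expressed as a plain map lookup with `host` short-circuited. The map name `labelToSpot` comes from `main.go`, but its value type here (a bare instance-type string) and the helper `instanceFor` are assumptions made for illustration. Skipping unknown labels is also an assumption; the real dispatcher may handle them differently.

```go
package main

import "fmt"

// labelToSpot mirrors the routing table. The real map in main.go may
// carry more than an instance type per label (assumption).
var labelToSpot = map[string]string{
	"go":     "t3.small",
	"docker": "t3.medium",
	"deploy": "t3.micro",
	"node":   "t3.medium",
	"godot":  "t3.medium",
}

// instanceFor returns the Spot instance type for a runs-on label and
// whether the dispatcher should launch anything at all.
func instanceFor(label string) (string, bool) {
	if label == "host" {
		// Served by the standing registered runner, never by Spot.
		return "", false
	}
	instance, ok := labelToSpot[label]
	return instance, ok
}

func main() {
	for _, label := range []string{"go", "deploy", "host", "unknown"} {
		instance, launch := instanceFor(label)
		fmt.Printf("%-8s launch=%-5v instance=%q\n", label, launch, instance)
	}
}
```

Note that `deploy` is an ordinary entry in the map, so it takes the same Spot path as every other label.
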
> **Fixed 2026-06-07:** `deploy`-labelled jobs used to route to a separate executor Lambda (`tinqs-ci-exec`) that was deleted 26 May, so they silently hit a `[DRY RUN] Would invoke executor` no-op and never ran. They now fall through to the normal Spot path like every other label. A second bug — runner names derived from `runID[:12]` collided across same-commit deploys — was also fixed (names now use the full sanitised runID).
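
The runner-name collision can be reproduced with a small sketch. The run ID format below (`<commit>/<workflow file>`), the `runner-` prefix, and the sanitising rule are all assumptions chosen for illustration; only the `runID[:12]` truncation and the switch to the full sanitised ID come from the fix described above.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitize keeps ASCII letters, digits and hyphens and replaces every
// other character with '-'. The exact rule is an assumption.
func sanitize(id string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '-':
			return r
		}
		return '-'
	}, id)
}

// truncatedName reproduces the old behaviour: only the first 12 bytes
// of the run ID survive, so IDs sharing that prefix collide.
func truncatedName(runID string) string {
	if len(runID) > 12 {
		runID = runID[:12]
	}
	return "runner-" + runID
}

// fullName reflects the fixed behaviour: the whole sanitised run ID.
func fullName(runID string) string {
	return "runner-" + sanitize(runID)
}

func main() {
	// Two hypothetical deploy workflows triggered by the same commit.
	a := "0123456789abcdef/deploy-site.yml"
	b := "0123456789abcdef/deploy-docs.yml"

	fmt.Println(truncatedName(a) == truncatedName(b)) // true: collision
	fmt.Println(fullName(a) == fullName(b))           // false: distinct
	fmt.Println(fullName(a))                          // runner-0123456789abcdef-deploy-site-yml
}
```

Under these assumptions the first 12 characters are all commit hash, which is why two deploys of one commit ended up with the same runner name.
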
## Runner lifecycle (user-data)
```
boot → git auth config (url.insteadOf with GITEA_TOKEN)
     → act_runner register --ephemeral --labels <label>:host
     → act_runner daemon (blocks until job completes)
     → EXIT trap → shutdown -h now → instance terminates
```
## Runner images
Dockerfiles in `images/` — lean, purpose-built. Push to ECR with `images/build-all.sh v1`.
| Image | Contents |
|-------|----------|
| `base` | Alpine + git + AWS CLI + SSH |
| `go` | base + Go |
| `node` | base + Node + pnpm |
| `docker` | docker:dind + Go + AWS CLI |
| `deploy` | base only (lightest) |
| `godot` | base + headless Godot |
> Note: the live Spot runners boot from a **pre-baked AMI** (`RUNNER_AMI`, with Go/Node/Docker/act_runner installed), not these container images. The images exist for purpose-built runner variants; the AMI is the fast path.