2026-05-26 01:20:05 +01:00
# tinqs/ci
CI toolchain for Tinqs Studio — composite Gitea Actions and a Lambda dispatcher that orchestrates ephemeral Spot runners.
**This repo must stay public.** act_runner (go-git) clones action repos without auth. All other tinqs repos are private.
## Architecture
```
Push → Gitea webhook → Lambda (tinqs-ci-dispatch) → EC2 Spot → act_runner → job → self-terminate
```
Runners are ephemeral: one Spot instance per job, self-terminates on completion. Private repo clones are authenticated via `git config url.insteadOf` injected in the runner user-data.
### Key design decisions
- **Ephemeral Spot instances** (not Fargate, not persistent runners) — cheapest, cleanest, no state to manage.
- **`--ephemeral` on `act_runner register`** — runner exits after one job, triggering `shutdown -h now` → instance terminates. Without this, runners pile up as zombies.
- **No local action cache** — act_runner uses go-git internally, which ignores `~/.gitconfig`. The `url.insteadOf` trick only works for the git binary (used by the checkout action).
- **`tinqs.com`** — Gitea's ROOT_URL is `tinqs.com`. The old `git.tinqs.com` subdomain is retired.
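
As a rough illustration of the `url.insteadOf` rewrite described above, the sketch below renders the config stanza that would make the git binary authenticate against tinqs.com. The helper name `render_git_auth_config` and the `https://<token>@host/` credential form are assumptions, not taken from the actual runner user-data.

```bash
# Sketch only: render_git_auth_config is a hypothetical helper.
# It prints the stanza that the equivalent command would write:
#   git config --global url."https://<token>@<host>/".insteadOf "https://<host>/"
# Assumption: Gitea accepts the API token as the HTTPS username.
render_git_auth_config() {
  local host="$1" token="$2"
  printf '[url "https://%s@%s/"]\n\tinsteadOf = https://%s/\n' \
    "$token" "$host" "$host"
}

# Example: append to the runner user's config before any clone happens.
# render_git_auth_config tinqs.com "$GITEA_TOKEN" >> "$HOME/.gitconfig"
```

Because go-git ignores this file, the rewrite only helps steps that shell out to the git binary, which is why the action repos themselves have to stay public.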
## Actions
| Action | What it does |
|--------|-------------|
| `tinqs/ci/checkout@v1` | Clone a repo from tinqs.com (sparse checkout, depth control, token auth) |
| `tinqs/ci/setup-go@v1` | Install Go (skips if pre-baked in AMI) |
| `tinqs/ci/setup-node@v1` | Install Node.js + pnpm (skips if pre-baked) |
| `tinqs/ci/setup-aws@v1` | Install AWS CLI + optional ECR login |
```yaml
steps:
  - uses: tinqs/ci/checkout@v1
    with:
      sparse: 'cmd/tstudio'
  - uses: tinqs/ci/setup-go@v1
  - uses: tinqs/ci/setup-aws@v1
    with:
      ecr-login: 'true'
```
## Dispatcher (Lambda)
`orchestrator/dispatch/main.go` — receives Gitea webhooks, evaluates workflow triggers (branch + path filters), launches Spot instances with the right label.
| Label | Instance | Use |
|-------|----------|-----|
| `go` | t3.small | Go builds (tstudio, proxy, docgen) |
| `docker` | t3.medium | Docker image builds (platform, bot) |
| `deploy` | t3.micro | S3 sync, ECS update |
| `node` | t3.medium | Frontend builds |
| `godot` | t3.medium | Game exports (future) |
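
The dispatch decision can be pictured with the small shell sketch below: first check the branch and path filters, then map the label to an instance type as in the table. The function names and the prefix-style path filter are assumptions for illustration; the real logic lives in `orchestrator/dispatch/main.go` and may match paths differently.

```bash
# Sketch only: both functions are hypothetical mirrors of the dispatcher logic.

# matches_trigger BRANCH WANT_BRANCH PATH_PREFIX CHANGED_FILE...
# Succeeds when the branch matches and at least one changed file
# sits under the given path prefix.
matches_trigger() {
  local branch="$1" want_branch="$2" prefix="$3" file
  shift 3
  [ "$branch" = "$want_branch" ] || return 1
  for file in "$@"; do
    case "$file" in
      "$prefix"*) return 0 ;;
    esac
  done
  return 1
}

# instance_type_for LABEL: mirrors the label table above.
instance_type_for() {
  case "$1" in
    go)                echo t3.small ;;
    docker|node|godot) echo t3.medium ;;
    deploy)            echo t3.micro ;;
    *) echo "unknown label: $1" >&2; return 1 ;;
  esac
}

# Example: a push to main that touches cmd/tstudio/ would start a "go" runner.
if matches_trigger main main cmd/tstudio/ README.md cmd/tstudio/main.go; then
  echo "dispatch on $(instance_type_for go)"
fi
```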
Runner user-data flow: boot → git auth config → act_runner register (ephemeral) → daemon → job → exit → shutdown → terminate.
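
The flow above could be rendered into user-data along the lines of the sketch below. `render_user_data` is a hypothetical helper, the `<label>:host` label scheme is an assumption, and the script assumes the instance is launched with its shutdown behavior set to terminate; the dispatcher's real template may differ.

```bash
# Sketch only: prints a user-data script for one runner label.
render_user_data() {
  local gitea_url="$1" runner_token="$2" label="$3"
  cat <<EOF
#!/bin/bash
# Log everything so it can be read over SSM while the instance is alive.
exec >>/var/log/tinqs-ci.log 2>&1

# 1. Git auth config for private clones would go here.

# 2. Register for exactly one job.
act_runner register --no-interactive --ephemeral \\
  --instance "${gitea_url}" \\
  --token "${runner_token}" \\
  --labels "${label}:host"

# 3. Run the job; the daemon exits once the ephemeral runner is done.
act_runner daemon

# 4. Power off. No "set -e" above, so this runs even if the job step failed.
shutdown -h now
EOF
}

# Example:
# render_user_data https://tinqs.com "$RUNNER_TOKEN" go > user-data.sh
```

Leaving out `set -e` is deliberate in this sketch: a failed registration or job should still reach `shutdown`, otherwise the instance would linger.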
## Runner Images
Dockerfiles in `images/` — lean, purpose-built. Push to ECR with `images/build-all.sh v1`.
| Image | Contents |
|-------|----------|
| `base` | Alpine + git + AWS CLI + SSH |
| `go` | base + Go 1.26 |
| `node` | base + Node 22 + pnpm |
| `docker` | docker:dind + Go + AWS CLI |
| `deploy` | base only (lightest) |
| `godot` | base + headless Godot 4.6 |
## Deploying the dispatcher
The dispatcher Lambda can't CI itself — deploy manually:
```bash
cd orchestrator/dispatch

# Build
GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o bootstrap -ldflags "-s -w" .

# Zip
# Windows:
powershell -Command "Compress-Archive -Path bootstrap -DestinationPath function.zip -Force"
# Mac/Linux:
zip -j function.zip bootstrap

# Deploy
aws lambda update-function-code --region eu-west-1 \
  --function-name tinqs-ci-dispatch \
  --zip-file fileb://function.zip

# Trigger a test build
# Push any change to cmd/tstudio/ in tinqs/studio
```
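
Since the deploy is manual, it may help to confirm that Lambda has finished applying the new code before pushing a test change. The sketch below is one way to do that; `verify_dispatch_deploy` is a hypothetical helper name, and the defaults simply reuse the function name and region shown above.

```bash
# Sketch only: wait for the code update to settle, then print its status.
verify_dispatch_deploy() {
  local fn="${1:-tinqs-ci-dispatch}" region="${2:-eu-west-1}"
  # The function-updated waiter polls until LastUpdateStatus is Successful.
  aws lambda wait function-updated --region "$region" \
    --function-name "$fn" || return 1
  aws lambda get-function-configuration --region "$region" \
    --function-name "$fn" \
    --query '[LastUpdateStatus, LastModified]' --output text
}

# Example (run after update-function-code):
# verify_dispatch_deploy
```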
## Lambda env vars
Configured in AWS console, not in code:
| Var | Purpose |
|-----|---------|
| `GITEA_URL` | `https://tinqs.com` |
| `GITEA_TOKEN` | API token — used for fetching workflows AND runner git auth |
| `RUNNER_TOKEN` | act_runner registration token (from Gitea admin → Runners) |
| `RUNNER_AMI` | Pre-baked AMI ID (Go, Node, Docker, act_runner installed) |
| `SUBNET` | VPC subnet for Spot instances |
| `SECURITY_GROUP` | SG allowing outbound HTTPS |
| `DDB_TABLE` | DynamoDB table for run tracking (`tinqs-ci-runs`) |
| `INSTANCE_PROFILE` | IAM role for runner instances (S3, ECR, ECS access) |
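
Because these values live only in the console, a quick check that none are missing can save a failed dispatch. The sketch below lists required variables that are absent from the function's configuration; `missing_lambda_vars` and `REQUIRED_VARS` are hypothetical names, and the list is copied from the table above.

```bash
# Sketch only: report required env vars that are not set on the Lambda.
REQUIRED_VARS="GITEA_URL GITEA_TOKEN RUNNER_TOKEN RUNNER_AMI SUBNET SECURITY_GROUP DDB_TABLE INSTANCE_PROFILE"

missing_lambda_vars() {
  local fn="${1:-tinqs-ci-dispatch}" region="${2:-eu-west-1}" present name
  # keys() returns only the variable names, so no secret values are printed.
  present="$(aws lambda get-function-configuration --region "$region" \
    --function-name "$fn" \
    --query 'keys(Environment.Variables)' --output text)" || return 1
  # Text output separates names with tabs; normalise to spaces for matching.
  present=" $(printf '%s' "$present" | tr '\t\n' '  ') "
  for name in $REQUIRED_VARS; do
    case "$present" in
      *" $name "*) ;;
      *) echo "$name" ;;
    esac
  done
}

# Example: prints nothing when every variable is present.
# missing_lambda_vars
```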
## Monitoring
```bash
# Zombie check (should be 0 except during active builds)
aws ec2 describe-instances --region eu-west-1 \
  --filters "Name=tag:tinqs-ci,Values=true" "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].InstanceId'

# Lambda dispatch logs (use MSYS_NO_PATHCONV=1 on Windows/Git Bash)
MSYS_NO_PATHCONV=1 aws logs tail '/aws/lambda/tinqs-ci-dispatch' --region eu-west-1

# Build logs
TOKEN='<your-gitea-token>'
curl -s "https://tinqs.com/api/v1/repos/tinqs/studio/actions/jobs/<JOB_ID>/logs" \
  -H "Authorization: token $TOKEN"

# Runner instance logs (while instance is alive)
aws ssm send-command --region eu-west-1 --instance-ids <ID> \
  --document-name "AWS-RunShellScript" \
  --parameters 'commands=["cat /var/log/tinqs-ci.log"]'

# Stale DynamoDB runs
aws dynamodb scan --region eu-west-1 --table-name tinqs-ci-runs \
  --filter-expression "#s = :r" \
  --expression-attribute-names '{"#s":"status"}' \
  --expression-attribute-values '{":r":{"S":"running"}}' \
  --query Count
```
Full debug guide: `tinqs/docs/.cursor/skills/ci-pipeline-discipline/SKILL.md`
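
The zombie check above can also be wrapped so that it fails when too many runners are alive, for example from a cron job. This is a sketch under assumptions: `check_zombies` and its threshold argument are hypothetical, and it reuses the `tinqs-ci` tag filter from the monitoring commands.

```bash
# Sketch only: exit non-zero when more runners are running than expected.
check_zombies() {
  local max="${1:-0}" region="${2:-eu-west-1}" count
  count="$(aws ec2 describe-instances --region "$region" \
    --filters "Name=tag:tinqs-ci,Values=true" "Name=instance-state-name,Values=running" \
    --query 'length(Reservations[].Instances[])' --output text)" || return 2
  if [ "$count" -gt "$max" ]; then
    echo "WARN: $count runner instance(s) running, expected at most $max" >&2
    return 1
  fi
  echo "OK: $count runner instance(s) running"
}

# Example: allow up to 2 concurrent builds before warning.
# check_zombies 2
```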
## Contributing
### Adding a new composite action
1. Create `<action-name>/action.yml` with `using: composite` and `shell: bash`
2. Keep it simple — no Node.js runtime, just bash
3. Add a `<action-name>/README.md` with inputs/outputs
4. Add to the Actions table in this README
5. Push to main — actions resolve via `@v1` (main branch)
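
Steps 1 to 3 can be scaffolded with something like the sketch below. `scaffold_action` is a hypothetical helper and the `version` input is only a placeholder; real actions will define their own inputs and steps.

```bash
# Sketch only: create <action-name>/action.yml and README.md stubs.
scaffold_action() {
  local name="$1" root="${2:-.}"
  mkdir -p "$root/$name" || return 1
  # \${{ ... }} is escaped so the expression reaches the file unexpanded.
  cat >"$root/$name/action.yml" <<EOF
name: '$name'
description: 'TODO: describe $name'
inputs:
  version:
    description: 'Tool version to install'
    required: false
    default: 'latest'
runs:
  using: composite
  steps:
    - name: Run $name
      shell: bash
      run: |
        echo "TODO: implement $name (version \${{ inputs.version }})"
EOF
  printf '# %s\n\n## Inputs\n\n| Input | Default | Description |\n|-------|---------|-------------|\n| `version` | `latest` | Tool version to install |\n' \
    "$name" >"$root/$name/README.md"
}

# Example: run from the repo root.
# scaffold_action setup-rust
```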
### Modifying the dispatcher
1. Edit `orchestrator/dispatch/main.go`
2. Build: `go build .` (catches compile errors)
3. Deploy manually (see Deploying above)
4. Verify: push a change to `tinqs/studio` and watch the pipeline
### Adding a new runner label
1. Add entry to `labelToSpot` map in `main.go`
2. Create `images/<label>/Dockerfile` if needed
3. Build and push image: `cd images && ./build-all.sh v1`
4. Deploy updated Lambda
5. Add `runs-on: <label>` to the workflow that needs it
### Updating the AMI
1. Launch a t3.small from the current AMI (`RUNNER_AMI` env var)
2. SSH in, install/update tools
3. Create AMI: `aws ec2 create-image --instance-id <ID> --name tinqs-ci-runner-v<N>`
4. Update `RUNNER_AMI` Lambda env var
5. Terminate the build instance
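
Step 3 can be scripted so that the new AMI ID is only reported once the image is usable. A minimal sketch, assuming the `tinqs-ci-runner-v<N>` naming from step 3; `bake_runner_ami` is a hypothetical helper name.

```bash
# Sketch only: create the image, wait for it, and print the new AMI ID.
bake_runner_ami() {
  local instance_id="$1" version="$2" region="${3:-eu-west-1}" ami_id
  ami_id="$(aws ec2 create-image --region "$region" \
    --instance-id "$instance_id" \
    --name "tinqs-ci-runner-v${version}" \
    --query ImageId --output text)" || return 1
  # Snapshotting can take several minutes; block until the image is available.
  aws ec2 wait image-available --region "$region" --image-ids "$ami_id" || return 1
  echo "$ami_id"
}

# Example:
# bake_runner_ami <ID> 7
```

When setting `RUNNER_AMI` from the CLI instead of the console, keep in mind that `aws lambda update-function-configuration --environment` replaces the whole variable map, so the other variables have to be passed along with it.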
## Incidents
- **25 May 2026**: 18 zombie runners DDoS-ing Gitea. Root cause: no `--ephemeral` on registration + no git auth after repos went private. Full post-mortem: `tinqs/internal/incidents/ci-zombie-runners-2026-05-25.md`