Files
act_runner/go.mod
Bo-Yi Wu f2d545565f perf: reduce runner-to-server connection load with adaptive reporting and polling (#819)
## Summary

Many teams self-host Gitea + Act Runner at scale. The current runner design causes excessive HTTP requests to the Gitea server, leading to high server load. This PR addresses three root causes: aggressive fixed-interval polling, per-task status reporting every 1 second regardless of activity, and unoptimized HTTP client configuration.

## Problem

The original architecture has these issues:

**1. Fixed 1-second reporting interval (RunDaemon)**

- Every running task calls ReportLog + ReportState every 1 second (2 HTTP requests/sec/task)
- These requests are sent even when there are no new log rows or state changes
- With 200 runners × 3 tasks each = **1,200 req/sec just for status reporting**

**2. Fixed 2-second polling interval (no backoff)**

- Idle runners poll FetchTask every 2 seconds forever, even when no jobs are queued
- No exponential backoff or jitter — all runners can synchronize after network recovery (thundering herd)
- 200 idle runners = **100 req/sec doing nothing useful**

**3. HTTP client not tuned**

- Uses http.DefaultClient with MaxIdleConnsPerHost=2, causing frequent TCP/TLS reconnects
- Creates two separate http.Client instances (one for Ping, one for Runner service) instead of sharing

**Total: ~1,300 req/sec for 200 runners with 3 tasks each**

## Solution

### Adaptive Event-Driven Log Reporting

Replace the recursive `time.AfterFunc(1s)` pattern in RunDaemon with a goroutine-based select event loop using three trigger mechanisms:

| Trigger | Default | Purpose |
|---------|---------|---------|
| `log_report_max_latency` | 3s | Guarantee even a single log line is delivered within this time |
| `log_report_interval` | 5s | Periodic sweep — steady-state cadence |
| `log_report_batch_size` | 100 rows | Immediate flush during bursty output (e.g., npm install) |

**Key design**: `log_report_max_latency` (3s) must be less than `log_report_interval` (5s) so the max-latency timer fires before the periodic ticker for single-line scenarios.

State reporting is decoupled to its own `state_report_interval` (default 5s), with immediate flush on step transitions (start/stop) via a stateNotify channel for responsive frontend UX.

Additionally:
- Skip ReportLog when `len(rows) == 0` (no pending log rows)
- Skip ReportState when `stateChanged == false && len(outputs) == 0` (nothing changed)
- Move expensive `proto.Clone` after the early-return check to avoid deep copies on no-op paths

### Polling Backoff with Jitter

Replace fixed `rate.Limiter` with adaptive exponential backoff:
- Track `consecutiveEmpty` and `consecutiveErrors` counters
- Interval doubles with each empty/error response: `base × 2^(n-1)`, capped at `fetch_interval_max` (default 60s)
- Add ±20% random jitter to prevent thundering herd
- Fetch first, sleep after ��� preserves burst=1 behavior for immediate first fetch on startup and after task completion

### HTTP Client Tuning

- Configure custom `http.Transport` with `MaxIdleConnsPerHost=10` (was 2)
- Share a single `http.Client` between PingService and RunnerService
- Add `IdleConnTimeout=90s` for clean connection lifecycle

## Load Reduction

For 200 runners × 3 tasks (70% with active log output):

| Component | Before | After | Reduction |
|-----------|--------|-------|-----------|
| Polling (idle) | 100 req/s | ~3.4 req/s | 97% |
| Log reporting | 420 req/s | ~84 req/s | 80% |
| State reporting | 126 req/s | ~25 req/s | 80% |
| **Total** | **~1,300 req/s** | **~113 req/s** | **~91%** |

## Frontend UX Impact

| Scenario | Before | After | Notes |
|----------|--------|-------|-------|
| Continuous output (npm install) | ~1s | ~5s | Periodic ticker sweep |
| Single line then silence | ~1s | ≤3s | maxLatencyTimer guarantee |
| Bursty output (100+ lines) | ~1s | <1s | Batch size immediate flush |
| Step start/stop | ~1s | <1s | stateNotify immediate flush |
| Job completion | ~1s | ~1s | Close() retry unchanged |

## New Configuration Options

All have safe defaults — existing config files need no changes:

```yaml
runner:
  fetch_interval_max: 60s        # Max backoff interval when idle
  log_report_interval: 5s        # Periodic log flush interval
  log_report_max_latency: 3s     # Max time a log row waits (must be < log_report_interval)
  log_report_batch_size: 100     # Immediate flush threshold
  state_report_interval: 5s      # State flush interval (step transitions are always immediate)
```

Config validation warns on invalid combinations:
- `fetch_interval_max < fetch_interval` → auto-corrected
- `log_report_max_latency >= log_report_interval` → warning (timer would be redundant)

## Test Plan

- [x] `go build ./...` passes
- [x] `go test ./...` passes (all existing + 3 new tests)
- [x] `golangci-lint run` — 0 issues
- [x] TestReporter_MaxLatencyTimer — verifies single log line flushed by maxLatencyTimer before logTicker
- [x] TestReporter_BatchSizeFlush — verifies batch size threshold triggers immediate flush
- [x] TestReporter_StateNotifyFlush — verifies step transition triggers immediate state flush
- [x] TestReporter_EphemeralRunnerDeletion — verifies Close/RunDaemon race safety
- [x] TestReporter_RunDaemonClose_Race — verifies concurrent Close safety

Reviewed-on: https://gitea.com/gitea/act_runner/pulls/819
Reviewed-by: Nicolas <173651+bircni@noreply.gitea.com>
Co-authored-by: Bo-Yi Wu <appleboy.tw@gmail.com>
Co-committed-by: Bo-Yi Wu <appleboy.tw@gmail.com>
2026-04-14 11:29:25 +00:00

117 lines
5.3 KiB
Modula-2

module gitea.com/gitea/act_runner
go 1.26.0
require (
code.gitea.io/actions-proto-go v0.4.1
code.gitea.io/gitea-vet v0.2.3
connectrpc.com/connect v1.19.1
github.com/avast/retry-go/v4 v4.7.0
github.com/docker/docker v25.0.13+incompatible
github.com/joho/godotenv v1.5.1
github.com/mattn/go-isatty v0.0.20
github.com/nektos/act v0.0.0 // will be replaced
github.com/sirupsen/logrus v1.9.4
github.com/spf13/cobra v1.10.2
github.com/stretchr/testify v1.11.1
go.yaml.in/yaml/v4 v4.0.0-rc.3
golang.org/x/term v0.40.0
golang.org/x/time v0.14.0 // indirect
google.golang.org/protobuf v1.36.11
gopkg.in/yaml.v3 v3.0.1
gotest.tools/v3 v3.5.2
)
require (
cyphar.com/go-pathrs v0.2.3 // indirect
dario.cat/mergo v1.0.2 // indirect
github.com/AdaLogics/go-fuzz-headers v0.0.0-20240806141605-e8a1dd7889d6 // indirect
github.com/Masterminds/semver v1.5.0 // indirect
github.com/Microsoft/go-winio v0.6.2 // indirect
github.com/ProtonMail/go-crypto v1.3.0 // indirect
github.com/bmatcuk/doublestar/v4 v4.10.0 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/clipperhouse/uax29/v2 v2.7.0 // indirect
github.com/cloudflare/circl v1.6.3 // indirect
github.com/containerd/containerd v1.7.29 // indirect
github.com/containerd/log v0.1.0 // indirect
github.com/creack/pty v1.1.24 // indirect
github.com/cyphar/filepath-securejoin v0.6.1 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/distribution/reference v0.6.0 // indirect
github.com/docker/cli v25.0.3+incompatible // indirect
github.com/docker/distribution v2.8.3+incompatible // indirect
github.com/docker/docker-credential-helpers v0.9.5 // indirect
github.com/docker/go-connections v0.6.0 // indirect
github.com/docker/go-units v0.5.0 // indirect
github.com/emirpasic/gods v1.18.1 // indirect
github.com/fatih/color v1.18.0 // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/go-git/gcfg v1.5.1-0.20230307220236-3a3c6141e376 // indirect
github.com/go-git/go-billy/v5 v5.7.0 // indirect
github.com/go-git/go-git/v5 v5.16.5 // indirect
github.com/go-logr/logr v1.4.3 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/gobwas/glob v0.2.3 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang/groupcache v0.0.0-20241129210726-2c02b8208cf8 // indirect
github.com/google/go-cmp v0.7.0 // indirect
github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510 // indirect
github.com/imdario/mergo v0.3.16 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/jbenet/go-context v0.0.0-20150711004518-d14ea06fba99 // indirect
github.com/julienschmidt/httprouter v1.3.0 // indirect
github.com/kballard/go-shellquote v0.0.0-20180428030007-95032a82bc51 // indirect
github.com/kevinburke/ssh_config v1.6.0 // indirect
github.com/klauspost/compress v1.18.4 // indirect
github.com/klauspost/cpuid/v2 v2.3.0 // indirect
github.com/mattn/go-colorable v0.1.14 // indirect
github.com/mattn/go-runewidth v0.0.20 // indirect
github.com/mattn/go-shellwords v1.0.12 // indirect
github.com/mitchellh/mapstructure v1.1.2 // indirect
github.com/moby/buildkit v0.13.2 // indirect
github.com/moby/patternmatcher v0.6.0 // indirect
github.com/moby/sys/sequential v0.6.0 // indirect
github.com/moby/sys/user v0.4.0 // indirect
github.com/moby/sys/userns v0.1.0 // indirect
github.com/moby/term v0.5.2 // indirect
github.com/opencontainers/go-digest v1.0.0 // indirect
github.com/opencontainers/image-spec v1.1.1 // indirect
github.com/opencontainers/selinux v1.13.1 // indirect
github.com/pjbgf/sha1cd v0.5.0 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/rhysd/actionlint v1.7.11 // indirect
github.com/robfig/cron/v3 v3.0.1 // indirect
github.com/sergi/go-diff v1.4.0 // indirect
github.com/skeema/knownhosts v1.3.2 // indirect
github.com/spf13/pflag v1.0.10 // indirect
github.com/stretchr/objx v0.5.3 // indirect
github.com/timshannon/bolthold v0.0.0-20240314194003-30aac6950928 // indirect
github.com/xanzy/ssh-agent v0.3.3 // indirect
github.com/xeipuuv/gojsonpointer v0.0.0-20190905194746-02993c407bfb // indirect
github.com/xeipuuv/gojsonreference v0.0.0-20180127040603-bd5ef7bd5415 // indirect
github.com/xeipuuv/gojsonschema v1.2.0 // indirect
go.etcd.io/bbolt v1.4.3 // indirect
go.opentelemetry.io/auto/sdk v1.2.1 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.65.0 // indirect
go.opentelemetry.io/otel v1.40.0 // indirect
go.opentelemetry.io/otel/metric v1.40.0 // indirect
go.opentelemetry.io/otel/trace v1.40.0 // indirect
golang.org/x/crypto v0.48.0 // indirect
golang.org/x/net v0.50.0 // indirect
golang.org/x/sync v0.19.0 // indirect
golang.org/x/sys v0.41.0 // indirect
golang.org/x/tools v0.42.0 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20240903143218-8af14fe29dc1 // indirect
google.golang.org/grpc v1.67.0 // indirect
gopkg.in/warnings.v0 v0.1.2 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
)
replace github.com/nektos/act => gitea.com/gitea/act v0.261.10
// Remove after github.com/docker/distribution is updated to support distribution/reference v0.6.0
// (pulled in via moby/buildkit, breaks on undefined: reference.SplitHostname)
replace github.com/distribution/reference v0.6.0 => github.com/distribution/reference v0.5.0