Mirror of https://gitea.com/gitea/act_runner.git, synced 2026-04-24 21:00:27 +08:00
## Summary

Many teams self-host Gitea + Act Runner at scale. The current runner design causes excessive HTTP requests to the Gitea server, leading to high server load. This PR addresses three root causes: aggressive fixed-interval polling, per-task status reporting every 1 second regardless of activity, and unoptimized HTTP client configuration.

## Problem

The original architecture has these issues:

**1. Fixed 1-second reporting interval (RunDaemon)**

- Every running task calls ReportLog + ReportState every 1 second (2 HTTP requests/sec/task)
- These requests are sent even when there are no new log rows or state changes
- With 200 runners × 3 tasks each = **1,200 req/sec just for status reporting**

**2. Fixed 2-second polling interval (no backoff)**

- Idle runners poll FetchTask every 2 seconds forever, even when no jobs are queued
- No exponential backoff or jitter, so all runners can synchronize after network recovery (thundering herd)
- 200 idle runners = **100 req/sec doing nothing useful**

**3. HTTP client not tuned**

- Uses http.DefaultClient with MaxIdleConnsPerHost=2, causing frequent TCP/TLS reconnects
- Creates two separate http.Client instances (one for Ping, one for Runner service) instead of sharing

**Total: ~1,300 req/sec for 200 runners with 3 tasks each**

## Solution

### Adaptive Event-Driven Log Reporting

Replace the recursive `time.AfterFunc(1s)` pattern in RunDaemon with a goroutine-based select event loop using three trigger mechanisms:

| Trigger | Default | Purpose |
|---------|---------|---------|
| `log_report_max_latency` | 3s | Guarantee even a single log line is delivered within this time |
| `log_report_interval` | 5s | Periodic sweep (steady-state cadence) |
| `log_report_batch_size` | 100 rows | Immediate flush during bursty output (e.g., npm install) |

**Key design**: `log_report_max_latency` (3s) must be less than `log_report_interval` (5s) so the max-latency timer fires before the periodic ticker for single-line scenarios.
State reporting is decoupled into its own `state_report_interval` (default 5s), with immediate flush on step transitions (start/stop) via a stateNotify channel for responsive frontend UX.

Additionally:

- Skip ReportLog when `len(rows) == 0` (no pending log rows)
- Skip ReportState when `stateChanged == false && len(outputs) == 0` (nothing changed)
- Move expensive `proto.Clone` after the early-return check to avoid deep copies on no-op paths

### Polling Backoff with Jitter

Replace fixed `rate.Limiter` with adaptive exponential backoff:

- Track `consecutiveEmpty` and `consecutiveErrors` counters
- Interval doubles with each empty/error response: `base × 2^(n-1)`, capped at `fetch_interval_max` (default 60s)
- Add ±20% random jitter to prevent thundering herd
- Fetch first, sleep after: this preserves burst=1 behavior for immediate first fetch on startup and after task completion

### HTTP Client Tuning

- Configure custom `http.Transport` with `MaxIdleConnsPerHost=10` (was 2)
- Share a single `http.Client` between PingService and RunnerService
- Add `IdleConnTimeout=90s` for clean connection lifecycle

## Load Reduction

For 200 runners × 3 tasks (70% with active log output):

| Component | Before | After | Reduction |
|-----------|--------|-------|-----------|
| Polling (idle) | 100 req/s | ~3.4 req/s | 97% |
| Log reporting | 420 req/s | ~84 req/s | 80% |
| State reporting | 126 req/s | ~25 req/s | 80% |
| **Total** | **~1,300 req/s** | **~113 req/s** | **~91%** |

## Frontend UX Impact

| Scenario | Before | After | Notes |
|----------|--------|-------|-------|
| Continuous output (npm install) | ~1s | ~5s | Periodic ticker sweep |
| Single line then silence | ~1s | ≤3s | maxLatencyTimer guarantee |
| Bursty output (100+ lines) | ~1s | <1s | Batch size immediate flush |
| Step start/stop | ~1s | <1s | stateNotify immediate flush |
| Job completion | ~1s | ~1s | Close() retry unchanged |

## New Configuration Options

All have safe defaults; existing config files need no changes:

```yaml
runner:
  fetch_interval_max: 60s      # Max backoff interval when idle
  log_report_interval: 5s      # Periodic log flush interval
  log_report_max_latency: 3s   # Max time a log row waits (must be < log_report_interval)
  log_report_batch_size: 100   # Immediate flush threshold
  state_report_interval: 5s    # State flush interval (step transitions are always immediate)
```

Config validation warns on invalid combinations:

- `fetch_interval_max < fetch_interval` → auto-corrected
- `log_report_max_latency >= log_report_interval` → warning (timer would be redundant)

## Test Plan

- [x] `go build ./...` passes
- [x] `go test ./...` passes (all existing + 3 new tests)
- [x] `golangci-lint run`: 0 issues
- [x] TestReporter_MaxLatencyTimer: verifies single log line flushed by maxLatencyTimer before logTicker
- [x] TestReporter_BatchSizeFlush: verifies batch size threshold triggers immediate flush
- [x] TestReporter_StateNotifyFlush: verifies step transition triggers immediate state flush
- [x] TestReporter_EphemeralRunnerDeletion: verifies Close/RunDaemon race safety
- [x] TestReporter_RunDaemonClose_Race: verifies concurrent Close safety

Reviewed-on: https://gitea.com/gitea/act_runner/pulls/819
Reviewed-by: Nicolas <173651+bircni@noreply.gitea.com>
Co-authored-by: Bo-Yi Wu <appleboy.tw@gmail.com>
Co-committed-by: Bo-Yi Wu <appleboy.tw@gmail.com>
117 lines
5.3 KiB
Modula-2
module gitea.com/gitea/act_runner

go 1.26.0

require (
	code.gitea.io/actions-proto-go v0.4.1
	code.gitea.io/gitea-vet v0.2.3
	connectrpc.com/connect v1.19.1
	github.com/avast/retry-go/v4 v4.7.0
	github.com/docker/docker v25.0.13+incompatible
	github.com/joho/godotenv v1.5.1
	github.com/mattn/go-isatty v0.0.20
	github.com/nektos/act v0.0.0 // will be replaced
	github.com/sirupsen/logrus v1.9.4
	github.com/spf13/cobra v1.10.2
	github.com/stretchr/testify v1.11.1
	go.yaml.in/yaml/v4 v4.0.0-rc.3
	golang.org/x/term v0.40.0
	golang.org/x/time v0.14.0 // indirect
	google.golang.org/protobuf v1.36.11
	gopkg.in/yaml.v3 v3.0.1
	gotest.tools/v3 v3.5.2
)

require (
	cyphar.com/go-pathrs v0.2.3 // indirect
	dario.cat/mergo v1.0.2 // indirect
	github.com/AdaLogics/go-fuzz-headers v0.0.0-20240806141605-e8a1dd7889d6 // indirect
	github.com/Masterminds/semver v1.5.0 // indirect
	github.com/Microsoft/go-winio v0.6.2 // indirect
	github.com/ProtonMail/go-crypto v1.3.0 // indirect
	github.com/bmatcuk/doublestar/v4 v4.10.0 // indirect
	github.com/cespare/xxhash/v2 v2.3.0 // indirect
	github.com/clipperhouse/uax29/v2 v2.7.0 // indirect
	github.com/cloudflare/circl v1.6.3 // indirect
	github.com/containerd/containerd v1.7.29 // indirect
	github.com/containerd/log v0.1.0 // indirect
	github.com/creack/pty v1.1.24 // indirect
	github.com/cyphar/filepath-securejoin v0.6.1 // indirect
	github.com/davecgh/go-spew v1.1.1 // indirect
	github.com/distribution/reference v0.6.0 // indirect
	github.com/docker/cli v25.0.3+incompatible // indirect
	github.com/docker/distribution v2.8.3+incompatible // indirect
	github.com/docker/docker-credential-helpers v0.9.5 // indirect
	github.com/docker/go-connections v0.6.0 // indirect
	github.com/docker/go-units v0.5.0 // indirect
	github.com/emirpasic/gods v1.18.1 // indirect
	github.com/fatih/color v1.18.0 // indirect
	github.com/felixge/httpsnoop v1.0.4 // indirect
	github.com/go-git/gcfg v1.5.1-0.20230307220236-3a3c6141e376 // indirect
	github.com/go-git/go-billy/v5 v5.7.0 // indirect
	github.com/go-git/go-git/v5 v5.16.5 // indirect
	github.com/go-logr/logr v1.4.3 // indirect
	github.com/go-logr/stdr v1.2.2 // indirect
	github.com/gobwas/glob v0.2.3 // indirect
	github.com/gogo/protobuf v1.3.2 // indirect
	github.com/golang/groupcache v0.0.0-20241129210726-2c02b8208cf8 // indirect
	github.com/google/go-cmp v0.7.0 // indirect
	github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510 // indirect
	github.com/imdario/mergo v0.3.16 // indirect
	github.com/inconshreveable/mousetrap v1.1.0 // indirect
	github.com/jbenet/go-context v0.0.0-20150711004518-d14ea06fba99 // indirect
	github.com/julienschmidt/httprouter v1.3.0 // indirect
	github.com/kballard/go-shellquote v0.0.0-20180428030007-95032a82bc51 // indirect
	github.com/kevinburke/ssh_config v1.6.0 // indirect
	github.com/klauspost/compress v1.18.4 // indirect
	github.com/klauspost/cpuid/v2 v2.3.0 // indirect
	github.com/mattn/go-colorable v0.1.14 // indirect
	github.com/mattn/go-runewidth v0.0.20 // indirect
	github.com/mattn/go-shellwords v1.0.12 // indirect
	github.com/mitchellh/mapstructure v1.1.2 // indirect
	github.com/moby/buildkit v0.13.2 // indirect
	github.com/moby/patternmatcher v0.6.0 // indirect
	github.com/moby/sys/sequential v0.6.0 // indirect
	github.com/moby/sys/user v0.4.0 // indirect
	github.com/moby/sys/userns v0.1.0 // indirect
	github.com/moby/term v0.5.2 // indirect
	github.com/opencontainers/go-digest v1.0.0 // indirect
	github.com/opencontainers/image-spec v1.1.1 // indirect
	github.com/opencontainers/selinux v1.13.1 // indirect
	github.com/pjbgf/sha1cd v0.5.0 // indirect
	github.com/pkg/errors v0.9.1 // indirect
	github.com/pmezard/go-difflib v1.0.0 // indirect
	github.com/rhysd/actionlint v1.7.11 // indirect
	github.com/robfig/cron/v3 v3.0.1 // indirect
	github.com/sergi/go-diff v1.4.0 // indirect
	github.com/skeema/knownhosts v1.3.2 // indirect
	github.com/spf13/pflag v1.0.10 // indirect
	github.com/stretchr/objx v0.5.3 // indirect
	github.com/timshannon/bolthold v0.0.0-20240314194003-30aac6950928 // indirect
	github.com/xanzy/ssh-agent v0.3.3 // indirect
	github.com/xeipuuv/gojsonpointer v0.0.0-20190905194746-02993c407bfb // indirect
	github.com/xeipuuv/gojsonreference v0.0.0-20180127040603-bd5ef7bd5415 // indirect
	github.com/xeipuuv/gojsonschema v1.2.0 // indirect
	go.etcd.io/bbolt v1.4.3 // indirect
	go.opentelemetry.io/auto/sdk v1.2.1 // indirect
	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.65.0 // indirect
	go.opentelemetry.io/otel v1.40.0 // indirect
	go.opentelemetry.io/otel/metric v1.40.0 // indirect
	go.opentelemetry.io/otel/trace v1.40.0 // indirect
	golang.org/x/crypto v0.48.0 // indirect
	golang.org/x/net v0.50.0 // indirect
	golang.org/x/sync v0.19.0 // indirect
	golang.org/x/sys v0.41.0 // indirect
	golang.org/x/tools v0.42.0 // indirect
	google.golang.org/genproto/googleapis/rpc v0.0.0-20240903143218-8af14fe29dc1 // indirect
	google.golang.org/grpc v1.67.0 // indirect
	gopkg.in/warnings.v0 v0.1.2 // indirect
	gopkg.in/yaml.v2 v2.4.0 // indirect
)

replace github.com/nektos/act => gitea.com/gitea/act v0.261.10

// Remove after github.com/docker/distribution is updated to support distribution/reference v0.6.0
// (pulled in via moby/buildkit, breaks on undefined: reference.SplitHostname)
replace github.com/distribution/reference v0.6.0 => github.com/distribution/reference v0.5.0