Commit Graph

11 Commits

Author SHA1 Message Date
Bo-Yi Wu
f2d545565f perf: reduce runner-to-server connection load with adaptive reporting and polling (#819)
## Summary

Many teams self-host Gitea + Act Runner at scale. The current runner design causes excessive HTTP requests to the Gitea server, leading to high server load. This PR addresses three root causes: aggressive fixed-interval polling, per-task status reporting every 1 second regardless of activity, and unoptimized HTTP client configuration.

## Problem

The original architecture has these issues:

**1. Fixed 1-second reporting interval (RunDaemon)**

- Every running task calls ReportLog + ReportState every 1 second (2 HTTP requests/sec/task)
- These requests are sent even when there are no new log rows or state changes
- With 200 runners × 3 tasks each = **1,200 req/sec just for status reporting**

**2. Fixed 2-second polling interval (no backoff)**

- Idle runners poll FetchTask every 2 seconds forever, even when no jobs are queued
- No exponential backoff or jitter — all runners can synchronize after network recovery (thundering herd)
- 200 idle runners = **100 req/sec doing nothing useful**

**3. HTTP client not tuned**

- Uses http.DefaultClient with MaxIdleConnsPerHost=2, causing frequent TCP/TLS reconnects
- Creates two separate http.Client instances (one for Ping, one for Runner service) instead of sharing

**Total: ~1,300 req/sec for 200 runners with 3 tasks each**

## Solution

### Adaptive Event-Driven Log Reporting

Replace the recursive `time.AfterFunc(1s)` pattern in RunDaemon with a goroutine-based select event loop using three trigger mechanisms:

| Trigger | Default | Purpose |
|---------|---------|---------|
| `log_report_max_latency` | 3s | Guarantee even a single log line is delivered within this time |
| `log_report_interval` | 5s | Periodic sweep — steady-state cadence |
| `log_report_batch_size` | 100 rows | Immediate flush during bursty output (e.g., npm install) |

**Key design**: `log_report_max_latency` (3s) must be less than `log_report_interval` (5s) so the max-latency timer fires before the periodic ticker for single-line scenarios.

State reporting is decoupled to its own `state_report_interval` (default 5s), with immediate flush on step transitions (start/stop) via a stateNotify channel for responsive frontend UX.

Additionally:
- Skip ReportLog when `len(rows) == 0` (no pending log rows)
- Skip ReportState when `stateChanged == false && len(outputs) == 0` (nothing changed)
- Move expensive `proto.Clone` after the early-return check to avoid deep copies on no-op paths

### Polling Backoff with Jitter

Replace fixed `rate.Limiter` with adaptive exponential backoff:
- Track `consecutiveEmpty` and `consecutiveErrors` counters
- Interval doubles with each empty/error response: `base × 2^(n-1)`, capped at `fetch_interval_max` (default 60s)
- Add ±20% random jitter to prevent thundering herd
- Fetch first, sleep after ��� preserves burst=1 behavior for immediate first fetch on startup and after task completion

### HTTP Client Tuning

- Configure custom `http.Transport` with `MaxIdleConnsPerHost=10` (was 2)
- Share a single `http.Client` between PingService and RunnerService
- Add `IdleConnTimeout=90s` for clean connection lifecycle

## Load Reduction

For 200 runners × 3 tasks (70% with active log output):

| Component | Before | After | Reduction |
|-----------|--------|-------|-----------|
| Polling (idle) | 100 req/s | ~3.4 req/s | 97% |
| Log reporting | 420 req/s | ~84 req/s | 80% |
| State reporting | 126 req/s | ~25 req/s | 80% |
| **Total** | **~1,300 req/s** | **~113 req/s** | **~91%** |

## Frontend UX Impact

| Scenario | Before | After | Notes |
|----------|--------|-------|-------|
| Continuous output (npm install) | ~1s | ~5s | Periodic ticker sweep |
| Single line then silence | ~1s | ≤3s | maxLatencyTimer guarantee |
| Bursty output (100+ lines) | ~1s | <1s | Batch size immediate flush |
| Step start/stop | ~1s | <1s | stateNotify immediate flush |
| Job completion | ~1s | ~1s | Close() retry unchanged |

## New Configuration Options

All have safe defaults — existing config files need no changes:

```yaml
runner:
  fetch_interval_max: 60s        # Max backoff interval when idle
  log_report_interval: 5s        # Periodic log flush interval
  log_report_max_latency: 3s     # Max time a log row waits (must be < log_report_interval)
  log_report_batch_size: 100     # Immediate flush threshold
  state_report_interval: 5s      # State flush interval (step transitions are always immediate)
```

Config validation warns on invalid combinations:
- `fetch_interval_max < fetch_interval` → auto-corrected
- `log_report_max_latency >= log_report_interval` → warning (timer would be redundant)

## Test Plan

- [x] `go build ./...` passes
- [x] `go test ./...` passes (all existing + 3 new tests)
- [x] `golangci-lint run` — 0 issues
- [x] TestReporter_MaxLatencyTimer — verifies single log line flushed by maxLatencyTimer before logTicker
- [x] TestReporter_BatchSizeFlush — verifies batch size threshold triggers immediate flush
- [x] TestReporter_StateNotifyFlush — verifies step transition triggers immediate state flush
- [x] TestReporter_EphemeralRunnerDeletion — verifies Close/RunDaemon race safety
- [x] TestReporter_RunDaemonClose_Race — verifies concurrent Close safety

Reviewed-on: https://gitea.com/gitea/act_runner/pulls/819
Reviewed-by: Nicolas <173651+bircni@noreply.gitea.com>
Co-authored-by: Bo-Yi Wu <appleboy.tw@gmail.com>
Co-committed-by: Bo-Yi Wu <appleboy.tw@gmail.com>
2026-04-14 11:29:25 +00:00
silverwind
658101d9cb chore(lint): add golangci-lint v2 and fix all lint issues (#803)
## Summary
- Replace old `.golangci.yml` (v1 format) with v2 format, aligned with gitea's lint config
- Add `lint-go`, `lint-go-fix`, and `lint` Makefile targets using golangci-lint v2.10.1
- Replace `make vet` with `make lint` in CI workflow (lint includes vet)
- Fix all 35 lint issues: modernize (maps.Copy, range over int, any), perfsprint (errors.New), unparam (remove unused parameters), revive (var naming), staticcheck, forbidigo exclusion for cmd/
- Make `security-check` non-fatal (apply https://github.com/go-gitea/gitea/pull/36681)
- Remove dead gocritic exclusion rules (commentFormatting, exitAfterDefer)
- Remove dead linter exclusions and disabled checks (singleCaseSwitch, ST1003, QF1001, QF1006, QF1008, testifylint go-require/require-error, test file exclusions for dupl/errcheck/staticcheck/unparam)

## Test plan
- [x] `golangci-lint run` passes
- [x] `go build ./...` passes
- [x] `go test ./...` passes

---------

Co-authored-by: ChristopherHX <christopher.homberger@web.de>
Co-authored-by: Christopher Homberger <christopher.homberger@web.de>
Reviewed-on: https://gitea.com/gitea/act_runner/pulls/803
Reviewed-by: ChristopherHX <christopherhx@noreply.gitea.com>
2026-02-22 17:35:08 +00:00
garet90
8bc0275e74 feat: add once flag to daemon command (#19) (#598)
Once flag polls and completes one job then exits.

I use this with Windows Sandbox (and creating users with local brew install on Mac) to create a fresh environment every time.

Co-authored-by: Garet Halliday <garet@pit.dev>
Co-authored-by: Jason Song <wolfogre@noreply.gitea.com>
Reviewed-on: https://gitea.com/gitea/act_runner/pulls/598
Reviewed-by: techknowlogick <techknowlogick@noreply.gitea.com>
Reviewed-by: Jason Song <wolfogre@noreply.gitea.com>
Co-authored-by: garet90 <garet90@noreply.gitea.com>
Co-committed-by: garet90 <garet90@noreply.gitea.com>
2024-11-06 17:16:08 +00:00
rowan-allspice
d1d3cad4b0 feat: allow graceful shutdowns (#546)
Add a `Shutdown(context.Context) error` method to the Poller. Calling this method will first shutdown all active polling, preventing any new jobs from spawning. It will then wait for either all jobs to finish, or for the context to be cancelled. If the context is cancelled, it will then force all jobs to end, and then exit.

Fixes https://gitea.com/gitea/act_runner/issues/107

Co-authored-by: Rowan Bohde <rowan.bohde@gmail.com>
Reviewed-on: https://gitea.com/gitea/act_runner/pulls/546
Reviewed-by: Jason Song <i@wolfogre.com>
Reviewed-by: Lunny Xiao <xiaolunwen@gmail.com>
Co-authored-by: rowan-allspice <rowan-allspice@noreply.gitea.com>
Co-committed-by: rowan-allspice <rowan-allspice@noreply.gitea.com>
2024-05-27 07:38:55 +00:00
Jason Song
23ec12b8cf Bump act to v0.260.0 (#522)
Related to https://gitea.com/gitea/act/issues/99.

Also update other main dependencies.

Reviewed-on: https://gitea.com/gitea/act_runner/pulls/522
Reviewed-by: Zettat123 <zettat123@noreply.gitea.com>
2024-03-27 03:17:04 +00:00
sillyguodong
12999b61dd Reduce unnecessary DB queries for Actions tasks (#219)
implement: https://github.com/go-gitea/gitea/issues/24544

Changes:
- Add a global variable `tasksVersion` to store the lastest version number which returned by Gitea.
- Pass `tasksVersion` to Gitea when invoking `FetchTask`.
- If there is no task in the `FetchTask` response, store the version from the `FetchTask` response into `tasksVersion` variable.
- If there is a task in the `FetchTask` response, set `tasksVersion` to zero to focre query db in next `FetchTask` request.

Related:
- Protocol: https://gitea.com/gitea/actions-proto-def/pulls/10
- Gitea side: https://github.com/go-gitea/gitea/pull/25199

Reviewed-on: https://gitea.com/gitea/act_runner/pulls/219
Co-authored-by: sillyguodong <gedong_1994@163.com>
Co-committed-by: sillyguodong <gedong_1994@163.com>
2023-07-25 03:25:50 +00:00
caicandong
49a2fcc138 fix endless loop (#306)
fix endless loop in  poll
relate #305

Co-authored-by: CaiCandong <1290147055@qq.com>
Reviewed-on: https://gitea.com/gitea/act_runner/pulls/306
Co-authored-by: caicandong <caicandong@noreply.gitea.com>
Co-committed-by: caicandong <caicandong@noreply.gitea.com>
2023-07-24 07:07:53 +00:00
caicandong
a1bb3b56fd Catch the panic and print the error (#305)
refactor # 215  Catch the panic and print the error
close #215

Co-authored-by: CaiCandong <1290147055@qq.com>
Reviewed-on: https://gitea.com/gitea/act_runner/pulls/305
Co-authored-by: caicandong <caicandong@noreply.gitea.com>
Co-committed-by: caicandong <caicandong@noreply.gitea.com>
2023-07-24 04:28:44 +00:00
Jason Song
9c6499ec08 Fix panic when response is nil (#105)
Reviewed-on: https://gitea.com/gitea/act_runner/pulls/105
Reviewed-by: Lunny Xiao <xiaolunwen@gmail.com>
Co-authored-by: Jason Song <i@wolfogre.com>
Co-committed-by: Jason Song <i@wolfogre.com>
2023-04-06 21:51:46 +08:00
Jason Song
d139faa40c Supports configuring fetch_timeout and fetch_interval. (#100)
Fix #99.

Fix #94.

Reviewed-on: https://gitea.com/gitea/act_runner/pulls/100
Reviewed-by: Zettat123 <zettat123@noreply.gitea.io>
2023-04-06 10:57:36 +08:00
Jason Song
220efa69c0 Refactor to new framework (#98)
- Adjust directory structure
```text
├── internal
│   ├── app
│   │   ├── artifactcache
│   │   ├── cmd
│   │   ├── poll
│   │   └── run
│   └── pkg
│       ├── client
│       ├── config
│       ├── envcheck
│       ├── labels
│       ├── report
│       └── ver
└── main.go
```
- New pkg `labels` to parse label
- New pkg `report` to report logs to Gitea
- Remove pkg `engine`, use `envcheck` to check if docker running.
- Rewrite `runtime` to `run`
- Rewrite `poller` to `poll`
- Simplify some code and remove what's useless.

Reviewed-on: https://gitea.com/gitea/act_runner/pulls/98
Reviewed-by: Lunny Xiao <xiaolunwen@gmail.com>
Co-authored-by: Jason Song <i@wolfogre.com>
Co-committed-by: Jason Song <i@wolfogre.com>
2023-04-04 21:32:04 +08:00