## Summary
Many teams self-host Gitea + Act Runner at scale. The current runner design causes excessive HTTP requests to the Gitea server, leading to high server load. This PR addresses three root causes: aggressive fixed-interval polling, per-task status reporting every 1 second regardless of activity, and unoptimized HTTP client configuration.
## Problem
The original architecture has these issues:
**1. Fixed 1-second reporting interval (RunDaemon)**
- Every running task calls ReportLog + ReportState every 1 second (2 HTTP requests/sec/task)
- These requests are sent even when there are no new log rows or state changes
- With 200 runners × 3 tasks each = **1,200 req/sec just for status reporting**
**2. Fixed 2-second polling interval (no backoff)**
- Idle runners poll FetchTask every 2 seconds forever, even when no jobs are queued
- No exponential backoff or jitter — all runners can synchronize after network recovery (thundering herd)
- 200 idle runners = **100 req/sec doing nothing useful**
**3. HTTP client not tuned**
- Uses http.DefaultClient with MaxIdleConnsPerHost=2, causing frequent TCP/TLS reconnects
- Creates two separate http.Client instances (one for Ping, one for Runner service) instead of sharing
**Total: ~1,300 req/sec for 200 runners with 3 tasks each**
## Solution
### Adaptive Event-Driven Log Reporting
Replace the recursive `time.AfterFunc(1s)` pattern in RunDaemon with a goroutine-based select event loop using three trigger mechanisms:
| Trigger | Default | Purpose |
|---------|---------|---------|
| `log_report_max_latency` | 3s | Guarantee even a single log line is delivered within this time |
| `log_report_interval` | 5s | Periodic sweep — steady-state cadence |
| `log_report_batch_size` | 100 rows | Immediate flush during bursty output (e.g., npm install) |
**Key design**: `log_report_max_latency` (3s) must be less than `log_report_interval` (5s) so the max-latency timer fires before the periodic ticker for single-line scenarios.
State reporting is decoupled to its own `state_report_interval` (default 5s), with immediate flush on step transitions (start/stop) via a stateNotify channel for responsive frontend UX.
Additionally:
- Skip ReportLog when `len(rows) == 0` (no pending log rows)
- Skip ReportState when `stateChanged == false && len(outputs) == 0` (nothing changed)
- Move expensive `proto.Clone` after the early-return check to avoid deep copies on no-op paths
### Polling Backoff with Jitter
Replace fixed `rate.Limiter` with adaptive exponential backoff:
- Track `consecutiveEmpty` and `consecutiveErrors` counters
- Interval doubles with each empty/error response: `base × 2^(n-1)`, capped at `fetch_interval_max` (default 60s)
- Add ±20% random jitter to prevent thundering herd
- Fetch first, sleep after ��� preserves burst=1 behavior for immediate first fetch on startup and after task completion
### HTTP Client Tuning
- Configure custom `http.Transport` with `MaxIdleConnsPerHost=10` (was 2)
- Share a single `http.Client` between PingService and RunnerService
- Add `IdleConnTimeout=90s` for clean connection lifecycle
## Load Reduction
For 200 runners × 3 tasks (70% with active log output):
| Component | Before | After | Reduction |
|-----------|--------|-------|-----------|
| Polling (idle) | 100 req/s | ~3.4 req/s | 97% |
| Log reporting | 420 req/s | ~84 req/s | 80% |
| State reporting | 126 req/s | ~25 req/s | 80% |
| **Total** | **~1,300 req/s** | **~113 req/s** | **~91%** |
## Frontend UX Impact
| Scenario | Before | After | Notes |
|----------|--------|-------|-------|
| Continuous output (npm install) | ~1s | ~5s | Periodic ticker sweep |
| Single line then silence | ~1s | ≤3s | maxLatencyTimer guarantee |
| Bursty output (100+ lines) | ~1s | <1s | Batch size immediate flush |
| Step start/stop | ~1s | <1s | stateNotify immediate flush |
| Job completion | ~1s | ~1s | Close() retry unchanged |
## New Configuration Options
All have safe defaults — existing config files need no changes:
```yaml
runner:
fetch_interval_max: 60s # Max backoff interval when idle
log_report_interval: 5s # Periodic log flush interval
log_report_max_latency: 3s # Max time a log row waits (must be < log_report_interval)
log_report_batch_size: 100 # Immediate flush threshold
state_report_interval: 5s # State flush interval (step transitions are always immediate)
```
Config validation warns on invalid combinations:
- `fetch_interval_max < fetch_interval` → auto-corrected
- `log_report_max_latency >= log_report_interval` → warning (timer would be redundant)
## Test Plan
- [x] `go build ./...` passes
- [x] `go test ./...` passes (all existing + 3 new tests)
- [x] `golangci-lint run` — 0 issues
- [x] TestReporter_MaxLatencyTimer — verifies single log line flushed by maxLatencyTimer before logTicker
- [x] TestReporter_BatchSizeFlush — verifies batch size threshold triggers immediate flush
- [x] TestReporter_StateNotifyFlush — verifies step transition triggers immediate state flush
- [x] TestReporter_EphemeralRunnerDeletion — verifies Close/RunDaemon race safety
- [x] TestReporter_RunDaemonClose_Race — verifies concurrent Close safety
Reviewed-on: https://gitea.com/gitea/act_runner/pulls/819
Reviewed-by: Nicolas <173651+bircni@noreply.gitea.com>
Co-authored-by: Bo-Yi Wu <appleboy.tw@gmail.com>
Co-committed-by: Bo-Yi Wu <appleboy.tw@gmail.com>
## Summary
Adds a `container.bind_workdir` config option that exposes the nektos/act `BindWorkdir` setting. When enabled, workspaces are bind-mounted from the host filesystem instead of Docker volumes, which is required for DinD setups where jobs use `docker compose` with bind mounts (e.g. `.:/app`).
Each job gets an isolated workspace at `/workspace/<task_id>/<owner>/<repo>` to prevent concurrent jobs from the same repo interfering with each other. The task directory is cleaned up after job execution.
### Configuration
```yaml
container:
bind_workdir: true
```
When using this with DinD, also mount the workspace parent into the runner container and add it to `valid_volumes`:
```yaml
container:
valid_volumes:
- /workspace/**
```
*This PR was authored by Claude (AI assistant)*
Reviewed-on: https://gitea.com/gitea/act_runner/pulls/810
Reviewed-by: ChristopherHX <38043+christopherhx@noreply.gitea.com>
Co-authored-by: silverwind <me@silverwind.io>
Co-committed-by: silverwind <me@silverwind.io>
We wanted the ability to disable outputting the logs from the individual job to the console. This changes the logging so that job logs are only output to the console whenever debug logging is enabled in `act_runner`, while still allowing the `Reporter` to receive these logs and forward them to Gitea when debug logging is not enabled.
Co-authored-by: Rowan Bohde <rowan.bohde@gmail.com>
Reviewed-on: https://gitea.com/gitea/act_runner/pulls/543
Reviewed-by: Jason Song <i@wolfogre.com>
Reviewed-by: Lunny Xiao <xiaolunwen@gmail.com>
Co-authored-by: rowan-allspice <rowan-allspice@noreply.gitea.com>
Co-committed-by: rowan-allspice <rowan-allspice@noreply.gitea.com>
- Removed `deadcode`, `structcheck`, and `varcheck` linters from `.golangci.yml`
- Fixed a typo in a comment in `daemon.go`
- Renamed `defaultActionsUrl` to `defaultActionsURL` in `exec.go`
- Removed unnecessary else clause in `exec.go` and `runner.go`
- Simplified variable initialization in `exec.go`
- Changed function name from `getHttpClient` to `getHTTPClient` in `http.go`
- Removed unnecessary else clause in `labels_test.go`
Signed-off-by: Bo-Yi Wu <appleboy.tw@gmail.com>
Reviewed-on: https://gitea.com/gitea/act_runner/pulls/289
Co-authored-by: Bo-Yi Wu <appleboy.tw@gmail.com>
Co-committed-by: Bo-Yi Wu <appleboy.tw@gmail.com>
This PR
- adds the `cache-server` command so act_runner can run as a cache server. When running as a cache server, act_runner only processes the requests related to cache and does not run jobs.
- adds the `external_server` configuration for cache. If specified, act_runner will use this URL as the ACTIONS_CACHE_URL instead of starting a cache server itself.
Reviewed-on: https://gitea.com/gitea/act_runner/pulls/275
Co-authored-by: Zettat123 <zettat123@gmail.com>
Co-committed-by: Zettat123 <zettat123@gmail.com>
Follow #242, #244Fixes#258
Users could use `docker_host` configuration to specify which docker daemon will be used by act_runner.
- If `docker_host` is **empty**, act_runner will find an available docker host automatically.
- If `docker_host` is **"-"**, act_runner will find an available docker host automatically, but the docker host won't be mounted to the job containers and service containers.
- If `docker_host` is **not empty or "-"**, the specified docker host will be used. An error will be returned if it doesn't work.
Reviewed-on: https://gitea.com/gitea/act_runner/pulls/260
Co-authored-by: Zettat123 <zettat123@gmail.com>
Co-committed-by: Zettat123 <zettat123@gmail.com>
Close https://gitea.com/gitea/act_runner/issues/177
Related https://gitea.com/gitea/act/pulls/56
### ⚠️ Breaking
The `container.network_mode` is a deprecated configuration item. It may be removed after Gitea 1.20 released.
Previously, if the value of `container.network_mode` is `bridge`, it means that `act_runner` will create a new network for job.But `bridge` is easily confused with the bridge network created by Docker by default.
We recommand that using `container.network` to specify the network to which containers created by `act_runner` connect.
### 🆕 container.network
The configuration file of `act_runner` add a new item of `contianer.network`.
In `config.example.yaml`:
```yaml
container:
# Specifies the network to which the container will connect.
# Could be host, bridge or the name of a custom network.
# If it's empty, act_runner will create a network automatically.
network: ""
```
As the comment in the example above says, the purpose of the `container.network` is specifying the network to which containers created by `act_runner` will connect.
`container.network` accepts the following valid values:
- `host`: All of containers (including job containers and service contianers) created by `act_runner` will be connected to the network named `host` which is created automatically by Docker. Containers will share the host’s network stack and all interfaces from the host will be available to these containers.
- `bridge`: It is similar to `host`. All of containers created by `act_runner` will be connected to the network named `bridge` which is created automatically by Docker. All containers connected to the `bridge` (Perhaps there are containers that are not created by `act_runner`) are allowed to communicate with each other, while providing isolation from containers which are not connected to that `bridge` network.
- `<custom_network>`: Please make sure that the `<custom_network>` network already exists firstly (`act_runner` does not detect whether the specified network exists currently. If not exists yet, will return error in the stage of `docker create`). All of containers created by `act_runner` will be connected to `<custom_network>`. After the job is executed, containers are removed and automatically disconnected from the `<custom_network>`.
- empty: `act_runner` will create a new network for each job container and their service containers (if defined in workflow). So each job container and their service containers share a network environment, but are isolated from others container and the Docker host. Of course, these networks created by `act_runner` will be removed at last.
### Others
- If you do not have special needs, we highly recommend that setting `container.network` to empty string (and do not use `container.network_mode` any more). Because the containers created by `act_runner` will connect to the networks that are created by itself. This point will provide better isolation.
- If you set `contianer.network` to empty string or `<custom_network>`, we can be access to service containers by `<service-id>:<port>` in the steps of job. Because we added an alias to the service container when connecting to the network.
Co-authored-by: Jason Song <i@wolfogre.com>
Reviewed-on: https://gitea.com/gitea/act_runner/pulls/184
Reviewed-by: Jason Song <i@wolfogre.com>
Co-authored-by: sillyguodong <gedong_1994@163.com>
Co-committed-by: sillyguodong <gedong_1994@163.com>
Fixes#145
At present, the working directory of a work flow is a path like `/<owner>/<repo>`, so the directory may conflict with system directory like `/usr/bin`. We need to add a parent directory for the working directory.
In this PR, the parent directory is `/workspace` by default and users could configure it by the `workdir_parent` option.
This change doesn't affect the host mode because in host mode the working directory will always be in `$HOME/.cache/act/` directory.
Co-authored-by: Jason Song <i@wolfogre.com>
Reviewed-on: https://gitea.com/gitea/act_runner/pulls/154
Reviewed-by: Jason Song <i@wolfogre.com>
Co-authored-by: Zettat123 <zettat123@gmail.com>
Co-committed-by: Zettat123 <zettat123@gmail.com>
This adds a very simple Dockerfile and run script for running `act_runner` as a container.
It also allows setting `Privileged` and `ContainerOptions` flags via the new config file when spawning task containers. The combination makes it possible to use Docker-in-Docker (which requires `privileged` mode) as well as pass any other options child Docker containers may require.
For example, if Gitea is running in Docker on the same machine, for the `checkout` action to behave as expected from a task container launched by `act_runner`, it might be necessary to map the hostname via something like:
```
container:
network_mode: bridge
privileged: true
options: --add-host=my.gitea.hostname:host-gateway
```
> NOTE: Description updated to reflect latest code.
> NOTE: Description updated to reflect latest code (again).
Reviewed-on: https://gitea.com/gitea/act_runner/pulls/84
Reviewed-by: Lunny Xiao <xiaolunwen@gmail.com>
Reviewed-by: Jason Song <i@wolfogre.com>
Co-authored-by: Thomas E Lackey <telackey@bozemanpass.com>
Co-committed-by: Thomas E Lackey <telackey@bozemanpass.com>