fix: discard auth migration telemetry silently in broker mode #4299

Draft
clechevallier wants to merge 1 commit into actions:main from clechevallier:fix/auth-migration-telemetry-broker-mode

@clechevallier

In broker mode (serverUrl == serverUrlV2), _runnerServer.ConnectAsync is intentionally skipped in BrokerMessageListener.CreateSessionAsync because only the broker connection is needed. However, ReportAuthMigrationTelemetryAsync still calls UpdateAgentUpdateStateAsync, which calls CheckConnection(Generic) and throws InvalidOperationException: SetConnection Generic.

The exception is caught, but the telemetry entry is re-enqueued for retry. Because the Generic connection is never established in broker mode, every retry fails the same way, creating an infinite loop that emits an ERR every ~60 seconds for the entire lifetime of every runner pod using broker mode (e.g. ARC with GitHub App auth).

[RUNNER 2026-03-13 15:39:55Z ERR  Runner] Failed to report auth migration telemetry.
[RUNNER 2026-03-13 15:39:55Z ERR  Runner] System.InvalidOperationException: SetConnection Generic
[RUNNER 2026-03-13 15:39:55Z ERR  Runner]    at GitHub.Runner.Common.RunnerServer.CheckConnection(RunnerConnectionType connectionType)
[RUNNER 2026-03-13 15:39:55Z ERR  Runner]    at GitHub.Runner.Common.RunnerServer.UpdateAgentUpdateStateAsync(Int32 agentPoolId, UInt64 agentId, String currentState, String trace, CancellationToken cancellationToken)
[RUNNER 2026-03-13 15:39:55Z ERR  Runner]    at GitHub.Runner.Listener.Runner.ReportAuthMigrationTelemetryAsync(CancellationToken token)

Fix this by catching the specific InvalidOperationException from CheckConnection ahead of the generic catch and discarding the telemetry entry with a Trace.Verbose message instead of re-enqueuing it. This breaks the infinite retry loop and has no effect on non-broker runners, where _runnerServer is always connected.
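
The catch ordering described above can be modelled in a few lines. This is a minimal sketch in Python, not the runner's C# code: `InvalidOperationError`, `FakeRunnerServer`, and `report_telemetry` are hypothetical stand-ins introduced only for illustration, and the queue handling is an assumption about the general shape of the retry logic rather than a copy of it.

```python
from collections import deque


class InvalidOperationError(Exception):
    """Hypothetical stand-in for .NET's InvalidOperationException."""


class FakeRunnerServer:
    """Hypothetical model of a client whose Generic connection may never be set."""

    def __init__(self, connected):
        self.connected = connected
        self.sent = []

    def update_agent_update_state(self, entry):
        # Models the connection guard: fail if ConnectAsync was skipped.
        if not self.connected:
            raise InvalidOperationError("SetConnection Generic")
        self.sent.append(entry)


def report_telemetry(server, queue, log):
    """Drain the queue once and return the entries kept for a later retry."""
    retry = deque()
    while queue:
        entry = queue.popleft()
        try:
            server.update_agent_update_state(entry)
        except InvalidOperationError:
            # The connection was never established, so a retry cannot
            # succeed: drop the entry and log at verbose level only.
            log.append(("VERBOSE", "discarded: " + entry))
        except Exception as exc:
            # Any other failure may be transient: keep the entry for retry.
            log.append(("ERR", str(exc)))
            retry.append(entry)
    return retry


if __name__ == "__main__":
    log = []
    broker = FakeRunnerServer(connected=False)
    leftover = report_telemetry(broker, deque(["auth-migration"]), log)
    print(len(leftover), log)
```

Because the specific handler comes first, a never-connected server empties the queue in a single pass, while other failures still take the retry path, as they did before the change.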

Made-with: Cursor