Alt storage mode: scannerReportTTL not honored, no TTL-based rescans occur #2869

@rzala

Summary

In alternate report storage mode (OPERATOR_ALT_REPORT_STORAGE_ENABLED=true), the OPERATOR_SCANNER_REPORT_TTL setting is effectively ignored. Reports written to the filesystem are never re-evaluated for TTL expiration. The only way to trigger new scans is to restart the operator, which rescans ALL workloads at once, regardless of whether their existing reports are still within TTL.

Root Cause

The WorkloadController relies on controller-runtime's Owns() watch mechanism to trigger re-reconciliation after scan jobs complete. In CRD mode, creating a VulnerabilityReport triggers the .Owns(&v1alpha1.VulnerabilityReport{}) watch, which re-reconciles the parent workload. This allows the TTL check code to run on subsequent reconciliations.

In alt storage mode, reports are written directly to the filesystem via streamReportToFile() in pkg/vulnerabilityreport/controller/scanjob.go. No CRD is created, so the Owns() watch never fires. The workload is never re-reconciled after the initial scan, and the TTL expiration check in reconcileWorkload() is never reached.

Relevant code paths:

  • pkg/vulnerabilityreport/controller/workload.go: SetupWithManager() registers .Owns(&v1alpha1.VulnerabilityReport{}) watches
  • pkg/vulnerabilityreport/controller/workload.go: reconcileWorkload() contains TTL check logic that is never reached in alt storage mode
  • pkg/vulnerabilityreport/controller/scanjob.go: processCompleteScanJob() writes reports to the filesystem, bypassing CRD creation entirely

Impact

  1. No TTL-based rescans: Once a workload is scanned, it is never rescanned based on scannerReportTTL. The TTL config setting is silently ignored in alt storage mode.

  2. Operator restart causes mass rescan: On startup, controller-runtime's informers perform an initial LIST and generate synthetic Create events for all existing workloads. Since the reconciler has no memory of previous scans (the filesystem-based ReadWriter has no FindByOwner implementation), ALL workloads are rescanned simultaneously — regardless of whether their existing reports are still within TTL.

  3. No RequeueAfter scheduling: After a scan job completes in alt storage mode, the reconciler returns ctrl.Result{} with no RequeueAfter. There is no mechanism to schedule the next TTL check.
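
To illustrate the third point, here is a minimal sketch of how the delay until the next TTL check could be computed. The function name requeueAfter is hypothetical and not taken from the operator's code. The sketch assumes the report's age is known, for example from the report file's modification time. The reconciler could then return this value as ctrl.Result{RequeueAfter: ...}.

```go
package main

import (
	"fmt"
	"time"
)

// requeueAfter returns how long to wait before the next TTL check for a
// report of the given age. A zero result means the report has already
// expired and the workload should be rescanned now.
// Hypothetical helper: the name and signature are assumptions.
func requeueAfter(reportAge, ttl time.Duration) time.Duration {
	if reportAge >= ttl {
		return 0
	}
	return ttl - reportAge
}

func main() {
	ttl := 24 * time.Hour
	fmt.Println(requeueAfter(0, ttl))            // fresh report: 24h0m0s
	fmt.Println(requeueAfter(20*time.Hour, ttl)) // 4h0m0s left
	fmt.Println(requeueAfter(30*time.Hour, ttl)) // expired: 0s
}
```

Computing the delay from the report's age, rather than always using the full TTL, would also let a restarted operator skip workloads whose reports are still valid.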

Additional Findings

Filename ambiguity with hyphen separators

Report filenames use hyphens as field separators ({Kind}-{Name}-{Container}.json), but Kubernetes resource names also contain hyphens (RFC 1123). This makes filenames ambiguous and glob patterns unreliable when trying to implement filesystem-based FindByOwner. For example, a ReplicaSet named app-web-abc123 produces a filename where kind, name, and container boundaries cannot be reliably determined.
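
The ambiguity can be reproduced in a few lines. This is only a sketch: hyphenFilename mirrors the {Kind}-{Name}-{Container}.json layout described above, and underscoreFilename is a hypothetical alternative layout. Neither is the operator's actual function.

```go
package main

import "fmt"

// hyphenFilename mirrors the {Kind}-{Name}-{Container}.json layout.
// Illustrative helper only, not the operator's real code.
func hyphenFilename(kind, name, container string) string {
	return fmt.Sprintf("%s-%s-%s.json", kind, name, container)
}

// underscoreFilename is a hypothetical layout that uses "_" as the separator
// and adds a namespace prefix. "_" cannot appear in RFC 1123 names, so the
// field boundaries stay recoverable.
func underscoreFilename(namespace, kind, name, container string) string {
	return fmt.Sprintf("%s_%s_%s_%s.json", namespace, kind, name, container)
}

func main() {
	// Two different (name, container) pairs collapse to the same filename.
	a := hyphenFilename("ReplicaSet", "app-web", "abc123")
	b := hyphenFilename("ReplicaSet", "app", "web-abc123")
	fmt.Println(a, b, a == b)

	// With underscores the two owners stay distinct.
	c := underscoreFilename("default", "ReplicaSet", "app-web", "abc123")
	d := underscoreFilename("default", "ReplicaSet", "app", "web-abc123")
	fmt.Println(c, d, c == d)
}
```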

Filesystem FindByOwner not implemented

The ReadWriter interface's FindByOwner method has no filesystem-backed implementation. The CRD-based FindByOwner queries the Kubernetes API, which returns nothing in alt storage mode since no CRDs are created. This means hasVulnerabilityReports() always returns false in alt storage mode, contributing to unnecessary rescans on every operator restart.
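
A filesystem-backed lookup could be sketched as follows. This is not the ReadWriter interface's real signature: findReportFiles, demo, and the {Namespace}_{Kind}_{Name}_{Container}.json layout are all assumptions made for illustration. A real implementation would also need to decode the matched files into reports.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findReportFiles returns the report files belonging to one owner, assuming
// the hypothetical layout {Namespace}_{Kind}_{Name}_{Container}.json.
// The three fixed fields contain no glob metacharacters when they are valid
// Kubernetes names, so only the container field is a wildcard.
func findReportFiles(dir, namespace, kind, name string) ([]string, error) {
	pattern := filepath.Join(dir, fmt.Sprintf("%s_%s_%s_*.json", namespace, kind, name))
	return filepath.Glob(pattern)
}

// demo writes four empty report files into a temporary directory and returns
// how many of them belong to ReplicaSet "app" in namespace "default".
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "reports")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	for _, f := range []string{
		"default_ReplicaSet_app_web.json",
		"default_ReplicaSet_app_sidecar.json",
		"default_ReplicaSet_app-web_web.json", // different owner name
		"other_ReplicaSet_app_web.json",       // different namespace
	} {
		if err := os.WriteFile(filepath.Join(dir, f), []byte("{}"), 0o600); err != nil {
			return 0, err
		}
	}

	matches, err := findReportFiles(dir, "default", "ReplicaSet", "app")
	return len(matches), err
}

func main() {
	n, err := demo()
	fmt.Println(n, err) // two of the four files match the owner
}
```

Because the separator cannot occur inside a name, the pattern for owner "app" does not pick up files for "app-web", which is the failure mode of the hyphen-separated layout.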

Silent trivy server requeue

When BuiltInTrivyServer is true and the trivy server is unreachable, the reconciler returns ctrl.Result{RequeueAfter: r.Config.ScanJobRetryAfter} with NO log output at any level. This makes debugging scan failures in alt storage mode particularly difficult since scans silently fail to start.
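
A sketch of what a logged requeue might look like is shown below. It uses the standard library log package only to stay self-contained. The operator's own logger would be used in practice, and requeueIfServerDown is a hypothetical stand-in for the health-check branch of the reconciler.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"time"
)

// requeueIfServerDown returns the delay before the next attempt (0 means
// proceed with the scan) and logs the reason whenever it requeues.
// Hypothetical helper: name, signature, and message text are assumptions.
func requeueIfServerDown(logger *log.Logger, reachable bool, retryAfter time.Duration) time.Duration {
	if reachable {
		return 0
	}
	logger.Printf("trivy server unreachable, requeueing scan in %s", retryAfter)
	return retryAfter
}

// demo runs the unreachable case against an in-memory logger and returns
// the captured log output.
func demo() string {
	var buf bytes.Buffer
	logger := log.New(&buf, "", 0) // no prefix or timestamp, for a stable message
	requeueIfServerDown(logger, false, 30*time.Second)
	return buf.String()
}

func main() {
	fmt.Print(demo())
}
```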

Proposed Fix

  1. Implement a filesystem-backed FindByOwner that globs report files on disk and returns matching reports
  2. Use an unambiguous filename separator (e.g., underscore _) since it is not valid in K8s resource names (RFC 1123)
  3. After scan job completion in alt storage mode, return ctrl.Result{RequeueAfter: ttl} to schedule TTL re-evaluation
  4. Add namespace prefix to filenames to prevent cross-namespace filename collisions
  5. Add logging when trivy server health check causes a silent requeue

Environment

  • trivy-operator version: latest (main branch)
  • Kubernetes version: any
  • Alt storage enabled: OPERATOR_ALT_REPORT_STORAGE_ENABLED=true
  • Report TTL configured: OPERATOR_SCANNER_REPORT_TTL set to any non-empty value (e.g., 24h)
