Description
Summary
In alternate report storage mode (OPERATOR_ALT_REPORT_STORAGE_ENABLED=true), the OPERATOR_SCANNER_REPORT_TTL setting is effectively ignored. Reports written to the filesystem are never re-evaluated for TTL expiration, and the only way to trigger new scans is to restart the operator, which causes a mass rescan of ALL workloads regardless of existing report TTLs.
Root Cause
The WorkloadController relies on controller-runtime's Owns() watch mechanism to trigger re-reconciliation after scan jobs complete. In CRD mode, creating a VulnerabilityReport triggers the .Owns(&v1alpha1.VulnerabilityReport{}) watch, which re-reconciles the parent workload. This allows the TTL check code to run on subsequent reconciliations.
In alt storage mode, reports are written directly to the filesystem via streamReportToFile() in pkg/vulnerabilityreport/controller/scanjob.go. No CRD is created, so the Owns() watch never fires. The workload is never re-reconciled after the initial scan, and the TTL expiration check in reconcileWorkload() is never reached.
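The difference between the two modes can be illustrated with a toy model. This is not controller-runtime or trivy-operator code: ReportStore, crdStore, fileStore, queue and requeuedAfterScan are hypothetical names. The model only captures the point above: in CRD mode, persisting a report produces a watch event that an Owns()-style handler maps back to the owning workload, while a filesystem write produces no event at all.

```go
package main

import "fmt"

// ReportStore is a hypothetical abstraction over where reports are persisted.
// It is a toy model, not trivy-operator's actual ReadWriter interface.
type ReportStore interface {
	Write(owner string)
}

// queue stands in for the controller's work queue of owner keys.
type queue struct{ items []string }

func (q *queue) Add(key string) { q.items = append(q.items, key) }

// crdStore models CRD mode: creating an owned report object produces a watch
// event, and an Owns()-style handler enqueues the owning workload.
type crdStore struct{ q *queue }

func (s crdStore) Write(owner string) {
	// The API server emits an event for the new object; the handler enqueues the owner.
	s.q.Add(owner)
}

// fileStore models alt storage mode: the report goes to disk and no API object
// is created, so nothing is enqueued.
type fileStore struct{ written []string }

func (s *fileStore) Write(owner string) {
	s.written = append(s.written, owner)
}

// requeuedAfterScan writes a report for owner and returns how many reconcile
// requests that write produced.
func requeuedAfterScan(store ReportStore, q *queue, owner string) int {
	store.Write(owner)
	n := len(q.items)
	q.items = nil
	return n
}

func main() {
	owner := "default/ReplicaSet/app-web-abc123"
	q1 := &queue{}
	fmt.Println("CRD mode re-reconciles:", requeuedAfterScan(crdStore{q: q1}, q1, owner))
	q2 := &queue{}
	fmt.Println("alt storage re-reconciles:", requeuedAfterScan(&fileStore{}, q2, owner))
}
```

In the alt storage branch the count stays at zero, which is why the TTL check in reconcileWorkload() is never reached again after the first scan.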
Relevant code paths:
- pkg/vulnerabilityreport/controller/workload.go: SetupWithManager() registers .Owns(&v1alpha1.VulnerabilityReport{}) watches
- pkg/vulnerabilityreport/controller/workload.go: reconcileWorkload() contains TTL check logic that is never reached in alt storage mode
- pkg/vulnerabilityreport/controller/scanjob.go: processCompleteScanJob() writes reports to the filesystem, bypassing CRD creation entirely
Impact
- No TTL-based rescans: Once a workload is scanned, it is never rescanned based on scannerReportTTL. The TTL config setting is silently ignored in alt storage mode.
- Operator restart causes mass rescan: On startup, controller-runtime's informers perform an initial LIST and generate synthetic Create events for all existing workloads. Since the reconciler has no memory of previous scans (the filesystem-based ReadWriter has no FindByOwner implementation), ALL workloads are rescanned simultaneously, regardless of whether their existing reports are still within TTL.
- No RequeueAfter scheduling: After a scan job completes in alt storage mode, the reconciler returns ctrl.Result{} with no RequeueAfter. There is no mechanism to schedule the next TTL check.
Additional Findings
Filename ambiguity with hyphen separators
Report filenames use hyphens as field separators ({Kind}-{Name}-{Container}.json), but Kubernetes resource names can also contain hyphens (RFC 1123). This makes filenames ambiguous and glob patterns unreliable when trying to implement a filesystem-based FindByOwner. For example, a ReplicaSet named app-web-abc123 produces a filename where kind, name, and container boundaries cannot be reliably determined.
Filesystem FindByOwner not implemented
The ReadWriter interface's FindByOwner method has no filesystem-backed implementation. The CRD-based FindByOwner queries the Kubernetes API, which returns nothing in alt storage mode since no CRDs are created. This means hasVulnerabilityReports() always returns false in alt storage mode, contributing to unnecessary rescans on every operator restart.
Silent trivy server requeue
When BuiltInTrivyServer is true and the trivy server is unreachable, the reconciler returns ctrl.Result{RequeueAfter: r.Config.ScanJobRetryAfter} with NO log output at any level. This makes debugging scan failures in alt storage mode particularly difficult since scans silently fail to start.
Proposed Fix
- Implement a filesystem-backed FindByOwner that globs report files on disk and returns matching reports
- Use an unambiguous filename separator (e.g., underscore _), since it is not valid in K8s resource names (RFC 1123)
- After scan job completion in alt storage mode, return ctrl.Result{RequeueAfter: ttl} to schedule TTL re-evaluation
- Add a namespace prefix to filenames to prevent cross-namespace filename collisions
- Add logging when the trivy server health check causes a silent requeue
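The separator and namespace-prefix changes can be checked with a round trip. This sketch assumes the hypothetical layout {Namespace}_{Kind}_{Name}_{Container}.json; reportKey, fileName and parseFileName are illustrative names, not existing trivy-operator code:

```go
package main

import (
	"fmt"
	"strings"
)

// reportKey identifies one container's report. Hypothetical type for this sketch.
type reportKey struct {
	Namespace, Kind, Name, Container string
}

// fileName builds "{Namespace}_{Kind}_{Name}_{Container}.json".
func fileName(k reportKey) string {
	return strings.Join([]string{k.Namespace, k.Kind, k.Name, k.Container}, "_") + ".json"
}

// parseFileName reverses fileName. Because none of the fields can contain "_",
// splitting on it yields exactly four fields for any well-formed file.
func parseFileName(f string) (reportKey, error) {
	if !strings.HasSuffix(f, ".json") {
		return reportKey{}, fmt.Errorf("not a report file: %q", f)
	}
	parts := strings.Split(strings.TrimSuffix(f, ".json"), "_")
	if len(parts) != 4 {
		return reportKey{}, fmt.Errorf("want 4 fields, got %d in %q", len(parts), f)
	}
	return reportKey{parts[0], parts[1], parts[2], parts[3]}, nil
}

func main() {
	k := reportKey{"default", "ReplicaSet", "app-web-abc123", "nginx-proxy"}
	f := fileName(k)
	fmt.Println(f) // default_ReplicaSet_app-web-abc123_nginx-proxy.json
	back, err := parseFileName(f)
	fmt.Println(back == k, err) // true <nil>
}
```

Unlike the hyphen-separated format, hyphens inside the name and container survive the round trip unchanged.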
Environment
- trivy-operator version: latest (main branch)
- Kubernetes version: any
- Alt storage enabled: OPERATOR_ALT_REPORT_STORAGE_ENABLED=true
- Report TTL configured: OPERATOR_SCANNER_REPORT_TTL set to any non-empty value (e.g., 24h)