Deep dives
Technical reports over committed benchmark records.
Seven reports, one per deployment scenario, sharing a single documented method. Each describes the mechanism under test, the experimental setup, the measured results, and the limitations that apply to them. Single-run arms are labeled, negative results are reported, and every figure cites a committed report path in the repository. The benchmark harness ships with the product, so all runs are reproducible on independent clusters.
Benchmark methodology
The common experimental method behind every report: same-node two-arm protocol (CFS baseline via safe mode vs. attached tiers), the stepped background-load ladder, load-generator validity rules, metric conventions, scheduler attribution, and the committed-record discipline.
Idle-capacity harvesting under tail-latency constraints
Cluster CPU allocation exceeds usage because requests are provisioned for peaks; harvesting the difference under CFS raises tail latency at the wakeup path. The report measures memcached and Redis density ladders in which batch work fills protected nodes while p99 remains flat.
CFS bandwidth control and database tail latency
CFS bandwidth control suspends every thread in a cgroup when its quota is exhausted, producing tail-latency cliffs on otherwise idle nodes; measured with kernel throttle counters on PostgreSQL, MySQL, and Cassandra. Includes the cpu.max enforcement semantics under sched_ext and a direct quota-parity measurement.
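As a rough illustration of what a kernel throttle counter measurement looks like, the sketch below parses two snapshots of a cgroup v2 `cpu.stat` file and reports how many enforcement periods were throttled between them. It assumes the standard cgroup v2 fields `nr_periods`, `nr_throttled`, and `throttled_usec`; the function names and the sample values are hypothetical and are not taken from the reports. Note that under an attached sched_ext scheduler the throttle counter does not advance, so a zero delta in that arm is not by itself evidence about enforcement.

```python
from __future__ import annotations


def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse the flat 'key value' lines of a cgroup v2 cpu.stat file."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttle_delta(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Difference between two snapshots of the throttle counters."""
    periods = after["nr_periods"] - before["nr_periods"]
    throttled = after["nr_throttled"] - before["nr_throttled"]
    return {
        "periods": periods,
        "throttled_periods": throttled,
        # Share of enforcement periods in which the cgroup ran out of quota.
        "throttled_ratio": throttled / periods if periods else 0.0,
        "throttled_usec": after["throttled_usec"] - before["throttled_usec"],
    }


# Illustrative sample values only, not measurements from any report.
BEFORE = """usage_usec 1000000
user_usec 800000
system_usec 200000
nr_periods 100
nr_throttled 10
throttled_usec 50000
"""

AFTER = """usage_usec 9000000
user_usec 7000000
system_usec 2000000
nr_periods 200
nr_throttled 60
throttled_usec 450000
"""

delta = throttle_delta(parse_cpu_stat(BEFORE), parse_cpu_stat(AFTER))
print(delta)
```

On a live node the two snapshots would be read from the workload's `cpu.stat` file at the start and end of a run; the exact cgroup path depends on the container runtime and is left out here.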
Tail-latency amplification in microservice chains
A 19-service DeathStarBench application under co-located background load. Per-hop queueing delay compounds across the request chain under CFS; end-to-end p99 growth is measured for both arms at each density step.
CPU-side scheduling and accelerator utilization
Descheduled data-loading threads stall GPU pipelines, idling the accelerator. Measurements cover PyTorch training on NVIDIA L4 under co-location density, a CPU-training comparison against standard Kubernetes remedies, and a negative result on GPU-bound serving.
Thread-level scheduling with workload profiles
Container-level metrics aggregate over threads with heterogeneous roles. Workload profiles assign per-thread-group scheduling policy; measurements cover ONNX and llama.cpp under SMT contention and a MySQL profile validation.
Failure modes and rollback behavior
Failure behavior of the sched_ext attachment: the kernel fallback contract, measured agent-kill failover, an 8-hour soak, reconfiguration cost, watchdog-initiated fallback to CFS, and annotation-based fleet-wide disable.
Automated node consolidation under an SLO guard
A consolidation controller executes an automated 3→2 node reduction on a live GKE cluster: plan-hash approval, tier-ordered drain honoring PodDisruptionBudgets under a continuous SLO guard, with the node reclaimed by the cluster autoscaler. A guard-triggered abort run is also reported.
Methodology, in one paragraph: each comparison runs the same workload, same nodes, same load generator in two arms: stock Kubernetes on CFS (obtained by putting Temper’s nodes in safe mode, so hardware and noise are held constant), then Temper attached. Density tests step a background-workload ladder and record where the primary’s SLO breaks in each arm. Where the mechanism can be verified with kernel counters instead of inferred from latency, it is.
* Under an attached sched_ext scheduler, the cgroup throttle counter does not advance; enforcement semantics are analyzed in the CFS bandwidth control article.
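The density comparison described above can be sketched as follows: given the p99 recorded at each background-load step for one arm, find the first step at which the primary's SLO breaks. This is a minimal sketch under stated assumptions, not the shipped benchmark harness; the function name, the data layout, the SLO threshold, and all numbers are hypothetical placeholders rather than reported results.

```python
from __future__ import annotations


def slo_break_step(ladder: list[tuple[int, float]], slo_p99_ms: float) -> int | None:
    """Return the first background-load step whose p99 exceeds the SLO.

    `ladder` holds (background_step, p99_ms) pairs for one arm.
    Returns None if the SLO holds at every measured step.
    """
    for step, p99_ms in sorted(ladder):
        if p99_ms > slo_p99_ms:
            return step
    return None


# Placeholder inputs for illustration only, not measurements.
SLO_P99_MS = 2.0
baseline_arm = [(0, 1.0), (2, 1.4), (4, 2.6), (6, 5.0)]  # CFS via safe mode
attached_arm = [(0, 1.0), (2, 1.1), (4, 1.2), (6, 1.3)]  # scheduler attached

for name, arm in (("baseline", baseline_arm), ("attached", attached_arm)):
    print(name, slo_break_step(arm, SLO_P99_MS))
```

Both arms are evaluated with the same function and the same threshold, so any difference in the break step comes from the recorded latencies and not from the analysis.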