Published 2026-04-30

This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
Multi-objective optimization has become a central design principle for cloud-native intelligent systems because the dominant workloads of the current cycle of distributed computing (large language model inference, LLM-assisted scheduling, observability-driven AIOps, privacy-preserving federated learning, and edge/IoT reasoning) no longer admit a single "best" objective. Instead, practitioners must optimize under persistent trade-offs among latency, throughput, service-level objective (SLO) compliance, energy, carbon emissions, cloud expenditure, fairness, privacy, trust, robustness, and remediation time. Recent work makes this shift explicit. In the LLM-serving literature, DeepServe models joint SLO-cost scheduling as a contextual bandit; BOute co-optimizes heterogeneous model routing and GPU deployment with multi-objective Bayesian optimization; ECCOS combines predictive quality-cost estimation with constrained optimization; and the Liao preprint, framed around prompt-level cost prediction and SLO awareness, presents CAPS, a bi-objective carbon-aware scheduler for online LLM inference. In adjacent domains, cloud autoscaling has been reformulated as risk-constrained reinforcement learning, microservice rate limiting as deep RL under throughput-latency tension, and federated cloud analytics as a joint optimization of accuracy, communication cost, trust, and privacy. To ground these themes, two simulation experiments are reported. The first studies policy search for multi-pool LLM serving under cost, carbon, and SLO-met goodput objectives. The second examines autoscaling under workload drift, comparing a threshold policy, a latency-only reactive controller, and a risk-aware controller. The experiments are not benchmark replications; they are explanatory simulations designed to make the trade-offs in the reviewed literature concrete.
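The kind of trade-off the first experiment explores can be sketched with a toy Pareto filter over candidate serving policies. Everything below (the random candidate generator, the objective tuple, the carbon budget, the weights) is a hypothetical illustration, not code or data from any of the cited systems:

```python
import random

random.seed(0)

# Hypothetical policy candidates for a multi-pool serving simulator:
# each tuple is (cost, carbon, slo_goodput). Values are illustrative.
candidates = [(random.uniform(1, 10),
               random.uniform(1, 10),
               random.uniform(0, 1)) for _ in range(200)]

CARBON_BUDGET = 5.0  # assumed hard constraint on the carbon objective
feasible = [c for c in candidates if c[1] <= CARBON_BUDGET]

def dominates(a, b):
    """a dominates b if it is no worse on every objective and strictly
    better on at least one (minimize cost and carbon, maximize goodput)."""
    no_worse = a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2]
    strictly = a[0] < b[0] or a[1] < b[1] or a[2] > b[2]
    return no_worse and strictly

# Pareto front: feasible candidates not dominated by any other feasible one.
pareto = [c for c in feasible
          if not any(dominates(o, c) for o in feasible)]

# A fixed-weight scalarization collapses the three objectives to one score
# and commits to a single point on the front.
w_cost, w_carbon, w_goodput = 0.4, 0.3, 0.3
scalar_best = max(feasible,
                  key=lambda c: w_goodput * c[2] - w_cost * c[0] - w_carbon * c[1])

best_goodput = max(c[2] for c in pareto)
print(f"Pareto front size: {len(pareto)}")
print(f"best goodput on front: {best_goodput:.3f}, "
      f"scalarized pick goodput: {scalar_best[2]:.3f}")
```

With strictly positive weights the scalarized pick always lands somewhere on the Pareto front, but it commits to one point; the front preserves the full menu of budget-feasible trade-offs, which is why Pareto-oriented search can surface higher-goodput options that a fixed weighting would silently discard.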
They show that Pareto-oriented search can recover better budgeted goodput than simple scalarization under a carbon budget, and that risk-aware capacity control can achieve substantially lower tail latency than a latency-only controller at a modest capacity premium. These findings align with the broader literature's movement away from single-objective heuristics and toward closed-loop, multi-objective, and observability-aware decision systems.
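A similarly minimal sketch contrasts the two control philosophies from the second experiment: a reactive controller that scales only after latency has already degraded, versus a risk-aware controller that provisions for a high quantile of recently observed demand. The drifting workload model, the M/M/1-style latency proxy, and every constant here are illustrative assumptions, not the article's simulator:

```python
import random

random.seed(1)

STEPS = 2000
# Drifting workload: a slow ramp plus bursty exponential noise (illustrative).
demand = [50 + 0.02 * t + random.expovariate(1 / 10) for t in range(STEPS)]

def latency(d, cap):
    # Crude M/M/1-style proxy: latency blows up as utilization nears 1.
    util = min(d / cap, 0.999)
    return util / (1 - util)

def run(policy):
    cap, lats, caps, window = 80.0, [], [], []
    for d in demand:
        window = (window + [d])[-100:]       # sliding demand history
        lats.append(latency(d, cap))         # capacity acts with one-step lag
        caps.append(cap)
        cap = policy(cap, d, window)
    lats.sort()
    return lats[int(0.99 * STEPS)], sum(caps) / STEPS  # p99 latency, mean cap

def reactive(cap, d, window):
    # Latency-only: adjust capacity after the latency signal degrades.
    l = latency(d, cap)
    if l > 2.0:
        return cap * 1.1
    if l < 0.5:
        return cap * 0.95
    return cap

def risk_aware(cap, d, window):
    # Provision for the 95th percentile of recent demand at 70% utilization.
    q = sorted(window)[int(0.95 * len(window))]
    return max(q / 0.7, 1.0)

p99_r, cap_r = run(reactive)
p99_q, cap_q = run(risk_aware)
print(f"reactive   p99={p99_r:.2f}  mean_cap={cap_r:.1f}")
print(f"risk-aware p99={p99_q:.2f}  mean_cap={cap_q:.1f}")
```

Which controller wins, and at what capacity premium, depends entirely on the drift rate, burst distribution, and quantile chosen; the sketch's only purpose is to make the tail-latency-versus-headroom tension concrete.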