Stabilizing Anomaly Detection in Imbalanced Log Data via Cost-Sensitive Learning and Confidence Regularization
Published 2024-04-28
This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
This paper addresses anomaly detection in time-series and log data, proposing a robust method that improves the learnability of minority anomalies and the stability of alarm decisions under the severe class imbalance common in real-world deployments. The method takes event sequences as input: discrete log events are first mapped into a continuous embedding space, and sample-level representations are obtained through lightweight temporal aggregation. On top of these representations, a probability predictor outputs an anomaly confidence score. To mitigate the gradient bias caused by majority-class dominance, a cost-sensitive weighted objective is introduced during training to strengthen the learning signal for the anomaly class. Soft-target alignment and a confidence regularization constraint are combined to suppress overconfidence, yielding smoother probability outputs and easier thresholding decisions. At inference time, a thresholding rule completes the anomaly determination, so the method can be embedded directly into log-stream processing and alarm systems. Comparative experiments show that the proposed method achieves superior and more balanced performance across multiple evaluation metrics, and in particular exhibits stronger stability and controllability in trading off false-alarm control against anomaly coverage. These results verify the effectiveness and practical value of cost-sensitive learning and confidence constraints for detecting anomalies under extreme imbalance.
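The training objective sketched in the abstract combines three ingredients: cost-sensitive weighting of the minority anomaly class, soft (smoothed) targets, and a confidence regularizer that discourages probabilities saturating at 0 or 1. A minimal NumPy sketch of such a binary objective is shown below; the function name, hyperparameter names, and default values (`pos_weight`, `smooth`, `conf_penalty`) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def anomaly_loss(logits, labels, pos_weight=10.0, smooth=0.05, conf_penalty=0.1):
    """Hypothetical sketch of a cost-sensitive, confidence-regularized
    binary objective in the spirit of the abstract.

    - pos_weight upweights the minority anomaly class (cost-sensitive term),
    - smooth replaces hard 0/1 labels with soft targets (soft-target alignment),
    - conf_penalty adds an entropy bonus that penalizes overconfident outputs.
    """
    p = 1.0 / (1.0 + np.exp(-logits))                       # anomaly confidence
    t = labels * (1.0 - smooth) + (1.0 - labels) * smooth   # soft targets
    w = np.where(labels == 1, pos_weight, 1.0)              # per-sample cost weights
    eps = 1e-12
    # Weighted binary cross-entropy against the soft targets.
    bce = -(t * np.log(p + eps) + (1.0 - t) * np.log(1.0 - p + eps))
    # Entropy of the predicted distribution; subtracting it rewards
    # smoother, less saturated probabilities.
    entropy = -(p * np.log(p + eps) + (1.0 - p) * np.log(1.0 - p + eps))
    return np.mean(w * bce - conf_penalty * entropy)
```

At inference, the smoothed probabilities would then be compared against a fixed threshold (e.g. flag a sample when its confidence exceeds 0.5) to produce the final alarm decision, as described above.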