Vol. 4 No. 4 (2025)
Articles

Structured Preference Modeling for Reinforcement Learning-Based Fine-Tuning of Large Models

Published 2025-04-30

How to Cite

Zhu, L., Guo, F., Cai, G., & Ma, Y. (2025). Structured Preference Modeling for Reinforcement Learning-Based Fine-Tuning of Large Models. Journal of Computer Technology and Software, 4(4). https://doi.org/10.5281/zenodo.15340770

Abstract

This paper explores how preference modeling can improve policy optimization efficiency and the controllability of model behavior during reinforcement learning fine-tuning of large models. To address the limitations of traditional RLHF (reinforcement learning from human feedback) methods in modeling human feedback and guiding policy learning, we propose a policy optimization framework that integrates a multi-scale preference modeling mechanism. The method first constructs a structured preference scoring function from human feedback data to approximate reward signals, and then combines it with a policy gradient approach to guide the fine-tuning of language models, aligning model behavior with human preferences. The experimental section evaluates different preference modeling strategies on multiple natural language generation tasks, with a comparative analysis across several dimensions: accuracy, preference alignment, convergence speed, and training stability. The results show that the proposed method outperforms existing approaches overall, demonstrating strong preference modeling capability and improved fine-tuning effectiveness.
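As a rough illustration of the two stages summarized in the abstract, the sketch below (a minimal PyTorch example, not the authors' implementation) pairs a Bradley-Terry-style pairwise preference scorer, trained on chosen/rejected response features, with a REINFORCE-style policy gradient update driven by the learned scores. The names (PreferenceRewardModel, preference_loss, policy_gradient_step) and the random feature tensors standing in for language model embeddings are illustrative assumptions; the paper's structured, multi-scale scoring function is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PreferenceRewardModel(nn.Module):
    """Maps (prompt, response) features to a scalar preference score."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.score(features).squeeze(-1)


def preference_loss(rm: PreferenceRewardModel,
                    chosen_feats: torch.Tensor,
                    rejected_feats: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry pairwise loss: the human-preferred response should score higher.
    return -F.logsigmoid(rm(chosen_feats) - rm(rejected_feats)).mean()


def policy_gradient_step(log_probs: torch.Tensor,
                         rewards: torch.Tensor,
                         baseline: torch.Tensor) -> torch.Tensor:
    # REINFORCE-style objective: weight the log-likelihood of sampled responses
    # by the baseline-subtracted reward from the preference model.
    advantages = (rewards - baseline).detach()
    return -(advantages * log_probs).mean()


# Toy usage with random features standing in for frozen LM embeddings.
torch.manual_seed(0)
dim, batch = 16, 8
rm = PreferenceRewardModel(dim)
opt_rm = torch.optim.Adam(rm.parameters(), lr=1e-3)

# Stage 1: fit the preference scoring function on pairwise feedback.
chosen, rejected = torch.randn(batch, dim), torch.randn(batch, dim)
loss_rm = preference_loss(rm, chosen, rejected)
opt_rm.zero_grad()
loss_rm.backward()
opt_rm.step()

# Stage 2: use the learned scores as rewards in a policy gradient update.
# Here log_probs would come from the language model's sampled responses.
log_probs = torch.randn(batch, requires_grad=True)
with torch.no_grad():
    rewards = rm(torch.randn(batch, dim))
pg_loss = policy_gradient_step(log_probs, rewards, rewards.mean())
pg_loss.backward()
```

In this toy setup the mean reward serves as a simple baseline for variance reduction; the paper's comparative experiments on convergence speed and training stability concern the full fine-tuning pipeline rather than this simplified update.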