Aligning Model with Human Feedback: A Ranking based Zeroth-order Optimization Method - Hubei National Center for Applied Mathematics

2024-07-02 16:42

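Title: Aligning Model with Human Feedback: A Ranking based Zeroth-order Optimization Method

Time: 2024-07-08 14:00-15:00

Speaker: Prof. Tsung-Hui Chang (The Chinese University of Hong Kong, Shenzhen)

Venue: Lecture Hall (Room 209), 2nd Floor, Northeast Building, School of Science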

Abstract: In this study, we examine an emerging optimization challenge involving a black-box objective function that can only be evaluated through a ranking oracle, a situation frequently encountered in practice, especially when the function is evaluated by human judges. A prominent instance is Reinforcement Learning with Human Feedback (RLHF), an approach recently employed to enhance the performance of Large Language Models (LLMs) using human guidance. We introduce ZO-RankSGD, a zeroth-order optimization algorithm designed to tackle this problem, accompanied by theoretical guarantees. The algorithm uses a novel rank-based random estimator to determine the descent direction and is guaranteed to converge to a stationary point. We also demonstrate the effectiveness of ZO-RankSGD in a novel application: improving the quality of images generated by a diffusion generative model using human ranking feedback. Across our experiments, we found that ZO-RankSGD can significantly enhance the detail of generated images with only a few rounds of human feedback. Overall, our work advances the field of zeroth-order optimization by addressing the problem of optimizing functions with only ranking feedback, and offers a new and effective approach for aligning Artificial Intelligence (AI) with human intentions.
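To make the idea of descending with only ranking feedback concrete, here is a minimal Python sketch. It is not the ZO-RankSGD estimator from the talk, whose details the abstract does not give. It is a simplified pairwise scheme under stated assumptions: the oracle returns a full ranking of a small batch of perturbed candidates, and a simulated oracle built from a known test function stands in for a human judge. All names (`rank_oracle`, `rank_direction`, `zo_rank_descent`) and parameter values are hypothetical.

```python
import numpy as np

def rank_oracle(f, points):
    """Simulated judge: indices of `points` sorted from best (lowest f) to worst.
    Only the ordering is exposed; function values never leave this function."""
    return np.argsort([f(p) for p in points])

def rank_direction(x, oracle, rng, num_queries=8, mu=1e-2):
    """Estimate an ascent-like direction from one ranking of random perturbations."""
    dirs = rng.standard_normal((num_queries, x.size))
    order = oracle([x + mu * u for u in dirs])
    g = np.zeros_like(x)
    # For each pair, the candidate at position a is ranked better than the one at b,
    # so (worse direction - better direction) points roughly uphill.
    for a in range(num_queries):
        for b in range(a + 1, num_queries):
            g += dirs[order[b]] - dirs[order[a]]
    return g / (num_queries * (num_queries - 1) / 2)

def zo_rank_descent(f, x0, steps=300, lr=0.05, seed=0):
    """Fixed-step descent driven only by rankings (illustrative, assumed settings)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    oracle = lambda pts: rank_oracle(f, pts)
    for _ in range(steps):
        x = x - lr * rank_direction(x, oracle, rng)
    return x

if __name__ == "__main__":
    sphere = lambda v: float(np.sum(v ** 2))  # toy objective, minimum at the origin
    x_final = zo_rank_descent(sphere, [2.0, -1.5, 1.0])
    print("final objective:", sphere(x_final))
```

Because a ranking carries no magnitude information, the estimated direction has roughly constant length, so with a fixed step size the iterate settles into a small neighborhood of a stationary point rather than converging exactly; a decaying step size would be one way to tighten this.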