Multi-armed Bandit

Learning Multi-Objective Rewards and User Utility Function in Contextual Bandits for Personalized Ranking

This paper addresses how to present users with ranked lists of relevant search results by incorporating contextual features of both the users and the results, and by learning how each user values multiple objectives. For example, to recommend a …
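To make the general idea concrete, here is a minimal sketch of a contextual bandit that ranks items under several objectives. It is not the paper's algorithm: it assumes a LinUCB-style linear model per objective, and it takes the user's utility weights as a given input rather than learning them as the paper does. The class name `MultiObjectiveLinUCB` and the parameters `alpha`, `ridge`, and `utility_weights` are all hypothetical names chosen for illustration.

```python
import numpy as np


class MultiObjectiveLinUCB:
    """Illustrative sketch only (assumed design, not the paper's method):
    a shared ridge-regression model with one coefficient column per objective,
    scalarized by a utility-weight vector and ranked with a UCB bonus."""

    def __init__(self, n_features, n_objectives, alpha=1.0, ridge=1.0):
        self.alpha = alpha                              # exploration strength (assumed)
        self.A = ridge * np.eye(n_features)             # regularized design matrix
        self.B = np.zeros((n_features, n_objectives))   # context-weighted reward sums

    def rank(self, contexts, utility_weights):
        """Return item indices ordered from highest to lowest score.

        contexts: (n_items, n_features) array of user-item features.
        utility_weights: (n_objectives,) array, assumed known here.
        """
        contexts = np.asarray(contexts, dtype=float)
        weights = np.asarray(utility_weights, dtype=float)
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.B                          # (n_features, n_objectives)
        predicted = contexts @ theta                    # per-objective reward estimates
        # Per-item uncertainty: sqrt(x^T A^{-1} x)
        bonus = self.alpha * np.sqrt(
            np.einsum("ij,jk,ik->i", contexts, A_inv, contexts)
        )
        scores = predicted @ weights + bonus            # scalarized value plus exploration
        return np.argsort(-scores)

    def update(self, context, reward_vector):
        """Record one observed context and its vector of per-objective rewards."""
        context = np.asarray(context, dtype=float)
        reward_vector = np.asarray(reward_vector, dtype=float)
        self.A += np.outer(context, context)
        self.B += np.outer(context, reward_vector)
```

In this sketch, changing `utility_weights` reorders the same candidate items without retraining, which is the property that makes a per-user utility function useful for personalization. Learning those weights from feedback would need an additional estimation step that is not shown here.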