We consider a dynamic pricing problem in which a firm seeks to maximize its profit from selling a product over T periods. We do not assume that the decision maker knows the demand in advance. Traditionally, the cost is fixed and the problem can be formulated as a multi-armed bandit problem, for which the expected regret is known to have a lower bound of order log T. In this paper, we consider a setting where the cost may change over time, so the optimal price is a function of the cost. We develop an upper confidence bound-like (UCB-like) algorithm to solve the problem. We show that our algorithm is robust and efficient in terms of the upper bound on its expected regret.
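The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a UCB-style pricing rule could account for a changing cost. It is not the authors' method. It assumes a finite price grid, at most one unit sold per period, and a purchase probability that depends only on the price. The names `ucb_price` and `simulate`, the linear demand curve, the price grid, and the alternating cost sequence are all hypothetical choices made for illustration.

```python
import math
import random


def ucb_price(prices, sales, trials, cost, t):
    """Return the index of the price with the largest optimistic profit.

    Assumption (not from the source): demand at each price is a Bernoulli
    purchase probability, estimated from past sales at that price.
    """
    best_index, best_value = 0, -math.inf
    for i, price in enumerate(prices):
        if trials[i] == 0:
            return i  # try every price once before using the index
        mean = sales[i] / trials[i]
        bonus = math.sqrt(2.0 * math.log(t) / trials[i])
        margin = price - cost
        # Optimism: a positive margin uses the upper demand bound,
        # a negative margin uses the lower demand bound.
        if margin >= 0:
            demand = min(1.0, mean + bonus)
        else:
            demand = max(0.0, mean - bonus)
        value = margin * demand
        if value > best_value:
            best_index, best_value = i, value
    return best_index


def simulate(horizon=2000, seed=0):
    """Run the rule on a hypothetical demand curve d(p) = max(0, 1 - p)."""
    rng = random.Random(seed)
    prices = [0.2, 0.4, 0.6, 0.8]          # hypothetical price grid
    sales = [0] * len(prices)
    trials = [0] * len(prices)
    profit = 0.0
    for t in range(1, horizon + 1):
        cost = 0.1 if t % 2 == 0 else 0.3  # hypothetical cost sequence
        i = ucb_price(prices, sales, trials, cost, t)
        sold = rng.random() < max(0.0, 1.0 - prices[i])
        trials[i] += 1
        sales[i] += int(sold)
        profit += (prices[i] - cost) * sold
    return profit, trials


if __name__ == "__main__":
    total_profit, pulls = simulate()
    print(total_profit, pulls)
```

In this sketch the demand estimates are shared across periods while the cost enters only through the margin, so the same statistics can recommend different prices as the cost moves. That is one simple way to make the chosen price a function of the cost.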