Decision problems with the structure of the prisoner's dilemma are common. A general solution to this kind of social dilemma is for the agents to cooperate and play a joint action. The Nash bargaining solution is an attractive approach to such cooperative games. In this paper, a multi-agent learning algorithm based on the Nash bargaining solution is presented. Experiments are conducted on a testbed of stochastic games, and the results demonstrate that the algorithm converges to the policies of the Nash bargaining solution. Compared with learning algorithms based on a non-cooperative equilibrium, this algorithm is fast, and its complexity is linear in the number of agents and the number of iterations. In addition, it avoids the troublesome problem of equilibrium selection.
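
To make the central notion concrete, the following minimal sketch computes a Nash bargaining outcome for a single-stage prisoner's dilemma. It is an illustration only, not the learning algorithm presented in the paper. The payoff values, the choice of mutual defection as the disagreement point, and the names `PAYOFFS`, `DISAGREEMENT`, `nash_product`, and `nash_bargaining_joint_action` are all assumptions introduced here. The search is also restricted to pure joint actions, whereas the Nash bargaining solution is defined over the full set of feasible payoffs.

```python
# Hypothetical prisoner's dilemma payoffs (illustrative, not from the paper).
# PAYOFFS[(a1, a2)] = (payoff to agent 1, payoff to agent 2)
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# Assumed disagreement point: the payoffs of mutual defection.
DISAGREEMENT = (1, 1)


def nash_product(utilities, disagreement):
    """Product of each agent's gain over its disagreement payoff."""
    result = 1.0
    for u, d in zip(utilities, disagreement):
        result *= u - d
    return result


def nash_bargaining_joint_action(payoffs, disagreement):
    """Pick the pure joint action that maximizes the Nash product.

    Only individually rational joint actions are considered, i.e. those
    that give every agent at least its disagreement payoff.
    """
    rational = {
        action: utilities
        for action, utilities in payoffs.items()
        if all(u >= d for u, d in zip(utilities, disagreement))
    }
    return max(rational, key=lambda a: nash_product(rational[a], disagreement))


best = nash_bargaining_joint_action(PAYOFFS, DISAGREEMENT)
print(best, PAYOFFS[best])  # ('C', 'C') (3, 3)
```

Under these assumed payoffs, mutual cooperation maximizes the product of gains, while the only non-cooperative equilibrium of the stage game is mutual defection. The loop in `nash_product` runs once per agent, so evaluating one joint action costs time linear in the number of agents.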