The use of millimeter waves (mmWave) for next-generation cellular systems is promising due to the large bandwidth available in this band. Beamforming in such systems will likely be split between the RF and baseband domains, an approach called hybrid beamforming. In hybrid beamforming, precoders can be designed either by selecting from a predefined codebook or by choosing beamforming vectors arbitrarily. The computational complexity of finding the optimal precoders grows exponentially with the number of RF chains. In this paper, we develop an algorithm based on Q-learning (a form of reinforcement learning) to find the precoders jointly. We analyze the complexity of the algorithm as a function of the number of iterations used in the training phase. We compare the spectral efficiency of the proposed algorithm with that achieved by unconstrained precoding, exhaustive search, and another state-of-the-art algorithm. Results show that our algorithm provides better spectral efficiency than the state-of-the-art algorithm and performs close to exhaustive search.
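To illustrate why exhaustive search scales exponentially with the number of RF chains, the following is a minimal sketch rather than the algorithm developed in this paper. It assumes an analog-only precoder whose columns are drawn from a DFT codebook, one stream per RF chain, and the standard spectral efficiency expression log2 det(I + (SNR/Ns) H F F^H H^H). The function names (`dft_codebook`, `spectral_efficiency`, `exhaustive_search`) and all parameter values are hypothetical.

```python
import itertools
import numpy as np

def dft_codebook(n_antennas, n_beams):
    """Hypothetical codebook: columns are unit-norm DFT beamforming vectors."""
    k = np.arange(n_antennas)[:, None]
    b = np.arange(n_beams)[None, :]
    return np.exp(-2j * np.pi * k * b / n_beams) / np.sqrt(n_antennas)

def spectral_efficiency(H, F, snr):
    """log2 det(I + (snr / Ns) * H F F^H H^H), with Ns = number of columns of F."""
    n_streams = F.shape[1]
    HF = H @ F
    M = np.eye(H.shape[0]) + (snr / n_streams) * (HF @ HF.conj().T)
    _, logdet = np.linalg.slogdet(M)  # natural-log determinant of a PD matrix
    return logdet / np.log(2)

def exhaustive_search(H, codebook, n_rf, snr):
    """Try every assignment of codebook vectors to the n_rf RF chains.

    Assumption: one codebook index per RF chain, repeats allowed, so the
    number of candidates is n_beams ** n_rf (exponential in n_rf).
    """
    n_beams = codebook.shape[1]
    best_rate, best_idx, n_evaluated = -np.inf, None, 0
    for idx in itertools.product(range(n_beams), repeat=n_rf):
        rate = spectral_efficiency(H, codebook[:, list(idx)], snr)
        n_evaluated += 1
        if rate > best_rate:
            best_rate, best_idx = rate, idx
    return best_rate, best_idx, n_evaluated

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative i.i.d. channel: 2 receive antennas, 8 transmit antennas.
    H = (rng.standard_normal((2, 8)) + 1j * rng.standard_normal((2, 8))) / np.sqrt(2)
    codebook = dft_codebook(n_antennas=8, n_beams=8)
    for n_rf in (1, 2, 3):
        rate, idx, n = exhaustive_search(H, codebook, n_rf, snr=10.0)
        print(f"n_rf={n_rf}: evaluated {n} candidates, best indices {idx}")
```

Under these assumptions the candidate count is the codebook size raised to the number of RF chains, which is the growth that motivates a learning-based search.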