Closed-loop, feedback-driven control laws can solve low-thrust, many-revolution trajectory design and guidance problems at minimal computational cost. They treat the problem from a targeting perspective and hence value stability over optimality. Optimality can be increased by making the control-law parameters state-dependent, at the cost of reduced stability. In this paper, an actor-critic reinforcement learning framework is used to make the parameters of the Lyapunov-based Q-law state-dependent. A single-layer neural network ensures that the Jacobian of these state-dependent parameters can be calculated and used to enforce stability throughout the transfer. The current results focus on GTO-GEO and LEO-GEO transfers in Keplerian dynamics. A trade-off between optimality and stability is observed for the former, but the added stability increases optimality for the latter. Robustness to uncertainties in position and velocity is also investigated, along with the effects of eclipses and dynamical perturbations such as J2 and the third-body attractions of the Sun and Moon.
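
As an illustration of why a single-layer network keeps the Jacobian tractable, the following minimal sketch maps a state vector to bounded parameters and returns the closed-form Jacobian alongside them. It is not the architecture used in this paper: the sigmoid activation, the parameter bounds `w_min` and `w_max`, the network sizes, and all numerical values are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def weights_and_jacobian(x, W, b, w_min=1.0, w_max=10.0):
    """Single-layer map from state x to bounded parameters w, with dw/dx.

    Assumed form: w_i = w_min + (w_max - w_min) * sigmoid(W_i . x + b_i).
    The bounds and the sigmoid are illustrative assumptions.
    """
    w, J = [], []
    for row, bias in zip(W, b):
        z = sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + bias
        s = 1.0 / (1.0 + math.exp(-z))
        w.append(w_min + (w_max - w_min) * s)
        # Chain rule: dw_i/dx_j = (w_max - w_min) * s * (1 - s) * W_ij
        slope = (w_max - w_min) * s * (1.0 - s)
        J.append([slope * w_ij for w_ij in row])
    return w, J


def finite_difference_jacobian(x, W, b, eps=1e-6):
    """Central-difference Jacobian, used only to check the analytic one."""
    columns = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        w_plus, _ = weights_and_jacobian(x_plus, W, b)
        w_minus, _ = weights_and_jacobian(x_minus, W, b)
        columns.append([(p - m) / (2.0 * eps) for p, m in zip(w_plus, w_minus)])
    # Each entry of `columns` holds d w / d x_j; transpose to rows of d w_i / d x.
    return [list(row) for row in zip(*columns)]


# Hypothetical example: 3 normalized state components, 2 parameters.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]
b = [0.0, -0.1]
x = [0.2, -0.5, 1.0]

w, J = weights_and_jacobian(x, W, b)
J_fd = finite_difference_jacobian(x, W, b)
max_err = max(abs(a - n) for ra, rn in zip(J, J_fd) for a, n in zip(ra, rn))
```

Because the map has a single layer, the Jacobian is one scaled copy of the weight matrix, so it can be evaluated at every guidance step without automatic differentiation. In this sketch `max_err` compares the analytic result against central differences as a sanity check.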