Monolithic scintillation crystals have been widely studied as detectors for PET systems. In this work, we explore deep learning techniques that use the information from mean detector response functions (MDRFs) as a new, potentially faster and more accurate method to estimate the gamma-ray interaction location in monolithic scintillation crystal detectors. Compared with search-based methods, deep learning techniques do not require storing all the MDRF information once the prediction networks are trained. This could reduce the memory cost by a factor of up to 100. We have designed four different neural networks to predict the gamma-ray interaction location from the MDRF data. The networks are trained with calibrated MDRFs at 252×252 locations in a 50.8×50.8 mm² area, with each MDRF data point containing the information from 20 SiPM channels. Our first network consists of only fully connected (FC) layers, with a final regression layer that directly predicts the x and y location. This network reaches a final L2 loss of 4.8 mm². The second network also consists of only FC layers, but with a final classification layer that classifies the x and y location into 252 classes. This network reaches a final prediction accuracy of 15%. The third network is a convolutional neural network (CNN) with a final classification layer. This network reaches a final prediction accuracy of 20%. With the last network, we perform "sectional training" by first coarsely dividing the entire crystal into 12×12 sub-areas. Then, for each sub-area, we perform fine training and classify the x and y location into the correct class. This network reaches a final prediction accuracy of 90%. We test the trained networks with a 5-slit image.
The RMS prediction errors for the four networks are 2.6 mm (FC regression network), 2.2 mm (FC classification network), 2.1 mm (CNN) and 2.0 mm ("sectional training" network). The CNN and the "sectional training" network achieve the lowest prediction errors. In comparison with search-based methods, deep learning based estimation methods do not need to keep a record of all the MDRF information, which consists of a total of 1,270,080 parameters (252×252 locations × 20 SiPM channels), once the networks are trained. This reduces the memory cost by a factor of 10 to 100, depending on the network structure.
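
The memory comparison above can be checked with a short calculation. The following Python sketch counts the MDRF table entries stated in the text and compares them with the parameter count of two fully connected networks. The layer widths and the 2×252 output layout are hypothetical, chosen only for illustration; they are not the architectures used in this work.

```python
# Values taken from the text: 252 x 252 calibrated locations, 20 SiPM channels.
GRID = 252
CHANNELS = 20


def mdrf_table_size(grid=GRID, channels=CHANNELS):
    """Number of values a search-based method must store."""
    return grid * grid * channels


def fc_param_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# ASSUMPTION: hidden-layer widths are illustrative, not from the paper.
# ASSUMPTION: the classification head has 252 outputs for x and 252 for y.
example_networks = {
    "FC regression": [CHANNELS, 128, 128, 2],
    "FC classification": [CHANNELS, 128, 128, 2 * GRID],
}

table = mdrf_table_size()
reduction = {}
for name, layers in example_networks.items():
    params = fc_param_count(layers)
    reduction[name] = table / params
    print(f"{name}: {params} parameters, "
          f"memory reduction factor {reduction[name]:.1f}")
```

With these assumed widths, both example networks fall within the 10 to 100 range of memory reduction quoted above; wider or deeper networks would shift the factor accordingly.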