This article addresses the problem of analyzing and recognizing human emotions by processing sound data. The range of applications for such systems has grown, largely because of the difficult global epidemiological situation, which makes this an urgent problem. The main stages of the method are described. First, an audio data stream is recorded. Then, following the “sound fingerprinting” approach, it is converted into an image, namely a spectrogram of the sound data. The paper describes how a convolutional neural network is trained on a pre-prepared sound dataset, as well as the structure of the algorithm. The network was validated on a separate set of audio data that was not used in training. Graphs demonstrating the accuracy of the proposed method are presented.
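The conversion of a recording into a spectrogram “image” can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The frame length (512 samples), hop size (256), Hann window, decibel scaling and 80 dB dynamic-range floor are all assumed, and the function names `spectrogram_image` and `to_image` are hypothetical. The output is a 2-D array normalized to [0, 1], which is the kind of single-channel image a convolutional network (for example, one built in Keras) could take as input.

```python
import numpy as np


def spectrogram_image(signal, frame_len=512, hop=256):
    """Convert a 1-D audio signal into a log-magnitude spectrogram.

    Assumed parameters: Hann window, frame_len samples per frame, hop samples
    between frame starts. Returns an array of shape
    (frame_len // 2 + 1 frequency bins, n_frames) in decibels.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Zero-pad short clips so that at least one full frame exists
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Index matrix: row k selects samples [k*hop, k*hop + frame_len)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * window
    # Magnitude of the real FFT of each windowed frame
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    db = 20.0 * np.log10(magnitude + 1e-10)
    # Transpose so frequency runs along rows and time along columns
    return db.T


def to_image(db, floor_db=-80.0):
    """Clip the dynamic range to floor_db below the peak and scale to [0, 1]."""
    db = np.maximum(db, db.max() + floor_db)
    return (db - db.min()) / (db.max() - db.min() + 1e-12)
```

For a one-second clip sampled at 16 kHz, these settings give a 257 × 61 array. In practice, clips of different lengths would presumably be padded, cropped or resized to a fixed shape before being batched for training.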
Keywords: neural network; human emotion recognition; convolutional neural network; sound fingerprinting; TensorFlow; Keras; MATLAB; Deep Network Toolbox