This paper examines and compares two neural networks, U-Net-Attention and SegGPT, which use different attention mechanisms to capture relationships between parts of the input and output data. U-Net-Attention is a U-Net augmented with attention gates, an efficient architecture for image segmentation. It consists of an encoder and a decoder joined by skip connections that bypass intermediate layers, allowing information about the local properties of feature maps to be carried through to the decoder. To improve segmentation quality, the original U-Net architecture is extended with an attention layer, which sharpens the search for the image features of interest. The SegGPT model is based on the Vision Transformer architecture and likewise relies on an attention mechanism. Both models focus attention on the important aspects of a problem and can be effective across a variety of tasks. In this work, we compared their performance on segmenting cracks in road-surface images in order to classify the condition of the road surface as a whole. We also analyze and draw conclusions about the applicability of transformer architectures to a wide range of problems.
Keywords: machine learning, transformer neural networks, U-Net-Attention, SegGPT, roadway condition analysis, computer vision