Artificial intelligence and machine learning
Research Article
Embedding-based object segmentation with an adapted U-Net architecture
Igor Victorovich Vinokurov
Financial University under the Government of the Russian Federation, Moscow, Russia
Abstract. The article presents a multi-task neural network based on a modified U-Net architecture for joint semantic and instance segmentation of objects in aerial imagery. The model employs a symmetric encoder-decoder structure with skip connections and is equipped with two parallel output heads. The semantic head performs pixel-wise classification, while the embedding head generates discriminative vector representations for each pixel. The application of a specialized discriminative loss function ensures compact embedding clusters within objects and separation between different instances. In the post-processing stage, clustering the embedding field allows for unambiguous extraction of individual object masks.
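The interplay between the embedding head, the discriminative loss, and the clustering step can be illustrated with a minimal NumPy sketch. It assumes the widely used pull/push form of the discriminative loss and a simple greedy radius-based grouping; the margins `delta_v` and `delta_d`, the `radius` threshold, and both function names are hypothetical and are not taken from the article, whose exact loss and clustering procedure may differ.

```python
import numpy as np


def discriminative_loss(emb, inst, delta_v=0.5, delta_d=1.5):
    """Pull/push loss over per-pixel embeddings (illustrative form only).

    emb:  (N, D) float array, one embedding vector per object pixel.
    inst: (N,) int array, ground-truth instance id of each pixel.
    delta_v, delta_d: assumed margins, not values from the article.
    """
    ids = np.unique(inst)
    centers = np.stack([emb[inst == i].mean(axis=0) for i in ids])

    # Pull term: penalize pixels farther than delta_v from their own center.
    pull = 0.0
    for c, i in zip(centers, ids):
        dist = np.linalg.norm(emb[inst == i] - c, axis=1)
        pull += np.mean(np.maximum(dist - delta_v, 0.0) ** 2)
    pull /= len(ids)

    # Push term: penalize pairs of centers closer than 2 * delta_d.
    push = 0.0
    if len(ids) > 1:
        diff = centers[:, None, :] - centers[None, :, :]
        cdist = np.linalg.norm(diff, axis=2)
        off_diag = ~np.eye(len(ids), dtype=bool)
        push = np.mean(np.maximum(2 * delta_d - cdist[off_diag], 0.0) ** 2)

    return pull + push


def cluster_embeddings(emb, radius=1.0):
    """Greedy grouping: each unlabeled pixel seeds a new instance and
    claims every still-unlabeled pixel within `radius` of it."""
    labels = np.full(len(emb), -1, dtype=int)
    next_id = 0
    for seed in range(len(emb)):
        if labels[seed] != -1:
            continue
        near = np.linalg.norm(emb - emb[seed], axis=1) < radius
        labels[near & (labels == -1)] = next_id
        next_id += 1
    return labels


# Toy example: two well-separated instances in a 2-D embedding space.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
inst = np.array([1, 1, 2, 2])
loss = discriminative_loss(emb, inst)
labels = cluster_embeddings(emb)
```

In the toy example the embeddings of each instance lie within the pull margin and the two centers lie beyond the push margin, so the loss vanishes and the grouping step separates the pixels into two instances.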
Experiments were conducted on a specialized aerial imagery dataset containing 23,076 annotated objects across five classes. For the key class "Building", the model achieved IoU = 0.812 and F1-score = 0.880 on the validation set. A comparison with state-of-the-art methods (Mask2Former, OneFormer, SAM 2 with LoRA fine-tuning, MR-DeepLabv3) confirms the model's competitiveness in terms of the balance between accuracy and inference speed.
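For reference, the two reported metrics can be computed for a single pair of binary masks as in the sketch below. The function name `iou_and_f1` is hypothetical, and the abstract does not state how the scores were aggregated over images or objects, so this shows only the standard pixel-wise definitions rather than the article's evaluation protocol.

```python
import numpy as np


def iou_and_f1(pred, target):
    """Pixel-wise IoU and F1-score for one pair of binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    tp = np.sum(pred & target)    # pixels marked in both masks
    fp = np.sum(pred & ~target)   # predicted but not in the reference
    fn = np.sum(~pred & target)   # in the reference but missed

    union = tp + fp + fn
    iou = tp / union if union else 1.0        # two empty masks agree fully
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 1.0
    return float(iou), float(f1)


# Toy 2x2 masks: one true positive, one false positive, one false negative.
pred = [[1, 1], [0, 0]]
target = [[1, 0], [1, 0]]
iou, f1 = iou_and_f1(pred, target)
```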
The model demonstrates effectiveness for automated mapping and urban structure analysis tasks using remote sensing data. (Linked article texts in English and in Russian).
Keywords: semantic segmentation, instance segmentation, U-Net architecture, pixel-wise embeddings, discriminative loss
MSC-2020
68T20; 68T07, 68T45
For citation: Igor V. Vinokurov. Embedding-based object segmentation with an adapted U-Net architecture. Program Systems: Theory and Applications, 2026, 17:1, pp. 105–172. (in Engl. In Russ.). https://psta.psiras.ru/2026/1_105-172.
Full text of bilingual article (PDF): https://psta.psiras.ru/read/psta2026_1_105-172.pdf
The article was submitted 06.01.2026; approved after reviewing 16.01.2026; accepted for publication 26.02.2026; published online 12.03.2026.