Program Systems: Theory and Applications
ISSN 2079-3316. Bilingual online scientific journal of the Ailamazyan Program Systems Institute of the Russian Academy of Sciences. 12+
Volume 17 (2026). Issue 1 (70). Paper No. 5 (507)

Artificial intelligence and machine learning

Research Article

Embedding-based object segmentation with an adapted U-Net architecture

Igor Victorovich Vinokurov (correspondent author)

Financial University under the Government of the Russian Federation, Moscow, Russia
E-mail: igvvinokurov@fa.ru

Abstract. The article presents a multi-task neural network, based on a modified U-Net architecture, for joint semantic and instance segmentation of objects in aerial imagery. The model uses a symmetric encoder-decoder structure with skip connections and has two parallel output heads: a semantic head that classifies each pixel, and an embedding head that maps each pixel to a discriminative vector. A dedicated discriminative loss function pulls the embeddings of each object into a compact cluster and pushes the clusters of different instances apart. At the post-processing stage, clustering the embedding field separates the individual object masks unambiguously.
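The abstract names the discriminative loss and the clustering step but does not give their exact form. The pure-Python sketch below assumes the widely used formulation of De Brabandere et al. (2017): a pull term with margin δ_v, a push term with margin δ_d, and a small regulariser on the cluster means, with the values suggested there (δ_v = 0.5, δ_d = 1.5, α = β = 1, γ = 0.001). The function names `discriminative_loss` and `cluster_embeddings` are hypothetical, and the greedy seed-and-grow grouping is a simple stand-in for mean-shift-style clustering, not necessarily the article's method. Both functions are assumed to receive only the embeddings of pixels that the semantic head has labelled as foreground.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def discriminative_loss(embeddings: List[Vector], instance_ids: List[int],
                        delta_v: float = 0.5, delta_d: float = 1.5,
                        alpha: float = 1.0, beta: float = 1.0,
                        gamma: float = 0.001) -> float:
    """Assumed De Brabandere-style loss: pull + push + regulariser."""
    clusters: Dict[int, List[Vector]] = {}
    for emb, inst in zip(embeddings, instance_ids):
        clusters.setdefault(inst, []).append(emb)
    if not clusters:
        return 0.0
    means = {inst: _mean(vs) for inst, vs in clusters.items()}
    c = len(means)

    # Pull: penalise pixels farther than delta_v from their instance mean.
    l_var = sum(
        sum(max(0.0, _dist(means[inst], e) - delta_v) ** 2 for e in vs) / len(vs)
        for inst, vs in clusters.items()
    ) / c

    # Push: penalise pairs of instance means closer than 2 * delta_d.
    l_dist = 0.0
    if c > 1:
        for a in means:
            for b in means:
                if a != b:
                    l_dist += max(0.0, 2 * delta_d - _dist(means[a], means[b])) ** 2
        l_dist /= c * (c - 1)

    # Regulariser: keep instance means close to the origin.
    l_reg = sum(math.sqrt(sum(x * x for x in m)) for m in means.values()) / c
    return alpha * l_var + beta * l_dist + gamma * l_reg


def cluster_embeddings(embeddings: List[Vector], radius: float = 1.5) -> List[int]:
    """Hypothetical greedy seed-and-grow grouping into instance labels.

    The default radius equals delta_d of the loss (a heuristic assumption).
    """
    labels = [-1] * len(embeddings)
    next_id = 0
    for s, seed in enumerate(embeddings):
        if labels[s] != -1:
            continue
        # Average the still-unassigned embeddings near the seed ...
        near = [e for e, lab in zip(embeddings, labels)
                if lab == -1 and _dist(seed, e) <= radius]
        centre = _mean(near)
        # ... then give every unassigned embedding near that centre a new label.
        for k, e in enumerate(embeddings):
            if labels[k] == -1 and _dist(centre, e) <= radius:
                labels[k] = next_id
        labels[s] = next_id  # the seed always joins its own instance
        next_id += 1
    return labels
```

For two compact, well-separated clusters such as `emb = [[0, 0], [0, 0.2], [5, 0], [5, 0.2]]` with instance ids `[1, 1, 2, 2]`, the pull and push terms vanish and only the small regulariser remains (about 0.00255), while `cluster_embeddings(emb)` returns `[0, 0, 1, 1]`. Choosing δ_d > 2δ_v is what lets a fixed radius separate instances once the loss is near zero.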

Experiments were conducted on a specialized aerial imagery dataset containing 23,076 annotated objects in five classes. For the key class "Building", the model achieved IoU = 0.812 and F1-score = 0.880 on the validation set. A comparison with state-of-the-art methods (Mask2Former, OneFormer, SAM 2 with LoRA fine-tuning, and MR-DeepLabv3+) confirms that the model is competitive in its balance between accuracy and inference speed.
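The abstract reports IoU and F1 without defining them. Under the usual pixel-level definitions for a single class mask, which are an assumption here, IoU = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN). A minimal sketch with hypothetical function names:

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], target: Sequence[int]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for two flattened binary masks (1 = class pixel)."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)
    return tp, fp, fn


def iou_and_f1(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Pixel-level IoU and F1 for one class, computed from pooled counts."""
    tp, fp, fn = confusion_counts(pred, target)
    if tp + fp + fn == 0:  # class absent from both masks: treat as a perfect match
        return 1.0, 1.0
    return tp / (tp + fp + fn), 2 * tp / (2 * tp + fp + fn)
```

When both metrics come from the same pooled counts, F1 = 2·IoU / (1 + IoU), so IoU = 0.812 would correspond to F1 ≈ 0.896. The reported F1 = 0.880 is lower, which suggests the article aggregates the metrics differently (for example per image or per instance); this sketch does not reproduce that aggregation.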

The model is effective for automated mapping and urban-structure analysis based on remote sensing data. (Linked article texts in English and Russian.)

Keywords: semantic segmentation, instance segmentation, U-Net architecture, pixel-wise embeddings, discriminative loss

MSC 2020 (Mathematics Subject Classification): 68T20; 68T07, 68T45
MSC-2020 68-XX: Computer science
MSC-2020 68Txx: Artificial intelligence
MSC-2020 68T20: Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)
MSC-2020 68T07: Artificial neural networks and deep learning
MSC-2020 68T45: Machine vision and scene understanding

For citation: Igor V. Vinokurov. Embedding-based object segmentation with an adapted U-Net architecture. Program Systems: Theory and Applications, 2026, 17:1, pp. 105–172. (In Engl., In Russ.). https://psta.psiras.ru/2026/1_105-172.

Full text of the bilingual article (PDF): https://psta.psiras.ru/read/psta2026_1_105-172.pdf

The article was submitted 06.01.2026; approved after reviewing 16.01.2026; accepted for publication 26.02.2026; published online 12.03.2026.

© Vinokurov I. V., 2026
Editorial address: Ailamazyan Program Systems Institute of the Russian Academy of Sciences, Peter the First Street 4«a», Veskovo village, Pereslavl area, Yaroslavl region, 152021 Russia; Website: http://psta.psiras.ru; Phone: +7(4852) 695-228; E-mail: ; License: CC BY 4.0