UV-Mamba: A DCN-Enhanced State Space Model for Urban Village Boundary Identification in High-Resolution Remote Sensing Images

¹Qinghai University, ²Beijing Jiaotong University, ³Tsinghua University

Abstract

Owing to diverse geographical environments, intricate landscapes, and high-density settlements, the automatic identification of urban village boundaries in remote sensing images remains a highly challenging task. This paper proposes UV-Mamba, a novel and efficient neural network model for accurate boundary detection in high-resolution remote sensing images. State space models suffer from memory loss when modeling long sequences, a problem that grows with image size; UV-Mamba mitigates it by incorporating deformable convolutions. The architecture follows an encoder-decoder framework: the encoder uses four deformable state space augmentation blocks for efficient multi-level semantic extraction, and the decoder integrates the extracted semantic information. Experiments on two large datasets show that UV-Mamba achieves state-of-the-art performance. Specifically, our model achieves 73.3% and 78.1% IoU on the Beijing and Xi'an datasets, respectively, representing improvements of 1.2% and 3.4% IoU over the previous best model, while also being 6× faster in inference and 40× smaller in parameter count. Source code and pre-trained models are available at https://github.com/Devin-Egber/UV-Mamba.

Class Activation Map

We employ Class Activation Maps (CAM) to visualize the decision-making process of the convolutional neural network in image classification tasks. Specifically, we compute a weighted sum over the convolutional feature maps from the network's final layer to generate a heatmap that highlights the regions of the input image most relevant to the model’s prediction for a given class. As shown in the figure, red regions denote areas with a higher impact on the prediction, whereas blue regions indicate lesser influence. These visualizations enable us to identify the key regions the model attends to and reveal potential biases in its decision-making process.
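The weighted sum described above can be sketched in a few lines of NumPy. This is a minimal illustration of the generic CAM computation, not the UV-Mamba implementation: the function name `class_activation_map`, the array shapes, and the min-max normalization step are assumptions made for the example.

```python
import numpy as np


def class_activation_map(feature_maps, class_weights):
    """Generic CAM sketch (hypothetical helper, not from the UV-Mamba code).

    feature_maps:  array of shape (K, H, W), the final-layer feature maps.
    class_weights: array of shape (K,), the weights tied to one class.
    Returns an (H, W) heatmap scaled to the range [0, 1].
    """
    # Weighted sum over the K channels: cam[h, w] = sum_k w_k * F_k[h, w]
    cam = np.tensordot(class_weights, feature_maps, axes=([0], [0]))

    # Min-max normalization so the map can be rendered with a colormap
    # (an assumption; other scalings are possible).
    cam = cam - cam.min()
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam


# Toy input: two 2x2 feature maps and their class weights.
features = np.array(
    [
        [[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 2.0]],
    ]
)
weights = np.array([1.0, 2.0])

cam = class_activation_map(features, weights)
print(cam)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the image; with a typical blue-to-red colormap, values near 1 appear red and values near 0 appear blue.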

Results
