|
Project developed at the LaSTIG lab at IGN in collaboration with Thales, within the MicMac photogrammetry software framework, and accepted at the CVPR EarthVision Workshop 2023.
We present three multi-scale similarity learning architectures, or DeepSim-Nets. These models learn pixel-level matching with a contrastive loss and are agnostic to the geometry of the considered scene. We establish a middle ground between hybrid and end-to-end approaches by learning to densely allocate all corresponding pixels of an epipolar pair at once. Our features are learnt on large image tiles so that they are expressive and capture the scene's wider context. We also demonstrate that curated sample mining improves the overall robustness of the predicted similarities and the performance on radiometrically homogeneous areas. We run experiments on aerial and satellite datasets. Our DeepSim-Nets outperform the baseline hybrid approaches and generalize better to unseen scene geometries than end-to-end methods. Our flexible architecture can be readily adopted in standard multi-resolution image matching pipelines.
|
Sampling for dense matching. Training follows the self-supervised contrastive learning paradigm. Conversely to patch-based training, our triplets are sets of features output by a multi-scale CNN backbone. Our sample mining scheme enforces the matching uniqueness constraint and the robustness of the learnt similarities.
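To illustrate the dense triplet paradigm described above, here is a minimal PyTorch sketch, not the authors' implementation: anchors are left-image features, positives are right-image features sampled at the ground-truth disparity, and negatives are taken a few pixels off the true match on the same epipolar line (a toy stand-in for the curated sample mining). The function name, the `margin`, and the fixed `neg_offset` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dense_triplet_loss(feat_left, feat_right, disparity,
                       margin=0.3, neg_offset=4.0):
    """Toy dense triplet loss over epipolar feature maps.

    feat_left, feat_right: (B, C, H, W) feature maps from the backbone.
    disparity: (B, H, W) ground-truth left-to-right disparity.
    Positives sit at the true disparity; negatives are shifted by
    `neg_offset` pixels along the epipolar line, mimicking the
    uniqueness constraint (illustrative choice, not the paper's miner).
    """
    B, C, H, W = feat_left.shape
    xs = torch.arange(W, device=feat_left.device, dtype=torch.float32)
    ys = torch.arange(H, device=feat_left.device, dtype=torch.float32)
    xs = xs.view(1, 1, W).expand(B, H, W)
    ys = ys.view(1, H, 1).expand(B, H, W)

    def sample(fmap, x, y):
        # Normalise pixel coordinates to [-1, 1] and sample bilinearly.
        gx = 2.0 * x / (W - 1) - 1.0
        gy = 2.0 * y / (H - 1) - 1.0
        grid = torch.stack((gx, gy), dim=-1)          # (B, H, W, 2)
        return F.grid_sample(fmap, grid, align_corners=True)

    pos = sample(feat_right, xs - disparity, ys)               # true match
    neg = sample(feat_right, xs - disparity + neg_offset, ys)  # off-match

    d_pos = (feat_left - pos).pow(2).sum(dim=1)  # per-pixel squared distance
    d_neg = (feat_left - neg).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()
```

In practice the loss would be averaged over the backbone's multiple scales and the negatives chosen by the mining scheme rather than a fixed offset; the sketch only shows the per-pixel triplet structure.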
Acknowledgements |