M-Calib: A Monocular 3D Object Localization using 2D Estimates for Industrial Robot Vision System


Thanh Nguyen Canh
Du Ngoc Trinh
Xiem HoangVan
VNU University of Engineering and Technology, Vietnam
Journal of Automation, Mobile Robotics and Intelligent Systems (JAMRIS), 2025.

[Paper]
[Code]


3D object localization has recently emerged as a key challenge in machine vision and robot vision. In this paper, we propose a novel method for localizing isometric flat 3D objects that combines deep learning techniques, primarily object detection, with image post-processing algorithms and pose estimation. Our approach applies a 3D calibration method tailored to low-cost industrial robotic systems and requires only a single 2D image as input. First, object detection is performed with the You Only Look Once (YOLO) model; the object is then segmented into two distinct parts, the top face and the remainder, using the Mask R-CNN model. The center of the top face serves as the initial position estimate, and a combination of post-processing techniques and a novel calibration algorithm refines the object's position. Experimental results show an 87.65% reduction in localization error compared to existing methods.
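The pipeline above can be sketched in a few lines of Python. The snippet below is a minimal illustration rather than the released implementation: detect_object_yolo and segment_top_face are hypothetical placeholders standing in for the YOLO and Mask R-CNN stages, and the localization step assumes a standard pinhole camera with known intrinsics and a known, fixed object height above the work plane.

import numpy as np

# Hypothetical placeholders for the learned stages (YOLO detection and
# Mask R-CNN segmentation of the top face).
def detect_object_yolo(image):
    """Return a bounding box (x, y, w, h) around the object of interest."""
    raise NotImplementedError

def segment_top_face(image, box):
    """Return a binary mask of the object's top face inside the box."""
    raise NotImplementedError

def top_face_center(mask):
    """Centroid (u, v) of the top-face mask in pixel coordinates."""
    vs, us = np.nonzero(mask)
    return float(us.mean()), float(vs.mean())

def back_project(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at a known depth (metres)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def initial_localization(image, intrinsics, camera_height, object_height):
    """Initial 3D position of the top-face center in the camera frame.

    The subsequent refinement (the translation vector between the initial
    and calibrated centers) follows the calibration algorithm described in
    the paper and is not reproduced here.
    """
    fx, fy, cx, cy = intrinsics
    box = detect_object_yolo(image)
    mask = segment_top_face(image, box)
    u, v = top_face_center(mask)
    depth = camera_height - object_height  # distance from camera to top face
    return back_project(u, v, depth, fx, fy, cx, cy)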


Paper

Thanh Nguyen Canh, Du Ngoc Trinh, Xiem HoangVan

M-Calib: A Monocular 3D Object Localization using 2D Estimates for Industrial Robot Vision System

JAMRIS 2025.

[pdf]    

Overview




A block diagram of our proposed calibration method. The translation vector between the initial estimated center point (green) and the calibrated center point (red) is computed using deep learning and our novel calibration method.



Illustration of the industrial robot vision system: the green point is the initial estimated center point and the red point is the actual center point.



Estimation of the translation vector.
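As the diagrams above indicate, the refinement amounts to estimating a translation vector between the initial (green) and calibrated (red) center points. Below is a minimal sketch of how a pixel-space translation between two 2D center estimates could be converted to a metric correction, assuming a pinhole camera with known intrinsics and a fixed depth to the work plane; the paper's actual calibration algorithm may differ.

import numpy as np

def pixel_translation(center_a, center_b):
    """Translation vector (du, dv) in pixels between two 2D center estimates."""
    return np.asarray(center_b, dtype=float) - np.asarray(center_a, dtype=float)

def pixel_to_metric(translation_px, depth, fx, fy):
    """Convert a pixel displacement to metres on a plane at the given depth."""
    du, dv = translation_px
    return np.array([du * depth / fx, dv * depth / fy])

# Toy usage with made-up values: initial center (green) and refined center (red).
t_px = pixel_translation((642.0, 415.0), (655.0, 402.0))
t_m = pixel_to_metric(t_px, depth=0.75, fx=910.0, fy=910.0)
print(t_px, t_m)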


Experiments




The process of computing the object position in real-world coordinates.



The process of object segmentation and edge extraction.



Illustration of the estimated translation vector.



Visualized examples of experimental results. In (b), the orange point is the YOLO center; in (d), the dark red point is the center of the upper part, the vector formed by the blue points is the translation vector, and the light blue point is the corrected center.



Experimental results evaluating the position error of our algorithm.

Code


 [github]


Citation


1. Canh T. N., Trinh D. N., HoangVan X., "M-Calib: A Monocular 3D Object Localization using 2D Estimates for Industrial Robot Vision System," Journal of Automation, Mobile Robotics and Intelligent Systems (JAMRIS), 2025.

@article{canh2025mcalib,
author = {Canh, Thanh Nguyen and Trinh, Du Ngoc and HoangVan, Xiem},
title = {{M-Calib: A Monocular 3D Object Localization using 2D Estimates for Industrial Robot Vision System}},
journal = {Journal of Automation, Mobile Robotics and Intelligent Systems (JAMRIS)},
year = {2025},
}




Acknowledgements

Thanh Nguyen Canh was funded by the Master's and PhD Scholarship Programme of Vingroup Innovation Foundation (VINIF), code VINIF.2023.ThS.120.
This webpage template was borrowed from https://akanazawa.github.io/cmr/.