Content-based image retrieval (CBIR) for location recognition allows more precise indoor navigation than state-of-the-art methods. Matching a query range image against a database of geo-tagged images is an active research topic. This thesis investigates the prospects of applying 3D shape feature detectors and descriptors to a point cloud projection of the range image. To that end, the keypoint detection methods Normal Aligned Radial Feature (NARF), Intrinsic Shape Signatures (ISS) and the HARRIS3D detector are described first, followed by the shape feature descriptors Spin Images, Signatures of Histograms of Orientations (SHOT) and Unique Shape Context (USC). Special attention is paid to the parameters: varying radii, border estimation methods, preset filters and computing times are analysed in order to determine how to set these parameters to obtain good results. The results expose the shortcomings of state-of-the-art 3D feature algorithms when applied to indoor navigation. Finally, suggestions for improvement are made.
Contents
1 Introduction
2 Background and Methods
2.1 Image and Location Retrieval
2.2 Database of Range Images
2.3 Feature Extraction
2.4 Matching of Features
2.5 Evaluation of Matching
2.6 Depth Image to Point Cloud Projection
2.7 Keypoint Detection Methods
2.7.1 NARF Keypoint Detector
2.7.2 ISS Keypoint Detector
2.7.3 HARRIS3D Keypoint Detector
2.8 Feature Description Methods
2.8.1 NARF Feature Descriptor
2.8.2 Spin Image Feature Descriptor
2.8.3 SHOT Feature Descriptor
2.8.4 USC Feature Descriptor
3 Implementation
3.1 Overall Design
3.2 Filtering
3.3 Normal Estimation
3.4 Keypoint Detectors
3.4.1 NARF Implementation
3.4.2 ISS Implementation
3.4.3 HARRIS3D Implementation
3.4.4 NARF and ISS Combination
3.5 Feature Descriptors
3.5.1 NARF Feature Descriptor Implementation
3.5.2 Spin Image, SHOT and USC Feature Descriptor Implementation
3.6 Saving Files
4 Optimization and Analysis
4.1 Parameters
4.2 ISS Border Estimation
4.3 Preparing Cloud for HARRIS3D
4.4 Computing Times
4.5 Keypoints with ISS vs. NARF
5 Evaluation and Results
5.1 Suitability of 3D Features for CBIR Location Recognition
5.1.1 Query Sets
5.1.2 HARRIS3D and SHOT
5.1.3 Intermediate Result
5.1.4 Further Results of Feature Matching
5.1.5 Explanation
5.2 Combining ISS and NARF
6 Summary and Outlook for Future Work
Objectives and Research Focus
This thesis examines the feasibility of using 3D shape features for indoor location recognition through Content-Based Image Retrieval (CBIR). The central research objective is to evaluate whether 3D descriptors, when applied to range images and point cloud projections, can provide sufficient accuracy for navigating indoor environments without GPS reliance.
- Investigation of keypoint detection methods (NARF, ISS, HARRIS3D).
- Evaluation of 3D feature descriptors (Spin Images, SHOT, USC).
- Optimization of computational parameters and influence of point cloud filtering.
- Analysis of real-world performance using the TUMindoor dataset.
- Assessment of computational efficiency and comparison against 2D-based retrieval methods.
Excerpt from the Book
2.6 Depth Image to Point Cloud Projection
The focus of this thesis is to perform all calculations in 3D space. A depth image, e.g. one taken by a Kinect camera, is stored as a PNG file. This PNG image has the dimensions dwidth and dheight, meaning the image has a resolution of dwidth times dheight pixels. Each pixel has a grey value representing the depth. To reconstruct the 3D space, these points cannot simply be placed at their depth; the recording geometry must also be taken into account. As Figure 2.4 shows, the focal length f plays an important role. The camera or sensor C is centred in front of the image. Casting rays from C through the points p(u, v), extended by the depth d beyond the image plane, yields the 3D coordinates. Equation 2.6 gives the mathematical formulation, wherein uc is the horizontal centre and vc the vertical centre of the image, which in our case is set to:
uc = dwidth/2, vc = dheight/2 (2.5)
Now, using the pin-hole camera model, each pixel p(u, v) is projected to 3D point P(x, y, z) as follows:
P(x, y, z)=( (u-uc)/f * d, (v-vc)/f * d, d ) (2.6)
The grey image has a certain grey value resolution. The images in the database, as well as those captured by the Kinect, are stored with 16 bit precision, but they are loaded with 8 bit precision. This introduces quantization inaccuracy: the point cloud exhibits steps and gaps along the z axis, which causes artefacts. An example is given in Figure 2.6, which shows the artefacts of the point cloud from Figure 2.5.
Summary of Chapters
1 Introduction: Provides an overview of CBIR for indoor location recognition and outlines the motivation for utilizing 3D sensors.
2 Background and Methods: Details the theoretical foundation of CBIR, depth image projection, and the various algorithms for keypoint detection and feature description.
3 Implementation: Describes the C++ based test software (3DFeatureExtractor) utilizing PCL, covering data filtering, normal estimation, and file management.
4 Optimization and Analysis: Analyzes the critical parameters and filters required for 3D feature processing, including a performance assessment of computing times.
5 Evaluation and Results: Presents the matching results of the 3D features on the TUMindoor dataset and compares them against existing 2D and hybrid approaches.
6 Summary and Outlook for Future Work: Concludes that current 3D feature descriptors are limited in this scenario and suggests future research into object-relation-based descriptions.
Keywords
Content-Based Image Retrieval, CBIR, Indoor Location Recognition, 3D Shape Features, Range Images, Point Cloud Library, PCL, NARF, ISS, HARRIS3D, SHOT, Spin Images, Computer Vision, TUMindoor, Feature Extraction
Frequently Asked Questions
What is the primary focus of this thesis?
The work investigates the suitability of 3D shape features for indoor location recognition, aiming to improve navigation in environments where GPS is unavailable.
What are the key technical themes?
The study centers on image processing techniques including keypoint detection, local feature description, point cloud projection, and CBIR matching strategies.
What is the core research question?
The research asks if state-of-the-art 3D feature detectors and descriptors can reliably identify indoor locations from range images compared to traditional 2D RGB-based methods.
Which scientific methods are applied?
The author implements a C++ test program using the Point Cloud Library (PCL) to analyze various detectors (NARF, ISS, HARRIS3D) and descriptors (SHOT, USC, Spin Images) on a specific dataset.
What does the main part cover?
The main part encompasses the implementation of the feature extraction pipeline, a rigorous analysis of parameter tuning, and an evaluation of retrieval performance on the TUMindoor dataset.
Which keywords best characterize this work?
Core keywords include CBIR, 3D Shape Features, Point Cloud, Indoor Location Recognition, NARF, ISS, and HARRIS3D.
How does the author evaluate the "border estimation" in ISS?
The author concludes that border estimation is highly recommended for obtaining stable and repeatable keypoints, as it helps exclude points that are merely dependent on the viewing angle.
What is the main finding regarding the utility of 3D features for this scenario?
The study finds that 3D features, as implemented, are generally inferior to 2D-based descriptors (like SURF) for the TUMindoor scenario, largely due to insufficient point cloud resolution and the monotonous nature of the corridors.
Quote paper: Konrad Vowinckel (Author), 2014, Indoor Location Retrieval with Depth Images using 3D Shape Features, Munich, GRIN Verlag, https://www.hausarbeiten.de/document/280435