Abstract
Indexing of k-dimensional point data is becoming again a hot research topic because of the need to efficiently index and retrieve high dimensional vectors (points) in data mining applications. The most common query on such vectors is kNN searching, which is a variation of range searching. Most multidimensional indexes for point data follow the paradigm of the ubiquitous B+tree and store data entries at the leaf level of the index (data nodes). Since this level naturally occupies the majority of nodes in a multidimensional index tree, it is crucial that an index structure achieves the best possible average storage utilization regardless of data distribution and order of data insertion. An additional conflicting goal is the minimization of the index term that is posted at the levels above when data nodes are split. In this paper we revisit data node splitting techniques for point access methods like the KDB-tree, hB-tree, and, in general, any index that stores point data at its leaf level nodes and splits them so that no overlapping subspaces are created at the leaf level. We experiment with various splitting techniques that produce the minimum index term for posting but differ in the shape of the resulting nodes and the average storage utilization. We also test our splitting techniques using uniform and skewed data distributions. The comparison is on the average data node storage utilization and the efficiency of range query searches.
Keywords
Indexing of k-dimensional point data, Point Access Methods
References
Berchtold, S., Böhm, C., and Kriegel, H.P. (1998). The Pyramid-Technique: Towards Breaking the Curse of Dimensionality. In Proc SIGMOD, pp. 142- 153.
Comer, D. (1979). The Ubiquitous B-Tree. ACM Comput. Surv. (CSUR), 11(2), pp. 121-137.
Evangelidis, G., Lomet, D.B., and Salzberg, B. (1997). The hB-Tree: A Multi-Attribute Index Supporting Concurrency, Recovery and Node Consolidation. VLDB J., 6(1), pp. 1-25.
Lomet, D.B. and Salzberg, B. (1990). The hB-Tree: A Multiattribute Indexing Method with Good Guaranteed Performance. ACM Trans. Database Syst. (TODS), 15(4), pp. 625-658.
Robinson, J.T. (1981). The K-D-B-Tree: A Search Structure For Large Multidimensional Dynamic Indexes. In Proc SIGMOD, pp. 10-18.