The Elf Approach
News
Visit our talk at the HardBD workshop: "Selective Caching: A Persistent Memory Approach for Multi-Dimensional Index Structures" at ICDE 2020 in Dallas!
Our article "Efficient evaluation of multi-column selection predicates in main-memory" has been published in the TKDE journal, 31(7):1296–1311, July 2019.
Description
Evaluating selection predicates is a data-intensive task that reduces intermediate results, which will be the input for further operations such as joins. With analytical queries getting more and more complex, the number of evaluated selection predicates per query rises as well. This leads to numerous multi-column selection predicates. Recent approaches to increase the performance of main-memory databases for selection-predicate evaluation aim at optimally exploiting the speed of the CPU by using accelerated scans. However, scanning each column one by one leaves tuning opportunities open that arise if all predicates are considered together. To this end, we introduce Elf, a storage structure that is able to exploit the relation between several selection predicates. Our Elf features cache sensitivity, an optimized storage layout, fixed search paths, and slight data compression. In our evaluation, we compare its query performance to two state-of-the-art approaches and a sequential scan using the concept of single instruction multiple data (SIMD). Our results indicate a clear superiority of our approach. For TPC-H queries with multi-column selection predicates, we achieve a speed-up between factor five and two orders of magnitude, mainly depending on the selectivity of the predicates.
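To make the baseline concrete, the following is a minimal sketch of the column-by-column evaluation that the description contrasts Elf with. It is not taken from the Elf source code: the function name `column_at_a_time`, the toy table, and the predicate values are all assumptions made for illustration. Each predicate is checked against its own column, and only the surviving row positions are passed on to the next column.

```python
from typing import Dict, List, Tuple

# Hypothetical column store: column name -> list of values (one entry per row).
Table = Dict[str, List[int]]
# A range predicate on one column: (column, lower bound, upper bound), inclusive.
Predicate = Tuple[str, int, int]


def column_at_a_time(table: Table, predicates: List[Predicate]) -> List[int]:
    """Evaluate a conjunction of range predicates one column after another.

    The first predicate scans its whole column; every later predicate only
    re-checks the row positions that survived the previous predicates.
    """
    n_rows = len(next(iter(table.values())))
    candidates = list(range(n_rows))
    for column, low, high in predicates:
        values = table[column]
        candidates = [pos for pos in candidates if low <= values[pos] <= high]
    return candidates


# Made-up example data, loosely shaped like a three-predicate analytical query.
table = {
    "quantity": [5, 30, 12, 24, 8],
    "discount": [6, 2, 5, 7, 5],
    "shipyear": [1994, 1994, 1995, 1994, 1994],
}
predicates = [("quantity", 0, 23), ("discount", 5, 7), ("shipyear", 1994, 1994)]
print(column_at_a_time(table, predicates))  # positions of rows matching all predicates
```

In this sketch the first predicate always pays for a full column scan, regardless of how selective the combination of all three predicates is. That is the tuning opportunity the description refers to.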
Persons
David Broneske (University of Magdeburg)
Veit Köppen (University of Magdeburg)
Gunter Saake (University of Magdeburg)
Martin Schäler (Karlsruhe Institute of Technology)
Materials
The source code for the ICDE 2017 Paper is available at: http://git.iti.cs.ovgu.de/dbronesk/ICDE-elf
Required Libraries: Boost, CMake
Test data can be found in the source code repository above in the folder OP (TPC-H with scale factor 1).
Awards
- Our paper "Accelerating Multi-Column Selection Predicates in Main-Memory - The Elf Approach" has been awarded the FIN Forschungspreis of the University of Magdeburg (PDF)
Publications
Muhammad Attahir Jibril, Philipp Götze, David Broneske, and Kai-Uwe Sattler. Selective Caching: A Persistent Memory Approach for Multi-Dimensional Index Structures. HardBD&Active'20@ICDE.
Since the proposal of Persistent Memory, research has focused on tuning a variety of data management problems to the inherent properties of Persistent Memory – namely persistence but also compromised read/write performance. These properties particularly affect the performance of index structures, since they are subject to frequent updates and queries. Nevertheless, the main research focuses on adapting B-Trees and their derivatives to Persistent Memory properties, aiming to reach DRAM processing speed while exploiting the persistence property of Persistent Memory. However, most of the techniques found for B-Trees are not directly applicable to other tree-based index structures or even multi-dimensional index structures.
To exploit Persistent Memory properties for arbitrary index structures, we propose selective caching. It is based on a mixture of dynamic and static caching of tree nodes in DRAM to reach near-DRAM access speeds for index structures. In this paper, we investigate the opportunities as well as limitations of selective caching on the OLAP-optimized main-memory index structure Elf. Our experiments show that selective caching is keeping up with pure DRAM storage of Elf while guaranteeing persistence.
David Broneske. Accelerating mono and multi-column selection predicates in modern main-memory database systems. PhD thesis, University of Magdeburg, May 2019.
Database system engineers have always strived for peak performance of their database operators. However, this goal is a major endeavor since database operators are influenced not only by the hardware (i.e., the executing processor or the memory hierarchy), but also by the workload (i.e., data distribution, selectivity, etc.). Especially in today’s world of main-memory database systems, new processing capabilities (e.g., advanced vector instructions such as AVX-512), new storage devices (e.g., Intel Optane as non-volatile RAM), and new diverse applications for data management (e.g., in-database machine learning) are frequently introduced and become essential impact factors. Hence, a once optimal operator has to be frequently adapted to newly arising hardware and workloads.
A typical database operator that is frequently tuned by researchers is the selection operator, because selections are essential to reduce the load of subsequent operators and are usually one of the first operators that are executed in a query plan. Hence, a selection is working on the full amount of data – a fact that emphasizes the importance of tuning this data-intensive operator to avoid a serious bottleneck.
Although the selection operator is frequently tuned for the use cases mentioned above, there is no comprehensive and holistic way to tune this operator automatically. Furthermore, considering multiple selections on the same table, straightforward implementations use candidate scans for several selection predicates. However, exploiting the interdependence of these predicates and, hence, their high combined selectivity has not been investigated so far. In this thesis, we tackle the aforementioned challenges of (1) creating hardware-sensitive operator implementations automatically and (2) exploiting the relation between multiple selection predicates.
For solving the first challenge, we investigate the commonalities of different optimizations for arbitrary hardware and workloads on the example of the selection operator. As a result, we introduce the abstraction of code optimizations as a means to generate hardware-sensitive code variants automatically. The solution is completed by the concept of a tuning framework for operators in main-memory database systems.
As a solution for the second challenge, we propose to revive multi-dimensional index structures as a means to exploit the relation between selection predicates on several columns in main-memory database management systems. In order to allow for hardware-sensitivity – especially cache consciousness – we propose our main-memory index structure Elf. Elf is a tree structure combining prefix-redundancy elimination with an optimized memory layout explicitly designed for efficient main-memory access. Our experiments show that Elf is able to outperform several highly-potent baselines (including generated hardware-sensitive scans and state-of-the-art multi-dimensional index structures) by several orders of magnitude for reasonable selection predicates and queries from the standard OLAP benchmark TPC-H. However, our evaluation also identifies that an integration into the query engine of the main-memory database system MonetDB does not only show strengths but also limitations that any sort-based index structure is faced with.
Overall, the resulting approaches can be used in future query engines to form a Swiss army knife for arbitrary selection predicates. Hence, our contribution enriches a query engine far beyond current state-of-the-art-approaches by allowing for efficient execution of a single predicate (i.e., mono-column selection predicates) at bare-metal speed as well as exploiting the combined selectivity of several predicates (i.e., multi-column selection predicates).
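The prefix-redundancy elimination that the thesis abstract attributes to Elf can be illustrated with a small sketch. This is a simplified nested-dictionary tree written for this page, not the actual Elf implementation: it ignores the optimized memory layout entirely, and the names `build_prefix_tree` and `range_query` as well as the example rows are assumptions. Each tree level corresponds to one column, rows with a common prefix of column values share a path, and a multi-column range query prunes whole subtrees as soon as one column value falls outside its range.

```python
from typing import List, Tuple

Row = Tuple[int, ...]


def build_prefix_tree(rows: List[Row]) -> dict:
    """Build a nested-dict tree with one level per column.

    Rows that agree on their first k column values share the first k levels,
    so a common prefix is stored only once. The last level maps the final
    column value to the list of matching row ids.
    """
    root: dict = {}
    for row_id, row in enumerate(rows):
        node = root
        for value in row[:-1]:
            node = node.setdefault(value, {})
        node.setdefault(row[-1], []).append(row_id)
    return root


def range_query(node: dict, ranges: List[Tuple[int, int]], level: int = 0) -> List[int]:
    """Return ids of rows whose value in every column lies in its inclusive range."""
    low, high = ranges[level]
    last_level = level == len(ranges) - 1
    hits: List[int] = []
    # Sorting per visit keeps the sketch short; a real structure would store
    # the entries of a node in sorted order once, at build time.
    for value in sorted(node):
        if value < low:
            continue
        if value > high:
            break  # entries are visited in ascending order, so nothing further matches
        if last_level:
            hits.extend(node[value])
        else:
            hits.extend(range_query(node[value], ranges, level + 1))
    return hits


# Made-up rows with three columns; rows 0 to 2 share the first column value.
rows = [(1, 10, 100), (1, 10, 200), (1, 20, 100), (2, 10, 100)]
tree = build_prefix_tree(rows)
print(range_query(tree, [(1, 1), (10, 20), (100, 150)]))
```

Because all predicates are evaluated along a single root-to-leaf traversal, a mismatch in an early column skips every row below that node without touching the remaining columns.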
David Broneske, Veit Köppen, Gunter Saake, and Martin Schäler. Efficient evaluation of multi-column selection predicates in main-memory. Transactions on Knowledge and Data Engineering, 31(7):1296–1311, July 2019.
David Broneske, Veit Köppen, Gunter Saake, and Martin Schäler. Accelerating multi-column selection predicates in main-memory – the Elf approach. In IEEE International Conference on Data Engineering (ICDE), pages 647–658, 2017. (PDF)
Veit Köppen, David Broneske, Gunter Saake, and Martin Schäler. Elf: A Main-Memory Structure for Efficient Multi-Dimensional Range and Partial Match Queries. Technical Report 002-2015, Otto-von-Guericke-University Magdeburg, Magdeburg, 2015.
Jonas Schneider. Analytic Performance Model of a Main-Memory Index Structure. In CoRR, arXiv.org, 2016.