Home
People
Publications
|
![]() |
Refereed International Conference PublicationsSPID-Join: A Skew-resistant Processing-in-DIMM Join Algorithm Exploiting the Bank- and Rank-level Parallelisms of DIMMs [abstract] (ACM)
Recent advances in Dual In-line Memory Modules (DIMMs) allow DIMMs to support Processing-In-DIMM (PID) by placing In-DIMM Processors (IDPs) near their memory banks. Prior studies have shown that in-memory joins can benefit from PID by offloading their operations onto the IDPs and exploiting the high internal memory bandwidth of DIMMs. Aimed at evenly balancing the computational loads between the IDPs, the existing algorithms perform IDP-wise global partitioning on input tables and then make each IDP process a partition of the input tables. Unfortunately, we find that the existing PID join algorithms achieve low performance and scalability with skewed input tables. With skewed input tables, the IDP-wise global partitioning incurs imbalanced loads between the IDPs, making the IDPs remain idle until the heaviest-load IDP completes processing its partition. To fully exploit the IDPs for accelerating in-memory joins involving skewed input tables, therefore, we need a new PID join algorithm which achieves high skew resistance by mitigating the imbalanced inter-IDP loads. In this paper, we present SPID-Join, a skew-resistant PID join algorithm which exploits two parallelisms inherent in DIMM architectures, namely bank- and rank-level parallelisms. By replicating join keys across the banks within a rank and across ranks, SPID-Join significantly increases the internal memory bandwidth and computational throughput allocated to each join key, improving the load balance between the IDPs and accelerating join executions. SPID-Join exploits the bank- and the rank-level parallelisms to minimize join key replication overheads and support a wider range of join key replication ratios. Despite achieving high skew resistance, SPID-Join exhibits a trade-off between the join key replication ratio and the join execution latency, making the best-performing join key replication ratio depend on join and PID system configurations. We, therefore, augment SPID-Join with a cost model which identifies the best-performing join key replication ratio for given join and PID system configurations. By accurately modeling and scaling the IDPs' throughput and the inter-IDP communication bandwidth, the cost model accurately captures the impact of the join key replication ratio on SPID-Join. Our experimental results using eight UPMEM DIMMs, which collectively provide a total of 1,024 IDPs, show that SPID-Join achieves up to 10.38x faster join executions over PID-Join, the state-of-the-art PID join algorithm, with highly skewed input tables.
|