Low-Overhead Online Configuration Auto-Tuning of Spark SQL Applications

Date: 14-06-2022

Spark SQL is a Spark module for structured data processing that has been widely deployed in industry, but tuning its performance is challenging. Existing machine learning-based tuning methods are difficult to apply in practice because of their high time overhead and their failure to adapt to changes in the amount of data to be processed.

To address these problems, a research team led by Prof. YU Zhibin from the Shenzhen Institute of Advanced Technology (SIAT) of the Chinese Academy of Sciences proposed an automatic configuration optimization method with low time overhead, named Low-Overhead Online Configuration Auto-Tuning (LOCAT), which can significantly reduce the optimization time of state-of-the-art approaches and dramatically improve performance.

The result was published at SIGMOD 2022, which is a leading international forum for database researchers, practitioners, developers, and users. 

The researchers first designed query and configuration parameter sensitivity analysis techniques for LOCAT. Queries that are insensitive to configuration parameters are identified and removed from a given workload when training samples are collected.
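
As a rough illustration of the query filtering idea, the sketch below keeps only queries whose execution time changes noticeably across sampled configurations. The coefficient of variation as the sensitivity measure, the 0.1 threshold, the function names, and the runtimes are all assumptions made for this example; the article does not specify how LOCAT measures sensitivity.

```python
from statistics import mean, pstdev

def coefficient_of_variation(times):
    """Population standard deviation of the runtimes divided by their mean."""
    m = mean(times)
    return pstdev(times) / m if m else 0.0

def sensitive_queries(runtimes, threshold=0.1):
    """Return the queries whose runtime varies noticeably across configurations.

    `runtimes` maps a query name to its execution times (seconds) under
    several different configurations. The threshold is an assumed value.
    """
    return [q for q, t in runtimes.items()
            if coefficient_of_variation(t) >= threshold]

# Hypothetical measurements: each list holds one query's runtime under
# four different Spark SQL configurations.
runtimes = {
    "q1": [10.0, 10.2, 9.9, 10.1],   # barely reacts to configuration changes
    "q2": [30.0, 55.0, 42.0, 80.0],  # strongly configuration-dependent
    "q3": [5.0, 5.0, 5.1, 4.9],      # barely reacts to configuration changes
}
kept = sensitive_queries(runtimes)
print(kept)
```

Only the queries in `kept` would then be executed when collecting training samples, which is where the saving in sample collection time would come from.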

"For the remaining queries, LOCAT calculates correlation coefficients to identify important configuration parameters," said Prof. YU, "and then applies kernel principal component analysis to reduce the dimension of configuration parameter search."

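The parameter selection step can be pictured with a small ranking example. This is a minimal sketch that assumes the Pearson coefficient (the article only says "correlation coefficients") and uses made-up sample values; `pearson` and `rank_parameters` are hypothetical helper names. The kernel principal component analysis step is not shown here.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(vx * vy)

def rank_parameters(samples, times):
    """Order parameter names by the absolute correlation with runtime."""
    scores = {name: abs(pearson(vals, times)) for name, vals in samples.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical samples: the value of each parameter in four runs, and the
# measured runtime (seconds) of each run.
samples = {
    "spark.executor.cores": [2, 4, 2, 4],
    "spark.sql.shuffle.partitions": [50, 100, 200, 400],
}
times = [80.0, 60.0, 45.0, 30.0]

ranked = rank_parameters(samples, times)
print(ranked)
```

Parameters near the top of `ranked` would be kept for tuning, while those with a weak correlation could be left at their defaults.
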
Finally, LOCAT uses a Bayesian optimization that is aware of the dataset size to search for the optimal configuration, so that performance can be automatically optimized as the size of the dataset changes.
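
One common way to make Bayesian optimization aware of the dataset size is to feed the size to the surrogate model as an extra input, so that runs observed at other sizes still inform the search. The sketch below does this with a tiny Gaussian process and a lower-confidence-bound rule in pure Python. Whether LOCAT works exactly this way is not stated in this article; the kernel, the acquisition rule, all function names, and the synthetic runtime function `fake_run` are assumptions for illustration, with the configuration and the data size both scaled to [0, 1].

```python
import math

def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two points."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior(X, y, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a simple GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k = [rbf(a, x_new) for a in X]
    mean_y = sum(y) / len(y)  # constant prior mean taken from the data
    alpha = solve(K, [v - mean_y for v in y])
    mu = mean_y + sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve(K, k)
    var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mu, math.sqrt(var)

def tune(run, data_size, candidates, history, rounds=4, kappa=2.0):
    """Search `candidates` for a fast configuration at the given data size.

    `history` is a non-empty list of ((config, data_size), runtime) pairs,
    possibly measured at other data sizes; new observations are appended.
    """
    for _ in range(rounds):
        X = [p for p, _ in history]
        y = [t for _, t in history]
        untried = [c for c in candidates if (c, data_size) not in X]
        if not untried:
            break

        def lcb(c):
            # Lower confidence bound: favors low predicted runtime
            # and high uncertainty.
            mu, sd = gp_posterior(X, y, (c, data_size))
            return mu - kappa * sd

        nxt = min(untried, key=lcb)
        history.append(((nxt, data_size), run(nxt, data_size)))
    observed = [(t, p[0]) for p, t in history if p[1] == data_size]
    return min(observed)[1]

def fake_run(c, s):
    """Synthetic runtime whose best configuration shifts with data size."""
    return (c - (0.2 + 0.5 * s)) ** 2

candidates = [i / 10 for i in range(11)]
# Earlier observations taken at two other data sizes.
history = [((0.2, 0.2), fake_run(0.2, 0.2)), ((0.8, 1.0), fake_run(0.8, 1.0))]
best = tune(fake_run, 0.6, candidates, history)
print(best)
```

Because the data size is part of the model input, observations gathered at one size are reused when the size changes, rather than restarting the search from scratch.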

Experimental results on an ARM cluster showed that LOCAT accelerated the optimization procedures of state-of-the-art approaches by at least 4.1x and up to 9.7x. Moreover, LOCAT improved application performance by at least 1.9x and up to 2.4x. On an x86 cluster, LOCAT showed similar results to those on the ARM cluster.

Figure. An Overview of LOCAT. (Image by Prof. YU Zhibin)

Media Contact:
ZHANG Xiaomin