Please wait a minute...
Big Data Mining and Analytics  2021, Vol. 4 Issue (4): 252-265    DOI: 10.26599/BDMA.2021.9020009
    
LotusSQL: SQL Engine for High-Performance Big Data Systems
Xiaohan Li(),Bowen Yu(),Guanyu Feng(),Haojie Wang(),Wenguang Chen*()
Department of Computer Science and Technology, Tsinghua University, China
Download: PDF (2651 KB)      HTML  
Export: BibTeX | EndNote (RIS)      

Abstract  

In recent years, Apache Spark has become the de facto standard for big data processing. SparkSQL is a module offering support for relational analysis on Spark with Structured Query Language (SQL). SparkSQL provides convenient data processing interfaces. Despite its efficient optimizer, SparkSQL still suffers from the inefficiency of Spark resulting from Java virtual machine and the unnecessary data serialization and deserialization. Adopting native languages such as C++ could help to avoid such bottlenecks. Benefiting from a bare-metal runtime environment and template usage, systems with C++ interfaces usually achieve superior performance. However, the complexity of native languages also increases the required programming and debugging efforts. In this work, we present LotusSQL, an engine to provide SQL support for dataset abstraction on a native backend Lotus. We employ a convenient SQL processing framework to deal with frontend jobs. Advanced query optimization technologies are added to improve the quality of execution plans. Above the storage design and user interface of the compute engine, LotusSQL implements a set of structured dataset operations with high efficiency and integrates them with the frontend. Evaluation results show that LotusSQL achieves a speedup of up to 9× in certain queries and outperforms Spark SQL in a standard query benchmark by more than 2× on average.



Key wordsbig data      C++      Structured Query Language (SQL)      query optimization     
Received: 11 May 2021      Published: 30 August 2021
Corresponding Authors: Wenguang Chen     E-mail: xh-li18@mails.tsinghua.edu.cn;yubw15@mails.tsinghua.edu.cn;fgy18@mails.tsinghua.edu.cn;wang-hj18@mails.tsinghua.edu.cn;cwg@tsinghua.edu.cn
About author: Xiaohan Li received the bachelor degree from Tsinghua University in 2018. She is currently pursuing the master degree at Tsinghua University. Her research interests include parallel computing and big data systems.|Bowen Yu received the bachelor degree from Northwestern Polytechnical University in 2015. He is currently pursuing the PhD degree at Tsinghua University. His research focuses on big data systems.|Guanyu Feng received the bachelor degree from Tsinghua University in 2018. He is currently pursuing the PhD degree at Tsinghua University. His research interests include graph processing, graph database, and streaming system.|Haojie Wang received the bachelor degree from Tsinghua University in 2015. He is currently pursuing the PhD degree at Tsinghua University. His research interests include compiler, program analysis, and AI compiler.|Wenguang Chen currently is a professor at Tsinghua University. He received the bachelor and PhD degrees in computer science from Tsinghua University in 1995 and 2000, respectively. His research focuses on parallel and distributed systems and programming systems.
Cite this article:

Xiaohan Li,Bowen Yu,Guanyu Feng,Haojie Wang,Wenguang Chen. LotusSQL: SQL Engine for High-Performance Big Data Systems. Big Data Mining and Analytics, 2021, 4(4): 252-265.

URL:

http://bigdata.tsinghuajournals.com/10.26599/BDMA.2021.9020009     OR     http://bigdata.tsinghuajournals.com/Y2021/V4/I4/252

Fig. 1 Workflow overview.
Fig. 2 Execution plans for TPC-H Q3.
LogicalOpPhysicalOpDescription
TableScanLotusTableScanRead a table (dataset) from the file system.
FilterLotusFilterFilter a table by given condition.
ProjectLotusSelectSelect some columns from a table.
LotusMapMap table rows by given expression.
AggregateLotusAggregateAggregate all rows by given function.
LotusHash AggregateAggregate rows by given group and function via HashMap.
JoinLotusCartesian ProductCalculate cartesian product of two tables.
LotusBroadcast HashJoinJoin two tables via broadcasting one to the other and HashMap.
LotusShuffle HashJoinJoin two tables via re-partitioning tables and using HashMap.
SortLotusSortSort all rows by given reference key and direction.
LotusTopKFind top-k rows by given reference key and direction.
Table 1 Operator list.
Fig. 3 Calcite decorrelation.
Fig. 4 Decorrelation example.
Join TypeRowCount upper bound
InnerJoin, RightJoinRTable:RowCount
OuterJoin, LeftJoinLTable.RowCount+RTable.RowCount-1
SemiJoin, AntiJoinLTable:RowCount
Table 2 RowCount estimation for join.
ItemDiscription
CPUIntel(R) Xeon(R) E5-2680 v4
Frequency2.40 GHz
Pyhsical Cores28
Virtual Cores56
NUMA Nodes2
Operating SystemUbuntu 16.04.10
Main Memory512 GB
Disk6 TB NVMe SSD
Table 3 Experiment environment.
ItemDiscription
Spark Version3.0.1
Hadoop Version2.7
Java Version11.0.9
Scala Version2.12.10
Executors8
Executor Cores7
Table 4 Spark configuration.
Fig. 5 TPC-H computing time.
Fig. 6 TPC-H memory usage.
Fig. 7 System hierarchy.
[1]   Apache Hadoop, Apache hadoop, , 2021.
[2]   Ekanayake J., Li H., Zhang B. J., Gunarathne T., Bae S. H., Qiu J., and Fox G., Twister: A runtime for iterative mapreduce, in Proc. 19th ACM Int. Symp. on High Performance Distributed Computing, Chicago, IL, USA, 2010, pp. 810-818.
[3]   Bu Y. Y., Howe B., Balazinska M., and Ernst M. D., HaLoop: Efficient iterative data processing on large clusters, Proc. VLDB Endowm., vol. 3, nos. 1&2, pp. 285-296, 2010.
[4]   Zaharia M., Chowdhury M., Das T., Dave A., Ma J., McCauly M., Franklin M. J., Shenker S., and Stoica I., Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing, in Proc. 9th USENIX Conf. on Networked Systems Design and Implementation, Berkeley, CA, USA, 2012, pp. 15-28.
[5]   Yang F., Li J. F., and Cheng J., Husky: Towards a more efficient and expressive distributed computing framework, Proc. VLDB Endowm., vol. 9, no. 5, pp. 420-431, 2016.
[6]   Carbone P., Katsifodimos A., Ewen S., Markl V., Haridi S., and Tzoumas K., Apache flinkTM: Stream and batch processing in a single engine, Bull. IEEE Comput. Soc. Tech. Comm. Data Eng., vol. 36, no. 4, pp. 28-38, 2015.
[7]   Armbrust M., Xin R. S., Lian C., Huai Y., Liu D., Bradley J. K., Meng X. R., Kaftan T., Franklin M. J., Ghodsi A., et al., SparkSQL: Relational data processing in spark, in Proc. 2015 ACM SIGMOD Int. Conf. on Management of Data, Victoria, Australia, 2015, pp. 1383-1394.
[8]   Anderson M., Smith S., Sundaram N., Capot M.? Zhao Z. G., Dulloor S., Satish N., and Willke T. L., Bridging the gap between HPC and big data frameworks, Proc. VLDB Endowme., vol. 10, no. 8, pp. 901-912, 2017.
[9]   Essertel G. M., Tahboub R. Y., Decker J. M., Brown K. J., Olukotun K., and Rompf T., Flare: Optimizing apache spark with native compilation for scale-up architectures and medium-size data, in Proc. of the 13th USENIX Conf. on Operating Systems Design and Implementation, Berkeley, CA, USA, 2018, pp. 799-815.
[10]   Lu L., Shi X. H., Zhou Y. L., Zhang X., Jin H., Pei C., He L. G., and Geng Y. Z., Lifetime-based memory management for distributed data processing systems, Proc. VLDB Endowm., vol. 9, no. 12, pp. 936-947, 2016.
[11]   Navasca C., Cai C., Nguyen K., Demsky B., Lu S., Kim M., and Xu G. H., Gerenuk: Thin computation over big native data using speculative program transformation, in Proc. 27th ACM Symp. on Operating Systems Principles, Ontario, Canada, 2019, pp. 538-553.
[12]   Arnold J., Glavic B., and Raicu I., A high-performance distributed relational database system for scalable OLAP processing, in 2019 IEEE Int. Parallel and Distributed Processing Symp. (IPDPS), Rio de Janeiro, Brazil, 2019, pp. 738-748.
[13]   Bingmann T., Axtmann M., J?bstl E., Lamm S., Nguyen H. C., Noe A., Schlag S., Stumpp M., Sturm T., and Sanders P., Thrill: High-performance algorithmic distributed batch data processing with C++, in 2016 IEEE Int. Conf. on Big Data (Big Data), Washington, DC, USA, 2016, pp. 172-183.
[14]   Begoli E., Camacho-Rodríguez J., Hyde J., Mior M. J., and Lemire D., Apache calcite: A foundational framework for optimized query processing over heterogeneous data sources, in Proc. 2018 Int. Conf. on Management of Data, Houston, TX, USA, 2018, pp. 221-230.
[15]   Graefe G., and McKenna W. J., The volcano optimizer generator: extensibility and efficient search, in Proc. IEEE 9th Int. Conf. on Data Engineering, 1993, Vienna, Austria, pp. 209-218.
[16]   Graefe G., The cascades framework for query optimization, Data Eng. Bull., vol. 18, no. 3, pp. 19-29, 1995.
[17]   Neumann T., Efficiently compiling efficient query plans for modern hardware, Proc. VLDB Endowm., vol. 4, no. 9, pp. 539-550, 2011.
[18]   Wikipedia, De Morgan’s, , 2021.
[19]   The Transaction Processing Performance Council, TPC-H vesion 2 and version 3, , 2021.
[20]   Ghemawat S., Gobioff H., and Leung S. T., The Google file system, in Proc. 19th ACM Symp. on Operating Systems Principles, Bolton Landing, NY, USA, 2003, pp. 29-43.
[21]   Dean J. and Ghemawat S., MapReduce: Simplified data processing on large clusters, in 6th Symp. on Operating System Design and Implementation (OSDI 2004), San Francisco, CA, USA, 2004, pp. 137-150.
[22]   Shvachko K., Kuang H. R., Radia S., and Chansler R., The Hadoop distributed file system, in 2010 IEEE 26th Symp. on Mass Storage Systems and Technologies (Msst), Incline Village, NV, USA, 2010, pp. 1-10.
[23]   Swarna C. and Ansari Z., Apache pig-A data flow framework based on Hadoop map reduce, IJETT J., vol. 50, no. 5, pp. 271-275, 2017.
[24]   Thusoo A., Sarma J. S., Jain N., Shao Z., Chakka P., Zhang N., Antony S., Liu H., and Murthy R., Hive-A petabyte scale data warehouse using Hadoop, in 2010 IEEE 26th Int. Conf. on Data Engineering (ICDE 2010), Long Beach, CA, USA, 2010, pp. 996-1005.
[25]   Kornacker M., Behm A., Bittorf V., Bobrovytsky T., Ching C., Choi A., Erickson J., Grund M., Hecht D., Jacobs M., et al., Impala: A modern, open-source SQL engine for Hadoop, presented at 7th Biennial Conf. on Innovative Data Systems Research (CIDR’15), Asilomar, CA, USA, 2015.
[26]   Xin R. S., Rosen J., Zaharia M., Franklin M. J., Shenker S., and Stoica I., Shark: SQL and rich analytics at scale, in Proc. 2013 ACM SIGMOD Int. Conf. on Management of Data, New York, NY, USA, 2013, pp. 13-24.
[27]   Behm A., Borkar V. R., Carey M. J., Grover R., Li C., Onose N., Vernica R., Deutsch A., Papakonstantinou Y., and Tsotras V. J., ASTERIX: Towards a scalable, semistructured data platform for evolving-world models, Distrib. Parallel Databases, vol. 29, no. 3, pp. 185-216, 2011.
[28]   Alexandrov A., Bergmann R., Ewen S., Freytag J. C., Hueske F., Heise A., Kao O., Leich M., Leser U., Markl V., et al., The stratosphere platform for big data analytics, VLDB J., vol. 23, no. 6, pp. 939-964, 2014.
[29]   Crotty A., Galakatos A., Dursun K., Kraska T., Cetintemel U., and Zdonik S., Tupleware: “Big” Data, Big Analytics, Small Clusters, presented at 7th Biennial Conf. on Innovative Data Systems Research (CIDR 2015), Asilomar, CA, USA, 2015.
[30]   Chaiken R., Jenkins B., Larson P. ?., Ramsey B., Shakib D., Weaver S., and Zhou J. R., SCOPE: Easy and efficient parallel processing of massive data sets, Proc. VLDB Endowm., vol. 1, no. 2, pp. 1265-1276, 2008.
[31]   Lorie R. A., XRM-An Extended (N-ary) Relational Memory. Yorktown Heights: IBM, 1974.
[32]   Kemper A. and Neumann T., HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots, in 2011 IEEE 27th Int. Conf. on Data Engineering, Hannover, Germany, 2011, pp. 195-206.
[33]   McSherry F., Isard M., and Murray D. G., Scalability! But at what COST? presented at 15th Workshop on Hot Topics in Operating Systems (HotOS XV), Kartause Ittingen, Switzerland, 2015.
[1] Yuchen Zhang,Xiujuan Lei,Zengqiang Fang,Yi Pan. CircRNA-Disease Associations Prediction Based on Metapath2vec++ and Matrix Factorization[J]. Big Data Mining and Analytics, 2020, 3(4): 280-291.
[2] Guanlin Zhai,Yan Yang,Heng Wang,Shengdong Du. Multi-Attention Fusion Modeling for Sentiment Analysis of Educational Big Data[J]. Big Data Mining and Analytics, 2020, 3(4): 311-319.
[3] Mohammad Sultan Mahmud, Joshua Zhexue Huang, Salman Salloum, Tamer Z. Emara, Kuanishbay Sadatdiynov. A Survey of Data Partitioning and Sampling Methods to Support Big Data Analysis[J]. Big Data Mining and Analytics, 2020, 3(2): 85-101.
[4] Mingda Li, Hongzhi Wang, Jianzhong Li. Mining Conditional Functional Dependency Rules on Big Data[J]. Big Data Mining and Analytics, 2020, 03(01): 68-84.
[5] Sunil Kumar, Maninder Singh. A Novel Clustering Technique for Efficient Clustering of Big Data in Hadoop Ecosystem[J]. Big Data Mining and Analytics, 2019, 2(4): 240-247.
[6] Thosini Bamunu Mudiyanselage, Yanqing Zhang. Feature Selection with Graph Mining Technology[J]. Big Data Mining and Analytics, 2019, 2(2): 73-82.
[7] Sunil Kumar, Maninder Singh. Big Data Analytics for Healthcare Industry: Impact, Applications, and Tools[J]. Big Data Mining and Analytics, 2019, 2(1): 48-57.
[8] Baojun Zhou, Jie Li, Xiaoyan Wang, Yu Gu, Li Xu, Yongqiang Hu, Lihua Zhu. Online Internet Traffic Monitoring System Using Spark Streaming[J]. Big Data Mining and Analytics, 2018, 1(1): 47-56.
[9] Xuedi Qin, Yuyu Luo, Nan Tang, Guoliang Li. DeepEye: An Automatic Big Data Visualization Framework[J]. Big Data Mining and Analytics, 2018, 1(1): 75-82.
[10] Rossella Arcucci, Christopher Pain, Yi-Ke Guo. Effective Variational Data Assimilation in Air-Pollution Prediction[J]. Big Data Mining and Analytics, 2018, 01(04): 297-307.
[11] Qianyu Meng, Kun Wang, Xiaoming He, Minyi Guo. QoE-Driven Big Data Management in Pervasive Edge Computing Environment[J]. Big Data Mining and Analytics, 2018, 01(03): 222-233.
[12] Yan Yang, Hao Wang. Multi-view Clustering: A Survey[J]. Big Data Mining and Analytics, 2018, 01(02): 83-107.
[13] Ling Hu, Qiang Ni, Feng Yuan. Big Data Oriented Novel Background Subtraction Algorithm for Urban Surveillance Systems[J]. Big Data Mining and Analytics, 2018, 01(02): 137-145.