Massively Parallel Processing (MPP) Component
Massively Parallel Processing (MPP) improves performance when querying large datasets stored in blob storage containers such as Amazon S3, Azure Blob Storage, or Google Cloud Storage. Queries against these sources can be slow because individual files have to be processed sequentially. MPP instead distributes query execution across multiple nodes, which significantly improves query performance on large-scale datasets.
Key Features
MPP offers a high-performance, distributed SQL query engine optimized for big data processing. It efficiently queries structured and semi-structured data stored in cloud object storage and distributed file systems, without the need for external query acceleration tools. The main benefits of MPP are:
Unified Data Access: You can query large datasets stored in blob storage directly from CData Virtuality, reducing the need for external compute resources.
Improved Query Performance: The MPP architecture parallelizes data processing across multiple nodes, significantly reducing query times.
Cost Efficiency: No need to use third-party query accelerators or deploy additional infrastructure for large-scale queries.
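The difference between sequential and parallel file processing described above can be illustrated with a small Python sketch. This is a toy model, not how CData Virtuality or Trino is implemented: the file names, the in-memory FILES dictionary, and the functions scan_file, merge, query_sequential, and query_parallel are all hypothetical, and local threads stand in for the separate worker nodes of a real MPP cluster. It only shows the general pattern of scanning files independently and merging the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for files in a blob storage container:
# each "file" is a list of (region, amount) rows.
FILES = {
    "sales/part-0.parquet": [("EU", 10), ("US", 5)],
    "sales/part-1.parquet": [("EU", 7), ("APAC", 3)],
    "sales/part-2.parquet": [("US", 8), ("EU", 1)],
}


def scan_file(path):
    """Worker step: aggregate one file into partial sums per region."""
    partial = {}
    for region, amount in FILES[path]:
        partial[region] = partial.get(region, 0) + amount
    return partial


def merge(partials):
    """Coordinator step: combine the partial results from all workers."""
    total = {}
    for partial in partials:
        for region, amount in partial.items():
            total[region] = total.get(region, 0) + amount
    return total


def query_sequential(paths):
    """Process the files one after another."""
    return merge(scan_file(p) for p in paths)


def query_parallel(paths, workers=3):
    """Process the files concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return merge(pool.map(scan_file, paths))


if __name__ == "__main__":
    paths = list(FILES)
    print(query_sequential(paths))
    print(query_parallel(paths))
```

Both functions return the same totals per region. The parallel version differs only in that the per-file scans do not wait for each other, which is where the reduction in query time comes from when each scan is expensive.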
Core Components of CData Virtuality MPP
Embedded Trino Engine
Uses Trino (formerly PrestoSQL) as the distributed query engine.
Supports efficient querying of large-scale datasets.
Optimized for handling object storage file formats.
Supported Storage Solutions
The MPP engine can query data managed by the following storage solutions:
Hive
Apache Iceberg
Delta Lake
Supported File Systems
CData Virtuality MPP can read data from the following widely used file systems and object stores:
Amazon S3
Azure Data Lake Storage (ADLS)
Google Cloud Storage
Hadoop Distributed File System (HDFS)
Supported File Formats
To ensure broad compatibility and efficient data processing, the engine supports:
Parquet
ORC
Avro
For instructions on how to install and configure MPP, please refer to this section of the Administration Guide. For details on how to create an MPP data source, please see the subpage.