Massively Parallel Processing (MPP) Component

Massively Parallel Processing (MPP) improves performance when querying large datasets stored in blob storage containers, such as Amazon S3, Azure Blob Storage, or Google Cloud Storage. Queries against these sources can be slow because individual files have to be processed sequentially. MPP instead distributes query execution across multiple nodes, which significantly improves query performance on large-scale datasets.

Key Features

MPP provides a high-performance, distributed SQL query engine optimized for big data processing. It efficiently queries structured and semi-structured data stored in cloud object storage and distributed file systems, without the need for external query acceleration tools. The main benefits of MPP are:

  • Unified Data Access: You can query large datasets stored in blob storage directly from CData Virtuality, reducing the need for external compute resources.

  • Improved Query Performance: MPP architecture allows for parallelized data processing across multiple nodes, significantly reducing query times.

  • Cost Efficiency: No need to use third-party query accelerators or deploy additional infrastructure for large-scale queries.
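
The difference between sequential and parallelized file processing can be illustrated with a minimal Python sketch. This is only a conceptual illustration under stated assumptions, not how CData Virtuality or Trino is implemented: the in-memory FILES dictionary stands in for files in a blob storage container, threads stand in for worker nodes, and the names partial_sum, merge, query_sequential, and query_parallel are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for files in a blob storage container:
# each "file" is a list of (region, amount) rows.
FILES = {
    "sales_part_0": [("EU", 10), ("US", 5)],
    "sales_part_1": [("EU", 7), ("APAC", 3)],
    "sales_part_2": [("US", 8), ("EU", 1)],
}


def partial_sum(rows):
    """Worker step: aggregate one file independently of the others."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals


def merge(partials):
    """Coordinator step: combine the per-file partial results."""
    merged = {}
    for part in partials:
        for region, amount in part.items():
            merged[region] = merged.get(region, 0) + amount
    return merged


def query_sequential(files):
    """Process the files one after another."""
    return merge(partial_sum(rows) for rows in files.values())


def query_parallel(files, workers=3):
    """Process the files concurrently (threads stand in for nodes)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return merge(pool.map(partial_sum, files.values()))


if __name__ == "__main__":
    print(query_sequential(FILES))
    print(query_parallel(FILES))
```

Both functions return the same totals. The parallel variant only changes how the per-file work is scheduled, which is why splitting a query across workers can shorten the run time without changing the result.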

Core Components of CData Virtuality MPP

Embedded Trino Engine

  • Trino (formerly PrestoSQL) is used as the distributed query engine.

  • Provides support for querying large-scale datasets efficiently.

  • Optimized for handling object storage file formats.

Supported Storage Solutions

The MPP engine enables querying across multiple data storage solutions, including:

  • Hive

  • Apache Iceberg

  • Delta Lake

Supported File Systems

CData Virtuality MPP integrates with widely used file systems and object stores:

  • Amazon S3

  • Azure Data Lake Storage (ADLS)

  • Google Cloud Storage

  • Hadoop Distributed File System (HDFS)

Supported File Formats

To ensure broad compatibility and efficient data processing, the engine supports:

  • Parquet

  • ORC

  • Avro

For instructions on how to install and configure MPP, please refer to this section of the Administration Guide. For details on how to create an MPP data source, please see the subpage.
