Massively Parallel Processing Installation and Configuration

Massively Parallel Processing (MPP) can be deployed using Helm, a package manager for Kubernetes. This guide walks you through adding the MPP Helm repository, installing MPP, upgrading it with custom configurations, and setting essential values in the values.yaml file.

We suggest adjusting the number of MPP worker nodes based on CPU usage. For more information, see MPP Sizing Recommendation.

S3, Azure, and Google storage key values cannot be changed without a cluster reboot.

Deployment and Configuration

Installation & Containerization

  • The CData Virtuality MPP is deployed using Kubernetes (K8s).

  • It can be deployed via a Helm chart for easy installation on:

    • Azure Kubernetes Service (AKS)

    • Amazon Elastic Kubernetes Service (EKS)

  • The MPP engine is containerized, with Docker images published to:

    • Docker Hub

    • Quay

  • The system includes three core containers:

    • Coordinator

    • Metastore

    • Worker(s)

Configuration Settings

  • Initial setup is script-based

  • Key configurations include:

    • Worker node settings (CPU, memory allocation)

    • Metastore setup

Prerequisites

Before installing, make sure you have the following:

  • A running Kubernetes cluster

  • Helm installed on your local machine

  • Access to cloud storage credentials (AWS, Azure, or Google Cloud) or a CDV license file
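
The tooling prerequisites can be checked from a terminal before starting. The following is a minimal sketch rather than part of the product: the helper name missing_tools is hypothetical, and it assumes kubectl is the client used to reach the cluster.

```shell
#!/bin/sh
# missing_tools is a hypothetical helper: it prints each named tool
# that cannot be found on the PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Calling missing_tools helm kubectl prints nothing when both tools are installed. Cluster connectivity can then be confirmed with kubectl cluster-info.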

Step 1: Add the Helm Repository

Add the MPP Helm repository to your Helm configuration:

CODE
helm repo add mpp <path_to_chart>

This command adds the repository so that Helm can fetch the MPP chart when needed.
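
After adding a repository, it can be worth refreshing the local index and confirming that the chart is visible before installing. The sketch below uses the standard helm repo update and helm search repo commands; the helper name chart_available is hypothetical, and the chart name mpp/mpp is taken from the install command in Step 2.

```shell
#!/bin/sh
# chart_available is a hypothetical helper: it refreshes the local
# repository index and succeeds only if a chart with exactly the
# given name (for example mpp/mpp) is listed.
chart_available() {
  helm repo update >/dev/null &&
    helm search repo "$1" |
      awk -v chart="$1" '$1 == chart { found = 1 } END { exit !found }'
}
```

For example, chart_available mpp/mpp || echo "chart not found" reports a problem before any install is attempted.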

Step 2: Install MPP

To install MPP, run the following command:

CODE
helm install my-mpp mpp/mpp

This installs MPP with default settings. You may want to customize the configuration using a values.yaml file, as explained in the next steps.

Step 3: Configure MPP with values.yaml

Create a values.yaml file to define cloud storage credentials and other configurations. Below are the supported keys and their purposes:

CODE
S3_ACCESS_KEY: "your-aws-access-key"
S3_SECRET_KEY: "your-aws-secret-key"
AZURE_ABFS_STORAGE_ACCOUNT: "your-azure-storage-account"
AZURE_ABFS_ACCESS_KEY: "your-azure-access-key"
GOOGLE_CLOUD_KEY_FILE_PATH: "/path/to/google/cloud/key.json"
CDV_LICENSE_FILE_PATH: "/path/to/cdv/license.file"

  • S3_ACCESS_KEY & S3_SECRET_KEY: Required if using Amazon S3 for storage.

  • AZURE_ABFS_STORAGE_ACCOUNT & AZURE_ABFS_ACCESS_KEY: Required for Azure Blob File System storage.

  • GOOGLE_CLOUD_KEY_FILE_PATH: Path to the Google Cloud key file for authentication.

  • CDV_LICENSE_FILE_PATH: Path to the license file for using CDV.
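
To avoid typing credentials into a file by hand, values.yaml can be generated from environment variables. This is a minimal sketch for the Amazon S3 case only. It assumes the keys are top-level entries, as in the listing above, and that the key values contain no double quotes or backslashes; the helper name write_s3_values is hypothetical.

```shell
#!/bin/sh
# write_s3_values is a hypothetical helper: it writes the two S3 keys
# from the environment into the file named by its first argument.
write_s3_values() {
  out_file=$1
  if [ -z "${S3_ACCESS_KEY:-}" ] || [ -z "${S3_SECRET_KEY:-}" ]; then
    echo "S3_ACCESS_KEY and S3_SECRET_KEY must both be set" >&2
    return 1
  fi
  (
    # Inside this subshell, newly created files are owner-readable only.
    umask 077
    printf 'S3_ACCESS_KEY: "%s"\nS3_SECRET_KEY: "%s"\n' \
      "$S3_ACCESS_KEY" "$S3_SECRET_KEY" > "$out_file"
  )
}
```

With both variables set in the environment, write_s3_values values.yaml produces a file suitable for the -f option of helm. An existing file keeps its current permissions, so it is safer to write to a new path.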

Step 4: Upgrade MPP with Custom Configurations

After defining values.yaml, apply the custom configurations using the following command:

CODE
helm upgrade my-mpp mpp/mpp -f values.yaml

This upgrades the existing deployment with the specified configurations.
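
Helm also offers helm upgrade --install, which installs the release if it does not exist yet and upgrades it otherwise, and a --dry-run flag that renders the change without applying it. The wrapper below is a sketch under those assumptions; the helper name deploy_mpp is hypothetical, and whether this workflow suits your environment is for you to judge.

```shell
#!/bin/sh
# deploy_mpp is a hypothetical helper.
# Usage: deploy_mpp RELEASE VALUES_FILE [extra helm flags...]
# It refuses to run when the values file is missing, then installs or
# upgrades the release from the mpp/mpp chart.
deploy_mpp() {
  release=$1
  values_file=$2
  shift 2
  if [ ! -f "$values_file" ]; then
    echo "values file not found: $values_file" >&2
    return 1
  fi
  helm upgrade --install "$release" mpp/mpp -f "$values_file" "$@"
}
```

A typical sequence would be deploy_mpp my-mpp values.yaml --dry-run to preview the change, followed by deploy_mpp my-mpp values.yaml to apply it. Afterwards, helm get values my-mpp shows the user-supplied values stored with the release.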

Verify the Installation

To ensure MPP is running correctly, check the status of the Helm release:

CODE
helm list

To inspect the running pods:

CODE
kubectl get pods
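
When many pods are listed, it can help to filter for the ones that are not yet healthy. The sketch below reads the output of kubectl get pods --no-headers, whose default columns are NAME, READY, STATUS, RESTARTS, and AGE, and prints the names of pods whose status is neither Running nor Completed. The helper name pods_not_running is hypothetical.

```shell
#!/bin/sh
# pods_not_running is a hypothetical helper: it reads
# "kubectl get pods --no-headers" output on standard input and prints
# the name of every pod whose STATUS column (the third field) is
# neither Running nor Completed.
pods_not_running() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}
```

For example, kubectl get pods --no-headers | pods_not_running prints nothing once the coordinator, metastore, and worker pods have all started. A pod that stays in the list can be examined with kubectl describe pod followed by its name.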
