Massively Parallel Processing Installation and Configuration
Massively Parallel Processing (MPP) can be deployed using Helm, a package manager for Kubernetes. This guide walks you through adding the MPP Helm repository, installing MPP, setting essential values in the values.yaml file, and upgrading the deployment with those custom configurations.
We suggest adjusting the number of MPP worker nodes based on CPU usage. For more information, see MPP Sizing Recommendation.
Changes to the S3, Azure, and Google storage key values cannot be applied by an upgrade alone; they require a cluster reboot.
Deployment and Configuration
Installation & Containerization
CData Virtuality MPP runs on Kubernetes (K8s).
A Helm chart is provided for installation on:
Azure Kubernetes Service (AKS)
Amazon Elastic Kubernetes Service (EKS)
The MPP engine is containerized, with Docker images published to:
Docker Hub
Quay
The system includes three core containers:
Coordinator
Metastore
Worker(s)
Configuration Settings
Initial setup is script-based.
Key configurations include:
Worker node settings (CPU, memory allocation)
Metastore setup
Prerequisites
Before proceeding with the installation, please ensure you have the following:
A Kubernetes cluster running;
Helm installed on your local machine;
Access to cloud storage credentials (AWS, Azure, or Google Cloud) or a CDV license file.
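The first two prerequisites can be checked from a shell before continuing. The following is a minimal sketch, assuming a POSIX shell; require_cmd is a hypothetical helper name, not part of Helm or kubectl.

```shell
#!/bin/sh
# require_cmd is a hypothetical helper: it reports whether a CLI is on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1 found"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# helm and kubectl are the two CLIs used in this guide.
missing=0
for tool in helm kubectl; do
  require_cmd "$tool" || missing=1
done

# Query versions and the cluster only when both tools are present.
if [ "$missing" -eq 0 ]; then
  helm version --short
  kubectl cluster-info || echo "cluster not reachable with the current kubeconfig" >&2
fi
```

If either tool is reported as missing, install it before moving on to Step 1.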
Step 1: Add the Helm Repository
Add the MPP Helm repository to your Helm configuration:
helm repo add mpp <path_to_chart>
This command adds the repository so that Helm can fetch the MPP chart when needed.
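After adding a repository, it can help to refresh the local chart index and confirm that the chart is visible. The sketch below wraps two standard Helm commands; refresh_and_find is a hypothetical helper name, and mpp is the repository alias chosen in the command above.

```shell
# refresh_and_find is a hypothetical wrapper around two standard Helm commands.
refresh_and_find() {
  repo_alias="$1"
  # Download the latest chart index for all configured repositories.
  helm repo update || return 1
  # Search the configured repositories for charts matching the alias.
  helm search repo "$repo_alias"
}
```

Calling refresh_and_find mpp should list the MPP chart if the repository was added successfully.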
Step 2: Install MPP
To install MPP, run the following command:
helm install my-mpp mpp/mpp
This installs MPP with default settings. You may want to customize the configuration using a values.yaml file, as explained in the next steps.
Step 3: Configure MPP with values.yaml
Create a values.yaml file to define cloud storage credentials and other configurations. Below are the supported keys and their purposes:
S3_ACCESS_KEY: "your-aws-access-key"
S3_SECRET_KEY: "your-aws-secret-key"
AZURE_ABFS_STORAGE_ACCOUNT: "your-azure-storage-account"
AZURE_ABFS_ACCESS_KEY: "your-azure-access-key"
GOOGLE_CLOUD_KEY_FILE_PATH: "/path/to/google/cloud/key.json"
CDV_LICENSE_FILE_PATH: "/path/to/cdv/license.file"
S3_ACCESS_KEY & S3_SECRET_KEY: Required if using Amazon S3 for storage.
AZURE_ABFS_STORAGE_ACCOUNT & AZURE_ABFS_ACCESS_KEY: Required if using Azure Blob File System (ABFS) storage.
GOOGLE_CLOUD_KEY_FILE_PATH: Path to the Google Cloud key file for authentication.
CDV_LICENSE_FILE_PATH: Path to the CData Virtuality (CDV) license file.
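As an illustration, the file can be generated from environment variables so that secrets are not typed into it by hand. This is a sketch for the Amazon S3 case only; the variable names MPP_S3_ACCESS_KEY and MPP_S3_SECRET_KEY are hypothetical, and the fallback values are the placeholders from the key list above.

```shell
#!/bin/sh
# MPP_S3_ACCESS_KEY and MPP_S3_SECRET_KEY are hypothetical variable names;
# the fallbacks are the placeholder values from the key list above.
access_key="${MPP_S3_ACCESS_KEY:-your-aws-access-key}"
secret_key="${MPP_S3_SECRET_KEY:-your-aws-secret-key}"

# Write only the keys needed for S3 storage.
cat > values.yaml <<EOF
S3_ACCESS_KEY: "${access_key}"
S3_SECRET_KEY: "${secret_key}"
EOF

# Restrict the file to the current user, since it holds credentials.
chmod 600 values.yaml
echo "wrote values.yaml"
```

For Azure or Google Cloud storage, the same pattern would apply with the corresponding keys in place of the S3 ones.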
Step 4: Upgrade MPP with Custom Configurations
After defining values.yaml, apply the custom configurations using the following command:
helm upgrade my-mpp mpp/mpp -f values.yaml
This upgrades the existing deployment with the specified configurations.
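Before changing a running deployment, a dry run can show whether Helm is able to render the chart with the new values. The sketch below uses standard helm upgrade flags; preview_then_upgrade is a hypothetical helper name, and the release and chart names are the ones used in this guide.

```shell
# preview_then_upgrade is a hypothetical helper built on standard Helm flags.
preview_then_upgrade() {
  release="$1"; chart="$2"; values="$3"
  # Render the upgrade without changing the cluster; stop if rendering fails.
  helm upgrade "$release" "$chart" -f "$values" --dry-run >/dev/null || return 1
  # Apply the upgrade and wait until the resources report ready.
  helm upgrade "$release" "$chart" -f "$values" --wait || return 1
  # List the release revisions to confirm that a new one was recorded.
  helm history "$release"
}
```

It would be called as preview_then_upgrade my-mpp mpp/mpp values.yaml. The dry-run output is discarded here because the rendered manifests may contain the credentials from values.yaml.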
Verify the Installation
To ensure MPP is running correctly, check the status of the Helm release:
helm list
To inspect the running pods:
kubectl get pods
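The two checks above can be combined with a wait for pod readiness. This is a sketch under stated assumptions: verify_release is a hypothetical helper name, the release is assumed to live in the namespace passed as the second argument (the default namespace if omitted), and that namespace is assumed to contain only MPP pods.

```shell
# verify_release is a hypothetical helper; the namespace defaults to "default".
verify_release() {
  release="$1"
  namespace="${2:-default}"
  # Show the release status as recorded by Helm.
  helm status "$release" --namespace "$namespace" || return 1
  # Block until every pod in the namespace is Ready, or give up after 5 minutes.
  kubectl wait --for=condition=Ready pods --all \
    --namespace "$namespace" --timeout=300s || return 1
  # List the pods; this guide names coordinator, metastore, and worker components.
  kubectl get pods --namespace "$namespace"
}
```

It would be called as verify_release my-mpp. If the wait times out, kubectl describe pod and kubectl logs on the affected pod are the usual next steps.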