Data Sources Statistics
You are looking at an older version of the documentation. The latest version is found here.
In addition to gathering query frequencies, an internal statistics component periodically scans the following parameters of the original data sources:
- Table statistics
- total number of records
- Column statistics
- number of distinct values throughout the column
- number of null values throughout the column (where applicable)
- minimum column value (where applicable)
- maximum column value (where applicable)
The gathered statistics can help the query engine make the right decision for optimizing queries in many cases.
The Data Virtuality Server has several special stored procedures for gathering statistics.
Gathering Statistics for a Particular Data Source Once
CREATE FOREIGN PROCEDURE gatherStatistics(IN datasource string, IN gatherTableCardinalities boolean NOT NULL,
IN gatherColumnCardinalities boolean NOT NULL, IN gatherNetworkStats boolean NOT NULL, IN excludeTables string, IN gatherForOnlyTablesWithoutStats boolean)
This procedure is synchronous and takes the following parameters:
Parameter | Description |
---|---|
datasource | Data source name for that the statistics should be gathered |
gatherTableCardinalities | Whether the tables' statistics should be gathered or not |
gatherColumnCardinalities | Whether the columns' statistics should be gathered or not |
gatherNetworkStats | Whether the network statistics should be gathered or not |
excludeTables | Tables to be excluded from the gathered statistics (regular expression) |
gatherForOnlyTablesWithoutStats | If TRUE , statistics will only be collected for tables which have no statistics yet; default: FALSE |
Creating a Job to Gather Statistics for a Particular Data Source
CREATE FOREIGN PROCEDURE createGatherStatisticsJob(IN "datasource" string, IN gatherTableCardinalities boolean NOT NULL, IN gatherColumnCardinalities boolean NOT NULL, IN gatherNetworkStats boolean NOT NULL, IN excludeTables string, IN gatherForOnlyTablesWithoutStats boolean, IN parallelRunsAllowed integer, IN retryCounter integer, IN retryDelay integer, OUT jobID biginteger NOT NULL RESULT)
This procedure is synchronous and takes the following parameters:
Parameter | Description |
---|---|
datasource | Data source name for that the statistics should be gathered |
gatherTableCardinalities | Whether the tables' statistics should be gathered or not |
gatherColumnCardinalities | Whether the columns' statistics should be gathered or not |
gatherNetworkStats | Whether the network statistics should be gathered or not |
excludeTables | List of tables for which the statistics are not collected; supports wildcards like <SCHEMA_NAME>.* |
parallelRunsAllowed | Number of simultaneous job executions; default: 1 |
retryCounter | Number of retry attempts to perform if the job fails |
retryDelay | Delay between the job retry attempts, in seconds |
gatherForOnlyTablesWithoutStats | If TRUE , statistics will only be collected for tables which have no statistics yet; default: FALSE |