Skip to main content
Skip table of contents

Data Sources Statistics

You are looking at an older version of the documentation. The latest version is found here.

In addition to gathering query frequencies, an internal statistics component periodically scans the following parameters of the original data sources:

  1. Table statistics
    1. total number of records
  2. Column statistics
    1. number of distinct values throughout the column
    2. number of null values throughout the column (where applicable)
    3. minimum column value (where applicable)
    4. maximum column value (where applicable)

The gathered statistics can help the query engine make the right decision for optimizing queries in many cases.

The Data Virtuality Server has several special stored procedures for gathering statistics.

Gathering Statistics for a Particular Data Source Once

SQL
CREATE FOREIGN PROCEDURE gatherStatistics(IN datasource string, IN gatherTableCardinalities boolean NOT NULL,
IN gatherColumnCardinalities boolean NOT NULL, IN gatherNetworkStats boolean NOT NULL, IN excludeTables string, IN gatherForOnlyTablesWithoutStats boolean)

This procedure is synchronous and takes the following parameters:

ParameterDescription
datasourceData source name for that the statistics should be gathered
gatherTableCardinalitiesWhether the tables' statistics should be gathered or not
gatherColumnCardinalitiesWhether the columns' statistics should be gathered or not
gatherNetworkStatsWhether the network statistics should be gathered or not
excludeTablesTables to be excluded from the gathered statistics (regular expression)
gatherForOnlyTablesWithoutStats If TRUE, statistics will only be collected for tables which have no statistics yet; default: FALSE

Creating a Job to Gather Statistics for a Particular Data Source

SQL
CREATE FOREIGN PROCEDURE createGatherStatisticsJob(IN "datasource" string, IN gatherTableCardinalities boolean NOT NULL, IN gatherColumnCardinalities boolean NOT NULL, IN gatherNetworkStats boolean NOT NULL, IN excludeTables string, IN gatherForOnlyTablesWithoutStats boolean, IN parallelRunsAllowed integer, IN retryCounter integer, IN retryDelay integer, OUT jobID biginteger NOT NULL RESULT)

This procedure is synchronous and takes the following parameters:

ParameterDescription
datasourceData source name for that the statistics should be gathered
gatherTableCardinalitiesWhether the tables' statistics should be gathered or not
gatherColumnCardinalitiesWhether the columns' statistics should be gathered or not
gatherNetworkStatsWhether the network statistics should be gathered or not
excludeTablesList of tables for which the statistics are not collected; supports wildcards like <SCHEMA_NAME>.*
parallelRunsAllowedNumber of simultaneous job executions; default: 1
retryCounterNumber of retry attempts to perform if the job fails
retryDelayDelay between the job retry attempts, in seconds
gatherForOnlyTablesWithoutStats If TRUE, statistics will only be collected for tables which have no statistics yet; default: FALSE
JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.