Google Analytics Connector
You are looking at an older version of the documentation. The latest version is found here.
The Google Analytics Connector is used to read data from the Google Analytics Web Service API and make it available in the Data Virtuality Server in the relational form. In this way, users can query a Google Analytics Data via SQL and write complex queries that are directed to multiple different data sources, e.g. a single SQL query can join tables from MySQL and combine them with data coming from Google Analytics.
Connector Configuration
Metadata
Before issuing the SQL queries to the Google Analytics Web Service, we need to configure a connection using the SYSADMIN.createConnection() procedure.
The connection can be created in several ways:
- via client email and private key:
CALL SYSADMIN.createConnection(name => 'analytics', jbossCliTemplateName => 'analytics', connectionOrResourceAdapterProperties => 'clientkeypath=<keypath>, clientemail=<clientemail>, timeout=60', encryptedProperties => '');;
CALL SYSADMIN.createDatasource(name => 'analytics', translator => 'analytics', modelProperties => '', translatorProperties => '', encryptedModelProperties => '', encryptedTranslatorProperties => '');;
- via OAuth2 providing an
authCode
:
CALL SYSADMIN.createConnection(name => 'analytics', jbossCliTemplateName => 'analytics', "connectionOrResourceAdapterProperties" => 'SecurityType=OAuth2, authCode=<authCode>, clientId=<clientId>, clientSecret=<clientSecret>, redirectUri=<redirectUri>, timeout=60', encryptedProperties => '');
CALL SYSADMIN.createDatasource(name => 'analytics', translator => 'analytics', modelProperties => '', translatorProperties => '', encryptedModelProperties => '', encryptedTranslatorProperties => '');;
- via OAuth2 providing a
refreshToken
:
CALL SYSADMIN.createConnection(name => 'analytics', jbossCliTemplateName => 'analytics', "connectionOrResourceAdapterProperties" => 'SecurityType=OAuth2,
RefreshToken=<RefreshToken>, ClientId=<ClientId>, ClientSecret=<ClientSecret>, redirectUri=<redirectUri>, timeout=60', encryptedProperties => '');
CALL SYSADMIN.createDatasource(name => 'analytics', translator => 'analytics', modelProperties => '', translatorProperties => '', encryptedModelProperties => '', encryptedTranslatorProperties => '');;
Paging Data
- analytics.batchSize
Standard Google Analytics API imposes a limit of 10.000 rows per request. To overcome this limit, Data Virtuality Server uses paging automatically whenever there are more than 10.000 rows in the result. In addition, the user can specify the importer.batchSize property, to change the maximum number of rows in a results page.
Usage
The Google Analytics data is represented as relational tables accounts, profiles and properties in the schema named as the alias name given in createDatasource().
Select Query
To query these tables a normal SQL select query like the following can be used:
SELECT * FROM analytics.profiles
Examples
SELECT
CAST(browser AS varchar(200)),
CAST(city AS varchar(200)),
CAST(visits AS integer),
CAST(visitors AS integer),
CAST(pageviews AS integer)
FROM analytics.profiles prof,
TABLE
(
CALL analytics.get
(
profile=>prof.id,
startDate=>timestampadd(SQL_TSI_DAY,-7,curdate()),
endDate=>curdate(),
metrics=>'visits,visitors,pageviews',
dimensions=>'browser,city'
)
)
w
;
SELECT
browser,city,visits,visitors,pageviews
FROM analytics.profiles p,
TABLE
(
CALL analytics.get
(
profile=>p.id,
startDate=>'2011-01-01',
endDate=>'2013-01-01',
metrics=>'visits,visitors,pageviews',
dimensions=>'browser,city'
)
)
w
WHERE p.id='11111111'
Example with indexed fields:
SELECT
visitors, array_get("goal(n)Value", 1), array_get("goal(n)Value", 2)
FROM
(
CALL analytics.get
(
'494899',
'2012-10-13',
'2012-11-14',
'visitors, goal1Value, goal2Value',
null,
1000,
0
)
)
a
Please note: in projection you need to use a generic metric/dimension name (like goal(n)Value
), but in metrics/dimensions parameters to an analytics.get()
call you should use particular metric/dimension name (like goal1Value
).
'get' Procedure
To query actual data from an Analytics account you can use get() stored procedure:
SELECT
a.browser, a.pageviews
FROM
(
CALL analytics.get
(
profile=>'64444786',
startDate=>'2012-09-01',
endDate=>'2013-09-30',
metrics=>'pageviews',
dimensions='browser',
sort='browser'
)
)
AS a
Here is the full list and meaning of the procedure parameters:
- profile - profile id
- startDate - start date to query data for
- endDate - end date to query data for
- metrics - a comma-separated list of requested metrics (specified without 'ga:' prefix)
- dimensions - a comma-separated list of requested dimensions; may be null or skipped (specified without 'ga:' prefix)
- maxResults - maximum results to return for the request; may be null or skipped
- startIndex - index of record to start returning data from; may be null or skipped
- filters - a comma-separated list of filters; may be null or skipped
- segment - an Analytics advanced segment to be applied to data; may be null or skipped
- sort - comma-separated list of dimensions or metrics that determine the sort order for Analytics data; may be null or skipped (specified without 'ga:' prefix)
- fields - selector specifying which fields to include in a partial response; may be null or skipped
- userIp - IP address of the site where the request originates. Use this if you want to enforce per-user limits; may be null or skipped
- quotaUser - available to use for quota purposes for server-side applications. Can be any arbitrary string assigned to a user, but should not exceed 40 characters. Overrides userIp if both are provided; may be null or skipped
- key - API key. Your API key identifies your project and provides you with API access, quota, and reports. Required unless you provide an OAuth 2.0 token; may be null or skipped
Formally the procedure returns all possible columns (for all possible metrics and dimensions), but for all columns that aren't passed with metrics and dimensions parameters, it returns null. To cut off such redundant columns use appropriate projection in SELECT clause.
Data Types
Google Analytics data types are cast to Data Virtuality data types as follows:
* TIME
, CURRENCY
, FLOAT
, US_CURRENCY
, and PERCENT
data types are mapped to Double;
* INTEGER
data type is mapped to Integer
;
* String
is mapped to String
.
See Also
Split Google Analytics Campaign and Keyword Fields on the Fly in Columns for a guide on how to expand campaign names and keywords like DE_DE_foo_bar_11112013_baz to columns on the fly
Google Analytics Historical Data Using Batch Update and Avoiding Sampling for a workaround in case your Google Analytics data is just too large
Create Google Analytics Service Account for Data Source Connection and how to connect Google Analytics as data source using this service account
Generate Base64 Representation of Private Key File for Google Analytics if your private key is stored in the p12 format
Google Analytics: How to migrate from Reporting API v3 to v4, a step-by-step guide
Improving Query Performance Using the Filter Parameter for Google Analytics for a short, but useful guide