Google Sheets Authentication
Creating Project
1. Open Google Developers Console:
2. Click on the list of projects:
3. Click on New project to add a new one:
4. Give your project a name:
5. Create the project and select it, then go to ENABLE APIS AND SERVICES and choose Google Drive API and Google Sheets API:
The result should look like this:
6. Go to the OAuth consent screen. Choose the configuration and register your app:
Provide your details, continue, and save:
Add or remove scopes on the next step:
Check Summary and go back to the Dashboard.
Generating Credentials
1. Go to the Credentials page and click on CREATE CREDENTIALS:
2. Select OAuth client ID:
3. In the Application type field, select Desktop app and give it a name:
If you choose a different option, like Web application, you need to provide one of the authorized redirect URIs like https://localhost:
4. Save your Client ID and Client Secret:
Getting Authorization Tokens
1. Install a dummy connector:
/* Create connection */
CALL SYSADMIN.createConnection(name => 'gsheets_modular_empty', jbossCliTemplateName => 'ws', connectionOrResourceAdapterProperties => 'EndPoint=https://sheets.googleapis.com/v4/spreadsheets,SecurityType=OAuth2,ClientId=ClientId,AccessTokenEndpoint=https://accounts.google.com/o/oauth2/token,TranslatorName=ws,redirectUri=http://localhost:8080,RefreshToken=RefreshToken,ClientSecret=ClientSecret', encryptedProperties => '');;
/* Create data source */
CALL SYSADMIN.createDatasource(name => 'gsheets_modular_empty', translator => 'google_spreadsheets', modelProperties => 'skipInitializeMetadata=true', translatorProperties => '', encryptedModelProperties => '', encryptedTranslatorProperties => '');;
2. Get an authorization code by calling this procedure (remember to fill in client_id
, client_secret
, redirectUri
):
CALL "gsheets_modular_empty.helper_oauth_1"(
"client_id" => 'string_client_id' /* Client Id from OAuth credentials */,
"client_secret" => 'string_client_secret' /* Client Secret from OAuth credentials */,
"redirectUri" => 'http://localhost' /* Redirect URI from OAuth credentials */,
"in_src_name" => 'google_sheets' /* Desired connector name, default google_sheets */
);;
3. Copy your URL (it looks like https://accounts.google.com/o/oauth2/auth?response_type=code&prompt=consent&access_type=offline&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fspreadsheets&redirect_uri=http://localhost&client_id=297344387294-s2eern9nqg3jmoufdlh0dcai6ri8800s.apps.googleusercontent.com) and insert it into a browser to receive the code:
After permissions are granted, an access code will be returned:
4. Replace the access_code_acquired_earlier code in the script below. Remember that the access_code_acquired_earlier is valid for a short time:
CALL gsheets_modular_empty.helper_oauth_2(
code => 'access_code_acquired_earlier',
client_id => 'string_client_id' /* Client id from OAuth credentials */,
client_secret => 'string_client_secret' /* Client secret from OAuth credentials */,
in_src_name => 'google_sheets', /* Desired connector name, default google_sheets */
redirectUri => 'http://localhost' /* Redirect_uri from OAuth credentials */
);;
This call returns a code snippet to create a connection and a data source:
Google Sheets Connector Cache
The Google Spreadsheets Connector cache is a collection of locally cached metadata that speeds up working with Google Spreadsheets.
Spreadsheets are available as views on an installed modular connector.
There are two helper views that keep track of processed spreadsheets: successded_documents
and skipped_documents
. Any successfully completed document is also available as a view, any document that could not be processed has a link to it and the reason for the failure.
Connector Installation Tips
The next steps are applicable if there are a lot of spreadsheets in Google Drive (more than 100 tables with 2 worksheets per each at least). If there are only a few spreadsheets, the next steps can be skipped.
If you are not sure about the spreadsheet quantity, use skipInitializeMetadata=true
. This will skip spreadsheets caching and install the connector without getting information about tables. Then you can use a script that returns the number of spreadsheets, searched through the entire Google Drive.
SELECT count(*) FROM (
CALL "gsheets_modular_empty.FileList"(
"isSpreadsheet" => true
))s;;
If there are a lot of spreadsheets, cache filtering can be applied in the following ways:
- by an exact match of the name of the spreadsheet (names are separated by commas). Example:
modelProperties => 'skipInitializeMetadata=false,spreadsheet_names="report,report_sheet,Report MyDocument"'
- by a match of a part of a spreadsheet name (parts of the string are separated by commas). Example:
modelProperties => 'skipInitializeMetadata=false,spreadsheet_name_contains="doc,report"'
- the folder name is searched for one folder at a time, child or related directories are not scanned. Example:
modelProperties => 'skipInitializeMetadata=false,folders="report_folder1,report_folder_2"'
If several types of filters are applied, they are associated with an or condition (for example, folder = report or table_name = report_doc
).
Example
In this example, a data source with filtering options is created:
/* Create connection */
CALL SYSADMIN.createConnection(name => 'gsheets_modular_empty', jbossCliTemplateName => 'ws', connectionOrResourceAdapterProperties => 'EndPoint=https://sheets.googleapis.com/v4/spreadsheets,SecurityType=OAuth2,ClientId=ClientId,AccessTokenEndpoint=https://accounts.google.com/o/oauth2/token,TranslatorName=ws,redirectUri=http://localhost:8080,RefreshToken=RefreshToken,ClientSecret=ClientSecret', encryptedProperties => '');;
/* Create data source */
CALL SYSADMIN.createDatasource(name => 'gsheets_modular_empty', translator => 'google_spreadsheets', modelProperties => 'skipInitializeMetadata=false,folders="report_folder1,report_folder_2",spreadsheet_names="report,report_sheet,Report MyDocument",spreadsheet_name_contains="doc,report", translatorProperties => '', encryptedModelProperties => '', encryptedTranslatorProperties => '');;
If there are a lot of spreadsheets with several worksheets in each, and you want to cache them all, prepare for a long installation time. For this case, there is a procedure that displays current progress information. To use it, follow these steps:
1. Open two tabs in the Data Virtuality Studio.
2. Run the data source creation statements in the first tab.
3. Copy the showProgress
query into the second tab and run it.
CALL "gsheets_modular_empty.showProgress"(
"lines" => 200 /* Lines to scan for progress */
);;
Please keep in mind that any step can take some time if there are several thousand tables.