File-based Connectors

You are looking at an older version of the documentation. The latest version is found here.

The file-based connectors bridge between Data Virtuality Server and local and remote file storage systems.

Type name	Description	Specific features
`ufile`	Accessing and managing files at local filesystem
`ftp`	Accessing and managing files via FTP
`sftp`	Accessing and managing files via SFTP
`scp`	Accessing and managing files via SCP
`s3`	Accessing and managing files stored at Amazon AWS S3 storage
`blob`	Accessing and managing files stored at Azure Blob Storage

Metadata

Before issuing queries to the file data source, we need to configure the data source using the appropriate Data Virtuality Server procedures:

SQL

CALL SYSADMIN.createConnection( name => <alias>, jbossCLITemplateName => <type name>, 'connectionOrResourceAdapterProperties => '<connector specific setting depending on type>');
CALL SYSADMIN.createDatasource( name => <alias>, translator => 'ufile', modelProperties => '', translatorProperties => '');

The translator has to be ufile for all file-based data sources.

The Data Virtuality Studio provides a comfortable way to connect to data sources using graphical wizards. In order to do so, use the corresponding data source type under the File section in the Add data source wizard.

Usage

File data sources are utilizing stored procedures shared by all file-based connectors to gather data from their sources. These data may be further processed by the Data Virtuality Server. This is commonly done with table functions (like TABLE, TEXTTABLE, and XMLTABLE) in combination with parsing functions depending on a data structure.

The Data Virtuality Studio provides a variety of Query Builders for that purpose. They allow an easy specification for file encoding and structure of data. These Query Builders are accessible via SQL editor -> Tools.

In Amazon S3, buckets and objects are the primary resources, and objects are stored in buckets. Amazon S3 has a flat structure instead of a hierarchy like you would see in a file system. However, for the sake of organizational simplicity, Data Virtuality supports the folder concept as a means of grouping objects. Data Virtuality does this by using a shared name prefix for the grouped objects. In other words, the grouped objects have names that begin with a common string. This common string, or shared prefix, is the folder name.

Stored Procedures Shared by All File-based Connectors

To view the full table, click the expand button in its top right corner

Procedure name	Input parameter (data type / nulls allowed)	Example call & purpose
`getFiles`	`pathAndPattern (string / not null)`	Retrieves all files as `blobs` with an optional extension at the given path. If the extension path is specified, then it will filter all of the files in the directory referenced by the base path. If the extension pattern is not specified and the path is a directory, all files in the directory will be returned. Otherwise, the single file referenced will be returned. Supported wildcards: * for all file-based connectors ? for S3 and Azure Blob connectors For the S3 and Azure Blob connectors, the wildcards can be used anywhere - in the file name or path. Usage: SQL `CALL <alias>.getFiles('pathAndPattern')` Example: SQL `CALL s3.getFiles('marketing/*.csv')`
`getTextFiles`	`pathAndPattern (string / not null)`	Retrieves all files as `clobs` with an optional extension at the given path. All the same files as with `getFiles` will be retrieved, the only difference is that the results will be `clob` values. Supported wildcards: * for all file-based connectors ? for S3 and Azure Blob connectors For the S3 and Azure Blob connectors, the wildcards can be used anywhere - in the file name or path. Usage: SQL `CALL <alias>.getTextFiles('pathAndPattern')` Example: SQL `CALL s3.getTextFiles('marketing/*.csv')`
`saveFile`	`pathAndPattern (string / not null)` `value (clob, blob, xml / not null)`	Saves the `clob`, `blob` or `xml` value to a given path. The path should reference a new file location or an existing file to overwrite completely. Both absolute and relative paths are valid as input. Usage: SQL `CALL <alias>.saveFile('pathAndPattern', value)` `PathAndPattern` has slightly different meaning here than for the other file-based connector procedures. Actually, it's specifying path and filename, not a pattern. Example: SQL `CALL <alias>.saveFile('marketing/july_2016.csv', (SELECT cast(data as xml) FROM report_server.marketing_data_july_2016));`
`listFiles`	`pathAndPattern (string / not null)`	Lists all files from the specified directory. Supported wildcards: * for all file-based connectors ? for S3 and Azure Blob connectors For the S3 and Azure Blob connectors, the wildcards can be used anywhere - in the file name or path. Usage: SQL `CALL <alias>.listFile('pathAndPattern')` Example: SQL `CALL s3.getTextFiles('marketing/*.csv'))`
`deleteFiles`	`pathAndPattern (string / not null)`	Deletes all files matching the pattern. Usage: SQL `CALL <alias>.deleteFile('pathAndPattern')` `CALL <alias>.deleteFile('')` will delete all files in the directory without further confirmation.

The listFiles(), getFiles(), getTextFiles() procedures work with wildcards * and ? since v4.0.7