Flat Files and Document Stores
CData Virtuality can connect to and query semi-structured and unstructured files in local or cloud-based storage. Features include automatic schema detection and both batch and streaming file access. For more information, see File-based Connectors.
It supports the following formats:
CSV, TSV, TXT
Excel (XLS/XLSX)
JSON, XML, Parquet, Avro
Files can be located in the following systems and services:
Local file system
Amazon S3
Google Cloud Storage
Azure Blob Storage
FTP/SFTP
For flat files such as CSV or TXT, CData Virtuality reads the first row as headers to determine the column names and the number of columns. It can then scan the remaining rows to infer data types.
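The header-then-scan approach described above can be illustrated with a minimal Python sketch. It is not CData Virtuality's implementation; the function names (infer_type, infer_schema), the type labels, and the sample data are hypothetical and chosen only for illustration. It assumes every row has the same number of fields.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type label that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    # Try the narrowest type first; fall back to string if no cast fits.
    for name, cast in (("integer", int), ("double", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def infer_schema(text, delimiter=","):
    """Read the first row as headers, then scan the remaining rows for types."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    # Transpose rows into columns so each column can be typed on its own.
    columns = list(zip(*data)) if data else [()] * len(header)
    return {name: infer_type(col) for name, col in zip(header, columns)}


# Hypothetical sample data, used only to exercise the sketch.
sample = "id,price,city\n1,9.99,Berlin\n2,12,Leipzig\n"
schema = infer_schema(sample)
print(schema)  # {'id': 'integer', 'price': 'double', 'city': 'string'}
```

A real connector would likely sample a bounded number of rows rather than load the whole file, and would recognize more types (dates, booleans); the sketch only shows the general idea.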
For Excel files, CData Virtuality connects via CData’s Excel drivers. For each sheet in an Excel workbook, CData Virtuality reads the header row for column names and infers the data types from the column data.
For JSON files, CData Virtuality samples JSON documents, detects keys as columns, and flattens nested objects into dotted paths. For example, if name appears nested under user, the flattened column is user.name.
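As a rough illustration of flattening nested JSON objects into dotted paths, the following Python sketch may help. The helper names (flatten, detect_columns) and the sample documents are hypothetical, and array handling is left out; this is a sketch of the general technique, not the product's actual algorithm.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into a single dict keyed by dotted paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def detect_columns(documents):
    """Union of flattened keys across sampled documents, in first-seen order."""
    columns = []
    for doc in documents:
        for path in flatten(doc):
            if path not in columns:
                columns.append(path)
    return columns


# Hypothetical sampled documents, one JSON object per line.
docs = [json.loads(line) for line in (
    '{"id": 1, "user": {"name": "Ada"}}',
    '{"id": 2, "user": {"name": "Lin", "email": "lin@example.com"}}',
)]
columns = detect_columns(docs)
print(columns)  # ['id', 'user.name', 'user.email']
```

Sampling several documents matters because keys that are absent from the first document (user.email here) still need a column.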
For XML files, CData Virtuality parses the XML structure and identifies repeating elements. It then maps XML attributes to columns.
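The idea of finding a repeating element and mapping its attributes to columns can be sketched in Python with the standard xml.etree.ElementTree module. The function names (repeating_tag, rows_from_xml) and the sample document are hypothetical, and the sketch assumes the repeating elements are direct children of the root; the product's actual detection logic may differ.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def repeating_tag(root):
    """Return the most frequent child tag that occurs more than once, else None."""
    counts = Counter(child.tag for child in root)
    if not counts:
        return None
    tag, n = counts.most_common(1)[0]
    return tag if n > 1 else None


def rows_from_xml(text):
    """Treat each repeating element as a row and its attributes as columns."""
    root = ET.fromstring(text)
    tag = repeating_tag(root)
    if tag is None:
        return []
    return [dict(el.attrib) for el in root.findall(tag)]


# Hypothetical sample document, used only to exercise the sketch.
sample = """
<orders>
  <order id="1" status="open"/>
  <order id="2" status="shipped"/>
</orders>
"""
rows = rows_from_xml(sample)
print(rows)
```

Attribute values arrive as strings, so a type-inference pass like the one used for flat files would still be needed afterwards.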
Parquet files already contain embedded metadata, so CData Virtuality reads the metadata stored in the file and extracts the field names and data types for the table.