MongoDB Connector
You are looking at an older version of the documentation. The latest version is found here.
The MongoDB Connector is used to read data from a MongoDB Server and make it available in the Data Virtuality Server in the relational form. In this way, users can query a MongoDB Server via SQL query language and also write complex queries that are directed to multiple different Data Sources, e.g. a single SQL query can join tables from MySQL and combine them with collections (that are represented as tables) coming from MongoDB.
Connector Configuration
Connect to MongoDB
Before issuing the SQL queries to the MongoDB Server, we need to represent Mongo's Metadata in relational format, i.e. represent non-relation collections as relation tables etc. This is done by the connecting to MongoDB data source using createConnection/createDatasource commands.
The complete list of connection properties can be found in chapter Connection Configuration Interface.
The MongoDB specific connection properties options
and readMode
, translator property queryOption
model properties (all) are explained in the sections below.
Example
CALL "SYSADMIN.createConnection"(
"name" => 'mongodb',
"jbossCLITemplateName" => 'mongodb',
"connectionOrResourceAdapterProperties" =>
'server=localhost
,port=27019
,database=test
,options="connectionsPerHost=51,description=test,alwaysUseMBeans=true"',
"encryptedProperties" => null
) ;;
CALL "SYSADMIN.createDataSource"(
"name" => 'mongodb',
"translator" => 'mongodb',
"modelProperties" =>
'importer.useFullSchemaName=false
,importer.TableTypes="TABLE,VIEW"
,mongodb.maxColumnsNumber=200
,mongodb.maxRowsNumber=100
,mongodb.maxVarcharColumnWidth=256
,mongodb.maxRecursionDepth=2
,mongodb.fieldDelimiterChar="_"',
"translatorProperties" => 'queryOption="queryoption_notimeout"',
"encryptedModelProperties" => null,
"encryptedTranslatorProperties" => null
) ;;
MongoDB Driver Options
MongoDB driver options may be provided as a list of comma-separated properties via connection property options
.
Supported MongoDB driver options:
Option type | Option name | Default value |
---|---|---|
String | description | |
int | minConnectionsPerHost | 0 |
int | connectionsPerHost | 100 |
int | threadsAllowedToBlockForConnectionMultiplier | 5 |
int | serverSelectionTimeout | 30000 |
int | maxWaitTime | 120000 |
int | maxConnectionIdleTime | 0 |
int | maxConnectionLifeTime | 0 |
int | connectTimeout | 10000 |
int | socketTimeout | 0 |
boolean | socketKeepAlive | false |
boolean | sslEnabled | false |
boolean | sslInvalidHostNameAllowed | false |
boolean | alwaysUseMBeans | false |
int | heartbeatFrequency | 10000 |
int | minHeartbeatFrequency | 500 |
int | heartbeatConnectTimeout | 20000 |
int | heartbeatSocketTimeout | 20000 |
int | localThreshold | 15 |
String | requiredReplicaSetName | |
boolean | cursorFinalizerEnabled | true |
Important
The list of all possible options are supported dynamically by MongoDB Connector and can be extended in future versions of MongoDB driver automatically.
MongoDB readMode
MongoDB readMode may be provided via connection property readMode
.
The not mandatory parameter readMode specifies how MongoDB connection routes read operations to the members of a replica set (a cluster of MongoDB servers that implements master-slave replication and automated failover. MongoDB’s recommended replication strategy). By default, a connection directs its read operations to the primary member (the primary member is the current master instance, which receives all write operations) in a replica set. Because write operations are issued to the single primary, reading from the primary returns the latest version of a document. For a connection that does not require fully up-to-date data, you can improve read throughput or reduce latency by distributing some or all reads to secondary members of the replica set. MongoDB driver supports five read modes:
- primary Default mode. All operations read from the current replica set primary.
- primaryPreferred In most situations, operations read from the primary but if it is unavailable, operations read from secondary members.
- secondary All operations read from the secondary members of the replica set.
- secondaryPreferred In most situations, operations read from secondary members but if no secondary members are available, operations read from the primary.
- nearest Operations read from the member of the replica set with the least network latency, irrespective of the member’s type.
Important
All read preference modes except primary may return stale data because secondaries replicate operations from the primary with some delay.
MongoDB Query Options
The MongoDB specific query options can be set via MongoDB translator property queryOption.
These options are supported:
QUERYOPTION_TAILABLE
QUERYOPTION_SLAVEOK
QUERYOPTION_OPLOGREPLAY
QUERYOPTION_NOTIMEOUT
QUERYOPTION_AWAITDATA
QUERYOPTION_EXHAUST
QUERYOPTION_PARTIAL
For details about these options consider MongoDB documentation.
MongoDB Model Properties
MongoDB model properties may be provided via modelProperies in the createDatasource statement.
Importing Collection Fields
mongodb.maxColumnsNumber, mongodb.maxRowsNumber
To represent a non-relation MongoDB collection as a relational table, we need to represent the semi-structured collection rows as structured relational rows. To do this completely, we need to check all the fields in all rows of a MongoDB's collection and to add them as columns to a relation table representing that collection. Since extracting MongoDB's collection metadata in this way could be a very costly operation on big collections, Data Virtuality Server iterates only through the maximal number of rows (defined in mongodb.maxRowsNumber) and adds at most maxColumnsNumber of columns to the resulting relational table.
In addition, the same-named collection field can have a different format in different rows. Data Virtuality Server first takes to declare the column type as it has found it first in the field. If it finds out that the field type has changed, it changes the resulting column type to the "bigger" type if needed.
In general, that a type widening goes from a Numeric type to String to CLOB. Per default maxColumnsNumber = 200 and maxRowsNumber = 100. *
- mongodb.maxVarcharColumnWidth
States, which column length should be used for varchar columns, e.g. sets varchar(mongodb.maxVarcharColumnWidth). Default value is 256.
- mongodb.overrideStructureByExample
On connecting or refreshing of MongoDB data source, Data Virtuality Server always reads the collections to determine their structure. Sometimes the collections are extremely large, but still need to be read completely, because the latest documents have the latest version of the schema. This operation can take a large amount of time and decrease it we introduced the mongodb.overrideStructureByExample model property. This property can hold a potentially very large text, which represents a strict mode JSON document containing one or multiple comma-separated examples of the structure of MongoDB collections to be overridden. These documents will be read instead of the documents from the real collections to determine their structure.
Creation and usage of the current property explained in detail in the Usage section of this page.
Nested Objects
mongodb.maxRecursionDepth
Nested Objects in a MongoDB's collection row can be represented either as a JSON string contained in the table's column or them as additional fields in the resulting table. This parameter controls, how far inside the particular fields should be analyzed to represent their sub-fields as separate columns in the resulting table. The default value is 0 (output the complete JSON string with no additional columns)
- mongodb.fieldDelimiterChar
Additional columns generated for the nested objects have the name of their parent objects as a prefix inside. This parameter controls the character, that is used to separate this prefix from the field name itself. Default is '_'.
Usage
The MongoDB's collections are represented as relational tables in the schema named like the name given in createDatasource()
Select Query
To query these tables a normal SQL select query like the following can be used:
SELECT * FROM mongodb.table1
Stored Procedures
Alternatively, one can use the stored procedure named query to pass a native query inside. This stored procedure has the following parameters:
- collection - the name of the collection to query
- query - the query in the native JSON format
- columns - the output columns in the JSON format like
{ '_id' : 1, 'name': 1 }
or NULL if all columns should be outputted
- result - the CLOB containing the results
CALL "mongodb".query('table1','{"city":"berlin"}',NULL)
XML Functions
In order to parse the JSON string values in a field, XML Functions
- jsonToXml
- xpathValue
- xmlserialize
of the Data Virtuality Server can be used as follows:
Examples
SELECT xpathvalue( jsontoxml( 'test', a.result), '/test/test/[2]/firstname' ), xmlserialize( jsontoxml( 'test', a.result ) ), result FROM ( call "mongodb".query( 'table1', '{"city":"berlin"}', NULL) ) AS a;
SELECT CAST(xpathvalue(jsontoxml('root',"zipcodes"),'/root/root/[1]') AS string) FROM "mongodb"."zip_history_log";
SELECT CAST(jsontoxml('root',"zipcodes") AS string) FROM "mongodb"."zip_history_log";
SELECT * FROM (CALL mongo.query('foo', '{ $query : {},$orderby : {} }','{_id:1,desription:1}')) w, xmltable('/root' passing jsontoxml('root',w.result)) t
Creation and Usage of the JSON Examples with the mongodb.overrideStructureByExample Property
In order to create JSON example for CollectionA in database test containing such documents:
{ "_id" : ObjectId("57ea8fc806f2c1c32bcc4a11"), "col1" : 10 } { "_id" : ObjectId("57ea8fd106f2c1c32bcc4a12"), "col1" : NumberLong(200) }
next steps should be performed:
Document with the desired structure for the whole collection should be obtained using mongoexport CLI tool or constructed manually using strict mode JSON syntax. For CollectionA the second document can be used, as a data type of "col1" is wider, than the first one. To do it with mongoexport one of the next commands can be used:
POWERSHELLmongoexport --db test --collection CollectionA --query "{'col1': {$type: 'long'}}" mongoexport --db test --collection CollectionA --query "{'col1': 200}"
Such JSON will be a result:
JS{"_id":{"$oid":"57ea8fd106f2c1c32bcc4a12"},"col1":{"$numberLong":"200"}}
Collection name should be added as a key to the JSON document obtained on the previous step:
JS{ "CollectionA": {"_id":{"$oid":"57ea8fd106f2c1c32bcc4a12"},"col1":{"$numberLong":"200"}} }
All double quotes should be escaped with the backslash symbol:
JS{ \"CollectionA\": {\"_id\":{\"$oid\":\"57ea8fd106f2c1c32bcc4a12\"},\"col1\":{\"$numberLong\":\"200\"}} }
Resulting document now can be set as a value to the mongodb.overrideStructureByExample property:
JSmongodb.overrideStructureByExample=" { \"CollectionA\": {\"_id\":{\"$oid\":\"57ea8fd106f2c1c32bcc4a12\"},\"col1\":{\"$numberLong\":\"200\"}} } "
Multiple examples should be separated by a comma:
JSmongodb.overrideStructureByExample=" { \"CollectionA\": {\"_id\":{\"$oid\":\"57ea8fd106f2c1c32bcc4a12\"},\"col1\":{\"$numberLong\":\"200\"}}, \"CollectionB\": {\"_id\":{\"$oid\":\"57dfffcf190889d653a5374b\"},\"col2\":{\"$date\":\"2012-12-19T06:01:17.171Z\"}} } "
The example above can be used to add the data source as follows:
CALL SYSADMIN.createConnection
(
"name" => 'mongodb',
"jbossCLITemplateName" => 'mongodb',
"connectionOrResourceAdapterProperties" => 'server=localhost,port=27017,database=test'
) ;;
CALL SYSADMIN.createDatasource
(
"name" => 'mongodb',
"translator" => 'mongodb',
"modelProperties" => 'mongodb.overrideStructureByExample="
{
\"CollectionA\":
{\"_id\":{\"$oid\":\"57ea8fd106f2c1c32bcc4a12\"},\"col1\":{\"$numberLong\":\"200\"}},
\"CollectionB\":
{\"_id\":{\"$oid\":\"57dfffcf190889d653a5374b\"},\"col2\":{\"$date\":\"2012-12-19T06:01:17.171Z\"}}
}
"',
"translatorProperties" => ''
) ;;