Arrow JDBC Adapter#
The Arrow JDBC Adapter assists with working with JDBC and Arrowdata. Currently, it supports reading JDBC ResultSets into ArrowVectorSchemaRoots.
ResultSet to VectorSchemaRoot Conversion#
This can be accessed via the JdbcToArrow class. The resultingArrowVectorIterator will convert a ResultSet to Arrow data in batchesof rows.
try(ArrowVectorIteratorit=JdbcToArrow.sqlToArrowVectorIterator(resultSet,allocator)){while(it.hasNext()){VectorSchemaRootroot=it.next();// Consume the root…}}
The batch size and type mapping can both be customized:
JdbcToArrowConfigconfig=newJdbcToArrowConfigBuilder(allocator,/*calendar=*/null).setReuseVectorSchemaRoot(reuseVectorSchemaRoot).setJdbcToArrowTypeConverter((jdbcFieldInfo->{switch(jdbcFieldInfo.getJdbcType()){caseTypes.BIGINT:// Assume actual value range is SMALLINTreturnnewArrowType.Int(16,true);default:returnnull;}})).build();try(ArrowVectorIteratoriter=JdbcToArrow.sqlToArrowVectorIterator(rs,config)){while(iter.hasNext()){VectorSchemaRootroot=iter.next();// Consume the root…}}
The JDBC type can be explicitly specified, which is useful since JDBCdrivers can give spurious type information. For example, the Postgresdriver has been observed to use Decimal types with scale and precision0; these cases can be handled by specifying the type explicitly beforereading. Also, some JDBC drivers may return BigDecimal values withinconsistent scale. A RoundingMode can be set to handle these cases:
Map<Integer,JdbcFieldInfo>mapping=newHashMap<>();mapping.put(1,newJdbcFieldInfo(Types.DECIMAL,20,7));JdbcToArrowConfigconfig=newJdbcToArrowConfigBuilder(allocator,/*calendar=*/null).setBigDecimalRoundingMode(RoundingMode.UNNECESSARY).setExplicitTypesByColumnIndex(mapping).build();try(ArrowVectorIteratoriter=JdbcToArrow.sqlToArrowVectorIterator(rs,config)){while(iter.hasNext()){VectorSchemaRootroot=iter.next();// Consume the root…}}
The mapping from JDBC type to Arrow type can be overridden via theJdbcToArrowConfig, but it is not possible to customize theconversion from JDBC value to Arrow value itself, nor is it possibleto define a conversion for an unsupported type.
Type Mapping#
The JDBC to Arrow type mapping can be obtained at runtime fromJdbcToArrowUtils.getArrowTypeFromJdbcType.
JDBC Type | Arrow Type | Notes |
|---|---|---|
ARRAY | List | (1) |
BIGINT | Int64 | |
BINARY | Binary | |
BIT | Bool | |
BLOB | Binary | |
BOOLEAN | Bool | |
CHAR | Utf8 | |
CLOB | Utf8 | |
DATE | Date32 | |
DECIMAL | Decimal128 | (2) |
DOUBLE | Double | |
FLOAT | Float32 | |
INTEGER | Int32 | |
LONGVARBINARY | Binary | |
LONGNVARCHAR | Utf8 | |
LONGVARCHAR | Utf8 | |
NCHAR | Utf8 | |
NULL | Null | |
NUMERIC | Decimal128 | |
NVARCHAR | Utf8 | |
REAL | Float32 | |
SMALLINT | Int16 | |
STRUCT | Struct | (3) |
TIME | Time32[ms] | |
TIMESTAMP | Timestamp[ms] | (4) |
TINYINT | Int8 | |
VARBINARY | Binary | |
VARCHAR | Utf8 |
(1) The list value type must be explicitly configured and cannot beinferred. UsesetArraySubTypeByColumnIndexMap orsetArraySubTypeByColumnNameMap.
(2) By default, the scale of decimal values must match the scale inthe type exactly; precision is allowed to be any value greater orequal to the type precision. If there is a mismatch, by default, anexception will be thrown. This can be configured by setting adifferent RoundingMode with setBigDecimalRoundingMode.
(3) Not fully supported: while the type conversion is defined, thevalue conversion is not. SeeARROW-17006.
(4) If a Calendar is provided, then the timestamp will have thetimezone of the calendar, else it will be a timestamp withouttimezone.
VectorSchemaRoot to PreparedStatement Parameter Conversion#
The adapter can bind rows of Arrow data from a VectorSchemaRoot toparameters of a JDBC PreparedStatement. This can be accessed via theJdbcParameterBinder class. Each call to next() will bind parametersfrom the next row of data, and then the application can execute thestatement, call addBatch(), etc. as desired. Null values will lead toa setNull call with an appropriate JDBC type code (listed below).
finalJdbcParameterBinderbinder=JdbcParameterBinder.builder(statement,root).bindAll().build();while(binder.next()){statement.executeUpdate();}// Use a VectorLoader to update the rootbinder.reset();while(binder.next()){statement.executeUpdate();}
The mapping of vectors to parameters, the JDBC type code used by theconverters, and the type conversions themselves can all be customized:
finalJdbcParameterBinderbinder=JdbcParameterBinder.builder(statement,root).bind(/*parameterIndex*/2,/*columnIndex*/0).bind(/*parameterIndex*/1,customColumnBinderInstance).build();
Type Mapping#
The Arrow to JDBC type mapping can be obtained at runtime viaa method on ColumnBinder.
Arrow Type | JDBC Type | Notes |
|---|---|---|
Binary | VARBINARY (setBytes) | |
Bool | BOOLEAN (setBoolean) | |
Date32 | DATE (setDate) | |
Date64 | DATE (setDate) | |
Decimal128 | DECIMAL (setBigDecimal) | |
Decimal256 | DECIMAL (setBigDecimal) | |
FixedSizeBinary | BINARY (setBytes) | |
Float32 | REAL (setFloat) | |
Int8 | TINYINT (setByte) | |
Int16 | SMALLINT (setShort) | |
Int32 | INTEGER (setInt) | |
Int64 | BIGINT (setLong) | |
LargeBinary | LONGVARBINARY (setBytes) | |
LargeUtf8 | LONGVARCHAR (setString) | (1) |
Time[s] | TIME (setTime) | |
Time[ms] | TIME (setTime) | |
Time[us] | TIME (setTime) | |
Time[ns] | TIME (setTime) | |
Timestamp[s] | TIMESTAMP (setTimestamp) | (2) |
Timestamp[ms] | TIMESTAMP (setTimestamp) | (2) |
Timestamp[us] | TIMESTAMP (setTimestamp) | (2) |
Timestamp[ns] | TIMESTAMP (setTimestamp) | (2) |
Utf8 | VARCHAR (setString) |
(1) Strings longer than Integer.MAX_VALUE bytes (the maximum lengthof a Java
byte[]) will cause a runtime exception.(2) If the timestamp has a timezone, the JDBC type defaults toTIMESTAMP_WITH_TIMEZONE. If the timestamp has no timezone,technically there is not a correct conversion from Arrow value toJDBC value, because a JDBC Timestamp is in UTC, and we have notimezone information. In this case, the default binder will callsetTimestamp(int, Timestamp),which will lead to the driver using the “default timezone” (that ofthe Java VM).

