Docker image for Apiary Data Lake metastore
ExpediaGroup/apiary-metastore-docker
For more information please refer to the main Apiary project page.
Environment Variable | Required | Description |
---|---|---|
APIARY_S3_INVENTORY_PREFIX | No | Prefix used by S3 Inventory when creating data in the inventory bucket. Default is EntireBucketDaily. |
APIARY_S3_INVENTORY_TABLE_FORMAT | No | Format of S3 inventory data. Valid options are ORC, Parquet, or CSV. Default is ORC. |
APIARY_SYSTEM_SCHEMA | No | Name for internal system database. Default is apiary_system. |
AWS_REGION | Yes | AWS region to configure various AWS clients. |
AWS_WEB_IDENTITY_TOKEN_FILE | No | Path of the AWS Web Identity Token File for IRSA/OIDC AWS authentication. |
DATANUCLEUS_CONNECTION_POOLING_TYPE | No | Type of connection pooling. Valid options are BoneCP, DBCP, DBCP2, C3P0, HikariCP. |
DATANUCLEUS_CONNECTION_POOL_MAX_POOLSIZE | No | Maximum pool size for the connection pool. |
DATANUCLEUS_CONNECTION_POOL_MIN_POOLSIZE | No | Minimum pool size for the connection pool. |
DATANUCLEUS_CONNECTION_POOL_INITIAL_POOLSIZE | No | Initial pool size for the connection pool (C3P0 only). |
DATANUCLEUS_CONNECTION_POOL_MAX_IDLE | No | Maximum idle connections for the connection pool. |
DATANUCLEUS_CONNECTION_POOL_MIN_IDLE | No | Minimum idle connections for the connection pool. |
DATANUCLEUS_CONNECTION_POOL_MIN_ACTIVE | No | Maximum active connections for the connection pool (DBCP/DBCP2 only). |
DATANUCLEUS_CONNECTION_POOL_MAX_WAIT | No | Maximum wait time for the connection pool (DBCP/DBCP2 only). |
DATANUCLEUS_CONNECTION_POOL_VALIDATION_TIMEOUT | No | Validation timeout for the connection pool (DBCP/DBCP2/HikariCP only). |
DATANUCLEUS_CONNECTION_POOL_LEAK_DETECTION_THRESHOLD | No | Leak detection threshold for the connection pool (HikariCP only). |
DATANUCLEUS_CONNECTION_POOL_LEAK_MAX_LIFETIME | No | Maximum lifetime for the connection pool (HikariCP only). |
DATANUCLEUS_CONNECTION_POOL_AUTO_COMMIT | No | Auto commit for the connection pool (HikariCP only). |
DATANUCLEUS_CONNECTION_POOL_IDLE_TIMEOUT | No | Idle timeout for the connection pool (HikariCP only). |
DATANUCLEUS_CONNECTION_POOL_CONNECTION_WAIT_TIMEOUT | No | Connection wait timeout for the connection pool (HikariCP only). |
DATANUCLEUS_CONNECTION_POOL_READ_ONLY | No | Read only mode for the connection pool (HikariCP only). |
DATANUCLEUS_CONNECTION_POOL_NAME | No | Connection pool name (HikariCP only). |
DATANUCLEUS_CONNECTION_POOL_CATALOG | No | Connection pool catalog (HikariCP only). |
DATANUCLEUS_CONNECTION_POOL_REGISTER_MBEANS | No | Register MBeans for the connection pool (HikariCP only). |
DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES | No | true/false value for hive.metastore.disallow.incompatible.col.type.changes. Default is true. |
ENABLE_GLUESYNC | No | Option to turn on GlueSync Hive Metastore listener. |
ENABLE_HIVE_LOCK_HOUSE_KEEPER | No | Option to turn on Hive Metastore Hive Lock House Keeper. |
ENABLE_METRICS | No | Option to enable sending Hive Metastore and JMX metrics to Prometheus. |
ENABLE_S3_INVENTORY | No | Option to create Hive tables on top of S3 inventory data if enabled in apiary-data-lake. Enabled if value is not null/empty. |
ENABLE_S3_LOGS | No | Option to create Hive tables on top of S3 access logs data if enabled in apiary-data-lake. Enabled if value is not null/empty. |
EXTERNAL_DATABASE | No | Option to enable external database mode. When specified, it disables management of the Hive Metastore MySQL database schema. |
GLUE_PREFIX | No | Prefix added to Glue databases to handle database name collisions when synchronizing multiple Hive Metastores to the Glue catalog. |
HADOOP_HEAPSIZE | No | Hive Metastore Java process heapsize. Default is 1024. |
HMS_AUTOGATHER_STATS | No | Whether or not to create basic statistics on table/partition creation. Valid values are true or false. Default is true. |
LIMIT_PARTITION_REQUEST_NUMBER | No | To protect the cluster, this controls how many partitions can be scanned for each partitioned table. The default value -1 means no limit. The limit on partitions does not affect metadata-only queries. |
HIVE_METASTORE_ACCESS_MODE | No | Hive Metastore access mode, applicable values are: readwrite, readonly. |
HIVE_DB_NAMES | No | Comma-separated list of Hive database names. When specified, the Hive databases are created and mapped to corresponding S3 buckets. |
HIVE_METASTORE_LOG_LEVEL | No | Hive Metastore service Log4j log level. Default is INFO. |
HMS_MIN_THREADS | No | Minimum size of the Hive metastore thread pool. Default is 200. |
HMS_MAX_THREADS | No | Maximum size of the Hive metastore thread pool. Default is 1000. |
INSTANCE_NAME | Yes | Apiary instance name, used as a prefix on most AWS resources to allow multiple Apiary instance deployments. |
KAFKA_BOOTSTRAP_SERVERS | No | Kafka Bootstrap Servers to enable Kafka Metastore listener and send Metastore events to Kafka. |
KAFKA_CLIENT_ID | No | Kafka label you define that names the Kafka producer. |
KAFKA_COMPRESSION_TYPE | No | Kafka compression type. If none is specified, compression is disabled. Available values are gzip, lz4 and snappy. |
KAFKA_MAX_REQUEST_SIZE | No | The maximum size of a request in bytes. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests. This is also effectively a cap on the maximum uncompressed record batch size. Default is 1048576. |
LDAP_BASE | No | LDAP base DN used to search for user groups. |
LDAP_CA_CERT | No | Base64 encoded Certificate Authority Bundle to validate LDAP SSL connection. |
LDAP_SECRET_ARN | No | LDAP bind DN SecretsManager secret ARN. |
LDAP_URL | No | Active Directory URL to enable group mapping in metastore. |
MYSQL_CONNECTION_DRIVER_NAME | No | Hive Metastore MySQL database JDBC connection Driver Name. Default is com.mysql.jdbc.Driver. |
MYSQL_CONNECTION_POOL_SIZE | No | MySQL Connection pool size for Hive Metastore. Default is 10. See here for more info. |
MYSQL_DB_HOST | Yes | Hive Metastore MySQL database hostname. |
MYSQL_DB_NAME | Yes | Hive Metastore MySQL database name. |
MYSQL_SECRET_ARN | Yes | Hive Metastore MySQL SecretsManager secret ARN. |
MYSQL_SECRET_USERNAME_KEY | No | Hive Metastore MySQL SecretsManager secret username key. Default is username. |
MYSQL_TYPE | No | Hive Metastore MySQL database Type (mariadb, mysql). Default is mysql. |
MYSQL_DRIVER_JAR | No | Hive Metastore MySQL connector JAR location. Default is /usr/share/java/mysql-connector-java.jar. |
RANGER_AUDIT_DB_URL | No | Ranger audit database JDBC URL. |
RANGER_AUDIT_SECRET_ARN | No | Ranger audit database secret ARN. |
RANGER_AUDIT_SOLR_URL | No | Ranger Solr audit URL. |
RANGER_POLICY_MANAGER_URL | No | Ranger admin URL from where policies will be downloaded. |
RANGER_SERVICE_NAME | No | Ranger service name used to configure RangerAuth plugin. |
SNS_ARN | No | The SNS topic ARN to which metadata updates will be |
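
The table above marks five variables as required: AWS_REGION, INSTANCE_NAME, MYSQL_DB_HOST, MYSQL_DB_NAME and MYSQL_SECRET_ARN. As a rough illustration only, the following Python sketch checks that those are set and then assembles a `docker run` argument list that passes every variable with `-e`. The helper names (`missing_required`, `docker_run_args`), the image reference and all values are hypothetical placeholders, not something this repository provides.

```python
# Variables marked "Yes" in the Required column of the table above.
REQUIRED_VARS = (
    "AWS_REGION",
    "INSTANCE_NAME",
    "MYSQL_DB_HOST",
    "MYSQL_DB_NAME",
    "MYSQL_SECRET_ARN",
)


def missing_required(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def docker_run_args(env, image):
    """Build a `docker run` argument list passing each variable with -e.

    `image` is supplied by the caller; this sketch does not assume any
    particular image name or tag.
    """
    missing = missing_required(env)
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    args = ["docker", "run", "--rm"]
    for name in sorted(env):
        args += ["-e", f"{name}={env[name]}"]
    args.append(image)
    return args


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example_env = {
        "AWS_REGION": "us-west-2",
        "INSTANCE_NAME": "apiary-test",
        "MYSQL_DB_HOST": "db.example.internal",
        "MYSQL_DB_NAME": "apiary",
        "MYSQL_SECRET_ARN": "arn:aws:secretsmanager:us-west-2:123456789012:secret:example",
        "HADOOP_HEAPSIZE": "2048",  # optional; the table documents a default of 1024
    }
    # Hypothetical image reference; substitute the one you actually built or pulled.
    print(" ".join(docker_run_args(example_env, "my-registry/apiary-metastore:latest")))
```

Optional variables can be added to the same dictionary. Checking the required set up front surfaces a missing setting before the container starts.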
If you would like to ask any questions about or discuss Apiary please join our mailing list at
https://groups.google.com/forum/#!forum/apiary-user
This project is available under the Apache 2.0 License.
Copyright 2018-2019 Expedia, Inc.