# terraform-aws-msk-pipeline
This repository contains Terraform configurations and a Jenkins pipeline for deploying AWS MSK (Managed Streaming for Apache Kafka) with RDS CDC (Change Data Capture) using the Debezium connector.
The infrastructure is deployed in the following order:
1. Security Groups (`terraform/sg/`) - Network security configurations for RDS, MSK, and Connectors
2. RDS Aurora MySQL (`terraform/rds/`) - Source database for CDC
3. MSK Cluster (`terraform/msk/`) - Managed Kafka cluster
4. MSK Connector (`terraform/msk/`) - Debezium CDC connector
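For orientation before digging into the modules, here is a hedged sketch of what the Debezium source connector resource in `terraform/msk/connector.tf` might look like; every name, ARN, and endpoint below is a placeholder rather than the repository's actual values:

```hcl
# Sketch of a Debezium MySQL source connector on MSK Connect. All values are
# placeholders; see terraform/msk/connector.tf for the real definition.
resource "aws_mskconnect_connector" "msk_cdc_connector" {
  name                 = "rds-cdc-connector"
  kafkaconnect_version = "2.7.1"

  capacity {
    provisioned_capacity {
      mcu_count    = 1
      worker_count = 1
    }
  }

  # Debezium 2.x property names; 1.x used database.server.name instead of topic.prefix
  connector_configuration = {
    "connector.class"    = "io.debezium.connector.mysql.MySqlConnector"
    "tasks.max"          = "1"
    "database.hostname"  = "<rds-cluster-endpoint>"
    "database.port"      = "3306"
    "database.user"      = "<cdc-user>"
    "database.password"  = "<retrieved-from-secrets-manager>"
    "database.server.id" = "1001"
    "topic.prefix"       = "rds-cdc"
    "schema.history.internal.kafka.bootstrap.servers" = "<msk-bootstrap-servers>"
    "schema.history.internal.kafka.topic"             = "schema-history"
  }

  kafka_cluster {
    apache_kafka_cluster {
      bootstrap_servers = "<msk-bootstrap-servers>"
      vpc {
        security_groups = ["<connector-sg-id>"]
        subnets         = ["<subnet-id-a>", "<subnet-id-b>"]
      }
    }
  }

  # Matches the current plaintext/no-auth setup (port 9092); see Future Enhancements
  kafka_cluster_client_authentication {
    authentication_type = "NONE"
  }

  kafka_cluster_encryption_in_transit {
    encryption_type = "PLAINTEXT"
  }

  plugin {
    custom_plugin {
      arn      = "<debezium-custom-plugin-arn>"
      revision = 1
    }
  }

  service_execution_role_arn = "<msk-connect-role-arn>"
}
```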
## Prerequisites

- Jenkins server with:
  - Terraform v1.11 installed
  - AWS credentials stored in Jenkins (see Quick Start below)
  - Git plugin
- AWS account with appropriate permissions
- S3 bucket for Terraform state (`terraform-state-bucket-dops`)
- Debezium plugin uploaded to S3 (use the `.github/workflows/cicd.yml` workflow)
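The state bucket above is wired in through each module's `backend.tf`. A minimal sketch of such a block, assuming a per-module state key (the key and region shown are illustrative; check the module's actual `backend.tf`):

```hcl
# Illustrative S3 backend block; the bucket name matches the prerequisite above,
# while key and region are assumptions.
terraform {
  backend "s3" {
    bucket = "terraform-state-bucket-dops"
    key    = "msk-pipeline/sg/terraform.tfstate"
    region = "us-east-1"
  }
}
```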
## Quick Start

New to this project? Follow these steps:
Add AWS credentials to Jenkins:
- Jenkins → Manage Jenkins → Manage Credentials → (global) → Add Credentials
- Add the first credential:
  - Kind: Secret text
  - ID: `aws-access-key-id`
  - Secret: Your AWS Access Key ID
- Add the second credential:
  - Kind: Secret text
  - ID: `aws-secret-access-key`
  - Secret: Your AWS Secret Access Key
💡 Getting AWS credentials: Go to AWS Console → IAM → Users → Your User → Security credentials → Create access key
Run the pipeline:
- Go to your Jenkins pipeline job
- Click "Build with Parameters"
- Select ACTION: `apply` (to create resources)
- Click "Build"
That's it! Jenkins will use the stored credentials to create your AWS infrastructure.
## Documentation

- AWS_MSK_CDC_CONCEPTS.md - Complete guide to AWS MSK, CDC, Debezium, and all related concepts
- JENKINS_GUIDE.md - Complete pipeline reference with execution times and costs
- demo.md - Step-by-step guide to testing CDC functionality, with examples
## Pipeline Setup

- Create a new Pipeline job in Jenkins
- Point it to this repository
- Set the script path to `Jenkinsfile`
## Pipeline Parameters

The pipeline is parameterized with two options:

- `ACTION` - Choose between:
  - `apply` - Create/update infrastructure
  - `destroy` - Destroy infrastructure
- `AUTO_APPROVE`:
  - `false` (default) - Manual approval required at each stage
  - `true` - Automatic approval (use with caution)
When `ACTION = apply`, the pipeline creates resources in this order:

1. Security Groups (sg)
2. RDS Database (rds)
3. MSK Cluster (msk cluster + IAM roles)
4. MSK Connector (connector resources)

Each stage requires approval unless `AUTO_APPROVE = true`.
When `ACTION = destroy`, the pipeline destroys resources in reverse order:

1. MSK Connector (connector resources)
2. MSK Cluster (cluster + configurations)
3. IAM Resources (roles + policies)
4. RDS Database (rds)
5. Security Groups (sg)

## Project Structure

```
terraform-aws-msk-pipeline/
├── .github/workflows/
│   └── cicd.yml              # GitHub Actions for Debezium plugin upload
├── terraform/
│   ├── debezium-plugin/      # S3 bucket and Debezium plugin setup
│   │   ├── backend.tf
│   │   ├── global_variables.tf
│   │   ├── s3.tf
│   │   └── variables.tf
│   ├── sg/                   # Security Groups
│   │   ├── data.tf
│   │   ├── global_variables.tf
│   │   └── sg.tf
│   ├── rds/                  # RDS Aurora MySQL
│   │   ├── data.tf
│   │   ├── global_variables.tf
│   │   ├── rds.tf
│   │   └── variable.tf
│   └── msk/                  # MSK Cluster and Connector
│       ├── cluster.tf
│       ├── connector.tf
│       ├── data.tf
│       ├── global_variables.tf
│       ├── iam.tf
│       └── variables.tf
└── Jenkinsfile               # Jenkins Pipeline definition
```

## Manual Deployment

If you prefer to run Terraform manually:
```bash
# 1. Create Security Groups
cd terraform/sg
terraform init
terraform plan
terraform apply

# 2. Create RDS
cd ../rds
terraform init
terraform plan
terraform apply

# 3. Create MSK Cluster
cd ../msk
terraform init
terraform plan -target=aws_cloudwatch_log_group.msk_log_group \
  -target=aws_msk_configuration.cluster_configuration \
  -target=aws_msk_cluster.msk_cluster \
  -target=aws_iam_role.msk_role \
  -target=aws_iam_policy.msk_policy \
  -target=aws_iam_role_policy_attachment.attach_msk_policy
terraform apply [targets...]

# 4. Create MSK Connector
terraform plan -target=aws_mskconnect_custom_plugin.connector_plugin_debezium \
  -target=aws_mskconnect_worker_configuration.connector_configuration \
  -target=aws_mskconnect_connector.msk_cdc_connector
terraform apply [targets...]
```
To tear everything down manually, destroy in reverse order:

```bash
# 1. Destroy MSK Connector
cd terraform/msk
terraform destroy -target=aws_mskconnect_connector.msk_cdc_connector \
  -target=aws_mskconnect_worker_configuration.connector_configuration \
  -target=aws_mskconnect_custom_plugin.connector_plugin_debezium

# 2. Destroy MSK Cluster
terraform destroy -target=aws_msk_cluster.msk_cluster \
  -target=aws_msk_configuration.cluster_configuration \
  -target=aws_cloudwatch_log_group.msk_log_group

# 3. Destroy IAM Resources
terraform destroy -target=aws_iam_role_policy_attachment.attach_msk_policy \
  -target=aws_iam_policy.msk_policy \
  -target=aws_iam_role.msk_role

# 4. Destroy RDS
cd ../rds
terraform destroy

# 5. Destroy Security Groups
cd ../sg
terraform destroy
```
## Default Configuration

- Region: `us-east-1` (configured in `global_variables.tf`)
- MSK Cluster Name: `msk-cluster` (in `terraform/msk/variables.tf`)
- RDS Cluster Name: `rds-cdc-cluster` (in `terraform/rds/variable.tf`)
- S3 Bucket: `aws-msk-resources-bucket` (for logs and plugins)
## Customization

Modify variables in the respective `variables.tf` files:

- `terraform/sg/` - Security group configurations
- `terraform/rds/variable.tf` - RDS instance type, engine version, etc.
- `terraform/msk/variables.tf` - MSK broker count, instance type, Kafka version, etc.
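For example, broker-related definitions in `terraform/msk/variables.tf` might look like the following; the variable names and defaults here are illustrative, not necessarily those used in the repository:

```hcl
# Hypothetical variable definitions -- adjust names and defaults to match the repo.
variable "kafka_version" {
  description = "Apache Kafka version for the MSK cluster"
  type        = string
  default     = "3.5.1"
}

variable "broker_count" {
  description = "Number of broker nodes (a multiple of the AZ count)"
  type        = number
  default     = 2
}

variable "broker_instance_type" {
  description = "Instance type for MSK broker nodes"
  type        = string
  default     = "kafka.t3.small"
}
```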
## Notes

- State Management: Terraform state is stored in an S3 backend (`terraform-state-bucket-dops`)
- Resource Dependencies: The pipeline enforces proper ordering to handle dependencies
- Debezium Plugin: Must be uploaded to S3 before the MSK Connector is created (use the GitHub Actions workflow)
- Costs: MSK and RDS resources incur costs (~$210-285/month) - remember to destroy them when not needed
- Security: The RDS password is auto-generated and stored in AWS Secrets Manager - no hardcoded passwords (see the sketch after this list)
- Testing: See demo.md for instructions on how to test the CDC pipeline
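A minimal sketch of the auto-generated password pattern from the Security note, using the `random` and `aws` providers; the resource and secret names are illustrative (see `terraform/rds/rds.tf` for the actual implementation):

```hcl
# Requires the hashicorp/random and hashicorp/aws providers.
resource "random_password" "rds_master" {
  length  = 16
  special = false
}

resource "aws_secretsmanager_secret" "rds_master" {
  name = "rds-cdc-cluster-master-password" # hypothetical secret name
}

resource "aws_secretsmanager_secret_version" "rds_master" {
  secret_id     = aws_secretsmanager_secret.rds_master.id
  secret_string = random_password.rds_master.result
}

# The generated value then feeds the cluster definition, e.g.:
#   master_password = random_password.rds_master.result
```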
## Troubleshooting

- State Lock: If the Terraform state is locked, check the DynamoDB table backing the S3 backend for stale locks
- Dependency Errors: Ensure resources are created in the correct order
- Connector Fails: Verify the Debezium plugin exists in S3 and that RDS is accessible
- Security Group Issues: Ensure the VPC and subnets are properly tagged
## Future Enhancements

This project currently implements a basic CDC pipeline from MySQL to MSK. Here are potential enhancements for production use:
### IAM Authentication for MSK

- Replace plaintext (port 9092) with IAM authentication (port 9098)
- Eliminate the need for purely network-based security
- Fine-grained access control per topic and consumer group
### TLS Encryption

- Enable in-transit encryption for the MSK cluster
- Use TLS endpoints (port 9094) instead of plaintext
- Secure data transmission between connectors and brokers
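A hedged sketch of how the IAM and TLS changes above might land on the cluster resource; the cluster name matches the default configuration, while the Kafka version, sizes, and IDs are placeholders:

```hcl
# Sketch only -- the real arguments live in terraform/msk/cluster.tf.
resource "aws_msk_cluster" "msk_cluster" {
  cluster_name           = "msk-cluster"
  kafka_version          = "3.5.1"       # assumed version
  number_of_broker_nodes = 2

  broker_node_group_info {
    instance_type   = "kafka.t3.small"   # assumed size
    client_subnets  = ["<subnet-id-a>", "<subnet-id-b>"]
    security_groups = ["<msk-sg-id>"]
  }

  # IAM authentication exposes SASL/IAM bootstrap endpoints on port 9098
  client_authentication {
    sasl {
      iam = true
    }
  }

  # TLS in transit exposes TLS bootstrap endpoints on port 9094
  encryption_info {
    encryption_in_transit {
      client_broker = "TLS"
    }
  }
}
```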
### Enhanced Secrets Management

- Direct Secrets Manager integration with the MSK Connector
- Automatic password rotation support
- Eliminate the need to retrieve secrets in Terraform
### Sink Connectors (MSK → External Systems)

- S3 Sink: Archive CDC events to S3 for a data lake (sketched below)
- Elasticsearch Sink: Index changes for real-time search
- MongoDB Sink: Write changes to a NoSQL database
- Lambda Sink: Trigger serverless functions on changes
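To illustrate the S3 sink idea, the `connector_configuration` map of a second `aws_mskconnect_connector` (shaped like the source connector sketched earlier) might look like this; it assumes Confluent's S3 sink connector plugin has been uploaded to S3 like the Debezium one, and the topic name is hypothetical:

```hcl
connector_configuration = {
  "connector.class" = "io.confluent.connect.s3.S3SinkConnector"
  "tasks.max"       = "1"
  "topics"          = "rds-cdc.mydb.customers"  # hypothetical CDC topic
  "s3.bucket.name"  = "aws-msk-resources-bucket"
  "s3.region"       = "us-east-1"
  "storage.class"   = "io.confluent.connect.s3.storage.S3Storage"
  "format.class"    = "io.confluent.connect.s3.format.json.JsonFormat"
  "flush.size"      = "100"                     # records per S3 object
}
```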
### AWS Flink Integration (Stream Processing)

- Real-time data transformation and enrichment
- Complex event processing (CEP)
- Stream joins across multiple topics
- Aggregations and windowing operations
- Use case: Transform and route CDC events to multiple destinations, e.g. MySQL → MSK → AWS Flink → MongoDB
### Multi-Database CDC
- Add PostgreSQL source connectors
- MongoDB CDC support
- Oracle Debezium connector
- Multi-source data integration
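A PostgreSQL source would reuse the same MSK Connect plumbing with a different Debezium connector class; a hedged `connector_configuration` sketch with placeholder connection values:

```hcl
connector_configuration = {
  "connector.class"   = "io.debezium.connector.postgresql.PostgresConnector"
  "plugin.name"       = "pgoutput"            # PostgreSQL's built-in logical decoding plugin
  "database.hostname" = "<postgres-endpoint>"
  "database.port"     = "5432"
  "database.dbname"   = "<database-name>"
  "database.user"     = "<cdc-user>"
  "database.password" = "<from-secrets-manager>"
  "topic.prefix"      = "pg-cdc"
}
```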
### Monitoring & Alerting
- CloudWatch custom metrics for CDC lag
- SNS alerts for connector failures
- Grafana dashboards for real-time monitoring
- X-Ray tracing for end-to-end visibility
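As one concrete example of the alerting idea, a CloudWatch alarm on MSK broker disk usage wired to an SNS topic might look like this; the metric and namespace are the standard `AWS/Kafka` ones, while the topic name, threshold, and broker ID are assumptions:

```hcl
resource "aws_sns_topic" "msk_alerts" {
  name = "msk-cdc-alerts" # hypothetical topic name
}

# KafkaDataLogsDiskUsed is a per-broker MSK metric, so one alarm per broker ID
resource "aws_cloudwatch_metric_alarm" "broker_disk" {
  alarm_name          = "msk-broker-1-disk-high"
  namespace           = "AWS/Kafka"
  metric_name         = "KafkaDataLogsDiskUsed"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 1
  threshold           = 80 # percent; assumed threshold
  comparison_operator = "GreaterThanThreshold"

  dimensions = {
    "Cluster Name" = "msk-cluster"
    "Broker ID"    = "1"
  }

  alarm_actions = [aws_sns_topic.msk_alerts.arn]
}
```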
### Auto-Scaling
- MSK broker auto-scaling based on throughput
- Connector auto-scaling (when available)
- Dynamic partition adjustment
### Disaster Recovery
- Cross-region MSK replication
- Automated backup and restore procedures
- Multi-region active-active setup
### Schema Registry
- AWS Glue Schema Registry integration
- Avro schema evolution
- Schema validation and compatibility checks
### Optimization
- Kafka compression (snappy/lz4)
- Batch size tuning
- Partition strategy optimization
- Connection pooling
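Compression and batch tuning can be expressed through the MSK Connect worker configuration, since Kafka Connect forwards `producer.*`-prefixed properties to its producers. A hedged sketch; the worker name and values are assumptions, and the repository's actual worker properties live in `terraform/msk/connector.tf`:

```hcl
resource "aws_mskconnect_worker_configuration" "tuned_worker" {
  name = "cdc-worker-config" # hypothetical name

  properties_file_content = <<-EOT
    key.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter=org.apache.kafka.connect.json.JsonConverter
    # Compress producer batches before they reach the brokers
    producer.compression.type=lz4
    # Larger batches trade a little latency for throughput
    producer.batch.size=32768
    producer.linger.ms=50
  EOT
}
```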
### Advanced CDC Features
- Incremental snapshots
- Signal table for ad-hoc snapshots
- Custom transformations (SMT)
- Multi-table CDC with different patterns
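For instance, enabling the signal table for ad-hoc incremental snapshots adds a couple of Debezium properties to the source `connector_configuration`; the table name below is a placeholder, and the table must exist in the source database:

```hcl
connector_configuration = {
  # ...connection properties as in connector.tf...
  "signal.data.collection"          = "mydb.debezium_signal" # hypothetical signal table
  "incremental.snapshot.chunk.size" = "1024"
}
```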
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request