8.0 KiB
title | type | bookToc |
---|---|---|
Apache Flink CDC | docs | false |
Flink CDC
A streaming data integration tool
{{< img src="/fig/cdc-flow.png" alt="Flink CDC Flow">}}
What is Flink CDC?
Flink CDC is a distributed data integration tool for real time data and batch data. Flink CDC brings the simplicity and elegance of data integration via YAML to describe the data movement and transformation.
{{< img src="/fig/index-yaml-example.png" alt="Flink CDC Example">}}
Key Features
Change Data Capture
Flink CDC supports distributed scanning of historical data of database and then automatically switches to change data capturing. The switch uses the incremental snapshot algorithm which ensure the switch action does not lock the database.
Schema Evolution
Flink CDC has the ability of automatically creating downstream table using the inferred table structure based on upstream table, and applying upstream DDL to downstream systems during change data capturing.
Streaming Pipeline
Flink CDC jobs run in streaming mode by default, providing sub-second end-to-end latency in real-time binlog synchronization scenarios, effectively ensuring data freshness for downstream businesses.
Data Transformation
Flink CDC will soon support data transform operations of ETL, including column projection, computed column, filter expression and classical scalar functions.
Full Database Sync
Flink CDC supports synchronizing all tables of source database instance to downstream in one job by configuring the captured database list and table list.
Exactly-Once Semantics
Flink CDC supports reading database historical data and continues to read CDC events with exactly-once processing, even after job failures.
Learn More
Try Flink CDC
Flink CDC provides a series of quick start demos without any dependencies or java code. A Linux or MacOS computer with Docker installed is enough. Please check out our }}">Quick Start for more information.
Get Help with Flink CDC
If you get stuck, check out our community support resources. In particular, Apache Flink’s user mailing list (user@flink.apache.org) is consistently ranked as one of the most active of any Apache project, and is a great way to get help quickly.
Flink CDC is developed under the umbrella of Apache Flink.
Flink CDC is developed under the umbrella of Apache Flink.