This tutorial shows how to quickly build a streaming ETL pipeline for PolarDB-X with Flink CDC.
Assume we are running an e-commerce business. The product and order data are stored in PolarDB-X.
We want to enrich the orders with the product table, and then load the enriched orders into Elasticsearch in real time.
The following sections describe how to implement this with the Flink PolarDB-X CDC source.
All exercises in this tutorial are performed in the Flink SQL CLI, and the entire process uses standard SQL syntax, without a single line of Java/Scala code or IDE installation.
## Preparation
Prepare a Linux or MacOS computer with Docker installed.
### Starting required components
The components required in this demo are all managed in containers, so we will start them with `docker-compose`.
Create a `docker-compose.yml` file with the following content:
```yaml
version: '2.1'
services:
  polardbx:
    image: polardbx/polardb-x:2.0.1
    container_name: polardbx
    ports:
      - "8527:8527"
  elasticsearch:
    image: 'elastic/elasticsearch:7.6.0'
    container_name: elasticsearch
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - 'ES_JAVA_OPTS=-Xms512m -Xmx512m'
      - discovery.type=single-node
    ports:
      - '9200:9200'
      - '9300:9300'
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
  kibana:
    image: 'elastic/kibana:7.6.0'
    container_name: kibana
    ports:
      - '5601:5601'
    volumes:
      - '/var/run/docker.sock:/var/run/docker.sock'
```
The Docker Compose environment consists of the following containers:
- PolarDB-X: the `products` and `orders` tables will be stored in this database. They will be joined to enrich the orders.
- Elasticsearch: mainly used as a data sink to store enriched orders.
- Kibana: used to visualize the data in Elasticsearch.
To start all containers, run the following command in the directory that contains the `docker-compose.yml` file.
```shell
docker-compose up -d
```
This command automatically starts all the containers defined in the Docker Compose configuration in detached mode. Run `docker ps` to check whether these containers are running properly.
We can also visit [http://localhost:5601/](http://localhost:5601/) to see if Kibana is running normally.
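The Flink source tables defined below read from the `products` and `orders` tables in a database `mydb`, so create and seed those in PolarDB-X first. The following is a minimal sketch: the schemas mirror the Flink DDL in the next section, and the sample rows are purely illustrative. Connect with any MySQL-compatible client using the credentials that also appear in the Flink DDL:
```sql
-- connect first, e.g.: mysql -h127.0.0.1 -P8527 -upolardbx_root -p123456
CREATE DATABASE mydb;
USE mydb;

-- product dimension table
CREATE TABLE products (
  id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  description VARCHAR(512)
) AUTO_INCREMENT = 101;

INSERT INTO products
VALUES (default, 'scooter', 'Small 2-wheel scooter'),
       (default, 'car battery', '12V car battery');

-- orders fact table
CREATE TABLE orders (
  order_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
  order_date DATETIME NOT NULL,
  customer_name VARCHAR(255) NOT NULL,
  price DECIMAL(10, 5) NOT NULL,
  product_id INTEGER NOT NULL,
  order_status BOOLEAN NOT NULL
) AUTO_INCREMENT = 10001;

INSERT INTO orders
VALUES (default, '2020-07-30 10:08:22', 'Jark', 50.50, 102, false),
       (default, '2020-07-30 10:11:09', 'Sally', 15.00, 101, false);
```
Next, prepare the Flink distribution and the connector JARs: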
1. Download [Flink 1.16.0](https://archive.apache.org/dist/flink/flink-1.16.0/flink-1.16.0-bin-scala_2.12.tgz) and extract it to the directory `flink-1.16.0`
2. Download the following connector JARs and put them under `flink-1.16.0/lib/`:
   - the MySQL CDC SQL connector, `flink-sql-connector-mysql-cdc` (PolarDB-X is captured through its MySQL-compatible protocol)
   - the Elasticsearch 7 SQL connector, `flink-sql-connector-elasticsearch7`
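With the JARs in place, start a local Flink cluster and open the Flink SQL CLI. These are the standard scripts shipped with the Flink distribution:
```shell
cd flink-1.16.0
# start a local standalone cluster
./bin/start-cluster.sh
# open the SQL CLI used for all the statements below
./bin/sql-client.sh
```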
## Creating tables using Flink DDL in Flink SQL CLI
First, enable checkpointing with a 3-second interval:
```sql
-- Flink SQL
Flink SQL> SET execution.checkpointing.interval = 3s;
```
Then, create tables that capture the change data from the corresponding database tables.
```sql
-- Flink SQL
-- create source table - orders
Flink SQL> CREATE TABLE orders (
order_id INT,
order_date TIMESTAMP(0),
customer_name STRING,
price DECIMAL(10, 5),
product_id INT,
order_status BOOLEAN,
PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc',
'hostname' = '127.0.0.1',
'port' = '8527',
'username' = 'polardbx_root',
'password' = '123456',
'database-name' = 'mydb',
'table-name' = 'orders'
);
-- create source table - products
CREATE TABLE products (
id INT,
name STRING,
description STRING,
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc',
'hostname' = '127.0.0.1',
'port' = '8527',
'username' = 'polardbx_root',
'password' = '123456',
'database-name' = 'mydb',
'table-name' = 'products'
);
```
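Before wiring up the sink, you can sanity-check a source table directly in the SQL CLI. This is a plain Flink SQL query; it launches a small job that prints the current rows and keeps streaming any changes:
```sql
-- Flink SQL
Flink SQL> SELECT * FROM products;
```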
Finally, create an `enriched_orders` table that is used to load the data to Elasticsearch.
```sql
-- Flink SQL
-- create sink table - enriched_orders
Flink SQL> CREATE TABLE enriched_orders (
order_id INT,
order_date TIMESTAMP(0),
customer_name STRING,
price DECIMAL(10, 5),
product_id INT,
order_status BOOLEAN,
product_name STRING,
product_description STRING,
PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
'connector' = 'elasticsearch-7',
'hosts' = 'http://localhost:9200',
'index' = 'enriched_orders'
);
```
## Enriching orders and loading to Elasticsearch
Use Flink SQL to join the `orders` table with the `products` table to enrich the orders and write the result to Elasticsearch.
```sql
-- Flink SQL
Flink SQL> INSERT INTO enriched_orders
SELECT o.order_id,
o.order_date,
o.customer_name,
o.price,
o.product_id,
o.order_status,
p.name,
p.description
FROM orders AS o
LEFT JOIN products AS p ON o.product_id = p.id;
```
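Once the statement is submitted, the job can be observed in the Flink web UI at [http://localhost:8081/](http://localhost:8081/) (the default port of a local cluster). To see the pipeline react to changes, modify some data in PolarDB-X; a sketch, assuming the illustrative rows seeded earlier (the `order_id` values below depend on that seed data):
```sql
-- in the MySQL client connected to PolarDB-X
USE mydb;

-- a new order appears in the enriched_orders index almost immediately
INSERT INTO orders
VALUES (default, '2020-07-30 15:22:00', 'Jark', 29.71, 101, false);

-- updates and deletes are propagated as well
UPDATE orders SET order_status = true WHERE order_id = 10003;
DELETE FROM orders WHERE order_id = 10003;
```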
Now, the enriched orders should be shown in Kibana.
Visit [http://localhost:5601/app/kibana#/management/kibana/index_pattern](http://localhost:5601/app/kibana#/management/kibana/index_pattern) to create an index pattern `enriched_orders`.

Visit [http://localhost:5601/app/kibana#/discover](http://localhost:5601/app/kibana#/discover) to find the enriched orders.