[FLINK-35634][build] Add CdcUp playground CLI scripts

This closes #3605

@ -28,14 +28,41 @@ and elegance of data integration via YAML to describe the data movement and tran
The Flink CDC prioritizes efficient end-to-end data integration and offers enhanced functionalities such as
full database synchronization, sharding table synchronization, schema evolution and data transformation.
![Flink CDC framework design](docs/static/fig/architecture.png)
### Quickstart Guide
Flink CDC provides a CdcUp CLI utility to start a playground environment and run Flink CDC jobs.
You will need a working Docker and Docker Compose environment to use it.
1. Run `git clone https://github.com/apache/flink-cdc.git --depth=1` to retrieve a copy of the Flink CDC source code.
2. Run `cd flink-cdc/tools/cdcup/ && ./cdcup.sh init` to use the CdcUp tool to start a playground environment.
3. Run `./cdcup.sh up` to boot up Docker containers, and wait for them to be ready.
4. Run `./cdcup.sh mysql` to open a MySQL session, and create at least one table.
```sql
-- initialize db and table
CREATE DATABASE cdc_playground;
USE cdc_playground;
CREATE TABLE test_table (id INT PRIMARY KEY, name VARCHAR(32));
-- insert test data
INSERT INTO test_table VALUES (1, 'alice'), (2, 'bob'), (3, 'cicada'), (4, 'derrida');
-- verify if it has been successfully inserted
SELECT * FROM test_table;
```
5. Run `./cdcup.sh pipeline pipeline-definition.yaml` to submit the pipeline job. You may also edit the pipeline definition file for further configuration (a sample definition is shown after these steps).
6. Run `./cdcup.sh flink` to access the Flink Web UI.
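The generated pipeline definition depends on the connectors you selected during `init`. As a minimal sketch, choosing MySQL as the source and the `values` test sink produces a file shaped roughly like this (field values follow the CdcUp generator defaults):
```yaml
pipeline:
  parallelism: 1

source:
  type: mysql
  hostname: mysql
  port: 3306
  username: root
  password: ''
  tables: '\.*.\.*'      # capture all tables in all databases
  server-id: '5400-6400'
  server-time-zone: UTC

sink:
  type: values
```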
### Getting Started
1. Prepare an [Apache Flink](https://nightlies.apache.org/flink/flink-docs-master/docs/try-flink/local_installation/#starting-and-stopping-a-local-cluster) cluster and set the `FLINK_HOME` environment variable.
2. [Download](https://github.com/apache/flink-cdc/releases) the Flink CDC tarball, extract it, and put the pipeline connector jars into the Flink `lib` directory.
> If you're using macOS or Linux, you may use `brew install apache-flink-cdc` to install Flink CDC and compatible connectors quickly.
3. Create a **YAML** file to describe the data source and data sink; the following example synchronizes all tables under the MySQL `app_db` database to Doris:
```yaml
source:
@ -89,8 +116,6 @@ Try it out yourself with our more detailed [tutorial](docs/content/docs/get-star
You can also see [connector overview](docs/content/docs/connectors/pipeline-connectors/overview.md) to view a comprehensive catalog of the
connectors currently provided and understand more detailed configurations.
### Join the Community
There are many ways to participate in the Apache Flink CDC community. The

@ -0,0 +1,108 @@
---
title: "CdcUp Quickstart Guide"
weight: 3
type: docs
aliases:
- /get-started/quickstart/cdc-up-quickstart-guide
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Setting Up a CDC Pipeline Environment with the CdcUp CLI
Flink CDC now provides a CdcUp CLI utility to start a playground environment and run Flink CDC jobs quickly.
You will need a working Docker and Docker Compose V2 environment to use it.
## Step-by-step Guide
1. Open a terminal and run `git clone https://github.com/apache/flink-cdc.git --depth=1` to retrieve a copy of Flink CDC source code.
2. Run `cd flink-cdc/tools/cdcup/` to enter the CdcUp directory, then run `./cdcup.sh` to list the commands the "cdc-up" tool provides. You should see the following output:
```
Usage: ./cdcup.sh { init | up | pipeline <yaml> | flink | mysql | stop | down | help }
Commands:
* init:
Initialize a playground environment, and generate configuration files.
* up:
Start docker containers. This may take a while before the database is ready.
* pipeline <yaml>:
Submit a YAML pipeline job.
* flink:
Print Flink Web dashboard URL.
* mysql:
Open MySQL console.
* stop:
Stop all running playground containers.
* down:
Stop and remove containers, networks, and volumes.
* help:
Print this message.
```
3. Run `./cdcup.sh init` and wait for the base Docker image to be pulled. You should see the following output:
```
🎉 Welcome to cdc-up quickstart wizard!
There are a few questions to ask before getting started:
🐿️ Which Flink version would you like to use? (Press ↑/↓ arrow to move and Enter to select)
1.17.2
1.18.1
1.19.1
‣ 1.20.0
```
Use the arrow keys to navigate and press Enter to select the desired Flink and Flink CDC versions, and the source and sink connectors.
4. Run `./cdcup.sh up` to boot up Docker containers, and wait for them to be ready. Some sink connectors (like Doris and StarRocks) need some time to initialize before they are ready to handle requests.
5. If you chose MySQL as the data source, run `./cdcup.sh mysql` to open a MySQL session and create at least one table.
For example, the following SQL commands create a database and a table, insert some test data, and verify the result:
```sql
-- initialize db and table
CREATE DATABASE cdc_playground;
USE cdc_playground;
CREATE TABLE test_table (id INT PRIMARY KEY, name VARCHAR(32));
-- insert test data
INSERT INTO test_table VALUES (1, 'alice'), (2, 'bob'), (3, 'cicada'), (4, 'derrida');
-- verify if it has been successfully inserted
SELECT * FROM test_table;
```
6. Run `./cdcup.sh pipeline pipeline-definition.yaml` to submit the pipeline job. You may also edit the pipeline definition file for further configuration (a sample generated definition is shown after these steps).
7. Run `./cdcup.sh flink` and navigate to the printed URL to access the Flink Web UI.
```
$ ./cdcup.sh flink
🚩 Visit Flink Dashboard at: http://localhost:33448
```
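For reference, below is a sketch of the `pipeline-definition.yaml` that the init wizard generates when MySQL is selected as the source and Doris as the sink (field values follow the generator defaults in this commit; your file will differ with other connectors):
```yaml
pipeline:
  parallelism: 1

source:
  type: mysql
  hostname: mysql
  port: 3306
  username: root
  password: ''
  tables: '\.*.\.*'      # capture all tables in all databases
  server-id: '5400-6400'
  server-time-zone: UTC

sink:
  type: doris
  fenodes: doris:8030
  benodes: doris:8040
  jdbc-url: jdbc:mysql://doris:9030
  username: root
  password: ''
  table.create.properties.light_schema_change: true
  table.create.properties.replication_num: 1
```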

@ -0,0 +1,63 @@
*.gem
*.rbc
/.config
/coverage/
/InstalledFiles
/pkg/
/spec/reports/
/spec/examples.txt
/test/tmp/
/test/version_tmp/
/tmp/
# Used by dotenv library to load environment variables.
# .env
# Ignore Byebug command history file.
.byebug_history
## Specific to RubyMotion:
.dat*
.repl_history
build/
*.bridgesupport
build-iPhoneOS/
build-iPhoneSimulator/
## Specific to RubyMotion (use of CocoaPods):
#
# We recommend against adding the Pods directory to your .gitignore. However
# you should judge for yourself, the pros and cons are mentioned at:
# https://guides.cocoapods.org/using/using-cocoapods.html#should-i-check-the-pods-directory-into-source-control
#
# vendor/Pods/
## Documentation cache and generated files:
/.yardoc/
/_yardoc/
/doc/
/rdoc/
## Environment normalization:
/.bundle/
/vendor/bundle
/lib/bundler/man/
# for a library or gem, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# Gemfile.lock
# .ruby-version
# .ruby-gemset
# unless supporting rvm < 1.11.0 or doing something fancy, ignore this:
.rvmrc
# Used by RuboCop. Remote config files pulled in from inherit_from directive.
# .rubocop-https?--*
.idea/**
# These are generated files
cdc/**
/docker-compose.yaml
/pipeline-definition.yaml

@ -0,0 +1,23 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
FROM ruby:3.3-slim
WORKDIR /src
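# wget is used by the init wizard to download the Flink CDC release tarball and connector jars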
RUN apt-get update && apt-get install -y wget
RUN gem install tty-prompt
COPY src /src
RUN chmod +x /src/app.rb
ENTRYPOINT ["/src/app.rb"]

@ -0,0 +1,30 @@
# cdcup
A `docker` (`compose`) environment on Linux / macOS is required to play with this. Ruby is **not** necessary.
## `./cdcup.sh init`
Initialize a playground environment, and generate configuration files.
## `./cdcup.sh up`
Start docker containers. Note that it may take a while before the database is ready.
## `./cdcup.sh pipeline <yaml>`
Submit a YAML pipeline job. Before executing this, please ensure that:
1. All containers are running and ready for connections
2. (For MySQL) You've created at least one database and table to be captured
## `./cdcup.sh flink`
Print Flink Web dashboard URL.
## `./cdcup.sh stop`
Stop all running playground containers.
## `./cdcup.sh down`
Stop and remove containers, networks, and volumes.

@ -0,0 +1,91 @@
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Do not continue after error
set -e
display_help() {
echo "Usage: ./cdcup.sh { init | up | pipeline <yaml> | flink | mysql | stop | down | help }"
echo
echo "Commands:"
echo " * init:"
echo " Initialize a playground environment, and generate configuration files."
echo
echo " * up:"
echo " Start docker containers. This may take a while before database is ready."
echo
echo " * pipeline <yaml>:"
echo " Submit a YAML pipeline job."
echo
echo " * flink:"
echo " Print Flink Web dashboard URL."
echo
echo " * mysql:"
echo " Open MySQL console."
echo
echo " * stop:"
echo " Stop all running playground containers."
echo
echo " * down:"
echo " Stop and remove containers, networks, and volumes."
echo
echo " * help:"
echo " Print this message."
}
if [ "$1" == 'init' ]; then
printf "🚩 Building bootstrap docker image...\n"
docker build -q -t cdcup/bootstrap .
rm -rf cdc && mkdir -p cdc
printf "🚩 Starting bootstrap wizard...\n"
docker run -it --rm -v "$(pwd)/cdc":/cdc cdcup/bootstrap
mv cdc/docker-compose.yaml ./docker-compose.yaml
mv cdc/pipeline-definition.yaml ./pipeline-definition.yaml
elif [ "$1" == 'up' ]; then
printf "🚩 Starting playground...\n"
docker compose up -d
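# Refresh the Flink CDC distribution (prepared by the init wizard) inside the JobManager container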
docker compose exec jobmanager bash -c 'rm -rf /opt/flink-cdc'
docker compose cp cdc jobmanager:/opt/flink-cdc
elif [ "$1" == 'pipeline' ]; then
if [ -z "$2" ]; then
printf "Usage: ./cdcup.sh pipeline <pipeline-definition.yaml>\n"
exit 1
fi
printf "🚩 Submitting pipeline job...\n"
docker compose cp "$2" jobmanager:/opt/flink-cdc/pipeline-definition.yaml
startup_script="cd /opt/flink-cdc && ./bin/flink-cdc.sh ./pipeline-definition.yaml --flink-home /opt/flink"
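# Attach optional dependency jars if the init wizard downloaded them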
if test -f ./cdc/lib/hadoop-uber.jar; then
startup_script="$startup_script --jar lib/hadoop-uber.jar"
fi
if test -f ./cdc/lib/mysql-connector-java.jar; then
startup_script="$startup_script --jar lib/mysql-connector-java.jar"
fi
docker compose exec jobmanager bash -c "$startup_script"
elif [ "$1" == 'flink' ]; then
port_info="$(docker compose port jobmanager 8081)"
printf "🚩 Visit Flink Dashboard at: http://localhost:%s\n" "${port_info##*:}"
elif [ "$1" == 'mysql' ]; then
docker compose exec -it mysql bash -c "mysql -uroot" || echo "❌ Unable to find MySQL container."
elif [ "$1" == 'stop' ]; then
printf "🚩 Stopping playground...\n"
docker compose stop
elif [ "$1" == 'down' ]; then
printf "🚩 Purging playground...\n"
docker compose down -v
else
display_help
fi

@ -0,0 +1,127 @@
#!/usr/bin/env ruby
# frozen_string_literal: true
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
require 'tty-prompt'
require 'yaml'
require_relative 'download_libs'
require_relative 'source/my_sql'
require_relative 'source/values_source'
require_relative 'sink/doris'
require_relative 'sink/kafka'
require_relative 'sink/paimon'
require_relative 'sink/star_rocks'
require_relative 'sink/values_sink'
CDC_DATA_VOLUME = 'cdc-data'
@prompt = TTY::Prompt.new
@docker_compose_file_content = {
'services' => {},
'volumes' => {
CDC_DATA_VOLUME => {}
}
}
@pipeline_yaml_file_content = {
'pipeline' => {
'parallelism' => 1
}
}
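# YAML.dump renders this skeleton as:
#
#   pipeline:
#     parallelism: 1
#
# The selected source and sink sections are added to this hash before dumping.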
SOURCES = {
mysql: MySQL,
values: ValuesSource
}.freeze
SINKS = {
kafka: Kafka,
paimon: Paimon,
doris: Doris,
starrocks: StarRocks,
values: ValuesSink
}.freeze
FLINK_VERSIONS = %w[
1.17.2
1.18.1
1.19.1
1.20.0
].freeze
FLINK_CDC_VERSIONS = %w[
3.0.0
3.0.1
3.1.0
3.1.1
3.2.0
3.2.1
].freeze
puts
@prompt.say '🎉 Welcome to cdc-up quickstart wizard!'
@prompt.say ' There are a few questions to ask before getting started:'
flink_version = @prompt.select('🐿️ Which Flink version would you like to use?', FLINK_VERSIONS,
default: FLINK_VERSIONS.last)
flink_cdc_version = @prompt.select(' Which Flink CDC version would you like to use?', FLINK_CDC_VERSIONS,
default: FLINK_CDC_VERSIONS.last)
@docker_compose_file_content['services']['jobmanager'] = {
'image' => "flink:#{flink_version}-scala_2.12",
'hostname' => 'jobmanager',
'ports' => ['8081'],
'command' => 'jobmanager',
'environment' => {
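# FLINK_PROPERTIES is read by the official Flink image's entrypoint and merged into the Flink configuration on startup.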
'FLINK_PROPERTIES' => "jobmanager.rpc.address: jobmanager\nexecution.checkpointing.interval: 3000"
}
}
@docker_compose_file_content['services']['taskmanager'] = {
'image' => "flink:#{flink_version}-scala_2.12",
'hostname' => 'taskmanager',
'command' => 'taskmanager',
'environment' => {
'FLINK_PROPERTIES' => "jobmanager.rpc.address: jobmanager\ntaskmanager.numberOfTaskSlots: 4\nexecution.checkpointing.interval: 3000"
}
}
source = @prompt.select('🚰 Which data source to use?', SOURCES.keys)
sink = @prompt.select('🪣 Which data sink to use?', SINKS.keys)
SOURCES[source].prepend_to_docker_compose_yaml(@docker_compose_file_content)
SOURCES[source].prepend_to_pipeline_yaml(@pipeline_yaml_file_content)
SINKS[sink].prepend_to_docker_compose_yaml(@docker_compose_file_content)
SINKS[sink].prepend_to_pipeline_yaml(@pipeline_yaml_file_content)
File.write('/cdc/docker-compose.yaml', YAML.dump(@docker_compose_file_content))
File.write('/cdc/pipeline-definition.yaml', YAML.dump(@pipeline_yaml_file_content))
@prompt.say "\n3⃣ Preparing CDC #{flink_cdc_version}..."
connectors_name = Set.new [
SOURCES[source].connector_name,
SINKS[sink].connector_name
]
download_cdc(flink_cdc_version, '/cdc/', connectors_name)
@prompt.say '🥳 All done!'

@ -0,0 +1,119 @@
# frozen_string_literal: true
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
CDC_VERSIONS = {
'3.0.0': {
tar: 'https://github.com/ververica/flink-cdc-connectors/releases/download/release-3.0.0/flink-cdc-3.0.0-bin.tar.gz',
connectors: %w[
https://repo1.maven.org/maven2/com/ververica/flink-cdc-pipeline-connector-doris/3.0.0/flink-cdc-pipeline-connector-doris-3.0.0.jar
https://repo1.maven.org/maven2/com/ververica/flink-cdc-pipeline-connector-mysql/3.0.0/flink-cdc-pipeline-connector-mysql-3.0.0.jar
https://repo1.maven.org/maven2/com/ververica/flink-cdc-pipeline-connector-starrocks/3.0.0/flink-cdc-pipeline-connector-starrocks-3.0.0.jar
https://repo1.maven.org/maven2/com/ververica/flink-cdc-pipeline-connector-values/3.0.0/flink-cdc-pipeline-connector-values-3.0.0.jar
]
},
'3.0.1': {
tar: 'https://github.com/ververica/flink-cdc-connectors/releases/download/release-3.0.1/flink-cdc-3.0.1-bin.tar.gz',
connectors: %w[
https://repo1.maven.org/maven2/com/ververica/flink-cdc-pipeline-connector-doris/3.0.1/flink-cdc-pipeline-connector-doris-3.0.1.jar
https://repo1.maven.org/maven2/com/ververica/flink-cdc-pipeline-connector-mysql/3.0.1/flink-cdc-pipeline-connector-mysql-3.0.1.jar
https://repo1.maven.org/maven2/com/ververica/flink-cdc-pipeline-connector-starrocks/3.0.1/flink-cdc-pipeline-connector-starrocks-3.0.1.jar
https://repo1.maven.org/maven2/com/ververica/flink-cdc-pipeline-connector-values/3.0.1/flink-cdc-pipeline-connector-values-3.0.1.jar
]
},
'3.1.0': {
tar: 'https://dlcdn.apache.org/flink/flink-cdc-3.1.0/flink-cdc-3.1.0-bin.tar.gz',
connectors: %w[
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-mysql/3.1.0/flink-cdc-pipeline-connector-mysql-3.1.0.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-doris/3.1.0/flink-cdc-pipeline-connector-doris-3.1.0.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-starrocks/3.1.0/flink-cdc-pipeline-connector-starrocks-3.1.0.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-kafka/3.1.0/flink-cdc-pipeline-connector-kafka-3.1.0.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-paimon/3.1.0/flink-cdc-pipeline-connector-paimon-3.1.0.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-values/3.1.0/flink-cdc-pipeline-connector-values-3.1.0.jar
]
},
'3.1.1': {
tar: 'https://dlcdn.apache.org/flink/flink-cdc-3.1.1/flink-cdc-3.1.1-bin.tar.gz',
connectors: %w[
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-mysql/3.1.1/flink-cdc-pipeline-connector-mysql-3.1.1.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-doris/3.1.1/flink-cdc-pipeline-connector-doris-3.1.1.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-starrocks/3.1.1/flink-cdc-pipeline-connector-starrocks-3.1.1.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-kafka/3.1.1/flink-cdc-pipeline-connector-kafka-3.1.1.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-paimon/3.1.1/flink-cdc-pipeline-connector-paimon-3.1.1.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-values/3.1.1/flink-cdc-pipeline-connector-values-3.1.1.jar
]
},
'3.2.0': {
tar: 'https://dlcdn.apache.org/flink/flink-cdc-3.2.0/flink-cdc-3.2.0-bin.tar.gz',
connectors: %w[
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-mysql/3.2.0/flink-cdc-pipeline-connector-mysql-3.2.0.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-doris/3.2.0/flink-cdc-pipeline-connector-doris-3.2.0.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-starrocks/3.2.0/flink-cdc-pipeline-connector-starrocks-3.2.0.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-kafka/3.2.0/flink-cdc-pipeline-connector-kafka-3.2.0.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-paimon/3.2.0/flink-cdc-pipeline-connector-paimon-3.2.0.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-elasticsearch/3.2.0/flink-cdc-pipeline-connector-elasticsearch-3.2.0.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-values/3.2.0/flink-cdc-pipeline-connector-values-3.2.0.jar
]
},
'3.2.1': {
tar: 'https://dlcdn.apache.org/flink/flink-cdc-3.2.1/flink-cdc-3.2.1-bin.tar.gz',
connectors: %w[
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-mysql/3.2.1/flink-cdc-pipeline-connector-mysql-3.2.1.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-doris/3.2.1/flink-cdc-pipeline-connector-doris-3.2.1.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-starrocks/3.2.1/flink-cdc-pipeline-connector-starrocks-3.2.1.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-kafka/3.2.1/flink-cdc-pipeline-connector-kafka-3.2.1.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-paimon/3.2.1/flink-cdc-pipeline-connector-paimon-3.2.1.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-elasticsearch/3.2.1/flink-cdc-pipeline-connector-elasticsearch-3.2.1.jar
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-values/3.2.1/flink-cdc-pipeline-connector-values-3.2.1.jar
]
},
}.freeze
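# Fetch the Flink CDC binary release tarball and extract it into dest_path.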
def download_cdc_bin(version, dest_path)
cdc_version = CDC_VERSIONS[version]
puts "\tDownloading #{File.basename(cdc_version[:tar])}..."
`wget -q -O /tmp/cdc-bin.tar.gz #{cdc_version[:tar]}`
`tar --strip-components=1 -xzvf /tmp/cdc-bin.tar.gz -C #{dest_path}`
end
def download_connectors(version, dest_path, connectors)
CDC_VERSIONS[version][:connectors].each do |link|
jar_name = File.basename(link)
if connectors.any? { |c| jar_name.include? c }
puts "\tDownloading #{jar_name}..."
`wget -q -O #{dest_path}/lib/#{jar_name} #{link}`
end
end
end
def download_mysql_driver(dest_path)
puts "\tDownloading MySQL Java Connector..."
`wget -q https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.27/mysql-connector-java-8.0.27.jar \
-O #{dest_path}/lib/mysql-connector-java.jar`
end
def download_hadoop_common(dest_path)
puts "\tDownloading Hadoop Uber jar..."
`wget -q https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-10.0/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar \
-O #{dest_path}/lib/hadoop-uber.jar`
end
def download_cdc(version, dest_path, connectors = [])
download_cdc_bin(version.to_sym, dest_path)
download_connectors(version.to_sym, dest_path, connectors)
download_mysql_driver(dest_path) if connectors.include? MySQL.connector_name
download_hadoop_common(dest_path) if connectors.include? Paimon.connector_name
end

@ -0,0 +1,47 @@
# frozen_string_literal: true
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Doris sink definition generator class.
class Doris
class << self
def connector_name
'flink-cdc-pipeline-connector-doris'
end
def prepend_to_docker_compose_yaml(docker_compose_yaml)
docker_compose_yaml['services']['doris'] = {
'image' => 'apache/doris:doris-all-in-one-2.1.0',
'hostname' => 'doris',
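# 8030: FE HTTP (fenodes), 8040: BE HTTP (benodes), 9030: FE MySQL-protocol query port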
'ports' => %w[8030 8040 9030],
'volumes' => ["#{CDC_DATA_VOLUME}:/data"]
}
end
def prepend_to_pipeline_yaml(pipeline_yaml)
pipeline_yaml['sink'] = {
'type' => 'doris',
'fenodes' => 'doris:8030',
'benodes' => 'doris:8040',
'jdbc-url' => 'jdbc:mysql://doris:9030',
'username' => 'root',
'password' => '',
'table.create.properties.light_schema_change' => true,
'table.create.properties.replication_num' => 1
}
end
end
end

@ -0,0 +1,58 @@
# frozen_string_literal: true
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Kafka sink definition generator class.
class Kafka
class << self
def connector_name
'flink-cdc-pipeline-connector-kafka'
end
def prepend_to_docker_compose_yaml(docker_compose_yaml)
docker_compose_yaml['services']['zookeeper'] = {
'image' => 'confluentinc/cp-zookeeper:7.4.4',
'hostname' => 'zookeeper',
'ports' => ['2181'],
'environment' => {
'ZOOKEEPER_CLIENT_PORT' => 2181,
'ZOOKEEPER_TICK_TIME' => 2000
}
}
docker_compose_yaml['services']['kafka'] = {
'image' => 'confluentinc/cp-kafka:7.4.4',
'depends_on' => ['zookeeper'],
'hostname' => 'kafka',
'ports' => ['9092'],
'environment' => {
'KAFKA_BROKER_ID' => 1,
'KAFKA_ZOOKEEPER_CONNECT' => 'zookeeper:2181',
'KAFKA_ADVERTISED_LISTENERS' => 'PLAINTEXT://kafka:9092',
'KAFKA_LISTENER_SECURITY_PROTOCOL_MAP' => 'PLAINTEXT:PLAINTEXT',
'KAFKA_INTER_BROKER_LISTENER_NAME' => 'PLAINTEXT',
'KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR' => 1
}
}
end
def prepend_to_pipeline_yaml(pipeline_yaml)
pipeline_yaml['sink'] = {
'type' => 'kafka',
'properties.bootstrap.servers' => 'PLAINTEXT://kafka:9092'
}
end
end
end

@ -0,0 +1,36 @@
# frozen_string_literal: true
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Paimon sink definition generator class.
class Paimon
class << self
def connector_name
'flink-cdc-pipeline-connector-paimon'
end
# Nothing to do
def prepend_to_docker_compose_yaml(_); end
def prepend_to_pipeline_yaml(pipeline_yaml)
pipeline_yaml['sink'] = {
'type' => 'paimon',
'catalog.properties.metastore' => 'filesystem',
'catalog.properties.warehouse' => '/data/paimon-warehouse'
}
end
end
end

@ -0,0 +1,45 @@
# frozen_string_literal: true
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# StarRocks sink definition generator class.
class StarRocks
class << self
def connector_name
'flink-cdc-pipeline-connector-starrocks'
end
def prepend_to_docker_compose_yaml(docker_compose_yaml)
docker_compose_yaml['services']['starrocks'] = {
'image' => 'starrocks/allin1-ubuntu:3.2.6',
'hostname' => 'starrocks',
'ports' => %w[8080 9030],
'volumes' => ["#{CDC_DATA_VOLUME}:/data"]
}
end
def prepend_to_pipeline_yaml(pipeline_yaml)
pipeline_yaml['sink'] = {
'type' => 'starrocks',
'jdbc-url' => 'jdbc:mysql://starrocks:9030',
'load-url' => 'starrocks:8080',
'username' => 'root',
'password' => '',
'table.create.properties.replication_num' => 1
}
end
end
end

@ -0,0 +1,34 @@
# frozen_string_literal: true
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Values sink definition generator class.
class ValuesSink
class << self
def connector_name
'flink-cdc-pipeline-connector-values'
end
# Nothing to do
def prepend_to_docker_compose_yaml(_); end
def prepend_to_pipeline_yaml(pipeline_yaml)
pipeline_yaml['sink'] = {
'type' => 'values'
}
end
end
end

@ -0,0 +1,51 @@
# frozen_string_literal: true
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# MySQL source definition generator class.
class MySQL
class << self
def connector_name
'flink-cdc-pipeline-connector-mysql'
end
def prepend_to_docker_compose_yaml(docker_compose_yaml)
docker_compose_yaml['services']['mysql'] = {
'image' => 'mysql:8.0',
'hostname' => 'mysql',
'environment' => {
'MYSQL_ALLOW_EMPTY_PASSWORD' => true,
'MYSQL_DATABASE' => 'cdcup'
},
'ports' => ['3306'],
'volumes' => ["#{CDC_DATA_VOLUME}:/data"]
}
end
def prepend_to_pipeline_yaml(pipeline_yaml)
pipeline_yaml['source'] = {
'type' => 'mysql',
'hostname' => 'mysql',
'port' => 3306,
'username' => 'root',
'password' => '',
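# Capture all tables in all databases ('\.*' matches any identifier)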
'tables' => '\.*.\.*',
'server-id' => '5400-6400',
'server-time-zone' => 'UTC'
}
end
end
end

@ -0,0 +1,35 @@
# frozen_string_literal: true
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Values source definition generator class.
class ValuesSource
class << self
def connector_name
'flink-cdc-pipeline-connector-values'
end
# Nothing to do
def prepend_to_docker_compose_yaml(_); end
def prepend_to_pipeline_yaml(pipeline_yaml)
pipeline_yaml['source'] = {
'type' => 'values',
'event-set.id' => 'SINGLE_SPLIT_MULTI_TABLES'
}
end
end
end