diff --git a/README.md b/README.md index 480cf7585..ab17cfa66 100644 --- a/README.md +++ b/README.md @@ -28,14 +28,41 @@ and elegance of data integration via YAML to describe the data movement and tran The Flink CDC prioritizes efficient end-to-end data integration and offers enhanced functionalities such as full database synchronization, sharding table synchronization, schema evolution and data transformation. -![Flink CDC framework desigin](docs/static/fig/architecture.png) +![Flink CDC framework design](docs/static/fig/architecture.png) +### Quickstart Guide +Flink CDC provides a CdcUp CLI utility to start a playground environment and run Flink CDC jobs. +You will need a working Docker and Docker Compose environment to use it. + +1. Run `git clone https://github.com/apache/flink-cdc.git --depth=1` to retrieve a copy of the Flink CDC source code. +2. Run `cd flink-cdc/tools/cdcup/ && ./cdcup.sh init` to use the CdcUp tool to start a playground environment. +3. Run `./cdcup.sh up` to boot up Docker containers, and wait for them to be ready. +4. Run `./cdcup.sh mysql` to open a MySQL session, and create at least one table. + +```sql +-- initialize db and table +CREATE DATABASE cdc_playground; +USE cdc_playground; +CREATE TABLE test_table (id INT PRIMARY KEY, name VARCHAR(32)); + +-- insert test data +INSERT INTO test_table VALUES (1, 'alice'), (2, 'bob'), (3, 'cicada'), (4, 'derrida'); + +-- verify that the data has been successfully inserted +SELECT * FROM test_table; +``` + +5. Run `./cdcup.sh pipeline pipeline-definition.yaml` to submit the pipeline job. You may also edit the pipeline definition file for further configuration. +6. Run `./cdcup.sh flink` to access the Flink Web UI. ### Getting Started 1. Prepare an [Apache Flink](https://nightlies.apache.org/flink/flink-docs-master/docs/try-flink/local_installation/#starting-and-stopping-a-local-cluster) cluster and set up the `FLINK_HOME` environment variable. 2. 
[Download](https://github.com/apache/flink-cdc/releases) the Flink CDC tar, unzip it, and put the pipeline connector jars into the Flink `lib` directory. + +> If you're using macOS or Linux, you may use `brew install apache-flink-cdc` to install Flink CDC and compatible connectors quickly. + 3. Create a **YAML** file to describe the data source and data sink; the following example synchronizes all tables under the MySQL app_db database to Doris: ```yaml source: @@ -89,8 +116,6 @@ Try it out yourself with our more detailed [tutorial](docs/content/docs/get-started/quickstart/mysql-to-doris.md) You can also see [connector overview](docs/content/docs/connectors/pipeline-connectors/overview.md) to view a comprehensive catalog of the connectors currently provided and understand more detailed configurations. - - ### Join the Community There are many ways to participate in the Apache Flink CDC community. The diff --git a/docs/content.zh/docs/get-started/quickstart/cdc-up-quickstart-guide.md b/docs/content.zh/docs/get-started/quickstart/cdc-up-quickstart-guide.md new file mode 100644 index 000000000..9739a6d28 --- /dev/null +++ b/docs/content.zh/docs/get-started/quickstart/cdc-up-quickstart-guide.md @@ -0,0 +1,108 @@ +--- +title: "CdcUp Quickstart Guide" +weight: 3 +type: docs +aliases: +- /get-started/quickstart/cdc-up-quickstart-guide +--- + + +# Setting Up a CDC Pipeline Environment with CdcUp CLI + +Flink CDC now provides a CdcUp CLI utility to start a playground environment and run Flink CDC jobs quickly. + +You will need a working Docker and Docker Compose V2 environment to use it. + +## Step-by-step Guide + +1. Open a terminal and run `git clone https://github.com/apache/flink-cdc.git --depth=1` to retrieve a copy of the Flink CDC source code. + +2. Run `cd flink-cdc/tools/cdcup/` to enter the CDC directory and run `./cdcup.sh` to use the "cdc-up" tool to start a playground environment. 
You should see the following output: + +``` +Usage: ./cdcup.sh { init | up | pipeline | flink | mysql | stop | down | help } + +Commands: + * init: + Initialize a playground environment, and generate configuration files. + + * up: + Start Docker containers. This may take a while before the database is ready. + + * pipeline : + Submit a YAML pipeline job. + + * flink: + Print Flink Web dashboard URL. + + * mysql: + Open MySQL console. + + * stop: + Stop all running playground containers. + + * down: + Stop and remove containers, networks, and volumes. + + * help: + Print this message. +``` + +3. Run `./cdcup.sh init` and wait for the base Docker image to be pulled; you should see the following output: + +``` +🎉 Welcome to cdc-up quickstart wizard! + There are a few questions to ask before getting started: +🐿️ Which Flink version would you like to use? (Press ↑/↓ arrow to move and Enter to select) + 1.17.2 + 1.18.1 + 1.19.1 +‣ 1.20.0 +``` + +Use the arrow keys to navigate and press Enter to select the desired Flink and Flink CDC versions and the source and sink connectors. + +4. Run `./cdcup.sh up` to boot up Docker containers, and wait for them to be ready. Some sink connectors (like Doris and StarRocks) need some time to initialize and become ready to handle requests. + +5. If you chose MySQL as the data source, run `./cdcup.sh mysql` to open a MySQL session and create at least one table. + +For example, the following SQL commands create a database and a table, insert some test data, and verify the result: + +```sql +-- initialize db and table +CREATE DATABASE cdc_playground; +USE cdc_playground; +CREATE TABLE test_table (id INT PRIMARY KEY, name VARCHAR(32)); + +-- insert test data +INSERT INTO test_table VALUES (1, 'alice'), (2, 'bob'), (3, 'cicada'), (4, 'derrida'); + +-- verify that the data has been successfully inserted +SELECT * FROM test_table; +``` + +6. Run `./cdcup.sh pipeline pipeline-definition.yaml` to submit the pipeline job. 
You may also edit the pipeline definition file for further configuration. + +7. Run `./cdcup.sh flink` and navigate to the printed URL to access the Flink Web UI. + +``` +$ ./cdcup.sh flink +🚩 Visit Flink Dashboard at: http://localhost:33448 +``` \ No newline at end of file diff --git a/docs/content/docs/get-started/quickstart/cdc-up-quickstart-guide.md b/docs/content/docs/get-started/quickstart/cdc-up-quickstart-guide.md new file mode 100644 index 000000000..9739a6d28 --- /dev/null +++ b/docs/content/docs/get-started/quickstart/cdc-up-quickstart-guide.md @@ -0,0 +1,108 @@ +--- +title: "CdcUp Quickstart Guide" +weight: 3 +type: docs +aliases: +- /get-started/quickstart/cdc-up-quickstart-guide +--- + + +# Setting Up a CDC Pipeline Environment with CdcUp CLI + +Flink CDC now provides a CdcUp CLI utility to start a playground environment and run Flink CDC jobs quickly. + +You will need a working Docker and Docker Compose V2 environment to use it. + +## Step-by-step Guide + +1. Open a terminal and run `git clone https://github.com/apache/flink-cdc.git --depth=1` to retrieve a copy of the Flink CDC source code. + +2. Run `cd flink-cdc/tools/cdcup/` to enter the CDC directory and run `./cdcup.sh` to use the "cdc-up" tool to start a playground environment. You should see the following output: + +``` +Usage: ./cdcup.sh { init | up | pipeline | flink | mysql | stop | down | help } + +Commands: + * init: + Initialize a playground environment, and generate configuration files. + + * up: + Start Docker containers. This may take a while before the database is ready. + + * pipeline : + Submit a YAML pipeline job. + + * flink: + Print Flink Web dashboard URL. + + * mysql: + Open MySQL console. + + * stop: + Stop all running playground containers. + + * down: + Stop and remove containers, networks, and volumes. + + * help: + Print this message. +``` + +3. 
Run `./cdcup.sh init` and wait for the base Docker image to be pulled; you should see the following output: + +``` +🎉 Welcome to cdc-up quickstart wizard! + There are a few questions to ask before getting started: +🐿️ Which Flink version would you like to use? (Press ↑/↓ arrow to move and Enter to select) + 1.17.2 + 1.18.1 + 1.19.1 +‣ 1.20.0 +``` + +Use the arrow keys to navigate and press Enter to select the desired Flink and Flink CDC versions and the source and sink connectors. + +4. Run `./cdcup.sh up` to boot up Docker containers, and wait for them to be ready. Some sink connectors (like Doris and StarRocks) need some time to initialize and become ready to handle requests. + +5. If you chose MySQL as the data source, run `./cdcup.sh mysql` to open a MySQL session and create at least one table. + +For example, the following SQL commands create a database and a table, insert some test data, and verify the result: + +```sql +-- initialize db and table +CREATE DATABASE cdc_playground; +USE cdc_playground; +CREATE TABLE test_table (id INT PRIMARY KEY, name VARCHAR(32)); + +-- insert test data +INSERT INTO test_table VALUES (1, 'alice'), (2, 'bob'), (3, 'cicada'), (4, 'derrida'); + +-- verify that the data has been successfully inserted +SELECT * FROM test_table; +``` + +6. Run `./cdcup.sh pipeline pipeline-definition.yaml` to submit the pipeline job. You may also edit the pipeline definition file for further configuration. + +7. Run `./cdcup.sh flink` and navigate to the printed URL to access the Flink Web UI. + +``` +$ ./cdcup.sh flink +🚩 Visit Flink Dashboard at: http://localhost:33448 +``` \ No newline at end of file diff --git a/tools/cdcup/.gitignore b/tools/cdcup/.gitignore new file mode 100644 index 000000000..656717fdc --- /dev/null +++ b/tools/cdcup/.gitignore @@ -0,0 +1,63 @@ +*.gem +*.rbc +/.config +/coverage/ +/InstalledFiles +/pkg/ +/spec/reports/ +/spec/examples.txt +/test/tmp/ +/test/version_tmp/ +/tmp/ + +# Used by dotenv library to load environment variables. 
+# .env + +# Ignore Byebug command history file. +.byebug_history + +## Specific to RubyMotion: +.dat* +.repl_history +build/ +*.bridgesupport +build-iPhoneOS/ +build-iPhoneSimulator/ + +## Specific to RubyMotion (use of CocoaPods): +# +# We recommend against adding the Pods directory to your .gitignore. However +# you should judge for yourself, the pros and cons are mentioned at: +# https://guides.cocoapods.org/using/using-cocoapods.html#should-i-check-the-pods-directory-into-source-control +# +# vendor/Pods/ + +## Documentation cache and generated files: +/.yardoc/ +/_yardoc/ +/doc/ +/rdoc/ + +## Environment normalization: +/.bundle/ +/vendor/bundle +/lib/bundler/man/ + +# for a library or gem, you might want to ignore these files since the code is +# intended to run in multiple environments; otherwise, check them in: +# Gemfile.lock +# .ruby-version +# .ruby-gemset + +# unless supporting rvm < 1.11.0 or doing something fancy, ignore this: +.rvmrc + +# Used by RuboCop. Remote config files pulled in from inherit_from directive. +# .rubocop-https?--* + +.idea/** + +# These are generated files +cdc/** +/docker-compose.yaml +/pipeline-definition.yaml diff --git a/tools/cdcup/Dockerfile b/tools/cdcup/Dockerfile new file mode 100644 index 000000000..8eaf75456 --- /dev/null +++ b/tools/cdcup/Dockerfile @@ -0,0 +1,23 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +FROM ruby:3.3-slim + +WORKDIR /src +RUN apt-get update && apt-get install -y wget +RUN gem install tty-prompt +COPY src /src +RUN chmod +x /src/app.rb +ENTRYPOINT ["/src/app.rb"] diff --git a/tools/cdcup/README.md b/tools/cdcup/README.md new file mode 100644 index 000000000..c46dfcb71 --- /dev/null +++ b/tools/cdcup/README.md @@ -0,0 +1,30 @@ +# cdcup + +A `docker` (`compose`) environment on Linux / macOS is required to play with this. Ruby is **not** necessary. + +## `./cdcup.sh init` + +Initialize a playground environment, and generate configuration files. + +## `./cdcup.sh up` + +Start Docker containers. Note that it may take a while before the database is ready. + +## `./cdcup.sh pipeline ` + +Submit a YAML pipeline job. Before executing this, please ensure that: + +1. All containers are running and ready for connections. +2. (For MySQL) You've created at least one database and table to be captured. + +## `./cdcup.sh flink` + +Print Flink Web dashboard URL. + +## `./cdcup.sh stop` + +Stop all running playground containers. + +## `./cdcup.sh down` + +Stop and remove containers, networks, and volumes. diff --git a/tools/cdcup/cdcup.sh b/tools/cdcup/cdcup.sh new file mode 100755 index 000000000..0f01a781d --- /dev/null +++ b/tools/cdcup/cdcup.sh @@ -0,0 +1,91 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. 
+# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Do not continue after error +set -e + +display_help() { + echo "Usage: ./cdcup.sh { init | up | pipeline | flink | mysql | stop | down | help }" + echo + echo "Commands:" + echo " * init:" + echo " Initialize a playground environment, and generate configuration files." + echo + echo " * up:" + echo " Start Docker containers. This may take a while before the database is ready." + echo + echo " * pipeline :" + echo " Submit a YAML pipeline job." + echo + echo " * flink:" + echo " Print Flink Web dashboard URL." + echo + echo " * mysql:" + echo " Open MySQL console." + echo + echo " * stop:" + echo " Stop all running playground containers." + echo + echo " * down:" + echo " Stop and remove containers, networks, and volumes." + echo + echo " * help:" + echo " Print this message." +} + +if [ "$1" == 'init' ]; then + printf "🚩 Building bootstrap docker image...\n" + docker build -q -t cdcup/bootstrap . 
+ rm -rf cdc && mkdir -p cdc + printf "🚩 Starting bootstrap wizard...\n" + docker run -it --rm -v "$(pwd)/cdc":/cdc cdcup/bootstrap + mv cdc/docker-compose.yaml ./docker-compose.yaml + mv cdc/pipeline-definition.yaml ./pipeline-definition.yaml +elif [ "$1" == 'up' ]; then + printf "🚩 Starting playground...\n" + docker compose up -d + docker compose exec jobmanager bash -c 'rm -rf /opt/flink-cdc' + docker compose cp cdc jobmanager:/opt/flink-cdc +elif [ "$1" == 'pipeline' ]; then + if [ -z "$2" ]; then + printf "Usage: ./cdcup.sh pipeline \n" + exit 1 + fi + printf "🚩 Submitting pipeline job...\n" + docker compose cp "$2" jobmanager:/opt/flink-cdc/pipeline-definition.yaml + startup_script="cd /opt/flink-cdc && ./bin/flink-cdc.sh ./pipeline-definition.yaml --flink-home /opt/flink" + if test -f ./cdc/lib/hadoop-uber.jar; then + startup_script="$startup_script --jar lib/hadoop-uber.jar" + fi + if test -f ./cdc/lib/mysql-connector-java.jar; then + startup_script="$startup_script --jar lib/mysql-connector-java.jar" + fi + docker compose exec jobmanager bash -c "$startup_script" +elif [ "$1" == 'flink' ]; then + port_info="$(docker compose port jobmanager 8081)" + printf "🚩 Visit Flink Dashboard at: http://localhost:%s\n" "${port_info##*:}" +elif [ "$1" == 'mysql' ]; then + docker compose exec -it mysql bash -c "mysql -uroot" || echo "❌ Unable to find MySQL container." +elif [ "$1" == 'stop' ]; then + printf "🚩 Stopping playground...\n" + docker compose stop +elif [ "$1" == 'down' ]; then + printf "🚩 Purging playground...\n" + docker compose down -v +else + display_help +fi diff --git a/tools/cdcup/src/app.rb b/tools/cdcup/src/app.rb new file mode 100644 index 000000000..af66b404c --- /dev/null +++ b/tools/cdcup/src/app.rb @@ -0,0 +1,127 @@ +#!/usr/bin/env ruby +# frozen_string_literal: true +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. 
See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +require 'tty-prompt' +require 'yaml' + +require_relative 'download_libs' + +require_relative 'source/my_sql' +require_relative 'source/values_source' + +require_relative 'sink/doris' +require_relative 'sink/kafka' +require_relative 'sink/paimon' +require_relative 'sink/star_rocks' +require_relative 'sink/values_sink' + +CDC_DATA_VOLUME = 'cdc-data' + +@prompt = TTY::Prompt.new + +@docker_compose_file_content = { + 'services' => {}, + 'volumes' => { + CDC_DATA_VOLUME => {} + } +} + +@pipeline_yaml_file_content = { + 'pipeline' => { + 'parallelism' => 1 + } +} + +SOURCES = { + mysql: MySQL, + values: ValuesSource +}.freeze + +SINKS = { + kafka: Kafka, + paimon: Paimon, + doris: Doris, + starrocks: StarRocks, + values: ValuesSink +}.freeze + +FLINK_VERSIONS = %w[ + 1.17.2 + 1.18.1 + 1.19.1 + 1.20.0 +].freeze + +FLINK_CDC_VERSIONS = %w[ + 3.0.0 + 3.0.1 + 3.1.0 + 3.1.1 + 3.2.0 + 3.2.1 +].freeze + +puts +@prompt.say '🎉 Welcome to cdc-up quickstart wizard!' 
+@prompt.say ' There are a few questions to ask before getting started:' + +flink_version = @prompt.select('🐿️ Which Flink version would you like to use?', FLINK_VERSIONS, + default: FLINK_VERSIONS.last) +flink_cdc_version = @prompt.select(' Which Flink CDC version would you like to use?', FLINK_CDC_VERSIONS, + default: FLINK_CDC_VERSIONS.last) + +@docker_compose_file_content['services']['jobmanager'] = { + 'image' => "flink:#{flink_version}-scala_2.12", + 'hostname' => 'jobmanager', + 'ports' => ['8081'], + 'command' => 'jobmanager', + 'environment' => { + 'FLINK_PROPERTIES' => "jobmanager.rpc.address: jobmanager\nexecution.checkpointing.interval: 3000" + } +} + +@docker_compose_file_content['services']['taskmanager'] = { + 'image' => "flink:#{flink_version}-scala_2.12", + 'hostname' => 'taskmanager', + 'command' => 'taskmanager', + 'environment' => { + 'FLINK_PROPERTIES' => "jobmanager.rpc.address: jobmanager\ntaskmanager.numberOfTaskSlots: 4\nexecution.checkpointing.interval: 3000" + } +} + +source = @prompt.select('🚰 Which data source to use?', SOURCES.keys) +sink = @prompt.select('🪣 Which data sink to use?', SINKS.keys) + +SOURCES[source].prepend_to_docker_compose_yaml(@docker_compose_file_content) +SOURCES[source].prepend_to_pipeline_yaml(@pipeline_yaml_file_content) +SINKS[sink].prepend_to_docker_compose_yaml(@docker_compose_file_content) +SINKS[sink].prepend_to_pipeline_yaml(@pipeline_yaml_file_content) + +File.write('/cdc/docker-compose.yaml', YAML.dump(@docker_compose_file_content)) +File.write('/cdc/pipeline-definition.yaml', YAML.dump(@pipeline_yaml_file_content)) + +@prompt.say "\n3️⃣ Preparing CDC #{flink_cdc_version}..." + +connectors_name = Set.new [ + SOURCES[source].connector_name, + SINKS[sink].connector_name +] + +download_cdc(flink_cdc_version, '/cdc/', connectors_name) + +@prompt.say '🥳 All done!' 
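For reference, the two files the wizard writes are plain Ruby hashes serialized with `YAML.dump`. The sketch below is a trimmed, hypothetical excerpt (with `1.20.0` as an example selection and the non-essential service fields omitted) showing the shape of the compose document and that it round-trips cleanly through YAML:

```ruby
require 'yaml'

# Trimmed-down version of the hashes app.rb assembles; '1.20.0' is
# just an example wizard selection, not a required version.
flink_version = '1.20.0'
compose = {
  'services' => {
    'jobmanager' => {
      'image' => "flink:#{flink_version}-scala_2.12",
      'hostname' => 'jobmanager',
      'ports' => ['8081'],
      'command' => 'jobmanager'
    }
  },
  'volumes' => { 'cdc-data' => {} }
}

yaml_text = YAML.dump(compose)
# Only plain strings/arrays/hashes are used, so safe_load reconstructs
# exactly the same structure.
reloaded = YAML.safe_load(yaml_text)
raise 'round-trip failed' unless reloaded == compose
puts yaml_text
```

The real generator additionally merges in the selected source/sink services and the `FLINK_PROPERTIES` environment, as the diff above shows.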
diff --git a/tools/cdcup/src/download_libs.rb b/tools/cdcup/src/download_libs.rb new file mode 100644 index 000000000..c2aac2bf3 --- /dev/null +++ b/tools/cdcup/src/download_libs.rb @@ -0,0 +1,119 @@ +# frozen_string_literal: true +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +CDC_VERSIONS = { + '3.0.0': { + tar: 'https://github.com/ververica/flink-cdc-connectors/releases/download/release-3.0.0/flink-cdc-3.0.0-bin.tar.gz', + connectors: %w[ + https://repo1.maven.org/maven2/com/ververica/flink-cdc-pipeline-connector-doris/3.0.0/flink-cdc-pipeline-connector-doris-3.0.0.jar + https://repo1.maven.org/maven2/com/ververica/flink-cdc-pipeline-connector-mysql/3.0.0/flink-cdc-pipeline-connector-mysql-3.0.0.jar + https://repo1.maven.org/maven2/com/ververica/flink-cdc-pipeline-connector-starrocks/3.0.0/flink-cdc-pipeline-connector-starrocks-3.0.0.jar + https://repo1.maven.org/maven2/com/ververica/flink-cdc-pipeline-connector-values/3.0.0/flink-cdc-pipeline-connector-values-3.0.0.jar + ] + }, + '3.0.1': { + tar: 'https://github.com/ververica/flink-cdc-connectors/releases/download/release-3.0.1/flink-cdc-3.0.1-bin.tar.gz', + connectors: %w[ + https://repo1.maven.org/maven2/com/ververica/flink-cdc-pipeline-connector-doris/3.0.1/flink-cdc-pipeline-connector-doris-3.0.1.jar + https://repo1.maven.org/maven2/com/ververica/flink-cdc-pipeline-connector-mysql/3.0.1/flink-cdc-pipeline-connector-mysql-3.0.1.jar + https://repo1.maven.org/maven2/com/ververica/flink-cdc-pipeline-connector-starrocks/3.0.1/flink-cdc-pipeline-connector-starrocks-3.0.1.jar + https://repo1.maven.org/maven2/com/ververica/flink-cdc-pipeline-connector-values/3.0.1/flink-cdc-pipeline-connector-values-3.0.1.jar + ] + }, + '3.1.0': { + tar: 'https://dlcdn.apache.org/flink/flink-cdc-3.1.0/flink-cdc-3.1.0-bin.tar.gz', + connectors: %w[ + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-mysql/3.1.0/flink-cdc-pipeline-connector-mysql-3.1.0.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-doris/3.1.0/flink-cdc-pipeline-connector-doris-3.1.0.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-starrocks/3.1.0/flink-cdc-pipeline-connector-starrocks-3.1.0.jar + 
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-kafka/3.1.0/flink-cdc-pipeline-connector-kafka-3.1.0.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-paimon/3.1.0/flink-cdc-pipeline-connector-paimon-3.1.0.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-values/3.1.0/flink-cdc-pipeline-connector-values-3.1.0.jar + ] + }, + '3.1.1': { + tar: 'https://dlcdn.apache.org/flink/flink-cdc-3.1.1/flink-cdc-3.1.1-bin.tar.gz', + connectors: %w[ + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-mysql/3.1.1/flink-cdc-pipeline-connector-mysql-3.1.1.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-doris/3.1.1/flink-cdc-pipeline-connector-doris-3.1.1.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-starrocks/3.1.1/flink-cdc-pipeline-connector-starrocks-3.1.1.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-kafka/3.1.1/flink-cdc-pipeline-connector-kafka-3.1.1.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-paimon/3.1.1/flink-cdc-pipeline-connector-paimon-3.1.1.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-values/3.1.1/flink-cdc-pipeline-connector-values-3.1.1.jar + ] + }, + '3.2.0': { + tar: 'https://dlcdn.apache.org/flink/flink-cdc-3.2.0/flink-cdc-3.2.0-bin.tar.gz', + connectors: %w[ + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-mysql/3.2.0/flink-cdc-pipeline-connector-mysql-3.2.0.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-doris/3.2.0/flink-cdc-pipeline-connector-doris-3.2.0.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-starrocks/3.2.0/flink-cdc-pipeline-connector-starrocks-3.2.0.jar + 
https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-kafka/3.2.0/flink-cdc-pipeline-connector-kafka-3.2.0.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-paimon/3.2.0/flink-cdc-pipeline-connector-paimon-3.2.0.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-elasticsearch/3.2.0/flink-cdc-pipeline-connector-elasticsearch-3.2.0.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-values/3.2.0/flink-cdc-pipeline-connector-values-3.2.0.jar + ] + }, + '3.2.1': { + tar: 'https://dlcdn.apache.org/flink/flink-cdc-3.2.1/flink-cdc-3.2.1-bin.tar.gz', + connectors: %w[ + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-mysql/3.2.1/flink-cdc-pipeline-connector-mysql-3.2.1.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-doris/3.2.1/flink-cdc-pipeline-connector-doris-3.2.1.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-starrocks/3.2.1/flink-cdc-pipeline-connector-starrocks-3.2.1.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-kafka/3.2.1/flink-cdc-pipeline-connector-kafka-3.2.1.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-paimon/3.2.1/flink-cdc-pipeline-connector-paimon-3.2.1.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-elasticsearch/3.2.1/flink-cdc-pipeline-connector-elasticsearch-3.2.1.jar + https://repo1.maven.org/maven2/org/apache/flink/flink-cdc-pipeline-connector-values/3.2.1/flink-cdc-pipeline-connector-values-3.2.1.jar + ] + }, +}.freeze + +def download_cdc_bin(version, dest_path) + cdc_version = CDC_VERSIONS[version] + puts "\tDownloading #{File.basename(cdc_version[:tar])}..." 
+ `wget -q -O /tmp/cdc-bin.tar.gz #{cdc_version[:tar]}` + `tar --strip-components=1 -xzvf /tmp/cdc-bin.tar.gz -C #{dest_path}` +end + +def download_connectors(version, dest_path, connectors) + CDC_VERSIONS[version][:connectors].each do |link| + jar_name = File.basename(link) + if connectors.any? { |c| jar_name.include? c } + puts "\tDownloading #{jar_name}..." + # jar_name already ends with ".jar"; don't append the extension again + `wget -q -O #{dest_path}/lib/#{jar_name} #{link}` + end + end +end + +def download_mysql_driver(dest_path) + puts "\tDownloading MySQL Java Connector..." + `wget -q https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.27/mysql-connector-java-8.0.27.jar \ + -O #{dest_path}/lib/mysql-connector-java.jar` +end + +def download_hadoop_common(dest_path) + puts "\tDownloading Hadoop Uber jar..." + `wget -q https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-10.0/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar \ + -O #{dest_path}/lib/hadoop-uber.jar` +end + +def download_cdc(version, dest_path, connectors = []) + download_cdc_bin(version.to_sym, dest_path) + download_connectors(version.to_sym, dest_path, connectors) + download_mysql_driver(dest_path) if connectors.include? MySQL.connector_name + download_hadoop_common(dest_path) if connectors.include? Paimon.connector_name +end diff --git a/tools/cdcup/src/sink/doris.rb b/tools/cdcup/src/sink/doris.rb new file mode 100644 index 000000000..a191230f1 --- /dev/null +++ b/tools/cdcup/src/sink/doris.rb @@ -0,0 +1,47 @@ +# frozen_string_literal: true +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Doris sink definition generator class.
+class Doris
+  class << self
+    def connector_name
+      'flink-cdc-pipeline-connector-doris'
+    end
+
+    def prepend_to_docker_compose_yaml(docker_compose_yaml)
+      docker_compose_yaml['services']['doris'] = {
+        'image' => 'apache/doris:doris-all-in-one-2.1.0',
+        'hostname' => 'doris',
+        'ports' => %w[8030 8040 9030],
+        'volumes' => ["#{CDC_DATA_VOLUME}:/data"]
+      }
+    end
+
+    def prepend_to_pipeline_yaml(pipeline_yaml)
+      pipeline_yaml['sink'] = {
+        'type' => 'doris',
+        'fenodes' => 'doris:8030',
+        'benodes' => 'doris:8040',
+        'jdbc-url' => 'jdbc:mysql://doris:9030',
+        'username' => 'root',
+        'password' => '',
+        'table.create.properties.light_schema_change' => true,
+        'table.create.properties.replication_num' => 1
+      }
+    end
+  end
+end
diff --git a/tools/cdcup/src/sink/kafka.rb b/tools/cdcup/src/sink/kafka.rb
new file mode 100644
index 000000000..58a1f8c4f
--- /dev/null
+++ b/tools/cdcup/src/sink/kafka.rb
@@ -0,0 +1,58 @@
+# frozen_string_literal: true
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Kafka sink definition generator class.
+class Kafka
+  class << self
+    def connector_name
+      'flink-cdc-pipeline-connector-kafka'
+    end
+
+    def prepend_to_docker_compose_yaml(docker_compose_yaml)
+      docker_compose_yaml['services']['zookeeper'] = {
+        'image' => 'confluentinc/cp-zookeeper:7.4.4',
+        'hostname' => 'zookeeper',
+        'ports' => ['2181'],
+        'environment' => {
+          'ZOOKEEPER_CLIENT_PORT' => 2181,
+          'ZOOKEEPER_TICK_TIME' => 2000
+        }
+      }
+      docker_compose_yaml['services']['kafka'] = {
+        'image' => 'confluentinc/cp-kafka:7.4.4',
+        'depends_on' => ['zookeeper'],
+        'hostname' => 'kafka',
+        'ports' => ['9092'],
+        'environment' => {
+          'KAFKA_BROKER_ID' => 1,
+          'KAFKA_ZOOKEEPER_CONNECT' => 'zookeeper:2181',
+          'KAFKA_ADVERTISED_LISTENERS' => 'PLAINTEXT://kafka:9092',
+          'KAFKA_LISTENER_SECURITY_PROTOCOL_MAP' => 'PLAINTEXT:PLAINTEXT',
+          'KAFKA_INTER_BROKER_LISTENER_NAME' => 'PLAINTEXT',
+          'KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR' => 1
+        }
+      }
+    end
+
+    def prepend_to_pipeline_yaml(pipeline_yaml)
+      pipeline_yaml['sink'] = {
+        'type' => 'kafka',
+        'properties.bootstrap.servers' => 'PLAINTEXT://kafka:9092'
+      }
+    end
+  end
+end
diff --git a/tools/cdcup/src/sink/paimon.rb b/tools/cdcup/src/sink/paimon.rb
new file mode 100644
index 000000000..1261bb50d
--- /dev/null
+++ b/tools/cdcup/src/sink/paimon.rb
@@ -0,0 +1,36 @@
+# frozen_string_literal: true
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Paimon sink definition generator class.
+class Paimon
+  class << self
+    def connector_name
+      'flink-cdc-pipeline-connector-paimon'
+    end
+
+    # Nothing to do
+    def prepend_to_docker_compose_yaml(_); end
+
+    def prepend_to_pipeline_yaml(pipeline_yaml)
+      pipeline_yaml['sink'] = {
+        'type' => 'paimon',
+        'catalog.properties.metastore' => 'filesystem',
+        'catalog.properties.warehouse' => '/data/paimon-warehouse'
+      }
+    end
+  end
+end
diff --git a/tools/cdcup/src/sink/star_rocks.rb b/tools/cdcup/src/sink/star_rocks.rb
new file mode 100644
index 000000000..1b3a72b46
--- /dev/null
+++ b/tools/cdcup/src/sink/star_rocks.rb
@@ -0,0 +1,45 @@
+# frozen_string_literal: true
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# StarRocks sink definition generator class.
+class StarRocks
+  class << self
+    def connector_name
+      'flink-cdc-pipeline-connector-starrocks'
+    end
+
+    def prepend_to_docker_compose_yaml(docker_compose_yaml)
+      docker_compose_yaml['services']['starrocks'] = {
+        'image' => 'starrocks/allin1-ubuntu:3.2.6',
+        'hostname' => 'starrocks',
+        'ports' => %w[8080 9030],
+        'volumes' => ["#{CDC_DATA_VOLUME}:/data"]
+      }
+    end
+
+    def prepend_to_pipeline_yaml(pipeline_yaml)
+      pipeline_yaml['sink'] = {
+        'type' => 'starrocks',
+        'jdbc-url' => 'jdbc:mysql://starrocks:9030',
+        'load-url' => 'starrocks:8080',
+        'username' => 'root',
+        'password' => '',
+        'table.create.properties.replication_num' => 1
+      }
+    end
+  end
+end
diff --git a/tools/cdcup/src/sink/values_sink.rb b/tools/cdcup/src/sink/values_sink.rb
new file mode 100644
index 000000000..a8ff1ad99
--- /dev/null
+++ b/tools/cdcup/src/sink/values_sink.rb
@@ -0,0 +1,34 @@
+# frozen_string_literal: true
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Values sink definition generator class.
+class ValuesSink
+  class << self
+    def connector_name
+      'flink-cdc-pipeline-connector-values'
+    end
+
+    # Nothing to do
+    def prepend_to_docker_compose_yaml(_); end
+
+    def prepend_to_pipeline_yaml(pipeline_yaml)
+      pipeline_yaml['sink'] = {
+        'type' => 'values'
+      }
+    end
+  end
+end
diff --git a/tools/cdcup/src/source/my_sql.rb b/tools/cdcup/src/source/my_sql.rb
new file mode 100644
index 000000000..91c332287
--- /dev/null
+++ b/tools/cdcup/src/source/my_sql.rb
@@ -0,0 +1,51 @@
+# frozen_string_literal: true
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# MySQL source definition generator class.
+class MySQL
+  class << self
+    def connector_name
+      'flink-cdc-pipeline-connector-mysql'
+    end
+
+    def prepend_to_docker_compose_yaml(docker_compose_yaml)
+      docker_compose_yaml['services']['mysql'] = {
+        'image' => 'mysql:8.0',
+        'hostname' => 'mysql',
+        'environment' => {
+          'MYSQL_ALLOW_EMPTY_PASSWORD' => true,
+          'MYSQL_DATABASE' => 'cdcup'
+        },
+        'ports' => ['3306'],
+        'volumes' => ["#{CDC_DATA_VOLUME}:/data"]
+      }
+    end
+
+    def prepend_to_pipeline_yaml(pipeline_yaml)
+      pipeline_yaml['source'] = {
+        'type' => 'mysql',
+        'hostname' => 'mysql',
+        'port' => 3306,
+        'username' => 'root',
+        'password' => '',
+        'tables' => '\.*.\.*',
+        'server-id' => '5400-6400',
+        'server-time-zone' => 'UTC'
+      }
+    end
+  end
+end
diff --git a/tools/cdcup/src/source/values_source.rb b/tools/cdcup/src/source/values_source.rb
new file mode 100644
index 000000000..41b370283
--- /dev/null
+++ b/tools/cdcup/src/source/values_source.rb
@@ -0,0 +1,35 @@
+# frozen_string_literal: true
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Values source definition generator class.
+class ValuesSource
+  class << self
+    def connector_name
+      'flink-cdc-pipeline-connector-values'
+    end
+
+    # Nothing to do
+    def prepend_to_docker_compose_yaml(_); end
+
+    def prepend_to_pipeline_yaml(pipeline_yaml)
+      pipeline_yaml['source'] = {
+        'type' => 'values',
+        'event-set.id' => 'SINGLE_SPLIT_MULTI_TABLES'
+      }
+    end
+  end
+end
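Taken together, each generator class in this patch contributes its own fragment to a shared pipeline definition hash, which the CdcUp tool then serializes to YAML. A minimal standalone sketch of that composition (the driver loop here is illustrative, not the actual cdcup CLI, and the two generators are inlined copies trimmed to their `prepend_to_pipeline_yaml` methods):

```ruby
require 'yaml'

# Inlined, trimmed copies of two generators from this patch so the
# sketch runs standalone.
class MySQL
  class << self
    def prepend_to_pipeline_yaml(pipeline_yaml)
      pipeline_yaml['source'] = {
        'type' => 'mysql',
        'hostname' => 'mysql',
        'port' => 3306,
        'username' => 'root',
        'password' => '',
        'server-time-zone' => 'UTC'
      }
    end
  end
end

class ValuesSink
  class << self
    def prepend_to_pipeline_yaml(pipeline_yaml)
      pipeline_yaml['sink'] = { 'type' => 'values' }
    end
  end
end

# Hypothetical driver: the chosen source and sink each fill in their
# section, then the finished pipeline definition is emitted as YAML.
pipeline = {}
[MySQL, ValuesSink].each { |gen| gen.prepend_to_pipeline_yaml(pipeline) }
puts pipeline.to_yaml
```

The same pattern extends to `prepend_to_docker_compose_yaml`, where each generator registers the container services it needs in the shared docker-compose definition.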