diff --git a/docs/content.zh/docs/deployment/kubernetes.md b/docs/content.zh/docs/deployment/kubernetes.md
index 95539112b..c9c320aca 100644
--- a/docs/content.zh/docs/deployment/kubernetes.md
+++ b/docs/content.zh/docs/deployment/kubernetes.md
@@ -24,53 +24,50 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Introduction
+# 简介
 
-Kubernetes is a popular container-orchestration system for automating computer application deployment, scaling, and management.
-Flink's native Kubernetes integration allows you to directly deploy Flink on a running Kubernetes cluster.
-Moreover, Flink is able to dynamically allocate and de-allocate TaskManagers depending on the required resources because it can directly talk to Kubernetes.
+Kubernetes(K8s)是一种流行的容器编排系统,用于自动化应用程序的部署、扩展和管理。Flink 的原生 Kubernetes 集成允许您在正在运行的 Kubernetes 集群上直接部署 Flink。此外,由于 Flink 可以直接与 Kubernetes 通信,它能够根据所需资源动态地分配和释放 TaskManager。
 
-Apache Flink also provides a Kubernetes operator for managing Flink clusters on Kubernetes. It supports both standalone and native deployment mode and greatly simplifies deployment, configuration and the life cycle management of Flink resources on Kubernetes.
+Apache Flink 还提供了 Kubernetes Operator,用于管理 Kubernetes 上的 Flink 集群。它支持独立部署和原生部署两种模式,极大简化了 Flink 在 Kubernetes 上的部署、配置和生命周期管理。
 
-For more information, please refer to the [Flink Kubernetes Operator documentation](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/).
+更多信息请参考:[Flink Kubernetes Operator 文档](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/)。
 
-## Preparation
+## 准备
 
-The doc assumes a running Kubernetes cluster fulfilling the following requirements:
+本文档假设您已有一个正在运行的 Kubernetes 集群,且满足以下要求:
 
-- Kubernetes >= 1.9.
-- KubeConfig, which has access to list, create, delete pods and services, configurable via `~/.kube/config`. You can verify permissions by running `kubectl auth can-i pods`.
-- Enabled Kubernetes DNS. 
-- `default` service account with [RBAC](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#rbac) permissions to create, delete pods.
+- Kubernetes 版本 >= 1.9。
+- KubeConfig,作为列出、创建、删除 pods 和 services 权限的入口,可通过 `~/.kube/config` 进行配置。您可以通过运行命令 `kubectl auth can-i <list|create|edit|delete> pods` 来验证权限。
+- 已启用 Kubernetes DNS。
+- `default` 服务账号(service account)具有创建、删除 pods 的 [RBAC](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#rbac) 权限。
 
-If you have problems setting up a Kubernetes cluster, please take a look at [how to setup a Kubernetes cluster](https://kubernetes.io/docs/setup/).
+如果您在配置 Kubernetes 集群时遇到问题,请参考:[如何配置 Kubernetes 集群](https://kubernetes.io/docs/setup/)。
 
-## Session Mode
+## Session 模式
 
-Flink runs on all UNIX-like environments, i.e. Linux, Mac OS X, and Cygwin (for Windows).
-You can refer [overview]({{< ref "docs/connectors/pipeline-connectors/overview" >}}) to check supported versions and download [the binary release](https://flink.apache.org/downloads/) of Flink,
-then extract the archive:
+Flink 可以在所有类 UNIX 环境上运行,即 Linux、Mac OS X 和 Cygwin(适用于 Windows)。
+您可以参考 [overview]({{< ref "docs/connectors/pipeline-connectors/overview" >}}) 页面,查看支持的 Flink 版本并下载[发行包](https://flink.apache.org/downloads/),然后解压:
 
 ```bash
 tar -xzf flink-*.tgz
 ```
 
-You should set `FLINK_HOME` environment variables like:
+设置 `FLINK_HOME` 环境变量:
 
 ```bash
 export FLINK_HOME=/path/flink-*
 ```
 
-### Start a session cluster
+### 启动 Session 集群
 
-To start a session cluster on k8s, run the bash script that comes with Flink:
+要在 K8s 上启动 Session 集群,请运行 Flink 附带的 bash 脚本:
 
 ```bash
 cd /path/flink-*
 ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=my-first-flink-cluster
 ```
 
-After successful startup, the return information is as follows:
+成功启动集群后,返回如下信息:
 
 ```
 org.apache.flink.kubernetes.utils.KubernetesUtils [] - Kubernetes deployment requires a fixed port. Configuration blob.server.port will be set to 6124
@@ -79,33 +76,33 @@ org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Please note th
 org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create flink session cluster my-first-flink-cluster successfully, JobManager Web Interface: http://my-first-flink-cluster-rest.default:8081
 ```
 
-{{< hint info >}}
-please refer to [Flink documentation](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#accessing-flinks-web-ui) to expose Flink’s Web UI and REST endpoint.
-You should ensure that REST endpoint can be accessed by the node of your submission.
-{{< /hint >}}
-Then, you need to add these two config to your flink-conf.yaml:
+{{< hint info >}}
+请参考 [Flink 文档](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#accessing-flinks-web-ui)来暴露 Flink Web UI 和 REST 端点。
+请确保您提交作业的节点可以访问 REST 端点。
+{{< /hint >}}
+然后,将以下两个配置添加到 flink-conf.yaml 中:
 
 ```yaml
 rest.bind-port: {{REST_PORT}}
 rest.address: {{NODE_IP}}
 ```
 
-{{REST_PORT}} and {{NODE_IP}} should be replaced by the actual values of your JobManager Web Interface.
+将 {{REST_PORT}} 和 {{NODE_IP}} 替换为 JobManager Web 界面的实际值。
 
-### Set up Flink CDC
-Download the tar file of Flink CDC from [release page](https://github.com/apache/flink-cdc/releases), then extract the archive:
+### 配置 Flink CDC
+从[发行页面](https://github.com/apache/flink-cdc/releases)下载 Flink CDC 的 tar 文件并解压:
 
 ```bash
 tar -xzf flink-cdc-*.tar.gz
 ```
 
-Extracted `flink-cdc` contains four directories: `bin`,`lib`,`log` and `conf`.
+解压后的 `flink-cdc` 包含四个目录:`bin`、`lib`、`log` 和 `conf`。
 
-Download the connector jars from [release page](https://github.com/apache/flink-cdc/releases), and move it to the `lib` directory.
-Download links are available only for stable releases, SNAPSHOT dependencies need to be built based on specific branch by yourself. 
+从[发行页面](https://github.com/apache/flink-cdc/releases)下载连接器 jar,并移动到 `lib` 目录下。
+下载链接仅适用于稳定版本,SNAPSHOT 依赖项需要您基于特定分支自行构建。
 
-### Submit a Flink CDC Job
-Here is an example file for synchronizing the entire database `mysql-to-doris.yaml`:
+### 提交 Flink CDC 作业
+以下是将 MySQL 整库同步到 Doris 的示例配置文件 `mysql-to-doris.yaml`:
 
 ```yaml
 ################################################################################
@@ -133,18 +130,18 @@ pipeline:
 ```
 
-You need to modify the configuration file according to your needs, refer to connectors more information.
-- [MySQL pipeline connector]({{< ref "docs/connectors/pipeline-connectors/mysql.md" >}})
-- [Apache Doris pipeline connector]({{< ref "docs/connectors/pipeline-connectors/doris.md" >}})
+请参考以下连接器文档,按需修改配置文件。
+- [MySQL Pipeline 连接器]({{< ref "docs/connectors/pipeline-connectors/mysql.md" >}})
+- [Apache Doris Pipeline 连接器]({{< ref "docs/connectors/pipeline-connectors/doris.md" >}})
 
-Finally, submit job to Flink Standalone cluster using Cli.
+最后,通过 CLI 将作业提交到 Flink Standalone 集群。
 
 ```bash
 cd /path/flink-cdc-*
 ./bin/flink-cdc.sh mysql-to-doris.yaml
 ```
 
-After successful submission, the return information is as follows:
+成功提交作业后,返回如下信息:
 
 ```bash
 Pipeline has been submitted to cluster.
@@ -152,8 +149,149 @@ Job ID: ae30f4580f1918bebf16752d4963dc54
 Job Description: Sync MySQL Database to Doris
 ```
 
-Then you can find a job named `Sync MySQL Database to Doris` running through Flink Web UI.
+通过 Flink Web UI,您可以找到一个名为 `Sync MySQL Database to Doris` 的作业正在运行。
+
+## Kubernetes Operator 模式
+假设您已经在 K8s 集群上部署了 [Flink Kubernetes Operator](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/),此时您只需构建自定义的 Flink CDC Docker 镜像即可。
+
+### 构建自定义 Docker 镜像
+1. 从[发行页面](https://github.com/apache/flink-cdc/releases)下载 Flink CDC 的 tar 文件和需要的连接器,并移动到 Docker 镜像构建目录。
+   假设您的 Docker 构建目录为 `/opt/docker/flink-cdc`,此时该目录下的文件结构如下:
+   ```text
+   /opt/docker/flink-cdc
+   ├── flink-cdc-3.1.0-bin.tar.gz
+   ├── flink-cdc-pipeline-connector-doris-3.1.0.jar
+   ├── flink-cdc-pipeline-connector-mysql-3.1.0.jar
+   ├── mysql-connector-java-8.0.27.jar
+   └── ...
+   ```
+2. 创建 Dockerfile 文件,从官方 `flink` 镜像构建出自定义镜像并添加 Flink CDC 的依赖。
+   ```dockerfile
+   FROM flink:1.18.0-java8
+   ADD *.jar $FLINK_HOME/lib/
+   ADD flink-cdc*.tar.gz $FLINK_HOME/
+   RUN mv $FLINK_HOME/flink-cdc-3.1.0/lib/flink-cdc-dist-3.1.0.jar $FLINK_HOME/lib/
+   ```
+   Docker 镜像构建目录最终如下:
+   ```text
+   /opt/docker/flink-cdc
+   ├── Dockerfile
+   ├── flink-cdc-3.1.0-bin.tar.gz
+   ├── flink-cdc-pipeline-connector-doris-3.1.0.jar
+   ├── flink-cdc-pipeline-connector-mysql-3.1.0.jar
+   ├── mysql-connector-java-8.0.27.jar
+   └── ...
+   ```
+3. 构建自定义镜像并推送至仓库:
+   ```bash
+   docker build -t flink-cdc-pipeline:3.1.0 .
+
+   docker push flink-cdc-pipeline:3.1.0
+   ```
+
+### 创建 ConfigMap 用于挂载 Flink CDC 配置文件
+以下是一个示例文件,请将其中对应的连接参数修改为实际值:
+```yaml
+---
+apiVersion: v1
+data:
+  flink-cdc.yaml: |-
+    parallelism: 4
+    schema.change.behavior: EVOLVE
+  mysql-to-doris.yaml: |-
+    source:
+      type: mysql
+      hostname: localhost
+      port: 3306
+      username: root
+      password: 123456
+      tables: app_db.\.*
+      server-id: 5400-5404
+      server-time-zone: UTC
+
+    sink:
+      type: doris
+      fenodes: 127.0.0.1:8030
+      username: root
+      password: ""
+
+    pipeline:
+      name: Sync MySQL Database to Doris
+      parallelism: 2
+kind: ConfigMap
+metadata:
+  name: flink-cdc-pipeline-configmap
+```
+
+### 创建 FlinkDeployment YAML 文件
+以下是示例文件 `flink-cdc-pipeline-job.yaml`:
+```yaml
+---
+apiVersion: flink.apache.org/v1beta1
+kind: FlinkDeployment
+metadata:
+  name: flink-cdc-pipeline-job
+spec:
+  flinkConfiguration:
+    classloader.resolve-order: parent-first
+    state.checkpoints.dir: 'file:///tmp/checkpoints'
+    state.savepoints.dir: 'file:///tmp/savepoints'
+  flinkVersion: v1_18
+  image: 'flink-cdc-pipeline:3.1.0'
+  imagePullPolicy: Always
+  job:
+    args:
+      - '--use-mini-cluster'
+      - /opt/flink/flink-cdc-3.1.0/conf/mysql-to-doris.yaml
+    entryClass: org.apache.flink.cdc.cli.CliFrontend
+    jarURI: 'local:///opt/flink/lib/flink-cdc-dist-3.1.0.jar'
+    parallelism: 1
+    state: running
+    upgradeMode: savepoint
+  jobManager:
+    replicas: 1
+    resource:
+      cpu: 1
+      memory: 1024m
+  podTemplate:
+    apiVersion: v1
+    kind: Pod
+    spec:
+      containers:
+        # don't modify this name
+        - name: flink-main-container
+          volumeMounts:
+            - mountPath: /opt/flink/flink-cdc-3.1.0/conf
+              name: flink-cdc-pipeline-config
+      volumes:
+        - configMap:
+            name: flink-cdc-pipeline-configmap
+          name: flink-cdc-pipeline-config
+  restartNonce: 0
+  serviceAccount: flink
+  taskManager:
+    resource:
+      cpu: 1
+      memory: 1024m
+```
 {{< hint info >}}
-Please note that submitting with **native application mode** and **Flink Kubernetes operator** are not supported for now.
+1. 由于 Flink 的类加载机制,参数 `classloader.resolve-order` 必须设置为 `parent-first`。
+2. Flink CDC 默认提交作业到远程 Flink 集群;在 Operator 模式下,您需要通过指定 `--use-mini-cluster` 参数,在 pod 内部启动一个 Standalone Flink 集群。
+{{< /hint >}}
+
+### 提交 Flink CDC 作业
+ConfigMap 和 FlinkDeployment YAML 文件创建完成后,即可通过 kubectl 提交作业到 Operator:
+```bash
+kubectl apply -f flink-cdc-pipeline-job.yaml
+```
+
+成功提交作业后,返回信息如下:
+```shell
+flinkdeployment.flink.apache.org/flink-cdc-pipeline-job created
+```
+如您需要查看日志、暴露 Flink Web UI 等,请参考:[Flink Kubernetes Operator 文档](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/)。
+
+
+{{< hint info >}}
+请注意,目前不支持使用 **native application mode** 提交作业。
 {{< /hint >}}
\ No newline at end of file
diff --git a/docs/content/docs/deployment/kubernetes.md b/docs/content/docs/deployment/kubernetes.md
index 95539112b..d1928b256 100644
--- a/docs/content/docs/deployment/kubernetes.md
+++ b/docs/content/docs/deployment/kubernetes.md
@@ -79,10 +79,10 @@ org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Please note th
 org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create flink session cluster my-first-flink-cluster successfully, JobManager Web Interface: http://my-first-flink-cluster-rest.default:8081
 ```
 
-{{< hint info >}}
+{{< hint info >}}
 please refer to [Flink documentation](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#accessing-flinks-web-ui) to expose Flink’s Web UI and REST endpoint.
-You should ensure that REST endpoint can be accessed by the node of your submission.
-{{< /hint >}}
+You should ensure that REST endpoint can be accessed by the node of your submission.
+{{< /hint >}}
 Then, you need to add these two config to your flink-conf.yaml:
 
 ```yaml
@@ -152,8 +152,148 @@ Job ID: ae30f4580f1918bebf16752d4963dc54
 Job Description: Sync MySQL Database to Doris
 ```
 
-Then you can find a job named `Sync MySQL Database to Doris` running through Flink Web UI.
+Then you can find a job named `Sync MySQL Database to Doris` running through Flink Web UI.
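Once the REST endpoint configured earlier is reachable from the submission node, the running jobs can also be checked from the command line rather than the Web UI. A minimal sketch, assuming `NODE_IP` and `REST_PORT` hold the values written into flink-conf.yaml; `list_job_names` is a hypothetical helper, not part of Flink or Flink CDC:

```shell
# Extract job names from the JSON returned by Flink's /jobs/overview REST call.
# Entries look like {"jid": "...", "name": "...", "state": "RUNNING"}.
list_job_names() {
  grep -o '"name": *"[^"]*"' | cut -d'"' -f4
}

# Uncomment when a cluster is actually reachable:
# curl -s "http://${NODE_IP}:${REST_PORT}/jobs/overview" | list_job_names
```

A pipeline submitted with `flink-cdc.sh` should then show up under its configured name, e.g. `Sync MySQL Database to Doris`.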
+
+## Kubernetes Operator Mode
+The doc assumes that a [Flink Kubernetes Operator](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/) has been deployed on your K8s cluster; you then only need to build a custom Docker image of Flink CDC.
+
+### Build a custom Docker image
+1. Download the tar file of Flink CDC and the needed connectors from the [release page](https://github.com/apache/flink-cdc/releases), then move them to the Docker image build directory.
+   Assume that your Docker image build directory is `/opt/docker/flink-cdc`; the structure of this directory is as follows:
+   ```text
+   /opt/docker/flink-cdc
+   ├── flink-cdc-3.1.0-bin.tar.gz
+   ├── flink-cdc-pipeline-connector-doris-3.1.0.jar
+   ├── flink-cdc-pipeline-connector-mysql-3.1.0.jar
+   ├── mysql-connector-java-8.0.27.jar
+   └── ...
+   ```
+2. Create a Dockerfile to build a custom image from the official `flink` image and add the Flink CDC dependencies.
+   ```dockerfile
+   FROM flink:1.18.0-java8
+   ADD *.jar $FLINK_HOME/lib/
+   ADD flink-cdc*.tar.gz $FLINK_HOME/
+   RUN mv $FLINK_HOME/flink-cdc-3.1.0/lib/flink-cdc-dist-3.1.0.jar $FLINK_HOME/lib/
+   ```
+   Finally, the structure is as follows:
+   ```text
+   /opt/docker/flink-cdc
+   ├── Dockerfile
+   ├── flink-cdc-3.1.0-bin.tar.gz
+   ├── flink-cdc-pipeline-connector-doris-3.1.0.jar
+   ├── flink-cdc-pipeline-connector-mysql-3.1.0.jar
+   ├── mysql-connector-java-8.0.27.jar
+   └── ...
+   ```
+3. Build the custom Docker image, then push it:
+   ```bash
+   docker build -t flink-cdc-pipeline:3.1.0 .
+
+   docker push flink-cdc-pipeline:3.1.0
+   ```
+
+### Create a ConfigMap for mounting Flink CDC configuration files
+Here is an example file; please change the connection parameters to your actual values:
+```yaml
+---
+apiVersion: v1
+data:
+  flink-cdc.yaml: |-
+    parallelism: 4
+    schema.change.behavior: EVOLVE
+  mysql-to-doris.yaml: |-
+    source:
+      type: mysql
+      hostname: localhost
+      port: 3306
+      username: root
+      password: 123456
+      tables: app_db.\.*
+      server-id: 5400-5404
+      server-time-zone: UTC
+
+    sink:
+      type: doris
+      fenodes: 127.0.0.1:8030
+      username: root
+      password: ""
+
+    pipeline:
+      name: Sync MySQL Database to Doris
+      parallelism: 2
+kind: ConfigMap
+metadata:
+  name: flink-cdc-pipeline-configmap
+```
+
+### Create a FlinkDeployment YAML
+Here is an example file `flink-cdc-pipeline-job.yaml`:
+```yaml
+---
+apiVersion: flink.apache.org/v1beta1
+kind: FlinkDeployment
+metadata:
+  name: flink-cdc-pipeline-job
+spec:
+  flinkConfiguration:
+    classloader.resolve-order: parent-first
+    state.checkpoints.dir: 'file:///tmp/checkpoints'
+    state.savepoints.dir: 'file:///tmp/savepoints'
+  flinkVersion: v1_18
+  image: 'flink-cdc-pipeline:3.1.0'
+  imagePullPolicy: Always
+  job:
+    args:
+      - '--use-mini-cluster'
+      - /opt/flink/flink-cdc-3.1.0/conf/mysql-to-doris.yaml
+    entryClass: org.apache.flink.cdc.cli.CliFrontend
+    jarURI: 'local:///opt/flink/lib/flink-cdc-dist-3.1.0.jar'
+    parallelism: 1
+    state: running
+    upgradeMode: savepoint
+  jobManager:
+    replicas: 1
+    resource:
+      cpu: 1
+      memory: 1024m
+  podTemplate:
+    apiVersion: v1
+    kind: Pod
+    spec:
+      containers:
+        # don't modify this name
+        - name: flink-main-container
+          volumeMounts:
+            - mountPath: /opt/flink/flink-cdc-3.1.0/conf
+              name: flink-cdc-pipeline-config
+      volumes:
+        - configMap:
+            name: flink-cdc-pipeline-configmap
+          name: flink-cdc-pipeline-config
+  restartNonce: 0
+  serviceAccount: flink
+  taskManager:
+    resource:
+      cpu: 1
+      memory: 1024m
+```
+{{< hint info >}}
+1. Due to Flink's class-loading mechanism, `classloader.resolve-order` must be set to `parent-first`.
+2. Flink CDC submits jobs to a remote Flink cluster by default; in Operator mode, you should start a standalone Flink cluster inside the pod by passing `--use-mini-cluster`.
+{{< /hint >}}
+
+### Submit a Flink CDC Job
+After the ConfigMap and FlinkDeployment YAML are created, you can submit the Flink CDC job to the Operator with kubectl:
+```bash
+kubectl apply -f flink-cdc-pipeline-job.yaml
+```
+
+After successful submission, the return information is as follows:
+```shell
+flinkdeployment.flink.apache.org/flink-cdc-pipeline-job created
+```
+If you want to trace the logs or expose the Flink Web UI, please refer to the [Flink Kubernetes Operator documentation](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/).
 
-{{< hint info >}}
-Please note that submitting with **native application mode** and **Flink Kubernetes operator** are not supported for now.
+{{< hint info >}}
+Please note that submitting with **native application mode** is not supported for now.
 {{< /hint >}}
\ No newline at end of file
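After `kubectl apply`, the deployment can be watched until the job reaches `RUNNING`. A sketch, assuming the resource name from `flink-cdc-pipeline-job.yaml`; `job_state` is a hypothetical helper, and the `.status.jobStatus.state` layout it relies on is an assumption about the FlinkDeployment CRD, not something this doc guarantees:

```shell
# Read `kubectl get flinkdeployment <name> -o json` on stdin and print the
# first "state" value found, i.e. the job state tracked by the operator.
job_state() {
  grep -o '"state": *"[^"]*"' | head -n1 | cut -d'"' -f4
}

# Uncomment against a live cluster:
# kubectl get flinkdeployment flink-cdc-pipeline-job -o json | job_state
```

Polling this in a loop (or simply `kubectl get flinkdeployment -w`) gives a quick signal that the pipeline came up before digging into pod logs.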