[FLINK-35544][docs][deploy] Add deployment documentations for Kubernetes Operator

This closes #3392.

# Introduction
Kubernetes is a popular container-orchestration system for automating computer application deployment, scaling, and management.
Flink's native Kubernetes integration allows you to directly deploy Flink on a running Kubernetes cluster.
Moreover, Flink is able to dynamically allocate and de-allocate TaskManagers depending on the required resources because it can directly talk to Kubernetes.
Apache Flink also provides a Kubernetes operator for managing Flink clusters on Kubernetes. It supports both standalone and native deployment modes and greatly simplifies deployment, configuration, and life cycle management of Flink resources on Kubernetes.
For more information, please refer to the [Flink Kubernetes Operator documentation](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/).
## Preparation
This doc assumes a running Kubernetes cluster that fulfills the following requirements:
- Kubernetes >= 1.9.
- KubeConfig with permissions to list, create, and delete pods and services, configurable via `~/.kube/config`. You can verify permissions by running `kubectl auth can-i <list|create|edit|delete> pods` (see the preflight sketch after this list).
- Enabled Kubernetes DNS.
- `default` service account with [RBAC](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#rbac) permissions to create, delete pods.
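As referenced in the list above, a minimal preflight check might look like this (a sketch, assuming `kubectl` already points at the target cluster):
```bash
# Check RBAC permissions for pods and services
kubectl auth can-i list pods
kubectl auth can-i create pods
kubectl auth can-i delete pods
kubectl auth can-i create services

# Check that Kubernetes DNS is running (CoreDNS in recent clusters)
kubectl get pods -n kube-system -l k8s-app=kube-dns
```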
If you have problems setting up a Kubernetes cluster, please take a look at [how to setup a Kubernetes cluster](https://kubernetes.io/docs/setup/).
## Session Mode
Flink runs on all UNIX-like environments, i.e. Linux, Mac OS X, and Cygwin (for Windows).
You can refer to the [overview]({{< ref "docs/connectors/pipeline-connectors/overview" >}}) page to check supported Flink versions and download [the binary release](https://flink.apache.org/downloads/) of Flink,
then extract the archive:
```bash
tar -xzf flink-*.tgz
```
You should set the `FLINK_HOME` environment variable, for example:
```bash
export FLINK_HOME=/path/flink-*
```
### Start a session cluster
To start a session cluster on k8s, run the bash script that comes with Flink:
```bash
cd /path/flink-*
./bin/kubernetes-session.sh -Dkubernetes.cluster-id=my-first-flink-cluster
```
After successful startup, the returned information is as follows:
```
org.apache.flink.kubernetes.utils.KubernetesUtils [] - Kubernetes deployment requires a fixed port. Configuration blob.server.port will be set to 6124
@@ -79,33 +76,33 @@ org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Please note th
org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create flink session cluster my-first-flink-cluster successfully, JobManager Web Interface: http://my-first-flink-cluster-rest.default:8081
```
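To double-check the session cluster from outside, you can inspect the resources Flink created (a sketch, assuming the cluster id `my-first-flink-cluster` from the command above):
```bash
# JobManager pods carry the cluster id as their app label
kubectl get pods -l app=my-first-flink-cluster
# The REST service mentioned in the log output above
kubectl get svc my-first-flink-cluster-rest
```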
{{< hint info >}}
Please refer to the [Flink documentation](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#accessing-flinks-web-ui) to expose Flink's Web UI and REST endpoint.
You should ensure that the REST endpoint is accessible from the node where you submit jobs.
{{< /hint >}}
Then, you need to add these two configs to your flink-conf.yaml:
```yaml
rest.bind-port: {{REST_PORT}}
rest.address: {{NODE_IP}}
```
`{{REST_PORT}}` and `{{NODE_IP}}` should be replaced with the actual values of your JobManager Web Interface.
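For example, assuming a hypothetical node IP of `192.168.1.10` and REST port `8081`, the two options could be appended like this:
```bash
# Hypothetical values; use the address shown as your JobManager Web Interface
echo "rest.bind-port: 8081" >> $FLINK_HOME/conf/flink-conf.yaml
echo "rest.address: 192.168.1.10" >> $FLINK_HOME/conf/flink-conf.yaml
```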
### Set up Flink CDC
Download the tar file of Flink CDC from the [release page](https://github.com/apache/flink-cdc/releases), then extract the archive:
```bash
tar -xzf flink-cdc-*.tar.gz
```
The extracted `flink-cdc` directory contains four subdirectories: `bin`, `lib`, `log`, and `conf`.
Download the connector jars from the [release page](https://github.com/apache/flink-cdc/releases), and move them to the `lib` directory.
Download links are available only for stable releases; SNAPSHOT dependencies need to be built by yourself from the corresponding branch.
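For example, with the 3.1.0 release and the MySQL and Doris connectors used later in this doc, this step might look like:
```bash
# Move the downloaded connector jars into the lib directory of the distribution
mv flink-cdc-pipeline-connector-mysql-3.1.0.jar flink-cdc-3.1.0/lib/
mv flink-cdc-pipeline-connector-doris-3.1.0.jar flink-cdc-3.1.0/lib/
```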
### Submit a Flink CDC Job
Here is an example file for synchronizing the entire database, `mysql-to-doris.yaml`:
```yaml
################################################################################
@@ -133,18 +130,18 @@ pipeline:
```
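The body of the file is elided in the hunk above; as a sketch, its shape mirrors the pipeline definition shown in the ConfigMap example of the Kubernetes Operator section below:
```yaml
source:
  type: mysql
  hostname: localhost
  port: 3306
  username: root
  password: 123456
  tables: app_db.\.*
  server-id: 5400-5404
  server-time-zone: UTC

sink:
  type: doris
  fenodes: 127.0.0.1:8030
  username: root
  password: ""

pipeline:
  name: Sync MySQL Database to Doris
  parallelism: 2
```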
You need to modify the configuration file according to your needs; refer to the connector docs for more information:
- [MySQL pipeline connector]({{< ref "docs/connectors/pipeline-connectors/mysql.md" >}})
- [Apache Doris pipeline connector]({{< ref "docs/connectors/pipeline-connectors/doris.md" >}})
Finally, submit the job to the Flink Standalone cluster using the CLI:
```bash
cd /path/flink-cdc-*
./bin/flink-cdc.sh mysql-to-doris.yaml
```
After successful submission, the returned information is as follows:
```bash
Pipeline has been submitted to cluster.
Job ID: ae30f4580f1918bebf16752d4963dc54
Job Description: Sync MySQL Database to Doris
```
Then you can find a job named `Sync MySQL Database to Doris` running through the Flink Web UI.

## Kubernetes Operator Mode
This doc assumes that a [Flink Kubernetes Operator](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/) has already been deployed on your K8s cluster; then you only need to build a custom Docker image of Flink CDC.
### Build a custom Docker image
1. Download the tar file of Flink CDC and the needed connector jars from the [release page](https://github.com/apache/flink-cdc/releases), then move them to the Docker image build directory.
Assuming that your Docker image build directory is `/opt/docker/flink-cdc`, the structure of the directory is as follows:
```text
/opt/docker/flink-cdc
├── flink-cdc-3.1.0-bin.tar.gz
├── flink-cdc-pipeline-connector-doris-3.1.0.jar
├── flink-cdc-pipeline-connector-mysql-3.1.0.jar
├── mysql-connector-java-8.0.27.jar
└── ...
```
2. Create a Dockerfile to build a custom image from the official `flink` image and add the Flink CDC dependencies.
```dockerfile
FROM flink:1.18.0-java8
# Copy the connector and driver jars onto Flink's classpath
ADD *.jar $FLINK_HOME/lib/
# ADD auto-extracts local .tar.gz archives under $FLINK_HOME
ADD flink-cdc*.tar.gz $FLINK_HOME/
# Move the Flink CDC dist jar onto Flink's classpath as well
RUN mv $FLINK_HOME/flink-cdc-3.1.0/lib/flink-cdc-dist-3.1.0.jar $FLINK_HOME/lib/
```
Finally, the structure of the build directory is as follows:
```text
/opt/docker/flink-cdc
├── Dockerfile
├── flink-cdc-3.1.0-bin.tar.gz
├── flink-cdc-pipeline-connector-doris-3.1.0.jar
├── flink-cdc-pipeline-connector-mysql-3.1.0.jar
├── mysql-connector-java-8.0.27.jar
└── ...
```
3. Build the custom Docker image, then push it:
```bash
docker build -t flink-cdc-pipeline:3.1.0 .
docker push flink-cdc-pipeline:3.1.0
```
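Note that pushing a bare image name targets the default registry; if you use a private registry (hypothetical host `registry.example.com`), tag the image with the registry prefix first:
```bash
# Hypothetical registry; adjust host and repository to your environment
docker tag flink-cdc-pipeline:3.1.0 registry.example.com/flink-cdc-pipeline:3.1.0
docker push registry.example.com/flink-cdc-pipeline:3.1.0
```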
### Create a ConfigMap for mounting Flink CDC configuration files
Here is an example file; please change the connection parameters to your actual values:
```yaml
---
apiVersion: v1
data:
flink-cdc.yaml: |-
parallelism: 4
schema.change.behavior: EVOLVE
mysql-to-doris.yaml: |-
source:
type: mysql
hostname: localhost
port: 3306
username: root
password: 123456
tables: app_db.\.*
server-id: 5400-5404
server-time-zone: UTC
sink:
type: doris
fenodes: 127.0.0.1:8030
username: root
password: ""
pipeline:
name: Sync MySQL Database to Doris
parallelism: 2
kind: ConfigMap
metadata:
name: flink-cdc-pipeline-configmap
```
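Assuming the manifest above is saved as `flink-cdc-pipeline-configmap.yaml` (a hypothetical file name), it can be created and verified with:
```bash
kubectl apply -f flink-cdc-pipeline-configmap.yaml
# Confirm that both config files are present in the ConfigMap
kubectl describe configmap flink-cdc-pipeline-configmap
```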
### Create a FlinkDeployment YAML
Here is an example file, `flink-cdc-pipeline-job.yaml`:
```yaml
---
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
name: flink-cdc-pipeline-job
spec:
flinkConfiguration:
classloader.resolve-order: parent-first
state.checkpoints.dir: 'file:///tmp/checkpoints'
state.savepoints.dir: 'file:///tmp/savepoints'
flinkVersion: v1_18
image: 'flink-cdc-pipeline:3.1.0'
imagePullPolicy: Always
job:
args:
- '--use-mini-cluster'
- /opt/flink/flink-cdc-3.1.0/conf/mysql-to-doris.yaml
entryClass: org.apache.flink.cdc.cli.CliFrontend
jarURI: 'local:///opt/flink/lib/flink-cdc-dist-3.1.0.jar'
parallelism: 1
state: running
upgradeMode: savepoint
jobManager:
replicas: 1
resource:
cpu: 1
memory: 1024m
podTemplate:
apiVersion: v1
kind: Pod
spec:
containers:
# don't modify this name
- name: flink-main-container
volumeMounts:
- mountPath: /opt/flink/flink-cdc-3.1.0/conf
name: flink-cdc-pipeline-config
volumes:
- configMap:
name: flink-cdc-pipeline-configmap
name: flink-cdc-pipeline-config
restartNonce: 0
serviceAccount: flink
taskManager:
resource:
cpu: 1
memory: 1024m
```
{{< hint info >}}
1. Due to Flink's class loading mechanism, the parameter `classloader.resolve-order` must be set to `parent-first`.
2. Flink CDC submits jobs to a remote Flink cluster by default; in Operator mode, you should start a standalone Flink cluster inside the pod by passing the `--use-mini-cluster` argument.
{{< /hint >}}
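Since the job args point at `/opt/flink/flink-cdc-3.1.0/conf/mysql-to-doris.yaml`, you may want to confirm the ConfigMap is mounted where the job expects it once the pod is up (a sketch; replace the placeholder with your actual JobManager pod name):
```bash
# List the mounted config files inside the JobManager container
kubectl exec -it <jobmanager-pod> -- ls /opt/flink/flink-cdc-3.1.0/conf
```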
### Submit a Flink CDC Job
After the ConfigMap and FlinkDeployment YAML are created, you can submit the Flink CDC job to the Operator through kubectl:
```bash
kubectl apply -f flink-cdc-pipeline-job.yaml
```
After successful submission, the returned information is as follows:
```shell
flinkdeployment.flink.apache.org/flink-cdc-pipeline-job created
```
If you want to trace the logs or expose the Flink Web UI, please refer to the [Flink Kubernetes Operator documentation](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/).
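For a quick status check from the terminal, the following works against the resources created above (the `deploy/<name>` log pattern follows the operator's quickstart):
```bash
# Inspect the FlinkDeployment managed by the operator
kubectl get flinkdeployment flink-cdc-pipeline-job
# Tail the JobManager logs
kubectl logs -f deploy/flink-cdc-pipeline-job
```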
{{< hint info >}}
Please note that submitting with **native application mode** is not supported for now.
{{< /hint >}}