ezra-sullivan
Published on 2025-07-06

02 - Installing Kubernetes 1.32 with Kubespray 2.28

Last updated: April 2025

Kubernetes version: 1.32.5

Kubespray version: 2.28.0

containerd version:

Introduction

Official Kubespray documentation: Readme (kubespray.io)

Kubespray on GitHub: kubernetes-sigs/kubespray: Deploy a Production Ready Kubernetes Cluster (github.com)

Operating systems currently supported by Kubespray

Notes:

  • Distributions based on Upstart or SysV init are not supported
  • Kernel requirements: a Linux kernel >= 4.19 is recommended

Version selection

The mapping between Kubespray releases and the Kubernetes/component versions they support: Releases · kubernetes-sigs/kubespray (github.com)

Kubernetes on GitHub: Releases · kubernetes/kubernetes (github.com)

For example, choosing Kubespray v2.27.0

gives a default Kubernetes version of 1.32.2.

Component versions supported by the latest mainline

Installation requirements

Software requirements

  • Kubernetes v1.30+
  • Ansible v2.14+, Jinja 2.11+, and python-netaddr installed on the machine that runs the Ansible commands
  • The target servers need internet access to pull the required container images; otherwise, download the images in advance, import them into a private registry, and add the corresponding configuration
  • The servers must allow IPv4 forwarding
  • If IPv6 is used, IPv6 forwarding must be allowed as well
  • The SSH key of the machine running Ansible must be copied to every server in the cluster
  • Configure appropriate firewall rules (open the required ports), or simply disable the firewall
  • If Kubespray is run as a non-root account, configure a suitable privilege-escalation method on the target servers and pass the ansible_become flag or the --become (-b) command-line option

Hardware requirements

  • Master (control-plane node)
    • 2 GB of memory or more
  • Node (worker node)
    • 1 GB of memory or more

Host list

All hosts run AlmaLinux release 9.6.

  • admin.kubespray.local — 192.168.111.190 (needs internet access to download resources), 2C2G
    Services: nginx, docker, kubespray (bundled ansible), CoreDNS (LAN DNS), Harbor private registry, HTTP file download service, DNF/YUM repository
    Role: Ansible control host, Kubespray run host
  • kube-cp-01.kubespray.local — 192.168.111.191, 2C4G
    Services: etcd
    Role: k8s control-plane node 1
  • kube-node-01.kubespray.local — 192.168.111.192, 2C2G
    Role: k8s worker node 1
  • kube-node-02.kubespray.local — 192.168.111.193, 2C2G
    Role: k8s worker node 2
  • kube-node-03.kubespray.local — 192.168.111.194, 2C2G
    Role: k8s worker node 3

The admin host acts as the management host: it runs the Ansible commands and provides several download services for the other nodes.


Admin host preparation

Perform these steps on the admin host only.

Environment preparation

Time configuration

Note: done later in bulk together with the k8s hosts.

Kernel parameter configuration

Note: done later in bulk together with the k8s hosts.

Firewall configuration

SELinux configuration

$ setenforce 0

$ sed -ri 's@(^SELINUX)=.*@\1=disabled@g' /etc/selinux/config && sed -ri 's@(^SELINUX)=.*@\1=disabled@g' /etc/sysconfig/selinux

firewalld configuration

Open a few firewall ports

# Persist the current runtime configuration
$ firewall-cmd --runtime-to-permanent

# Resource download port
$ firewall-cmd --permanent --zone=public --add-port=8000/tcp

# Harbor port
$ firewall-cmd --permanent --zone=public --add-service=https

# Load the configuration into the runtime
$ firewall-cmd --reload

Alternatively, an ipset-based approach can be used

dnf -y install ipset


# Create a zone
firewall-cmd --permanent --new-zone=kubernetes-local
firewall-cmd --permanent --zone=kubernetes-local --set-target=ACCEPT


# Create an ipset
firewall-cmd --permanent --new-ipset=kubernetes-local-ips --type=hash:net

# Add IPs to the ipset
firewall-cmd --permanent --ipset=kubernetes-local-ips --add-entry=192.168.111.191/32
firewall-cmd --permanent --ipset=kubernetes-local-ips --add-entry=192.168.111.192/32
firewall-cmd --permanent --ipset=kubernetes-local-ips --add-entry=192.168.111.193/32
firewall-cmd --permanent --ipset=kubernetes-local-ips --add-entry=192.168.111.194/32


# Attach the ipset to the zone as a source
firewall-cmd --permanent --zone=kubernetes-local --add-source=ipset:kubernetes-local-ips

# Reload the configuration
firewall-cmd --reload


# Pod network CIDR and Service network CIDR
firewall-cmd --permanent --ipset=kubernetes-local-ips --add-entry=10.100.0.0/16
firewall-cmd --permanent --ipset=kubernetes-local-ips --add-entry=10.200.0.0/16
firewall-cmd --reload


# Inspect the ipset
firewall-cmd --info-ipset=kubernetes-local-ips

Install common tools

$ dnf -y install vim wget tree

Install Docker

Reference: Install Docker Engine on CentOS | Docker Documentation

Add the repository file (RHEL family)

dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

Switch the repository to a domestic mirror

The official repository is slow to reach from inside China, so it can be replaced with a domestic mirror (TUNA and Aliyun are listed here; pick one).

# Switch the repository to TUNA
$ sed -i 's+download.docker.com+mirrors.tuna.tsinghua.edu.cn/docker-ce+' /etc/yum.repos.d/docker-ce.repo

# Switch the repository to Aliyun
$ sed -i 's+download.docker.com+mirrors.aliyun.com/docker-ce+' /etc/yum.repos.d/docker-ce.repo

Refresh the metadata cache

$ dnf makecache

Install

# List available versions
$ dnf list docker-ce.x86_64 --showduplicates | sort -r
docker-ce.x86_64                3:28.1.1-1.el9                  docker-ce-stable
docker-ce.x86_64                3:28.1.0-1.el9                  docker-ce-stable
docker-ce.x86_64                3:28.0.4-1.el9                  docker-ce-stable
docker-ce.x86_64                3:28.0.3-1.el9                  docker-ce-stable
docker-ce.x86_64                3:28.0.2-1.el9                  docker-ce-stable
docker-ce.x86_64                3:28.0.1-1.el9                  docker-ce-stable
docker-ce.x86_64                3:28.0.0-1.el9                  docker-ce-stable
docker-ce.x86_64                3:27.5.1-1.el9                  docker-ce-stable
.......

# Install the latest stable version
$ dnf -y install docker-ce

# Or install a specific version: dnf -y install docker-ce-[VERSION].[arch]
$ dnf -y install docker-ce-25.0.5-1.el9.x86_64

Configuration (optional)

# Create the data directory
$ mkdir -p /data/docker

# Create the configuration directory
$ mkdir -p /etc/docker/


# Note: 192.168.111.1:10811 is an internet proxy; drop the proxies section if you do not have one
$ cat > /etc/docker/daemon.json << EOF
{
    "bip": "172.17.0.1/16",
    "data-root": "/data/docker",
    "dns": [
        "114.114.114.114",
        "119.29.29.29"
    ],
    "registry-mirrors": [
        "https://docker.m.daocloud.io",
        "https://dockerproxy.net",
        "https://docker.aiden-work.tech/"
    ],
    "proxies": {
        "http-proxy": "socks5://192.168.111.1:10811",
        "https-proxy": "socks5://192.168.111.1:10811",
        "no-proxy": "127.0.0.1,localhost,admin.kubespray.local"
    },
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "20m",
        "max-file": "5"
    },
    "storage-driver": "overlay2",
    "storage-opt": [ "overlay2.size=200G"],
    "live-restore": false
}
EOF

Start

# Start Docker and enable it at boot
$ systemctl enable --now docker
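A quick sanity check that the daemon picked up the configuration (a minimal sketch; the exact output varies by environment):

# Confirm the data root and registry mirrors configured in daemon.json
$ docker info --format '{{.DockerRootDir}}'
/data/docker

$ docker info | grep -A 4 'Registry Mirrors'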

Deploy CoreDNS

Deploy CoreDNS as the LAN DNS server.

Deployment

Base variables

COREDNS_VERSION=1.12.1

COREDNS_HOME=/opt/coredns

COREDNS_CONF_HOME=/etc/coredns

COREDNS_PID_FILE=/var/run/coredns.pid



mkdir -p ${COREDNS_HOME}/bin
mkdir -p ${COREDNS_CONF_HOME}

Download and extract

Download page: Releases · coredns/coredns

curl -x http://192.168.111.1:10811 \
     -o /usr/local/src/coredns_${COREDNS_VERSION}_linux_amd64.tgz \
     -L https://github.com/coredns/coredns/releases/download/v${COREDNS_VERSION}/coredns_${COREDNS_VERSION}_linux_amd64.tgz
     
     
tar \
    -zxvf /usr/local/src/coredns_${COREDNS_VERSION}_linux_amd64.tgz  \
    -C ${COREDNS_HOME}/bin

Configure environment variables

cat > /etc/profile.d/coredns.sh << EOF
export COREDNS_HOME=${COREDNS_HOME}
export COREDNS_CONF_HOME=${COREDNS_CONF_HOME}
export PATH=\${PATH}:\${COREDNS_HOME}/bin
EOF

source /etc/profile

Configuration

Configuration reference: CoreDNS: DNS and Service Discovery

CoreDNS is configured through a Corefile. When CoreDNS starts, if the -conf flag is not given, it looks for a file named Corefile in the current directory. The file consists of one or more server blocks; each server block lists one or more plugins, and those plugins can be configured further with directives.

Note: the order of plugins in the Corefile does not determine the execution order of the plugin chain. Execution order is fixed by plugin.cfg.

Other characteristics:

  • Comments in the Corefile start with #
  • Environment variables can be substituted anywhere in the Corefile using the {$ENV_VAR} syntax
  • Other files can be pulled in with the import plugin (a small fragment illustrating both follows below)

Note: this article mainly uses the hosts plugin; see the hosts plugin documentation for details.
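For illustration only, a minimal Corefile fragment showing environment-variable substitution and the import plugin (the snippet path and variable name are hypothetical and are not used later in this article):

# {$UPSTREAM_DNS} is replaced with the value of the UPSTREAM_DNS environment variable
.:53 {
    import /etc/coredns/snippets/common.conf
    forward . {$UPSTREAM_DNS}
}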

Configuration: referencing an external file

Create the hosts file

cat > /etc/coredns/hosts.kubespray.local << EOF
192.168.111.190 admin.kubespray.local
192.168.111.191 kube-cp-01.kubespray.local
192.168.111.192 kube-node-01.kubespray.local
192.168.111.193 kube-node-02.kubespray.local
192.168.111.194 kube-node-03.kubespray.local
EOF

Main configuration

cat > ${COREDNS_CONF_HOME}/Corefile << EOF

# Internal LAN resolution: only handles *.local names
local:53 {
    # Use the hosts plugin to read the internal records
    hosts /etc/coredns/hosts.kubespray.local {
       fallthrough
    }
    # Cache DNS query results
    cache 300

    # Logging
    log
    errors

    # Round-robin the answers when there are multiple records
    loadbalance
}

# Global resolution: forward all other queries
.:53 {
    forward . 8.8.8.8 114.114.114.114 {
        max_concurrent 1000
    }
    log
    errors
    cache 600
}
EOF

Start

Test run

coredns -conf ${COREDNS_CONF_HOME}/Corefile

Manage with systemd

$ cat > /usr/lib/systemd/system/coredns.service << EOF

[Unit]
Description=CoreDNS DNS Server
After=network.target

[Service]
Type=simple
ExecStart=${COREDNS_HOME}/bin/coredns -conf ${COREDNS_CONF_HOME}/Corefile -pidfile ${COREDNS_PID_FILE}
# PIDFile must match the -pidfile path above
PIDFile=${COREDNS_PID_FILE}
# Remove the PID file after the service stops
ExecStopPost=/bin/rm -f ${COREDNS_PID_FILE}
Restart=on-failure
User=root

[Install]
WantedBy=multi-user.target

EOF

Start and enable at boot

systemctl daemon-reload

systemctl enable --now coredns
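A quick resolution test against the new server (this assumes bind-utils is installed for the dig command; nslookup works as well):

# Query a LAN record and an external name through CoreDNS on the admin host
$ dig +short @192.168.111.190 kube-cp-01.kubespray.local
192.168.111.191

$ dig +short @192.168.111.190 www.baidu.com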

Configure LAN DNS on the clients

Note: configure this on the hosts in the LAN.

nmcli c modify ens160 ipv4.dns 192.168.111.190


nmcli con up ens160

Deploy an HTTP file service

Reference: http://nginx.org/en/linux_packages.html#RHEL-CentOS

Nginx mainly serves some of the installation resources to hosts on the private network.

Install Nginx

Configure the Nginx yum repository

# The configuration below enables the stable repo by default; change 'enabled' if you need a different branch
$ cat << EOF > /etc/yum.repos.d/nginx.repo

[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/centos/\$releasever/\$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true

[nginx-mainline]
name=nginx mainline repo
baseurl=http://nginx.org/packages/mainline/centos/\$releasever/\$basearch/
gpgcheck=1
enabled=0
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true

EOF

Install the stable version

# List available versions
$ dnf list nginx.x86_64 --showduplicates | sort -r
nginx.x86_64               2:1.28.0-1.el9.ngx                       nginx-stable
nginx.x86_64               2:1.26.3-1.el9.ngx                       nginx-stable
......

# The default configuration already points at the stable repo, so installing directly works
$ dnf install nginx -y


# Or: install a specific version
$ dnf -y install nginx-1.26.3-1.el9.ngx.x86_64

# Check the installed version
$ nginx -v
nginx version: nginx/1.28.0


# Start Nginx and enable it at boot
$ systemctl enable --now nginx

Configure Nginx

Configure Nginx to share local directories (comment out any other listen configuration to avoid port conflicts)

$ mkdir -p /data/files


$ cat > /etc/nginx/conf.d/default.conf << EOF

server {
    listen       8000;



    location ^~ /files {
        root   /data/;
        autoindex on;
        autoindex_exact_size on;
        autoindex_localtime on;
    }
    
    location ^~ /repos {
        root   /data/;
        autoindex on;
        autoindex_exact_size on;
        autoindex_localtime on;
    }
}

EOF

Reload the configuration

$ systemctl restart nginx

# or
$ nginx -s reload

Verify access

$ curl -L http://admin.kubespray.local:8000/files

Deploy a private image registry (Harbor)

Certificate preparation

Harbor certificate configuration reference: Harbor docs | Configure HTTPS Access to Harbor (goharbor.io)

Create a CA

Create the CA private key

mkdir -p /etc/pki/tls/


openssl genrsa -out /etc/pki/tls/ca.key 4096

Create the CA certificate

openssl req -x509 -new -nodes -sha512 -days 3650 \
  -subj "/C=CN/ST=Shanghai/L=Shanghai/O=KMUST/OU=Personal/CN=kubespray.local" \
  -key /etc/pki/tls/ca.key \
  -out /etc/pki/tls/ca.crt

Create the Harbor key and certificate

Create the private key

mkdir -p /etc/harbor/certs

openssl genrsa -out /etc/harbor/certs/admin.kubespray.local.key 4096

Create the certificate signing request (CSR)

openssl req -sha512 -new \
    -subj "/C=CN/ST=Shanghai/L=Shanghai/O=KMUST/OU=Personal/CN=admin.kubespray.local" \
    -key /etc/harbor/certs/admin.kubespray.local.key \
    -out /etc/harbor/certs/admin.kubespray.local.csr

Create the x509 v3 extension file

cat > /etc/harbor/certs/v3.ext <<-EOF

authorityKeyIdentifier=keyid,issuer
basicConstraints=CA:FALSE
keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names

[alt_names]
DNS.1=admin.kubespray.local
IP.1=192.168.111.190
#IP.2=172.17.0.1

EOF

Sign the certificate using the extension file

openssl x509 -req -sha512 -days 3650 \
    -extfile /etc/harbor/certs/v3.ext \
    -CA /etc/pki/tls/ca.crt \
    -CAkey /etc/pki/tls/ca.key \
    -CAcreateserial \
    -in /etc/harbor/certs/admin.kubespray.local.csr \
    -out /etc/harbor/certs/admin.kubespray.local.crt

Convert the certificate file extension

# The Docker daemon treats certificates ending in .crt as CA certificates and those ending in .cert as client certificates
openssl x509 -inform PEM \
    -in /etc/harbor/certs/admin.kubespray.local.crt \
    -out /etc/harbor/certs/admin.kubespray.local.cert

List the generated certificates

$ ls -l /etc/harbor/certs/
total 20
-rw-r--r--. 1 root root 2159 Mar 12 16:29 admin.kubespray.local.cert
-rw-r--r--. 1 root root 2159 Mar 12 16:23 admin.kubespray.local.crt
-rw-r--r--. 1 root root 1716 Mar 12 16:22 admin.kubespray.local.csr
-rw-------. 1 root root 3272 Mar 12 16:22 admin.kubespray.local.key
-rw-r--r--. 1 root root  280 Mar 12 16:22 v3.ext

Copy them to the target location

mkdir -p /data/certs/harbor


cp /etc/harbor/certs/admin.kubespray.local.{key,crt} /data/certs/harbor

Installation and checks

Installation reference: Harbor docs | Harbor Installation and Configuration (goharbor.io)

Note: Kubespray provides the kubespray-{version}/contrib/offline/manage-offline-container-images.sh script, which can download images from the internet, build a local Registry, and push the images into it. A plain Registry is awkward to manage, however, so this project uses Harbor as the local image registry.

Download

Download page: Releases · goharbor/harbor (github.com)

$ HARBOR_VERSION="v2.13.1"




$ wget -P /usr/local/src/ \
https://github.com/goharbor/harbor/releases/download/${HARBOR_VERSION}/harbor-offline-installer-${HARBOR_VERSION}.tgz

# If a proxy is available, download through it
$ wget -e https_proxy=http://192.168.111.1:10811 \
    -P /usr/local/src/ \
    https://github.com/goharbor/harbor/releases/download/${HARBOR_VERSION}/harbor-offline-installer-${HARBOR_VERSION}.tgz

Extract

mkdir -p /opt/


tar -xf /usr/local/src/harbor-offline-installer-${HARBOR_VERSION}.tgz -C /opt/

Configure

# Create the data directory
$ mkdir -p /data/harbor/


$ cd /opt/harbor/

$ cp harbor.yml.tmpl harbor.yml



$ vim harbor.yml
# Set the hostname to this host's domain name (or IP); it must match the certificate issued earlier
hostname: admin.kubespray.local


# If there is no certificate, comment out the https section instead
# HTTPS certificate configuration
https:
  port: 443
  certificate: /data/certs/harbor/admin.kubespray.local.crt
  private_key: /data/certs/harbor/admin.kubespray.local.key


# Password of the Harbor admin user
harbor_admin_password: Harbor12345


# Data directory
data_volume: /data/harbor

Install Harbor

$ ./install.sh
......
✔ ----Harbor has been installed and started successfully.----

Start Harbor

Manage it with systemd

$ cat > /usr/lib/systemd/system/harbor.service << EOF
[Unit]
Description=Harbor
After=docker.service systemd-resolved.service
Requires=docker.service
Documentation=https://goharbor.io/

[Service]
RemainAfterExit=yes
ExecStart=/usr/bin/docker compose -f /opt/harbor/docker-compose.yml up
ExecStop=/usr/bin/docker compose -f /opt/harbor/docker-compose.yml down

[Install]
WantedBy=multi-user.target
EOF

Start

$ systemctl daemon-reload

# install.sh already started Harbor through docker compose, so stop it first and restart it under systemd
$ systemctl stop harbor && systemctl enable --now harbor

Checks

Add the private CA certificate to the system CA trust store

\cp /etc/pki/tls/ca.crt /etc/pki/ca-trust/source/anchors/

update-ca-trust extract

Test that the TLS handshake works

# Add a hosts entry (if there is no LAN DNS)
# $ echo "192.168.111.190 admin.kubespray.local" >>  /etc/hosts

$ openssl s_client -connect admin.kubespray.local:443

Test that web access works

$ curl https://admin.kubespray.local/

Log in with Docker

Create the certificate directory

$ mkdir -p /etc/docker/certs.d/admin.kubespray.local/

Copy the CA certificate into the Docker path

For the client to log in, only one file is needed: the CA certificate (ca.crt) or the domain certificate (<domain>.crt).

If the directory also contains a private key, however, at least three files are required:

  • the CA certificate (ca.crt) or the domain certificate (<domain>.crt)
  • the certificate: xx.cert
  • the matching private key: xx.key

\cp /etc/pki/tls/ca.crt /etc/docker/certs.d/admin.kubespray.local/

Log in

docker login -u admin admin.kubespray.local
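To verify the certificate and login end to end, a small test push can be done (a sketch; it assumes a project such as library already exists in Harbor — the k8s-cluster project created later works just as well):

# Tag a small local image and push it to Harbor as a smoke test
$ docker pull busybox:1.36
$ docker tag busybox:1.36 admin.kubespray.local/library/busybox:1.36
$ docker push admin.kubespray.local/library/busybox:1.36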

Build a local DNF/YUM repository

Create the package storage directories

SYSTEM_PKG_PATH=/data/repos/system/almalinux/9/x86_64/Packages
DOCKER_PKG_PATH=/data/repos/apps/almalinux/docker-ce/9/x86_64/Packages

mkdir -p ${SYSTEM_PKG_PATH}



mkdir -p ${DOCKER_PKG_PATH}

Create the download script

$ mkdir -p /opt/k8s/ && cd /opt/k8s/

$ cat > /opt/k8s/download_packages.sh << EOF
#!/bin/bash

# Target download directory
DOWNLOAD_DIR="${SYSTEM_PKG_PATH}"

# Make sure the download directory exists
mkdir -p "\${DOWNLOAD_DIR}"

# Package list file
PACKAGE_FILE="./packages.txt"

# Check that the package list file exists
if [ ! -f "\${PACKAGE_FILE}" ]; then
  echo "Package file \${PACKAGE_FILE} does not exist."
  exit 1
fi

# Read package names from the file and download each package with its dependencies
while IFS= read -r pkg || [[ -n "\${pkg}" ]]; do
  echo "Processing \${pkg}..."
  dnf download --resolve --alldeps --destdir="\${DOWNLOAD_DIR}" "\${pkg}" || {
    echo "Failed to download \${pkg} and its dependencies."
    exit 1
  }
done < "\${PACKAGE_FILE}"

echo "Download completed successfully."

EOF

Download the required RPMs and their dependencies

# Some base packages; ipvsadm must be installed when ipvs mode is used
$ cat > /opt/k8s/packages.txt << EOF
wget
vim
ipvsadm
ipset
make
EOF

# Packages required according to {KUBESPRAY_HOME}/roles/kubernetes/preinstall/vars/centos.yml
$ cat >> /opt/k8s/packages.txt << EOF
device-mapper-libs
nss
conntrack
container-selinux
libseccomp
EOF




# Packages required according to {KUBESPRAY_HOME}/roles/kubernetes/preinstall/defaults/main.yml
$ cat >> /opt/k8s/packages.txt << EOF
curl
rsync
socat
unzip
e2fsprogs
xfsprogs
ebtables
bash-completion
tar
EOF


# Download
$ chmod +x /opt/k8s/download_packages.sh
$ /opt/k8s/download_packages.sh

Download the Docker/containerd RPMs and their dependencies

$ cat << EOF | xargs -I{} sh -c "dnf repoquery --requires --resolve {} | xargs dnf download --downloaddir=${DOCKER_PKG_PATH}"
docker-ce
containerd.io
EOF

Install the repository-building tool

$ dnf -y install createrepo

Generate the indexes

# Generate the indexes
$ createrepo $(dirname ${SYSTEM_PKG_PATH})
$ createrepo $(dirname ${DOCKER_PKG_PATH})


# Check
$ ls -lh $(dirname ${SYSTEM_PKG_PATH})

$ ls -lh $(dirname ${DOCKER_PKG_PATH})

Configure the DNF repository


$ cat > /etc/yum.repos.d/local.repo << EOF
[local-os-repo]
name=local-os-repo
baseurl=http://admin.kubespray.local:8000/repos/system/almalinux/\$releasever/\$basearch/
enabled=1
gpgcheck=0

[local-docker-repo]
name=local-docker-repo
baseurl=http://admin.kubespray.local:8000/repos/apps/almalinux/docker-ce/\$releasever/\$basearch/
enabled=1
gpgcheck=0

EOF

Test

$ dnf install -y make --disablerepo=*  --enablerepo=local-os-repo

Install Kubespray

Method 1: deploy the mainline version

Upgrade the Python version

Note: AlmaLinux ships Python 3.9 by default, which is not supported by Ansible 9.5.1+, so Python must be upgraded.

Reference: Install Python 3.11 on Rocky Linux 9 / AlmaLinux 9 | ComputingForGeeks

dnf -y install python3.12

dnf -y install python3.12-pip

Download Kubespray

Kubespray download page: Releases · kubernetes-sigs/kubespray (github.com)

dnf -y install git


cd /opt 


git clone https://ghfast.top/https://github.com/kubernetes-sigs/kubespray.git

Install the Kubespray dependencies

Reference: kubespray/docs/ansible.md at master · kubernetes-sigs/kubespray

# Inspect the Kubespray dependencies
# Several requirements files are provided; pick the one that suits your environment
$ cd /opt/kubespray && cat requirements.txt
ansible==9.13.0
# Needed for community.crypto module
cryptography==44.0.1
# Needed for jinja2 json_query templating
jmespath==1.0.1
# Needed for ansible.utils.ipaddr
netaddr==1.3.0





# Create a virtual environment
$ cd /opt/ && VENVDIR=kubespray-venv && python3.12 -m venv $VENVDIR

# Activate the virtual environment
$ cd /opt/ && source $VENVDIR/bin/activate

# Upgrade pip inside the virtual environment
$ /opt/${VENVDIR}/bin/python3.12 -m pip install --upgrade pip


# ---------- Install the dependencies directly -------------
# Not recommended; environment issues often prevent some dependencies from installing
# $ pip3 install -r requirements.txt


# ---------- Download first, then install (recommended) -----------------
# Create the directory
$ mkdir -p /opt/kubespray-req
    
# Download (using the Aliyun pip mirror for speed)
$ pip3 download -i  https://mirrors.aliyun.com/pypi/simple/ \
    -d /opt/kubespray-req \
    -r /opt/kubespray/requirements.txt

# Install the dependencies
$ pip3 install -U --no-index \
    --find-links=/opt/kubespray-req  \
    -r /opt/kubespray/requirements.txt

Method 2: deploy a stable release

Reference: kubespray/docs/setting-up-your-first-cluster.md at master

Install Kubespray and its dependencies

Kubespray download page: Releases · kubernetes-sigs/kubespray (github.com)

Upgrade the Python version

Note: AlmaLinux ships Python 3.9 by default, which is not supported by Ansible 9.5.1+, so Python must be upgraded.

Reference: Install Python 3.11 on Rocky Linux 9 / AlmaLinux 9 | ComputingForGeeks

dnf -y install python3.12

dnf -y install python3.12-pip

Download Kubespray

KUBESPRAY_VERSION=2.28.0


# Download
$ wget -O /usr/local/src/kubespray-${KUBESPRAY_VERSION}.tar.gz \
    https://codeload.github.com/kubernetes-sigs/kubespray/tar.gz/refs/tags/v${KUBESPRAY_VERSION}


# Or download through a proxy
$ wget -e https_proxy=http://192.168.111.1:10811 \
    -O /usr/local/src/kubespray-${KUBESPRAY_VERSION}.tar.gz \
    https://codeload.github.com/kubernetes-sigs/kubespray/tar.gz/refs/tags/v${KUBESPRAY_VERSION}


# Extract
$ tar -xf /usr/local/src/kubespray-${KUBESPRAY_VERSION}.tar.gz -C /opt/

# Create a symlink
$ ln -sf /opt/kubespray-${KUBESPRAY_VERSION} /opt/kubespray

Install the Kubespray dependencies

Reference: kubespray/docs/ansible/ansible.md at master · kubernetes-sigs/kubespray

# Inspect the Kubespray dependencies
# Several requirements files are provided; pick the one that suits your environment
$ cd /opt/kubespray && cat requirements.txt
ansible==9.13.0
# Needed for community.crypto module
cryptography==45.0.2
# Needed for jinja2 json_query templating
jmespath==1.0.1
# Needed for ansible.utils.ipaddr
netaddr==1.3.0



# Create the virtual environment; it must be created with Python 3.12
$ cd /opt/ && VENVDIR=kubespray-venv && python3.12 -m venv $VENVDIR

# Activate the virtual environment
$ cd /opt/ && source $VENVDIR/bin/activate

# Upgrade pip inside the virtual environment
$ /opt/kubespray-venv/bin/python3.12 -m pip install --upgrade pip



# ---------- Install the dependencies directly -------------
# Not recommended; environment issues often prevent some dependencies from installing
# $ pip3 install -r requirements.txt


# ---------- Download first, then install (recommended) -----------------
# Create the directory
$ mkdir -p /opt/kubespray-req

# Download (using the Aliyun pip mirror for speed)
$ pip3 download -i  https://mirrors.aliyun.com/pypi/simple/ \
    -d /opt/kubespray-req \
    -r /opt/kubespray/requirements.txt 
    


# Install the dependencies
$ pip3 install -U --no-index \
    --find-links=/opt/kubespray-req  \
    -r /opt/kubespray/requirements.txt 
    

Method 3: deploy from a container (untested)

Base variables

cd /opt/

KUBESPRAY_VERSION=2.28.0
KUBESPRAY_HOME=/opt/kubespray-${KUBESPRAY_VERSION}

mkdir -p ${KUBESPRAY_HOME}/inventory/kubespray.local

# Persist the environment variables
cat > /etc/profile.d/kubespray.sh << EOF
export KUBESPRAY_VERSION=2.28.0
export KUBESPRAY_HOME=/opt/kubespray-${KUBESPRAY_VERSION}
EOF


source /etc/profile


# Generate an SSH key if one has not been generated yet
ssh-keygen

Pull and run


$ docker pull quay.io/kubespray/kubespray:v${KUBESPRAY_VERSION}


$ docker run -it  \
    --mount type=bind,source="${HOME}"/.ssh/id_rsa,dst=/root/.ssh/id_rsa \
    --mount type=bind,source=${KUBESPRAY_HOME}/inventory/kubespray.local,dst=/kubespray/inventory/kubespray.local \
    quay.io/kubespray/kubespray:v${KUBESPRAY_VERSION} bash


# Edit /kubespray/inventory/kubespray.local/inventory.ini


# Run the installation
> ansible-playbook -i /kubespray/inventory/kubespray.local/inventory.ini --private-key /root/.ssh/id_rsa cluster.yml



K8s host preparation

These settings are required on all hosts; Ansible is used where appropriate to run the commands in bulk.

Note: all of the following is done inside the kubespray-venv virtual environment, from the /opt/k8s directory.

Create the Ansible inventory

# Activate the virtual environment
$ VENVDIR=kubespray-venv && cd /opt/ && source $VENVDIR/bin/activate

# Create the Ansible project directory
$ mkdir -p /opt/k8s && cd /opt/k8s

# Generate a default configuration file, then change the inventory location
$ ansible-config init --disabled > ansible.cfg

# Point the inventory at the current directory: inventory=./ansible-hosts
$ sed -ri 's@;(inventory)=.*@\1=./ansible-hosts@g' ansible.cfg


# Create the Ansible inventory
$ cat > ./ansible-hosts << EOF
[k8s_hosts]
192.168.111.191
192.168.111.192
192.168.111.193
192.168.111.194


[local]
192.168.111.190
EOF
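A quick sanity check that ansible.cfg points at the new inventory:

# Should list all five addresses from ./ansible-hosts
$ ansible all --list-hosts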

Passwordless SSH login

Configure passwordless SSH from the admin host to the other k8s hosts so that Ansible can run commands on them.

Adjust the SSH client configuration

# Change the global host key checking setting on the admin host
$ sed -ri 's@.*(StrictHostKeyChecking).*@\1 no@g' /etc/ssh/ssh_config

Create the host list and the distribution script

# Create the host list
$ cat > k8s-hosts.list << EOF
192.168.111.190
192.168.111.191
192.168.111.192
192.168.111.193
192.168.111.194
EOF


# Create the key distribution script
$ cat > ssh_key_auth.sh << EOF
#!/bin/bash

rpm -q sshpass &> /dev/null || dnf -y install sshpass  

[ -f /root/.ssh/id_rsa ] || ssh-keygen -f /root/.ssh/id_rsa  -P ''

# Adjust the password as needed
export SSHPASS=520123

while read IP;do
   sshpass -e /usr/bin/ssh-copy-id -o StrictHostKeyChecking=no \$IP
done < k8s-hosts.list
EOF

Distribute the keys

$ bash ssh_key_auth.sh

Verify

Once passwordless login is configured, verify that Ansible can reach every host.

$ ansible all -m ping
# Output
192.168.111.191 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
......

......

Time configuration

Time zone

ansible all -m shell -a 'timedatectl set-timezone Asia/Shanghai'

Time synchronization

Copy the current chrony configuration

cp /etc/chrony.conf  ./chrony.conf

Modify the chrony configuration

#### Settings ####
NTP_SERVERS=( \
    "0.cn.pool.ntp.org" \
    "ntp.tuna.tsinghua.edu.cn" \
    "ntp.tencent.com" \
    "ntp.aliyun.com" \
    "ntp.ntsc.ac.cn" \
)

# Remove all existing pool lines
sed -i '/^pool /d' ./chrony.conf


# Insert the new NTP servers at line 3
for NTP_SERVER in "${NTP_SERVERS[@]}"; do
    sed -i "3i\pool ${NTP_SERVER} iburst" ./chrony.conf
done

# Restart chronyd
systemctl restart chronyd

# Check the synchronization status
chronyc sources

Distribute the configuration

# Distribute the configuration
$ ansible all -m copy -a 'src=./chrony.conf  dest=/etc/chrony.conf'


# Restart chronyd
$ ansible all -m shell -a 'systemctl restart chronyd'

# Check the synchronization status
chronyc sources

Disable SELinux

Disable SELinux on all hosts.

# Disable SELinux
$ ansible all -m shell -a "setenforce 0"

$ ansible all -m shell -a "sed -ri 's@(^SELINUX)=.*@\1=disabled@g' /etc/selinux/config && sed -ri 's@(^SELINUX)=.*@\1=disabled@g' /etc/sysconfig/selinux"

Load the required kernel modules

Load the modules on all hosts. They must be reloaded after a reboot, so they are persisted through /etc/modules-load.d/.

Load some base modules

# Load the overlay module
ansible all -m shell -a 'modprobe -- overlay'

ansible all -m shell -a "echo overlay > /etc/modules-load.d/overlay.conf"


# Load the br_netfilter module
ansible all -m shell -a 'modprobe -- br_netfilter'

ansible all -m shell -a "echo br_netfilter > /etc/modules-load.d/br_netfilter.conf"


# Load the ip_conntrack module
ansible all -m shell -a 'modprobe -- ip_conntrack'

ansible all -m shell -a "echo ip_conntrack > /etc/modules-load.d/ip_conntrack.conf"


# Load the nf_conntrack module
ansible all -m shell -a 'modprobe -- nf_conntrack'

ansible all -m shell -a "echo nf_conntrack > /etc/modules-load.d/nf_conntrack.conf"

Load the ipvs-related modules

ansible all -m shell -a 'modprobe -- ip_vs'
ansible all -m shell -a 'modprobe -- ip_vs_rr'
ansible all -m shell -a 'modprobe -- ip_vs_wrr'
ansible all -m shell -a 'modprobe -- ip_vs_sh'
ansible all -m shell -a 'modprobe -- ip_vs_wlc'
ansible all -m shell -a 'modprobe -- ip_vs_lc'



$ cat > ./ipvs.conf << EOF
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
ip_vs_wlc
ip_vs_lc
EOF

# Distribute the file to the other hosts
$ ansible all -m copy -a 'src=./ipvs.conf dest=/etc/modules-load.d/'

Check the loaded modules

$ lsmod | grep -e 'ip_vs' -e 'nf_conntrack' -e 'ip_conntrack' -e 'br_netfilter'

System parameter tuning

Tune the system parameters on all hosts.

Swap configuration

$ ansible all -m shell -a 'swapoff -a'


$ ansible all -m shell -a 'sed -ri "/swap/ s@^@#@g" /etc/fstab'

Resource limits

Create the resource limit configuration (PAM)

$ cat > ./pam-limits-sys.conf <<EOF
*            -    core            unlimited
*            -    nproc           unlimited
*            -    nofile          1048576
*            -    memlock         unlimited
*            -    msgqueue        unlimited
*            -    stack           unlimited	
EOF

Create the resource limit configuration (systemd)

$ cat > ./systemd-user-sys.conf << EOF
[Manager]
DefaultLimitCORE=infinity
DefaultLimitNPROC=infinity
DefaultLimitNOFILE=1048576
DefaultLimitMEMLOCK=infinity
DefaultLimitMSGQUEUE=infinity
EOF


$ cp /etc/systemd/system.conf ./system.conf
sed -ri 's@^#* *(DefaultLimitCORE).*@\1=infinity@' ./system.conf
sed -ri 's@^#* *(DefaultLimitNPROC).*@\1=infinity@' ./system.conf
sed -ri 's@^#* *(DefaultLimitNOFILE).*@\1=1048576@' ./system.conf
sed -ri 's@^#* *(DefaultLimitMEMLOCK).*@\1=infinity@' ./system.conf
sed -ri 's@^#* *(DefaultLimitMSGQUEUE).*@\1=infinity@' ./system.conf

Distribute the configuration

# Distribute the configuration
ansible all -m copy -a 'src=./pam-limits-sys.conf dest=/etc/security/limits.d/'

ansible all -m copy -a 'src=./systemd-user-sys.conf dest=/etc/systemd/user.conf.d/'

ansible all -m copy -a 'src=./system.conf dest=/etc/systemd/'


# Apply the configuration
$ ansible all -m shell -a 'systemctl daemon-reexec'

Kernel parameters

Note: kernel parameter documentation: kernel.org/doc/Documentation/sysctl/README

Note: for Pod-level kernel parameters, see "Using sysctls in a Kubernetes Cluster" in the Kubernetes documentation.

Create the kernel parameter configuration

$ cat > ./90-sysctl.conf  << EOF
######  Fast TCP connection release  ######
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 3



######  Settings for when TIME_WAIT sockets pile up  ######
net.ipv4.tcp_tw_reuse = 1
#net.ipv4.tcp_tw_recycle = 0
# Cap the number of TIME_WAIT sockets, default 8192
net.ipv4.tcp_max_tw_buckets=5000


######  Port settings  ######
# Range of local ports used for outgoing connections; adjust as needed, default 32768 60999
net.ipv4.ip_local_port_range = 32768	65530


######  SYN flood protection  ######
net.ipv4.tcp_syncookies=1
net.ipv4.tcp_syn_retries=3
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_max_syn_backlog=8192
# Reduce the maximum number of TCP retransmissions to 5 (about 6 seconds total) to detect node failures sooner
# net.ipv4.tcp_retries2=5

######  Other TCP settings  ######
# If the accept queue overflows because the backlog cannot be processed, reset the new connection
net.ipv4.tcp_abort_on_overflow=1


#######  nf_conntrack settings (NAT used by k8s and docker firewalling)  #######
net.netfilter.nf_conntrack_max = 262144
net.nf_conntrack_max = 262144

net.netfilter.nf_conntrack_tcp_timeout_established = 86400
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 3600
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120


####### Socket settings ######
net.core.somaxconn = 32768
net.core.netdev_max_backlog = 32768




######  Other settings  #######
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.default.accept_source_route=0
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
#
net.ipv4.conf.all.forwarding=1
net.ipv6.conf.all.forwarding=1
# Reserved ports (NodePort range)
net.ipv4.ip_local_reserved_ports=30000-32767


######  Memory settings #######
vm.swappiness = 0
vm.max_map_count = 655360
# vm.min_free_kbytes = 1048576
vm.overcommit_memory = 1
vm.panic_on_oom=0

###### File settings #######
fs.file-max = 6573688
fs.nr_open = 1048576
fs.aio-max-nr = 1048576


#######  Kubernetes settings ######
# The br_netfilter module must be loaded first
# Bridged (layer 2) traffic is then also filtered by the arptables/ip6tables/iptables FORWARD rules
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1

###### Process settings #######
# Maximum process id; the default is 32768 and the upper bound varies by distribution
kernel.pid_max = 132768
kernel.threads-max = 123342

###### Other kernel settings #######
kernel.keys.root_maxbytes=25000000
kernel.keys.root_maxkeys=1000000
kernel.panic=10
kernel.panic_on_oops=1

EOF



# kube-bench related parameters are configured automatically by the Ansible installation
# See: {KUBESPRAY_HOME}/roles/kubernetes/preinstall/tasks/0080-system-configurations.yml

Distribute and apply the configuration

# Distribute the configuration
$ ansible all -m copy -a 'src=./90-sysctl.conf dest=/etc/sysctl.d/'

# Apply the configuration
$ ansible all -m shell -a 'sysctl --system'
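A spot check that the values are live on every host, using a couple of representative keys:

$ ansible all -m shell -a 'sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables'
# Every host should report:
# net.ipv4.ip_forward = 1
# net.bridge.bridge-nf-call-iptables = 1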

Configure DNS resolution

DNS configuration

Note: for the case where a LAN DNS server is available.

nmcli module documentation: community.general.nmcli module – Manage Networking — Ansible Community Documentation

ansible all -m community.general.nmcli -a "type=ethernet conn_name=ens160 dns4=192.168.111.190 state=present" --become


ansible all -m shell -a "nmcli con up ens160" --become

Hosts file configuration

Note: for the case where there is no LAN DNS.

Write the hosts file

# Only 192.168.111.190 needs an entry here; Kubespray adds entries for the other hosts automatically
$ cat > ./hosts << EOF
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.111.190    admin.kubespray.local
EOF

Copy the file to every host

$ ansible k8s_hosts -m copy -a 'src=./hosts dest=/etc/'

Create the DNF repository files

Disable all repositories (except the local-*-repo ones)

# Back up the repository configuration
$ ansible k8s_hosts -m copy -a "src=/etc/yum.repos.d/ dest=/etc/yum.repos.d.bak-`date +'%Y%m%d'` remote_src=yes" -b


# Disable all repositories
$ ansible k8s_hosts -m shell -a "sed -i 's/enabled=1/enabled=0/g' /etc/yum.repos.d/*.repo " --become

Enable the new (LAN) repositories

ansible all -m ansible.builtin.yum_repository \
  -a "name=local-os-repo
      description=local-os-repo
      baseurl=http://admin.kubespray.local:8000/repos/system/almalinux/\$releasever/\$basearch/
      enabled=1
      gpgcheck=0" \
  --become

ansible all -m ansible.builtin.yum_repository \
  -a "name=local-docker-repo
      description=local-docker-repo
      baseurl=http://admin.kubespray.local:8000/repos/apps/almalinux/docker-ce/\$releasever/\$basearch/
      enabled=1
      gpgcheck=0" \
  --become

Build the cache

$ ansible k8s_hosts -m shell -a "dnf makecache" --become

Add certificate trust

Harbor uses a self-signed certificate, so the certificate must be trusted on every k8s host.

# Create the certificate directory
$ ansible k8s_hosts -m shell -a "mkdir -p /etc/containerd/certs.d/admin.kubespray.local/" --become


# Copy the certificate
# The CA certificate is used directly here
$ ansible k8s_hosts -m copy -a 'src=/etc/pki/tls/ca.crt dest=/etc/containerd/certs.d/admin.kubespray.local/'

Firewall configuration

Node traffic

Note: if all hosts in the project can already reach each other through the firewall, this step can be skipped.

Add a zone whose default target is ACCEPT, and add the cluster hosts as sources of that zone.

Create the playbook

$ cat << EOF > ./firewalld-nodes-playbook.yaml
---
- name: Configure firewall for kubespray-local zone
  hosts: all
  become: yes  # privilege escalation is needed because changing the firewall normally requires root
  tasks:
    - name: Create kubespray-local zone
      ansible.posix.firewalld:
        state: present
        zone: kubespray-local
        permanent: true
        immediate: false

    - name: Set default policy for kubespray-local to ACCEPT
      ansible.posix.firewalld:
        zone: kubespray-local
        target: ACCEPT
        state: present
        permanent: true
        immediate: false

    - name: Add sources to kubespray-local zone
      ansible.posix.firewalld:
        zone: kubespray-local
        source: "{{ item }}"
        state: enabled
        permanent: true
        immediate: false
      loop:
        - "192.168.111.190/32"
        - "192.168.111.191/32"
        - "192.168.111.192/32"
        - "192.168.111.193/32"
        - "192.168.111.194/32"

    - name: Reload firewall to apply changes
      ansible.builtin.command: firewall-cmd --reload

EOF

Apply the playbook

$ ansible-playbook -i ./ansible-hosts ./firewalld-nodes-playbook.yaml

Container traffic

Note: these must match the pod and service CIDR configuration.

Create the playbook

$ cat << EOF > ./firewalld-k8s-internal-playbook.yaml
---
- name: Configure Firewalld For K8S Net CIDR
  hosts: k8s_hosts
  # administrator privileges are required to modify firewall rules
  become: yes
  tasks:
    - name: Add Service CIDR 10.233.0.0/18 to trusted zone
      ansible.posix.firewalld:
        zone: trusted
        source: 10.233.0.0/18
        state: enabled
        permanent: true
        immediate: yes

    - name: Add Pod CIDR 10.233.64.0/18 to trusted zone
      ansible.posix.firewalld:
        zone: trusted
        source: 10.233.64.0/18
        state: enabled
        permanent: true
        immediate: yes
        
    - name: Reload firewall to apply changes
      ansible.builtin.command: firewall-cmd --reload
EOF

Apply the playbook

$ ansible-playbook -i ./ansible-hosts ./firewalld-k8s-internal-playbook.yaml

Configuration

Enter the virtual environment

$ VENVDIR=kubespray-venv && cd /opt/ && source $VENVDIR/bin/activate

Copy the sample configuration

$ cp -a /opt/kubespray/inventory/sample /opt/kubespray/inventory/kubespray.local \
  && cd /opt/kubespray/inventory/kubespray.local

Configure the k8s inventory (new format)

Note: for the container-based installation, run this step inside the container.

Reference: Inventory

Since Kubespray 2.27.0 the inventory-generation script is no longer shipped; edit the inventory by following the documentation or /opt/kubespray/inventory/sample/inventory.ini.

$ vim /opt/kubespray/inventory/kubespray.local/inventory.ini

kube-cp-01.kubespray.local ansible_host=192.168.111.191 etcd_member_name=etcd-01.kubespray.local # ip=private IP
kube-node-01.kubespray.local ansible_host=192.168.111.192  # ip=private IP
kube-node-02.kubespray.local ansible_host=192.168.111.193  # ip=private IP
kube-node-03.kubespray.local ansible_host=192.168.111.194  # ip=private IP


[kube_control_plane]
kube-cp-01.kubespray.local

[etcd:children]
kube_control_plane

[kube_node]
kube-node-01.kubespray.local
kube-node-02.kubespray.local
kube-node-03.kubespray.local

Alternative inventory format (YAML)

# Adjust which hosts act as control-plane nodes, which hold etcd, and so on, as needed
$ cat /opt/kubespray/inventory/kubespray.local/hosts.yaml
all:
  hosts:
    kube-cp-01.kubespray.local:
      ansible_host: 192.168.111.191
      ip: 192.168.111.191
      access_ip: 192.168.111.191
    kube-node-01.kubespray.local:
      ansible_host: 192.168.111.192
      ip: 192.168.111.192
      access_ip: 192.168.111.192
    kube-node-02.kubespray.local:
      ansible_host: 192.168.111.193
      ip: 192.168.111.193
      access_ip: 192.168.111.193
    kube-node-03.kubespray.local:
      ansible_host: 192.168.111.194
      ip: 192.168.111.194
      access_ip: 192.168.111.194
  children:
    kube_control_plane:
      hosts:
        kube-cp-01.kubespray.local:
    kube_node:
      hosts:
        kube-node-01.kubespray.local:
        kube-node-02.kubespray.local:
        kube-node-03.kubespray.local:
    etcd:
      hosts:
        kube-cp-01.kubespray.local:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
    calico_rr:
      hosts: {}

Customization

containerd configuration

Note: the current containerd (v1.7+) registry host configuration does not support authentication credentials.

Upstream progress - 1: https://github.com/containerd/containerd/issues/8228

Upstream progress - 2: [WIP] Introduce credential plugin by lengrongfu · Pull Request #9872 · containerd/containerd (github.com)

Upstream progress - 3: feat: set credentials in host configurations by knight42 · Pull Request #10612 · containerd/containerd

Upstream progress - 4: Support credential domain aliases in host configuration · Issue #10540 · containerd/containerd

Generate the base64 auth header

$ echo -n "admin:Harbor12345" | base64
YWRtaW46SGFyYm9yMTIzNDU=


# $ echo -n "18487357220:0320Yxc520.." | base64
# MTg0ODczNTcyMjA6MDMyMFl4YzUyMC4u

Modify the configuration template

Reference: Private Registry auth config when using hosts.toml · containerd/containerd · Discussion #6468

# Back up
cp /opt/kubespray/roles/container-engine/containerd/templates/hosts.toml.j2 /opt/kubespray/roles/container-engine/containerd/templates/hosts.toml.j2.bak-`date +"%Y%m%d"`

# Add the auth configuration
$ vim /opt/kubespray/roles/container-engine/containerd/templates/hosts.toml.j2

server = "{{ item.server | default("https://" + item.prefix) }}"
{% for mirror in item.mirrors %}
[host."{{ mirror.host }}"]
  capabilities = ["{{ ([ mirror.capabilities ] | flatten ) | join('","') }}"]
  skip_verify = {{ mirror.skip_verify | default('false') | string | lower }}
  override_path = {{ mirror.override_path | default('false') | string | lower }}
{% if mirror.ca is defined %}
  ca = ["{{ ([ mirror.ca ] | flatten ) | join('","') }}"]
{% endif %}
{% if mirror.client is defined %}
  client = [{% for pair in mirror.client %}["{{ pair[0] }}", "{{ pair[1] }}"]{% if not loop.last %},{% endif %}{% endfor %}]
{% endif %}
{% if mirror.auth is defined %}
  [host."{{ mirror.host }}".header]
    authorization = "Basic {{ mirror.auth }}"
{% endif %}
{% endfor %}

Modify the configuration

$ vim /opt/kubespray/inventory/kubespray.local/group_vars/all/containerd.yml
# Change the following
containerd_storage_dir: "/data/containerd"
......

# Registries defined within containerd.
containerd_registries_mirrors:
  - prefix: docker.io
    mirrors:
      - host: https://docker.m.daocloud.io
        capabilities: ["pull", "resolve"]
      - host: https://dockerproxy.cn
        capabilities: ["pull", "resolve"]
      - host: https://dockerpull.com
        capabilities: ["pull", "resolve"]
      - host: https://docker.aiden-work.tech
        capabilities: ["pull", "resolve"]
  - prefix: admin.kubespray.local
    mirrors:
      - host: https://admin.kubespray.local
        capabilities: ["pull", "resolve", "push"]
        skip_verify: false
        ca: "/etc/containerd/certs.d/admin.kubespray.local/ca.crt"
        auth: YWRtaW46SGFyYm9yMTIzNDU=

#
#  Passing basic auth through a header does not work with docker.io or Aliyun ACR
#  - prefix: registry.cn-hangzhou.aliyuncs.com
#    mirrors:
#      - host: https://registry.cn-hangzhou.aliyuncs.com
#        capabilities: ["pull", "resolve", "push"]
#        skip_verify: false
#        auth: MTg0ODczNTcyMjA6MDMyMFl4YzUyMC4u

# Note: containerd_registry_auth cannot be used for now
# containerd_registry_auth:
#   - registry: 10.0.0.2:5000
#     username: user
#     password: pass


......

Example of the generated configuration

server = "https://admin.kubespray.local"
[host."https://admin.kubespray.local"]
  skip_verify = false
  ca = "/etc/containerd/certs.d/admin.kubespray.local/ca.crt"
  [host."https://admin.kubespray.local".header]
  authorization = "Basic YWRtaW46SGFyYm9yMTIzNDU="

Kubernetes cluster configuration

Note: the Kubernetes versions supported by the current Kubespray release are listed in kubespray/roles/kubespray-defaults/defaults/main/checksums.yml

Note: deploying NodeLocal DNSCache through Kubespray is not recommended

Install it manually instead; see "Using NodeLocal DNSCache in Kubernetes Clusters" in the Kubernetes documentation

$ vim /opt/kubespray/inventory/kubespray.local/group_vars/k8s_cluster/k8s-cluster.yml



# Whether to deploy nodelocaldns; small projects do not need it, larger ones can enable it as required
enable_nodelocaldns: false

# Network plugin, calico by default; choices include cilium, calico, kube-ovn, weave, or flannel
kube_network_plugin: calico


# Service CIDR
kube_service_addresses: 10.233.0.0/18  

# Pod CIDR
kube_pods_subnet: 10.233.64.0/18

# Size of the pod subnet assigned to each node
# often used to cap the number of pods per node
kube_network_node_prefix: 24

# Direct limit on the number of pods per node
kubelet_max_pods: 110


# kube-proxy mode, ipvs by default; can be switched to iptables
kube_proxy_mode: ipvs  


# Kubernetes cluster name
cluster_name: cluster.local

# Container runtime
container_manager: containerd  

# Event retention time
## Amount of time to retain events. (default 1h0m0s)
event_ttl_duration: "1h0m0s"


#####  Resource reservation and pod eviction (adjust to your environment)  #######
#####  Example: https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#example-scenario

##  Kubernetes resource reservation
##  Reserve resources for the Kubernetes daemons (kube-apiserver, kube-scheduler, kube-controller-manager, etc.)
# kube_reserved: false
## Uncomment to override default values
## The following two items need to be set when kube_reserved is true
# kube_reserved_cgroups_for_service_slice: kube.slice
# kube_reserved_cgroups: "/{{ kube_reserved_cgroups_for_service_slice }}"
# kube_memory_reserved: 256Mi
# kube_cpu_reserved: 100m
# kube_ephemeral_storage_reserved: 2Gi
# kube_pid_reserved: "1000"
##  Resources reserved on master (control-plane) nodes
# Reservation for master hosts
# kube_master_memory_reserved: 512Mi
# kube_master_cpu_reserved: 200m
# kube_master_ephemeral_storage_reserved: 2Gi
# kube_master_pid_reserved: "1000"




# System resource reservation
# system_reserved: true
## Uncomment to override default values
## The following two items need to be set when system_reserved is true
# system_reserved_cgroups_for_service_slice: system.slice
# system_reserved_cgroups: "/{{ system_reserved_cgroups_for_service_slice }}"
# system_memory_reserved: 512Mi
# system_cpu_reserved: 500m
# system_ephemeral_storage_reserved: 2Gi

##  Resources reserved on master (control-plane) nodes
## Reservation for master hosts
# system_master_memory_reserved: 256Mi
# system_master_cpu_reserved: 250m
# system_master_ephemeral_storage_reserved: 2Gi

system_reserved: true
system_memory_reserved: 512Mi
system_cpu_reserved: 500m
system_ephemeral_storage_reserved: 2Gi

# When pod eviction is triggered; see https://kubernetes.io/zh-cn/docs/tasks/administer-cluster/kubelet-config-file/
## Eviction Thresholds to avoid system OOMs
# https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#eviction-thresholds
# eviction_hard: {}
# eviction_hard_control_plane: {}
eviction_hard: {memory.available: "500Mi", nodefs.available: "15%", nodefs.inodesFree: "15%", imagefs.available: "15%"}

etcd configuration

$ vim /opt/kubespray/inventory/kubespray.local/group_vars/all/etcd.yml

# etcd data directory
etcd_data_dir: /data/etcd

# etcd deployment type; host by default (installed directly on the host), can be set to docker or containerd
etcd_deployment_type: host 

Add-on configuration

Commonly used add-ons such as the Dashboard, Ingress, and Metrics Server can be enabled here.

$ vim /opt/kubespray/inventory/kubespray.local/group_vars/k8s_cluster/addons.yml
# For example, to install the Kubernetes dashboard, set this to true
# dashboard_enabled: true

Certificate configuration

Automatic certificate renewal

$ vim /opt/kubespray/inventory/kubespray.local/group_vars/k8s_cluster/k8s-cluster.yml

## Automatically renew K8S control plane certificates on first Monday of each month
auto_renew_certificates: true

Note: this only sets the etcd certificate lifetime; the other certificates are generated by kubeadm and default to one year (ten years for the CA).

$ vim /opt/kubespray/roles/kubespray_defaults/defaults/main/main.yml
certificates_duration: 36500

Calico configuration

$ vim /opt/kubespray/inventory/kubespray.local/group_vars/k8s_cluster/k8s-net-calico.yml
# Set calico network backend: "bird", "vxlan" or "none"
# bird enable BGP routing, required for ipip and no encapsulation modes
# calico_network_backend: vxlan

# IP in IP and VXLAN is mutualy exclusive modes.
# set IP in IP encapsulation mode: "Always", "CrossSubnet", "Never"
# calico_ipip_mode: 'Never'

# set VXLAN encapsulation mode: "Always", "CrossSubnet", "Never"
# calico_vxlan_mode: 'Always'
                             

NodePort range

Adjust as needed.

$ vim /opt/kubespray/roles/kubernetes/control-plane/defaults/main/main.yml
......
# Default 30000-32767
kube_apiserver_node_port_range: "30000-32767"
.......


$ vim /opt/kubespray/roles/kubernetes/node/defaults/main.yml
......
# Default 30000-32767
kube_apiserver_node_port_range: "30000-32767"
.......

CoreDNS (in-cluster) settings

$ vim /opt/kubespray/inventory/kubespray.local/group_vars/all/all.yml

## Upstream dns servers
upstream_dns_servers:
  - 192.168.111.190
  - 114.114.114.114
  - 223.5.5.5
  - 119.29.29.29
  - 8.8.8.8

Allow unsafe sysctls (optional)

$ vim /opt/kubespray/roles/kubernetes/node/defaults/main.yml


######  Affects all kubelets, including control-plane nodes
# Add entries as needed; wildcards are supported
## Support parameters to be passed to kubelet via kubelet-config.yaml
kubelet_config_extra_args:
  allowedUnsafeSysctls:
    - "net.core.*"
    - "net.ipv4.*"

## Parameters to be passed to kubelet via kubelet-config.yaml when cgroupfs is used as cgroup driver
kubelet_config_extra_args_cgroupfs:
  systemCgroups: /system.slice
  cgroupRoot: /

######  Only affects kubelets on worker nodes, not control-plane nodes
## Support parameters to be passed to kubelet via kubelet-config.yaml only on nodes, not masters
kubelet_node_config_extra_args: {}


######  These two settings pass flags directly to kubelet and are now deprecated
## Support custom flags to be passed to kubelet
# kubelet_custom_flags: []

## Support custom flags to be passed to kubelet only on nodes, not masters
# kubelet_node_custom_flags: []


Download the deployment resources

Reference: kubespray/docs/offline-environment.md

The kubespray/contrib/offline/manage-offline-container-images.sh script can download images from the internet, build a local Registry, and push the images into it. A plain Registry is awkward to manage, though, so Harbor is used as the local image registry here.

List the required resources

Offline environment check

# Change the component versions to download; if unchanged, the highest supported versions are used
#$ vim /opt/kubespray/roles/kubespray_defaults/defaults/main/main.yml
# kube_version: 1.32.5

# Run the offline environment check script
# Activate the virtual environment
$ VENVDIR=kubespray-venv && cd /opt/ && source $VENVDIR/bin/activate

$ bash /opt/kubespray/contrib/offline/generate_list.sh -i /opt/kubespray/inventory/kubespray.local/inventory.ini

File download URLs

$ cat /opt/kubespray/contrib/offline/temp/files.list
https://dl.k8s.io/release/v1.32.5/bin/linux/amd64/kubelet
https://dl.k8s.io/release/v1.32.5/bin/linux/amd64/kubectl
https://dl.k8s.io/release/v1.32.5/bin/linux/amd64/kubeadm
......

Image download URLs

$ cat /opt/kubespray/contrib/offline/temp/images.list
docker.io/mirantis/k8s-netchecker-server:v1.2.2
docker.io/mirantis/k8s-netchecker-agent:v1.2.2
quay.io/coreos/etcd:v3.5.16
quay.io/cilium/cilium:v1.17.3
......

Prepare the image resources

Note: images under registry.k8s.io require internet access.

Remove unneeded images

  • registry.k8s.io/sig-storage/local-volume-provisioner:v2.5.0

sed -ri '/local-volume-provisioner/d' /opt/kubespray/contrib/offline/temp/images.list

Download the images

$ cat /opt/kubespray/contrib/offline/temp/images.list | xargs -n 1 docker pull 

Create the Harbor project

Note: the Harbor project is named k8s-cluster; its access level can be public.

Creating it through the web UI is straightforward and not shown here; a scripted alternative is sketched below.
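For reference, the project can also be created from the command line through the Harbor REST API (a sketch assuming the Harbor v2 /api/v2.0/projects endpoint and the admin credentials configured earlier):

# Create a public project named k8s-cluster via the Harbor API
$ curl -u admin:Harbor12345 \
    -H "Content-Type: application/json" \
    -X POST "https://admin.kubespray.local/api/v2.0/projects" \
    -d '{"project_name": "k8s-cluster", "metadata": {"public": "true"}}'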

Re-tag the images

# Define the Harbor address variable
$ HARBOR_DOMAIN=admin.kubespray.local/k8s-cluster



# This pipeline rewrites each image reference to point at the Harbor address
# and feeds the pairs to docker tag to re-tag the images in bulk
$ cat /opt/kubespray/contrib/offline/temp/images.list   \
  | awk -F'/' -v OFS='/' '{$1=$0"  '$HARBOR_DOMAIN'";print}' | xargs -n 2 docker tag
  
  
# Check that the tags were updated
$ docker images | grep admin.kubespray.local

Push to Harbor

$ docker images | grep $HARBOR_DOMAIN \
    | awk -F' ' -v OFS=':' '{print $1,$2}' \
    | xargs -i docker push {}

Download the file resources

Extract the paths and create the local directories (script)

$ cd /opt/k8s/ && vim /opt/k8s/create_dir.sh

#!/bin/bash

# The list of files to download
FILES="/opt/kubespray/contrib/offline/temp/files.list"

# Base directory
BASE_DIR="/data/files"

# A new file holding each URL and its corresponding local directory
OUTPUT_FILE="./url_and_paths.list"
rm -f ${OUTPUT_FILE}

# Read and process each line of the file
while read -r line; do
    # Strip http(s):// with awk
    without_http=$(echo "$line" | awk -F'//' '{print $2}')

    # Strip the trailing file name with dirname
    path_only=$(dirname "$without_http")

    # Build the directory path
    file_path="$BASE_DIR/$path_only/"

    # Create the directory
    mkdir -p "$file_path"

    # Append the local directory to the URL and save it to the new file
    echo "$line $file_path" >> "$OUTPUT_FILE"

    # Report the created directory
    echo "Created directory: $file_path"
done < "$FILES"

Run the script and inspect the generated path file

$ chmod 700 /opt/k8s/create_dir.sh && bash /opt/k8s/create_dir.sh

$ cat /opt/k8s/url_and_paths.list
......
https://dl.k8s.io/release/v1.32.5/bin/linux/amd64/kubelet /data/files/dl.k8s.io/release/v1.32.5/bin/linux/amd64/
https://dl.k8s.io/release/v1.32.5/bin/linux/amd64/kubectl /data/files/dl.k8s.io/release/v1.32.5/bin/linux/amd64/
https://dl.k8s.io/release/v1.32.5/bin/linux/amd64/kubeadm /data/files/dl.k8s.io/release/v1.32.5/bin/linux/amd64/
......

Inspect

$ tree /data/files/
/data/files/
├── dl.k8s.io
│   └── release
│       └── v1.31.4
│           └── bin
│               └── linux
│                   └── amd64
├── get.helm.sh
├── github.com
│   ├── cilium
│   │   └── cilium-cli
│   │       └── releases
│   │           └── download
│   │               └── v0.16.0
......

Download the components into the generated paths

# Internet proxy: https_proxy="http://192.168.111.1:10811"
$ cat /opt/k8s/url_and_paths.list | awk -F' ' '{ cmd="https_proxy=http://192.168.111.1:10811 wget --no-check-certificate " $1 " -P " $2; system(cmd) }'

The complete directory tree

$ tree /data/files/

Private-network deployment configuration

Change the file download address

# Set files_repo: "http://admin.kubespray.local:8000/files"
$ sed -ri 's@^#* *(files_repo:).*@\1 "http://admin.kubespray.local:8000/files"@' /opt/kubespray/inventory/kubespray.local/group_vars/all/offline.yml

Change the private registry address

Note: the Harbor project is named k8s-cluster.

# Set registry_host: "admin.kubespray.local/k8s-cluster"
$ sed -ri 's@^#* *(registry_host:).*@\1 "admin.kubespray.local/k8s-cluster"@' /opt/kubespray/inventory/kubespray.local/group_vars/all/offline.yml

Change the DNF repository

# Set yum_repo: "http://admin.kubespray.local:8000/repos/apps/almalinux/"
$ sed -ri 's@^#* *(yum_repo:).*@\1 "http://admin.kubespray.local:8000/repos/apps/almalinux/"@' /opt/kubespray/inventory/kubespray.local/group_vars/all/offline.yml

Select the resources to download from the private network

# Uncomment the registry_host lines so containers are pulled from the private registry
$ sed -ri 's@^#* +(.*registry_host.*)@\1@g' /opt/kubespray/inventory/kubespray.local/group_vars/all/offline.yml


# Uncomment the files_repo lines so files are downloaded from the private network
$ sed -ri 's@^#* +(.*files_repo.*)@\1@g' /opt/kubespray/inventory/kubespray.local/group_vars/all/offline.yml


Deploy Kubernetes

Deploy the cluster

# Activate the virtual environment
$ VENVDIR=kubespray-venv && cd /opt/ && source $VENVDIR/bin/activate && cd /opt/kubespray/

# Enable verbose logging
$ vim /opt/kubespray/inventory/kubespray.local/group_vars/all/all.yml
unsafe_show_logs: true

# Deploy
$ ansible-playbook -i /opt/kubespray/inventory/kubespray.local/inventory.ini  \
  --become --become-user=root -b /opt/kubespray/cluster.yml

Check the status

Run on the control-plane node

$ kubectl get nodes
NAME                           STATUS   ROLES           AGE     VERSION
kube-cp-01.kubespray.local     Ready    control-plane   8m49s   v1.31.4
kube-node-01.kubespray.local   Ready    <none>          7m22s   v1.31.4
kube-node-02.kubespray.local   Ready    <none>          7m21s   v1.31.4
kube-node-03.kubespray.local   Ready    <none>          7m21s   v1.31.4

Remove a node

# Activate the virtual environment
$ VENVDIR=kubespray-venv && cd /opt/ && source $VENVDIR/bin/activate

$ cd /opt/kubespray/

# Run remove-node.yml, limited to node=kube-node-02.kubespray.local
$ ansible-playbook \
  -i /opt/kubespray/inventory/kubespray.local/hosts.yaml  \
  -b /opt/kubespray/remove-node.yml \
  -e "node=kube-node-02.kubespray.local" --limit=kube-node-02.kubespray.local


# Remove the corresponding node from the inventory
$ vim /opt/kubespray/inventory/kubespray.local/hosts.yaml


# Check the nodes
$ kubectl get nodes

Add a node

Reference: kubespray/nodes.md at master · kubernetes-sigs/kubespray (github.com)

# Edit the inventory file
$ vim /opt/kubespray/inventory/kubespray.local/hosts.yaml

# Run, limited to kube-node-02.kubespray.local
$ ansible-playbook \
  -i /opt/kubespray/inventory/kubespray.local/hosts.yaml \
  /opt/kubespray/cluster.yml \
  -b --limit=kube-node-02.kubespray.local

# Note: worker nodes can be added with scale.yml; control-plane nodes require cluster.yml


# Check the nodes
$ kubectl get nodes

Testing

Pull an image from a private registry

Create the credentials

# Option 1: create the secret directly from the command line
$ kubectl create secret docker-registry secret-ali-acr  \
    --docker-email=sky.nemo@outlook.com  \
    --docker-username=18487357220   \
    --docker-password=20141040Ezra..   \
    --docker-server=registry.cn-hangzhou.aliyuncs.com

# Option 2: create the secret from a Docker auth file
$ kubectl create secret generic secret-ali-acr-2 \
    --from-file=.dockerconfigjson=/root/.docker/config.json \
    --type=kubernetes.io/dockerconfigjson

Create the test YAML

mkdir -p demo

$ cat <<  EOF > ./demo/stress-ng.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress-ng
  labels:
    app.kubernetes.io/name: stress-ng
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: stress-ng
  template:
    metadata:
      labels:
        app.kubernetes.io/name: stress-ng
    spec:
      # Application container
      containers:
      - name: stress-ng
        image: registry.cn-hangzhou.aliyuncs.com/kmust/stress-ng-alpine:0.14.00-r0
        # Resource limits
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        args: ["--cpu", "1", "--vm", "1", "--vm-bytes", "50M"]
      imagePullSecrets:
        - name: secret-ali-acr

EOF

Apply

kubectl apply -f ./demo/stress-ng.yaml
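A quick check that the image was pulled from the private registry and the pod is running (kubectl top additionally requires the Metrics Server installed later in this article):

$ kubectl get pods -l app.kubernetes.io/name=stress-ng -o wide

# Once Metrics Server is available, the load generated by stress-ng shows up here
$ kubectl top pod -l app.kubernetes.io/name=stress-ng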

Test DNS names and networking

mkdir -p demo

$ cat << EOF  > ./demo/nginx.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
      #securityContext:
      #  sysctls:
      #    - name: net.ipv4.tcp_fin_timeout
      #      value: "30"
      #    - name: net.ipv4.tcp_keepalive_time
      #      value: "1200"
      #    - name: net.ipv4.tcp_keepalive_intvl
      #      value: "30"
      #    - name: net.ipv4.tcp_keepalive_probes
      #      value: "3"
      #    - name: net.core.somaxconn
      #      value: "1024"
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: ClusterIP
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80

EOF

Deploy busybox

mkdir -p demo

$ cat << EOF > ./demo/busybox-1.28.4.yaml 
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox-selector
  template:
    metadata:
      labels:
        app: busybox-selector
    spec:
      restartPolicy: Always
      containers:
      - name: busybox
        command:
        - sleep
        - "864000"
        image: busybox:1.28.4
        resources:
          requests:
            cpu: 50m
            memory: "32Mi"
          limits:
            cpu: 1
            memory: "512Mi"

EOF

Apply

kubectl apply -f ./demo/nginx.yaml


kubectl apply -f ./demo/busybox-1.28.4.yaml

Test

$ kubectl exec -it deploy/busybox -- sh
/ # nslookup nginx
Server:    10.233.0.3
Address 1: 10.233.0.3 coredns.kube-system.svc.cluster.local

Name:      nginx
Address 1: 10.233.62.127 nginx.default.svc.cluster.local


/ # wget -O - nginx.default.svc.cluster.local


Other settings

Restore the DNF repository files

Enter the virtual environment used earlier

# Activate the virtual environment
$ VENVDIR=kubespray-venv && cd /opt/ && source $VENVDIR/bin/activate && cd /opt/k8s

Restore all repositories

# Remove the current repository configuration
ansible k8s_hosts -m shell -a "rm -rf /etc/yum.repos.d/" -b

# Restore the original backup
$ ansible k8s_hosts -m copy -a "src=/etc/yum.repos.d.bak-`date +'%Y%m%d'`/ dest=/etc/yum.repos.d remote_src=yes" -b

Build the cache

$ ansible k8s_hosts -m shell -a "dnf makecache" --become

Configure kubectl command completion

Official reference: "kubectl optional configuration and plugins" | Kubernetes

A Kubespray deployment installs this by default.
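If completion ever has to be set up by hand (for example for a new user account), a minimal sketch following the official documentation:

# Enable bash completion for kubectl system-wide
$ dnf -y install bash-completion
$ kubectl completion bash > /etc/bash_completion.d/kubectl
$ source /etc/bash_completion.d/kubectl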

Install Helm (binary)

Download page: Releases · helm/helm (github.com)

Every Helm release provides binaries for the major operating systems, which can be downloaded and installed manually.

# Set the version
$ HELM_VERSION=v3.18.0
# Set the path
$ HELM_HOME=/opt/helm

# Create the directory
$ mkdir -p ${HELM_HOME}


# Download
$ curl -o /usr/local/src/helm-${HELM_VERSION}-linux-amd64.tar.gz  \
    -L https://get.helm.sh/helm-${HELM_VERSION}-linux-amd64.tar.gz

# Or download through a proxy
$ curl -x http://192.168.111.1:10811 \
    -o /usr/local/src/helm-${HELM_VERSION}-linux-amd64.tar.gz  \
    -L https://get.helm.sh/helm-${HELM_VERSION}-linux-amd64.tar.gz

# Extract
$ tar -zxvf /usr/local/src/helm-${HELM_VERSION}-linux-amd64.tar.gz -C ${HELM_HOME}


# Configure the PATH
$ cat << EOF > /etc/profile.d/helm.sh
export HELM_HOME=${HELM_HOME}
export PATH=\${PATH}:\${HELM_HOME}/linux-amd64/
EOF


$ source /etc/profile

Install krew

Reference: Installing · Krew (k8s.io)

Install git

$ dnf -y install git

Download and install

# Set the internet HTTPS_PROXY
export HTTPS_PROXY=http://192.168.111.1:10811

(
  set -x; cd "$(mktemp -d)" &&
  OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
  ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
  KREW="krew-${OS}_${ARCH}" &&
  curl -k -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
  tar zxvf "${KREW}.tar.gz" &&
  ./"${KREW}" install krew
)

Configure the PATH

$ cat > /etc/profile.d/krew.sh << EOF
# krew environment
export PATH="${KREW_ROOT:-$HOME/.krew}/bin:\$PATH"
EOF


$ source /etc/profile

Check

$ kubectl krew

Test installing a plugin

kubectl krew update

kubectl krew install sniff

Install a plugin (download first, then install)

plugin=access-matrix

os=$(uname | tr '[:upper:]' '[:lower:]')
arch=$(uname -m | sed 's/x86_64/amd64/' | sed 's/aarch64/arm64/')
version=$(curl -s https://api.github.com/repos/kubernetes-sigs/krew-index/contents/plugins/$plugin.yaml \
  | grep '"download_url"' | cut -d '"' -f 4)
  

# Download the yaml
$ curl -O https://raw.githubusercontent.com/kubernetes-sigs/krew-index/master/plugins/$plugin.yaml


# Find the download URLs
$ grep 'uri:' $plugin.yaml
    uri: https://github.com/corneliusweig/rakkess/releases/download/v0.5.0/access-matrix-amd64-linux.tar.gz
    uri: https://github.com/corneliusweig/rakkess/releases/download/v0.5.0/access-matrix-amd64-darwin.tar.gz
    uri: https://github.com/corneliusweig/rakkess/releases/download/v0.5.0/access-matrix-arm64-darwin.tar.gz
    uri: https://github.com/corneliusweig/rakkess/releases/download/v0.5.0/access-matrix-amd64-windows.zip

# Download
$ curl -LO  https://github.com/corneliusweig/rakkess/releases/download/v0.5.0/access-matrix-amd64-linux.tar.gz


# Install
$ kubectl krew install --manifest=$plugin.yaml --archive=access-matrix-amd64-linux.tar.gz
Installing plugin: access-matrix
Installed plugin: access-matrix
\
 | Use this plugin:
 | 	kubectl access-matrix
 | Documentation:
 | 	https://github.com/corneliusweig/rakkess
 | Caveats:
 | \
 |  | Usage:
 |  |   kubectl access-matrix
 |  |   kubectl access-matrix for pods
 | /
/

Install Metrics Server

Reference (GitHub): kubernetes-sigs/metrics-server

Reference (Kubernetes): Resource metrics pipeline | Kubernetes

Metrics Server is a scalable, efficient source of container resource metrics for Kubernetes' built-in autoscaling pipelines.

Metrics Server collects resource metrics from the kubelets and exposes them through the Metrics API in the Kubernetes apiserver, for use by the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler. The Metrics API is also accessible via kubectl top [node|pod].

Note: Metrics Server is not intended for monitoring; if you need monitoring-grade metrics, collect them directly from the kubelet's /metrics/resource endpoint.

Deploy the latest version

$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Deploy a specific version (recommended)

Download the YAML file

METRICS_SERVER_VERSION=v0.7.2

# The image can be pulled in advance
export HTTPS_PROXY=http://192.168.111.1:10811

nerdctl -n k8s.io pull registry.k8s.io/metrics-server/metrics-server:${METRICS_SERVER_VERSION}


mkdir -p kube-pkg/metrics-server

cd kube-pkg/metrics-server

# Download
$ curl -LO https://github.com/kubernetes-sigs/metrics-server/releases/download/${METRICS_SERVER_VERSION}/components.yaml

Modify the configuration

# Edit the configuration
$ vim components.yaml
......
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        # Added: do not verify the kubelet certificates
        - --kubelet-insecure-tls
        image: registry.k8s.io/metrics-server/metrics-server:v0.7.2
        imagePullPolicy: IfNotPresent
......

Apply

kubectl apply -f ./components.yaml

Checks

Check the pod status

$ kubectl get pods -n kube-system
# Output
......
metrics-server-8467fcc7b7-4gnfl                      1/1     Running   0          28s
......

Check that the Metrics Server API works

$ kubectl top nodes
# Output
NAME                           CPU(cores)   CPU(%)   MEMORY(bytes)   MEMORY(%)   
kube-cp-01.kubespray.local     189m         13%      1436Mi          52%         
kube-node-01.kubespray.local   67m          4%       846Mi           97%         
kube-node-02.kubespray.local   78m          5%       874Mi           100%        
kube-node-03.kubespray.local   77m          5%       904Mi           103% 
......

etcd operations

Run on the control-plane node

export ETCD_IPS="192.168.111.191 192.168.111.192 192.168.111.193"


for ip in ${ETCD_IPS};do 
  ETCDCTL_API=3
  etcdctl \
  --endpoints=https://${ip}:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/admin-kube-cp-01.pem \
  --key=/etc/ssl/etcd/ssl/admin-kube-cp-01-key.pem \
  endpoint health; 
done
#### Output
https://192.168.111.191:2379 is healthy: successfully committed proposal: took = 30.631813ms
https://192.168.111.192:2379 is healthy: successfully committed proposal: took = 33.708944ms
https://192.168.111.193:2379 is healthy: successfully committed proposal: took = 35.858495ms


for ip in ${ETCD_IPS};do 
  ETCDCTL_API=3
  etcdctl \
  --endpoints=https://${ip}:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/admin-kube-cp-01.pem \
  --key=/etc/ssl/etcd/ssl/admin-kube-cp-01-key.pem \
  endpoint status \
  --write-out=table; 
done


CoreDNS settings

Add mappings for external hosts. This is only an example; in practice the mapped hosts are usually machines outside the k8s cluster.

$ kubectl edit configmap coredns -n kube-system


apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors {
        }
        health {
            lameduck 5s
        }
        ready
        hosts {
            192.168.111.191 kube-cp-01.kubespray.local
            192.168.111.192 kube-node-01.kubespray.local
            192.168.111.193 kube-node-02.kubespray.local
            192.168.111.194 kube-node-03.kubespray.local
            fallthrough
        }
        kubernetes kubespray.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . 114.114.114.114 223.5.5.5 119.29.29.29 8.8.8.8 {
          prefer_udp
          max_concurrent 1000
        }
        cache 30

        loop
        reload
        loadbalance
    }
# The block below is an example; adjust it to your environment
    db.local:53 {
        errors
        hosts {
            192.168.111.191 mysql-01.db.local
            192.168.111.192 mysql-02.db.local
            192.168.111.193 mysql-03.db.local
            192.168.111.194 mysql-04.db.local
            fallthrough
        }
    }
    
......
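Because the reload plugin is enabled in this Corefile, CoreDNS picks up the ConfigMap change automatically; the example mapping can then be checked from the busybox pod deployed earlier:

$ kubectl exec -it deploy/busybox -- nslookup mysql-01.db.local
# Should resolve to 192.168.111.191 as configured in the db.local block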

Appendix

Certificate operations

Check certificate expiry

$ kubeadm certs check-expiration

# Output
[check-expiration] Reading configuration from the "kubeadm-config" ConfigMap in namespace "kube-system"...
[check-expiration] Use 'kubeadm init phase upload-config --config your-config.yaml' to re-upload it.
W0408 17:47:55.938978    9625 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Apr 08, 2026 07:13 UTC   364d            ca                      no      
apiserver                  Apr 08, 2026 07:13 UTC   364d            ca                      no      
apiserver-kubelet-client   Apr 08, 2026 07:13 UTC   364d            ca                      no      
controller-manager.conf    Apr 08, 2026 07:13 UTC   364d            ca                      no      
front-proxy-client         Apr 08, 2026 07:13 UTC   364d            front-proxy-ca          no      
scheduler.conf             Apr 08, 2026 07:13 UTC   364d            ca                      no      
super-admin.conf           Apr 08, 2026 07:13 UTC   364d            ca                      no      

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Apr 06, 2035 07:13 UTC   9y              no      
front-proxy-ca          Apr 06, 2035 07:13 UTC   9y              no    

Renew the certificates

The renewal takes effect only after kube-apiserver, kube-controller-manager, kube-scheduler, and etcd are restarted; on a kubeadm-managed cluster you can choose to simply restart the kubelet.

Run the following commands on each control-plane node in turn.

# Renew the certificates
$ kubeadm certs renew all

# Restart so the new certificates take effect
$ systemctl restart kubelet

etcd certificate operations

Check certificate expiry

openssl x509 -in /etc/ssl/etcd/ssl/ca.pem  -noout --dates
notBefore=Apr  8 07:10:11 2025 GMT
notAfter=Mar 15 07:10:11 2125 GMT



$ openssl x509 -in /etc/ssl/etcd/ssl/admin-kube-cp-01.kubespray.local.pem  -noout --dates
notBefore=Apr  8 07:10:12 2025 GMT
notAfter=Mar 15 07:10:12 2125 GMT

Token operations

When kubeadm init is called with --upload-certs, the primary control plane's certificates are encrypted and uploaded into the kubeadm-certs Secret, and a token is generated. Other nodes joining the cluster must present that token together with the key specified by the --certificate-key option.

Check token expiry

$ kubeadm token list
# Output
TOKEN                     TTL         EXPIRES                USAGES                   DESCRIPTION                                                EXTRA GROUPS
abcdef.0123456789abcdef   23h         2024-04-03T05:39:20Z   authentication,signing   <none>                                                     system:bootstrappers:kubeadm:default-node-token
grwcij.envbizeafuqxwajz   1h          2024-04-02T07:39:19Z   <none>                   Proxy for managing TTL for the kubeadm-certs secret        <none>

Upgrade pip3 offline

Download the pip3 package

Download page: pip · PyPI

$ wget https://files.pythonhosted.org/packages/47/6a/453160888fab7c6a432a6e25f8afe6256d0d9f2cbd25971021da6491d899/pip-23.3.1-py3-none-any.whl

Copy it to the offline host and upgrade

# Activate the virtual environment
$ VENVDIR=kubespray-venv
$ cd /opt/ && source $VENVDIR/bin/activate
# Upgrade
$ /opt/kubespray-venv/bin/python3 -m pip install --upgrade ./pip-23.3.1-py3-none-any.whl
