IT Old Boy (IT老男孩) - System Player (系统玩家)

Unexpected EOF during watch stream event decoding

Published: 2023-07-11 | Updated: 2023-07-14 | Category: kubernetes

Permalink: https://www.xtplayer.cn/kubernetes/unexpected-eof-during-watch-stream-event-decoding/ In a k8s cluster deployed with rke, nodes that have only the worker role run an nginx-proxy container in host network mode. This container proxies to the apiserver and listens on port 6443 so that the k8s components on the node (such as kubelet and kube-proxy) can connect to it. The kubelet container and the other k8s components on the node reach the apiserver directly at https://127.0.0.1:6443. As a result, messages like the following can appear in the kubelet or kube-proxy logs:
I0806 14:27:53.862021 5166 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
I0806 14:27:53.863139 5166 ...
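You may want to know whether these messages appear only occasionally or in bursts. The small sketch below counts them per minute in a saved log. It assumes glog-formatted lines like the sample above. The function name count_watch_eof_per_minute is hypothetical and is not part of rke or kubelet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "Unexpected EOF during watch stream event decoding"
# messages per minute in a saved kubelet/kube-proxy log.
# Assumes glog-style lines such as:
#   I0806 14:27:53.862021 5166 streamwatcher.go:103] Unexpected EOF ...
count_watch_eof_per_minute() {
  local logfile="$1"
  # Field 1 is severity+date (e.g. I0806); drop the severity letter.
  # Field 2 is HH:MM:SS.micro; keep only HH:MM.
  grep 'Unexpected EOF during watch stream event decoding' "$logfile" \
    | awk '{ print substr($1, 2), substr($2, 1, 5) }' \
    | sort | uniq -c
}
```

On an rke node, assuming the kubelet container is named kubelet, save the log with docker logs kubelet > kubelet.log 2>&1 and pass the file to the function. Each output line shows a count followed by the date (MMDD) and the minute.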

rancher 2.6 logging output aliyun oss

Published: 2023-06-10 | Updated: 2025-05-22 | Category: rancher

Permalink: https://www.xtplayer.cn/rancher/rancher-2-6-logging-output-alyun-oss-example/ Following https://kube-logging.dev/docs/configuration/plugins/outputs/secret/, create the credentials Secret:
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  annotations:
    logging.banzaicloud.io/default: watched
  name: oss-auth-secret
  namespace: cattle-logging-system
data:
  username: <base64-encoded>
  password: <base64-encoded>
Then create a ClusterOutput. For the oss plugin parameters, see https://kube-logging.dev/docs/configuration/plugins/outputs/oss/. For the buffer parameters, see ...
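Secret data values must be base64-encoded (this is encoding, not encryption). A small helper can therefore generate the Secret above from plain values. This is a sketch: the function name make_oss_auth_secret is hypothetical, and the fields simply mirror the Secret shown in the excerpt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the oss-auth-secret manifest from plain values.
# Values under "data" must be base64-encoded.
make_oss_auth_secret() {
  local user="$1" pass="$2"
  local user_b64 pass_b64
  # printf avoids the trailing newline echo would add;
  # tr removes the line wrapping GNU base64 applies to long values.
  user_b64=$(printf '%s' "$user" | base64 | tr -d '\n')
  pass_b64=$(printf '%s' "$pass" | base64 | tr -d '\n')
  cat <<EOF
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  annotations:
    logging.banzaicloud.io/default: watched
  name: oss-auth-secret
  namespace: cattle-logging-system
data:
  username: ${user_b64}
  password: ${pass_b64}
EOF
}
```

Usage: make_oss_auth_secret "<username>" "<password>" | kubectl apply -f -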

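Letting project users view monitoring metrics in rancher 2.6

Published: 2023-06-10 | Updated: 2025-05-22 | Category: rancher

Permalink: https://www.xtplayer.cn/rancher/rancher-2-6-project-user-view-monitor-metric/ When adding members to a project, you can grant them Project Owner, Project Member or Read-only permissions, or a custom set of project permissions. Sometimes project users also need to see an application's monitoring charts. With the default configuration, however, none of these roles can view application metrics. To meet this need, we create a custom cluster role. Steps: under Users & Authentication | Roles, create a cluster role that grants create and get on the services/proxy resource. Alternatively, create it directly in the local cluster with the following YAML:
apiVersion: management.cattle.io/v3
builtin: false
context: cluster
description: show-metric
displayName: show-metric
external: false
hidden: false
kind: RoleTemplate
metadata:
  annotations ...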

Failed to start ContainerManager failed to build map of initial containers from runtime: no PodsandBox found with Id

Published: 2023-06-10 | Updated: 2023-06-10 | Category: kubernetes

Permalink: https://www.xtplayer.cn/kubernetes/no-podsandbox-found/ Issue https://github.com/kubernetes/kubelet/issues/21 describes a problem in relatively old k8s versions: when kubelet receives an inconsistent container list from the runtime service, it goes into a restart loop. The kubelet log then shows errors like the following, printed over and over:
{"log":"F0609 16:18:40.349779 48606 kubelet.go:1386] Failed to start ContainerManager failed to build map of initial containers from runtime: no PodsandBox found with Id 'ad00f282abdb54fbb90b357ae79e9aeeb89ca33054fca207c2ca7c1522a742d3'\n" ...
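Before digging deeper, it can help to pull the sandbox ID out of the repeating log line. The minimal sketch below assumes the kubelet log has been saved to a file. extract_missing_sandbox_ids is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the unique sandbox IDs that kubelet reports
# as "no PodsandBox found with Id '<id>'" in a saved log file.
extract_missing_sandbox_ids() {
  local logfile="$1"
  grep -o "no PodsandBox found with Id '[0-9a-f]*'" "$logfile" \
    | sed "s/.*Id '\([0-9a-f]*\)'/\1/" \
    | sort -u
}
```

You could then check the resulting ID against the runtime, for example with docker ps -a --no-trunc --filter id=<ID>. The linked issue describes the actual remediation.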

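rke2 coredns custom configuration

Published: 2023-02-07 | Updated: 2025-05-22 | Category: coredns

Permalink: https://www.xtplayer.cn/coredns/rke2-coredns-custom-config/ After an rke2 cluster is deployed, a cluster agent service is deployed automatically to connect to rancher server. In development environments without an internal DNS server, the cluster agent pod logs then show errors because the rancher server hostname cannot be resolved. You may have added host entries to /etc/hosts on the node. However, cluster agent runs as a pod/container, so it cannot read the node's /etc/hosts. If cluster agent cannot connect to rancher server, the rke2 cluster becomes unusable. To avoid this, rke2 added support for custom coredns configuration. Run the following command to view the default coredns configuration:
root@rke2-1:~# kubectl -n kube-system get configmaps rke2-coredns-rke2-coredns -oyaml
apiVersion: v1
d ...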

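rke2 coredns custom configuration

Published: 2023-02-07 | Updated: 2025-05-22 | Category: rke2

Permalink: https://www.xtplayer.cn/rke2/rke2-coredns-custom-config/ After an rke2 cluster is deployed, a cluster agent service is deployed automatically to connect to rancher server. In development environments without an internal DNS server, the cluster agent pod logs then show errors because the rancher server hostname cannot be resolved. You may have added host entries to /etc/hosts on the node. However, cluster agent runs as a pod/container, so it cannot read the node's /etc/hosts. If cluster agent cannot connect to rancher server, the rke2 cluster becomes unusable. To avoid this, rke2 added support for custom coredns configuration. Run the following command to view the default coredns configuration:
root@rke2-1:~# kubectl -n kube-system get configmaps rke2-coredns-rke2-coredns -oyaml
apiVersion: v1
data ...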

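rke2 node initialization

Published: 2023-02-07 | Updated: 2023-02-07 | Category: rke2

Permalink: https://www.xtplayer.cn/rke2/rke2-node-init/ After an rke2 cluster is created, the rke2 bin directory is not added to the host's PATH. During node maintenance you therefore have to use full paths, or cd into that directory, to run its commands. The kubectl and crictl configuration files are also not placed at their default paths, so every kubectl or crictl invocation needs the config file path. To make later maintenance easier, the following script performs a simple initialization of an rke2 node. It is for reference only.
#!/bin/bash
# For an air-gapped environment, define the internal DNS server IPs here, one per line.
NAMESERVER_LIST="114.114.114.114
223.5.5.5"
docker_check(){
    if command -v dockerd >/dev/null 2>&1; then
        echo "Running the docker service on rke2 nodes at the same time is not recommended; consider uninstalling dock ...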
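The PATH and config-file part of such an initialization could look like the idempotent sketch below. It assumes the default rke2 locations:
- /var/lib/rancher/rke2/bin
- /etc/rancher/rke2/rke2.yaml (present only on server nodes)
- /var/lib/rancher/rke2/agent/etc/crictl.yaml

It relies on crictl's CRI_CONFIG_FILE environment variable. add_rke2_env and the "# rke2 env" marker are hypothetical names, not taken from the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append rke2 environment settings to a profile file once.
# Paths are the rke2 defaults; rke2.yaml exists only on server nodes.
add_rke2_env() {
  local profile="${1:-/etc/profile.d/rke2.sh}"
  local marker="# rke2 env"
  # Skip if an earlier run already added the settings.
  if [ -f "$profile" ] && grep -qF "$marker" "$profile"; then
    return 0
  fi
  # Quoted 'EOF' keeps $PATH literal so it expands at login time.
  cat >>"$profile" <<'EOF'
# rke2 env
export PATH="$PATH:/var/lib/rancher/rke2/bin"
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
EOF
}
```

Run add_rke2_env as root, then open a new login shell (or run source /etc/profile.d/rke2.sh). After that, kubectl and crictl work without extra flags. The marker check means repeated runs do not duplicate the lines.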

Common rke2 commands

Published: 2023-02-04 | Updated: 2023-02-04 | Category: rke2

Permalink: https://www.xtplayer.cn/rke2/rke2-common-command/ Install
curl -sL https://get.rke2.io | sh
systemctl daemon-reload
systemctl start rke2-server
Various exploration/debug commands for RKE2 binaries
ls -1 /var/lib/rancher/rke2/bin/*
/var/lib/rancher/rke2/bin/containerd
/var/lib/rancher/rke2/bin/containerd-shim
/var/lib/rancher/rke2/bin/containerd-shim-runc-v1
/var/lib/rancher/rke2/bin/containerd-shim-runc-v2
/var/lib/rancher/rke2/bin/crictl
/var/lib/rancher/rke2/bin/ctr
/var/lib/rancher/rke2/ ...

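Enabling Rancher Logging v2 and optimizing its configuration

Published: 2022-10-31 | Updated: 2025-05-22 | Category: rancher

Permalink: https://www.xtplayer.cn/rancher/rancher-logging-v2-configuration-optimization/ Starting with Rancher v2.6.x, the original rancher logging v1 feature is deprecated. The Banzai Cloud Logging operator replaces the original log collection feature.

How the Banzai Cloud Logging Operator works: The Logging Operator automates the deployment and configuration of the Kubernetes logging pipeline. It deploys and configures a Fluent Bit DaemonSet on every node, which collects container and application logs from the node's filesystem. Fluent Bit queries the Kubernetes API, enriches the logs with pod metadata, and sends both logs and metadata to Fluentd. Fluentd receives and filters the logs and forwards them to multiple Outputs. The following custom resources define how logs are filtered and sent to Outputs: Flow is a namespaced custom resource that uses filters and selectors to route log messages to the corresponding Outp ...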

ClusterUnavailable 503 cluster not found

Published: 2022-10-31 | Updated: 2025-05-22 | Category: rancher | Tags: ClusterUnavailable

Permalink: https://www.xtplayer.cn/rancher/clusterunavailable-503-cluster-not-found/ As shown in the figure, after upgrading to rancher v2.5.16, clicking a cluster may fail to open its home page. The rancher UI hangs, and then a ClusterUnavailable 503 cluster not found error appears in the top-right corner of the page. The rancher pod logs show errors like the following:
{"log":"2022/10/25 12:31:48 [ERROR] failed on subscribe storageClass: ClusterUnavailable 503: ClusterUnavailable 503: cluster not found\r\n","stream":"stdout","time":"2022-10-25T12:31:48.660186963Z" ...

Waiting for node to register. Either cluster is not ready for registering or etcd and controlplane node have to be registered first

Published: 2022-09-17 | Updated: 2023-06-17 | Category: rancher

Permalink: https://www.xtplayer.cn/rancher/waiting-for-node-to-register-either-cluster-is-not-ready-for-registering-or-etcd-and-controlplane-node-have-to-be-registered-first/ INFO: Environment: CATTLE_ADDRESS=10.1xx.xx.xx CATTLE_AGENT_CONNECT=true CATTLE_CA_CHECKSUM=99e6ccda7c91855xxxxxxxxx4f760c0278713b95b30ab0616b66df1a CATTLE_CLUSTER=false CATTLE_INTERNAL_ADDRESS= CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cncxxxx060vl CATTLE_SERVER=https://rancher.xxxx.comINFO: Using resolv.conf: nameserver 10.1 ...

How to check the size of MySQL databases, tables and indexes

Published: 2022-05-17 | Updated: 2025-05-22 | Category: mysql

Permalink: https://www.xtplayer.cn/mysql/how-to-get-the-sizes-of-the-tables-of-a-mysql-database/ View the size of all databases in MySQL:
SELECT table_schema AS 'database',
    sum(table_rows) AS 'row count',
    sum(truncate(data_length/1024/1024, 2)) AS 'data size (MB)',
    sum(truncate(index_length/1024/1024, 2)) AS 'index size (MB)',
    sum(truncate(DATA_FREE/1024/1024, 2)) AS 'fragmentation (MB)'
FROM information_schema.tables
GROUP BY table_schema
ORDER BY sum(data_length) DESC, sum(index_length) DESC;
Note: data_leng ...

node-agent error: Either cluster is not ready for registering, cluster is currently provisioning, or etcd, controlplane and worker node have to be registered

Published: 2022-05-14 | Updated: 2022-05-14 | Category: rancher

Permalink: https://www.xtplayer.cn/rancher/either-cluster-is-not-ready-for-registering/ Background: One day a customer reported that a node in their rancher local cluster had used up all 32G of its memory, even though the local cluster ran only rancher-related workloads. After a series of checks, the process consuming the memory turned out to be the k8s apiserver. The apiserver container logs contained large numbers of entries like the following:
{"log":"I0507 08:24:37.492998 1 pathrecorder.go:253] kube-apiserver: \"/apis/management.cattle.io/v3/namespaces/c-dql6k/nodes/machine-h46dc\" satisfied by NotFoundHandler\n","stream":"stderr","time":"2022 ...
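When the apiserver log is flooded like this, it helps to know which paths are requested most often. The rough sketch below assumes the Docker JSON log format shown above, with escaped quotes around the path. top_notfound_paths is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rank the request paths that kube-apiserver answered
# with NotFoundHandler. $1 = saved JSON log file, $2 = how many to show.
top_notfound_paths() {
  local logfile="$1"
  grep 'satisfied by NotFoundHandler' "$logfile" \
    | sed -n 's/.*kube-apiserver: \(.*\) satisfied by NotFoundHandler.*/\1/p' \
    | tr -d '\\"' \
    | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

Running top_notfound_paths apiserver.log 20 on a saved copy of the container's JSON log lists the 20 most frequent paths. This points to the objects that clients keep requesting.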

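Installing Prometheus Adapter

Published: 2022-04-04 | Updated: 2022-04-04 | Category: prometheus

Permalink: https://www.xtplayer.cn/prometheus/prometheus-adapter/ Preparing Prometheus: Starting with Rancher v2.4.8-ent (monitoring chart version 0.1.2000), Prometheus can only be reached through the link in the app catalog. It cannot be accessed directly via the Pod IP, svc or NodePort. To access Prometheus through a NodePort or an Ingress proxy, see Accessing Prometheus via NodePort or Ingress. If your cluster monitoring chart version is later than 0.1.4001, add the answer prometheus.serviceNodePort = true on the cluster monitoring configuration page. This automatically creates a NodePort svc, and you can then reach Prometheus on the NodePort. Installing Prometheus Adapter: Prometheus Adapter chart: https://github.com/prometheus-community/helm-charts/tree/ma ...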

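Customizing cluster monitoring parameters

Published: 2022-03-28 | Updated: 2022-04-04 | Category: prometheus

Permalink: https://www.xtplayer.cn/prometheus/custom-parameter/ The default cluster monitoring configuration may not suit every environment; the memory limit, for example, may need to be adjusted to actual requirements.

Adjusting component memory: Sometimes the prometheus container in the prometheus-cluster-monitoring-0 Pod restarts repeatedly, yet its logs show no obvious errors. prometheus-agent and prometheus-proxy can behave the same way. In that case, the container has most likely exceeded its memory limit and had its process killed, which causes the frequent restarts.

Adjusting prometheus memory: On the Cluster | Tools | Monitoring configuration page, you can see the limit settings shown in the figure. Adjust them as needed, for example:
- Prometheus CPU Limit: 4000
- Prometheus Memory Limit: 8192
- Node Exporter CPU Limit: 500
- Node Exporter Memory Limit: 500

Adjusting other components: In Advanced Options, add the following answers ...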

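New forum launched: the Rancher Chinese community enters a new stage

Published: 2022-03-25 | Updated: 2022-04-04 | Category: rancher

Permalink: https://www.xtplayer.cn/rancher/rancher-new-forums/ As Rancher's open source products have grown rapidly, so has the Rancher Chinese community. Counting its main gathering places, such as WeChat groups and QQ groups, the community now numbers in the tens of thousands. To make better use of community mutual help, the Rancher Chinese community (https://forums.rancher.cn) has launched a new forum, which officially opens to the public today.

Preface: WeChat has made an indelible contribution to the community's growth. Through various marketing activities, the Rancher community brought technical people into WeChat groups to share ideas and learn from each other. As the community grew rapidly, however, this WeChat-group-centered model of community management began to hit a bottleneck.

The biggest pain point is the barrier to passing on knowledge. Members of the earliest WeChat groups, after many rounds of refinement, are generally highly skilled. Many members of newer groups, by contrast, are still struggling with basic installation and deployment. Community managers also have to spend a great deal of effort carrying this knowledge between fragmented groups. In addition, WeChat is fundamentally not a tool suited to community discussion. Its content cannot be effectively indexed by search engines, and the 500-member limit per group can make it hard to find the right people to talk to. It also interferes with work ...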

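rke1 coreDNS: speeding up external domain name resolution

Published: 2021-07-17 | Updated: 2024-09-02 | Category: coredns | Tags: coredns • coredns best practices • coreDNS: speeding up external domain name resolution

Permalink: https://www.xtplayer.cn/coredns/accelerate-external-domain-resolution/ Background: Some workloads have strict DNS resolution requirements. When the following script requests a domain in a loop, it occasionally reports errors because resolution took longer than 1s:
for i in $(seq 1 500); do curl -LSs -I api.mch.weixin.qq.com --connect-timeout 1 | grep HTTP/1.1; done
Analysis: Adding the log parameter to the coredns configuration prints detailed request logs.
.:53 {
    log
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    tem ...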
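Once the log plugin is enabled, you can pick slow queries out of the coredns logs. This sketch assumes the default log format, where the last field is the query duration (for example 0.000512s). slow_dns_queries is a hypothetical name. Its 1-second default threshold matches the --connect-timeout used in the test loop above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print coredns query log lines slower than a threshold.
# $1 = saved coredns log, $2 = threshold in seconds (default 1).
# Assumes the default log plugin format, whose last field is e.g. "0.000512s".
slow_dns_queries() {
  local logfile="$1" threshold="${2:-1}"
  awk -v t="$threshold" '
    /\[INFO\]/ && $NF ~ /s$/ {
      d = $NF
      sub(/s$/, "", d)          # "1.234567s" -> "1.234567"
      if (d + 0 > t + 0) print
    }' "$logfile"
}
```

For example, assuming the coredns pods carry the usual k8s-app=kube-dns label, save the logs with kubectl -n kube-system logs -l k8s-app=kube-dns --tail=-1 > coredns.log. Then run slow_dns_queries coredns.log 0.5. The --tail=-1 is needed because kubectl shows only the last 10 lines per pod when a selector is used.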

AI at the Edge with K3s and NVIDIA Jetson Nano: Object Detection and Real-Time Video Analytics

Published: 2021-04-21 | Updated: 2025-05-22 | Category: longhorn

Permalink: https://www.xtplayer.cn/longhorn/ai-at-the-edge-with-k3s-nvidia-jetson-nano-object-detection-real-time-video-analytics-src/ With the advent of new and powerful GPU-capable devices, the possible use cases that we can execute at the edge are expanding. The edge is growing in size and getting more efficient as technology advances. NVIDIA, with its industry-leading GPUs, and Arm, the leading technology provider of processor IP, are making significant innovations and investments in the edge ...

MountVolume.WaitForAttach failed for volume with “structure needs cleaning”

Published: 2021-03-24 | Updated: 2025-05-22 | Category: kubernetes

Permalink: https://www.xtplayer.cn/kubernetes/mountvolume-waitforattach-failed-for-volume-with-structure-needs-cleaning/ Issue: MountVolume.WaitForAttach failed for volume with “structure needs cleaning”
2:02:45 PM Warning Failed Mount MountVolume.WaitForAttach failed for volume "pvc-62c6563e-8cac-11e9-bba9-005056b0bf31" : Heuristic determination of mount point failed:stat /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/iscsi/iface-default/XX.XX.XX.XX:3260-iqn.2016-12.org.gluster ...

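Troubleshooting a "controller manager and scheduler unavailable" issue

Published: 2021-03-24 | Updated: 2021-03-24 | Category: kubernetes

Permalink: https://www.xtplayer.cn/kubernetes/controller-manager-and-scheduler-unavailable/ Background: A user reported frequently receiving the alert: controller manager and scheduler unavailable.

Analysis: controller manager and scheduler both read and write data through the API SERVER. If the API SERVER becomes unreachable, controller manager and scheduler are affected. The API SERVER in turn depends on ETCD. If ETCD is unreachable or cannot be read or written, the API SERVER cannot serve controller manager, scheduler, or any other application that connects to it. Many factors can make ETCD unreachable or unable to read or write, for example:
- the network goes down or is intermittently interrupted;
- two of three ETCD nodes are lost, so the last node no longer has quorum and effectively becomes read-only;
- multiple ETCD instances communicate with each other over port 2380 ...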
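The three-node ETCD failure case follows from etcd's majority rule. A cluster of n members needs floor(n/2)+1 of them available to commit writes. A tiny sketch of that arithmetic (etcd_quorum is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the write quorum and tolerated member failures
# for an etcd cluster with the given number of members (majority rule).
etcd_quorum() {
  local members="$1"
  local quorum=$(( members / 2 + 1 ))
  echo "members=${members} quorum=${quorum} tolerated_failures=$(( members - quorum ))"
}
```

etcd_quorum 3 prints members=3 quorum=2 tolerated_failures=1. This is why a three-node cluster survives the loss of one node but not two.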

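Changing the Rancher Server IP or domain name

Published: 2021-03-23 | Updated: 2025-05-22 | Category: rancher | Tags: rancher • changing domain name

Permalink: https://www.xtplayer.cn/rancher/replace-ip-domain/ Note: this document mainly applies to rancher versions earlier than 2.6.

Prepare direct-connect kubeconfig files for all clusters: By default, the kubeconfig copied from the Rancher UI connects to the K8S cluster through the cluster agent proxy. Changing the SSL certificate alters some parameters, which then have to be fixed with kubectl from the command line. However, after the SSL certificate changes, cluster agent can no longer connect to the Rancher server. As a result, kubectl cannot use the kubeconfig copied from the Rancher UI to operate the K8S cluster. Therefore, before changing the domain name or IP, prepare direct-connect kubeconfig files for all clusters. For details, see: Restoring the kubectl configuration file.

Tip: for versions earlier than 2.1.x, kubecfg-kube-admin.yml can be found under /etc/kubernetes/.tmp on the Master node. This is a direct-connect kub ...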
