02-Engineering Practice:kubernetes:issue:slow apiserver responses

Slow dashboard responses

If it is still slow, the cause is probably heapster; see:

Set the --metric-client-check-period parameter to a very large value.
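As a rough sketch of where that flag could go, the helper below prints a JSON patch that appends it to the dashboard container's args. The deployment name kubernetes-dashboard, the container index 0, and the value 3600 are assumptions for illustration and are not taken from these notes; the existing container is also assumed to already have an args list.

```shell
# Print a JSON patch that appends the flag to the first container's args.
# The period is passed in; 3600 in the usage below is only a placeholder
# for "a very large value".
metric_check_patch() {
    period="$1"
    printf '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--metric-client-check-period=%s"}]\n' "$period"
}

# Usage (deployment name and namespace are assumptions):
# kubectl -n kube-system patch deployment kubernetes-dashboard \
#     --type=json -p "$(metric_check_patch 3600)"
```

Printing the patch first makes it easy to review before applying it to a live cluster.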

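It may be because metrics-server is slow to respond.

First delete all existing dashboard-related objects, then redeploy the dashboard using the yaml from the GitHub repository. After that it is quite fast, but commands can no longer be run (the shell feature); the error is: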

WebSocket connection to 'wss://10.10.9.4/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/api/sockjs/763/eubuhr2e/websocket?c077088b8a8a551aac03da8466d246f7' failed: Error during WebSocket handshake: Invalid status line
t @ vendor.bd425c26.js:135
r @ vendor.bd425c26.js:135
i._connect @ vendor.bd425c26.js:135
i._receiveInfo @ vendor.bd425c26.js:135
n @ vendor.bd425c26.js:135
r.emit @ vendor.bd425c26.js:135
(anonymous) @ vendor.bd425c26.js:135
n @ vendor.bd425c26.js:135
r.emit @ vendor.bd425c26.js:135
(anonymous) @ vendor.bd425c26.js:135
n @ vendor.bd425c26.js:135
r.emit @ vendor.bd425c26.js:135
xhr.onreadystatechange @ vendor.bd425c26.js:135
Mixed Content: The page at 'https://10.10.9.4/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/shell/default/init-demo-78b69bbb59-9gtqz/?namespace=default' was loaded over HTTPS, but requested an insecure script 'http://cdn.sockjs.org/sockjs-0.3.min.js'. This request has been blocked; the content must be served over HTTPS.

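kubectl and dashboard: a certain proportion of requests respond very slowly (around 30s)

Possibly etcd is responding slowly? (The virtual machine's performance is abnormal and its IOPS is very low; a normal virtual machine reaches about 350 IOPS and a physical machine about 2500.) Try switching to physical machines.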
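To put a number on "a certain proportion", one option is to time a batch of identical requests and count how many exceed a threshold. This is only a sketch: count_slow is a hypothetical helper, and the /healthz URL and the 5-second threshold in the usage comment are assumptions.

```shell
# Read one request duration (in seconds) per line on stdin and print
# "slow/total", where slow counts durations above the given threshold.
count_slow() {
    threshold="$1"
    awk -v t="$threshold" '$1 > t { slow++ } END { printf "%d/%d\n", slow, NR }'
}

# Usage (endpoint and threshold are assumptions):
# for i in $(seq 1 20); do
#     curl -sk -o /dev/null -w '%{time_total}\n' https://10.10.9.4/healthz
# done | count_slow 5
```

Running the same loop before and after moving etcd gives a simple way to compare.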

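Running fio against a disk that is in use is very dangerous: it writes to random locations and corrupts the files on the disk. The following symptoms were observed:

  • grub2-editenv list fails with "grub2-editenv error invalid environment block", and /boot/grub2/grubenv contains garbage
  • The machine cannot boot after a restart
  • yum no longer works
  • The php configuration file contains garbage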
# fio -filename=/dev/vda -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=30 -group_reporting -name=mytest
mytest: (g=0): rw=randwrite, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
...
fio-3.1
Starting 10 threads
Jobs: 10 (f=10): [w(10)][100.0%][r=0KiB/s,w=528KiB/s][r=0,w=33 IOPS][eta 00m:00s] 
mytest: (groupid=0, jobs=10): err= 0: pid=32274: Wed Nov 14 16:41:38 2018
  write: IOPS=129, BW=2076KiB/s (2125kB/s)(61.2MiB/30179msec)
    clat (usec): min=1928, max=555597, avg=76926.54, stdev=48805.70
     lat (usec): min=1930, max=555599, avg=76929.25, stdev=48805.64
    clat percentiles (msec):
     |  1.00th=[   17],  5.00th=[   31], 10.00th=[   40], 20.00th=[   47],
     | 30.00th=[   53], 40.00th=[   60], 50.00th=[   65], 60.00th=[   74],
     | 70.00th=[   84], 80.00th=[   95], 90.00th=[  124], 95.00th=[  159],
     | 99.00th=[  275], 99.50th=[  326], 99.90th=[  514], 99.95th=[  523],
     | 99.99th=[  558]
   bw (  KiB/s): min=   32, max=  320, per=10.05%, avg=208.57, stdev=66.61, samples=599
   iops        : min=    2, max=   20, avg=13.00, stdev= 4.14, samples=599
  lat (msec)   : 2=0.03%, 4=0.05%, 10=0.15%, 20=1.71%, 50=23.96%
  lat (msec)   : 100=56.42%, 250=15.99%, 500=1.56%, 750=0.13%
  cpu          : usr=0.02%, sys=0.07%, ctx=3934, majf=0, minf=7
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,3915,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
 
Run status group 0 (all jobs):
  WRITE: bw=2076KiB/s (2125kB/s), 2076KiB/s-2076KiB/s (2125kB/s-2125kB/s), io=61.2MiB (64.1MB), run=30179-30179msec
 
Disk stats (read/write):
  vda: ios=1/4922, merge=0/601, ticks=106/393197, in_queue=400269, util=100.00%
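A less destructive way to run a comparable test is to point fio at a scratch file on the filesystem instead of the raw device. The sketch below keeps the same workload parameters as the run above, written in fio's long-option form; safe_fio_cmd is a hypothetical helper and the scratch path in the usage comment is an assumption.

```shell
# Print a fio command line that benchmarks a scratch file.
# Refuses block devices such as /dev/vda, because random writes to a
# device that holds a live filesystem destroy data.
safe_fio_cmd() {
    target="$1"
    if [ -b "$target" ]; then
        echo "refusing to write to block device: $target" >&2
        return 1
    fi
    echo "fio --name=mytest --filename=$target --direct=1 --iodepth=1 --thread" \
         "--rw=randwrite --ioengine=psync --bs=16k --size=2G --numjobs=10" \
         "--runtime=30 --group_reporting"
}

# Usage (scratch path is an assumption; delete the file afterwards):
# safe_fio_cmd /data/fio-scratch/test.bin
```

The helper only prints the command, so it can be reviewed before anything is written to disk.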

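Migrating etcd to physical machines

Resource reservation on the physical machines

Migration steps

Migrate etcd to machines with good I/O performance and deploy it separately from the masters (in the end they were not separated: there are not enough machines, so both are deployed together on the physical machines).

  • In puppet, change the hostname matching rule so that hosts with the hostname prefix k8s-etcd get etcd deployed
  • Initialize the new machines with op-stock
  • Update the certificates used by etcd (during the migration, IPs in the certificates are only added, never removed) and run a puppet sync; add new nodes only after this has taken effect
  • Add one node with etcdctl member add
  • Edit the initial-cluster setting directly in the production puppet config to add the new node (do not delete the old nodes), then deploy and start the new node through puppet
  • Verify the cluster state with etcdctl
  • Repeat the steps above to add more nodes (one at a time)
  • Remove the nodes to be decommissioned from the kube-apiserver configuration
  • Decommission the nodes with etcdctl member remove
  • Update the puppet config (the old nodes can now be deleted from initial-cluster) and stop deploying etcd on the masters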
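The etcdctl part of the steps above could look roughly like the following dry-run helpers, which print the commands in order rather than running them. This assumes the etcd v3 API (ETCDCTL_API=3); the member names, URLs, and member ID are placeholders, and the TLS flags (--cacert, --cert, --key) that a secured cluster needs are left out.

```shell
# Print (do not run) the commands for adding one member.
# Arguments: new member name, its peer URL, comma-separated client endpoints.
plan_member_add() {
    name="$1"; peer_url="$2"; endpoints="$3"
    echo "ETCDCTL_API=3 etcdctl --endpoints=$endpoints member add $name --peer-urls=$peer_url"
    echo "# deploy and start $name through puppet, with the old nodes still in initial-cluster"
    echo "ETCDCTL_API=3 etcdctl --endpoints=$endpoints member list"
    echo "ETCDCTL_API=3 etcdctl --endpoints=$endpoints endpoint health"
}

# Print (do not run) the command for removing one member by its ID.
plan_member_remove() {
    member_id="$1"; endpoints="$2"
    echo "ETCDCTL_API=3 etcdctl --endpoints=$endpoints member remove $member_id"
}

# Usage (all values are placeholders):
# plan_member_add k8s-etcd-01 https://10.0.0.11:2380 https://10.0.0.1:2379
# plan_member_remove 1234abcd5678ef90 https://10.0.0.11:2379
```

Member IDs for the remove step come from the member list output.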

If etcd goes down, services are affected! coredns depends on etcd, so domain name resolution inside the cluster breaks.

request cluster ID mismatch

Error:

request cluster ID mismatch (got 3732ca115e389d9b want 7c506b9725a04c0a)

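The cause was that etcd.service was modified too early and an old etcd node was removed from it, so on startup the node treated the configuration as a new cluster.

The fix is to only add nodes at first, and to change the systemd configuration file after all additions are complete.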
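The "only add" rule can be made mechanical by always deriving the new initial-cluster value from the current one. The helper below is a minimal sketch: append_member is a hypothetical name, and the node names and URLs in the usage comment are placeholders.

```shell
# Append name=peer_url to an existing initial-cluster value.
# Existing entries are never dropped; an entry already present is not duplicated.
append_member() {
    current="$1"; name="$2"; peer_url="$3"
    if [ -z "$current" ]; then
        echo "$name=$peer_url"
        return 0
    fi
    case ",$current," in
        *",$name=$peer_url,"*) echo "$current" ;;
        *) echo "$current,$name=$peer_url" ;;
    esac
}

# Usage (placeholder values):
# append_member 'm1=https://10.0.0.1:2380,m2=https://10.0.0.2:2380' \
#     k8s-etcd-01 https://10.0.0.11:2380
```

Old entries are pruned only at the very end, after etcdctl member remove has completed for them.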

02-工程实践/kubernetes/issue/apiserver响应慢.txt · Last modified: 2020/04/07 06:34 by annhe