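It is still slow, and the cause is probably heapster; see:
Set a very large --metric-client-check-period
parameter
metrics-server being slow may also be the cause: metrics-server responds slowly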
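I first deleted all existing dashboard-related objects, then redeployed the dashboard using the YAML from the GitHub repository. After that it is quite fast, but commands can no longer be run from it; the error is: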
WebSocket connection to 'wss://10.10.9.4/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/api/sockjs/763/eubuhr2e/websocket?c077088b8a8a551aac03da8466d246f7' failed: Error during WebSocket handshake: Invalid status line
t @ vendor.bd425c26.js:135
r @ vendor.bd425c26.js:135
i._connect @ vendor.bd425c26.js:135
i._receiveInfo @ vendor.bd425c26.js:135
n @ vendor.bd425c26.js:135
r.emit @ vendor.bd425c26.js:135
(anonymous) @ vendor.bd425c26.js:135
n @ vendor.bd425c26.js:135
r.emit @ vendor.bd425c26.js:135
(anonymous) @ vendor.bd425c26.js:135
n @ vendor.bd425c26.js:135
r.emit @ vendor.bd425c26.js:135
xhr.onreadystatechange @ vendor.bd425c26.js:135
Mixed Content: The page at 'https://10.10.9.4/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/shell/default/init-demo-78b69bbb59-9gtqz/?namespace=default' was loaded over HTTPS, but requested an insecure script 'http://cdn.sockjs.org/sockjs-0.3.min.js'. This request has been blocked; the content must be served over HTTPS.
With both kubectl and the dashboard, a certain proportion of requests respond very slowly (around 30s).
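To put a number on "a certain proportion", the same command can be run repeatedly and the slow runs counted. The following is a minimal sketch; the helper names time_runs and count_slow, the run count and the threshold are hypothetical and not part of the original notes.

```shell
#!/bin/sh
# Run a command N times and print the wall-clock seconds of each run,
# one number per line. Resolution is whole seconds (date +%s).
time_runs() {
    n="$1"; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1
        end=$(date +%s)
        echo "$((end - start))"
        i=$((i + 1))
    done
}

# Read durations from stdin and print how many are at or above
# the threshold (in seconds) given as the first argument.
count_slow() {
    awk -v t="$1" '$1 >= t { slow++ } END { print slow + 0 }'
}

# Example (assumes kubectl is configured for the cluster):
#   time_runs 20 kubectl get nodes | count_slow 10
```

Whole-second resolution is coarse, but it is enough to separate normal responses from the ones that take around 30s.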
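Could the problem be slow etcd responses? (The virtual machine's performance is abnormal and its IOPS is very low; a normal virtual machine reaches about 350 IOPS and a physical machine about 2500.) Try switching to a physical machine.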
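One way to check the etcd suspicion without touching the disk is to read etcd's own disk latency histograms from its metrics endpoint. This is a sketch only: the URL http://127.0.0.1:2379/metrics is an assumption (a TLS-enabled cluster needs the https endpoint and client certificates), and the helper name is hypothetical. The metric names etcd_disk_wal_fsync_duration_seconds and etcd_disk_backend_commit_duration_seconds are the ones etcd exposes.

```shell
#!/bin/sh
# Keep only the WAL fsync and backend commit latency histograms
# from Prometheus-format metrics text read on stdin.
etcd_disk_latency_metrics() {
    grep -E '^etcd_disk_(wal_fsync|backend_commit)_duration_seconds'
}

# Example (assumed plain-HTTP endpoint on the etcd host):
#   curl -s http://127.0.0.1:2379/metrics | etcd_disk_latency_metrics
```

If most samples fall into the high-latency buckets, slow disk I/O is a likely contributor; the etcd documentation suggests the 99th percentile of WAL fsync should stay below roughly 10ms.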
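Running fio against a disk that is in use is very dangerous: it writes to random locations and corrupts files on the disk. The following symptoms appeared: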
grub2-editenv error invalid environment block, and /boot/grub2/grubenv is garbled
# fio -filename=/dev/vda -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=30 -group_reporting -name=mytest
mytest: (g=0): rw=randwrite, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
...
fio-3.1
Starting 10 threads
Jobs: 10 (f=10): [w(10)][100.0%][r=0KiB/s,w=528KiB/s][r=0,w=33 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=10): err= 0: pid=32274: Wed Nov 14 16:41:38 2018
  write: IOPS=129, BW=2076KiB/s (2125kB/s)(61.2MiB/30179msec)
    clat (usec): min=1928, max=555597, avg=76926.54, stdev=48805.70
     lat (usec): min=1930, max=555599, avg=76929.25, stdev=48805.64
    clat percentiles (msec):
     | 1.00th=[ 17], 5.00th=[ 31], 10.00th=[ 40], 20.00th=[ 47],
     | 30.00th=[ 53], 40.00th=[ 60], 50.00th=[ 65], 60.00th=[ 74],
     | 70.00th=[ 84], 80.00th=[ 95], 90.00th=[ 124], 95.00th=[ 159],
     | 99.00th=[ 275], 99.50th=[ 326], 99.90th=[ 514], 99.95th=[ 523],
     | 99.99th=[ 558]
   bw ( KiB/s): min= 32, max= 320, per=10.05%, avg=208.57, stdev=66.61, samples=599
   iops : min= 2, max= 20, avg=13.00, stdev= 4.14, samples=599
  lat (msec) : 2=0.03%, 4=0.05%, 10=0.15%, 20=1.71%, 50=23.96%
  lat (msec) : 100=56.42%, 250=15.99%, 500=1.56%, 750=0.13%
  cpu : usr=0.02%, sys=0.07%, ctx=3934, majf=0, minf=7
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,3915,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=2076KiB/s (2125kB/s), 2076KiB/s-2076KiB/s (2125kB/s-2125kB/s), io=61.2MiB (64.1MB), run=30179-30179msec

Disk stats (read/write):
  vda: ios=1/4922, merge=0/601, ticks=106/393197, in_queue=400269, util=100.00%
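The corruption above comes from pointing -filename at the raw device /dev/vda. A less destructive variant of the same test writes to a scratch file on the filesystem being measured. The sketch below wraps that in a guard; the function names and the example path are hypothetical, while the fio options mirror the command above in their long form.

```shell
#!/bin/sh
# Return 0 only if the target is safe to hand to a write test:
# anything under /dev/ or any block device is refused.
fio_target_is_safe() {
    target="$1"
    case "$target" in
        /dev/*)
            echo "refusing: $target is under /dev" >&2
            return 1
            ;;
    esac
    if [ -b "$target" ]; then
        echo "refusing: $target is a block device" >&2
        return 1
    fi
    return 0
}

# Same random-write workload as above, but against a scratch file.
# Needs about 2G of free space; --direct=1 requires a filesystem
# that supports O_DIRECT (tmpfs, for example, does not).
run_fio_on_scratch_file() {
    target="$1"
    fio_target_is_safe "$target" || return 1
    fio --filename="$target" --direct=1 --iodepth=1 --thread \
        --rw=randwrite --ioengine=psync --bs=16k --size=2G \
        --numjobs=10 --runtime=30 --group_reporting --name=mytest
}

# Example (hypothetical path on the disk under test):
#   run_fio_on_scratch_file /var/lib/fio-scratch.dat
#   rm -f /var/lib/fio-scratch.dat
```

Going through the filesystem adds some overhead compared with the raw device, so the numbers are not identical, but the order of magnitude (129 IOPS here against the expected 350 or 2500) should still be visible.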
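Migrate etcd to machines with good I/O performance and deploy it separately from the masters (in the end they were not separated: there are not enough machines, so both are deployed together on the physical machines).
k8s-etcd
Deploying etcd. An etcd outage affects services! coredns depends on etcd, so cluster DNS resolution breaks when etcd is down.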
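Error:
request cluster ID mismatch (got 3732ca115e389d9b want 7c506b9725a04c0a)
The cause was that etcd.service was modified too early and an old etcd node was removed from it, so on startup etcd treated itself as a new cluster.
The fix is to only add members first, and to change the systemd configuration file after all additions are complete.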
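The add-first order can be sketched with etcdctl's v3 member commands. Everything specific here is an assumption for illustration: the endpoint, member name and peer URL are placeholders, the TLS flags are omitted, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholder values; replace them with the real cluster settings.
ENDPOINTS="${ENDPOINTS:-https://10.0.0.1:2379}"
NEW_NAME="${NEW_NAME:-etcd-new}"
NEW_PEER_URL="${NEW_PEER_URL:-https://10.0.0.2:2380}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

add_member_first() {
    # 1. Register the new member with the existing cluster.
    run env ETCDCTL_API=3 etcdctl --endpoints="$ENDPOINTS" \
        member add "$NEW_NAME" --peer-urls="$NEW_PEER_URL"
    # 2. (Manual step) start etcd on the new node with
    #    --initial-cluster-state=existing so it joins the cluster
    #    instead of bootstrapping a new one.
    # 3. Confirm the member list and health before editing any
    #    existing etcd.service file or removing old members.
    run env ETCDCTL_API=3 etcdctl --endpoints="$ENDPOINTS" member list
    run env ETCDCTL_API=3 etcdctl --endpoints="$ENDPOINTS" endpoint health
}

# Example: DRY_RUN=1 prints the commands without running them.
#   add_member_first
```

Old members would be removed with etcdctl member remove only after the new ones report healthy, and the systemd unit files updated last.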