systemd zombie issue

Logs

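/var/log/messages shows a container hitting its OOM limit and having processes killed. About half an hour after this started, systemd turned into a zombie and logging stopped; logs only resumed after a forced reboot: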

Oct 30 00:28:47 k8s-node kernel: [<ffffffff811d3bb5>] mem_cgroup_oom_synchronize+0x575/0x5a0
Oct 30 00:28:47 k8s-node kernel: [<ffffffff811d2f80>] ? mem_cgroup_charge_common+0xc0/0xc0
Oct 30 00:28:47 k8s-node kernel: [<ffffffff8116d764>] pagefault_out_of_memory+0x14/0x90
Oct 30 00:28:47 k8s-node kernel: [<ffffffff8162eaec>] mm_fault_error+0x68/0x12b
Oct 30 00:28:47 k8s-node kernel: [<ffffffff81641652>] __do_page_fault+0x3e2/0x450
Oct 30 00:28:47 k8s-node kernel: [<ffffffff816416e3>] do_page_fault+0x23/0x80
Oct 30 00:28:47 k8s-node kernel: [<ffffffff8163d948>] page_fault+0x28/0x30
Oct 30 00:28:47 k8s-node kernel: Task in /kubepods/pod741a21dd-db5b-11e8-9fea-fa168f866a38/ee23ab203dffa79453c191f37211c6f91526930bf918afcabe2573913477512f killed as a result of limit of /kubepods/pod741a21dd-db5b-11e8-9fea-fa168f866a38
Oct 30 00:28:47 k8s-node kernel: memory: usage 131072kB, limit 131072kB, failcnt 3515
Oct 30 00:28:47 k8s-node kernel: memory+swap: usage 131072kB, limit 9007199254740991kB, failcnt 0
Oct 30 00:28:47 k8s-node kernel: kmem: usage 19388kB, limit 9007199254740991kB, failcnt 0
Oct 30 00:28:47 k8s-node kernel: Memory cgroup stats for /kubepods/pod741a21dd-db5b-11e8-9fea-fa168f866a38: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Oct 30 00:28:47 k8s-node kernel: Memory cgroup stats for /kubepods/pod741a21dd-db5b-11e8-9fea-fa168f866a38/ee23ab203dffa79453c191f37211c6f91526930bf918afcabe2573913477512f: cache:3396KB rss:108288KB rss_huge:69632KB mapped_file:3388KB swap:0KB inactive_anon:3388KB active_anon:108260KB inactive_file:8KB active_file:0KB unevictable:0KB
Oct 30 00:28:47 k8s-node kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Oct 30 00:28:47 k8s-node kernel: [23550]     0 23550      397      127       6        0          -998 sh
Oct 30 00:28:47 k8s-node kernel: [23653]     0 23653    21565     4057      46        0          -998 supervisord
Oct 30 00:28:47 k8s-node kernel: [23656]     0 23656     3762      654      13        0          -998 nginx
Oct 30 00:28:47 k8s-node kernel: [23662]     0 23662    62232     3288      64        0          -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23665]   100 23665    10725     7242      25        0          -998 nginx
Oct 30 00:28:47 k8s-node kernel: [23666]   100 23666    10719     7242      25        0          -998 nginx
Oct 30 00:28:47 k8s-node kernel: [23670] 65534 23670    62333     2777      62        0          -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23671] 65534 23671    62332     2784      62        0          -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23672] 65534 23672    62332     2576      62        0          -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23673] 65534 23673    62334     2747      62        0          -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23674] 65534 23674    62332     2792      62        0          -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23722] 65534 23722    62332     2655      62        0          -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23723] 65534 23723    62332     2732      62        0          -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23724] 65534 23724    62332     2540      62        0          -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23813] 65534 23813    62333     2788      62        0          -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23815] 65534 23815    62332     2470      62        0          -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [24004]     0 24004   126774     1677      32        0          -998 gbalancer
Oct 30 00:28:47 k8s-node kernel: Memory cgroup out of memory: Kill process 24023 (gbalancer) score 0 or sacrifice child
Oct 30 00:28:47 k8s-node kernel: Killed process 24004 (gbalancer) total-vm:507096kB, anon-rss:4288kB, file-rss:2420kB
Oct 30 00:28:47 k8s-node kernel: SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
Oct 30 00:28:47 k8s-node kernel:  cache: taskstats(158:ee23ab203dffa79453c191f37211c6f91526930bf918afcabe2573913477512f), object size: 328, buffer size: 328, default order: 1, min order: 0
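The three summary lines in the middle of the log carry the key numbers: the pod-level `memory` counter is at its limit (131072 kB = 128 MiB, failcnt 3515), while `memory+swap` and `kmem` show a limit of 9007199254740991 kB. That value equals (2^63 - 1) >> 10 and appears to be how this kernel prints a counter with no limit set. Below is a minimal Python sketch that pulls these fields out of such lines. The regex is written against the exact line format shown here (an assumption; other kernel versions may print the summary differently), and `parse_oom_summary` and `UNLIMITED_KB` are hypothetical names.

```python
import re

# Matches the memcg OOM summary lines as printed in this log, e.g.
# "memory: usage 131072kB, limit 131072kB, failcnt 3515"
# (format assumed from this kernel's output only).
SUMMARY_RE = re.compile(
    r"(?P<kind>memory\+swap|memory|kmem): usage (?P<usage>\d+)kB, "
    r"limit (?P<limit>\d+)kB, failcnt (?P<failcnt>\d+)"
)

# (2**63 - 1) >> 10 == 9007199254740991: the value these lines show for a
# counter with no limit set (assumption based on the values in this log).
UNLIMITED_KB = (2**63 - 1) >> 10


def parse_oom_summary(lines):
    """Map each counter kind to its usage, limit and failcnt values."""
    summary = {}
    for line in lines:
        m = SUMMARY_RE.search(line)
        if m is None:
            continue  # not a summary line (stack trace, process table, ...)
        usage, limit = int(m["usage"]), int(m["limit"])
        summary[m["kind"]] = {
            "usage_kb": usage,
            "limit_kb": limit,
            "failcnt": int(m["failcnt"]),
            "unlimited": limit >= UNLIMITED_KB,
            "at_limit": limit < UNLIMITED_KB and usage >= limit,
        }
    return summary


# The three summary lines copied from the log above.
SAMPLE = """\
Oct 30 00:28:47 k8s-node kernel: memory: usage 131072kB, limit 131072kB, failcnt 3515
Oct 30 00:28:47 k8s-node kernel: memory+swap: usage 131072kB, limit 9007199254740991kB, failcnt 0
Oct 30 00:28:47 k8s-node kernel: kmem: usage 19388kB, limit 9007199254740991kB, failcnt 0
"""

for kind, info in parse_oom_summary(SAMPLE.splitlines()).items():
    limit = "unlimited" if info["unlimited"] else f'{info["limit_kb"] // 1024} MiB'
    print(f'{kind}: usage {info["usage_kb"] // 1024} MiB, limit {limit}, '
          f'failcnt {info["failcnt"]}, at limit: {info["at_limit"]}')
```

Run against these lines, only `memory` is reported as at its limit, which is consistent with a 128Mi memory limit on the pod and with the OOM kill that follows.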

Software versions

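  • Comparing kernel versions: every machine that hit the problem runs kernel 3.10.0-327.13.1.el7.x86_64; kernel 3.10.0-327.18.2.el7.x86_64 does not seem to have hit it
  • systemd version is 219-57; yum reports it is already the latest version
  • Docker version is 18.6.1 on all machines
  • Almost all affected machines are virtual machines; physical machines do not seem to have hit it (every physical machine runs kernel 3.10.0-327.18.2.el7.x86_64)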

Possibly a kernel version issue?
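If the kernel build is the common factor, a first step is to find which nodes still run 3.10.0-327.13.1. The Python sketch below parses an EL7 kernel release string (as returned by `platform.release()`, the same string as `uname -r` on Linux) into numeric tuples and flags builds older than 3.10.0-327.18.2, the build that has not shown the problem so far. "Known good" only reflects the observations above, not a confirmed fix, and `parse_el_release` and `older_than_known_good` are hypothetical helper names.

```python
import platform
import re

# Builds taken from the observations above. "Known good" only means the
# problem has not been seen on it; it is not a confirmed fix.
SUSPECT_RELEASE = "3.10.0-327.13.1.el7.x86_64"
KNOWN_GOOD_RELEASE = "3.10.0-327.18.2.el7.x86_64"

# <upstream x.y.z>-<EL build numbers>.el<N>.<arch>
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-([\d.]+)\.el(\d+)\.(\w+)$")


def parse_el_release(release):
    """Split e.g. "3.10.0-327.13.1.el7.x86_64" into
    ((3, 10, 0), (327, 13, 1), 7, "x86_64")."""
    m = RELEASE_RE.match(release)
    if m is None:
        raise ValueError(f"not an EL kernel release string: {release!r}")
    upstream = tuple(int(m.group(i)) for i in (1, 2, 3))
    build = tuple(int(part) for part in m.group(4).split("."))
    return upstream, build, int(m.group(5)), m.group(6)


def older_than_known_good(release, known_good=KNOWN_GOOD_RELEASE):
    """True if `release` is on the same upstream/EL line but an older build."""
    upstream, build, el, _arch = parse_el_release(release)
    good_upstream, good_build, good_el, _ = parse_el_release(known_good)
    if (upstream, el) != (good_upstream, good_el):
        return False  # different kernel line: the comparison says nothing
    return build < good_build


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r` on Linux
    try:
        verdict = "older than known-good" if older_than_known_good(running) else "not older"
    except ValueError as exc:
        verdict = str(exc)
    print(f"{running}: {verdict}")
```

Comparing numeric tuples rather than raw strings matters once build numbers differ in digit count: as strings, "327.9.1" sorts after "327.18.2".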

Possibly useful links

02-工程实践/kubernetes/issue/systemd_zombie问题.txt · Last modified: 2020/04/07 06:34 by annhe