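Looking at /var/log/messages, you can see the container hitting its memory limit and processes being OOM-killed. This went on for about half an hour, after which systemd turned into a zombie and logging stopped; no further entries appear until after the forced reboot: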
Oct 30 00:28:47 k8s-node kernel: [<ffffffff811d3bb5>] mem_cgroup_oom_synchronize+0x575/0x5a0
Oct 30 00:28:47 k8s-node kernel: [<ffffffff811d2f80>] ? mem_cgroup_charge_common+0xc0/0xc0
Oct 30 00:28:47 k8s-node kernel: [<ffffffff8116d764>] pagefault_out_of_memory+0x14/0x90
Oct 30 00:28:47 k8s-node kernel: [<ffffffff8162eaec>] mm_fault_error+0x68/0x12b
Oct 30 00:28:47 k8s-node kernel: [<ffffffff81641652>] __do_page_fault+0x3e2/0x450
Oct 30 00:28:47 k8s-node kernel: [<ffffffff816416e3>] do_page_fault+0x23/0x80
Oct 30 00:28:47 k8s-node kernel: [<ffffffff8163d948>] page_fault+0x28/0x30
Oct 30 00:28:47 k8s-node kernel: Task in /kubepods/pod741a21dd-db5b-11e8-9fea-fa168f866a38/ee23ab203dffa79453c191f37211c6f91526930bf918afcabe2573913477512f killed as a result of limit of /kubepods/pod741a21dd-db5b-11e8-9fea-fa168f866a38
Oct 30 00:28:47 k8s-node kernel: memory: usage 131072kB, limit 131072kB, failcnt 3515
Oct 30 00:28:47 k8s-node kernel: memory+swap: usage 131072kB, limit 9007199254740991kB, failcnt 0
Oct 30 00:28:47 k8s-node kernel: kmem: usage 19388kB, limit 9007199254740991kB, failcnt 0
Oct 30 00:28:47 k8s-node kernel: Memory cgroup stats for /kubepods/pod741a21dd-db5b-11e8-9fea-fa168f866a38: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Oct 30 00:28:47 k8s-node kernel: Memory cgroup stats for /kubepods/pod741a21dd-db5b-11e8-9fea-fa168f866a38/ee23ab203dffa79453c191f37211c6f91526930bf918afcabe2573913477512f: cache:3396KB rss:108288KB rss_huge:69632KB mapped_file:3388KB swap:0KB inactive_anon:3388KB active_anon:108260KB inactive_file:8KB active_file:0KB unevictable:0KB
Oct 30 00:28:47 k8s-node kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Oct 30 00:28:47 k8s-node kernel: [23550] 0 23550 397 127 6 0 -998 sh
Oct 30 00:28:47 k8s-node kernel: [23653] 0 23653 21565 4057 46 0 -998 supervisord
Oct 30 00:28:47 k8s-node kernel: [23656] 0 23656 3762 654 13 0 -998 nginx
Oct 30 00:28:47 k8s-node kernel: [23662] 0 23662 62232 3288 64 0 -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23665] 100 23665 10725 7242 25 0 -998 nginx
Oct 30 00:28:47 k8s-node kernel: [23666] 100 23666 10719 7242 25 0 -998 nginx
Oct 30 00:28:47 k8s-node kernel: [23670] 65534 23670 62333 2777 62 0 -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23671] 65534 23671 62332 2784 62 0 -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23672] 65534 23672 62332 2576 62 0 -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23673] 65534 23673 62334 2747 62 0 -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23674] 65534 23674 62332 2792 62 0 -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23722] 65534 23722 62332 2655 62 0 -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23723] 65534 23723 62332 2732 62 0 -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23724] 65534 23724 62332 2540 62 0 -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23813] 65534 23813 62333 2788 62 0 -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [23815] 65534 23815 62332 2470 62 0 -998 php-fpm7
Oct 30 00:28:47 k8s-node kernel: [24004] 0 24004 126774 1677 32 0 -998 gbalancer
Oct 30 00:28:47 k8s-node kernel: Memory cgroup out of memory: Kill process 24023 (gbalancer) score 0 or sacrifice child
Oct 30 00:28:47 k8s-node kernel: Killed process 24004 (gbalancer) total-vm:507096kB, anon-rss:4288kB, file-rss:2420kB
Oct 30 00:28:47 k8s-node kernel: SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
Oct 30 00:28:47 k8s-node kernel: cache: taskstats(158:ee23ab203dffa79453c191f37211c6f91526930bf918afcabe2573913477512f), object size: 328, buffer size: 328, default order: 1, min order: 0
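With half an hour of repeated OOM events, it can help to pull just the usage/limit lines and the kill records out of the log. Below is a rough Python sketch of that, assuming the line formats shown in the log above; `summarize_oom`, `USAGE_RE` and `KILLED_RE` are names made up for this illustration, and the two sample lines are copied from the log.

```python
import re

# Matches the memcg summary line, e.g.
# "kernel: memory: usage 131072kB, limit 131072kB, failcnt 3515".
# The "memory+swap:" and "kmem:" lines are deliberately not matched.
USAGE_RE = re.compile(r"kernel: memory: usage (\d+)kB, limit (\d+)kB, failcnt (\d+)")

# Matches the kill record, e.g. "Killed process 24004 (gbalancer)".
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def summarize_oom(log_text):
    """Return ([(usage_kb, limit_kb, failcnt), ...], [(pid, name), ...])."""
    usages = [tuple(int(g) for g in m.groups()) for m in USAGE_RE.finditer(log_text)]
    kills = [(int(m.group(1)), m.group(2)) for m in KILLED_RE.finditer(log_text)]
    return usages, kills


# Two lines taken from the log above (hypothetical usage; in practice the
# text would come from /var/log/messages).
sample = (
    "Oct 30 00:28:47 k8s-node kernel: memory: usage 131072kB, limit 131072kB, failcnt 3515\n"
    "Oct 30 00:28:47 k8s-node kernel: Killed process 24004 (gbalancer) "
    "total-vm:507096kB, anon-rss:4288kB, file-rss:2420kB\n"
)
usages, kills = summarize_oom(sample)
for usage_kb, limit_kb, failcnt in usages:
    print(f"usage {usage_kb} kB of limit {limit_kb} kB, failcnt {failcnt}")
for pid, name in kills:
    print(f"killed pid {pid} ({name})")
```

In the excerpt above this would show the pod cgroup sitting exactly at its 131072 kB limit when gbalancer was killed.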
Could this be a kernel version issue?