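Introduction
- When memory usage on a server reaches the threshold, oom-killer runs.
oom-killer runs in kernel space, and there is no option to disable it.
When oom-killer runs, it forcibly terminates processes to free memory.
However, you can set a priority value to influence which process is removed.
- The history of oom-killer runs can be checked in /var/log/messages or /var/log/syslog.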
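As a rough illustration of checking that history, the sketch below filters log lines for oom-killer events. The function name find_oom_lines and the pattern list are assumptions made for this example; the exact kernel wording differs between kernel versions, so the pattern may need adjusting.

```python
import re
from typing import Iterable, List

# Phrases the kernel logs around an OOM kill on the 3.10 kernel used in the
# site1 example; other kernel versions may word these messages differently.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory: Kill process|Killed process \d+"
)


def find_oom_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that look like oom-killer events."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


# Hypothetical usage against a real log file:
#   with open("/var/log/messages", errors="replace") as fh:
#       for hit in find_oom_lines(fh):
#           print(hit)
```

Passing an open file object works because the function only iterates over lines, so large logs do not have to be read into memory first.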
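Details
Main functions of oom-killer
- It evaluates every running process and calculates an oom score for each based on memory usage. (The oom_badness function does this; the higher the memory usage, the higher the oom score.)
- When the OS needs memory, it terminates the process with the highest score.
Files referenced when oom-killer runs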
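The scoring idea described above can be sketched roughly as follows. This is a simplified illustration, not the kernel's actual oom_badness code: the name approx_oom_score and its parameters are hypothetical, and it assumes the score is the process's memory usage as a per-mille share of total memory plus its oom_score_adj. The real details vary between kernel versions.

```python
def approx_oom_score(rss_pages: int, swap_pages: int, pgtable_pages: int,
                     total_pages: int, oom_score_adj: int = 0) -> int:
    """Approximate an OOM score on a 0..2000 scale (simplified assumption).

    Memory usage is expressed as thousandths of total memory, then the
    per-process adjustment is added on top.
    """
    if oom_score_adj == -1000:
        # Special value: the process is excluded from oom-killer.
        return 0
    usage_pages = rss_pages + swap_pages + pgtable_pages
    score = usage_pages * 1000 // total_pages + oom_score_adj
    # Simplification: never report a negative score.
    return max(score, 0)


def pick_victim(candidates: dict) -> str:
    """Return the name with the highest score, mirroring 'highest score dies'."""
    return max(candidates, key=candidates.get)
```

For example, a process using a quarter of memory with no adjustment scores 250, while the same process with oom_score_adj set to 500 scores 750 and would be chosen first.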
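- /proc/<PID>/oom_score
The current OOM score of the process; the higher the score, the more likely the process is to be chosen by the OOM killer. (This is the value oom-killer actually consults when removing a process in an OOM situation.)
- /proc/<PID>/oom_adj
The oom_score file is not edited directly; change this file instead. It takes values from -16 to +15, and the higher the value, the more likely the process is to be removed. -17 is a special value that excludes the process from oom-killer (the process is never removed). The file has been deprecated since kernel 2.6.36 (replaced by oom_score_adj), but it is still supported for compatibility. Changing oom_adj changes oom_score_adj proportionally (and vice versa).
- /proc/<PID>/oom_score_adj
The file that adjusts the oom_score value. It takes values from -999 to 1000, and the higher the value, the more likely the process is to be removed. -1000 is a special value that excludes the process from oom-killer. For backward compatibility with older kernels, the oom_adj value is applied as a proportional value.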
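The relationship between these files can be sketched as below. The helper names are hypothetical, and the conversion assumes the linear scaling of 1000/17 per oom_adj step, with 15 mapped to 1000 and truncation toward zero. Treat the exact rounding as an assumption rather than a guarantee for every kernel.

```python
import os

OOM_DISABLE = -17          # special oom_adj value: never kill
OOM_ADJUST_MAX = 15        # highest oom_adj value
OOM_SCORE_ADJ_MAX = 1000   # highest oom_score_adj value


def oom_adj_to_score_adj(oom_adj: int) -> int:
    """Map a legacy oom_adj value onto the oom_score_adj scale (assumed scaling)."""
    if not OOM_DISABLE <= oom_adj <= OOM_ADJUST_MAX:
        raise ValueError("oom_adj must be between -17 and 15")
    if oom_adj == OOM_ADJUST_MAX:
        return OOM_SCORE_ADJ_MAX
    # int() truncates toward zero, like integer division in C.
    return int(oom_adj * OOM_SCORE_ADJ_MAX / -OOM_DISABLE)


def read_oom_values(pid: int, proc_root: str = "/proc") -> dict:
    """Read oom_score and oom_score_adj for one PID as integers."""
    values = {}
    for name in ("oom_score", "oom_score_adj"):
        with open(os.path.join(proc_root, str(pid), name)) as fh:
            values[name] = int(fh.read().strip())
    return values
```

On a Linux host, read_oom_values(os.getpid()) shows the values for the current process. Writing a new value to oom_score_adj works the same way with the file opened for writing, although lowering the value generally requires root privileges.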
messages log when oom-killer fires (site1 example)
As memory usage approaches the threshold, the keepalived daemon becomes inactive
Oct 18 09:39:29 ailearn1 Keepalived_vrrp[7287]: VRRP_Instance(VI_1) Received advert with higher priority 102, ours 43
Oct 18 09:39:29 ailearn1 Keepalived_vrrp[7287]: VRRP_Instance(VI_1) Entering BACKUP STATE
Oct 18 09:39:29 ailearn1 Keepalived_vrrp[7287]: VRRP_Instance(VI_1) removing protocol VIPs.
Oct 18 09:39:30 ailearn1 avahi-daemon[1841]: Withdrawing address record for 192.168.226.15 on ens10f0.
Oct 18 09:39:30 ailearn1 Keepalived_vrrp[7287]: /etc/keepalived/check_apiserver.sh exited due to signal 15
Oct 18 09:39:30 ailearn1 ntpd[1810]: Deleting interface #73 ens10f0, 192.168.226.15#123, interface stats: received=0, sent=0, dropped=0, active_time=501831 secs
Oct 18 09:39:35 ailearn1 Keepalived_vrrp[7287]: /etc/keepalived/check_apiserver.sh exited due to signal 15
Oct 18 09:39:46 ailearn1 Keepalived_vrrp[7287]: /etc/keepalived/check_apiserver.sh exited due to signal 15
Oct 18 09:39:49 ailearn1 Keepalived_vrrp[7287]: /etc/keepalived/check_apiserver.sh exited due to signal 15
Oct 18 09:40:03 ailearn1 Keepalived_vrrp[7287]: /etc/keepalived/check_apiserver.sh exited due to signal 15
Oct 18 09:40:15 ailearn1 Keepalived_vrrp[7287]: /etc/keepalived/check_apiserver.sh exited due to signal 15
oom-killer is invoked when the threshold is reached.
Oct 18 09:40:24 ailearn1 kernel: zagent invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Oct 18 09:40:24 ailearn1 kernel: zagent cpuset=/ mems_allowed=0-1
Oct 18 09:40:24 ailearn1 kernel: CPU: 5 PID: 33422 Comm: zagent Kdump: loaded Tainted: P OE ------------ T 3.10.0-1127.el7.x86_64 #1
Oct 18 09:40:24 ailearn1 kernel: Hardware name: HPE ProLiant XL270d Gen10/ProLiant XL270d Gen10, BIOS U45 04/08/2020
Oct 18 09:40:24 ailearn1 kernel: Call Trace:
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffbad7ff85>] dump_stack+0x19/0x1b
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffbad7a8a3>] dump_header+0x90/0x229
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba706ce2>] ? ktime_get_ts64+0x52/0xf0
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7c246e>] oom_kill_process+0x25e/0x3f0
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba733a41>] ? cpuset_mems_allowed_intersects+0x21/0x30
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7c1ecd>] ? oom_unkillable_task+0xcd/0x120
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7c1f76>] ? find_lock_task_mm+0x56/0xc0
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7c2cc6>] out_of_memory+0x4b6/0x4f0
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffbad7b3c0>] __alloc_pages_slowpath+0x5db/0x729
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7c9146>] __alloc_pages_nodemask+0x436/0x450
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba818e18>] alloc_pages_current+0x98/0x110
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7be377>] __page_cache_alloc+0x97/0xb0
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7c0f30>] filemap_fault+0x270/0x420
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffc0845a4e>] __xfs_filemap_fault+0x7e/0x1d0 [xfs]
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffbad85830>] ? __schedule+0x310/0x840
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffc0845c4c>] xfs_filemap_fault+0x2c/0x30 [xfs]
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7edeea>] __do_fault.isra.61+0x8a/0x100
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7ee49c>] do_read_fault.isra.63+0x4c/0x1b0
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7f5d00>] handle_mm_fault+0xa20/0xfb0
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba6cac19>] ? hrtimer_try_to_cancel+0x29/0x120
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffbad8d653>] __do_page_fault+0x213/0x500
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffbad8d975>] do_page_fault+0x35/0x90
Oct 18 09:40:24 ailearn1 kernel: [<ffffffffbad89778>] page_fault+0x28/0x30
Oct 18 09:40:24 ailearn1 kernel: Mem-Info:
Oct 18 09:40:24 ailearn1 kernel: active_anon:62171453 inactive_anon:563216 isolated_anon:0#012 active_file:0 inactive_file:619 isolated_file:115#012 unevictable:2239 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:170713 slab_unreclaimable:622079#012 mapped:357420 shmem:1266249 pagetables:1111120 bounce:0#012 free:152412 free_pcp:9 free_cma:0
Oct 18 09:40:24 ailearn1 kernel: Node 0 DMA free:15868kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Oct 18 09:40:24 ailearn1 kernel: lowmem_reserve[]: 0 2462 128414 128414
Oct 18 09:40:24 ailearn1 kernel: Node 0 DMA32 free:504664kB min:860kB low:1072kB high:1288kB active_anon:1881276kB inactive_anon:0kB active_file:0kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2770524kB managed:2521184kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:23268kB slab_unreclaimable:64004kB kernel_stack:7504kB pagetables:400kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Oct 18 09:40:24 ailearn1 kernel: lowmem_reserve[]: 0 0 125951 125951
Oct 18 09:40:24 ailearn1 kernel: Node 0 Normal free:43988kB min:44088kB low:55108kB high:66132kB active_anon:122635392kB inactive_anon:1831644kB active_file:52kB inactive_file:1632kB unevictable:1892kB isolated(anon):0kB isolated(file):480kB present:131072000kB managed:128974764kB mlocked:1892kB dirty:0kB writeback:0kB mapped:1339784kB shmem:3808772kB slab_reclaimable:425936kB slab_unreclaimable:888044kB kernel_stack:193536kB pagetables:997364kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:168 all_unreclaimable? no
Oct 18 09:40:24 ailearn1 kernel: lowmem_reserve[]: 0 0 0 0
Oct 18 09:40:24 ailearn1 kernel: Node 1 Normal free:45128kB min:45156kB low:56444kB high:67732kB active_anon:124169144kB inactive_anon:421220kB active_file:0kB inactive_file:832kB unevictable:7064kB isolated(anon):0kB isolated(file):0kB present:134217724kB managed:132100136kB mlocked:7064kB dirty:0kB writeback:0kB mapped:89900kB shmem:1256224kB slab_reclaimable:233648kB slab_unreclaimable:1536236kB kernel_stack:164400kB pagetables:3446716kB unstable:0kB bounce:0kB free_pcp:44kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:239 all_unreclaimable? no
Oct 18 09:40:24 ailearn1 kernel: lowmem_reserve[]: 0 0 0 0
Oct 18 09:40:24 ailearn1 kernel: Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15868kB
Oct 18 09:40:24 ailearn1 kernel: Node 0 DMA32: 190*4kB (UEM) 196*8kB (UEM) 249*16kB (UEM) 438*32kB (UEM) 554*64kB (UEM) 311*128kB (UEM) 162*256kB (UEM) 284*512kB (UEM) 217*1024kB (UEM) 0*2048kB 0*4096kB = 504680kB
Oct 18 09:40:24 ailearn1 kernel: Node 0 Normal: 11621*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 46484kB
Oct 18 09:40:24 ailearn1 kernel: Node 1 Normal: 11862*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 47448kB
Oct 18 09:40:24 ailearn1 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 18 09:40:24 ailearn1 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 18 09:40:24 ailearn1 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 18 09:40:24 ailearn1 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 18 09:40:24 ailearn1 kernel: 1268943 total pagecache pages
Oct 18 09:40:24 ailearn1 kernel: 0 pages in swap cache
Oct 18 09:40:24 ailearn1 kernel: Swap cache stats: add 0, delete 0, find 0/0
Oct 18 09:40:24 ailearn1 kernel: Free swap = 0kB
Oct 18 09:40:24 ailearn1 kernel: Total swap = 0kB
Oct 18 09:40:24 ailearn1 kernel: 67019059 pages RAM
Oct 18 09:40:24 ailearn1 kernel: 0 pages HighMem/MovableOnly
Oct 18 09:40:24 ailearn1 kernel: 1116063 pages reserved
Oct 18 09:40:24 ailearn1 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Oct 18 09:40:24 ailearn1 kernel: [ 926] 0 926 67746 37377 134 0 0 systemd-journal
Oct 18 09:40:24 ailearn1 kernel: [ 954] 0 954 152602 439 40 0 0 lvmetad
Oct 18 09:40:24 ailearn1 kernel: [ 965] 0 965 11174 494 21 0 -1000 systemd-udevd
Oct 18 09:40:24 ailearn1 kernel: [ 1254] 0 1254 146848 2002 44 0 -1000 multipathd
Oct 18 09:40:24 ailearn1 kernel: [ 1748] 0 1748 13863 229 28 0 -1000 auditd
Oct 18 09:40:24 ailearn1 kernel: [ 1750] 0 1750 21125 200 11 0 0 audispd
Oct 18 09:40:24 ailearn1 kernel: [ 1752] 0 1752 6013 178 16 0 0 sedispatch
Oct 18 09:40:24 ailearn1 kernel: [ 1776] 996 1776 2133 177 10 0 0 lsmd
Oct 18 09:40:24 ailearn1 kernel: [ 1777] 998 1777 136242 2184 63 0 0 polkitd
Oct 18 09:40:24 ailearn1 kernel: [ 1780] 0 1780 106489 1104 76 0 0 ModemManager
Oct 18 09:40:24 ailearn1 kernel: [ 1781] 0 1781 4210 186 13 0 0 alsactl
Oct 18 09:40:24 ailearn1 kernel: [ 1784] 0 1784 344162 2478 86 0 0 arsm
Oct 18 09:40:24 ailearn1 kernel: [ 1789] 376 1789 169338 19279 82 0 0 acronis_monitor
Oct 18 09:40:24 ailearn1 kernel: [ 1794] 0 1794 181041 20102 234 0 0 rsyslogd
Oct 18 09:40:24 ailearn1 kernel: [ 1795] 0 1795 6161 411 16 0 0 systemd-logind
Oct 18 09:40:24 ailearn1 kernel: [ 1797] 0 1797 5544 408 15 0 0 irqbalance
Oct 18 09:40:24 ailearn1 kernel: [ 1800] 0 1800 70376 722 25 0 0 ScvWatch
Oct 18 09:40:24 ailearn1 kernel: [ 1801] 0 1801 48760 233 36 0 0 gssproxy
Oct 18 09:40:24 ailearn1 kernel: [ 1803] 81 1803 6698 348 18 0 -900 dbus-daemon
Oct 18 09:40:24 ailearn1 kernel: [ 1810] 38 1810 6938 343 18 0 0 ntpd
Oct 18 09:40:24 ailearn1 kernel: [ 1820] 0 1820 265464 159968 332 0 0 lcaAgad
Oct 18 09:40:24 ailearn1 kernel: [ 1823] 0 1823 1711 132 9 0 0 lcaAgmd
Oct 18 09:40:24 ailearn1 kernel: [ 1837] 0 1837 1618 150 8 0 0 rngd
Oct 18 09:40:24 ailearn1 kernel: [ 1839] 0 1839 7747 257 20 0 0 rdma-ndd
Oct 18 09:40:24 ailearn1 kernel: [ 1840] 0 1840 31992 378 21 0 0 smartd
Oct 18 09:40:24 ailearn1 kernel: [ 1841] 70 1841 7612 333 20 0 0 avahi-daemon
Oct 18 09:40:24 ailearn1 kernel: [ 1859] 0 1859 1641 63 8 0 0 mcelog
Oct 18 09:40:24 ailearn1 kernel: [ 1869] 0 1869 28911 213 12 0 0 ksmtuned
Oct 18 09:40:24 ailearn1 kernel: [ 1875] 70 1875 7518 60 18 0 0 avahi-daemon
Oct 18 09:40:24 ailearn1 kernel: [ 7122] 0 7122 46618 23706 95 0 0 SA-linux-64
Oct 18 09:40:24 ailearn1 kernel: [ 7234] 0 7234 140585 2975 96 0 0 tuned
Oct 18 09:40:24 ailearn1 kernel: [ 7235] 0 7235 48409 541 51 0 0 cupsd
Oct 18 09:40:24 ailearn1 kernel: [ 7244] 0 7244 26499 515 53 0 -1000 sshd
--- remaining process list omitted ---
The process with the highest oom_score is removed (gateway-server was killed here)
Oct 18 09:40:24 ailearn1 kernel: Out of memory: Kill process 39837 (python) score 1039 or sacrifice child
Oct 18 09:40:24 ailearn1 kernel: Killed process 39837 (python), UID 0, total-vm:80511424kB, anon-rss:10341740kB, file-rss:126684kB, shmem-rss:17412kB
Oct 18 09:40:24 ailearn1 Keepalived_vrrp[7287]: /etc/keepalived/check_apiserver.sh exited due to signal 15
Oct 18 09:40:24 ailearn1 Keepalived_vrrp[7287]: /etc/keepalived/check_apiserver.sh exited due to signal 15
Oct 18 09:40:25 ailearn1 kubelet: E1018 09:40:25.048955 32420 controller.go:115] failed to ensure node lease exists, will retry in 200ms, error: Get <https://192.168.226.15:16443/apis/coordination.k8s.io/v1beta1/namespaces/kube-node-lease/leases/ailearn1?timeout=10s:> context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Oct 18 09:40:25 ailearn1 kubelet: E1018 09:40:25.049862 32420 kubelet_node_status.go:385] Error updating node status, will retry: error getting node "ailearn1": Get <https://192.168.226.15:16443/api/v1/nodes/ailearn1?resourceVersion=0&timeout=10s:> context deadline exceeded
Oct 18 09:40:25 ailearn1 kubelet: W1018 09:40:25.097616 32420 container.go:523] Failed to update stats for container "/kubepods/besteffort/pod773453ca-f580-11ed-bae1-08f1ea91456c/6d3c0f52f641224e8f0610ab8bdcef2a2cd536b6bdcfbd1cb5ceaa5ddc234179": unable to determine device info for dir: /ai/docker/overlay2/5d623d1dd6ffbbd41d7149c44b4895b9f44bdf984a40530cc6cfaac7d46fa028/diff: stat failed on /ai/docker/overlay2/5d623d1dd6ffbbd41d7149c44b4895b9f44bdf984a40530cc6cfaac7d46fa028/diff with error: no such file or directory, continuing to push stats
Oct 18 09:40:25 ailearn1 kubelet: E1018 09:40:25.202361 32420 kubelet_volumes.go:154] Orphaned pod "ee152b6f-a2f8-11eb-84d3-08f1ea914680" found, but volume subpaths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Oct 18 09:40:25 ailearn1 kubelet: E1018 09:40:25.427450 32420 kubelet_volumes.go:154] Orphaned pod "ee152b6f-a2f8-11eb-84d3-08f1ea914680" found, but volume subpaths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Oct 18 09:40:25 ailearn1 kubelet: I1018 09:40:25.465644 32420 log.go:172] http: superfluous response.WriteHeader call from k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).WriteHeader (httplog.go:184)
Oct 18 09:40:25 ailearn1 kernel: ACPI Error: SMBus/IPMI/GenericSerialBus write requires Buffer of length 66, found length 32 (20130517/exfield-389)
Oct 18 09:40:25 ailearn1 kernel: ACPI Error: Method parse/execution failed [\\_SB_.PMI0._PMM] (Node ffff8af21390a960), AE_AML_BUFFER_LIMIT (20130517/psparse-536)
Oct 18 09:40:25 ailearn1 kernel: ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20130517/power_meter-339)
Oct 18 09:40:25 ailearn1 kernel: ACPI Error: SMBus/IPMI/GenericSerialBus write requires Buffer of length 66, found length 32 (20130517/exfield-389)
Oct 18 09:40:25 ailearn1 kernel: ACPI Error: Method parse/execution failed [\\_SB_.PMI0._PMM] (Node ffff8af21390a960), AE_AML_BUFFER_LIMIT (20130517/psparse-536)
Oct 18 09:40:25 ailearn1 kernel: ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20130517/power_meter-339)
Oct 18 09:40:27 ailearn1 containerd: time="2023-10-18T09:40:26.792866007+09:00" level=info msg="shim reaped" id=69112597163b02f87bae188e6d2e80bfe6f8e8efca4eabe52df4dfa9dfd20696
Oct 18 09:40:27 ailearn1 kubelet: E1018 09:40:26.867022 32420 kubelet_volumes.go:154] Orphaned pod "ee152b6f-a2f8-11eb-84d3-08f1ea914680" found, but volume subpaths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Oct 18 09:40:27 ailearn1 dockerd: time="2023-10-18T09:40:26.810884883+09:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Oct 18 09:40:27 ailearn1 containerd: time="2023-10-18T09:40:27.382287323+09:00" level=info msg="shim reaped" id=5c2aef9b0ae9cfb44ac1b6ae58ed02daf934196de468b010bc7063267345a672
Oct 18 09:40:27 ailearn1 dockerd: time="2023-10-18T09:40:27.389231624+09:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Oct 18 09:40:27 ailearn1 containerd: time="2023-10-18T09:40:27.547277584+09:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/da46e88060bf4bc3aad8abf5af2c0c5a53873cbf4661a82dce76efd96d2adfc9/shim.sock" debug=false pid=28761
Oct 18 09:40:27 ailearn1 containerd: time="2023-10-18T09:40:27.600160346+09:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/e5b4df64e5ac4698d98f0a4533bf7025b5d7579c08f71ee50be5b3fd402f3d94/shim.sock" debug=false pid=29084
Oct 18 09:40:27 ailearn1 systemd: Started Session 85209 of user root.
Oct 18 09:40:27 ailearn1 systemd: Starting Session 85209 of user root.
Oct 18 09:40:27 ailearn1 systemd: Started Session 85208 of user root.
Oct 18 09:40:27 ailearn1 systemd: Starting Session 85208 of user root.
Oct 18 09:40:27 ailearn1 systemd: Started PC/SC Smart Card Daemon.
Oct 18 09:40:27 ailearn1 systemd: Starting PC/SC Smart Card Daemon...
keepalived state change after recovery
Oct 18 09:40:46 ailearn1 Keepalived_vrrp[7287]: VRRP_Instance(VI_1) Entering MASTER STATE
Oct 18 09:40:46 ailearn1 Keepalived_vrrp[7287]: VRRP_Instance(VI_1) setting protocol VIPs.
Oct 18 09:40:46 ailearn1 Keepalived_vrrp[7287]: Sending gratuitous ARP on ens10f0 for 192.168.226.15
Oct 18 09:40:46 ailearn1 Keepalived_vrrp[7287]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens10f0 for 192.168.226.15
Oct 18 09:40:46 ailearn1 Keepalived_vrrp[7287]: Sending gratuitous ARP on ens10f0 for 192.168.226.15
Oct 18 09:40:46 ailearn1 Keepalived_vrrp[7287]: Sending gratuitous ARP on ens10f0 for 192.168.226.15
Oct 18 09:40:46 ailearn1 Keepalived_vrrp[7287]: Sending gratuitous ARP on ens10f0 for 192.168.226.15
Oct 18 09:40:46 ailearn1 Keepalived_vrrp[7287]: Sending gratuitous ARP on ens10f0 for 192.168.226.15
Oct 18 09:40:46 ailearn1 avahi-daemon[1841]: Registering new address record for 192.168.226.15 on ens10f0.IPv4.
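To pull the key numbers out of a log like the one above, a small parser for the "Killed process" line can help. The sketch below is written against the line format shown in this site1 example (kernel 3.10); the names KILLED_RE and parse_killed_line are hypothetical, and other kernel versions may format the line differently.

```python
import re
from typing import Optional

# Matches the "Killed process ..." line as it appears in the site1 log.
# shmem-rss is optional because older kernels do not print it.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?total-vm:(?P<total_vm_kb>\d+)kB"
    r", anon-rss:(?P<anon_rss_kb>\d+)kB"
    r", file-rss:(?P<file_rss_kb>\d+)kB"
    r"(?:, shmem-rss:(?P<shmem_rss_kb>\d+)kB)?"
)


def parse_killed_line(line: str) -> Optional[dict]:
    """Extract pid, name and memory figures (in kB) from a 'Killed process' line."""
    match = KILLED_RE.search(line)
    if match is None:
        return None
    info = {"pid": int(match.group("pid")), "name": match.group("name")}
    for key in ("total_vm_kb", "anon_rss_kb", "file_rss_kb", "shmem_rss_kb"):
        value = match.group(key)
        info[key] = int(value) if value is not None else 0
    # Resident memory released by the kill: anonymous + file-backed + shmem.
    info["rss_total_kb"] = (
        info["anon_rss_kb"] + info["file_rss_kb"] + info["shmem_rss_kb"]
    )
    return info
```

Applied to the line for PID 39837 above, this reports roughly 10 GB of resident memory held by the killed python process.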
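Per-process memory usage in the oom-killer messages log
Checking oom-killer in /var/log/syslog (site2 example)
In this example, many k9s processes are visible at the moment oom-killer fires, and the rss value shows how much memory each process is using. (In the kernel's task dump, rss is counted in pages, typically 4 KiB each, rather than in KB.)
This shows that the many k9s processes that had not exited were holding about 12 GB of memory, pushing the node's memory usage to the threshold.
rss stands for Resident Set Size: the amount of physical memory the process actually occupies. The value is the number of pages the process is using multiplied by the page size.
For a more detailed analysis of a process, check the following paths.
/proc/$PID/status
/proc/$PID/statm
/proc/$PID/smaps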
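The "pages multiplied by page size" relationship can be checked with a short sketch. The function names are hypothetical, the 4096-byte default is an assumption that holds on typical x86_64 systems (os.sysconf("SC_PAGE_SIZE") gives the real value on a given host), and parse_vmrss_kib assumes the usual "VmRSS: <n> kB" line found in /proc/$PID/status.

```python
import re
from typing import Optional


def rss_pages_to_kib(pages: int, page_size: int = 4096) -> int:
    """Convert an rss page count (as in the OOM task dump) to KiB."""
    return pages * page_size // 1024


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Return the VmRSS value in kB from the text of /proc/<PID>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


# Hypothetical usage on a Linux host:
#   with open("/proc/self/status") as fh:
#       print(parse_vmrss_kib(fh.read()))
```

For instance, the lcaAgad row in the site1 task dump lists rss as 159968 pages, which corresponds to 639872 KiB (about 625 MiB) with 4 KiB pages.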
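Closing
In the past, nice and renice values also affected oom_score, but since kernel 2.6.36 this is managed solely through oom_score_adj.
oom-killer runs only when system memory is full, and it is triggered by the kernel.
Because administrators cannot control it manually, it has drawn many complaints on Linux forums. For that reason, a beta feature allowing it to be run manually was added starting with kernel 5.17.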
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f530243a172d2ff03f88d0056f838928d6445c6d
- The oom_badness function that calculates the oom score
https://github.com/torvalds/linux/commit/a63d83f427fbce97a6cea0db2e64b0eb8435cd10#diff-268fe084429e2dda106503d80d590ac28f341bcf5969eaed6c09891eea0ca466
- Explanation of oom log fields
https://community.wandisco.com/s/article/Guide-to-Out-of-Memory-OOM-events-and-decoding-their-logging