How to Check for Server Memory Exhaustion (oom-killer)

Author: 박형춘, last modified 2024-02-19 16:16

Introduction

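  • When memory usage on a server reaches a critical threshold, that is, when the kernel can no longer satisfy a memory allocation even after trying to reclaim memory, the oom-killer runs.
The oom-killer runs inside the kernel, and no option is provided to disable it.
When it runs, it forcibly kills processes to free up memory.
However, you can influence which process gets killed by assigning priority values.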


  • A history of oom-killer runs can be found in /var/log/messages (RHEL/CentOS family) or /var/log/syslog (Debian/Ubuntu family).
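These log files are large, so it helps to filter out only the oom-killer lines. The sketch below is not part of the original procedure: it matches the kernel message fragments that appear in the site1 log later in this article (`invoked oom-killer`, `Out of memory`, `Killed process`). The function name `find_oom_events` and the script name are hypothetical, and the message wording differs between kernel versions, so treat the patterns as a starting point.

```python
import re
import sys
from typing import Iterable, List

# Kernel message fragments seen in the site1 example log.
# Assumption: wording varies between kernel versions; adjust as needed.
OOM_PATTERNS = [
    re.compile(r"invoked oom-killer"),
    re.compile(r"Out of memory"),
    re.compile(r"Killed process \d+"),
]


def find_oom_events(lines: Iterable[str]) -> List[str]:
    """Return only the log lines that mention an oom-killer event."""
    return [line.rstrip("\n") for line in lines
            if any(p.search(line) for p in OOM_PATTERNS)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python find_oom.py /var/log/messages
    with open(sys.argv[1], errors="replace") as f:
        for event in find_oom_events(f):
            print(event)
```

Each OOM event typically produces one `invoked oom-killer` line (naming the task whose allocation triggered it) followed later by the `Out of memory` / `Killed process` lines naming the victim.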


Details

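  • Main functions of the oom-killer

    1. Evaluates every running process and computes an OOM score based on its memory usage. (This is done by the oom_badness() function; the more memory a process uses, the higher its score.)
    2. When the OS needs memory and cannot reclaim enough, it kills the process with the highest score.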
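The scoring in step 1 can be modelled roughly as follows. This is a simplified sketch, not the kernel code: it follows the general shape of oom_badness() in 3.x/4.x kernels (resident pages + swap entries + page-table pages, plus oom_score_adj, described below, scaled to total memory) and ignores details such as the root-process discount in older kernels, cgroup/cpuset constraints, and the different scaling used by newer kernels. The function names `badness` and `normalized_score` are hypothetical.

```python
from typing import Optional

OOM_SCORE_ADJ_MIN = -1000  # oom_score_adj value that exempts a process


def badness(rss_pages: int, swap_pages: int, pgtable_pages: int,
            oom_score_adj: int, total_pages: int) -> Optional[int]:
    """Simplified oom_badness(): memory footprint in pages plus a scaled adjustment.

    Returns None for an exempt process (oom_score_adj == -1000).
    """
    if oom_score_adj == OOM_SCORE_ADJ_MIN:
        return None
    # Baseline: roughly the number of pages freed if this process is killed.
    points = rss_pages + swap_pages + pgtable_pages
    # oom_score_adj is expressed in thousandths of total memory; convert to pages.
    points += oom_score_adj * total_pages // 1000
    # An eligible process never scores 0.
    return max(points, 1)


def normalized_score(points: int, total_pages: int) -> int:
    """Scale the badness to a 0..1000 range (exceeds 1000 with a positive adj)."""
    return points * 1000 // total_pages
```

As a rough check against the site1 log: the killed python process had anon-rss + file-rss + shmem-rss = 10485836 kB, about 2621459 pages of 4 KB. Approximating totalpages as 67019059 pages RAM - 1116063 reserved = 65902996, and ignoring page tables and swap, the sketch gives 39 with oom_score_adj=0 and 1039 with oom_score_adj=1000. In this model a score above 1000, like the `score 1039` in the log, only occurs with a positive oom_score_adj.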
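  • Files consulted when the oom-killer runs

    1. /proc/<PID>/oom_score
      The current OOM score of the process (read-only). The higher the score, the more likely the process is to be chosen by the oom-killer. This reflects the value the oom-killer actually uses when choosing a victim in an OOM situation.
    2. /proc/<PID>/oom_adj
      Rather than changing oom_score directly, you change it by editing this file. Values range from -16 to +15; the higher the value, the more likely the process is to be killed. -17 is a special value that exempts the process from the oom-killer (the process will never be killed). This file has been deprecated since kernel 2.6.36 in favor of oom_score_adj, but it is still supported for backward compatibility: changing oom_adj updates oom_score_adj by a proportional amount, and vice versa.
    3. /proc/<PID>/oom_score_adj
      The file used to adjust oom_score. Values range from -999 to 1000; the higher the value, the more likely the process is to be killed. -1000 is a special value that exempts the process from the oom-killer. For backward compatibility with older kernels, values written to oom_adj are applied here proportionally.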
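The proportional mapping between the two files can be sketched as below. This is my reading of the conversion in the kernel's fs/proc/base.c, using its constants OOM_DISABLE = -17, OOM_ADJUST_MAX = 15 and OOM_SCORE_ADJ_MAX = 1000, with C-style integer division (truncation toward zero); verify against your kernel source. The function names are hypothetical.

```python
OOM_DISABLE = -17          # legacy oom_adj value meaning "never kill"
OOM_ADJUST_MAX = 15        # legacy oom_adj maximum
OOM_SCORE_ADJ_MAX = 1000   # oom_score_adj maximum


def _div_trunc(a: int, b: int) -> int:
    """Integer division truncating toward zero, as C does."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def oom_adj_to_score_adj(oom_adj: int) -> int:
    """oom_score_adj that results from writing a value to oom_adj."""
    if oom_adj == OOM_ADJUST_MAX:
        # The maximum is special-cased so that 1000 stays reachable.
        return OOM_SCORE_ADJ_MAX
    return _div_trunc(oom_adj * OOM_SCORE_ADJ_MAX, -OOM_DISABLE)


def score_adj_to_oom_adj(score_adj: int) -> int:
    """oom_adj reported when oom_score_adj holds the given value."""
    if score_adj == OOM_SCORE_ADJ_MAX:
        return OOM_ADJUST_MAX
    return _div_trunc(score_adj * -OOM_DISABLE, OOM_SCORE_ADJ_MAX)
```

The two special values line up: oom_adj -17 maps to oom_score_adj -1000. Because of truncation the mapping does not round-trip exactly (oom_adj 5 becomes 294, which reads back as 4), and the dbus-daemon entry in the site1 task list, with oom_score_adj -900, would read back through oom_adj as -15.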
  • messages log when the oom-killer fires (site1 example)

    As memory usage approaches the threshold, keepalived loses its active state: the VRRP instance drops to BACKUP and removes the VIP
     Oct 18 09:39:29 ailearn1 Keepalived_vrrp[7287]: VRRP_Instance(VI_1) Received advert with higher priority 102, ours 43
     Oct 18 09:39:29 ailearn1 Keepalived_vrrp[7287]: VRRP_Instance(VI_1) Entering BACKUP STATE
     Oct 18 09:39:29 ailearn1 Keepalived_vrrp[7287]: VRRP_Instance(VI_1) removing protocol VIPs.
     Oct 18 09:39:30 ailearn1 avahi-daemon[1841]: Withdrawing address record for 192.168.226.15 on ens10f0.
     Oct 18 09:39:30 ailearn1 Keepalived_vrrp[7287]: /etc/keepalived/check_apiserver.sh exited due to signal 15
     Oct 18 09:39:30 ailearn1 ntpd[1810]: Deleting interface #73 ens10f0, 192.168.226.15#123, interface stats: received=0, sent=0, dropped=0, active_time=501831 secs
     Oct 18 09:39:35 ailearn1 Keepalived_vrrp[7287]: /etc/keepalived/check_apiserver.sh exited due to signal 15
     Oct 18 09:39:46 ailearn1 Keepalived_vrrp[7287]: /etc/keepalived/check_apiserver.sh exited due to signal 15
     Oct 18 09:39:49 ailearn1 Keepalived_vrrp[7287]: /etc/keepalived/check_apiserver.sh exited due to signal 15
     Oct 18 09:40:03 ailearn1 Keepalived_vrrp[7287]: /etc/keepalived/check_apiserver.sh exited due to signal 15
     Oct 18 09:40:15 ailearn1 Keepalived_vrrp[7287]: /etc/keepalived/check_apiserver.sh exited due to signal 15
     
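    When the threshold is reached, the kernel invokes the oom-killer. (The oom-killer is a kernel routine, not a daemon; here it was triggered by a memory allocation made by zagent.)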
     Oct 18 09:40:24 ailearn1 kernel: zagent invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
     Oct 18 09:40:24 ailearn1 kernel: zagent cpuset=/ mems_allowed=0-1
     Oct 18 09:40:24 ailearn1 kernel: CPU: 5 PID: 33422 Comm: zagent Kdump: loaded Tainted: P           OE  ------------ T 3.10.0-1127.el7.x86_64 #1
     Oct 18 09:40:24 ailearn1 kernel: Hardware name: HPE ProLiant XL270d Gen10/ProLiant XL270d Gen10, BIOS U45 04/08/2020
     Oct 18 09:40:24 ailearn1 kernel: Call Trace:
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffbad7ff85>] dump_stack+0x19/0x1b
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffbad7a8a3>] dump_header+0x90/0x229
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba706ce2>] ? ktime_get_ts64+0x52/0xf0
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7c246e>] oom_kill_process+0x25e/0x3f0
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba733a41>] ? cpuset_mems_allowed_intersects+0x21/0x30
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7c1ecd>] ? oom_unkillable_task+0xcd/0x120
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7c1f76>] ? find_lock_task_mm+0x56/0xc0
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7c2cc6>] out_of_memory+0x4b6/0x4f0
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffbad7b3c0>] __alloc_pages_slowpath+0x5db/0x729
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7c9146>] __alloc_pages_nodemask+0x436/0x450
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba818e18>] alloc_pages_current+0x98/0x110
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7be377>] __page_cache_alloc+0x97/0xb0
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7c0f30>] filemap_fault+0x270/0x420
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffc0845a4e>] __xfs_filemap_fault+0x7e/0x1d0 [xfs]
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffbad85830>] ? __schedule+0x310/0x840
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffc0845c4c>] xfs_filemap_fault+0x2c/0x30 [xfs]
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7edeea>] __do_fault.isra.61+0x8a/0x100
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7ee49c>] do_read_fault.isra.63+0x4c/0x1b0
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba7f5d00>] handle_mm_fault+0xa20/0xfb0
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffba6cac19>] ? hrtimer_try_to_cancel+0x29/0x120
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffbad8d653>] __do_page_fault+0x213/0x500
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffbad8d975>] do_page_fault+0x35/0x90
     Oct 18 09:40:24 ailearn1 kernel: [<ffffffffbad89778>] page_fault+0x28/0x30
     Oct 18 09:40:24 ailearn1 kernel: Mem-Info:
     Oct 18 09:40:24 ailearn1 kernel: active_anon:62171453 inactive_anon:563216 isolated_anon:0#012 active_file:0 inactive_file:619 isolated_file:115#012 unevictable:2239 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:170713 slab_unreclaimable:622079#012 mapped:357420 shmem:1266249 pagetables:1111120 bounce:0#012 free:152412 free_pcp:9 free_cma:0
     Oct 18 09:40:24 ailearn1 kernel: Node 0 DMA free:15868kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
     Oct 18 09:40:24 ailearn1 kernel: lowmem_reserve[]: 0 2462 128414 128414
     Oct 18 09:40:24 ailearn1 kernel: Node 0 DMA32 free:504664kB min:860kB low:1072kB high:1288kB active_anon:1881276kB inactive_anon:0kB active_file:0kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2770524kB managed:2521184kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:23268kB slab_unreclaimable:64004kB kernel_stack:7504kB pagetables:400kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
     Oct 18 09:40:24 ailearn1 kernel: lowmem_reserve[]: 0 0 125951 125951
     Oct 18 09:40:24 ailearn1 kernel: Node 0 Normal free:43988kB min:44088kB low:55108kB high:66132kB active_anon:122635392kB inactive_anon:1831644kB active_file:52kB inactive_file:1632kB unevictable:1892kB isolated(anon):0kB isolated(file):480kB present:131072000kB managed:128974764kB mlocked:1892kB dirty:0kB writeback:0kB mapped:1339784kB shmem:3808772kB slab_reclaimable:425936kB slab_unreclaimable:888044kB kernel_stack:193536kB pagetables:997364kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:168 all_unreclaimable? no
     Oct 18 09:40:24 ailearn1 kernel: lowmem_reserve[]: 0 0 0 0
     Oct 18 09:40:24 ailearn1 kernel: Node 1 Normal free:45128kB min:45156kB low:56444kB high:67732kB active_anon:124169144kB inactive_anon:421220kB active_file:0kB inactive_file:832kB unevictable:7064kB isolated(anon):0kB isolated(file):0kB present:134217724kB managed:132100136kB mlocked:7064kB dirty:0kB writeback:0kB mapped:89900kB shmem:1256224kB slab_reclaimable:233648kB slab_unreclaimable:1536236kB kernel_stack:164400kB pagetables:3446716kB unstable:0kB bounce:0kB free_pcp:44kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:239 all_unreclaimable? no
     Oct 18 09:40:24 ailearn1 kernel: lowmem_reserve[]: 0 0 0 0
     Oct 18 09:40:24 ailearn1 kernel: Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15868kB
     Oct 18 09:40:24 ailearn1 kernel: Node 0 DMA32: 190*4kB (UEM) 196*8kB (UEM) 249*16kB (UEM) 438*32kB (UEM) 554*64kB (UEM) 311*128kB (UEM) 162*256kB (UEM) 284*512kB (UEM) 217*1024kB (UEM) 0*2048kB 0*4096kB = 504680kB
     Oct 18 09:40:24 ailearn1 kernel: Node 0 Normal: 11621*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 46484kB
     Oct 18 09:40:24 ailearn1 kernel: Node 1 Normal: 11862*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 47448kB
     Oct 18 09:40:24 ailearn1 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
     Oct 18 09:40:24 ailearn1 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
     Oct 18 09:40:24 ailearn1 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
     Oct 18 09:40:24 ailearn1 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
     Oct 18 09:40:24 ailearn1 kernel: 1268943 total pagecache pages
     Oct 18 09:40:24 ailearn1 kernel: 0 pages in swap cache
     Oct 18 09:40:24 ailearn1 kernel: Swap cache stats: add 0, delete 0, find 0/0
     Oct 18 09:40:24 ailearn1 kernel: Free swap  = 0kB
     Oct 18 09:40:24 ailearn1 kernel: Total swap = 0kB
     Oct 18 09:40:24 ailearn1 kernel: 67019059 pages RAM
     Oct 18 09:40:24 ailearn1 kernel: 0 pages HighMem/MovableOnly
     Oct 18 09:40:24 ailearn1 kernel: 1116063 pages reserved
     Oct 18 09:40:24 ailearn1 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
     Oct 18 09:40:24 ailearn1 kernel: [  926]     0   926    67746    37377     134        0             0 systemd-journal
     Oct 18 09:40:24 ailearn1 kernel: [  954]     0   954   152602      439      40        0             0 lvmetad
     Oct 18 09:40:24 ailearn1 kernel: [  965]     0   965    11174      494      21        0         -1000 systemd-udevd
     Oct 18 09:40:24 ailearn1 kernel: [ 1254]     0  1254   146848     2002      44        0         -1000 multipathd
     Oct 18 09:40:24 ailearn1 kernel: [ 1748]     0  1748    13863      229      28        0         -1000 auditd
     Oct 18 09:40:24 ailearn1 kernel: [ 1750]     0  1750    21125      200      11        0             0 audispd
     Oct 18 09:40:24 ailearn1 kernel: [ 1752]     0  1752     6013      178      16        0             0 sedispatch
     Oct 18 09:40:24 ailearn1 kernel: [ 1776]   996  1776     2133      177      10        0             0 lsmd
     Oct 18 09:40:24 ailearn1 kernel: [ 1777]   998  1777   136242     2184      63        0             0 polkitd
     Oct 18 09:40:24 ailearn1 kernel: [ 1780]     0  1780   106489     1104      76        0             0 ModemManager
     Oct 18 09:40:24 ailearn1 kernel: [ 1781]     0  1781     4210      186      13        0             0 alsactl
     Oct 18 09:40:24 ailearn1 kernel: [ 1784]     0  1784   344162     2478      86        0             0 arsm
     Oct 18 09:40:24 ailearn1 kernel: [ 1789]   376  1789   169338    19279      82        0             0 acronis_monitor
     Oct 18 09:40:24 ailearn1 kernel: [ 1794]     0  1794   181041    20102     234        0             0 rsyslogd
     Oct 18 09:40:24 ailearn1 kernel: [ 1795]     0  1795     6161      411      16        0             0 systemd-logind
     Oct 18 09:40:24 ailearn1 kernel: [ 1797]     0  1797     5544      408      15        0             0 irqbalance
     Oct 18 09:40:24 ailearn1 kernel: [ 1800]     0  1800    70376      722      25        0             0 ScvWatch
     Oct 18 09:40:24 ailearn1 kernel: [ 1801]     0  1801    48760      233      36        0             0 gssproxy
     Oct 18 09:40:24 ailearn1 kernel: [ 1803]    81  1803     6698      348      18        0          -900 dbus-daemon
     Oct 18 09:40:24 ailearn1 kernel: [ 1810]    38  1810     6938      343      18        0             0 ntpd
     Oct 18 09:40:24 ailearn1 kernel: [ 1820]     0  1820   265464   159968     332        0             0 lcaAgad
     Oct 18 09:40:24 ailearn1 kernel: [ 1823]     0  1823     1711      132       9        0             0 lcaAgmd
     Oct 18 09:40:24 ailearn1 kernel: [ 1837]     0  1837     1618      150       8        0             0 rngd
     Oct 18 09:40:24 ailearn1 kernel: [ 1839]     0  1839     7747      257      20        0             0 rdma-ndd
     Oct 18 09:40:24 ailearn1 kernel: [ 1840]     0  1840    31992      378      21        0             0 smartd
     Oct 18 09:40:24 ailearn1 kernel: [ 1841]    70  1841     7612      333      20        0             0 avahi-daemon
     Oct 18 09:40:24 ailearn1 kernel: [ 1859]     0  1859     1641       63       8        0             0 mcelog
     Oct 18 09:40:24 ailearn1 kernel: [ 1869]     0  1869    28911      213      12        0             0 ksmtuned
     Oct 18 09:40:24 ailearn1 kernel: [ 1875]    70  1875     7518       60      18        0             0 avahi-daemon
     Oct 18 09:40:24 ailearn1 kernel: [ 7122]     0  7122    46618    23706      95        0             0 SA-linux-64
     Oct 18 09:40:24 ailearn1 kernel: [ 7234]     0  7234   140585     2975      96        0             0 tuned
     Oct 18 09:40:24 ailearn1 kernel: [ 7235]     0  7235    48409      541      51        0             0 cupsd
     Oct 18 09:40:24 ailearn1 kernel: [ 7244]     0  7244    26499      515      53        0         -1000 sshd
     --- remainder of the process list omitted ---
     
    The process with the highest oom_score is killed (gateway-server was killed at this point)
     Oct 18 09:40:24 ailearn1 kernel: Out of memory: Kill process 39837 (python) score 1039 or sacrifice child
     Oct 18 09:40:24 ailearn1 kernel: Killed process 39837 (python), UID 0, total-vm:80511424kB, anon-rss:10341740kB, file-rss:126684kB, shmem-rss:17412kB
     Oct 18 09:40:24 ailearn1 Keepalived_vrrp[7287]: /etc/keepalived/check_apiserver.sh exited due to signal 15
     Oct 18 09:40:24 ailearn1 Keepalived_vrrp[7287]: /etc/keepalived/check_apiserver.sh exited due to signal 15
     Oct 18 09:40:25 ailearn1 kubelet: E1018 09:40:25.048955   32420 controller.go:115] failed to ensure node lease exists, will retry in 200ms, error: Get https://192.168.226.15:16443/apis/coordination.k8s.io/v1beta1/namespaces/kube-node-lease/leases/ailearn1?timeout=10s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
     Oct 18 09:40:25 ailearn1 kubelet: E1018 09:40:25.049862   32420 kubelet_node_status.go:385] Error updating node status, will retry: error getting node "ailearn1": Get https://192.168.226.15:16443/api/v1/nodes/ailearn1?resourceVersion=0&timeout=10s: context deadline exceeded
     Oct 18 09:40:25 ailearn1 kubelet: W1018 09:40:25.097616   32420 container.go:523] Failed to update stats for container "/kubepods/besteffort/pod773453ca-f580-11ed-bae1-08f1ea91456c/6d3c0f52f641224e8f0610ab8bdcef2a2cd536b6bdcfbd1cb5ceaa5ddc234179": unable to determine device info for dir: /ai/docker/overlay2/5d623d1dd6ffbbd41d7149c44b4895b9f44bdf984a40530cc6cfaac7d46fa028/diff: stat failed on /ai/docker/overlay2/5d623d1dd6ffbbd41d7149c44b4895b9f44bdf984a40530cc6cfaac7d46fa028/diff with error: no such file or directory, continuing to push stats
     Oct 18 09:40:25 ailearn1 kubelet: E1018 09:40:25.202361   32420 kubelet_volumes.go:154] Orphaned pod "ee152b6f-a2f8-11eb-84d3-08f1ea914680" found, but volume subpaths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
     Oct 18 09:40:25 ailearn1 kubelet: E1018 09:40:25.427450   32420 kubelet_volumes.go:154] Orphaned pod "ee152b6f-a2f8-11eb-84d3-08f1ea914680" found, but volume subpaths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
     Oct 18 09:40:25 ailearn1 kubelet: I1018 09:40:25.465644   32420 log.go:172] http: superfluous response.WriteHeader call from k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).WriteHeader (httplog.go:184)
     Oct 18 09:40:25 ailearn1 kernel: ACPI Error: SMBus/IPMI/GenericSerialBus write requires Buffer of length 66, found length 32 (20130517/exfield-389)
     Oct 18 09:40:25 ailearn1 kernel: ACPI Error: Method parse/execution failed [\_SB_.PMI0._PMM] (Node ffff8af21390a960), AE_AML_BUFFER_LIMIT (20130517/psparse-536)
     Oct 18 09:40:25 ailearn1 kernel: ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20130517/power_meter-339)
     Oct 18 09:40:25 ailearn1 kernel: ACPI Error: SMBus/IPMI/GenericSerialBus write requires Buffer of length 66, found length 32 (20130517/exfield-389)
     Oct 18 09:40:25 ailearn1 kernel: ACPI Error: Method parse/execution failed [\_SB_.PMI0._PMM] (Node ffff8af21390a960), AE_AML_BUFFER_LIMIT (20130517/psparse-536)
     Oct 18 09:40:25 ailearn1 kernel: ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20130517/power_meter-339)
     Oct 18 09:40:27 ailearn1 containerd: time="2023-10-18T09:40:26.792866007+09:00" level=info msg="shim reaped" id=69112597163b02f87bae188e6d2e80bfe6f8e8efca4eabe52df4dfa9dfd20696
     Oct 18 09:40:27 ailearn1 kubelet: E1018 09:40:26.867022   32420 kubelet_volumes.go:154] Orphaned pod "ee152b6f-a2f8-11eb-84d3-08f1ea914680" found, but volume subpaths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
     Oct 18 09:40:27 ailearn1 dockerd: time="2023-10-18T09:40:26.810884883+09:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
     Oct 18 09:40:27 ailearn1 containerd: time="2023-10-18T09:40:27.382287323+09:00" level=info msg="shim reaped" id=5c2aef9b0ae9cfb44ac1b6ae58ed02daf934196de468b010bc7063267345a672
     Oct 18 09:40:27 ailearn1 dockerd: time="2023-10-18T09:40:27.389231624+09:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
     Oct 18 09:40:27 ailearn1 containerd: time="2023-10-18T09:40:27.547277584+09:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/da46e88060bf4bc3aad8abf5af2c0c5a53873cbf4661a82dce76efd96d2adfc9/shim.sock" debug=false pid=28761
     Oct 18 09:40:27 ailearn1 containerd: time="2023-10-18T09:40:27.600160346+09:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/e5b4df64e5ac4698d98f0a4533bf7025b5d7579c08f71ee50be5b3fd402f3d94/shim.sock" debug=false pid=29084
     Oct 18 09:40:27 ailearn1 systemd: Started Session 85209 of user root.
     Oct 18 09:40:27 ailearn1 systemd: Starting Session 85209 of user root.
     Oct 18 09:40:27 ailearn1 systemd: Started Session 85208 of user root.
     Oct 18 09:40:27 ailearn1 systemd: Starting Session 85208 of user root.
     Oct 18 09:40:27 ailearn1 systemd: Started PC/SC Smart Card Daemon.
     Oct 18 09:40:27 ailearn1 systemd: Starting PC/SC Smart Card Daemon...
     
    After recovery, keepalived changes state again (back to MASTER, VIP restored)
     Oct 18 09:40:46 ailearn1 Keepalived_vrrp[7287]: VRRP_Instance(VI_1) Entering MASTER STATE
     Oct 18 09:40:46 ailearn1 Keepalived_vrrp[7287]: VRRP_Instance(VI_1) setting protocol VIPs.
     Oct 18 09:40:46 ailearn1 Keepalived_vrrp[7287]: Sending gratuitous ARP on ens10f0 for 192.168.226.15
     Oct 18 09:40:46 ailearn1 Keepalived_vrrp[7287]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens10f0 for 192.168.226.15
     Oct 18 09:40:46 ailearn1 Keepalived_vrrp[7287]: Sending gratuitous ARP on ens10f0 for 192.168.226.15
     Oct 18 09:40:46 ailearn1 Keepalived_vrrp[7287]: Sending gratuitous ARP on ens10f0 for 192.168.226.15
     Oct 18 09:40:46 ailearn1 Keepalived_vrrp[7287]: Sending gratuitous ARP on ens10f0 for 192.168.226.15
     Oct 18 09:40:46 ailearn1 Keepalived_vrrp[7287]: Sending gratuitous ARP on ens10f0 for 192.168.226.15
     Oct 18 09:40:46 ailearn1 avahi-daemon[1841]: Registering new address record for 192.168.226.15 on ens10f0.IPv4.
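The `Killed process` line is the quickest way to see how much memory the victim was holding; its fields are reported in kB. A minimal sketch for extracting them, assuming the 3.10-era format shown above (other kernel versions may format the line differently); `parse_killed_line` and `resident_gib` are hypothetical names.

```python
import re
from typing import Dict, Optional

# Matches e.g. "Killed process 39837 (python), UID 0, total-vm:80511424kB, ..."
KILLED_RE = re.compile(r"Killed process (?P<pid>\d+) \((?P<comm>[^)]*)\)")
# Matches "key:NNNkB" fields such as total-vm, anon-rss, file-rss, shmem-rss.
FIELD_RE = re.compile(r"(?P<key>[a-z-]+):(?P<kb>\d+)kB")


def parse_killed_line(line: str) -> Optional[Dict[str, object]]:
    """Extract pid, command name and the kB-valued memory fields."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    info: Dict[str, object] = {"pid": int(m.group("pid")), "comm": m.group("comm")}
    for field in FIELD_RE.finditer(line):
        info[field.group("key")] = int(field.group("kb"))
    return info


def resident_gib(info: Dict[str, object]) -> float:
    """Sum anon, file and shmem RSS (kB) and convert to GiB."""
    kb = sum(int(info.get(k, 0)) for k in ("anon-rss", "file-rss", "shmem-rss"))
    return kb / (1024 * 1024)
```

For the site1 line, anon-rss + file-rss + shmem-rss = 10485836 kB, so the python process was holding about 10 GiB of physical memory when it was killed.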


  • Per-process memory usage in the oom-killer messages log

    Checking oom-killer output in /var/log/syslog (site2 example)

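    In the example above, many k9s processes are present at the moment the oom-killer runs. The rss column shows how much physical memory each process uses; note that in this task list the value is a number of pages (4 KB each on x86_64), not KB.
    The many k9s processes that were never terminated therefore held about 12 GB of memory between them, which pushed the node's memory usage up to the threshold.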
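Because the task list reports rss in pages, totalling the memory held by many copies of the same program (such as the k9s processes above) means summing the rss column per process name and multiplying by the page size. A minimal sketch, assuming 4 KB pages and the nine-column layout of the site1 task list; `rss_kb_by_name` is a hypothetical helper, and it should be fed the lines of a single OOM report so that separate events are not added together.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

PAGE_KB = 4  # assumption: 4 KB pages (x86_64 default)

# "[ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name"
TASK_RE = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)"
    r"\s+(?P<rss>\d+)\s+(?P<nr_ptes>\d+)\s+(?P<swapents>\d+)"
    r"\s+(?P<adj>-?\d+)\s+(?P<name>\S+)"
)


def rss_kb_by_name(lines: Iterable[str], page_kb: int = PAGE_KB) -> Dict[str, int]:
    """Sum the rss column (in pages) per process name and convert it to kB."""
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        m = TASK_RE.search(line)
        if m:  # the header line and non-task lines do not match
            totals[m.group("name")] += int(m.group("rss")) * page_kb
    return dict(totals)
```

Sorting the result by value, for example `sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:10]`, lists the programs holding the most memory at the time of the OOM event.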


RSS stands for Resident Set Size: the amount of physical memory a process actually occupies.

It equals the number of pages the process has resident in memory multiplied by the page size.

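For a more detailed analysis of a process, check the following files:
/proc/$PID/status (human-readable summary, including VmRSS)
/proc/$PID/statm (memory sizes in pages: size, resident, shared, text, lib, data, dt)
/proc/$PID/smaps (per-mapping breakdown, including Rss and Pss for each memory region)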
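The relationship RSS = resident pages x page size can be checked directly against these files. A minimal sketch (Linux only; the function names are hypothetical): it reads VmRSS from status and the second field of statm (resident pages), converts the latter with the system page size, and returns both so they can be compared; since both are derived from the same kernel counters, they should normally agree.

```python
import os
from typing import Dict


def parse_status_kb(text: str) -> Dict[str, int]:
    """Parse 'Key:   123 kB' lines from /proc/<PID>/status into {key: kB}."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            values[key] = int(parts[0])
    return values


def statm_resident_kb(text: str, page_size: int) -> int:
    """The 2nd field of /proc/<PID>/statm is resident pages; convert to kB."""
    resident_pages = int(text.split()[1])
    return resident_pages * page_size // 1024


def rss_report(pid: str = "self") -> Dict[str, int]:
    """Return VmRSS from status next to resident pages x page size from statm."""
    with open(f"/proc/{pid}/status") as f:
        status = parse_status_kb(f.read())
    with open(f"/proc/{pid}/statm") as f:
        statm_kb = statm_resident_kb(f.read(), os.sysconf("SC_PAGE_SIZE"))
    return {"VmRSS_kB": status.get("VmRSS", 0), "statm_resident_kB": statm_kb}
```

For example, `rss_report("1234")` inspects PID 1234 (reading another user's process may require root).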




Conclusion






