Linux server out of memory crash

Gsp

Hi guys, I'm hosting a server which runs on a fast dedicated server. TFS is the only major application running on it. It uses about 20% of the memory and around 5% of the CPU. It is not an active server, but players log in from time to time. Yesterday the game server randomly crashed for the second time. When I checked the SSH command line, it just said "Killed". A player was online during the crash, so it made me wonder if it was done deliberately.

This is how it crashed from the syslog:

Code:
Sep 26 01:08:39 ns3067 systemd[1]: exim4-base.service: Succeeded.
Sep 26 01:08:39 ns3067 systemd[1]: Finished exim4-base housekeeping.
Sep 26 01:08:39 ns3067 systemd[1]: exim4-base.service: Consumed 9.556s CPU time.
Sep 26 01:08:39 ns3067 systemd[1]: Starting Rotate log files...
Sep 26 01:08:53 ns3067 kernel: [1459646.523577] f2b/f.sshd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Sep 26 01:08:53 ns3067 kernel: [1459646.523579] f2b/f.sshd cpuset=/ mems_allowed=0
Sep 26 01:08:53 ns3067 kernel: [1459646.523583] CPU: 4 PID: 720 Comm: f2b/f.sshd Tainted: G          I       4.19.0-17-amd64 #1 Debian 4.19.194-3
Sep 26 01:08:53 ns3067 kernel: [1459646.523584] Hardware name: Supermicro X8STi/X8STi, BIOS 2.0        09/17/10 
Sep 26 01:08:53 ns3067 kernel: [1459646.523584] Call Trace:
Sep 26 01:08:53 ns3067 kernel: [1459646.523592]  dump_stack+0x66/0x81
Sep 26 01:08:53 ns3067 kernel: [1459646.523595]  dump_header+0x6b/0x283
Sep 26 01:08:53 ns3067 kernel: [1459646.523598]  ? do_try_to_free_pages+0x2ec/0x370
Sep 26 01:08:53 ns3067 kernel: [1459646.523599]  oom_kill_process.cold.30+0xb/0x1cf
Sep 26 01:08:53 ns3067 kernel: [1459646.523602]  ? oom_badness+0x23/0x140
Sep 26 01:08:53 ns3067 kernel: [1459646.523603]  out_of_memory+0x1a5/0x450
Sep 26 01:08:53 ns3067 kernel: [1459646.523605]  __alloc_pages_slowpath+0xbd8/0xcb0
Sep 26 01:08:53 ns3067 kernel: [1459646.523607]  __alloc_pages_nodemask+0x28b/0x2b0
Sep 26 01:08:53 ns3067 kernel: [1459646.523609]  filemap_fault+0x333/0x780
Sep 26 01:08:53 ns3067 kernel: [1459646.523611]  ? alloc_set_pte+0xf2/0x560
Sep 26 01:08:53 ns3067 kernel: [1459646.523612]  ? filemap_map_pages+0x360/0x3a0
Sep 26 01:08:53 ns3067 kernel: [1459646.523636]  ext4_filemap_fault+0x2c/0x40 [ext4]
Sep 26 01:08:53 ns3067 kernel: [1459646.523638]  __do_fault+0x36/0x130
Sep 26 01:08:53 ns3067 kernel: [1459646.523639]  __handle_mm_fault+0xdf9/0x11f0
Sep 26 01:08:53 ns3067 kernel: [1459646.523641]  handle_mm_fault+0xd6/0x200
Sep 26 01:08:53 ns3067 kernel: [1459646.523643]  __do_page_fault+0x249/0x4f0
Sep 26 01:08:53 ns3067 kernel: [1459646.523646]  ? page_fault+0x8/0x30
Sep 26 01:08:53 ns3067 kernel: [1459646.523647]  page_fault+0x1e/0x30
Sep 26 01:08:53 ns3067 kernel: [1459646.523649] RIP: 0033:0x5d0949
Sep 26 01:08:53 ns3067 kernel: [1459646.523652] Code: Bad RIP value.
Sep 26 01:08:53 ns3067 kernel: [1459646.523653] RSP: 002b:00007f750affbfb0 EFLAGS: 00010202
Sep 26 01:08:53 ns3067 kernel: [1459646.523654] RAX: 00007f751057ff60 RBX: 0000000000000000 RCX: 0000000000ee1ea8
Sep 26 01:08:53 ns3067 kernel: [1459646.523655] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000fc0390
Sep 26 01:08:53 ns3067 kernel: [1459646.523655] RBP: 000000000073ef00 R08: 0000000000d8bbc0 R09: 0000000000905bc0
Sep 26 01:08:53 ns3067 kernel: [1459646.523656] R10: 0000000000d8be40 R11: 0000000000000008 R12: 00007f75103bae00
Sep 26 01:08:53 ns3067 kernel: [1459646.523657] R13: 00007f75103bb710 R14: 00007f75114b70b0 R15: 00007f7510b7a990
Sep 26 01:08:53 ns3067 kernel: [1459646.523658] Mem-Info:
Sep 26 01:08:53 ns3067 kernel: [1459646.523661] active_anon:3626675 inactive_anon:384094 isolated_anon:0
Sep 26 01:08:53 ns3067 kernel: [1459646.523661]  active_file:103 inactive_file:38 isolated_file:0
Sep 26 01:08:53 ns3067 kernel: [1459646.523661]  unevictable:0 dirty:54 writeback:0 unstable:0
Sep 26 01:08:53 ns3067 kernel: [1459646.523661]  slab_reclaimable:11496 slab_unreclaimable:24062
Sep 26 01:08:53 ns3067 kernel: [1459646.523661]  mapped:1186 shmem:2097 pagetables:8958 bounce:0
Sep 26 01:08:53 ns3067 kernel: [1459646.523661]  free:33200 free_pcp:1416 free_cma:0
Sep 26 01:08:53 ns3067 kernel: [1459646.523664] Node 0 active_anon:14506700kB inactive_anon:1536376kB active_file:412kB inactive_file:152kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4744kB dirty:216kB writeback:0kB shmem:8388kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1953792kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Sep 26 01:08:53 ns3067 kernel: [1459646.523665] Node 0 DMA free:15884kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Sep 26 01:08:53 ns3067 kernel: [1459646.523667] lowmem_reserve[]: 0 3466 15982 15982 15982
Sep 26 01:08:53 ns3067 kernel: [1459646.523669] Node 0 DMA32 free:64356kB min:14640kB low:18300kB high:21960kB active_anon:3506928kB inactive_anon:52kB active_file:56kB inactive_file:52kB unevictable:0kB writepending:4kB present:3644928kB managed:3579360kB mlocked:0kB kernel_stack:16kB pagetables:6836kB bounce:0kB free_pcp:24kB local_pcp:0kB free_cma:0kB
Sep 26 01:08:53 ns3067 kernel: [1459646.523671] lowmem_reserve[]: 0 0 12516 12516 12516
Sep 26 01:08:53 ns3067 kernel: [1459646.523673] Node 0 Normal free:52560kB min:52872kB low:66088kB high:79304kB active_anon:11000032kB inactive_anon:1536324kB active_file:408kB inactive_file:352kB unevictable:0kB writepending:92kB present:13107200kB managed:12821332kB mlocked:0kB kernel_stack:2928kB pagetables:28996kB bounce:0kB free_pcp:5640kB local_pcp:648kB free_cma:0kB
Sep 26 01:08:53 ns3067 kernel: [1459646.523676] lowmem_reserve[]: 0 0 0 0 0
Sep 26 01:08:53 ns3067 kernel: [1459646.523677] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
Sep 26 01:08:53 ns3067 kernel: [1459646.523682] Node 0 DMA32: 753*4kB (UME) 283*8kB (UM) 103*16kB (UM) 51*32kB (UME) 18*64kB (UME) 9*128kB (UM) 3*256kB (UME) 2*512kB (UE) 3*1024kB (UME) 2*2048kB (ME) 11*4096kB (M) = 64876kB
Sep 26 01:08:53 ns3067 kernel: [1459646.523688] Node 0 Normal: 549*4kB (UME) 502*8kB (UME) 795*16kB (UME) 328*32kB (UME) 147*64kB (UME) 45*128kB (UME) 12*256kB (UME) 7*512kB (M) 2*1024kB (M) 0*2048kB 0*4096kB = 53300kB
Sep 26 01:08:53 ns3067 kernel: [1459646.523695] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Sep 26 01:08:53 ns3067 kernel: [1459646.523695] 2729 total pagecache pages
Sep 26 01:08:53 ns3067 kernel: [1459646.523696] 202 pages in swap cache
Sep 26 01:08:53 ns3067 kernel: [1459646.523697] Swap cache stats: add 262629, delete 262427, find 9815/10060
Sep 26 01:08:53 ns3067 kernel: [1459646.523697] Free swap  = 0kB
Sep 26 01:08:53 ns3067 kernel: [1459646.523698] Total swap = 1048568kB
Sep 26 01:08:53 ns3067 kernel: [1459646.523698] 4192024 pages RAM
Sep 26 01:08:53 ns3067 kernel: [1459646.523699] 0 pages HighMem/MovableOnly
Sep 26 01:08:53 ns3067 kernel: [1459646.523699] 87880 pages reserved
Sep 26 01:08:53 ns3067 kernel: [1459646.523699] 0 pages hwpoisoned
Sep 26 01:08:53 ns3067 kernel: [1459646.523700] Tasks state (memory values in pages):
Sep 26 01:08:53 ns3067 kernel: [1459646.523700] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Sep 26 01:08:53 ns3067 kernel: [1459646.523704] [    310]     0   310    51085      237   421888        3          -250 systemd-journal
Sep 26 01:08:53 ns3067 kernel: [1459646.523705] [    324]     0   324     5323      276    73728        0         -1000 systemd-udevd
Sep 26 01:08:53 ns3067 kernel: [1459646.523707] [    389]     0   389      777       45    45056       11             0 mdadm
Sep 26 01:08:53 ns3067 kernel: [1459646.523708] [    479]     0   479    24955      305    69632       38             0 dhclient
Sep 26 01:08:53 ns3067 kernel: [1459646.523709] [    540]     0   540     1654       52    53248       17             0 cron
Sep 26 01:08:53 ns3067 kernel: [1459646.523710] [    541]   104   541     1983      118    53248       53          -900 dbus-daemon
Sep 26 01:08:53 ns3067 kernel: [1459646.523712] [    547]     0   547    63408     2471   180224        2             0 php-fpm8.0
Sep 26 01:08:53 ns3067 kernel: [1459646.523713] [    548]     0   548    55184      428    73728       13             0 rsyslogd
Sep 26 01:08:53 ns3067 kernel: [1459646.523714] [    553]     0   553     3395      187    65536       44             0 systemd-logind
Sep 26 01:08:53 ns3067 kernel: [1459646.523715] [    555]   105   555     1485       50    53248       26             0 nscd
Sep 26 01:08:53 ns3067 kernel: [1459646.523717] [    557]     0   557    99232     4146   143360      316             0 fail2ban-server
Sep 26 01:08:53 ns3067 kernel: [1459646.523718] [    566]     0   566      682       13    45056        5             0 agetty
Sep 26 01:08:53 ns3067 kernel: [1459646.523719] [    576]   106   576    18608      147    61440       29             0 ntpd
Sep 26 01:08:53 ns3067 kernel: [1459646.523720] [    598]     0   598     4668      194    57344       31             0 nginx
Sep 26 01:08:53 ns3067 kernel: [1459646.523722] [    599]    33   599     4795      538    65536       19             0 nginx
Sep 26 01:08:53 ns3067 kernel: [1459646.523723] [    600]    33   600     4936      697    65536       26             0 nginx
Sep 26 01:08:53 ns3067 kernel: [1459646.523724] [    602]    33   602     4849      598    65536       26             0 nginx
Sep 26 01:08:53 ns3067 kernel: [1459646.523725] [    603]    33   603     4787      531    65536       27             0 nginx
Sep 26 01:08:53 ns3067 kernel: [1459646.523727] [    604]     0   604     3302      202    61440       32         -1000 sshd
Sep 26 01:08:53 ns3067 kernel: [1459646.523728] [    666]   109   666   599310    97751   991232      272             0 mariadbd
Sep 26 01:08:53 ns3067 kernel: [1459646.523729] [   1070]   108  1070     4572      157    77824       86             0 exim4
Sep 26 01:08:53 ns3067 kernel: [1459646.523731] [   1084]     0  1084     3842      310    73728       51             0 systemd
Sep 26 01:08:53 ns3067 kernel: [1459646.523732] [   1085]     0  1085    41688      496    94208      214             0 (sd-pam)
Sep 26 01:08:53 ns3067 kernel: [1459646.523733] [   1121]  1001  1121     1733        2    45056      146             0 screen
Sep 26 01:08:53 ns3067 kernel: [1459646.523735] [   1122]  1001  1122     2000        2    53248      384             0 bash
Sep 26 01:08:53 ns3067 kernel: [1459646.523736] [   1127]  1001  1127  4197431  3894541 33419264   260307             0 theforgottenser
Sep 26 01:08:53 ns3067 kernel: [1459646.523738] [ 368445]    33 368445    63477     2599   188416        1             0 php-fpm8.0
Sep 26 01:08:53 ns3067 kernel: [1459646.523739] [ 368494]    33 368494    63464     2595   188416        1             0 php-fpm8.0
Sep 26 01:08:53 ns3067 kernel: [1459646.523740] [ 368495]    33 368495    63464     2593   188416        1             0 php-fpm8.0
Sep 26 01:08:53 ns3067 kernel: [1459646.523741] [ 369079]     0 369079     5721     2543    86016        0             0 certbot
Sep 26 01:08:53 ns3067 kernel: [1459646.523743] [ 369080]     0 369080      604       17    40960        0             0 sessionclean
Sep 26 01:08:53 ns3067 kernel: [1459646.523744] [ 369087]     0 369087      604       24    40960        0             0 sessionclean
Sep 26 01:08:53 ns3067 kernel: [1459646.523745] [ 369090]     0 369090     5500       30    53248        0             0 sort
Sep 26 01:08:53 ns3067 kernel: [1459646.523746] [ 369092]     0 369092     5500       30    53248        0             0 sort
Sep 26 01:08:53 ns3067 kernel: [1459646.523747] [ 369093]     0 369093      604       24    40960        0             0 sessionclean
Sep 26 01:08:53 ns3067 kernel: [1459646.523749] [ 369101]     0 369101    20724      796   143360        0             0 php8.0
Sep 26 01:08:53 ns3067 kernel: [1459646.523750] [ 369116]     0 369116     1617       62    53248        0             0 logrotate
Sep 26 01:08:53 ns3067 kernel: [1459646.523751] [ 369126]     0 369126     1617       62    53248        0             0 logrotate
Sep 26 01:08:53 ns3067 kernel: [1459646.523752] Out of memory: Kill process 1127 (theforgottenser) score 953 or sacrifice child
Sep 26 01:08:53 ns3067 kernel: [1459646.523820] Killed process 1127 (theforgottenser) total-vm:16789724kB, anon-rss:15578164kB, file-rss:0kB, shmem-rss:0kB
Sep 26 01:08:53 ns3067 systemd[1]: session-1.scope: A process of this unit has been killed by the OOM killer.
Sep 26 01:08:53 ns3067 systemd[1]: phpsessionclean.service: Succeeded.
Sep 26 01:08:53 ns3067 systemd[1]: Finished Clean php session files.
Sep 26 01:08:53 ns3067 systemd[1]: phpsessionclean.service: Consumed 11.894s CPU time.
Sep 26 01:08:53 ns3067 kernel: [1459647.187742] oom_reaper: reaped process 1127 (theforgottenser), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Sep 26 01:08:53 ns3067 mariadbd[666]: 2021-09-26  1:08:53 6 [Warning] Aborted connection 6 to db: 'dbaseuser' user: 'root' host: 'localhost' (Got an error reading communication packets)
Sep 26 01:08:53 ns3067 systemd[1]: certbot.service: Succeeded.
Sep 26 01:08:53 ns3067 systemd[1]: Finished Certbot.
Sep 26 01:08:53 ns3067 systemd[1]: certbot.service: Consumed 12.674s CPU time.
Sep 26 01:08:54 ns3067 systemd[1]: rsyslog.service: Sent signal SIGHUP to main process 548 (rsyslogd) on client request.

I'll be very happy if someone can help me read this log and find out what might be the problem.
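
One way to read the task table in that dump: the rss column is counted in pages, so with 4 KiB pages the theforgottenser row (3894541 pages) comes to 15578164 kB, which matches the anon-rss:15578164kB in the "Killed process" line, and "Free swap = 0kB" shows the swap was used up as well. The sketch below pulls the biggest resident-memory rows out of such a log. `oom_top_rss` is a hypothetical helper name, and it assumes 4 KiB pages and process names without spaces.

```shell
# Print the largest resident-memory consumers from an OOM report on stdin.
# Output: "<rss in KiB> <process name>", biggest first.
# Assumptions: 4 KiB pages, process names contain no spaces.
oom_top_rss() {
    awk '
        # Task-table rows look like: "... [  pid  ] uid tgid total_vm rss ..."
        /\] \[ *[0-9]+\] +[0-9]+ +[0-9]+ +[0-9]+ +[0-9]+ / {
            # Count columns from the end so the bracketed pid does not matter.
            rss_kib = $(NF - 4) * 4
            printf "%d %s\n", rss_kib, $NF
        }
    ' | sort -rn | head -n "${1:-5}"
}
```

Possible usage, assuming the report is in /var/log/syslog: `grep 'kernel:' /var/log/syslog | oom_top_rss 3`.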
 
Solution
Alright, I am not the right person for this but I am interested in:

  • What are the specs of the machine you're using, especially the RAM? Obviously I mean the host machine, since this is not hosted on an actual PC.
  • Are you using Nginx or Apache? From what I can see, it looks like Apache?

Code: Bad RIP value. (This appears in the log file.)
This came out of a google search:
This suggests errors like not properly initializing a pointer to a function before trying to use it, or perhaps overwriting a function's return address in stack, so the RET machine code instruction will end up trying to return to a wrong address


Could it be that your server is too big for the machine to handle (RAM)? The player might have done something specific in the game that led to the server crashing.
- If it's possible to get this info from the player, it would help to solve it.


- Out of memory means that the server uses more RAM than the machine can handle, which leads the kernel to kill the server. What map are you running, and are any heavy systems installed on it?
(Here comes the RAM question: how much?)

I've got some more questions but I'm waiting on this. I love this and would love to solve it.
 
Thank you Klonera for your help. lol I know what you mean, solving problems is one of the reasons I love ots :p

Here is some info about the server:

Intel W3520 - 16GB DDR3 ECC 1333 MHz - 240GB SSD
Bash:
root@ns3067:~# free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       3.3Gi        12Gi       3.0Mi       325Mi        12Gi
Swap:          1.0Gi          0B       1.0Gi

and game server:
TFS 0.4 8.6, real map. Using nginx as the web server. I've been hosting this for years; the game server had an uptime of about 400 days before I upgraded to Debian 11 (from Debian 8).
It has been extremely stable, but I had to make some changes to the OTS source to be able to compile it on Debian 11. Still, I don't think this is the reason behind the crashes, since I was able to host it for about a month without problems after upgrading the system.

It crashed again today, and there were no players online. I still can't figure out whether it is the game server that is choking the host. I might try hosting it on a different PC to see how it runs.
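
To tell whether TFS itself is the process that keeps growing, its resident memory can be sampled over time and compared between samples. A minimal sketch, assuming a Linux /proc filesystem; `rss_kib`, `watch_rss` and the `tfs_rss.log` file name are made up for illustration.

```shell
# Print the resident set size (VmRSS) of a PID in KiB.
# Prints nothing and fails if the process does not exist.
rss_kib() {
    awk '$1 == "VmRSS:" { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Append "timestamp rss_kib" to a log file every N seconds until the process exits.
# Arguments: pid [interval_seconds] [log_file]; the defaults are illustrative.
watch_rss() {
    pid=$1
    interval=${2:-60}
    log=${3:-tfs_rss.log}
    while rss=$(rss_kib "$pid") && [ -n "$rss" ]; do
        printf '%s %s\n' "$(date '+%F %T')" "$rss" >> "$log"
        sleep "$interval"
    done
}
```

For example, `watch_rss 1127 60 &` would log once a minute, where 1127 stands for the current PID of the game server. A number that climbs steadily while nobody is online would point at the server process rather than the host.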

I changed theforgottenserver's oom_score_adj to -1000 to save it from the OOM killer, but then it started killing everything else except TFS.
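
Exempting TFS from the OOM killer does not free any memory, so the kernel simply picks other victims. An alternative is to cap how much TFS may allocate, so that it fails on its own before the host runs dry. A minimal sketch using the shell's `ulimit -v` (virtual address space, in KiB); `run_capped` is a hypothetical helper, and the binary path and the 8 GiB figure in the usage line are assumptions.

```shell
# Run a command with its virtual address space capped at the given number of GiB.
# The limit is set inside a subshell, so the calling shell stays unlimited.
run_capped() {
    (
        limit_gib=$1
        shift
        # ulimit -v takes KiB, so convert GiB -> KiB.
        ulimit -v "$((limit_gib * 1024 * 1024))" || exit 1
        exec "$@"
    )
}
```

Possible usage: `run_capped 8 ./theforgottenserver`. Under the cap, allocations start failing once the limit is reached, so the server will most likely crash by itself instead of dragging other services down; this contains the damage but does not fix whatever is eating the memory.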

Code: Bad RIP value
Like you mentioned, I think the problem lies here. It might be a kernel issue. I have updated and upgraded the system and the kernel. Let's see what happens.

I have also tested the RAM to see if it is a hardware issue.

Bash:
root@ns3067:~# memtester 100 1
memtester version 4.5.0 (64-bit)
Copyright (C) 2001-2020 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 100MB (104857600 bytes)
got  100MB (104857600 bytes), trying mlock ...locked.
Loop 1/1:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok
  8-bit Writes        : ok
  16-bit Writes       : ok

Done.
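
Note that `memtester 100 1` only locks and tests 100 MB, a small slice of the 16 GB installed, so a clean pass says little about the rest. A sketch for picking a larger size from the MemAvailable line of /proc/meminfo; `memtester_size_mb` is a hypothetical helper and the 80% default is an arbitrary choice.

```shell
# Suggest a memtester size in MB as a percentage of MemAvailable.
# Reads /proc/meminfo-style text on stdin; argument: percentage (default 80).
memtester_size_mb() {
    awk -v pct="${1:-80}" '
        # MemAvailable is reported in kB; convert the chosen share to MB.
        $1 == "MemAvailable:" { printf "%d\n", $2 * pct / 100 / 1024 }
    '
}
```

Possible usage, ideally with the game server stopped: `memtester "$(memtester_size_mb < /proc/meminfo)" 1`.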
 
I can't spot a cause from this, unless it's the DDR3 memory, though I have no clue why I think that would be an issue.
Try running it on another PC as you said. Could the upgrade have messed something up? Check it again and see if you can spot anything there.

Also, this came up in the console logs on your website.
Maybe ports are being blocked?
2d4d9e2b45c5c698303a978a1dc75f40.png
 