TFS 0.X Crashes I'm facing with

kor · Apr 18, 2021

Alpha · Apr 19, 2021

For the most information, instead of just running bt on the currently selected thread, you could paste us the output of this command: thread apply all bt full
This gives you a backtrace of every thread, and adding full after bt also prints the values of the local variables.
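If the crash leaves a core file, the same backtrace can be collected without an interactive session. This is only a minimal sketch: the helper names, the binary path and the core file name are assumptions for illustration, not something taken from the thread. It relies on gdb's -batch and -ex options.

```python
import subprocess

def gdb_backtrace_command(binary, core):
    """Build a gdb invocation that prints a full backtrace of every thread, then exits."""
    return ["gdb", "-batch", "-ex", "thread apply all bt full", binary, core]

def save_backtrace(binary, core, out_path):
    """Run gdb non-interactively and write everything it prints to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(
            gdb_backtrace_command(binary, core),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=False,  # gdb may exit non-zero on a damaged core; keep the partial output
        )
```

For example, save_backtrace("./tfs", "core", "backtrace.txt") would write the dump to backtrace.txt (all three paths are hypothetical), ready to paste into the thread.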

Paco · Apr 19, 2021

I am your Mexican alter ego. I run the Rookgaard Tales server. I faced many crash problems in the past too. I've fixed them all and the server is stable now. Maybe I can help you. Send me a message or pay a visit to my server and we can have a chat!

kor · Apr 19, 2021

@Alpha Thanks, I didn't know such tricks! I will wait for one more crash and update the main post.
@Paco Hello, I will

Well, every server is stable until it faces a crash

Gesior.pl · Apr 19, 2021

@kor
1. Are you sure you are not running out of RAM? Is it possible that someone is attacking you by opening XXX.XXX connections to make you run out of RAM?
Do you use a firewall to limit connections per minute per IP?

EDIT:
2. Did you compile the engine on the machine you are running it on, or did you copy the binary from another machine?
3. Did you switch machines, reinstall Linux or update any Linux packages?

kor · Apr 19, 2021

Hello.

1. My VPS has 2 GB RAM: TFS uses 524 MB, MySQL 421 MB, and PHP together with nginx and node scripts 202 MB. The rest (~850 MB) is free, and some of it sits in buff/cache, which I clear at every server save at 06:00 with sync; echo 1 > /proc/sys/vm/drop_caches. But the graphs show something different at the time of the crashes, so that might be the case.

As for the connection limit, the only iptables entry I use on that port is iptables -A INPUT -p tcp --syn --dport XXX -m connlimit --connlimit-above 3 -j REJECT

2. Yes, the engine was compiled on the same machine and configuration it is running on.
3. In 2018 I moved to the machine I'm currently on and, according to the apt logs, the last time I updated packages was in Feb 2020.
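The "free" figure from point 1 can be cross-checked against the kernel's own MemAvailable estimate in /proc/meminfo, which already accounts for reclaimable cache. The sketch below is an illustration only; the function names are made up here and nothing like it was posted in the thread.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> numeric value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

def available_mb(fields):
    """Return MemAvailable in MB (1 MB = 1024 kB), or None if the field is missing."""
    kb = fields.get("MemAvailable")
    return None if kb is None else kb / 1024

def read_available_mb(path="/proc/meminfo"):
    """Read the live value from the running kernel (Linux only)."""
    with open(path) as f:
        return available_mb(parse_meminfo(f.read()))
```

Logging read_available_mb() from a frequent cron job or a small loop gives a number that can be compared directly with the ~850 MB estimate above.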

Gesior.pl · Apr 19, 2021

Running close to the RAM limit is the easiest way to get random crashes.
For some reason 4 crashes were in the 'ConnectionManager::createConnection' function and the 5th was in another network-related function. The easiest explanation would be an attack with mass connections that makes the server run out of RAM within 1 second. Another possible reason is a bug in some C++ library (boost?), but finding it would be super hard.

The first step would be to upgrade to 4 GB RAM. You can also log new connections ( TCPDUMP capture new connections only (https://serverfault.com/questions/798745/tcpdump-capture-new-connections-only) ). After a crash you can check whether there was a spike in the number of connections just before it.
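Checking such a log for a spike can be scripted. The sketch below assumes tcpdump's default one-line text output, as in the lines pasted later in this thread; the function names and the threshold are illustrative assumptions.

```python
import re
from collections import Counter

# Matches lines such as:
# "18:29:54.414073 IP 185.107.80.219.38708 > server_ip.server_port: Flags [S], seq ..."
# Only pure SYN packets ("[S]") are counted; SYN-ACK replies ("[S.]") are skipped.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\.\d+ IP (\S+) > (\S+): Flags \[S\]")

def syn_counts(lines):
    """Count new-connection attempts per second and per source host."""
    per_second = Counter()
    per_source = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        second, src, _dst = m.groups()
        host = src.rsplit(".", 1)[0]  # tcpdump appends the source port after the last dot
        per_second[second] += 1
        per_source[host] += 1
    return per_second, per_source

def spikes(per_second, threshold):
    """Return (second, count) pairs with at least `threshold` SYNs, in time order."""
    return sorted((s, n) for s, n in per_second.items() if n >= threshold)
```

Feeding it the saved tcpdump output and calling spikes(per_second, 50) would list every second with 50 or more new connections; 50 is an arbitrary starting point, not a recommended value.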

Paco · Apr 19, 2021

That's true, I had to expand the RAM as well.

kor · Apr 27, 2021

@Gesior.pl someone attacked me again. He was prepared this time: during the investigation (I have a system which records players' packets to and from the server) I found that someone cloned some items (but that's just a side effect I'm able to revert). It happened 3 times today:

18:30:30 - console: just Segmentation fault (core dumped); gdb output of thread apply all bt full, as @Alpha mentioned: Pastebin.com (https://pastebin.com/Lfkyg5Wp)
18:36:45 - console: *** Error in `./tfs0': corrupted size vs. prev_size: 0x0000000000b24300 *** - Pastebin.com (https://pastebin.com/7mqm7jAP); gdb: Pastebin.com (https://pastebin.com/1NVwS2Ka)
18:42:13 - console: *** Error in `./tfs0': corrupted size vs. prev_size: 0x000000000172f7a0 *** - Pastebin.com (https://pastebin.com/DpjUBmkN); gdb: Pastebin.com (https://pastebin.com/i2jJrRbn)

As you asked, tcpdump was running with the command tcpdump -i ens3 "port port_here and tcp[tcpflags] & (tcp-syn) != 0" and here is its output: Pastebin.com (https://pastebin.com/KCJ9kU2y)

I'm still on 2 GB RAM, but this time memory was at a good enough level at the time of the crashes (the drops are caused by the sync; echo 1 > /proc/sys/vm/drop_caches cron command) and there weren't any spikes while I was monitoring live.

Because items were cloned, I can rule out a random crash. Also, I've extended tcpdump to tcpdump -i ens3 -X -vv -e "portrange 7000-8000 and tcp[tcpflags] & (tcp-syn) != 0" to check whether other ports are being hit too.

Gesior.pl · Apr 27, 2021

Monitoring checks RAM every X seconds. It cannot detect a 1-second spike.
From that dump I can only read that AFTER the crash people tried to connect to the OTS again, and that the last connection that reached the server and - probably - crashed it was:

Code:

18:30:31.803718 IP 89-64-118-68.dynamic.chello.pl.6694 > server_ip.server_port: Flags [S], seq 3009279448, win 64240, options [mss 1420,nop,wscale 8,nop,nop,sackOK], length 0

but it looks absolutely normal.

How much RAM was allocated by the OTS at the moment of the crash? Check the size of the 'core' file.
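Because periodic monitoring can miss a short spike, one option while the process is still alive is to read the kernel's own high-water mark, the VmHWM field (peak resident set size) in /proc/<pid>/status. The sketch below is only an illustration with made-up function names; it cannot see a spike that kills the process before the next poll, which is why the core file size remains the better indicator after a crash.

```python
import time

def parse_status_kb(text, field):
    """Return the kB value of a /proc/<pid>/status field such as VmRSS or VmHWM."""
    for line in text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    return None

def watch_peak(pid, interval=0.1, samples=50):
    """Poll VmHWM and yield (time, kB) each time the recorded peak grows."""
    last = 0
    for _ in range(samples):
        try:
            with open(f"/proc/{pid}/status") as f:
                hwm = parse_status_kb(f.read(), "VmHWM")
        except (FileNotFoundError, ProcessLookupError):
            return  # the process has exited
        if hwm is not None and hwm > last:
            last = hwm
            yield time.strftime("%H:%M:%S"), hwm
        time.sleep(interval)
```

Since the kernel updates VmHWM itself, a spike between two polls still shows up in the next reading as long as the process survives it.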

About the "Fir3element/3777" engine: I updated it yesterday to make it compile on Ubuntu 20.04. I have updated many engines, but that one was the worst. It has more errors than 0.4 3777 and OTX2 together. If the compiler reports errors like 'return false from function that should return std::string', there are likely a hundred other logic errors that the compiler cannot detect.

eyez · Apr 27, 2021

@kor How did you catch the cloners?

@Gesior.pl I also use this engine here...

"I updated it yesterday to make it compilable on Ubuntu 20.04"
What do you mean by this? Do you mean that it can now be compiled on Ubuntu 20.04?
I ask because I'm using it here on Debian 9.
"It has more errors than 0.4 3777": what do you mean?

Where are your changes?

Gesior.pl · Apr 27, 2021

@eyez I made the changes on someone's VPS. I don't have a copy of those sources. If you need a version that compiles on Debian 10 / Ubuntu 20.04, you can create it on your own using this tutorial (error no. 15 occurred for the first time in these sources): [C++/Linux] Compiling old engine (sources) on Debian 10 / Ubuntu 20.04 (https://otland.net/threads/c-linux-compiling-old-engine-sources-on-debian-10-ubuntu-20-04.274654/)
The VPS owner had already made all the changes; he just made a mistake fixing 'error no 1', so it did not compile.

"It has more errors than 0.4 3777" what are u mean?

Compilation fails with new errors that do not appear in 0.4 3777, so the author of that engine added new bugs of his own. Bugs so bad that the code cannot be compiled. If he added bugs like that, he probably added a hundred other bugs that the compiler does not detect but that may crash the server.

kor · Apr 27, 2021

@Gesior.pl Indeed, every connection on that game port looks normal, as players were trying to rejoin during server summary between crashes and when it was opened. My core files vary from 542 027 776 to 542 183 424 bytes. So when I normally have 400-500 MB free, could it really be an issue? I thought Fir3element actually was the 0.4 3777 version. Should I use the "clean" version of 3777 from Backup of some old sources (https://otland.net/threads/backup-of-some-old-sources.199436/) or migrate to 1.2 as soon as possible?
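To compare the core file sizes above with the per-process figures quoted earlier in the thread, they can be converted from bytes to MiB. This is plain arithmetic on the two sizes reported above; treating the core size as a rough proxy for the memory mapped at crash time is an assumption, since core dumps can omit some mappings.

```python
def bytes_to_mib(n):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n / (1024 * 1024)

# Smallest and largest core file sizes reported in the thread, in bytes.
core_sizes = [542_027_776, 542_183_424]

for size in core_sizes:
    print(f"{size} bytes = {bytes_to_mib(size):.1f} MiB")
# prints:
# 542027776 bytes = 516.9 MiB
# 542183424 bytes = 517.1 MiB
```

Both values are close to the 524 MB that TFS was reported to use during normal operation.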

P.S. Is it possible that the attack targets another port and the engine just got hit by a ricochet?

@eyez As I mentioned, I implemented a "cam" system a long time ago, so I can filter who was online just before a crash and watch their recordings later, like I do, for example, when sending proof to AFK botters.

Gesior.pl · Apr 27, 2021

kor said:
P.S. It's possible, that attack is done for other port and engine just got hit by ricochet?

Possible? Yes. Again, if I were the attacker, I would go after the RAM limit of your server: spam the website, or overload MySQL through website spam, to make it use more RAM. Then some app (the OTS) tries to allocate RAM and crashes.
It keeps crashing on 'new player connection'. There must be some problem with that code.

First I would move to a newer operating system and a 4 GB machine. If the problem is somehow system or library related, you will get new versions of the libraries and the bug may fix itself.

Pure 3777 (official otland repo): GitHub - otland/tfs-old-svn at r3777 (TFS repository once kept private, converted from SVN)

Pure 3777 with changes that make it compilable on Debian 10 and Ubuntu 20.04: GitHub - gesior/tfs_0.4_on_debian_10

Of course converting to 1.2 would be good, but if you want to use the 8.6 protocol, it will require a lot of work. There is no ready-to-run TFS 1.2 version for client 8.6. There are a lot of attempts, but everyone who uses them reports some problems with 8.6 features.

eyez · Apr 27, 2021

Nice, I was about to try moving my 8.6 server to 1.2.
But looking through the forum I found a lot of people asking for help because of bugs (my 0.4 is totally stable, I've got 400 hours of uptime).

kor said:
@eyez As I mentioned, I've implemented "cam" system long time ago, so I can filter who was online just before crash and watch their record later like for example I'm doing sending proofs for AFK botters

Lol, your server made this video?

kor · Apr 27, 2021

@Gesior.pl Thank you so much. For now, I will try to run the server on top of your changes with more RAM. I asked about 1.2 because my own migration began 1.5 years ago, but I never managed to finish and test it. Looks like the time has come.

@eyez every server is stable until someone takes it down

About the video: my server recorded it and "displayed" it for me in the client. I still have to screen-record it and put it on YT, but I have already automated most of it.
