gudan garam
Advanced OT User
If you really want to optimize it, I would recommend starting by learning some theory. The problem you are facing is the shortest path problem (fastest path in this case, but that's almost the same) - Shortest path problem - Wikipedia
The original solution uses something similar to Dijkstra's algorithm (complexity n log(n)). With your modification you changed it into a heuristic algorithm (also a wiki page - A* search algorithm - Wikipedia). It's going to help, but in this case only a bit.
The problem with the tfs implementation is that it doesn't use the processor cache well (it loads a lot of data from memory in random order).
If you don't know what I am talking about, I prepared a small example:
Code:
#include <chrono>
#include <iostream>

std::chrono::high_resolution_clock::time_point time_point;
uint64_t micros;

int main()
{
    static int T[1000][1000] = {0}; // static, so the 4 MB array doesn't overflow the stack

    time_point = std::chrono::high_resolution_clock::now();
    for(int x = 0; x < 1000; ++x)
    {
        for(int y = 0; y < 1000; ++y)
        {
            T[x][y] = x + y;
        }
    }
    micros = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::high_resolution_clock::now() - time_point).count();
    if(T[11][33] == 0) // without this check gcc would remove the loops during optimization, since T would otherwise be unused
        return 0;
    std::cout << "Took " << micros << " microseconds" << std::endl;

    time_point = std::chrono::high_resolution_clock::now();
    for(int x = 0; x < 1000; ++x)
    {
        for(int y = 0; y < 1000; ++y)
        {
            T[y][x] = x + y;
        }
    }
    micros = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::high_resolution_clock::now() - time_point).count();
    if(T[11][33] == 0)
        return 0;
    std::cout << "Took " << micros << " microseconds" << std::endl;
    return 0;
}
In this code you have 2 loops doing exactly the same thing, but their execution times are very different.
The first loop is almost 9x faster than the second! And as I said, both do exactly the same thing. It's good to understand why if you want to optimize code really efficiently.
Code:
Took 409 microseconds
Took 3677 microseconds
If you want to make that function really fast, you must keep all processing data in a memory block smaller than 4 KB (it's possible to fit it in less than 1 KB).
It's possible to speed this function up by at least 4x; in a normal case it uses ~10% of the CPU time used by tfs (for comparison, walking uses up to 30%).
This occurs because T is stored row by row, not column by column, so it is laid out in memory like this:
T[0][0], T[0][1], T[0][2], ..., T[999][998], T[999][999].
So when you access its memory in the first loop, the region you need is already in cache: when the processor pulls T[0][0] into a cache line, T[0][1], T[0][2], ... come along with it.
In the second loop you write T[0][0], then T[1][0], and these two locations have a thousand ints between them (the whole T[0] row), so the processor has to go out to main memory to fetch them, which takes more time (more cache misses).
I could be wrong
Edit--
I've recently changed a couple of std::lists to std::vector in tfs and that alone worked like magic for CPU usage, but unfortunately when a vector gets reallocated we get crashes because pointers into it are invalidated.