Optimizing TFS Pathfinding

Flatlander · Mar 20, 2018

I am making this tutorial to go along with my Monster View-Range Tutorial.

When adding more view-range to a monster, you quickly find your server is constantly freezing due to path-finding. (Especially if you are like me and want wolves to stalk your poor players from 50 tiles away)

To fix this, I added a bit more optimization to the Pathfinding in TFS.
Below is how you can also make these changes.
(I am self-taught, what I do does work, but it might not be the most efficient and best way to do it, if someone else knows how to make my code even more optimized please criticize and teach me how)

First we need to be able to pull the Target Position while in BestNode to help calculate the true "best node".

In Creature.cpp edit the following:

Code:

bool Creature::getPathTo(const Position& targetPos, std::forward_list<Direction>& dirList, const FindPathParams& fpp) const
{
   return g_game.map.getPathMatching(*this, dirList, FrozenPathingConditionCall(targetPos), fpp);
}

Add targetPos into getPathMatching after *this,

Code:

bool Creature::getPathTo(const Position& targetPos, std::forward_list<Direction>& dirList, const FindPathParams& fpp) const
{
   return g_game.map.getPathMatching(*this, targetPos, dirList, FrozenPathingConditionCall(targetPos), fpp);
}

In map.h edit the following:

Code:

       bool getPathMatching(const Creature& creature, std::forward_list<Direction>& dirList,
                            const FrozenPathingConditionCall& pathCondition, const FindPathParams& fpp) const;

Add targetPos here too:

Code:

       bool getPathMatching(const Creature& creature, const Position& targetPos, std::forward_list<Direction>& dirList,
                            const FrozenPathingConditionCall& pathCondition, const FindPathParams& fpp) const;

And also in map.h edit the following:

Code:

AStarNode* getBestNode();

Adding targetPos here too:

Code:

AStarNode* getBestNode(const Position& targetPos);

In map.cpp edit the following:

Code:

bool Map::getPathMatching(const Creature& creature, std::forward_list<Direction>& dirList, const FrozenPathingConditionCall& pathCondition, const FindPathParams& fpp) const

Add targetPos here too:

Code:

bool Map::getPathMatching(const Creature& creature, const Position& targetPos, std::forward_list<Direction>& dirList, const FrozenPathingConditionCall& pathCondition, const FindPathParams& fpp) const

Then edit this BestNode in map.cpp:

Code:

    AStarNode* found = nullptr;
   while (fpp.maxSearchDist != 0 || nodes.getClosedNodes() < 100) {
       AStarNode* n = nodes.getBestNode();

To include targetPos:

Code:

    AStarNode* found = nullptr;
   while (fpp.maxSearchDist != 0 || nodes.getClosedNodes() < 100) {
       AStarNode* n = nodes.getBestNode(targetPos);

And now for the big edit, the getBestNode() function itself:

Code:

AStarNode* AStarNodes::getBestNode()
{
   if (curNode == 0) {
       return nullptr;
   }

   int32_t best_node_f = std::numeric_limits<int32_t>::max();
   int32_t best_node = -1;
   for (size_t i = 0; i < curNode; i++) {
       if (openNodes[i] && nodes[i].f < best_node_f) {
           best_node_f = nodes[i].f;
           best_node = i;
       }
   }

   if (best_node >= 0) {
       return nodes + best_node;
   }
   return nullptr;
}

Change it to the above (I have comments on why each line exists)

Code:

AStarNode* AStarNodes::getBestNode(const Position& targetPos)
{
   if (curNode == 0) {
       return nullptr;
   }
  
   int32_t diffNode = 0; //Difference between nodes

   int32_t best_node_f = std::numeric_limits<int32_t>::max();
   int32_t best_node = -1;
   for (size_t i = 0; i < curNode; i++) {
       //I am not familiar with 1.X so there is probably a better way to get all these variables, also the below is the untested in TFS 1.X, hope it works
       const int_fast32_t Sx = nodes[0].x; //Starting X Position
       const int_fast32_t Sy = nodes[0].y; //Starting Y Position
       const int_fast32_t Cx = nodes[i].x; //node[i] X Position
       const int_fast32_t Cy = nodes[i].y; //node[i] Y Position
       int32_t SdiffX = std::abs(targetPos.x - Sx); //X distance from the starting location
       int32_t SdiffY = std::abs(targetPos.y - Sy); //Y distance from the starting location
       int32_t NdiffX = std::abs(targetPos.x - Cx); //X distance from node[i] position
       int32_t NdiffY = std::abs(targetPos.y - Cy); //X distance from node[i] position
       diffNode = ((nodes[i].f+(((NdiffX-SdiffX)*5)+((NdiffY-SdiffY)*5)))+(std::max(NdiffX, NdiffY)*5)); //I messed around with this formula a lot to try to get the best performance, for me this works the best
       if (openNodes[i] && diffNode < best_node_f) { //Compares the formula above with the current Best Node
           best_node_f = diffNode; //Sets the Best Nodes new value to beat
           best_node = i;
       }
   }

   if (best_node >= 0) {
       return nodes + best_node;
   }
   return nullptr;
}

The above SHOULD work.
The only thing that may be done incorrectly is getting the pos.x and pos.y from the Nodes[] (I do this differently in TFS 0.X, but tried to use the same way TFS 1.X has it done)

AFTER you do this, you should be able to go to my next tutorial adding monster ViewRange to the monsters.xml so you can have monsters that see farther than screen-size.

kito2 · Apr 29, 2018

Flatlander said:

I am making this tutorial to go along with my Monster View-Range Tutorial.

When adding more view-range to a monster, you quickly find your server is constantly freezing due to path-finding. (Especially if you are like me and want wolves to stalk your poor players from 50 tiles away)

To fix this, I added a bit more optimization to the Pathfinding in TFS.
Below is how you can also make these changes.
(I am self-taught, what I do does work, but it might not be the most efficient and best way to do it, if someone else knows how to make my code even more optimized please criticize and teach me how)

First we need to be able to pull the Target Position while in BestNode to help calculate the true "best node".

In Creature.cpp edit the following:

Code:

bool Creature::getPathTo(const Position& targetPos, std::forward_list<Direction>& dirList, const FindPathParams& fpp) const
{
   return g_game.map.getPathMatching(*this, dirList, FrozenPathingConditionCall(targetPos), fpp);
}

Add targetPos into getPathMatching after *this,

Code:

bool Creature::getPathTo(const Position& targetPos, std::forward_list<Direction>& dirList, const FindPathParams& fpp) const
{
   return g_game.map.getPathMatching(*this, targetPos, dirList, FrozenPathingConditionCall(targetPos), fpp);
}

In map.h edit the following:

Code:

       bool getPathMatching(const Creature& creature, std::forward_list<Direction>& dirList,
                            const FrozenPathingConditionCall& pathCondition, const FindPathParams& fpp) const;

Add targetPos here too:

Code:

       bool getPathMatching(const Creature& creature, const Position& targetPos, std::forward_list<Direction>& dirList,
                            const FrozenPathingConditionCall& pathCondition, const FindPathParams& fpp) const;

And also in map.h edit the following:

Code:

AStarNode* getBestNode();

Adding targetPos here too:

Code:

AStarNode* getBestNode(const Position& targetPos);

In map.cpp edit the following:

Code:

bool Map::getPathMatching(const Creature& creature, std::forward_list<Direction>& dirList, const FrozenPathingConditionCall& pathCondition, const FindPathParams& fpp) const

Add targetPos here too:

Code:

bool Map::getPathMatching(const Creature& creature, const Position& targetPos, std::forward_list<Direction>& dirList, const FrozenPathingConditionCall& pathCondition, const FindPathParams& fpp) const

Then edit this BestNode in map.cpp:

Code:

    AStarNode* found = nullptr;
   while (fpp.maxSearchDist != 0 || nodes.getClosedNodes() < 100) {
       AStarNode* n = nodes.getBestNode();

To include targetPos:

Code:

    AStarNode* found = nullptr;
   while (fpp.maxSearchDist != 0 || nodes.getClosedNodes() < 100) {
       AStarNode* n = nodes.getBestNode(targetPos);

And now for the big edit, the getBestNode() function itself:

Code:

AStarNode* AStarNodes::getBestNode()
{
   if (curNode == 0) {
       return nullptr;
   }

   int32_t best_node_f = std::numeric_limits<int32_t>::max();
   int32_t best_node = -1;
   for (size_t i = 0; i < curNode; i++) {
       if (openNodes[i] && nodes[i].f < best_node_f) {
           best_node_f = nodes[i].f;
           best_node = i;
       }
   }

   if (best_node >= 0) {
       return nodes + best_node;
   }
   return nullptr;
}

Change it to the above (I have comments on why each line exists)

Code:

AStarNode* AStarNodes::getBestNode(const Position& targetPos)
{
   if (curNode == 0) {
       return nullptr;
   }
  
   int32_t bestNodeF = 1000000; //High Cost to start with
   uint32_t bestNode = 0; //Current best node
   int32_t diffNode = 0; //Difference between nodes

   int32_t best_node_f = std::numeric_limits<int32_t>::max();
   int32_t best_node = -1;
   for (size_t i = 0; i < curNode; i++) {
       //I am not familiar with 1.X so there is probably a better way to get all these variables, also the below is the untested in TFS 1.X, hope it works
       const int_fast32_t Sx = nodes[0]->x; //Starting X Position
       const int_fast32_t Sy = nodes[0]->y; //Starting Y Position
       const int_fast32_t Cx = nodes[i]->x; //node[i] X Position
       const int_fast32_t Cy = nodes[i]->y; //node[i] Y Position
       int32_t SdiffX = std::abs(targetPos.x - Sx); //X distance from the starting location
       int32_t SdiffY = std::abs(targetPos.y - Sy); //Y distance from the starting location
       int32_t NdiffX = std::abs(targetPos.x - Cx); //X distance from node[i] position
       int32_t NdiffY = std::abs(targetPos.y - Cy); //X distance from node[i] position
       diffNode = ((nodes[i].f+(((NdiffX-SdiffX)*5)+((NdiffY-SdiffY)*5)))+(std::max(NdiffX, NdiffY)*5)); //I messed around with this formula a lot to try to get the best performance, for me this works the best
       if (openNodes[i] && diffNode < best_node_f) { //Compares the formula above with the current Best Node
           best_node_f = diffNode; //Sets the Best Nodes new value to beat
           best_node = i;
       }
   }

   if (best_node >= 0) {
       return nodes + best_node;
   }
   return nullptr;
}

The above SHOULD work.
The only thing that may be done incorrectly is getting the pos.x and pos.y from the Nodes[] (I do this differently in TFS 0.X, but tried to use the same way TFS 1.X has it done)

AFTER you do this, you should be able to go to my next tutorial adding monster ViewRange to the monsters.xml so you can have monsters that see farther than screen-size.

Nice!

Flatlander · Apr 30, 2018

I fixed all the code issues on this thread to match the Pull Request I made to TFS.

This is now tested (Thanks to Xagul) and it is 100% working.
Mark has expressed some concerns about it may not choosing the "best" path, I asked him to give an example of what could go wrong.

Shouldn't be too hard to fix any issue I don't think. I double-checked the code and can't find a way we would get a wrong path, so i'll have to wait for more information from him.

kito2 · Apr 30, 2018

Excellent news, I am going to integrate this optimization too.

4drik · May 6, 2018

I just tested it on YurOTS.
First time I made a test for the same path:

I saw that in the case of shorter distances, the old code is doing better.
However, it was only one first test.
So I decided to do the measurement a few more times - adding the result this time and comparing the final number.
It looked like this:

Final results:
New: 0.212 s, 0.143 s, 0.157 s
Old: 0.112 s, 0.12 s, 0.076 s

Each time the old bestNode function won.

Take a note that I just wanted to check how it works for me, for old server which is YurOTS.

Flatlander · May 28, 2018

Interesting, I guess you are going to make me do a testing video huh?

Also can you let me know how you tested it?

Also what path(s) did you run?

Flatlander · May 29, 2018

Below is testing on the Newest TFS Version.
It is on a fast computer, therefor we needed to find the path a thousand times to get even milliseconds to show up.

This is using the "Use" function to go to a tile, so this is running a lower number of max-nodes (I think 100 is the default for onUse movement)
When you see "No Way" it means it could not find a path to that location by the time it reached Max Nodes.

Maze:

Wall:

Open:

From this, I have determined that it takes around 20-25% more time to run my version of "getBestNode".
BUT, it more often reduces the amount of nodes required so greatly that it is still faster to use my new optimized system.

kito2 · May 29, 2018

Flatlander said:
Below is testing on the Newest TFS Version.
It is on a fast computer, therefor we needed to find the path a thousand times to get even milliseconds to show up.

This is using the "Use" function to go to a tile, so this is running a lower number of max-nodes (I think 100 is the default for onUse movement)
When you see "No Way" it means it could not find a path to that location by the time it reached Max Nodes.

Maze:

Wall:

Open:

From this, I have determined that it takes around 20-25% more time to run my version of "getBestNode".
BUT, it more often reduces the amount of nodes required so greatly that it is still faster to use my new optimized system.

Excellent!

kondra · May 29, 2018

if you really want to optimize it, i would recommend to start by learning some theory. The problem you are facing is the Shortest path problem (fastest path in this case but that's almost the same) - Shortest path problem - Wikipedia
Original solution is using something similar to Dijkstra's algorithm (complexity nlog(n)). With your modification you changes it to heuristic algorithm (also wiki page - A* search algorithm - Wikipedia ). It's gonna help, but in this case, only a bit.
The problem with tfs implementation is that it doesn't use processor cache correctly (it loads much data from memory in random order).
If you don't know what I am talking about, i prepared small example

Code:

#include <chrono>
#include <iostream>

std::chrono::high_resolution_clock::time_point time_point;
uint64_t micros;

int main() {
    int T[1000][1000] = {0};
    time_point = std::chrono::high_resolution_clock::now();
    for(int x = 0; x < 1000; ++x) {
        for(int y = 0; y < 1000; ++y) {
            T[x][y] = x + y;
        }
    }
    micros = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::high_resolution_clock::now() - time_point).count();
    if(T[11][33] == 0)  // without that gcc will remove for as part of optimization coz T won't be used
        return 0;
    std::cout << "Took " << micros << " microseconds" << std::endl;
    time_point = std::chrono::high_resolution_clock::now();
    for(int x = 0; x < 1000; ++x) {
        for(int y = 0; y < 1000; ++y) {
            T[y][x] = x + y;
        }
    }
    micros = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::high_resolution_clock::now() - time_point).count();
    if(T[11][33] == 0)
        return 0;
    std::cout << "Took " << micros << " microseconds" << std::endl;
    return 0;
}

In this code you have 2 loops, doing exactly the same, but execution time is much different.

Code:

Took 409 microseconds
Took 3677 microseconds

First loop is almost 9x faster than second! And as I said, both are doing exactly the same. It's good to know why if you want to start optimizing code in really efficient way.
If you want to make that function really fast, you must keep all processing data in memory block smaller than 4kb (it's possible to do it in smaller than 1kb block).
It's possible to speed up this function at least by 4 times, in normal case it's using ~10% of cpu time used by tfs (for eg. walking is using up to 30%).

Flatlander · May 29, 2018

kondra said:
if you really want to optimize it, i would recommend to start by learning some theory. The problem you are facing is the Shortest path problem (fastest path in this case but that's almost the same) - Shortest path problem - Wikipedia
Original solution is using something similar to Dijkstra's algorithm (complexity nlog(n)). With your modification you changes it to heuristic algorithm (also wiki page - A* search algorithm - Wikipedia ). It's gonna help, but in this case, only a bit.
The problem with tfs implementation is that it doesn't use processor cache correctly (it loads much data from memory in random order).
If you don't know what I am talking about, i prepared small example

Code:

#include <chrono> #include <iostream> std::chrono::high_resolution_clock::time_point time_point; uint64_t micros; int main() { int T[1000][1000] = {0}; time_point = std::chrono::high_resolution_clock::now(); for(int x = 0; x < 1000; ++x) { for(int y = 0; y < 1000; ++y) { T[x][y] = x + y; } } micros = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::high_resolution_clock::now() - time_point).count(); if(T[11][33] == 0) // without that gcc will remove for as part of optimization coz T won't be used return 0; std::cout << "Took " << micros << " microseconds" << std::endl; time_point = std::chrono::high_resolution_clock::now(); for(int x = 0; x < 1000; ++x) { for(int y = 0; y < 1000; ++y) { T[y][x] = x + y; } } micros = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::high_resolution_clock::now() - time_point).count(); if(T[11][33] == 0) return 0; std::cout << "Took " << micros << " microseconds" << std::endl; return 0; }

In this code you have 2 loops, doing exactly the same, but execution time is much different.

Code:

Took 409 microseconds Took 3677 microseconds

First loop is almost 9x faster than second! And as I said, both are doing exactly the same. It's good to know why if you want to start optimizing code in really efficient way.
If you want to make that function really fast, you must keep all processing data in memory block smaller than 4kb (it's possible to do it in smaller than 1kb block).
It's possible to speed up this function at least by 4 times, in normal case it's using ~10% of cpu time used by tfs (for eg. walking is using up to 30%).

First, I want to thank you for trying to help and teach me something I am completely ignorant on.
I have no idea how to optimize code by using different methods, or how to use hardware (like processor cache) to my advantage.

Things like this are basically impossible to "find" online.
If you google "Best way to ~~ in C++" you get thousands of competing theories of how you should do things, so I just end up doing what looks easiest to understand or what I know.

Now, onto your above post.
I'm trying to "understand" why the first one would be faster, so I'm guessing:

In my "Non-Programmer Mind" I visualize it like this.
T[1000][1000] creates a table inside a table you can put information in.

Below is why I think the first one runs faster. (Let me know if I am correct)

Lets say it's just: T[2][2]
Basically I visualize it as:

Code:

t = {
  [1] = {
    [1] = {},
    [2] = {},
  },
  [2] = {
    [1] = {},
    [2] = {},
  },
}

Your first code basically grabs 1 sub-table, then loops through it's content.
But your 2nd run alternates tables.

I'm guessing what you mean by Processor Cache, is that when you run:
T[1][1] = 1 + 1
Which is: (T[x][y] = x + y)
It pulls the sub-table into the Processor Cache:
so It now has
T[1] in cache so if you reference it again it can pull it super-fast.

I think the problem is that the nodes function is basically this:

Code:

nodes = { [0] = {x = 0, y = 0, z = 0, f = 0, prev = 0} [1] = {x = 1, y = 1, z = 0, f = 2, prev = 0} [2] = {x = 2, y = 1, z = 0, f = 4, prev = 1} }

There isn't a sub-table to loop through, and I'm guessing you are saying to try to somehow change the table-structure and the way it searches through it to utilize sub-tables so it can cache it and be faster.

I'm not sure how I would even achieve that at first thought, but I can brain-storm it I guess.

Also, I may be COMPLETELY incorrect on what I think you are trying to teach me, which is why I explained it in as much detail as possible so you can correct me if i'm wrong.

Let me know if I am right about how processor cache works using my above example.

Evil Hero · May 29, 2018

As far as I understand it's a different case.
In the faster example you already create the multidimensional array which means that 1000 x 1000 indices equal to 0
As you already have a pre created array and just change values it's a lot faster then create the index and re sign it's value, which is 2 steps instead of 1 (in c++ if I recall right)

In lua you have the same problem
table.insert(t, 1) is always slower then t[#t+1] = 1

kondra · May 29, 2018

i am bad at teaching and explaining, but google is better - What is a "cache-friendly" code?
Evil hero, maybe try to change order first and test it before posting - spoiler - it's not connected. And I talk about c++, not lua.

Evil Hero · May 29, 2018

nvm been overflying your post and had a main focus on his, so I missed the main point

just noticed that the indexing is swapped in both of the loops, makes sense now why it's tons slower.
In that case jumping through arrays and altering them is ofc a lot slower then loading an array loop through it and go onto the next upper array.
I should read more carefully next time, my fault

Flatlander · May 29, 2018

Evil Hero said:
nvm been overflying your post and had a main focus on his, so I missed the main point
just noticed that the indexing is swapped in both of the loops, makes sense now why it's tons slower.
In that case jumping through arrays and altering them is ofc a lot slower then loading an array loop through it and go onto the next upper array.
I should read more carefully next time, my fault

So it sounds like I was right.
When you load an Array, and loop through it, it is faster than altering which array you are dealing with each call.

But I still can't think of how I can apply this type of optimization to getBestNode or nodes.

kingsley666 · May 30, 2018

Hello, i really don't know how TFS cores works, but i was thinking on what you say about optimizing the BestNode(), this is what i thought (i don´t know the original code):

- Depends on the range of view of the creature, you can draw a square arround the creature based on its range of view, for each sqm you analize if it's a walkable sqm to help the algoryth determine the fastest way on walkable sqm, all that information will go to a table, then you locate the creature and the target positions and then you aply the theory for shorter way on graphs.

- In theory this should reduce de calculations about the best route to get the target.

darkshin · May 30, 2018

Not sure if this will help, but Ive tested and its working.
If someone wants to run a benchmark on it,
somehow my tries only printed 0.0000 ms

I've avoided to declare new variables each loop and copy values, keeping only pointers to the new values.
Im not sure its relevance since I couldn't benchmark it
Here it goes:

Code:

    uint16_t* Sx = 0;
    uint16_t* Sy = 0;
    uint16_t* Cx = 0;
    uint16_t* Cy = 0;
 
    for (size_t i = 0; i < curNode; i++) {
        Sx = &nodes[0].x; //Starting X Position
        Sy = &nodes[0].y; //Starting Y Position
        Cx = &nodes[i].x; //node[i] X Position
        Cy = &nodes[i].y; //node[i] Y Position

        diffNode = ((nodes[i].f +
            (((std::abs(&targetPos.x - Cx) - std::abs(&targetPos.x - Sx)) * 5) + //X distance from node[i] position - X distance from the starting location
            ((std::abs(&targetPos.y - Cy) - std::abs(&targetPos.y - Sy)) * 5))) + //Y distance from node[i] position - Y distance from the starting location
                (std::max<int32_t>(std::abs(&targetPos.x - Cx), std::abs(&targetPos.y - Cy)) * 5)); //max(X distance from node[i] position, Y distance from node[i] position,)
   
        if (openNodes[i] && diffNode < best_node_f) { //Compares the formula above with the current Best Node
            best_node_f = diffNode; //Sets the Best Nodes new value to beat
            best_node = i;
        }
    }

PS: Creatures are running away from me lol, must be the position set at getPathMatching xD

Edit: For some reason calls from Creature::goToFollowCreature() results in an inverted path.. Tested with the current thread code and the one I posted up. My TFS is edit so, if anyone can confirm it would be nice

Flatlander · May 30, 2018

darkshin said:

Not sure if this will help, but Ive tested and its working.
If someone wants to run a benchmark on it,
somehow my tries only printed 0.0000 ms

I've avoided to declare new variables each loop and copy values, keeping only pointers to the new values.
Im not sure its relevance since I couldn't benchmark it
Here it goes:

Code:

    uint16_t* Sx = 0;
    uint16_t* Sy = 0;
    uint16_t* Cx = 0;
    uint16_t* Cy = 0;
 
    for (size_t i = 0; i < curNode; i++) {
        Sx = &nodes[0].x; //Starting X Position
        Sy = &nodes[0].y; //Starting Y Position
        Cx = &nodes[i].x; //node[i] X Position
        Cy = &nodes[i].y; //node[i] Y Position

        diffNode = ((nodes[i].f +
            (((std::abs(&targetPos.x - Cx) - std::abs(&targetPos.x - Sx)) * 5) + //X distance from node[i] position - X distance from the starting location
            ((std::abs(&targetPos.y - Cy) - std::abs(&targetPos.y - Sy)) * 5))) + //Y distance from node[i] position - Y distance from the starting location
                (std::max<int32_t>(std::abs(&targetPos.x - Cx), std::abs(&targetPos.y - Cy)) * 5)); //max(X distance from node[i] position, Y distance from node[i] position,)
   
        if (openNodes[i] && diffNode < best_node_f) { //Compares the formula above with the current Best Node
            best_node_f = diffNode; //Sets the Best Nodes new value to beat
            best_node = i;
        }
    }

PS: Creatures are running away from me lol, must be the position set at getPathMatching xD

Mark made a good point on my pull request:
If we calculate the "distance Cost" when we create the node, it would avoid calculating the distance cost for each node each time getBestNode() is run.

I'll work on this when I get home from work today.
After this I have no idea how I could optimize it further, any suggestions are welcome.

I created this optimization because I personally have monsters that can see 50 sqm away and path up/down stairs. so the old TFS Pathing was obsolete for this task.

darkshin · May 30, 2018

Flatlander said:
Mark made a good point on my pull request:
If we calculate the "distance Cost" when we create the node, it would avoid calculating the distance cost for each node each time getBestNode() is run.

I'll work on this when I get home from work today.
After this I have no idea how I could optimize it further, any suggestions are welcome.

I created this optimization because I personally have monsters that can see 50 sqm away and path up/down stairs. so the old TFS Pathing was obsolete for this task.

You sure have some gold hidden dont ya hahaha

I don't know very well the getPathMatching function. So heres what I understood about the function where it comes to the getBestNode loop:

The 'node list' is created at this code line

Code:

 AStarNodes nodes(pos.x, pos.y);

Then we go to the loop

Code:

AStarNode* found = nullptr;
while (fpp.maxSearchDist != 0 || nodes.getClosedNodes() < 100) {
AStarNode* n = nodes.getBestNode();
if (!n) {
if (found) {
break;
}
return false;
}
const int_fast32_t x = n->x;
const int_fast32_t y = n->y;
pos.x = x;
pos.y = y;
if (pathCondition(startPos, pos, fpp, bestMatch)) {
found = n;
endPos = pos;
if (bestMatch == 0) {
break;
}
}
uint_fast32_t dirCount;
int_fast32_t* neighbors;
if (n->parent) {
const int_fast32_t offset_x = n->parent->x - x;
const int_fast32_t offset_y = n->parent->y - y;
if (offset_y == 0) {
if (offset_x == -1) {
neighbors = *dirNeighbors[DIRECTION_WEST];
} else {
neighbors = *dirNeighbors[DIRECTION_EAST];
}
} else if (!fpp.allowDiagonal || offset_x == 0) {
if (offset_y == -1) {
neighbors = *dirNeighbors[DIRECTION_NORTH];
} else {
neighbors = *dirNeighbors[DIRECTION_SOUTH];
}
} else if (offset_y == -1) {
if (offset_x == -1) {
neighbors = *dirNeighbors[DIRECTION_NORTHWEST];
} else {
neighbors = *dirNeighbors[DIRECTION_NORTHEAST];
}
} else if (offset_x == -1) {
neighbors = *dirNeighbors[DIRECTION_SOUTHWEST];
} else {
neighbors = *dirNeighbors[DIRECTION_SOUTHEAST];
}
dirCount = fpp.allowDiagonal ? 5 : 3;
} else {
dirCount = 8;
neighbors = *allNeighbors;
}
const int_fast32_t f = n->f;
for (uint_fast32_t i = 0; i < dirCount; ++i) {
pos.x = x + *neighbors++;
pos.y = y + *neighbors++;
if (fpp.maxSearchDist != 0 && (Position::getDistanceX(startPos, pos) > fpp.maxSearchDist || Position::getDistanceY(startPos, pos) > fpp.maxSearchDist)) {
continue;
}
if (fpp.keepDistance && !pathCondition.isInRange(startPos, pos, fpp)) {
continue;
}
const Tile* tile;
AStarNode* neighborNode = nodes.getNodeByPosition(pos.x, pos.y);
if (neighborNode) {
tile = getTile(pos.x, pos.y, pos.z);
} else {
tile = canWalkTo(creature, pos);
if (!tile) {
continue;
}
}
//The cost (g) for this neighbor
const int_fast32_t cost = AStarNodes::getMapWalkCost(n, pos);
const int_fast32_t extraCost = AStarNodes::getTileWalkCost(creature, tile);
const int_fast32_t newf = f + cost + extraCost;
if (neighborNode) {
if (neighborNode->f <= newf) {
//The node on the closed/open list is cheaper than this one
continue;
}
neighborNode->f = newf;
neighborNode->parent = n;
nodes.openNode(neighborNode);
} else {
//Does not exist in the open/closed list, create a new node
neighborNode = nodes.createOpenNode(n, pos.x, pos.y, newf);
if (!neighborNode) {
if (found) {
break;
}
return false;
}
}
}
nodes.closeNode(n);
}

Though the only node open is the start node, which owns the pos.xy coordenates.
So the first getBestNode() will return the starting node, which is already open, and as it loops other nodes will be opened by the function createOpenNode(...)
Basically, the calculation for the distance cost would be called by createOpenNode and getBestNode would just make a check between the already calculated values. Don't seem very hard to do.
The trade off would be a uint16 variable for each new opened node vs less calculation processing.
Which means... (Im not sure the impact)

Found this about calculation processing vs memory lookup is at least interesting.
Optimizing Code - CPU vs Memory

Gesior.pl · Jun 4, 2018

Flatlander said:
First, I want to thank you for trying to help and teach me something I am completely ignorant on.
I have no idea how to optimize code by using different methods, or how to use hardware (like processor cache) to my advantage.
...
Let me know if I am right about how processor cache works using my above example.

Everything you should know about processor cache (all I know after 'system programming' at studies):

PHP:

Latency Comparison Numbers (~2012)
----------------------------------
Processor register - inside processor computing unit, ZERO ( https://en.wikipedia.org/wiki/Processor_register )
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns 14x L1 cache
Mutex lock/unlock 25 ns
Main memory reference 100 ns 20x L2 cache, 200x L1 cache

Read/write to L1 cache is 200 times faster then to RAM.
Read/write to L2 cache is 14 times faster then to RAM.
You can't decide what is stored in L1/L2 - it's processor decision!
Processor cache synchronize data with RAM in blocks. I heard that it uses 4KB blocks, because loading X KB block takes exactly same time as loading 1 byte from RAM.
In case of image below, simple program:

PHP:

int x = sum + age;

Processor must load values of 'sum' and 'age', sum them and set it's result to processor register temporary named 'x'.
1. You tell processor to load variable 'sum' (RAM address 9000,0000-9000,0003) to use it's value. Processor loads it and 4KB around (adjusted to block) to cache and 'sum' value to processor register. It takes 100 ns.
2. You tell processor to load variable 'age' (RAM address 9000,0004-9000,0005) to use it's value. This address is in cache! Processor loads it from L1 cache to processor register. It takes 0.5 ns!
3. Sum of 2 variables from processor register and assigment to other processor register should take 1 processor cycle (0.3 ns on 3GHz cpu).

By using variables that 'probably will be close to each other in memory' (table values next to each other, 2 variables initialized in same time) or keep on using few variables (which can be keeped in cache, there is a place for few 4KB blocks) your program can become super-fast!
So doing few operations on one variable (0.3 ns per cpu cycle) can be faster then reading other variable in some loop (up to 100ns).
In case of object programming: all object attributes are allocated at once, so they are next to each other in RAM!

THAT'S ALL I KNOW ABOUT OPTIMIZING CACHE.

Erexo · Jun 5, 2018

While at CPU caching topic, instruction cache can also be mentioned.
With a little bit of engineering magic CPU can determine what will be the next set of instructions and where are they located, so whenever you are performing a jump, your CPU already knew it and loaded instructions from the destination before you even jump there! But when jumping destination is determined at runtime (ie. "read this memory and jump there" a.k.a. delegates) cpu will have hard time to correctly pre-load that part of memory where you want to go before you actually go there (and then it might stall due to tremendous 100ns for load the instructions from RAM). In "modern" and more "user-friendly" languages like C# it might be detected and correctly redesigned (in mild cases, L1/L2/L3 caches as well), but still I consider this as useful knowledge.
And the caching thing that Gesior mentioned can be excellently used while using vector of structs instead of objects.

Optimizing TFS Pathfinding

Species Developer

www.masteria.net

Species Developer

www.masteria.net

Advanced OT User

Species Developer

Species Developer

www.masteria.net

Excellent OT User

Species Developer

Legacy Member

Excellent OT User

Legacy Member

Species Developer

Active Member

New old member

Species Developer

New old member

Mega Noob&LOL 2012

Kage

Similar threads