- Introduction.
A few questions asked on GitHub recently have convinced me that a tutorial about some quirks and new performance characteristics of the upcoming TFS 1.2 release is needed. I won't go very deep into the internals, since most people reading this probably aren't CS/programming experts. Initially this tutorial contains tips for tuning the networking stack, but I intend to keep it updated in the future, should there be other things that can affect the performance of your server. It is assumed that the reader knows how to compile the server and has basic C++ knowledge.
WARNING: There is no guarantee of results. It might take a lot of trial and error and measurement to find the settings that best fit your server!
- Useful terms.
Compile time constant - a number that is known (or can be calculated) at compile time. The advantage of using one over a dynamic config value (in other words, things that can be changed through config.lua) is that the compiler can use it to optimize the code more aggressively. Changing such a constant requires recompiling the translation unit it was defined in (this is C++-speak; if you don't understand that, don't worry - all you need to know is that recompilation is required, and the build tools will hopefully do the rest for you).
Latency - usually used in networking; the length of time between a request/command and the response/execution of that command. People sometimes refer to high latency (and low responsiveness) as 'lag'.
Throughput - usually defined as the amount of work done per unit of time. In computer science there is usually a tradeoff between throughput and latency.
Object pooling - a technique used in high-performance systems that amortizes the overhead of memory management. It applies when the programmer has additional knowledge about object lifetimes that cannot be conveyed to the general-purpose memory allocator, so freed objects are kept around and reused instead of being returned to the allocator.
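To make the idea concrete, here is a minimal single-threaded sketch of an object pool (the names here are illustrative, not from TFS): freed objects are parked in a free list and handed back out on the next request, so the allocator is only hit when the pool runs dry.

```cpp
#include <cassert>
#include <vector>

// Illustrative object pool: reuse freed objects instead of deleting them.
template <typename T>
class Pool {
    std::vector<T*> free_;   // parked objects waiting to be reused
public:
    T* acquire() {
        if (free_.empty()) {
            return new T();  // pool is empty: fall back to the allocator
        }
        T* obj = free_.back();
        free_.pop_back();
        return obj;          // reuse a previously released object
    }
    void release(T* obj) {
        free_.push_back(obj); // park the object instead of deleting it
    }
    ~Pool() {
        for (T* obj : free_) delete obj;
    }
};
```

The real pool in TFS is more sophisticated (fixed capacity, lock-free, safe to use from multiple threads), but the lifecycle - acquire, use, release, reuse - is the same.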
- Network stack tuning.
TFS 1.2 boasts a new networking stack (around half of the base networking code has been rewritten). Initially these changes were intended to provide thread safety guarantees for the casting system, but some parts had to be radically redesigned. Even though performance has decreased in some areas of the code (to provide correctness guarantees in all sane situations), the overall effect has been a performance increase - at a price. The old system was mostly fool-proof from the server admin's perspective: they didn't have to worry about picking the right settings for the network stack to work well. The new system has one big drawback in this area - if configured improperly, it will either kill your throughput (and increase latency) or eat A LOT of your RAM.
(If you don't want to know how stuff works internally you can skip this paragraph)
I will now briefly describe how the system works. Almost all network packets related to the game protocol sent by the server to the client are so-called buffered messages. What that means is that new information to be delivered to the game client is copied into a single buffer and sent once every "auto-send cycle". This greatly increases throughput at the cost of latency.

These buffers are internally called OutputMessages and, because they're non-local and large objects, they cannot simply be allocated and deallocated every time they're used - allocating large amounts of memory is slow (24 KiB might not seem like a large amount of memory, but for a C++ object it's fairly large). Therefore a high-performance pool is used (if you're interested, read about lock-free stacks). This pool has a fixed maximal capacity defined by a compile time constant, but it is lazily populated, meaning that new OutputMessages will be created only if the pool is empty and an OutputMessage is required.

Another important feature of the new networking stack is auto-send scheduling. Before TFS 1.2 the auto-send queue would be checked every single dispatcher cycle (which was a waste of dispatcher time). Now a more efficient approach is used - all protocols eligible for auto-send are checked every time a certain time quantum elapses. This quantum is configurable through a compile time constant.
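The lazily populated pool described above can be sketched as follows. This is a simplified stand-in for the real implementation in src/lockfree.h (which is built on a lock-free stack); the names FreeList and Message are illustrative, and this minimal Treiber-style stack glosses over hazards (such as ABA) that a production implementation must handle.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Stand-in for the 24 KiB OutputMessage; 'next' links nodes on the free list.
struct Message {
    Message* next = nullptr;
    char buffer[64];   // the real payload is much larger
};

// Lazily populated, capacity-bounded free list (simplified sketch).
class FreeList {
    std::atomic<Message*> head{nullptr};
    std::atomic<std::size_t> size{0};   // approximate under contention
    const std::size_t capacity;
public:
    explicit FreeList(std::size_t cap) : capacity(cap) {}

    // Lazy allocation: pop a pooled message if one exists,
    // otherwise fall back to the regular allocator.
    Message* allocate() {
        Message* node = head.load();
        while (node && !head.compare_exchange_weak(node, node->next)) {}
        if (node) {
            size.fetch_sub(1);
            return node;
        }
        return new Message();   // pool empty: create a fresh message
    }

    // Bounded push: return the message to the pool unless the pool
    // is already at capacity, in which case the memory is released.
    void deallocate(Message* node) {
        if (size.fetch_add(1) >= capacity) {
            size.fetch_sub(1);
            delete node;        // pool full: free the memory for real
            return;
        }
        node->next = head.load();
        while (!head.compare_exchange_weak(node->next, node)) {}
    }
};
```

The key property for tuning is visible here: the pool never holds more than `capacity` messages (memory ceiling), and once it is empty every further request pays the full allocation cost (latency hit) - which is exactly the tradeoff the constants below control.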
Both mentioned constants are located in the src/outputmessage.cpp file:
Code:
const uint16_t OUTPUTMESSAGE_FREE_LIST_CAPACITY = 2048;
const std::chrono::milliseconds OUTPUTMESSAGE_AUTOSEND_DELAY {10};
The default capacity of the output message pool is, as you can see, 2048. This default should be enough for relatively small servers (up to perhaps 150 clients). If you regularly have more clients, you should consider increasing the pool capacity. What value should you choose? It can't be too low (because your server will have noticeable lag) and it can't be too high (no one likes to waste RAM). The best way to determine this value is to add code that prints a warning message every time the pool has been exhausted. You will need to modify the deallocate() member function in src/lockfree.h. Here's how this function should look:
Code:
void deallocate(T* p, size_t) const {
	if (!getFreeList().bounded_push(p)) {
		std::cout << "Warning: OutputMessage pool capacity exhausted!" << std::endl;
		//Release memory without calling the destructor of T
		//(it has already been called at this point)
		operator delete(p);
	}
}
You should run your server with this modification under normal load. If you see the warning printed regularly, that's a sign you need to increase the capacity of the pool. I suggest increasing it by 50% every time the warnings keep appearing regularly. Once you have tuned the pool size, I recommend removing these changes, because this piece of code is fairly performance-critical - you don't want unused diagnostics there!
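If printing on every exhaustion floods your console under heavy load, one possible variant (my own suggestion, not part of TFS) is to count the events with an atomic counter and report only occasionally. The helper name and threshold below are made up for illustration; the counting logic would go where the warning print sits in deallocate() above.

```cpp
#include <atomic>
#include <cstdint>
#include <iostream>

// Hypothetical throttled diagnostic: count pool exhaustions and report
// only the 1st, 1001st, 2001st, ... occurrence instead of every one.
static std::atomic<uint64_t> poolExhaustedCount{0};

void onPoolExhausted() {
    const uint64_t n = poolExhaustedCount.fetch_add(1) + 1;
    if (n % 1000 == 1) {   // print rarely to keep overhead and spam low
        std::cout << "Warning: OutputMessage pool exhausted "
                  << n << " time(s)" << std::endl;
    }
}
```

The total count still tells you how often the pool runs dry, which is the number you actually need for sizing it.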
Note: If you're OK with potentially wasting up to roughly 1.5 GiB of memory (65534 messages of 24 KiB each), go ahead and set the pool capacity to the maximum (65534).
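The math behind that note is just capacity times per-message size, assuming the 24 KiB figure quoted earlier (the real OutputMessage object is slightly larger, so treat these as lower bounds):

```cpp
#include <cstddef>

// Worst-case pool memory = pool capacity x per-message size.
constexpr std::size_t messageSize = 24 * 1024;   // 24 KiB per OutputMessage
constexpr std::size_t defaultCap  = 2048;        // default pool capacity
constexpr std::size_t maxCap      = 65534;       // maximum pool capacity

constexpr std::size_t defaultBytes = defaultCap * messageSize;  // 48 MiB
constexpr std::size_t maxBytes     = maxCap * messageSize;      // ~1.5 GiB

static_assert(defaultBytes == 48ull * 1024 * 1024,
              "default pool can hold 48 MiB of messages");
static_assert(maxBytes / (1024 * 1024) == 1535,
              "maximum pool can hold just under 1.5 GiB of messages");
```

So the default capacity caps the pool at a modest 48 MiB, while the maximum capacity is a commitment of about 1.5 GiB in the worst case.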
The second constant discussed here is the auto-send delay. The default should be fine in most cases, but if your players experience lag spikes, you should consider decreasing it (setting it below 1 ms is NOT recommended). In some cases increasing it might allow you to run more demanding scripts on your server; however, any value above 20 or 30 ms will probably be noticeable to players, so be careful!