
Scalable Server Proposal

Blackheart OT

Hello fellow developers,

I have an idea to code a scalable version of the Open Tibia server. This means that when load increases, I (or the system automatically) can create more instances running the server to meet demand. A load balancer is made aware of the changes and dishes out connections to distribute load. It's a fairly common setup, and I hypothesize it is possible for a Tibia server.

The problems that immediately come to mind: although the database is external (no changes needed there), the server runs primarily on RAM, making read requests only when it needs new info and writing only on save events.

The solution is to add a distributed cache layer that the application uses exclusively for any state that must be shared between servers, since each request could hit a different instance. A simple and common API like memcached would be used. Servers can also coordinate events such as raids, saves, and cleans through the cache so that only one server creates those objects.
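As a rough sketch of the coordination idea (assuming libmemcached as the client library; the key name, host, and instance id below are made up for illustration), one instance could win the right to spawn a raid through an atomic add, and every other instance would simply skip it:

```cpp
// Sketch: electing a single instance to run a global event (e.g. a raid)
// through memcached's atomic "add" (succeeds only if the key is absent).
// libmemcached is assumed; key/host/instance names are hypothetical.
#include <libmemcached/memcached.h>
#include <cstring>
#include <iostream>

bool tryBecomeRaidLeader(memcached_st* memc, const char* raidKey, uint32_t ttlSeconds)
{
    const char* instanceId = "gameserver-1"; // hypothetical instance name
    memcached_return_t rc = memcached_add(
        memc, raidKey, strlen(raidKey),
        instanceId, strlen(instanceId),
        ttlSeconds, /*flags=*/0);
    // MEMCACHED_SUCCESS means no other instance had claimed the key yet.
    return rc == MEMCACHED_SUCCESS;
}

int main()
{
    memcached_st* memc = memcached_create(nullptr);
    memcached_server_add(memc, "10.0.0.5", 11211); // cache host is an assumption

    if (tryBecomeRaidLeader(memc, "event:raid:orc_invasion", 300)) {
        std::cout << "This instance spawns the raid monsters.\n";
    } else {
        std::cout << "Another instance already owns this raid.\n";
    }

    memcached_free(memc);
    return 0;
}
```

The same pattern would work for saves and cleans: whichever instance gets the key runs the event, the rest stay out of the way.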

I would like your thoughts on feasibility. I may start this project as a fork of an existing open server, and if anyone is interested in coding, let me know.

Sincerely,

Mike
 
So this would essentially create a new server when the load is high enough, while connecting all of the networking for these "sub-servers". Am I looking at this right?
 

That is correct. Theoretically, heavy load lag could be eliminated seamlessly, so long as you are willing to spend more resources on demand by setting some sort of autoscaling policy. Conversely, you can save money by scaling down to lower tier machines during low demand periods.
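To sketch what I mean by an autoscaling policy (the thresholds and the provisioning hooks below are placeholders for whatever the hosting provider's API would actually be):

```cpp
// Sketch of a naive autoscaling policy: grow the pool when average load is
// high, shrink it when low. provisionInstance/drainInstance are placeholders
// for calls into a cloud provider's API.
#include <numeric>
#include <vector>

struct Instance { double load = 0.0; }; // e.g. connection count or CPU fraction

void provisionInstance(std::vector<Instance>& pool) { pool.push_back({}); }
void drainInstance(std::vector<Instance>& pool)     { if (pool.size() > 1) pool.pop_back(); }

void autoscaleTick(std::vector<Instance>& pool)
{
    if (pool.empty()) return;

    double total = std::accumulate(pool.begin(), pool.end(), 0.0,
        [](double sum, const Instance& i) { return sum + i.load; });
    double average = total / pool.size();

    if (average > 0.85) {        // scale up above 85% average load
        provisionInstance(pool);
    } else if (average < 0.50) { // scale down below 50% average load
        drainInstance(pool);
    }
}
```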
 
Oh, I see. You're looking at it from the perspective of someone who may want to run, say, 100 of these servers, and this method would be very helpful for reducing cost.
 
So you plan on running these instances on the same machine or on different machines?
 
Different machine, otherwise you'd be hurting performance even further.

My thoughts exactly, but if you have this on two machines, lag between servers could cripple the game even more than long save times or "twitchy" gameplay. You would need to have the servers on the same network with capable hardware between them.
 
I wonder, isn't that the point of threads already? And when have we faced a "heavy usage" game server?
 

I'm assuming deployment will occur on capable infrastructure. Normally, connections between machines in the same data center's private cloud are extremely fast, in the 100 Mbps to 1 Gbps range. This should be more than enough to update the 200-2000 MB of data likely used by the servers when handling > 1000 players online. Note that not all of that memory needs to be updated at once. Also, I am ballparking my memory use estimate, so correct me if I'm wrong.
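A quick back-of-envelope check on those figures (these are my rough estimates from above, not measurements):

```cpp
// Back-of-envelope: how long a full sync of the estimated in-memory state
// would take over a 1 Gbps private-cloud link. All figures are rough estimates.
#include <iostream>

int main()
{
    const double stateMegabytes = 2000.0;         // upper estimate of in-memory state
    const double linkMbps       = 1000.0;         // 1 Gbps link
    const double linkMBps       = linkMbps / 8.0; // ~125 MB/s

    double fullSyncSeconds = stateMegabytes / linkMBps;
    std::cout << "Full state sync: ~" << fullSyncSeconds << " s\n"; // ~16 s

    // In practice only deltas (positions, health, changed items) move per tick,
    // so per-tick traffic is a tiny fraction of this worst case.
    return 0;
}
```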


Correct me if I'm wrong, but Open Tibia servers aren't using multithreading. Distributed servers aren't just about raw computational gains (Open Tibia servers have low CPU utilization); they're also about distributing the number of packet responses a server has to make per second, which is the usual cause of jittery gameplay. And during a server save, even if one of the servers freezes slightly as it updates the database, the other servers continue to serve responses, removing the server-save lag as well.

Overall, the main feature of this system is that it could potentially scale to tens of thousands of players on the same server. You could even create a map with redundant regions and distribute players randomly, reducing overcrowding, with a command that lets players join someone else's map section so cooperative play isn't inhibited. This structure supports features not currently possible by removing the bottleneck of a single machine.
 

ProtocolGame and ProtocolLogin run on two different threads, as far as I'm aware.
The Scheduler class has a boost lock and handles scheduled tasks on separate threads, as far as I know.
The rest is maintained on the main thread, but remember that most if not all Game class functions are scheduled through the Scheduler class as well, hence more threaded functions with recursive mutexes.
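Roughly, the pattern looks like this (a simplified illustration using std:: primitives, not the actual boost-based Scheduler code):

```cpp
// Simplified illustration of a locked task queue drained by a worker thread,
// roughly the shape of the Scheduler/Dispatcher pattern. Not the real TFS code.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

class TaskQueue {
public:
    void addTask(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> guard(lock_);
            tasks_.push(std::move(task));
        }
        signal_.notify_one();
    }

    void run() // worker thread body; loops forever draining tasks
    {
        std::unique_lock<std::mutex> guard(lock_);
        while (true) {
            signal_.wait(guard, [this] { return !tasks_.empty(); });
            auto task = std::move(tasks_.front());
            tasks_.pop();
            guard.unlock();
            task();          // execute outside the lock
            guard.lock();
        }
    }

private:
    std::mutex lock_;
    std::condition_variable signal_;
    std::queue<std::function<void()>> tasks_;
};
```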
 

Thank you very much! I guess I had never opened those files and took as fact a thread I saw claiming multithreading hadn't been added and was unnecessary. I guess that was old. Either way, this doesn't change my proposal: adding low-tier machines is usually cheaper per thread than procuring a machine with more threads, and you get the added bandwidth handling.
 
An autoscalable VPS would work: some deal where your VPS limits increase if you reach, say, 85% of the current limit and drop again if you fall below 50%, with billing after the month ends.

Trying to synchronize the game between different machines would be really bad. The only feasible improvement would be separating the login server and making it call the game server, which then calls the client, so no one can spam the game server directly. But the current protocol probably has the client receive the game server's address and connect to it, not the other way around.

Even 1 Gbps network infrastructure is slower, and has incomparably higher latency, than direct communication between CPU and RAM.

Server save lag? Simple: make the server stable enough to stand 48 hours without saves or cleans, then save and restart every day, just like CipSoft does. Then a save could take even half an hour and it wouldn't be a problem.
 

He's unaware of how server save works on OT servers. First of all, we know that most Game class functions are scheduled as tasks for Scheduler.cpp. However, creature thinking, along with the rest of game handling, runs on the main thread. What happens at server save is that the save function runs directly on the main thread, not on the Scheduler or a separate thread; that is why the server freezes while saving, and why some OT servers kick all the players first.
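One possible way around that main-thread freeze, just as a sketch (serializePlayer and writeToDatabase are made-up helpers standing in for the real serialization and SQL): snapshot the state quickly on the main thread, then push the slow database write to a background thread.

```cpp
// Sketch: take a cheap in-memory snapshot on the main thread, then do the
// slow database write on a background thread so the game loop keeps ticking.
// serializePlayer() and writeToDatabase() are hypothetical stand-ins.
#include <chrono>
#include <string>
#include <thread>
#include <vector>

struct Player { std::string name; int level = 1; };

std::string serializePlayer(const Player& p)
{
    return p.name + ";" + std::to_string(p.level);
}

void writeToDatabase(const std::vector<std::string>& rows)
{
    // Stand-in for the slow INSERT/UPDATE work.
    std::this_thread::sleep_for(std::chrono::seconds(2));
    (void)rows;
}

void saveServer(const std::vector<Player>& players)
{
    // Fast part on the main thread: snapshot state into plain strings.
    std::vector<std::string> rows;
    rows.reserve(players.size());
    for (const Player& p : players) {
        rows.push_back(serializePlayer(p));
    }

    // Slow part off the main thread: the write no longer blocks the game loop.
    std::thread([rows = std::move(rows)]() {
        writeToDatabase(rows);
    }).detach();
}
```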
 
But performing a save on a live server is generally a bad idea: if somehow someone manages to transfer an item from one character to another during the save and the server then crashes, we can lose or clone items.
 

Yeah, this is the concern behind the default bulk-saving scheme. Over the last couple of days I have rewritten that system on my server into a constant background-saving scheme. It is my improvement on the onAdvance save solution some people have been using, expanded to all savable data. Of course, in the event of a server crash, this is very prone to item cloning, but I've yet to experience my first server crash. Additionally, as a provider my focus is on mitigating player loss, and I believe that outweighs the potential for item cloning.

There is the possibility of abuse of this system if the players are made aware and are capable of causing a server crash. However, those server crashes would be logged and addressed over the subsequent days. Thoughts? BTW, thank you to everyone participating in this conversation!
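To give an idea of the kind of scheme I mean (a dirty-flag sketch; the names and the per-tick batch size are illustrative assumptions, not my actual code):

```cpp
// Sketch of incremental background saving: mark entities dirty when they
// change, then flush a small batch each game tick so the save cost is spread
// out instead of spiking at a single server save.
#include <cstddef>
#include <unordered_set>

using PlayerId = unsigned int;

class IncrementalSaver {
public:
    void markDirty(PlayerId id) { dirty_.insert(id); }

    // Called once per game tick: save at most `batchSize` players.
    void flushSome(std::size_t batchSize)
    {
        std::size_t saved = 0;
        for (auto it = dirty_.begin(); it != dirty_.end() && saved < batchSize; ++saved) {
            savePlayer(*it);          // hypothetical per-player save
            it = dirty_.erase(it);
        }
    }

private:
    void savePlayer(PlayerId /*id*/) { /* write this one player to the DB */ }

    std::unordered_set<PlayerId> dirty_;
};
```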
 
Instead of working on massive scaling like this that no one will use, we should focus on optimizing our servers for single-machine use first. What's the point of spawning instances across a cluster if your instances aren't optimized to fully utilize a single machine? I think we should move toward multithreading before we expand like this.

And do you really think people would use this? There are people who can't even form coherent ideas on this forum; I really don't think they will be able to set up something like this.

I don't mean to be a party crasher, or however that expression goes, but I think we should focus more on foundational things.
 