TFS: an attempted fix for the "client can't receive data fast enough" issue

gcmcnutt

We've isolated a primary memory leak issue on our TFS server. The scenario is this:

- we have a fairly dense training area in the map
- folks come into the training area, start an attack
- they leave the terminal running
- AND they are on a relatively slow link [perhaps < 10 kbit/s download at times]
- then the server gets into a state where it queues outgoing messages faster than they can be transmitted to the client

I've simulated this in a dense training area by adding a network bandwidth limiter to the client, setting up an attack, and instrumenting the server to watch the connection's output queue grow without bound.
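For anyone who wants to see why the queue can only grow in this situation, here is a toy model. It is not TFS code: the function name and the rates are made up for illustration. It just tracks how many bytes stay queued when the server produces output faster than the link drains it.

```cpp
#include <cstdint>

// Toy model (hypothetical, not part of TFS): bytes still queued for one
// client after `seconds`, given the server's output rate and the rate at
// which the client's link can drain it. Both rates are in bytes per second.
std::uint64_t backlogBytes(std::uint64_t producedPerSec,
                           std::uint64_t drainedPerSec,
                           std::uint64_t seconds)
{
    std::uint64_t backlog = 0;
    for (std::uint64_t t = 0; t < seconds; ++t) {
        backlog += producedPerSec;  // server queues new output
        // the link drains what it can, never more than is queued
        backlog -= (backlog < drainedPerSec) ? backlog : drainedPerSec;
    }
    return backlog;
}
```

With an assumed 5000 bytes/s of output against a 10 kbit/s link (1250 bytes/s), the backlog grows by 3750 bytes every second and never recovers. If the link is faster than the output rate, the backlog stays at zero.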

On our production server, we've seen up to 1MB of VM growth per *second* on the server -- growing to beyond the VM limit of the host, thus crashing the server.

I've prototyped a connection buster for when an output queue grows beyond a certain message count. If this happens we force a disconnect [to trigger the same behavior a lost link would]. Here's a code snippet from Connection::send in connection.cpp:

#ifdef __DEBUG_NET__
fprintf(stdout, "Connection: send another [%p][%d:%lu]\n", (void*)this, m_pendingWrite, (unsigned long)m_outputQueue.size());
#endif
m_outputQueue.push_back(msg);
m_pendingWrite++;
// Too many messages queued: the client is not draining its socket fast enough.
if (m_pendingWrite > 500) {
    fprintf(stdout, "Force closing slow connection [%p][%d]\n",
        (void*)this, m_pendingWrite);
    closeConnection();
}
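The same idea in isolation, as a minimal sketch you can compile without the rest of the server. `BoundedOutputQueue` and its methods are hypothetical names, not TFS classes, and the limit is whatever you pass in. Unlike the snippet above, this sketch also drops the backlog and reports failure to the caller when the limit is hit, which is an assumption about what you might want rather than what our patch does.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for a connection's output queue (not the TFS class).
class BoundedOutputQueue {
public:
    explicit BoundedOutputQueue(std::size_t maxPending)
        : m_maxPending(maxPending), m_closed(false) {}

    // Queue a message for sending. Returns false if the connection is
    // already closed or was closed by this call for being too slow.
    bool push(std::string msg) {
        if (m_closed)
            return false;
        m_queue.push_back(std::move(msg));
        if (m_queue.size() > m_maxPending) {
            // Slow consumer: free the backlog and treat it like a lost link.
            m_queue.clear();
            m_closed = true;
            return false;
        }
        return true;
    }

    // Call when the socket has finished writing the front message.
    void popSent() {
        if (!m_queue.empty())
            m_queue.pop_front();
    }

    bool closed() const { return m_closed; }
    std::size_t pending() const { return m_queue.size(); }

private:
    std::deque<std::string> m_queue;
    std::size_t m_maxPending;
    bool m_closed;
};
```

A count-based limit like this is simple but ignores message size. If your messages vary a lot in length, a byte budget may be a better trigger.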

If anyone has worked on this behavior or wondered why their memory grows over time, please check around this area.
 
...or maybe you can't upload it? Everyone has good download speeds nowadays.
 
Well, other versions of this code will have different line numbers, so a cut and paste may not work. But here is the complete Connection::send() function we're running:

I'm pretty sure this is a slow link problem in cases where a high density 'training' area has a lot of players hitting and getting hit. We've reproduced the problem in several ways -- and often the offending users are on dial-up connections (usually from Mexico):

Code:
bool Connection::send(OutputMessage* msg)
{
    #ifdef __DEBUG_NET_DETAIL__
    std::cout << "Connection::send init" << std::endl;
    #endif

    OTSYS_THREAD_LOCK(m_connectionLock, "");
    if(m_closeState == CLOSE_STATE_CLOSING || m_writeError)
    {
        OTSYS_THREAD_UNLOCK(m_connectionLock, "");
        return false;
    }

    msg->getProtocol()->onSendMessage(msg);

    if(m_pendingWrite == 0)
    {
        #ifdef __DEBUG_NET_DETAIL__
        std::cout << "Connection::send " << msg->getMessageLength() << std::endl;
        #endif
        internalSend(msg);
    }
    else
    {
        #ifdef __DEBUG_NET__
        fprintf(stdout, "Connection: send another [%p][%d:%lu]\n", (void*)this, m_pendingWrite, (unsigned long)m_outputQueue.size());
        #endif
        m_outputQueue.push_back(msg);
        m_pendingWrite++;
        // Too many messages queued: the client is not draining its socket fast enough.
        if (m_pendingWrite > 500) {
            fprintf(stdout, "Force closing slow connection [%p][%d]\n",
                (void*)this, m_pendingWrite);
            closeConnection();
        }
    }
    OTSYS_THREAD_UNLOCK(m_connectionLock, "");
    return true;
}
 