Scalability

Serving Large Domains

Handling High-Volume Local Delivery

Supporting Many Concurrent Clients

Setting the TCP TIME_WAIT time

Handling High-Volume SMTP Delivery

Estimating Resources Usage

OS Limitations

This section explains how CommuniGate Pro and the Server OS can be configured to serve large sites (10,000-100,000 accounts).

For carrier-level sites (from 100,000 up to several million accounts), multi-server Cluster configurations should be used.

Serving Large Domains

If some domains you serve have a large number of accounts (10,000 or more), you should consider storing accounts in subdirectories rather than in a flat domain directory.

You can move domain subdirectories to other disks and replace the moved subdirectories with symbolic links to their new locations.

You can also move domain directories out of the Domains directory and replace them with symbolic links.
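The move-and-link procedure can be sketched in Python as follows. This is an illustration only: the function name and the paths in the usage comment are hypothetical, and the sketch assumes the CommuniGate Pro Server is stopped while directories are moved (the text does not say whether this is required).

```python
import os
import shutil
from pathlib import Path


def relocate_with_symlink(src, dest_parent) -> Path:
    """Move directory `src` into `dest_parent` and leave a symlink at the old path.

    Assumption: the Server is stopped, so no files in `src` are open.
    Returns the directory's new location.
    """
    src = Path(src)
    dest_parent = Path(dest_parent)
    dest = dest_parent / src.name
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")
    dest_parent.mkdir(parents=True, exist_ok=True)
    # shutil.move copies and then deletes when crossing filesystems (other disks).
    shutil.move(str(src), str(dest))
    # The old path becomes a symbolic link to the absolute new location.
    os.symlink(dest.resolve(), src, target_is_directory=True)
    return dest


# Hypothetical usage:
# relocate_with_symlink("/var/CommuniGate/Domains/example.com", "/disk2/Domains")
```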

Handling High-Volume Local Delivery

When more than 1 message per second is expected to be delivered to local CommuniGate Pro accounts, you should allocate more "processors" (threads) in the Local Delivery module. This is especially important in environments that process heavy inbound SMTP traffic (a typical performance-test setup). An insufficient number of Local Delivery threads can cause excessive Queue growth and long message delivery latency.

Watch the Local Delivery module Monitor, and allocate more threads if the module Queue keeps growing beyond 200-300 messages. Queue size alone is not a reason to add threads: if, for example, you have 10 Local Delivery threads and the Queue holds 200 waiting messages, this introduces only 1-2 seconds of delivery latency. Increase the number of Local Delivery threads only if you see the Queue growing.
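The reasoning above can be made concrete with a small sketch. It assumes each Local Delivery thread delivers about 10-20 messages per second (an assumed figure that reproduces the 1-2 second estimate, not a measured one), uses a hypothetical 250-message threshold from the middle of the 200-300 range, and takes Queue sizes read by hand from the Local Delivery Monitor; it does not call any CommuniGate Pro API.

```python
from typing import Sequence


def queue_latency_seconds(queue_len: int, threads: int, msgs_per_thread_per_sec: float) -> float:
    """Rough wait for a newly queued message: queue length / delivery throughput."""
    return queue_len / (threads * msgs_per_thread_per_sec)


def should_add_threads(samples: Sequence[int], threshold: int = 250) -> bool:
    """True only if the Queue is above `threshold` AND has kept growing.

    `samples` are successive Local Delivery Queue sizes, oldest first.
    """
    if len(samples) < 2:
        return False
    never_shrinks = all(b >= a for a, b in zip(samples, samples[1:]))
    grew = samples[-1] > samples[0]
    return never_shrinks and grew and samples[-1] > threshold


# 10 threads at an assumed 10-20 messages/second each: 200 queued -> 1-2 s.
print(queue_latency_seconds(200, 10, 20), queue_latency_seconds(200, 10, 10))  # 1.0 2.0
print(should_add_threads([200, 190, 210, 200]))  # False: large but stable
print(should_add_threads([180, 240, 310, 420]))  # True: large and growing
```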

Administrators of high-end mail servers may want to disable the User Conservative Updates option (in the Local Account Options on the WebAdmin Obscure Settings page). This reduces the load on the file I/O subsystem.

Supporting Many Concurrent Clients

For ISPs and large corporate installations, the number of users that can be served simultaneously is a major concern.

To estimate how many users you can serve at the same time, you need to know which types of service your clients will use.

POP3 Clients

POP3 mailers connect to the server only to download new messages. Based on average connection speeds, expected mail traffic, and your users' habits, you can estimate how long an average session takes. For example, if you are an ISP and estimate that an average "check mail" operation takes 15 seconds, and that your users mostly check their accounts during 12 peak hours, then with 100,000 POP3 users you can expect 100,000 * 15 sec / (12*60*60 sec) = 34.7, or about 35 concurrent POP3 sessions.
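The same estimate as a function, so you can plug in your own figures. The `checks_per_user` parameter is an addition to the text's formula (which assumes one check per user during the peak period), and the even spread of sessions over the peak hours is also an assumption, so real short-term peaks will be higher.

```python
def concurrent_sessions(users: int, session_seconds: float, peak_hours: float,
                        checks_per_user: int = 1) -> float:
    """Average simultaneous sessions when all checks spread evenly over the peak."""
    total_session_time = users * checks_per_user * session_seconds
    return total_session_time / (peak_hours * 60 * 60)


print(round(concurrent_sessions(100_000, 15, 12)))  # 35
```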

This number is not high, but POP3 sessions put a high load on your disk I/O and network I/O subsystems: after authentication, a POP3 session is essentially a "file downloading" activity.

IMAP4 Clients

The IMAP protocol allows much more sophisticated processing than POP3. Mail is usually left on the server, and users can delete unwanted messages without downloading them first.

But since IMAP is a "mail access" protocol, not a "mail downloading" protocol, IMAP users stay connected to the server much longer. In corporate environments, users can leave their IMAP sessions open for hours, if not days. While such inactive sessions put no load on your disk I/O, network I/O, or CPU, each session still requires an open network connection and a processing thread in the server. Since the IMAP protocol lets users request search operations on the server, IMAP users who search frequently can also consume a lot of CPU resources.

WebUser Clients

The CommuniGate Pro WebUser interface provides the same features as IMAP mailer clients, but it does not require an open network connection (and processing thread) for each user session. When a client (a browser) sends a request, a network connection is established, the request is processed by a server thread, and the connection is closed.

This allows the server to use just 100 HTTP connections to serve 3,000 or more open sessions.

When you know the type and number of clients you plan to serve, you can estimate the resources they will need on your Server.
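A rough tally of connections (and therefore server threads) for a given client mix might look like this. The 30 sessions-per-connection ratio comes from the WebUser figure above (3,000 sessions over 100 HTTP connections); the class name and example numbers are hypothetical, and SMTP and administrative connections are not counted.

```python
import math
from dataclasses import dataclass


@dataclass
class ClientMix:
    pop3_concurrent: int        # concurrent POP3 sessions (see the POP3 formula)
    imap_open_sessions: int     # IMAP sessions open at peak, idle or not
    webuser_sessions: int       # open WebUser sessions
    web_sessions_per_connection: int = 30  # 3,000 sessions / 100 HTTP connections

    def connections(self) -> int:
        """Concurrent client connections; each needs a socket and a server thread."""
        http = math.ceil(self.webuser_sessions / self.web_sessions_per_connection)
        return self.pop3_concurrent + self.imap_open_sessions + http


print(ClientMix(pop3_concurrent=35, imap_open_sessions=2_000, webuser_sessions=3_000).connections())  # 2135
```

Idle IMAP sessions dominate such a tally, which is why they matter more for thread and socket limits than for CPU or disk load.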

Setting the TCP `TIME_WAIT` time

When you expect to serve many TCP/IP connections, it is important to check how long your Server OS waits before releasing a logically closed TCP/IP socket. If this time is too long, those "dead" sockets can consume all OS TCP/IP resources, and all new connections will be rejected at the OS level before they reach CommuniGate Pro, so the Server will not be able to warn you.

This problem can occur even on sites that have just a few hundred accounts. This indicates that some clients have configured their mailers to check the server too often. If client mailers connect to the server every minute and the OS TIME_WAIT time is set to 2 minutes, each client keeps about two "dead" sockets at any moment, so the number of such sockets grows with the number of clients and can eventually consume all OS TCP/IP resources.
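The relationship is simple arithmetic: the number of sockets sitting in TIME_WAIT is roughly the connection rate multiplied by the TIME_WAIT time. The sketch below assumes that the server side performs the close (so it is the side holding TIME_WAIT) and uses a hypothetical 500 clients checking mail every minute.

```python
def time_wait_sockets(clients: int, check_interval_s: float, time_wait_s: float) -> float:
    """Steady-state number of server sockets in TIME_WAIT.

    Assumes each client opens one connection per check and the server side
    holds the TIME_WAIT state after the connection closes.
    Connection rate = clients / check_interval_s; count = rate * time_wait_s.
    """
    return clients * time_wait_s / check_interval_s


print(time_wait_sockets(500, 60, 240))  # 2000.0 with the 240 s Windows NT default
print(time_wait_sockets(500, 60, 30))   # 250.0 with a 30 s TIME_WAIT
```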

It is recommended to set the TIME_WAIT time to 20-30 seconds.

The TIME_WAIT problem is very common on Windows NT systems. Unlike most Unix systems, Windows NT has no generic setting for changing the TIME_WAIT interval. To change it, you should create an entry in the Windows NT Registry (the information below is taken from the http://www.microsoft.com site):

1. Run Registry Editor (RegEdit.exe).
2. Go to the following key in the registry:
   HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\tcpip\Parameters
3. Choose Add Value from the Edit menu and create the following entry:
   Value Name: TcpTimedWaitDelay
   Data Type: REG_DWORD
   Value: 30-300 (decimal), time in seconds
   Default: 0xF0 (240 decimal); not in the registry by default
4. Quit the Registry Editor.
5. Restart the computer for the registry change to take effect.

Description: This parameter determines how long a connection stays in the TIME_WAIT state when being closed. While a connection is in the TIME_WAIT state, the socket pair cannot be reused. This is also known as the "2MSL" state because, according to the RFC, the value should be twice the maximum segment lifetime on the network. See RFC 793 for further details.
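If you need to apply the same setting on several machines, one option is to generate a .reg file instead of editing the Registry by hand. The Python sketch below only builds the file text; the helper name is hypothetical, the 30-300 range check mirrors the values listed above, and importing the file with the Registry Editor (followed by a restart) is left to you.

```python
# Registry key from the procedure above.
REG_KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\tcpip\Parameters"


def time_wait_reg_file(seconds: int) -> str:
    """Return the text of a .reg file that sets TcpTimedWaitDelay (REG_DWORD)."""
    if not 30 <= seconds <= 300:
        raise ValueError("TcpTimedWaitDelay must be 30-300 seconds")
    lines = [
        "REGEDIT4",  # header understood by the Windows NT 4 Registry Editor
        "",
        f"[{REG_KEY}]",
        f'"TcpTimedWaitDelay"=dword:{seconds:08x}',  # DWORD written as 8 hex digits
        "",
    ]
    return "\r\n".join(lines)  # .reg files conventionally use CRLF line endings


print(time_wait_reg_file(30))
```

Note that 30 seconds, the lowest value in the documented range, is also the upper end of the 20-30 second recommendation.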

Handling High-Volume SMTP Delivery

To handle a high-volume SMTP delivery load (more than 50 messages/second), you need to ensure that your DNS server(s) can handle the load CommuniGate Pro generates and that the UDP packet exchange between CommuniGate Pro and the DNS servers does not suffer from excessive packet loss. You may want to reconfigure your routers to give UDP traffic priority over TCP traffic.

You may want to try various values for the Concurrent Requests setting in the Domain Name Resolver panel on the Obscure Settings page: depending on your DNS server(s) setup, increasing the number of Concurrent Requests above 10-20 can degrade DNS server performance.
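A setting like Concurrent Requests caps how many DNS queries are in flight at once. The general pattern looks like the asyncio sketch below; it is not CommuniGate Pro's resolver, and `resolver` stands for any hypothetical async function that performs one lookup. Raising `max_concurrent` sends more simultaneous queries to the DNS server, which, depending on the server setup, may hurt rather than help.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def resolve_all(names: Iterable[str],
                      resolver: Callable[[str], Awaitable[str]],
                      max_concurrent: int = 10) -> List[str]:
    """Resolve every name, never running more than `max_concurrent` lookups at once."""
    limit = asyncio.Semaphore(max_concurrent)

    async def one(name: str) -> str:
        async with limit:  # waits here while `max_concurrent` lookups are in flight
            return await resolver(name)

    # gather returns results in the same order as the input names
    return list(await asyncio.gather(*(one(n) for n in names)))
```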

If the average size of messages sent via SMTP is higher than 20KB, you should also carefully select the number of SMTP sending channels (threads). Too many concurrent data transfers can exceed the available network bandwidth and degrade performance: 500 channels sending data to remote sites with relatively slow 512Kbit/sec connectivity can generate about 250Mbit/sec of outgoing traffic from your site. Usually the traffic is much lighter, since outgoing channels spend a lot of time negotiating parameters and exchanging envelope information. But as the average message size grows, channels spend more time sending actual message data, and the TCP traffic generated by each channel increases.
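The bandwidth arithmetic from this paragraph, as a sketch: the worst case assumes every channel streams message data at the remote site's full speed, which the text notes is rarely the case. The 100Mbit/sec uplink in the second example is hypothetical.

```python
def outgoing_mbit(channels: int, per_channel_kbit: float) -> float:
    """Worst-case outgoing traffic (Mbit/sec) if every channel sends at full speed."""
    return channels * per_channel_kbit / 1000


def channels_for_uplink(uplink_mbit: float, per_channel_kbit: float) -> int:
    """How many channels can send at full speed without saturating the uplink."""
    return int(uplink_mbit * 1000 // per_channel_kbit)


print(outgoing_mbit(500, 512))         # 256.0, i.e. roughly 250 Mbit/sec
print(channels_for_uplink(100, 512))   # 195
```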

Estimating Resources Usage

Each network connection requires one network socket descriptor in the server process. On Unix systems, the total number of sockets and files opened within a server process is limited.

When the CommuniGate Pro server starts, it tries to set this limit as high as possible, and then lowers it slightly if the resulting limit could equal the system-wide limit (if CommuniGate Pro consumed all the "descriptors" available on the server OS, the OS would most likely crash). The resulting limit is recorded in the CommuniGate Pro Log.
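The limit-selection idea described here can be sketched with Python's Unix-only `resource` module. This is an illustration of the idea, not CommuniGate Pro's code: the 64-descriptor safety margin is a hypothetical value, and reading the system-wide maximum from `/proc/sys/fs/file-max` works on Linux only (elsewhere the helper returns `None`).

```python
import resource
from typing import Optional


def system_file_max() -> Optional[int]:
    """System-wide descriptor maximum on Linux; None where it cannot be read."""
    try:
        with open("/proc/sys/fs/file-max") as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def choose_fd_limit(hard: int, system_max: Optional[int], margin: int = 64) -> int:
    """Aim for the hard limit, but stay `margin` below the system-wide maximum
    so one process cannot take every descriptor on the host."""
    if system_max is not None and (hard == resource.RLIM_INFINITY or hard >= system_max):
        return system_max - margin
    return hard


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("current soft/hard:", soft, hard)
print("suggested limit:", choose_fd_limit(hard, system_file_max()))
```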

To increase the maximum number of file and socket descriptors the CommuniGate Pro Server process can open, see the OS Limitations section below.

Each network connection is processed by a server thread. Each thread has its own stack, and CommuniGate Pro threads have 100Kbyte stacks on most platforms. Most of the stack memory is not used, so the stacks do not require much real memory, but they add up and increase virtual memory demand. Most OSes do not allow a process's virtual memory to exceed a certain limit, usually the OS swap space plus the real memory size. So, on a system with just 127Mbytes of swap space and 96Mbytes of real memory, at most about 220Mbytes of virtual memory can be allocated. Since the swap space is shared by all processes running under the server OS, the effective virtual memory limit on such a system will be around 100-150MB, and the CommuniGate Pro Server will most likely be able to create 500-1000 processing threads.
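The thread estimate divides the virtual memory budget by the per-thread stack size. The sketch below gives an upper bound only: it counts stacks and ignores the Server's heap, code, and buffers, which is presumably why the 500-1000 threads quoted above is lower than the bound. The same function also shows why 2MB Linux thread stacks (discussed under OS Limitations) cap a process at about 1000 threads, assuming roughly 2GB of usable address space, an assumed figure for 32-bit systems.

```python
def vm_limit_mb(swap_mb: float, ram_mb: float) -> float:
    """Typical per-process virtual memory ceiling: swap space plus real memory."""
    return swap_mb + ram_mb


def max_threads(vm_budget_mb: float, stack_kb: float) -> int:
    """Upper bound on threads whose stacks fit in the budget (stacks only)."""
    return int(vm_budget_mb * 1024 // stack_kb)


print(vm_limit_mb(127, 96))                              # 223
print(max_threads(100, 100), max_threads(150, 100))      # 1024 1536
print(max_threads(2048, 2048), max_threads(2048, 128))   # 1024 16384
```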

On 32-bit computers, 4GB is the theoretical limit of a process's virtual memory, so allocating more than 4GB of disk space for page swapping does not raise that limit.

During a POP3 or IMAP4 access session, one of the account mailboxes is open; if that mailbox is a text file (BSD-type) mailbox, the mailbox file is open. During an incoming SMTP session, a temporary file is created for the incoming message and kept open while the message is being received. So, on Unix systems, each POP, IMAP, or SMTP connection can use two descriptors (a socket and a file), and the total number of open POP, IMAP, and SMTP connections cannot exceed 1/2 of the maximum number of socket/file descriptors per process.
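Applying the one-half rule, as a sketch: `reserved` is a hypothetical allowance for descriptors the Server keeps open for other purposes (logs, settings files, listeners), which the text does not quantify; it defaults to zero.

```python
def max_mail_connections(fd_limit: int, reserved: int = 0) -> int:
    """Each POP/IMAP/SMTP connection may hold a socket plus one open file,
    so at most half of the remaining descriptors can serve connections."""
    return max(fd_limit - reserved, 0) // 2


print(max_mail_connections(1024))        # 512
print(max_mail_connections(1600, 100))   # 750
```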

While a WebUser session does not require a network connection (and thus a dedicated socket and a thread), it can keep more than one mailbox open.

On Unix systems, when the Server detects that the number of open network sockets and file descriptors is approaching the set limit, it starts rejecting incoming connections and reports the problem in the Log.

OS Limitations

This section explains how you can check and increase the limits imposed by various server Operating Systems. The most important limits are:

The maximum number of files and network sockets a process can open.
The maximum size of virtual memory available to a process.

Mac OS X Server

Mac OS X Server has a default "hard limit" of 1600-1800 descriptors per process.

Mac OS X sets a 6MB limit on the "additional" virtual memory an application can allocate. This is not enough for sites with more than 2,000 users, so you should increase that limit by adding the `ulimit -d 100000` command to the CommuniGate Pro startup file.
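To see whether the current data-segment limit already reaches the value `ulimit -d 100000` would set, you can query it with Python's `resource` module (available on Mac OS X and other Unix systems). This is a read-only check that changes nothing; it assumes that `ulimit -d` counts in kilobytes, as it does in bash, and the helper names are hypothetical.

```python
import resource


def data_limit_kb() -> float:
    """Current soft data-segment limit in kilobytes (the unit `ulimit -d` uses)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_DATA)
    if soft == resource.RLIM_INFINITY:
        return float("inf")
    return soft / 1024


def data_limit_ok(required_kb: int = 100_000) -> bool:
    """True if the current limit is at least `required_kb` kilobytes."""
    return data_limit_kb() >= required_kb


print(data_limit_kb(), data_limit_ok())
```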

Linux

Linux kernels older than 2.2.x allowed a process to open only 256 file descriptors. If your server should handle more than 100 TCP/IP connections, use Linux kernel 2.2.x or later to avoid the "out of file descriptors" problem.

The Linux threads library uses the one-to-one model, so each CommuniGate Pro thread is a kernel thread (actually, a "process"). This may not be the best solution for very large systems that need to run several thousand threads.

Although Linux threads are handled within the kernel, the Linux threads library has its own scheduler, too. By default, that scheduler uses a static table with 1024 entries, so no more than 1024 threads can be created. This is enough even for large sites serving many POP and WebMail users, but it can cause problems for sites that need to serve several hundred IMAP users. To increase this number, the Linux threads library has to be recompiled with an increased PTHREAD_THREADS_MAX parameter.

The Linux threads library allocates thread stacks in 2MB steps, which does not allow the system to start more than about 1000 threads on 32-bit machines. CommuniGate Pro threads do not need stacks that large, so you may want to recompile the Linux threads library with the STACK_SIZE parameter decreased to 128K.

AIX

The AIX threads implementation does not allow you to create more than 512 threads per process.
