CAS connection scalability between Exchange 2007 & 2010

Exchange 2007

With Exchange Server 2003 we were limited by kernel memory, and that affected our scale.  With Exchange Server 2007 and a 64-bit operating system, we discovered that the OS and application had other bottlenecks that limited how many connections we could scale to.

From an inbound connection perspective, we are not limited per session, as each client connection is identified by a source IP address, source port, destination IP address, and destination port.  Because each connection has a unique combination of these four values, we can scale up the number of inbound connections.

Where we see a scale limitation is between CAS and mailbox.  Prior to Windows Server 2008, each source port could be used for only one outbound connection, regardless of the destination IP or whether the source port was still available for use on another address; in other words, once the source port was used, it could not be used for any other outbound connection on the server.  Thus we were limited to the maximum number of TCP/IP connections, which for Exchange Server 2007 is 60,000 (MaxUserPort TCP setting).  We addressed this in Windows Server 2008 by allowing the source port to be used once on a per IP address basis.  So now, as long as we have additional IP addresses on CAS, we can scale to 60,000 outbound connections per source IP address.  However, the corresponding applications had to take advantage of this new feature.  In the case of Outlook Anywhere, the RPC Proxy service on Windows Server 2008 was updated to do so.  DSProxy, on the other hand, was not, so the mailbox server is limited to 60,000 outbound connections to global catalog servers.
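The effect of that Windows Server 2008 change on outbound capacity comes down to simple arithmetic. The snippet below is only an illustrative sketch, not Exchange or Windows code: the function name and the per_ip_reuse flag are made up for this example, and 60,000 is the port budget quoted above.

```python
# Illustrative arithmetic only; these names are hypothetical, not Exchange or Windows APIs.
PORT_BUDGET = 60_000  # outbound source-port budget quoted in the text


def max_outbound_connections(source_ips: int, per_ip_reuse: bool) -> int:
    """Upper bound on outbound connections from one server.

    per_ip_reuse=False models the pre-Windows Server 2008 behavior, where a
    source port is consumed server-wide once it has been used.
    per_ip_reuse=True models the Windows Server 2008 behavior, where a port
    can be used once per source IP address (if the application opts in).
    """
    if source_ips < 1:
        raise ValueError("a server needs at least one source IP address")
    return PORT_BUDGET * source_ips if per_ip_reuse else PORT_BUDGET


if __name__ == "__main__":
    # A CAS with three IP addresses, before and after the change.
    print(max_outbound_connections(3, per_ip_reuse=False))
    print(max_outbound_connections(3, per_ip_reuse=True))
```

Under this model, adding IP addresses only helps when the application making the outbound connections actually uses them, which is why RPC Proxy benefits and DSProxy does not.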

In addition, we had one other connection scalability bottleneck: the store process was hard-coded to support only 60,000 RPC context handles.

All told, this limited us to 15,000 active mailboxes before even considering the message profile of these mailboxes and their effect on CPU scalability.

A CAS server is not limited to 60,000 TCP connections. It is limited to 60,000 unique combinations of source IP, source port, destination IP, and destination port for each IP defined on the CAS server. This means that a CAS server with a single IP address can support more than 60,000 TCP connections. In our Outlook Anywhere (OA) testing, the CAS server topped out at around 124,000 total TCP connections.

Each set of connections from any given client will have the same source IP, destination IP, and destination port, and will differ only in source port. That means we can support up to 60,000 connections from any given client (good thing we usually need fewer than 16). It also means that we can support more than 60,000 total inbound TCP connections. In theory, if we had 50,000 clients with 10 inbound connections each, there is no TCP limitation preventing the CAS server from supporting 500,000 inbound connections. Where we do see inbound limitations is in scenarios where we have ISA or a hardware load balancer inline that masks the client IP. If all client connections to CAS appear to come from the internal VIP, then we max out at 60,000 client connections, because every client connection has the same source IP, destination IP, and destination port, and only the source port is left to tell them apart.
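To make the inbound case concrete, here is a small sketch that counts distinct connection tuples with and without source-IP masking. It is a toy model with scaled-down numbers; the addresses, the port 443 listener, and the helper names are all assumptions made for illustration.

```python
# Toy model of inbound connection identity; all addresses and names are hypothetical.
CAS_IP, CAS_PORT = "10.0.0.10", 443  # assumed CAS address and listener port


def inbound_tuples(client_ips, conns_per_client, first_port=1025):
    """Yield (src_ip, src_port, dst_ip, dst_port) for every client connection."""
    for ip in client_ips:
        for offset in range(conns_per_client):
            yield (ip, first_port + offset, CAS_IP, CAS_PORT)


def mask_source(tuples, vip):
    """Rewrite every source IP to a load balancer's internal VIP, keeping the port."""
    return ((vip, sport, dip, dport) for (_, sport, dip, dport) in tuples)


def count_unique(tuples):
    """Number of distinct 4-tuples, i.e. connections TCP can tell apart."""
    return len(set(tuples))


if __name__ == "__main__":
    clients = [f"192.168.{i // 250}.{i % 250 + 1}" for i in range(500)]
    direct = list(inbound_tuples(clients, conns_per_client=10))
    print(count_unique(direct))                             # every connection is distinct
    print(count_unique(mask_source(direct, "10.0.0.1")))    # tuples collide once the IP is masked
```

When the client IP is masked, connections that reused the same source port on different clients become indistinguishable, so the device in front of CAS has to hand out a fresh source port for each one. That is where the single 60,000-port budget comes back into play.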

The real issue appears when the connections get proxied to the MBX server. From a TCP perspective you should be able to get 60,000 connections between each CAS and MBX server pair, because each pair represents a unique source IP, source port, destination IP, and destination port mapping. So if you have 4 MBX servers and a single CAS server, you should be able to get 60,000 TCP connections to each MBX server, or 240,000 outbound TCP connections total. But what we see in reality is that RPC Proxy only uses each source port once regardless of the destination IP address. The issue is that right now, with a wildcard port bind, the stack picks a port that is used by no other address. This means that even if the port is available for bind with other addresses, it cannot be used, so applications run out of TCP connections after 60,000 connections. So regardless of the number of MBX servers on the backend, we don’t get 60,000 connections to each; we get 60,000 outbound connections total.
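The difference between the two port-selection behaviors can be simulated with a toy allocator. This is a sketch of the idea only, not how the Windows TCP/IP stack is implemented: the class and function names are hypothetical, and the port range is shrunk to 100 ports so the example runs instantly.

```python
# Toy simulation of outbound source-port selection; not actual Windows stack behavior.
class PortAllocator:
    """Hands out source ports for outbound connections.

    per_address=False: a port, once used, is unavailable on every local
    address (the wildcard-bind behavior described above).
    per_address=True: a port can be used once per local IP address.
    """

    def __init__(self, ports, per_address):
        self.ports = list(ports)
        self.per_address = per_address
        self.used = set()  # holds ports, or (local_ip, port) pairs in per-address mode

    def connect(self, local_ip):
        for port in self.ports:
            key = (local_ip, port) if self.per_address else port
            if key not in self.used:
                self.used.add(key)
                return port
        return None  # no free source port left: port exhaustion


def open_many(allocator, local_ips, attempts):
    """Try to open `attempts` connections, rotating across the local IPs."""
    opened = 0
    for i in range(attempts):
        if allocator.connect(local_ips[i % len(local_ips)]) is not None:
            opened += 1
    return opened


if __name__ == "__main__":
    ports = range(1025, 1125)              # 100 ports standing in for 60,000
    local_ips = ["10.0.0.10", "10.0.0.11"]
    attempts = 4 * 100                     # 4 backend servers, 100 connections wanted for each
    print(open_many(PortAllocator(ports, per_address=False), local_ips, attempts))
    print(open_many(PortAllocator(ports, per_address=True), local_ips, attempts))
```

In neither mode does the destination address enter into the choice, which is why adding MBX servers on the backend does not raise the outbound ceiling.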

Windows Server 2008 addressed this limitation by making it per IP address as opposed to per server.  However, the corresponding application has to take advantage of the multiple IP addresses when making the outbound connection.  The RPC Proxy service, for instance, was updated to accommodate this TCP/IP change.

Exchange 2010

In Exchange 2010, the figure is 100 RPC connections per RPC Client Access (RPCCA) service/process. Between RPCCA on CAS and Outlook clients, everything looks like it does in E12 (Exchange 2007): there is a 1:1 mapping of connections to sessions, because when the connection is dropped, the session is expected to be gone.  This means that for each connection/session Outlook makes, we have a corresponding connection/session on the CAS.

In Exchange 2010, there is a change on the server so that it is possible to drop a connection and still persist the session info/state.  As a result, between CAS and MBX servers, there is a 1:1 mapping of sessions to what you see from the clients, but they all reuse a pool of 100 connections to perform actions. If the sessions are not actively doing something, they just sit around in a disconnected state until the client connection is dropped or they need to perform some action.  It is similar to the connection pool we used in E12 for XSO sessions, which supported OWA, EAS, EWS, and so on.
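The idea of many persistent sessions sharing a small connection pool can be sketched as follows. This is a generic connection-pool pattern written for illustration, not the RPCCA implementation; the class, method, and field names are all hypothetical, and only the pool size of 100 comes from the text.

```python
# Generic session-over-pool sketch; hypothetical names, not Exchange code.
import queue
from contextlib import contextmanager


class PooledSessions:
    """Keeps one state record per client session while sharing a fixed connection pool."""

    def __init__(self, pool_size=100):
        self._free = queue.Queue()
        for conn_id in range(pool_size):
            self._free.put(conn_id)
        self.sessions = {}  # session_id -> state that persists between calls

    def open_session(self, session_id):
        self.sessions[session_id] = {"state": "disconnected", "calls": 0}

    @contextmanager
    def borrow(self, session_id):
        """Attach a pooled connection to a session for the duration of one action."""
        conn_id = self._free.get()  # blocks if every connection is busy
        session = self.sessions[session_id]
        session["state"] = "connected"
        try:
            yield conn_id
        finally:
            session["calls"] += 1
            session["state"] = "disconnected"  # session state outlives the connection
            self._free.put(conn_id)

    def close_session(self, session_id):
        """Called when the client connection is dropped."""
        self.sessions.pop(session_id, None)


if __name__ == "__main__":
    pool = PooledSessions(pool_size=100)
    seen = set()
    for sid in range(1000):
        pool.open_session(sid)
        with pool.borrow(sid) as conn_id:
            seen.add(conn_id)
    print(len(pool.sessions), len(seen))
```

In this sketch, idle sessions cost only a dictionary entry, so the number of sessions can grow well beyond the number of connections to the backend.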
