We have recently completed a new performance benchmark which demonstrates that MigratoryData WebSocket Server can handle 12 million concurrent users on a single Dell PowerEdge R610 server while pushing a substantial amount of live data (1.015 gigabits per second). This benchmark scenario shows that MigratoryData WebSocket Server is ideal for infrastructures delivering real-time data to a huge number of users, especially the mobile push notification infrastructures typically demanded by telecom customers with tens of millions of users.

Benchmark Results

In this benchmark scenario, MigratoryData scales up to 12 million concurrent users on a single Dell PowerEdge R610 server while pushing up to 1.015 Gbps of live data (each user receives a 512-byte message every minute). The CPU utilization diagram below shows that MigratoryData WebSocket Server scales linearly with the hardware.

[Figure: MigratoryData CPU usage, C12M benchmark]

According to the chart above, MigratoryData uses only 57% CPU to handle 12 million users. The remaining 43% CPU could be used to scale even further. However, we are limited by the RAM available on this machine: we use CentOS 6.4 with the standard Linux kernel, and the kernel memory footprint alone for 12 million sockets is about 36 GB.
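The RAM constraint can be checked with a short calculation. The following Python sketch uses only figures reported in this article (96 GB installed, 54 GB allocated to the JVM, about 36 GB of kernel memory for 12 million sockets). Treating 1 GB as 10**9 bytes is an assumption made for illustration.

```python
# Figures taken from this report; decimal gigabytes are an assumption.
TOTAL_RAM_GB = 96          # RAM installed in the Dell R610
JVM_ALLOCATED_GB = 54      # memory allocated to the JVM
KERNEL_SOCKETS_GB = 36     # kernel footprint for 12 million sockets
CONNECTIONS = 12_000_000

# Approximate kernel memory consumed by each open socket
kernel_bytes_per_socket = KERNEL_SOCKETS_GB * 10**9 / CONNECTIONS

# Memory accounted for by the JVM and the kernel socket structures
committed_gb = JVM_ALLOCATED_GB + KERNEL_SOCKETS_GB
headroom_gb = TOTAL_RAM_GB - committed_gb

print(f"Kernel memory per socket: {kernel_bytes_per_socket:.0f} bytes")
print(f"Committed: {committed_gb} GB of {TOTAL_RAM_GB} GB, headroom: {headroom_gb} GB")
```

Under these assumptions the kernel needs about 3 KB per socket, and the JVM and the kernel together account for 90 GB of the 96 GB installed, which is why RAM rather than CPU is the limiting resource here.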

Also, the linearity of CPU utilization above indicates that MigratoryData WebSocket Server should normally scale vertically beyond 12 million concurrent users if additional RAM and CPU are available.
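One rough way to check the linearity claim is to compute the CPU cost per million users for each test. The sketch below uses the average CPU utilization figures reported in the results table of this article. The least-squares fit through the origin is an illustrative choice, not a method described in the benchmark.

```python
# (concurrent users in millions, average CPU utilization in %) from the results table
results = [(3, 14), (6, 24), (9, 39), (12, 57)]

# CPU cost per million users for each test
per_million = [cpu / users_m for users_m, cpu in results]
for (users_m, cpu), cost in zip(results, per_million):
    print(f"{users_m:>2}M users: {cpu}% CPU, {cost:.2f}% per million users")

# Least-squares slope of a line through the origin: sum(x*y) / sum(x*x)
slope = sum(u * c for u, c in results) / sum(u * u for u, _ in results)
print(f"Fitted cost: {slope:.2f}% CPU per million users")
```

The per-million cost stays between 4.0% and 4.75% across the four tests, which is consistent with roughly linear scaling over the measured range. It says nothing about behavior beyond 12 million users, where RAM and other resources come into play.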

Detailed Results of MigratoryData WebSocket Server Running on a Single Dell R610 Server

In the table below, it is important to note that we obtained the results using the default configuration of MigratoryData WebSocket Server, a fresh installation of CentOS 6.4 (without any kernel source code modification or other special tuning), and the standard network configuration (employing the default MTU of 1500, etc.).

         
Number of concurrent client connections            3,000,000   6,000,000   9,000,000  12,000,000
Number of messages per minute to each client               1           1           1           1
Total Message Throughput (messages per second)        50,000     100,000     150,000     200,000
Average Latency (milliseconds)                             5          35          92         268
Standard Deviation for Latency (milliseconds)             36         123         263         424
Maximum Latency (milliseconds)                           640         951        1292        2024
Network Utilization                               0.254 Gbps  0.507 Gbps  0.767 Gbps  1.015 Gbps
CPU Utilization (average)                                14%         24%         39%         57%
Memory Allocated to the JVM                            54 GB       54 GB       54 GB       54 GB

Latency is defined here as the time needed for a message to propagate from the publisher to the client, via the MigratoryData server. In other words, the latency of a message is the difference between the time at which the message is sent by the benchmark publisher to the MigratoryData server and the time at which the message is received by the benchmark client from the MigratoryData server.
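The throughput and network figures in the table can be cross-checked against the scenario parameters (one 512-byte message per user per minute). The Python sketch below is a back-of-the-envelope check. Attributing the gap between payload bandwidth and reported bandwidth to protocol overhead is an assumption, since the report does not break that figure down.

```python
PAYLOAD_BYTES = 512
MESSAGES_PER_USER_PER_MINUTE = 1

# Concurrent users -> reported network utilization in Gbps (from the table)
reported = {
    3_000_000: 0.254,
    6_000_000: 0.507,
    9_000_000: 0.767,
    12_000_000: 1.015,
}

def messages_per_second(users):
    """Aggregate message rate needed to give each user one message per minute."""
    return users * MESSAGES_PER_USER_PER_MINUTE / 60

def payload_gbps(users):
    """Bandwidth consumed by message payloads alone, in Gbps."""
    return messages_per_second(users) * PAYLOAD_BYTES * 8 / 1e9

for users, gbps in reported.items():
    rate = messages_per_second(users)
    wire_bytes = gbps * 1e9 / 8 / rate   # average bytes on the wire per message
    print(f"{users:>10,} users: {rate:>9,.0f} msg/s, "
          f"payload {payload_gbps(users):.3f} Gbps, "
          f"reported {gbps:.3f} Gbps, ~{wire_bytes:.0f} bytes/message")
```

At 12 million users the payloads alone amount to about 0.819 Gbps, while the reported 1.015 Gbps corresponds to roughly 634 bytes on the wire per message. The difference is presumably message framing and TCP/IP headers.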

Hardware & Setup

MigratoryData WebSocket Server version 4.0.7 ran on a single Dell PowerEdge R610 server as follows:

   
Model Name                 Dell PowerEdge R610
Manufacturing Date         Q4 2011
Dimension                  1U
Number of CPUs             2
Number of Cores per CPU    6
Total Number of Cores      12
CPU Type                   Intel Xeon Processor X5650 (12 MB Cache, 2.66 GHz, 6.40 GT/s QPI)
Memory                     96 GB RAM (DDR3 1333 MHz)
Network                    Intel X520-DA2 (10 Gbps)
Operating System           CentOS 6.4
Java Version               Oracle (Sun) JRE 1.7

The benchmark clients and benchmark publishers ran on 14 identical Dell PowerEdge SC1435 servers as follows:

  • Four Dell SC1435 servers were used to run up to four instances of the benchmark publisher. For example, to publish 100,000 messages per second, we ran one publisher instance on each of the four servers, each publisher sending 25,000 messages per second.
  • Ten Dell SC1435 servers were used to run up to ten instances of the benchmark client. For example, to open 12,000,000 concurrent connections, we ran one client instance on each of the ten servers, each client opening 1,200,000 concurrent connections.

The Dell PowerEdge R610 server (used to run a single instance of MigratoryData Server) and the 14 Dell PowerEdge SC1435 servers (used to run benchmark clients and benchmark publishers) were connected via a Dell PowerConnect 6224 gigabit switch enhanced with a 2-port 10 Gbps module.

The Benchmark Scenario

  • Each client subscribes to a single, distinct subject; for example, to achieve 12 million concurrent users, we used 12 million subjects.
  • Each client receives a message every minute; for example, to push a message per minute to 12 million concurrent users, the publishers sent a total of 200,000 messages per second (the subject of each message was chosen randomly from the total of 12 million subjects).
  • The payload of each message is a 512-byte string (consisting of 512 random alphanumeric characters).
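The scenario above can be sketched in a few lines of Python. This is an illustration only and not the actual benchmark publisher: the subject naming scheme and the function names are hypothetical, and no MigratoryData API is used. Note that with uniform random selection, each subject receives one message per minute on average rather than at exact intervals.

```python
import random
import string

ALPHANUMERIC = string.ascii_letters + string.digits
PAYLOAD_LENGTH = 512

def subject_name(index):
    # Hypothetical naming scheme; the report does not say how subjects were named.
    return f"/benchmark/subject-{index}"

def random_message(total_subjects, rng=random):
    """Pick a subject uniformly at random and build a 512-character alphanumeric payload."""
    subject = subject_name(rng.randrange(total_subjects))
    payload = "".join(rng.choices(ALPHANUMERIC, k=PAYLOAD_LENGTH))
    return subject, payload

subject, payload = random_message(12_000_000)
# Alphanumeric ASCII characters encode to one byte each, so the payload is 512 bytes.
print(subject, len(payload.encode("ascii")))
```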

Methodology

We performed four benchmark tests, corresponding to the four result columns summarized above, to simulate 3,000,000 / 6,000,000 / 9,000,000 / 12,000,000 concurrent users on a single instance of MigratoryData WebSocket Server.

The clock of the Dell R610 server (used to run MigratoryData Server) and the clocks of the 14 Dell SC1435 servers (used to run benchmark clients and benchmark publishers) were synchronized via ntpd. The latency was measured for all messages, not only for a sample. We measured the mean latency, maximum latency, and standard deviation of the latency over 10 minutes; the results are reported above. We also ran the most demanding scenario, with 12 million concurrent connections, for 6 hours and observed that MigratoryData WebSocket Server remained perfectly stable.
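Because latency is measured for every message, the statistics have to be accumulated incrementally rather than by storing each sample. The report does not say how the benchmark client aggregates its measurements. The Python sketch below shows one possible approach using Welford's online algorithm, with a hypothetical LatencyStats class and made-up timestamps.

```python
import math

class LatencyStats:
    """Running mean, standard deviation, and maximum (Welford's online algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0                  # sum of squared deviations from the running mean
        self.maximum = float("-inf")

    def record(self, sent_ms, received_ms):
        # Latency = receive time at the client minus send time at the publisher.
        latency = received_ms - sent_ms
        self.count += 1
        delta = latency - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (latency - self.mean)
        self.maximum = max(self.maximum, latency)

    @property
    def stddev(self):
        # Population standard deviation, since every message is measured.
        return math.sqrt(self._m2 / self.count) if self.count else 0.0

stats = LatencyStats()
# Made-up (sent, received) timestamps in milliseconds, for illustration only
for sent_ms, received_ms in [(1000, 1004), (1010, 1016), (1020, 1028)]:
    stats.record(sent_ms, received_ms)
print(stats.mean, stats.stddev, stats.maximum)
```

This kind of measurement is only meaningful when publisher and client clocks are synchronized, which is why ntpd is used on all machines.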

Linear Horizontal Scalability

MigratoryData WebSocket Server and its APIs offer the possibility to build a high-availability cluster.

Each instance of MigratoryData WebSocket Server in the cluster runs independently from the other cluster members. It exchanges only negligible coordination information or, depending on the clustering type you configure, does not exchange any information at all with the other cluster members. Therefore, MigratoryData WebSocket Server offers linear horizontal scalability.

One can deploy a high-availability cluster of MigratoryData servers to achieve any number of concurrent users. For example, using the linear horizontal scalability of MigratoryData WebSocket Server and the vertical scalability of 12 million users per server demonstrated here, one could achieve, say, 60 million connections using a cluster of 5 instances of MigratoryData WebSocket Server running on 5 Dell PowerEdge R610 servers.

Even though MigratoryData WebSocket Server comes with linear horizontal scalability, in a production deployment one also needs to consider the situation in which a cluster member goes down. If this occurs, the users of the failed server are automatically reconnected by the MigratoryData API to the other cluster members. Thus, the other cluster members have to absorb the load of the member that failed.

The implication is that, for the example above, a production deployment should have at least 7-8 servers to serve 60 million concurrent users, so that, if a failure occurs, each remaining server has enough reserve to accept its share of the users of the failed cluster member.
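The sizing argument can be made concrete with a small calculation. The Python sketch below assumes that users are spread evenly across servers and that the users of a failed member reconnect evenly to the survivors. Both are simplifying assumptions, and the function name is hypothetical.

```python
import math

def per_server_load(total_users, servers, failed=0):
    """Users per surviving server, assuming an even spread across the survivors."""
    surviving = servers - failed
    if surviving <= 0:
        raise ValueError("no surviving servers")
    return total_users / surviving

TOTAL_USERS = 60_000_000
CAPACITY = 12_000_000   # per-server figure demonstrated in this benchmark

# Smallest cluster that fits all users when no server has failed
minimum = math.ceil(TOTAL_USERS / CAPACITY)

for servers in range(minimum, minimum + 4):
    normal = per_server_load(TOTAL_USERS, servers)
    degraded = per_server_load(TOTAL_USERS, servers, failed=1)
    print(f"{servers} servers: {normal / 1e6:.2f}M each, "
          f"{degraded / 1e6:.2f}M each after one failure "
          f"({degraded / CAPACITY:.0%} of capacity)")
```

With 5 servers, a single failure pushes each survivor to 15 million users, beyond the demonstrated capacity. With 6 servers, the survivors sit exactly at 12 million with no margin. With 7 or 8 servers, each survivor carries about 83% or 71% of the demonstrated capacity, which matches the recommendation above.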

Conclusion

In 2010, we achieved 1 million concurrent connections on a single 1U server. While handling 1 million concurrent connections on a small server remains a challenge for the WebSocket server industry, we show here that MigratoryData WebSocket Server scales an order of magnitude higher and achieves 12 million concurrent connections on a single 1U server.