We recently completed a performance benchmark demonstrating that MigratoryData WebSocket Server can handle 12 million concurrent users on a single Dell PowerEdge R610 server while pushing a substantial amount of live data (1.015 gigabits per second). This scenario shows that MigratoryData WebSocket Server is well suited to infrastructures that deliver real-time data to a very large number of users, especially the mobile push notification infrastructures typically demanded by telecom customers with tens of millions of users.
In this benchmark scenario, MigratoryData scales up to 12 million concurrent users on a single Dell PowerEdge R610 server while pushing up to 1.015 Gbps of live data (each user receives a 512-byte message every minute). The CPU utilization diagram below shows that MigratoryData WebSocket Server scales linearly with the hardware.
According to the chart above, MigratoryData uses only 57% CPU to handle 12 million users. The remaining 43% CPU could be used to scale even further. However, we are limited by the RAM available on this machine: we use CentOS 6.4 with the standard Linux kernel, and the kernel memory footprint alone for 12 million sockets is about 36 GB.
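The memory constraint can be made concrete with a little arithmetic. The sketch below combines the 36 GB kernel footprint quoted above with the 96 GB of installed RAM and the 54 GB allocated to the JVM reported later in this article. It assumes 1 GB = 10^9 bytes and that both figures count fully against physical RAM, which the article does not state explicitly; the function name is illustrative only.

```python
# Figures quoted in the article.
TOTAL_RAM_GB = 96        # installed RAM on the Dell PowerEdge R610
JVM_GB = 54              # memory allocated to the JVM
KERNEL_SOCKET_GB = 36    # kernel footprint for 12 million sockets
CONNECTIONS = 12_000_000


def memory_budget(total_gb, jvm_gb, kernel_gb, connections):
    """Return (GB left over, kernel bytes per socket).

    Assumption: 1 GB = 10**9 bytes, and the JVM and kernel figures
    are both charged in full against physical RAM.
    """
    remaining_gb = total_gb - jvm_gb - kernel_gb
    bytes_per_socket = kernel_gb * 10**9 / connections
    return remaining_gb, bytes_per_socket


remaining, per_socket = memory_budget(
    TOTAL_RAM_GB, JVM_GB, KERNEL_SOCKET_GB, CONNECTIONS
)
print(f"RAM left for everything else: {remaining} GB")
print(f"Kernel memory per socket: {per_socket:.0f} bytes")
```

Under these assumptions only about 6 GB remain unallocated, and each socket costs roughly 3 KB of kernel memory, which is consistent with RAM rather than CPU being the limit on this machine.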
Also, the linearity of CPU utilization above indicates that MigratoryData WebSocket Server should normally scale vertically beyond 12 million concurrent users if additional RAM and CPU are available.
Detailed Results of MigratoryData WebSocket Server Running on a Single Dell R610 Server
For the table below, it is important to note that we obtained the results using the default configuration of MigratoryData WebSocket Server, a fresh installation of CentOS 6.4 (without any kernel source code modification or other special tuning), and the standard network configuration (employing the default MTU of 1500, etc.).
|Number of concurrent client connections||3,000,000||6,000,000||9,000,000||12,000,000|
|Number of messages per minute to each client||1||1||1||1|
|Total Message Throughput (messages per second)||50,000||100,000||150,000||200,000|
|Average Latency (milliseconds)||5||35||92||268|
|Standard Deviation for Latency (milliseconds)||36||123||263||424|
|Maximum Latency (milliseconds)||640||951||1292||2024|
|Network Utilization||0.254 Gbps||0.507 Gbps||0.767 Gbps||1.015 Gbps|
|CPU Utilization (average)||14%||24%||39%||57%|
|Memory Allocated to the JVM||54 GB||54 GB||54 GB||54 GB|
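One way to check the linearity claim against the table is to normalize CPU and network utilization to one million connections. The sketch below does this for the four reported test points; the data is copied from the table, while the helper name `per_million` is only illustrative.

```python
# (connections, average CPU %, network Gbps) copied from the results table.
RESULTS = [
    (3_000_000, 14, 0.254),
    (6_000_000, 24, 0.507),
    (9_000_000, 39, 0.767),
    (12_000_000, 57, 1.015),
]


def per_million(results):
    """Normalize CPU and network utilization to one million connections."""
    normalized = []
    for connections, cpu_percent, gbps in results:
        millions = connections / 1_000_000
        normalized.append((millions, cpu_percent / millions, gbps / millions))
    return normalized


for millions, cpu_pm, gbps_pm in per_million(RESULTS):
    print(f"{millions:4.0f}M connections: "
          f"{cpu_pm:.2f}% CPU and {gbps_pm:.4f} Gbps per million")
```

With these figures, the CPU cost stays between roughly 4.0% and 4.75% per million connections and the network cost stays close to 0.085 Gbps per million, which supports reading both as approximately linear over the tested range.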
Hardware & Setup
MigratoryData WebSocket Server version 4.0.7 ran on a single Dell PowerEdge R610 server as follows:
|Model Name||Dell PowerEdge R610|
|Manufacturing Date||Q4 2011|
|Number of CPUs||2|
|Number of Cores per CPU||6|
|Total Number of Cores||12|
|CPU type||Intel Xeon Processor X5650 (12 MB Cache, 2.66 GHz, 6.40 GT/s QPI)|
|Memory||96 GB RAM (DDR3 1333 MHz)|
|Network||Intel X520-DA2 (10 Gbps)|
|Operating System||CentOS 6.4|
|Java Version||Oracle (Sun) JRE 1.7|
The benchmark clients and benchmark publishers ran on 14 identical Dell PowerEdge SC1435 servers as follows:
- Four Dell SC1435 servers were used to run up to four instances of the benchmark publisher. For example, to publish 100,000 messages per second, we ran one publisher instance on each of the four servers, each publisher sending 25,000 messages per second.
- Ten Dell SC1435 servers were used to run up to ten instances of the benchmark client. For example, to open 12,000,000 concurrent connections, we ran one client instance on each of the ten servers, each client opening 1,200,000 concurrent connections.
The Dell PowerEdge R610 server (running a single instance of MigratoryData Server) and the 14 Dell PowerEdge SC1435 servers (running the benchmark clients and benchmark publishers) were connected via a Dell PowerConnect 6224 gigabit switch enhanced with a 2-port 10 Gbps module.
The Benchmark Scenario
- Each client subscribes to a single, distinct subject; for example, to achieve 12 million concurrent users, we used 12 million subjects.
- Each client receives a message every minute; for example, to push one message per minute to 12 million concurrent users, the publishers sent a total of 200,000 messages per second (the subject of each message was chosen randomly from the total of 12 million subjects).
- The payload of each message is a 512-byte string (consisting of 512 random alphanumeric characters)
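The scenario parameters above determine the publish rate and the payload bandwidth directly. The sketch below derives both and compares the payload bandwidth with the 1.015 Gbps network utilization reported for the 12 million user test. It assumes 1 Gbps = 10^9 bits per second; the per-message overhead it prints is only implied by that comparison and is not a figure reported by the benchmark.

```python
def publish_rate(users, interval_seconds):
    """Messages per second needed so each user gets one message per interval."""
    return users / interval_seconds


def payload_gbps(messages_per_second, payload_bytes):
    """Bandwidth of the payload alone, in Gbps (1 Gbps = 10**9 bits/s)."""
    return messages_per_second * payload_bytes * 8 / 1e9


def implied_bytes_per_message(measured_gbps, messages_per_second):
    """Average bytes on the wire per message implied by measured throughput."""
    return measured_gbps * 1e9 / 8 / messages_per_second


rate = publish_rate(12_000_000, 60)             # one message per minute
payload = payload_gbps(rate, 512)               # 512-byte payload
wire = implied_bytes_per_message(1.015, rate)   # reported network utilization

print(f"Publish rate: {rate:,.0f} messages/s")
print(f"Payload bandwidth: {payload:.4f} Gbps")
print(f"Implied bytes per message on the wire: {wire:.1f}")
print(f"Implied overhead per message: {wire - 512:.1f} bytes")
```

The payload alone accounts for about 0.82 Gbps of the reported 1.015 Gbps. The remainder, roughly 120 bytes per message, is presumably protocol overhead such as network headers, framing, and the subject name, though the article does not break it down.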
We performed 4 benchmark tests corresponding to the 4 results summarized above, in order to simulate 3,000,000 / 6,000,000 / 9,000,000 / 12,000,000 concurrent users from a single instance of MigratoryData WebSocket Server.
The clock of the Dell R610 server (running MigratoryData Server) and the clocks of the 14 Dell SC1435 servers (running the benchmark clients and benchmark publishers) were synchronized via ntpd. Latency was measured for all messages, not just for a sample. We measured the mean latency, the maximum latency, and the standard deviation of latency over 10 minutes; the results are reported above. We also ran the most demanding scenario, with 12 million concurrent connections, for 6 hours and observed that MigratoryData WebSocket Server remained perfectly stable.
Linear Horizontal Scalability
MigratoryData WebSocket Server and its APIs offer the possibility to build a high-availability cluster.
Each instance of MigratoryData WebSocket Server in the cluster runs independently from the other cluster members. It exchanges only negligible coordination information or, depending on the clustering type you configure, does not exchange any information at all with the other cluster members. Therefore, MigratoryData WebSocket Server offers linear horizontal scalability.
One can deploy a high-availability cluster of MigratoryData servers to reach any number of concurrent users. For example, combining the linear horizontal scalability of MigratoryData WebSocket Server with the vertical scalability of 12 million users per server demonstrated here, one could achieve, say, 60 million connections using a cluster of 5 instances of MigratoryData WebSocket Server running on 5 Dell PowerEdge R610 servers.
In a production deployment, however, the cluster needs spare capacity for failover. For the example above, we recommend at least 7 to 8 servers to serve 60 million concurrent users, so that if one cluster member fails, each remaining server has enough reserve to accept part of the failed member's users.
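The sizing reasoning above can be sketched as a small calculation. The code below assumes that users are spread evenly across servers and that the users of a failed member are redistributed evenly among the survivors; neither assumption is stated in the article, and the function names are illustrative only.

```python
import math

CAPACITY_PER_SERVER = 12_000_000   # vertical scalability shown in this benchmark
TOTAL_USERS = 60_000_000           # target from the example above


def load_per_server(total_users, servers, failed=0):
    """Users per surviving server, assuming an even redistribution."""
    surviving = servers - failed
    if surviving <= 0:
        raise ValueError("no surviving servers")
    return total_users / surviving


def min_servers(total_users, capacity, tolerated_failures=1):
    """Smallest cluster that still fits all users after the given failures."""
    return math.ceil(total_users / capacity) + tolerated_failures


print("Bare minimum for one failure:",
      min_servers(TOTAL_USERS, CAPACITY_PER_SERVER))
for servers in (5, 6, 7, 8):
    normal = load_per_server(TOTAL_USERS, servers)
    degraded = load_per_server(TOTAL_USERS, servers, failed=1)
    print(f"{servers} servers: {normal:,.0f} users each, "
          f"{degraded:,.0f} each after one failure")
```

Under these assumptions, 6 servers is the bare minimum to survive one failure, but the survivors would then run at their full 12 million capacity. With 7 or 8 servers, each survivor carries about 10 million or 8.6 million users after a failure, which leaves headroom.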
In 2010, we achieved 1 million concurrent connections on a single 1U server. While handling 1 million concurrent connections on a small server remains a challenge for the WebSocket server industry, we show here that MigratoryData WebSocket Server scales an order of magnitude higher and achieves 12 million concurrent connections on a single 1U server.