This benchmark shows that MigratoryData achieves 8X higher scalability than the record obtained by the competition in the same benchmark category, reaffirming that it is the most scalable WebSocket server. The result also demonstrates that, with MigratoryData WebSocket Server, it is feasible and affordable to build real-time web applications that deliver high volumes of real-time information to a large number of concurrent users.
In this benchmark scenario, MigratoryData scales up to 192,000 concurrent users (delivering 8.88 Gbps of throughput) from a single Dell R610 1U server, 8X more than the record obtained by the competition (which used a more recent Dell 1U server with similar specifications). Moreover, MigratoryData achieves lower bandwidth utilization and lower latency, as shown in the diagram and table below.
Note that the results in the table below were obtained using the default configuration of MigratoryData WebSocket Server, a fresh installation of CentOS 6.4 Linux (without any kernel recompilation or other special tuning), and the standard network configuration (the default MTU of 1500, default kernel buffer sizes, etc.).
|Number of concurrent client connections||24,000||48,000||72,000||96,000||120,000||144,000||168,000||192,000|
|Number of messages per second to each client||10||10||10||10||10||10||10||10|
|Total Message Throughput (messages per second)||240,000||480,000||720,000||960,000||1,200,000||1,440,000||1,680,000||1,920,000|
|Average Latency (milliseconds)||2.35||3.09||5.76||39.95||83.23||139.46||225.87||597.27|
|Standard Deviation for Latency (milliseconds)||3.74||3.79||6.73||20.80||39.36||65.12||106.00||269.74|
|Maximum Latency (milliseconds)||49||54||88||168||291||391||760||1732|
|Network Utilization||1.21 Gbps||2.39 Gbps||3.59 Gbps||4.65 Gbps||5.75 Gbps||6.79 Gbps||7.87 Gbps||8.88 Gbps|
|CPU Utilization (average)||25%||49%||72%||82%||88%||90%||92%||96%|
|Memory Allocated to the JVM||2.5 GB||26 GB||26 GB||26 GB||26 GB||26 GB||30 GB||48 GB|
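As a rough cross-check of the Network Utilization row, the payload-only bandwidth can be derived from the connection count, the rate of 10 messages per second per client, and the 512-byte payload described in the benchmark scenario. The Python sketch below is illustrative only: it assumes 1 Gbps = 10^9 bits per second, and the function names are hypothetical. The gap between the payload-only figure and the measured figure is presumably protocol overhead (framing, subject names, TCP/IP headers), which this article does not break down.

```python
# Figures copied from the results table above.
CONNECTIONS = [24_000, 48_000, 72_000, 96_000, 120_000, 144_000, 168_000, 192_000]
MEASURED_GBPS = [1.21, 2.39, 3.59, 4.65, 5.75, 6.79, 7.87, 8.88]

MESSAGES_PER_CLIENT_PER_SECOND = 10
PAYLOAD_BYTES = 512  # payload size stated in the benchmark scenario


def payload_gbps(connections: int) -> float:
    """Bandwidth used by message payloads alone (assumes 1 Gbps = 10**9 bit/s)."""
    bits_per_second = connections * MESSAGES_PER_CLIENT_PER_SECOND * PAYLOAD_BYTES * 8
    return bits_per_second / 1e9


def wire_bytes_per_message(connections: int, measured_gbps: float) -> float:
    """Average bytes on the wire per delivered message, from the measured bandwidth."""
    messages_per_second = connections * MESSAGES_PER_CLIENT_PER_SECOND
    return measured_gbps * 1e9 / 8 / messages_per_second


if __name__ == "__main__":
    for connections, measured in zip(CONNECTIONS, MEASURED_GBPS):
        print(
            f"{connections:>7} connections: "
            f"payload-only {payload_gbps(connections):.2f} Gbps, "
            f"measured {measured:.2f} Gbps, "
            f"{wire_bytes_per_message(connections, measured):.0f} bytes/message on the wire"
        )
```

For the 192,000-connection test, the payload alone accounts for about 7.86 Gbps of the 8.88 Gbps measured.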
Hardware & Setup
MigratoryData WebSocket Server version 4.0.3 ran on a single Dell PowerEdge R610 server with the following specifications:
|Model Name||Dell PowerEdge R610|
|Manufacturing Date||Q4 2011|
|Number of CPUs||2|
|Number of Cores per CPU||6|
|Total Number of Cores||12|
|CPU type||Intel Xeon Processor X5650 (12 MB Cache, 2.66 GHz, 6.40 GT/s QPI)|
|Memory||64 GB RAM (DDR3 1333 MHz)|
|Network||Intel X520-DA2 (10 Gbps)|
|Operating System||CentOS 6.4, Linux kernel 2.6.32-358.2.1.el6.x86_64|
|Java Version||Oracle (Sun) JRE 1.6.0_37|
The Benchmark Publisher and the Benchmark Client instances ran on 14 identical Dell PowerEdge SC1435 servers. The Dell R610 server (running MigratoryData WebSocket Server) and the 14 Dell SC1435 servers (running the Benchmark Clients and the Benchmark Publisher) were connected via two gigabit switches: a Dell PowerConnect 5424 and a Dell PowerConnect 6224 (enhanced with a 2-port 10 Gbps module), as detailed in the diagram below:
The total number of concurrent client connections for each benchmark test is achieved using 13 of the 14 Dell SC1435 servers. One instance of the Benchmark Client runs on each of these 13 servers, so each server simulates 1/13 of the total concurrent client connections.
The 14th Dell SC1435 server is used to run both an instance of the Benchmark Client (opening 30 concurrent client connections) and an instance of the Benchmark Publisher.
The Benchmark Scenario
- There are a total of 100 different subjects.
- The publisher sends 1000 messages per second.
- The subject of each message is randomly selected from the 100 subjects; thus, each subject is updated 10 times a second on average.
- Each client subscribes to a single subject randomly selected from the 100 subjects; thus, each client receives 10 messages per second on average.
- The payload of each message is a 512-byte string (consisting of 512 random alphanumeric characters).
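The publisher side of this scenario can be sketched in a few lines of Python. This is a minimal illustration, not the actual Benchmark Publisher: the subject naming scheme (`/subject/N`), the function names, and the use of Python's `random` module are all assumptions made for the example.

```python
import random
import string
from collections import Counter

SUBJECT_COUNT = 100          # total number of subjects
PUBLISH_RATE = 1000          # messages per second sent by the publisher
PAYLOAD_LENGTH = 512         # random alphanumeric characters per message
ALPHABET = string.ascii_letters + string.digits


def make_payload(rng: random.Random) -> str:
    """Build a payload of 512 random alphanumeric characters."""
    return "".join(rng.choices(ALPHABET, k=PAYLOAD_LENGTH))


def publish_one_second(rng: random.Random) -> list[tuple[str, str]]:
    """Return the (subject, payload) pairs published during one second.

    The subject names are hypothetical; each message picks one of the
    100 subjects uniformly at random.
    """
    return [
        (f"/subject/{rng.randrange(SUBJECT_COUNT)}", make_payload(rng))
        for _ in range(PUBLISH_RATE)
    ]


def expected_updates_per_subject() -> float:
    """Average number of updates each subject receives per second."""
    return PUBLISH_RATE / SUBJECT_COUNT


if __name__ == "__main__":
    batch = publish_one_second(random.Random())
    per_subject = Counter(subject for subject, _ in batch)
    print("expected updates per subject:", expected_updates_per_subject())
    print("busiest subject this second:", per_subject.most_common(1))
```

Because subjects are chosen at random, the number of updates a given subject receives in any single second varies around the average of 10.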
We performed 8 benchmark tests, corresponding to the 8 results summarized above, simulating 24,000 / 48,000 / 72,000 / 96,000 / 120,000 / 144,000 / 168,000 / 192,000 concurrent users against a single instance of MigratoryData WebSocket Server using 13 instances of the Benchmark Client.
For the duration of each test, we ran a 14th instance of the Benchmark Client on the same machine that ran the instance of the Benchmark Publisher. The 14th instance of the Benchmark Client was used to measure latency results. It simulated an additional 30 users on top of the total number of simulated users, ran for 600 seconds, and computed the average, standard deviation, and maximum statistics of the latency of the received messages.
Moreover, the sample size for each test is 180,000 messages (600 seconds x 10 messages per second x 30 concurrent client connections), which is large enough for the latency statistics to be statistically meaningful.
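The sample-size arithmetic and the three reported statistics can be illustrated with a short Python sketch. The function names are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption: the article does not say which estimator the Benchmark Client uses.

```python
import statistics


def sample_size(duration_seconds: int, messages_per_second: int, connections: int) -> int:
    """Number of latency samples collected by the measuring client."""
    return duration_seconds * messages_per_second * connections


def summarize_latency(latencies_ms: list[float]) -> dict[str, float]:
    """Average, standard deviation, and maximum of per-message latencies.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return {
        "average": statistics.fmean(latencies_ms),
        "stdev": statistics.stdev(latencies_ms),
        "maximum": max(latencies_ms),
    }


if __name__ == "__main__":
    # 600 seconds x 10 messages per second x 30 connections
    print("samples per test:", sample_size(600, 10, 30))
    # Tiny made-up input, only to show the call shape.
    print(summarize_latency([1.0, 2.0, 3.0]))
```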
Linear Horizontal Scalability
Not only does MigratoryData WebSocket Server offer horizontal scalability via its built-in clustering feature, it also offers linear horizontal scalability because each instance of MigratoryData WebSocket Server in the cluster runs independently from the other cluster members. It exchanges only negligible coordination information or, depending on the clustering type you configure, does not exchange any information at all with the other cluster members.
Therefore, if one wants to deliver real-time information to 1 million concurrent users in this benchmark scenario, then one can deploy 6 instances of MigratoryData WebSocket Server on 6 Dell R610 servers to deliver data to 1,152,000 concurrent users (i.e. 6 servers x 192,000 maximum concurrent connections, as demonstrated by this benchmark).
Note, however, that the 192,000-connection result was obtained at 96% average CPU utilization, so a server running at that level has no spare capacity. For the example above, a production deployment should therefore use at least 7-8 servers to serve 1 million concurrent users, so that if a cluster member fails, each remaining server has enough reserve to take over part of its users.
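The sizing reasoning above can be expressed as a small calculation. The Python sketch below is illustrative only: it assumes users are spread evenly across cluster members and that the 192,000-connection ceiling from this benchmark applies to each server. The function names are hypothetical.

```python
import math

MAX_USERS_PER_SERVER = 192_000  # maximum demonstrated by this benchmark


def servers_for_peak(target_users: int, per_server: int = MAX_USERS_PER_SERVER) -> int:
    """Minimum number of servers with every server at its benchmark ceiling."""
    return math.ceil(target_users / per_server)


def servers_with_failover(
    target_users: int,
    tolerated_failures: int = 1,
    per_server: int = MAX_USERS_PER_SERVER,
) -> int:
    """Servers needed so the survivors can still carry all users after failures."""
    return servers_for_peak(target_users, per_server) + tolerated_failures


def users_per_surviving_server(target_users: int, servers: int, failed: int = 1) -> float:
    """Average load per remaining server after `failed` members drop out."""
    return target_users / (servers - failed)


if __name__ == "__main__":
    target = 1_000_000
    print("servers at full load:", servers_for_peak(target))
    print("servers tolerating one failure:", servers_with_failover(target))
    for servers in (7, 8):
        load = users_per_surviving_server(target, servers)
        print(f"{servers} servers, one failed: {load:,.0f} users per survivor")
```

With 7 servers and one failure, the 6 survivors each carry roughly 166,667 users, which is still below the 192,000 ceiling; an 8th server lowers that to roughly 142,857.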
This benchmark result reaffirms MigratoryData’s leadership in WebSocket server scalability.
Using MigratoryData’s high vertical scalability and linear horizontal scalability, one can build cost-effective real-time applications that scale with growth in the number of users and in data volumes.