Building realtime web and mobile apps that scale to millions of users with Flutter and Kafka

Flutter and Kafka are two popular technologies. Flutter is a framework for building multi-platform native apps, including web apps and mobile apps both for Android and iOS, from a single codebase written in the Dart language. Kafka is a distributed event streaming platform for interconnecting the backend systems of a company, typically across a data center network. Kafka is not designed to address the challenges of connecting lots of devices across the public Internet. Therefore, building a realtime web or mobile app with lots of users to communicate with Kafka over the Internet needs a complementary technology. This blog post takes a look at the technologies available to extend Kafka across the Internet, and discusses the challenges faced to scale to millions of users. Finally, we show how to build a realtime sports scores app that scales using Flutter and Kafka.

Scaling to millions of users

Prior to discussing the technologies available to extend Kafka across the Internet, let’s see what scalability really means for good architecture.

Scaling to millions of users is achievable with any technology that scales horizontally, just by deploying enough machines. Scaling horizontally is simple and necessary but not sufficient for good architecture. Scaling vertically — by handling as many users as possible on each machine — is also needed.

Operating an app with hundreds of machines using a technology which scales vertically poorly, when only a few machines and a vertically scalable technology would be needed, is bad architecture. It's simply a waste of resources in terms of maintenance, hardware failures, power consumption etc. Not to discuss the economic and environmental impact, but scaling to millions of users with an event streaming technology that scales vertically poorly is like searching for a number in an ordered list of numbers using the linear search algorithm. It is simple and feasible, especially if the list is short. After all, if the list has ten numbers, it requires only up to ten comparisons. However, when dealing with one million numbers, it would be a shame to use up to one million comparisons rather than twenty comparisons with the binary search algorithm. Using an event streaming technology which scales vertically is like searching with binary search.

Kafka Client

A Kafka client for Dart exists from the community. It could be used to build Flutter apps to be compiled as web or mobile apps. However, as discussed above, the Kafka protocol is ideal for connecting backend systems across the data center or cloud networks. For example, with the Kafka protocol, the client should connect using TCP connections to all Kafka brokers. This is particularly inconvenient for clients connecting from edge networks or Internet as they must pass through firewalls and load balancers. Therefore, using the Kafka client library for Dart, one could possibly connect a limited number of users across the local network of a company, but not millions of users across the web.

Kafka REST Proxy

The Kafka REST Proxy provides a RESTful interface to Kafka using the standard HTTP protocol, so the clients coming from the Internet could now pass through firewalls and load balancers. Also, REST is easy to implement in any language, including in Dart.

That said, the main issue with this approach is the clients connect using the HTTP request/response pattern, so in order build a real-time app using this approach, a polling mechanism has to be used by periodically asking the Kafka REST Proxy if any fresh data is available on the server side. This typically results in a number of issues, including:

Latency: Latency is the time necessary to propagate a fresh event occurred on the server side to the client. Supposing the client polls the REST Proxy every 15 seconds, this will introduce latencies of up to 15 seconds. However, a realtime web and mobile app would typically expect latencies of tens, up to hundreds of milliseconds. To reduce the latencies, one could increase the polling frequency, but this has an impact on the scalability of the REST Proxy.
Bandwidth: For a realtime web app built with this technique, the web browsers automatically add hundreds of bytes of unnecessary HTTP headers and cookies to each polling request, and therefore the bandwidth usage would increase substantially.

WebSockets Gateways for Kafka

The WebSocket protocol has been designed for realtime bidirectional communication. It uses the HTTP protocol only for the handshake between the client and the server, then upgrades to a persistent TCP connection between the client and the server. This has the advantage of the WebSocket connections using the standard HTTP ports 80/443 which are able to pass through firewalls and load balancers in the same way as the HTTP requests do. On the other hand, by upgrading to a persistent TCP connection, the WebSocket protocol creates the premises for low-latency and high scalability. However, these are not automatic. The degree in which a WebSockets gateway is scalable and achieves low latency depends entirely on the quality of its implementation.

There are WebSockets Gateways for Kafka. Dart also implements the WebSocket protocol. While using WebSockets is the right approach, using plain WebSockets to extend Kafka across the web is not sufficient:

Usability: The WebSocket protocol is a low level transport protocol using data frames rather than messages, so it requires quite important programming efforts on the client side.
Scalability, Performance: The scalability and the low-latency performance of a WebSockets Gateway depends on the quality of its implementation, so its performance and scalability have to be proven by rigorous benchmarks or production use cases.
Ordering, Guaranteed Delivery, Resilience, Security, Authorization: WebSockets creates the premises to build a communication layer which is highly available, where the delivery of events is ordered and guaranteed despite hardware failures and network reconnections, where data is accessed securely. Again, all these features depend on the quality of the implementation of the WebSockets gateway and are not automatically available.

Enterprise WebSockets Gateways using Kafka Connect

Enterprise solutions for realtime web messaging over WebSockets exist, either software or in the cloud. These enterprise solutions typically implement all necessary features for deploying a reliable communication layer, including clustering, high availability, fault tolerance, ordering, guaranteed delivery, security, authorization, scalability etc. Also, the usability is solved by exposing an easy-to-use higher level publish/subscribe messaging protocol on top of the low-level framing protocol of WebSockets.

Most of these enterprise solutions integrates with Kafka using the Kafka Connect layer. A few examples are Ably, Diffusion, MigratoryData, and Solace. While Kafka Connect is a sophisticated service of the Kafka ecosystem which makes the integration with other systems easy, it introduces an additional layer to the architecture. So, in order to build a realtime web or mobile app, three communication layers have to be deployed:

a cluster of Kafka brokers
a cluster of Kafka Connect services, and
a cluster of enterprise WebSockets gateways

This results in an unnecessary complex architecture.

Kafka-Native Enterprise WebSockets Gateways

A few enterprise solutions integrate natively with Kafka, without using the intermediary Kafka Connect layer. HiveMQ and MigratoryData are two examples.

MQTT and the Feedback Implosion Problem

HiveMQ uses the MQTT protocol, a lightweight application-layer publish/subscribe messaging protocol which works over WebSockets. The MQTT protocol is standard for the IoT industry. The protocol is designed to be simple enough to accommodate the constrained IoT devices. For example, it implements guaranteed delivery using simple acknowledgements: when a message is sent by the server to the client, the client sends back an acknowledgement message to confirm its reception. This makes sense for IoT because data typically flow from lots of IoT devices to the server, and less frequently vice-versa, from the server to lots of IoT devices. However, for realtime web or mobile apps with millions of users, there are use cases where data is broadcasted from the server to lots of clients. For example, an app which displays sports scores in realtime (see the live demo below). Supposing a score update is broadcasted to one million concurrent sports fans using MQTT, this will result into one million feedback acknowledgment messages. This is known as the Feedback Implosion problem, and has an impact on vertical scalability.

Reliable Messaging with Feedback Implosion Suppression

MigratoryData also uses a lightweight application-layer publish/subscribe messaging protocol on top of the WebSockets protocol, but where guaranteed delivery is achieved using only negative feedback. Sequence numbers are assigned to messages on the server side and messages are replicated across the cluster. If a user receives a message with a sequence which is not in order or it receives no data, including no heartbeat for a while (failure detection mechanism), only in this rare situation would a client send negative feedback. It reconnects automatically to the cluster of WebSockets gateways and provides the last correct sequence number it received to get the new messages from that sequence number. In this way, MigratoryData achieves both reliable messaging and vertical scalability to address use cases with millions of users. Check out the following paper Reliable Messaging to Millions of Users with MigratoryData for more details.

Live Demo using Flutter, Kafka, and MigratoryData

The MigratoryData client library for Dart is available as a package on pub.dev, the official package repository for Dart and Flutter apps. We’ve built a live demo app which produces and consumes sports scores into and from Kafka through the MigratoryData WebSockets gateway, and displays the live scores into a scalable realtime web app as follows:

The demo consists of connecting to the MigratoryData WebSockets gateway, subscribing to ten subjects /matches/football/1, … , /matches/football/10, and finally publishing a random number of up to five score updates every 2 seconds as messages on randomly selected subjects from the list of ten subjects.

MigratoryData uses a simple convention to dynamically map the MigratoryData subjects to the Kafka topics, with no mapping configuration required. For example, a MigratoryData message with the subject /matches/football/5 and the content score_home, will be automatically mapped into a Kafka message with the topic matches, the key football/5, and the content score_home, and vice-versa. So, the first segment of a MigratoryData subject corresponds to a Kafka topic, and the remaining segments of the MigratoryData subject correspond to a Kafka key. For more details read this blog post or this documentation. Also, you can read more about the concepts of MigratoryData and Kafka and their correspondence in this blog post.

This is how realtime data travels across this demo:

Flutter Demo with Kafka and MigratoryData

Supposing you deployed MigratoryData and Kafka locally using the default configuration. To enable MigratoryData as a WebSockets gateway for Kafka, configure ClusterEngine=kafka in migratorydata.conf. Then, as all subjects /matches/football/1..10 of this demo correspond to the Kafka topic matches, configure topics=matches in integrations/kafka/consumer.properties such that MigratoryData subscribes ❶ to the Kafka topic matches.

Each user of the Flutter app subscribes ❷ to the subjects /matches/football/1..10. When a user publishes ❸ a message on a subject say /matches/football/5 with the content score_home, the message is received by MigratoryData and automatically published to Kafka ❹ as a Kafka message with the topic matches, the key football/5, and the content score_home. As MigratoryData subscribes to the Kafka topic matches, it will automatically get the Kafka message ❺. Then, as all users are subscribed to /matches/football/5, MigratoryData will automatically push ❻ the Kafka message it received as a MigratoryData message with the subject /matches/football/5 and the content score_home to all users. When the user receives a message with the content score_home on a subject /matches/football/5 it increments the score on the 6th line of the table for the home team or the score of the away team if the content of the message is score_away.

For didactic reasons, we’ve first written the live demo above as a web-only app written in Dart. The source code of the web-only app is available on github here. We’ve also written the live demo as a multi-platform app written in Dart and using the Flutter UI, such that you can compile it as a web app, Android app, iOS app, or a desktop app for MacOS, Linux, or Windows. The source code of the multi-platform app is available on github here.

Conclusion

In this blog post we’ve discussed the challenges faced when extending Kafka to millions of users over Internet by presenting the existing technologies and discussing their limitations: the Kafka client library and the limitations of the Kafka protocol, the REST Proxy and the request/response polling limitation, the WebSockets gateway and the need for enterprise features, the enterprise web messaging solutions and the need for an additional Kafka Connect layer. We concluded with the Kafka-native enterprise web messaging solutions which are best suitable for extending Kafka to millions of users over the Internet. Here we’ve found solutions based on the MQTT protocol, which is appropriate for IoT, but less appropriate for web and mobile due to the Feedback Implosion problem. Finally, we saw that MigratoryData, a Kafka-native enterprise web messaging solution, implements a Feedback Implosion Suppression algorithm which allows reliable messaging at scale.

If you have a realtime app with lots of users that needs higher scalability and lower latency than your current solution can deliver, or if you want an advantage over your competitors, contact us.