Apache Kafka hands-on: synchronized communications between platforms
How can you improve your data flows and processes with Apache Kafka?
If you’re in a hybrid online/offline business, possibly spread across multiple platforms, then probably one of your most significant challenges is the complexity of your software ecosystem. Managing multiple software solutions coming from different providers and with independent lifecycles can have a significant impact on how (and whether) your business functions.
In this case, you might need solutions to simplify your ecosystem, either by reducing the number of its components or by better integrating/synchronizing them.
In this article, we discuss how we optimized online/offline data flows and processes with Apache Kafka.
Basic intro to Kafka
Kafka is a data store optimized for ingesting and processing streaming data in real time. When thousands of data sources generate and send data at the same time, Kafka can handle the influx and process it sequentially and incrementally.
It can be distributed and is highly scalable, able to support trillions of messages per day.
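Much of that scalability comes from splitting each topic into partitions that can live on different brokers. Messages with the same key always land in the same partition, which also keeps their relative order. The toy sketch below illustrates the idea in plain Python and is not Kafka's implementation. The Java client's default partitioner hashes keys with murmur2, while this sketch uses `zlib.crc32` for simplicity, so the partition numbers will not match a real cluster. The partition count and voucher keys are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for one topic


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition (toy hash; Kafka's Java client uses murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribute(messages):
    """Group (key, value) pairs by partition, preserving order within each partition."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[choose_partition(key)].append((key, value))
    return dict(partitions)


if __name__ == "__main__":
    events = [("V-1", "issued"), ("V-2", "issued"), ("V-1", "redeemed")]
    for partition, items in sorted(distribute(events).items()):
        print(partition, items)
```

Because the partition depends only on the key, "issued" and "redeemed" for voucher V-1 always end up in the same partition, in the order they were sent.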
Very important: Kafka is open source (Apache License 2.0) and community-driven, and therefore free to install and use.
(Disclaimer: Kafka's applications are vast. Here we cover only a quick hands-on session, suited as an introduction to Kafka and as a practical solution to a relatively simple business case.)
The business situation
The client is a mixed business in the fashion industry, with both an online store and a vast network of physical shops. The online store was powered by its own software infrastructure (a customized e-shop platform), while the brick-and-mortar stores ran custom-made cashbox software.
Of course, the two systems need to be synchronized. For example, value vouchers must be recognized and accepted across the whole business, regardless of where they were purchased.
Direct communication between the two systems is not the optimal solution, because it requires both services to be available at the same time, which cannot be guaranteed.
Enter Apache Kafka: it helps us build a message-based communication system between the two platforms that keeps working even if one of them is temporarily unavailable or unreachable.
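What makes this work is that the producer never talks to the consumer directly. It appends messages to a topic, and the consumer reads from its own last position whenever it is available. The toy in-memory model below is not Kafka's API, and its class and method names are hypothetical. It shows the web-shop side catching up on vouchers sold while it was offline.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy append-only log: messages are kept, not deleted when read."""
    messages: list = field(default_factory=list)

    def append(self, message) -> int:
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the new message


@dataclass
class Consumer:
    """Tracks its own position (offset) in the topic, as a Kafka consumer does."""
    topic: Topic
    position: int = 0

    def poll(self):
        """Return every message since the last position and advance the position."""
        batch = self.topic.messages[self.position:]
        self.position += len(batch)
        return batch


if __name__ == "__main__":
    vouchers = Topic()
    webshop = Consumer(vouchers)

    vouchers.append("voucher V-1 sold in store")
    print(webshop.poll())  # web shop online: receives V-1

    # The web shop goes offline, but the store keeps selling vouchers.
    vouchers.append("voucher V-2 sold in store")
    vouchers.append("voucher V-3 sold in store")

    # The web shop comes back and catches up from where it left off.
    print(webshop.poll())
```

In real Kafka, messages stay in the topic for a configurable retention period, so the consumer's downtime must stay within that window.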
Installing Kafka, hands-on
For a quick start, Kafka is available in three ways:
- A local installation
- A local installation using Docker
- Kafka as a cloud service
Since we went through all of them, here’s our quick take.
(Please note: should you need heavy-duty Kafka and have the technical expertise, there are other options, too. Just check the managed Kafka offerings commercially available from all major cloud providers.)
Local installation of Kafka
The installation steps are quite simple:
- Make sure you have Java (at least version 8) on the system.
- Download the Kafka package. It bundles two applications: the Kafka Broker and ZooKeeper (newer Kafka releases can also run without ZooKeeper, in KRaft mode).
- Start each of them with the provided scripts.
- Use the other provided scripts to administer Kafka and to test the installation by creating a test topic and producing and consuming test messages.
Here is a detailed description of the process: https://kafka.apache.org/quickstart.
Our client's infrastructure is based on a group of on-premises Windows Server 2012 machines, so we decided to install Kafka on one of them. We had seen some early warnings about running Kafka on Windows. However, the installation process looked fairly simple, so we went ahead and gave it a try.
Everything went smoothly, and within a few minutes we were able to see Kafka in action. Soon after, we started implementing our own producers and consumers, and it was almost too good to be true.
However, 24 hours later, the Kafka Broker crashed, complaining about some locked files. Restarting the Broker “fixed” the issue, but a few hours later, the same error appeared.
After some research and attempts to find a proper solution (more about the error here: https://github.com/apache/kafka/pull/6329), we decided to move on to a different option.
We then looked for a Kafka service in the cloud and chose CloudKarafka for two main reasons: it has a free tier for testing and development, and it offers affordable commercial plans once you decide to keep using it.
We registered for a trial and instantly got all the connection details we needed to start using it.
Migrating the code from the locally installed Kafka to the cloud one was just a matter of changing the configuration, since we didn't yet have any actual events to migrate.
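Because the client code depends only on its connection settings, the switch can be reduced to swapping one configuration block. The sketch below is minimal and makes three assumptions. First, it assumes a librdkafka-based client such as confluent-kafka, whose configuration keys it uses. Second, it assumes the cloud cluster uses SASL_SSL with SCRAM authentication, so take the exact SASL mechanism from your provider's connection details. Third, the environment-variable names are hypothetical.

```python
import os


def kafka_config(env=os.environ) -> dict:
    """Build client settings for either the local or the cloud Kafka.

    KAFKA_TARGET, KAFKA_BOOTSTRAP, KAFKA_USER and KAFKA_PASSWORD are
    hypothetical variable names chosen for this sketch.
    """
    target = env.get("KAFKA_TARGET", "local")
    if target == "local":
        # Local broker without authentication (typical quick-start setup).
        return {"bootstrap.servers": env.get("KAFKA_BOOTSTRAP", "localhost:9092")}
    if target == "cloud":
        # Managed cluster: credentials come from the provider's dashboard.
        return {
            "bootstrap.servers": env["KAFKA_BOOTSTRAP"],
            "security.protocol": "SASL_SSL",
            "sasl.mechanisms": "SCRAM-SHA-256",  # assumption: check your provider
            "sasl.username": env["KAFKA_USER"],
            "sasl.password": env["KAFKA_PASSWORD"],
        }
    raise ValueError(f"unknown KAFKA_TARGET: {target!r}")
```

The resulting dictionary can be passed unchanged to a client constructor such as `confluent_kafka.Producer(config)`, so the producer and consumer code itself stays the same.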
Here are the pros of Kafka as a cloud service:
- No installation time and no hassle, so it’s available right away
- No worries about maintenance and configuration
- Security is configured by default
…And the cons:
- It costs a monthly fee that depends on the throughput and volume of data processed through Kafka.
- For European companies, the data often has to be stored inside the EU, and not all cloud providers can guarantee that (CloudKarafka does, though).
Linux installation of Kafka using Docker
Although the Kafka cloud solution was good enough for our business purposes, we still wanted to have Kafka as a local installation.
Since the Windows server was not an option anymore, we prepared a new Ubuntu server for that purpose. Here are the steps needed to have Kafka on it:
- Make sure that you have Docker and Docker Compose available on the system.
- Prepare a docker-compose.yml file (there are plenty of sample configurations available).
- Start the containers with the usual docker-compose commands (e.g., docker-compose up -d).
- You still have the admin scripts for testing your installation. Since everything runs inside a Docker container, though, you need to run them through docker exec:
docker exec -it <container-name> <command>
(Please note: for a detailed description of such a process, you can check the “Single Node setup” section from https://www.baeldung.com/ops/kafka-docker-setup).
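If these admin commands are scripted (for example, as a post-deployment check), it can help to wrap the docker exec call. The sketch below assumes a container named `kafka`, which is hypothetical, and an image that puts `kafka-topics.sh` on the PATH. Some images ship the scripts under a different path or without the `.sh` suffix. The `-i`/`-t` flags are left out because they are meant for interactive use, and `-t` fails when the calling script has no terminal.

```python
import subprocess


def docker_exec_argv(container: str, *command: str) -> list:
    """Build the argv for running a Kafka admin script inside a container.

    No -i/-t flags: they are for interactive shells, and -t fails when
    the calling script has no terminal attached.
    """
    return ["docker", "exec", container, *command]


def list_topics(container: str = "kafka", bootstrap: str = "localhost:9092") -> list:
    """Return the topic names reported by kafka-topics.sh inside the container."""
    argv = docker_exec_argv(container, "kafka-topics.sh", "--list",
                            "--bootstrap-server", bootstrap)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line.strip()]
```

Passing the command as a list, rather than as a single shell string, avoids quoting problems with topic names and arguments.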
This time, the installation was as stable as it could be, and it eventually became our final solution, which went into production.
Using Kafka – bare minimum
When it comes to actually using Kafka, the three most important terms are:
- Topic: a category used to organize messages; each topic will have its own queue of messages
- Producer: a client application that produces messages on a certain topic
- Consumer: a client application that subscribes to messages coming from a certain topic
Topics can be created with the admin scripts mentioned above. For our first business case, we created two topics named test.voucher and prod.voucher.
We have a convention to name the topics by the particular environment (prefix) followed by the business event (suffix). Here is the command to create a topic:
kafka-topics.sh --create --topic test.voucher --bootstrap-server localhost:9092
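The environment.event naming convention is easy to enforce in code, so that no service publishes to a misspelled topic. The sketch below uses a hypothetical helper. The character rules it checks (letters, digits, '.', '_' and '-', at most 249 characters) are Kafka's own limits for topic names. Kafka also warns that topic names using '.' and '_' can collide in metric names. For that reason, this sketch adds a hypothetical house rule that rejects underscores.

```python
import re

# Kafka's own limits for topic names: [a-zA-Z0-9._-], at most 249 characters.
_LEGAL = re.compile(r"[a-zA-Z0-9._-]{1,249}")
ENVIRONMENTS = {"test", "prod"}  # the prefixes used in this article


def topic_name(env: str, event: str) -> str:
    """Build '<env>.<event>', e.g. topic_name("test", "voucher") == "test.voucher"."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    if not event:
        raise ValueError("event name must not be empty")
    name = f"{env}.{event}"
    # Hypothetical house rule: no underscores, so '.' stays the only separator
    # and the '.'/'_' metric-name collision Kafka warns about cannot occur.
    if "_" in name or not _LEGAL.fullmatch(name):
        raise ValueError(f"invalid topic name: {name!r}")
    return name
```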
Producer and Consumer
Client libraries for writing producers and consumers exist for all major languages. For .NET, for example, Confluent's documentation offers clear instructions and code snippets: https://docs.confluent.io/clients-confluent-kafka-dotnet/current/overview.html
To map this to our simple business case, a workflow will look like this:
- A voucher is bought at the physical store
- A dedicated Producer creates a message with all the details of the voucher purchase and pushes it to a specific Topic.
- A dedicated Consumer, acting on behalf of the Web-shop and listening to that Topic, will “catch” that message and notify the Web-shop to register the voucher into its own system.
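In practice, the message pushed by the store-side producer can be a small JSON document. Keying it by the voucher ID sends all events about one voucher to the same partition, so they stay in order. The event class and its field names below are hypothetical and do not reflect the client's actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VoucherPurchased:
    """Hypothetical event emitted by the cashbox when a voucher is sold."""
    voucher_id: str
    store_id: str
    amount_cents: int  # integer cents avoid floating-point rounding
    currency: str
    purchased_at: str  # ISO 8601 timestamp


def to_message(event: VoucherPurchased) -> tuple:
    """Producer side: return (key, value) bytes ready to hand to a Kafka producer."""
    key = event.voucher_id.encode("utf-8")
    value = json.dumps(asdict(event), sort_keys=True).encode("utf-8")
    return key, value


def from_message(value: bytes) -> VoucherPurchased:
    """Consumer side: rebuild the event before registering the voucher in the web shop."""
    return VoucherPurchased(**json.loads(value.decode("utf-8")))
```

With confluent-kafka, for instance, the pair could be sent with `producer.produce("prod.voucher", key=key, value=value)`.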
Kafka best practices, the Berg Software edition
Kafka REST proxy
The code that implements the producers and consumers is best kept in a separate microservice that acts as a Kafka REST proxy, sitting between your business services and the Kafka instance.
- When a business service needs to broadcast a message, it simply makes a REST call to the proxy. The proxy then uses the dedicated Producer to push that message to the right topic.
- When a business service is interested in a particular kind of message, it registers a REST endpoint with the proxy. The proxy then instructs a Consumer to call that endpoint whenever a message arrives on the right topic.
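The registration half of such a proxy boils down to a mapping from topics to callback URLs. Below is a minimal, framework-free sketch of that dispatch logic. The class and method names are hypothetical. The HTTP call is injected as a function, so it can be swapped for a real HTTP client. The actual Kafka producer and consumer are left out.

```python
import json
from collections import defaultdict
from typing import Callable


class ProxyRegistry:
    """Hypothetical core of a Kafka REST proxy: topic -> subscriber endpoints."""

    def __init__(self, post: Callable[[str, bytes], int]):
        # post(url, body) performs the HTTP POST and returns the status code.
        self._post = post
        self._subscribers = defaultdict(list)

    def register(self, topic: str, endpoint: str) -> None:
        """Called when a business service subscribes to a topic via REST."""
        if endpoint not in self._subscribers[topic]:
            self._subscribers[topic].append(endpoint)

    def dispatch(self, topic: str, message: dict) -> bool:
        """Called by the proxy's consumer for each message arriving on `topic`.

        Returns True only if every registered endpoint answered with 2xx
        (trivially True when nobody is subscribed), which the caller can use
        as the signal to commit the message's offset.
        """
        body = json.dumps(message).encode("utf-8")
        statuses = [self._post(url, body) for url in self._subscribers[topic]]
        return all(200 <= status < 300 for status in statuses)
```

Tying the return value to the commit decision means a web shop that is down simply causes the message to be delivered again later, instead of being lost.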
Retry and fallback strategy
- Use the commit mechanism to make sure messages are actually processed: disable automatic offset commits and commit a message's offset only after it has been processed successfully. If processing fails, the offset is not committed, so the message is consumed again (for example, after the consumer restarts or seeks back to the last committed offset).
- An alternative is to use a dedicated topic (often called a dead-letter topic) for messages that failed to be processed. They are written to this topic and handled later by a different consumer.
- Kafka ships without a graphical user interface, so you will need viewing and administration tools beyond the bare-bones command-line scripts.
- Use monitoring tools such as Confluent Control Center, Prometheus & Grafana, or Conduktor to get a clear, real-time view of your Kafka instance.
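The two retry strategies described under "Retry and fallback strategy" can be sketched together. The first retries a failing message and commits nothing, so the message is read again on the next run. The second parks the failing message in a dead-letter topic and moves on. The sketch models both with in-memory lists instead of a real consumer. With confluent-kafka, the commit-after-processing part would mean setting `enable.auto.commit` to `false` and calling `consumer.commit(message=msg)` after handling. `MAX_ATTEMPTS`, the `.failed` suffix and the function name are hypothetical choices.

```python
MAX_ATTEMPTS = 3  # hypothetical retry budget per message


def consume(log, handler, committed=0, dead_letters=None, topic="prod.voucher"):
    """Process log[committed:] and return the new committed offset.

    Strategy 1 (dead_letters is None): stop at the first message that keeps
    failing; its offset is not committed, so the next run re-reads it.
    Strategy 2 (dead_letters is a list): park the failing message there,
    standing in for a hypothetical '<topic>.failed' topic, and move on.
    """
    for offset in range(committed, len(log)):
        message = log[offset]
        error = None
        for _ in range(MAX_ATTEMPTS):
            try:
                handler(message)
                error = None
                break  # handled: stop retrying
            except Exception as exc:  # broad on purpose: any failure counts
                error = exc
        if error is not None:
            if dead_letters is None:
                return committed  # leave this offset uncommitted
            dead_letters.append((f"{topic}.failed", message, repr(error)))
        committed = offset + 1  # everything up to this offset is done
    return committed
```

The first strategy guarantees nothing is skipped but lets one bad message block everything behind it. The dead-letter variant keeps the stream moving at the cost of handling failures separately.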
With Apache Kafka, we were able to solve the main issue at hand, i.e., the synchronization of various, not directly connected parts of the client's software ecosystem.
Beyond that, we also gained other, equally important benefits:
- ecosystem resilience;
- preservation of all important data;
- communication that still works eventually, even when one part of the system is down for a while.
At Berg Software, we turn business ideas into software – and this Apache Kafka case is a straightforward example of how we deploy software solutions to support and streamline clients’ businesses.