Microservices, Clean Architecture, and Kafka in GoJek

Wildan G. Budhi
Feb 17, 2020

So you’re thinking of building a super app like GOJEK? There are three main things you need to know and implement. We gained some interesting experience with all three during the Winter Business Intelligence Internship program at GoJek, even though we were not placed in Product Engineering.

1 App, 1 Server, Many Products

Let’s look back at the time before we joined this internship. When we built an application for a lecture or a website project, we usually put all the features or products into one application on one server. For example, let’s make a simple GoJek clone that has only 3 products, GoRide, GoCar, and GoBluebird, where every product has Booking, Payment, and Rating features. As far as we know, many of the most popular web frameworks implement a Monolithic Architecture with the Model-View-Controller (MVC) design pattern. So let’s build the app that way.

Before we build the app, some of you may still be asking what Monolithic Architecture and the MVC design pattern are. Monolithic Architecture is a software architecture in which the user interface, logic processing, and data access are combined into a single program and placed on a single platform. MVC is a design pattern that divides software into 3 parts: the Model, which handles all connections and data retrieval; the Controller, which does all the logic processing; and the View, which is the user interface. MVC has a rule that the View cannot have a direct relation to the Model.
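To make the three parts concrete, here is a minimal sketch of MVC in plain Python. The names `BookingModel`, `BookingController`, and `render_booking_view` are hypothetical, the dictionary only stands in for a real database, and no particular web framework is assumed.

```python
class BookingModel:
    """Model: owns data access. A dict stands in for a real database."""

    def __init__(self):
        self._rows = {}
        self._next_id = 1

    def create(self, product, user):
        booking_id = self._next_id
        self._rows[booking_id] = {"id": booking_id, "product": product, "user": user}
        self._next_id += 1
        return self._rows[booking_id]


def render_booking_view(booking):
    """View: only formats the data handed to it; it never touches the Model."""
    return f"Booking #{booking['id']}: {booking['product']} for {booking['user']}"


class BookingController:
    """Controller: holds the logic and sits between the View and the Model."""

    def __init__(self, model):
        self.model = model

    def book(self, product, user):
        if product not in {"GoRide", "GoCar", "GoBluebird"}:
            raise ValueError(f"unknown product: {product}")
        booking = self.model.create(product, user)
        return render_booking_view(booking)


controller = BookingController(BookingModel())
page = controller.book("GoRide", "wildan")
```

Notice that the View function receives a finished booking from the Controller, which is how the "View cannot talk to Model directly" rule shows up in code.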

GOJEK CLONE using Monolithic Architecture and MVC Design Pattern

We are not going to discuss concurrent access by users here, so let’s discuss a development case instead. Suppose our system has been deployed and already has millions of users. Then a new requirement arrives:

“We want to add a Payment Method to GoRide Product”

In a Monolithic Architecture, whenever a requirement changes the business flow, whether by adding, removing, or changing features, we have to:

  1. Make a Copy of Whole Code or Pull From the Repo
  2. Make some changes
  3. Test the system
  4. Shut down the server
  5. Deploy the new code
  6. Restart the server

We can see that in a Monolithic Architecture, any change means changing the whole system because, as we said before, the whole process is combined into one program. Whether we need to change the business flow of one product or of every product, we have to rebuild the whole codebase and make sure the new code does not break the code of other products or features. The most difficult part is ensuring that the new code stays compatible with the code of every other product and feature. In this example there are only 3 products and each product has only 3 features. What about the real GoJek case, with 20+ products and hundreds of features in each product?

The next problem is that we have to shut down the whole server to deploy the new program. In this example we only add a new payment method to the GoRide product, yet the change affects the whole app: users of the other products are disrupted and cannot access them, even though the product they are using has not changed. In financial terms, this can cause huge losses. In this example we might only lose millions or tens of millions. What about the real GoJek case? The losses could reach tens or hundreds of billions, and that is not a small amount of money.

To solve that problem, GoJek implements a Microservices Architecture across its whole system.

Microservices

So, what is a microservice? Microservices is a software development technique that arranges an application as a collection of loosely coupled services. Coupling is the degree of interdependence between software modules. So we can say that with microservices, an app or program is split into many small modules that have very little interdependence.

GOJEK CLONE using Microservices Technique in Single Server

What’s the difference if it still uses one app and one server? Let’s look. The microservices technique divides the program into independent programs or apps, whether they run on separate servers or on a single server using Docker containers.

GOJEK CLONE using Microservices Technique in Separate Server and Docker Container

We can see that the gateway and the products are on separate servers, every product is split onto its own server, and every module of every product runs inside its own container, so everything is separated into independent, loosely coupled apps or services. With this technique we can change the code or the app of one service without disturbing other products or even other modules. As long as the service keeps the same interface toward the others, we do not need to check that its new code is compatible with the code of every other service, and there is no need to shut down the entire app to deploy the changes. We only change or add the service in question and deploy that service.
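The idea of redeploying one service without touching the others can be sketched in a few lines. This is only an in-process illustration under our own assumptions: the `Gateway` class and the handler functions are hypothetical, "e-wallet" is a made-up payment method, and plain functions stand in for services that would really run on separate servers or containers.

```python
class Gateway:
    """Routes each request to the service registered for a product."""

    def __init__(self):
        self._services = {}

    def register(self, product, handler):
        # Registering a product again models redeploying only that service.
        self._services[product] = handler

    def handle(self, product, request):
        if product not in self._services:
            raise LookupError(f"no service for {product}")
        return self._services[product](request)


def goride_payment_v1(request):
    return {"product": "GoRide", "methods": ["cash"]}


def goride_payment_v2(request):
    # New version of one service: adds a payment method to GoRide only.
    return {"product": "GoRide", "methods": ["cash", "e-wallet"]}


def gocar_payment(request):
    return {"product": "GoCar", "methods": ["cash"]}


gateway = Gateway()
gateway.register("GoRide", goride_payment_v1)
gateway.register("GoCar", gocar_payment)

before = gateway.handle("GoCar", {})
gateway.register("GoRide", goride_payment_v2)  # "deploy" the new GoRide service
after = gateway.handle("GoCar", {})
```

The GoCar response is the same before and after the GoRide change, which is the behaviour the separate-server diagram is meant to show.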

But how do we write code that fits the microservices technique? GoJek, like many big technology companies, uses Clean Architecture to structure the code of its microservices.

Clean Architecture

Clean Architecture Diagram

Let’s look at Clean Architecture. Clean Architecture is a software design approach that supports a Microservices Architecture. Why is it a good fit? Because Clean Architecture encourages splitting a program into many small programs. Not only can a big program be split into many small programs or modules, but the inside of each module is also divided into layers to help development. There are 4 layers: Frameworks & Drivers, Interface Adapters, Application Business Rules, and Enterprise Business Rules. The layers form concentric circles, and each layer depends on the layer inside it. So the rule is that a request flows from the outermost layer to the innermost layer, and the response returns in the opposite direction.

The first layer is Frameworks & Drivers. As the name suggests, Clean Architecture allows us to use any tools we like, such as libraries and frameworks. This layer is where external tools and connections live: the database connection, the addresses of other services that this service or module depends on, the user interface, and so on. In software development it is better to pass a connection instance, such as a DB connection, to other objects or layers than to keep it in global memory or in an outside object. This helps with function-level testing, because a test can supply every combination of inputs to the function, including the instance itself. Data checking, business logic, and database queries are all forbidden in this layer. Following the flow rule, this layer depends on the next layer.
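Here is a small sketch of why passing the connection instance helps testing. It uses Python's standard `sqlite3` module; the `bookings` table and the function names are our own assumptions for illustration, not GoJek's schema.

```python
import sqlite3


def count_bookings(conn, product):
    """Count bookings for one product using whatever connection is passed in."""
    row = conn.execute(
        "SELECT COUNT(*) FROM bookings WHERE product = ?", (product,)
    ).fetchone()
    return row[0]


def make_test_connection():
    """Build a throwaway in-memory database with known rows for a test."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bookings (id INTEGER PRIMARY KEY, product TEXT)")
    conn.executemany(
        "INSERT INTO bookings (product) VALUES (?)",
        [("GoRide",), ("GoRide",), ("GoCar",)],
    )
    return conn


conn = make_test_connection()
goride_count = count_bookings(conn, "GoRide")
```

Because `count_bookings` never reaches for a global connection, the same function works unchanged against the production database and against the in-memory one used in the test.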

The next layer is Interface Adapters. This layer receives data from the Frameworks & Drivers layer and checks it before it is sent on to the next layer, for example the email format, username format, and password format. Besides checking the data, this layer also shapes it into a format that is compatible with the next layer. For example, if the Name string must be camelCase in the next layer, this layer converts it to camelCase. In the other direction, this layer formats the data that comes back from the next layer into the format the previous layer needs, such as HTML or a JSON string. This layer also depends on the next layer.
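A rough sketch of both directions of this layer might look like the following. The function names, the deliberately simple email pattern, and the exact camelCase rule are assumptions made for the example.

```python
import json
import re

# Intentionally loose check: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def to_camel_case(text):
    """Turn 'wildan budhi' into 'wildanBudhi'."""
    words = text.split()
    if not words:
        return ""
    return words[0].lower() + "".join(word.capitalize() for word in words[1:])


def adapt_request(raw):
    """Inbound: check formats and shape the data for the next layer."""
    email = raw.get("email", "").strip()
    if not EMAIL_PATTERN.match(email):
        raise ValueError("invalid email format")
    return {"name": to_camel_case(raw.get("name", "")), "email": email}


def adapt_response(result):
    """Outbound: turn the inner layer's result into a JSON string."""
    return json.dumps(result, sort_keys=True)


clean = adapt_request({"name": "wildan budhi", "email": "wildan@example.com"})
body = adapt_response({"status": "ok", "name": clean["name"]})
```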

The third layer is the Application Business Rules layer, which some developers call UseCase. Like the previous layer, it receives data from the layer before it, already shaped into the format it requires. All of this module’s own business logic is done here. Other logic, such as logic that processes data coming from other services or the DB, is not allowed in this layer; an example is combining Discount data and Price Rate data from 2 services to calculate the final cost after discount. Like the 2 layers before it, this layer depends on the next layer.
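As an illustration, a UseCase for the Rating feature could look like this. The two rules (only completed bookings can be rated, stars must be 1 to 5) are hypothetical rules we made up for the sketch, not GoJek's actual business rules.

```python
def rate_booking(booking, stars):
    """UseCase: apply this module's own rules before a rating is accepted."""
    if booking["status"] != "completed":
        raise ValueError("only completed bookings can be rated")
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    # Return a new dict so the caller's booking is left unchanged.
    return {**booking, "stars": stars}


rated = rate_booking({"id": 1, "product": "GoRide", "status": "completed"}, 5)
```

The function only knows about the booking it was handed; it does not fetch anything from a database or from another service.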

Before we move to the next layer, some developers add one additional part. Why do we say “part” rather than “layer”? Because it is not part of Clean Architecture itself; it is only a helper. It is called the Repository. This part performs all the basic operations that touch external things, such as fetching, updating, and deleting data in the database, or sending a request to another service (an API call) and receiving the response. It requires at least one instance of the external thing, passed in from the UseCase layer, and it returns the result to the UseCase layer.
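A Repository along these lines can be sketched with Python's standard `sqlite3` module. The `BookingRepository` class and the `bookings` table are assumptions for the example; the point is only that it receives the connection instance and does nothing but basic operations.

```python
import sqlite3


class BookingRepository:
    """Basic operations against an external store; no business logic here."""

    def __init__(self, conn):
        self.conn = conn  # instance handed over by the UseCase layer

    def add(self, product, user):
        cur = self.conn.execute(
            "INSERT INTO bookings (product, user) VALUES (?, ?)", (product, user)
        )
        self.conn.commit()
        return cur.lastrowid

    def find(self, booking_id):
        row = self.conn.execute(
            "SELECT id, product, user FROM bookings WHERE id = ?", (booking_id,)
        ).fetchone()
        if row is None:
            return None
        return {"id": row[0], "product": row[1], "user": row[2]}

    def delete(self, booking_id):
        self.conn.execute("DELETE FROM bookings WHERE id = ?", (booking_id,))
        self.conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (id INTEGER PRIMARY KEY, product TEXT, user TEXT)")
repo = BookingRepository(conn)
new_id = repo.add("GoCar", "wildan")
found = repo.find(new_id)
```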

You will surely ask: if we need to process the data obtained from the Repository, in which layer should we do that? Welcome to the last layer of Clean Architecture, the Enterprise Business Rules layer. All operations that are needed but fall outside this module’s own context are done in this layer. Since it is the last layer, it has no dependencies on other layers or parts.
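Taking the discount example from the UseCase section, an operation in this layer could be sketched as below. The formula, the parameter names, and the numbers are all hypothetical; we only assume that the price rate and the discount were already fetched from their two services.

```python
from decimal import Decimal


def final_cost(distance_km, price_rate_per_km, discount_fraction):
    """Combine a price rate and a discount into the amount to charge."""
    if not Decimal("0") <= discount_fraction <= Decimal("1"):
        raise ValueError("discount must be between 0 and 1")
    gross = distance_km * price_rate_per_km
    return gross * (Decimal("1") - discount_fraction)


# 10 km at 2500 per km with a 20% discount.
cost = final_cost(Decimal("10"), Decimal("2500"), Decimal("0.2"))
```

The function takes plain values and imports nothing from the other layers, which matches the rule that the innermost layer has no dependencies. `Decimal` is used instead of `float` to avoid rounding surprises with money.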

Message Broker: Kafka

Let’s add databases to the last diagram to see how data flows in the microservices technique. Many developers create a separate database for each product so that no single database is overloaded by concurrent access.

GOJEK CLONE using Microservices Technique with Database

Let’s say you are an analyst assigned to analyze the booking data for all products. Since the program uses the microservices technique and every product has its own independent database, you need to query every product database to get the booking data. That feels easy here because the example contains only 3 products. What about the real GoJek case, with 20+ products and hundreds of features in each product? Do you still want to query each DB manually? How does GoJek solve this problem? They use a message broker.

What is a message broker? A message broker is a system for streaming data, just like a river with multiple sources upstream and multiple outlets downstream. We just need to create a worker that pushes data from a product database to the message broker, called a Producer, and a worker that requests data from the message broker and pushes it to data storage or a data warehouse, called a Consumer. GoJek uses Kafka as its message broker. As an analyst, you then only need to query the data storage or data warehouse.
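The Producer and Consumer roles can be sketched without a running broker. The `InMemoryBroker` below is only a stand-in we wrote for illustration; it is not the Kafka API, and unlike this queue, real Kafka keeps messages in a log instead of removing them when they are read.

```python
from collections import deque


class InMemoryBroker:
    """Stand-in for a message broker: producers append, consumers read in order."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def poll(self):
        """Return the oldest message, or None when nothing is waiting."""
        return self._messages.popleft() if self._messages else None


def produce_from_database(broker, rows):
    """Producer: push every row of one product database to the broker."""
    for row in rows:
        broker.publish(row)


def consume_into_warehouse(broker, warehouse):
    """Consumer: drain the broker into the shared data warehouse."""
    while (message := broker.poll()) is not None:
        warehouse.append(message)


broker = InMemoryBroker()
produce_from_database(broker, [{"product": "GoRide", "booking_id": 1}])
produce_from_database(broker, [{"product": "GoCar", "booking_id": 7}])

warehouse = []
consume_into_warehouse(broker, warehouse)
```

After the consumer runs, the rows from both product databases sit in one place, so the analyst queries `warehouse` instead of each database.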

GOJEK CLONE Data Flows using Kafka

It is possible to run more than one Producer for each product, and more than one Consumer, depending on how many parties need the data.

In this flow there is a problem: the data is not separated into Booking data, Payment data, and Rating data. We can solve that with a Kafka feature called topics. A Kafka topic works like an identifier, so we can create 3 topics: a Booking topic, a Payment topic, and a Rating topic. Then we just push each piece of data to the corresponding topic. After that, the consumer can fetch the data and create a different table for each topic, so an analyst or anyone else who needs the data only has to query the specific table, and the data is ready to be analyzed.
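Routing by topic can be sketched the same way. `TopicBroker` is again a hypothetical in-memory stand-in rather than the Kafka client API, and the topic names and message fields are assumptions for the example.

```python
from collections import defaultdict


class TopicBroker:
    """Stand-in broker that keeps a separate message list per topic."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        self._topics[topic].append(message)

    def read(self, topic):
        # Return a copy so consumers cannot modify the broker's own list.
        return list(self._topics[topic])


def load_tables(broker, topics):
    """Consumer: build one warehouse table per topic."""
    return {topic: broker.read(topic) for topic in topics}


broker = TopicBroker()
broker.publish("booking", {"product": "GoRide", "booking_id": 1})
broker.publish("payment", {"product": "GoRide", "amount": 20000})
broker.publish("booking", {"product": "GoCar", "booking_id": 7})

tables = load_tables(broker, ["booking", "payment", "rating"])
```

Each key of `tables` plays the role of one warehouse table, so booking rows from every product end up together and apart from payment rows.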

Conclusion

These three things, Microservices, Clean Architecture, and Kafka, are the backbone of a super app. If you want to build a super app, you should implement all three to speed up your app. Money can’t wait for your app to load! Your rivals are quickly overtaking your business!

BIG Thanks to

  • PT. Aplikasi Karya Anak Bangsa ( GOJEK )
  • Data Warehouse Team GOJEK
  • Data Engineer Team GOJEK
  • Data Science Platform Team GOJEK
  • Product Engineer Team GOJEK
  • PDG L&D Team GOJEK
