Kotlin, Spring, ElasticSearch and Kafka full-text search microservice 👋✨💫

Alexander Bryksin
6 min read · Sep 11, 2022

👨‍💻 Full list of what has been used:

Spring web framework
Spring WebFlux reactive REST services
Elasticsearch client for Java
Spring Kafka
Zipkin open-source, end-to-end distributed tracing
Spring Cloud Sleuth auto-configuration for distributed tracing
Prometheus monitoring and alerting
Grafana for composing observability dashboards with data from Prometheus
Kibana user interface for visualizing Elasticsearch data
Docker and docker-compose

You can find the source code in the GitHub repository. The main idea of this project is to implement full-text search with support for synonyms, mistyping, and the wrong keyboard layout using Elasticsearch and Kafka.

All UI interfaces will be available on ports:

Swagger UI: http://localhost:8000/webjars/swagger-ui

Grafana UI: http://localhost:3005

Kibana UI: http://localhost:5601/app/home#/

Zipkin UI: http://localhost:16686

Prometheus UI: http://localhost:9090

Docker-compose file for this project:

version: "3.9"

services:

  zookeeper:
    image: 'bitnami/zookeeper:latest'
    ports:
      - '2181:2181'
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
    volumes:
      - "./zookeeper:/zookeeper"
    networks: [ "microservices" ]

  kafka:
    image: 'bitnami/kafka:latest'
    ports:
      - "9092:9092"
      - "9093:9093"
    volumes:
      - "./kafka_data:/bitnami"
    environment:
      - KAFKA_BROKER_ID=1
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CLIENT:PLAINTEXT,EXTERNAL:PLAINTEXT
      - KAFKA_CFG_LISTENERS=CLIENT://:9092,EXTERNAL://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=CLIENT://kafka:9092,EXTERNAL://localhost:9093
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=CLIENT
    depends_on:
      - zookeeper
    networks: [ "microservices" ]

  node01:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.3.3
    container_name: node01
    restart: always
    environment:
      - node.name=node01
      - cluster.name=es-cluster-8
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.license.self_generated.type=basic
      - xpack.security.enabled=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - ./es-data01:/usr/share/elasticsearch/data
    ports:
      - "9200:9200"
      - "9300:9300"
    networks: [ "microservices" ]

  kibana:
    image: docker.elastic.co/kibana/kibana:8.3.3
    restart: always
    environment:
      ELASTICSEARCH_HOSTS: http://node01:9200
    ports:
      - "5601:5601"
    depends_on:
      - node01
    networks: [ "microservices" ]

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    command:
      - --config.file=/etc/prometheus/prometheus.yml
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
    networks: [ "microservices" ]

  node_exporter:
    container_name: node_exporter_container
    restart: always
    image: prom/node-exporter
    ports:
      - '9101:9100'
    networks: [ "microservices" ]

  grafana:
    container_name: grafana_container
    restart: always
    image: grafana/grafana
    ports:
      - '3005:3000'
    networks: [ "microservices" ]

  zipkin:
    image: openzipkin/zipkin:latest
    restart: always
    container_name: zipkin
    ports:
      - "9411:9411"
    networks: [ "microservices" ]

volumes:
  es-data01:
    driver: local

networks:
  microservices:
    name: microservices

Full-text search with auto-completion can be implemented in different ways; it's up to you to choose which fits your case best. First, we have to create mappings for our index. Elasticsearch can, of course, create them for us, but that's not a production solution. The data model used here is a simple abstract shopping product item with searchable title, description, and shop name fields, which is enough for this example. Creating the right mapping is very important and tricky: to implement full-text search with synonyms, mistyping, and the wrong keyboard layout, we will configure our own analyzer, which combines character filters, a tokenizer, and token filters.

Our ngram_filter is Elasticsearch's built-in edge_ngram filter; to understand how it works, I highly recommend reading the Elasticsearch documentation. We have to assign min and max values, which always depend on the particular case. Next, let's specify name_synonym_filter: in the synonyms field we add an array of strings, where each string is a comma-separated list of synonyms that will be used for search.
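The analysis settings described above might look roughly like this; the gram sizes and the synonym entries here are illustrative values, not the project's actual configuration:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        },
        "name_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "laptop, notebook",
            "phone, smartphone, mobile"
          ]
        }
      },
      "analyzer": {
        "autocomplete_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "ngram_filter", "name_synonym_filter"]
        }
      }
    }
  }
}
```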

In the mapping properties, let's create the mappings for our document fields. A full-text search field has "type": "text" and uses our newly created autocomplete_analyzer, which is applied when Elasticsearch indexes documents. For search queries we don't need such a complex analyzer, as it would only slow them down, so we add another one: search_analyzer: standard. Handling of the wrong keyboard language layout can be implemented in different ways: on the Elasticsearch side with synonyms, which makes the mappings too huge, or at the application level by creating a mapping between keyboard layouts, converting the user's search term to the opposite layout, and sending Elasticsearch a query that searches for either variant. For example, searching for any of the words "Apple, apple, яблоко, Яблоко, z,kjrj, Z,kjrj, Фззду, фззду" will find the same results. In real-world scenarios the mapping is usually much more complicated, but for this example it's enough:
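A sketch of such field mappings, assuming the three searchable fields mentioned above (the exact field names in the project may differ):

```json
{
  "mappings": {
    "properties": {
      "title":       { "type": "text", "analyzer": "autocomplete_analyzer", "search_analyzer": "standard" },
      "description": { "type": "text", "analyzer": "autocomplete_analyzer", "search_analyzer": "standard" },
      "shop":        { "type": "text", "analyzer": "autocomplete_analyzer", "search_analyzer": "standard" }
    }
  }
}
```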

An official Elasticsearch client library is available for Java and Kotlin. First we have to configure it: at the start of the microservice we check whether the indexes and aliases exist and create them if needed. In real-world production projects it is good practice to always use aliases:

Our microservice communicates over HTTP using Spring WebFlux and the official Kafka client. The REST controller has index document and search methods:

The service layer has index and search methods:

The Kafka consumer listens for and processes messages using the Bulk API, which gives better performance for indexing documents. Bulk insert is implemented in two ways in the current microservice: we accumulate documents in an in-memory queue and flush them to Elasticsearch at a given interval using Scheduled Tasks, or as soon as the queue reaches its configured size:
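A minimal sketch of that accumulate-and-flush idea; `BulkBuffer` is a hypothetical name, and in the real project the `flush` callback would call the Elasticsearch Bulk API:

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue

// Accumulates documents and flushes them in batches, either when the
// configured size is reached or when flushNow() is called periodically.
class BulkBuffer<T>(
    private val maxSize: Int,
    private val flush: (List<T>) -> Unit, // e.g. an Elasticsearch bulk insert
) {
    private val queue = ConcurrentLinkedQueue<T>()

    // Called by the Kafka consumer for every incoming document.
    fun add(document: T) {
        queue.add(document)
        if (queue.size >= maxSize) flushNow()
    }

    // Also invoked on a schedule, e.g. by a Spring @Scheduled task.
    @Synchronized
    fun flushNow() {
        // Drain the queue into one batch and hand it to the flush callback.
        val batch = generateSequence { queue.poll() }.toList()
        if (batch.isNotEmpty()) flush(batch)
    }
}
```

This keeps the Kafka consumer fast: each message is just an enqueue, and the expensive network round-trip to Elasticsearch happens once per batch.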

The repository has the same methods for indexing documents and searching. For the search method, we use a should multi_match query where we pass the original term and the search term mapped to the opposite keyboard layout. A good practice for Elasticsearch is to always use an alias for indexes. The keyboard layout converter is implemented by loading a JSON file with the mappings when the application starts, unmarshalling it into a map, and exposing one method for converting one layout to another.
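The converter can be sketched in a few lines; here the EN/RU layout tables are hardcoded rather than loaded from a JSON file, and the object name is illustrative:

```kotlin
// Maps characters between the QWERTY and ЙЦУКЕН keyboard layouts,
// so "z,kjrj" typed on an English layout becomes "яблоко" and vice versa.
object KeyboardLayoutConverter {
    // Same physical keys, top-left to bottom-right, in both layouts.
    private const val EN = "qwertyuiop[]asdfghjkl;'zxcvbnm,./"
    private const val RU = "йцукенгшщзхъфывапролджэячсмитьбю."

    private val enToRu = EN.zip(RU).toMap()
    private val ruToEn = RU.zip(EN).toMap()

    // Converts a term to the opposite layout, leaving unknown characters as-is.
    fun toOppositeLayout(term: String): String =
        term.lowercase()
            .map { ch -> enToRu[ch] ?: ruToEn[ch] ?: ch }
            .joinToString("")
}
```

The search query then combines the original term with `toOppositeLayout(term)`, so either spelling matches.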

Repository methods:
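The should multi_match query described above might look like the following, where the second variant is the user's term converted to the opposite layout; the field names are illustrative:

```json
{
  "query": {
    "bool": {
      "should": [
        { "multi_match": { "query": "apple", "fields": ["title", "description", "shop"] } },
        { "multi_match": { "query": "фззду", "fields": ["title", "description", "shop"] } }
      ]
    }
  }
}
```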

You can find more details and the source code of the full project here. Of course, in real-world applications full-text search and business requirements can be much more complicated and may, for example, include machine learning. I hope this article is useful and helpful; I'd be happy to receive any feedback or questions, so feel free to contact me by email or any messenger 🙂
