Installing Apache Kafka on 4 nodes No.42

This post shows how to install Apache Kafka on 4 nodes. Four Ubuntu nodes are used for the Apache Kafka cluster.

▼1. What is Apache Kafka?

Kafka combines three key capabilities:

  • To publish (write) and subscribe to (read) streams of events, including continuous import/export of your data from other systems.
  • To store streams of events durably and reliably for as long as you want.
  • To process streams of events as they occur or retrospectively.

Ref: Apache Kafka


▼2. Installing Apache Kafka with 4 nodes

2-1. Preparing 4 Ubuntu 20.04.1 LTS x64 nodes

One node is used for the ZooKeeper server; the other 3 nodes provide the Kafka broker service.

2-2. Updating the hosts file on 4 nodes

(e.g.) /etc/hosts
10.0.0.1 zookeeper
10.0.0.2 kafkabroker1
10.0.0.3 kafkabroker2
10.0.0.4 kafkabroker3
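
Before moving on, it is worth confirming that each hostname resolves from every node; a minimal check, assuming ICMP is allowed between the nodes:

for h in zookeeper kafkabroker1 kafkabroker2 kafkabroker3; do
  ping -c 1 "$h" > /dev/null && echo "$h: OK" || echo "$h: unreachable"
done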

2-3. Installing Java 8 JDK on 4 nodes

Azul Zulu OpenJDK Java 8 (LTS) is used for this Kafka cluster. The .deb package can be obtained from Download Azul Zulu Builds of OpenJDK | Azul.

(e.g.)
sudo apt install ./zulu8.58.0.13-ca-jdk8.0.312-linux_amd64.deb
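
After installing the package, the Java version can be checked on each node (the output should report an OpenJDK 1.8.0 build from Zulu):

java -version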

2-4. Installing Apache Kafka on 4 nodes

The Apache Kafka release (Kafka 3.0.0 for Scala 2.13) can be downloaded from https://www.apache.org/dyn/closer.cgi?path=/kafka/3.0.0/kafka_2.13-3.0.0.tgz
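
For example, the archive can be fetched with wget (old releases are also kept at archive.apache.org, in case the mirror link above no longer resolves):

wget https://archive.apache.org/dist/kafka/3.0.0/kafka_2.13-3.0.0.tgz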

(e.g.)
tar -xzf kafka_2.13-3.0.0.tgz
cd kafka_2.13-3.0.0

2-5. Updating environment variables on 4 nodes

/home/hadoop/.profile is updated as below. ${HOME} is /home/hadoop in this case.

(e.g.)
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$HOME/kafka_2.13-3.0.0/bin 
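
To apply the change in the current session and confirm that the Kafka scripts are found on the PATH:

source $HOME/.profile
which kafka-topics.sh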

2-6. Setting zookeeper.properties on the ZooKeeper node only

(e.g.)
mkdir -p /home/hadoop/kafka_data/zookeeper_data
vi $HOME/kafka_2.13-3.0.0/config/zookeeper.properties

dataDir=/home/hadoop/kafka_data/zookeeper_data 
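
For reference, the relevant part of zookeeper.properties after the edit (clientPort and maxClientCnxns below are the defaults shipped with Kafka):

dataDir=/home/hadoop/kafka_data/zookeeper_data
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0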

2-7. Starting Zookeeper

/home/hadoop/kafka_2.13-3.0.0/bin/zookeeper-server-start.sh $HOME/kafka_2.13-3.0.0/config/zookeeper.properties
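
Alternatively, the start script bundled with Kafka accepts a -daemon option to run ZooKeeper in the background:

/home/hadoop/kafka_2.13-3.0.0/bin/zookeeper-server-start.sh -daemon /home/hadoop/kafka_2.13-3.0.0/config/zookeeper.properties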

2-8. Updating rc.local so that ZooKeeper starts automatically when the OS boots

a. Creating the rc.local file
sudo vi /etc/rc.local

b. Adding the following to rc-local.service
sudo vi /etc/systemd/system/rc-local.service

[Unit]
Description=/etc/rc.local Compatibility
ConditionPathExists=/etc/rc.local

[Service]
Type=forking
ExecStart=/etc/rc.local start
TimeoutSec=0
StandardOutput=tty
RemainAfterExit=yes
SysVStartPriority=99

[Install]
WantedBy=multi-user.target

c. Setting the execute permission on the file.
sudo chmod +x /etc/rc.local

d. Updating the /etc/rc.local file
printf '%s\n' '#!/bin/bash' 'exit 0' | sudo tee -a /etc/rc.local

e. Adding the following line above the exit 0 command
/home/hadoop/kafka_2.13-3.0.0/bin/zookeeper-server-start.sh /home/hadoop/kafka_2.13-3.0.0/config/zookeeper.properties > /dev/null 2>&1 &
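
After steps d and e, the complete /etc/rc.local on the ZooKeeper node should read as follows (a minimal sketch matching the lines added above):

#!/bin/bash
/home/hadoop/kafka_2.13-3.0.0/bin/zookeeper-server-start.sh /home/hadoop/kafka_2.13-3.0.0/config/zookeeper.properties > /dev/null 2>&1 &
exit 0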

f. Enabling and starting rc-local.service
sudo systemctl enable rc-local
sudo systemctl start rc-local.service
sudo systemctl status rc-local.service
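
Once the service is active, a quick check that ZooKeeper is listening on its client port (ss is available on Ubuntu 20.04):

ss -ltn | grep 2181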

g. Running the ZooKeeper shell and confirming that the ZooKeeper server is online. At this point it is fine that no Kafka broker exists yet and no broker IDs are registered (the reply may be an empty list or a "Node does not exist" error).
$HOME/kafka_2.13-3.0.0/bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids

2-9. Configuring Kafka Brokers

2-9-1. Creating the kafka_data folder on each of the 3 Kafka brokers

(e.g.)
mkdir /home/hadoop/kafka_data

2-9-2. Updating server.properties on 3 Kafka Brokers

The file is located at /home/hadoop/kafka_2.13-3.0.0/config/server.properties in this case. server.properties is updated on each of the 3 Kafka brokers as below.

Setting                            kafkabroker1             kafkabroker2             kafkabroker3
broker.id                          0                        1                        2
broker.rack                        RAC1                     RAC2                     RAC3
log.dirs                           /home/hadoop/kafka_data  /home/hadoop/kafka_data  /home/hadoop/kafka_data
offsets.topic.num.partitions       3                        3                        3
offsets.topic.replication.factor   2                        2                        2
min.insync.replicas                2                        2                        2
default.replication.factor         2                        2                        2
zookeeper.connect                  10.0.0.1:2181            10.0.0.1:2181            10.0.0.1:2181

(e.g.)

//kafkabroker1
broker.id=0
broker.rack=RAC1
log.dirs=/home/hadoop/kafka_data
offsets.topic.num.partitions=3
offsets.topic.replication.factor=2
min.insync.replicas=2
default.replication.factor=2
zookeeper.connect=10.0.0.1:2181

//kafkabroker2
broker.id=1
broker.rack=RAC2
log.dirs=/home/hadoop/kafka_data
offsets.topic.num.partitions=3
offsets.topic.replication.factor=2
min.insync.replicas=2
default.replication.factor=2
zookeeper.connect=10.0.0.1:2181

//kafkabroker3
broker.id=2
broker.rack=RAC3
log.dirs=/home/hadoop/kafka_data
offsets.topic.num.partitions=3
offsets.topic.replication.factor=2
min.insync.replicas=2
default.replication.factor=2
zookeeper.connect=10.0.0.1:2181
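
Since the three files differ only in broker.id and broker.rack, the edit can be scripted. The sketch below is a hypothetical helper, run once on each broker with ID set to 0, 1, or 2; appending works because later entries override the earlier defaults when Kafka parses the file as Java properties:

# Set ID to 0 on kafkabroker1, 1 on kafkabroker2, 2 on kafkabroker3
ID=0
cat >> /home/hadoop/kafka_2.13-3.0.0/config/server.properties <<EOF
broker.id=$ID
broker.rack=RAC$((ID+1))
log.dirs=/home/hadoop/kafka_data
offsets.topic.num.partitions=3
offsets.topic.replication.factor=2
min.insync.replicas=2
default.replication.factor=2
zookeeper.connect=10.0.0.1:2181
EOF

With 3 brokers, default.replication.factor=2 keeps two copies of each partition, and min.insync.replicas=2 means a producer using acks=all needs both copies to acknowledge a write.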

2-10. Updating rc.local on each broker. The steps are the same as in 2-8 above, except that the line added above the exit 0 command starts the Kafka broker instead of ZooKeeper:

/home/hadoop/kafka_2.13-3.0.0/bin/kafka-server-start.sh /home/hadoop/kafka_2.13-3.0.0/config/server.properties > /dev/null 2>&1 &


▼3. Creating Topic, writing and read data

(e.g.)
# Setting an environment variable for the Kafka brokers according to the hosts file
export KAFKABROKERS=10.0.0.2:9092,10.0.0.3:9092,10.0.0.4:9092
echo $KAFKABROKERS
(output)
10.0.0.2:9092,10.0.0.3:9092,10.0.0.4:9092

# Setting an environment variable for ZooKeeper according to the hosts file
export KAFKAZOOKEEPER=10.0.0.1:2181
echo $KAFKAZOOKEEPER

(output)
10.0.0.1:2181

# Running the ZooKeeper shell and confirming broker IDs
zookeeper-shell.sh $KAFKAZOOKEEPER ls /brokers/ids

(output)
Connecting to 10.0.0.1:2181
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[0,1,2]

# Creating the topic named test
kafka-topics.sh --create --replication-factor 3 --partitions 3 --topic test --bootstrap-server $KAFKABROKERS

(output)
Created topic test

# Listing the topics
kafka-topics.sh --list --bootstrap-server $KAFKABROKERS

(output)
test

# Showing the details of this topic
kafka-topics.sh --describe --bootstrap-server $KAFKABROKERS --topic test


(output)
Topic: test	TopicId: C6c29UKLQXxxXx2G6EmuyA	PartitionCount: 3	ReplicationFactor: 3	Configs: min.insync.replicas=2,segment.bytes=1073741824
	Topic: test	Partition: 0	Leader: 1	Replicas: 1,2,0	Isr: 1,2,0
	Topic: test	Partition: 1	Leader: 2	Replicas: 2,0,1	Isr: 2,0,1
	Topic: test	Partition: 2	Leader: 0	Replicas: 0,1,2	Isr: 0,1,2


# Generating data
kafka-console-producer.sh --bootstrap-server $KAFKABROKERS --topic test

(input)
>1
>2
>3
>a
>b
>c

# Reading the generated data
kafka-console-consumer.sh --bootstrap-server $KAFKABROKERS --topic test --from-beginning

(output)
1
2
b
c
3
a
Processed a total of 6 messages
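
The messages come back in a different order than they were written because the topic has three partitions and Kafka only guarantees ordering within a single partition. Reading a single partition with the console consumer's --partition option makes this visible:

kafka-console-consumer.sh --bootstrap-server $KAFKABROKERS --topic test --partition 0 --from-beginning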

▼4. Reference

  1. Apache Kafka
  2. Installing Multi-node Kafka Cluster – Learning Journal
  3. Apache Kafka/quickstart
  4. How to Enable /etc/rc.local with Systemd – LinuxBabe
  5. Quickstart: Set up Apache Kafka on HDInsight using Azure portal | Microsoft Docs

That’s all. Have a nice day ahead !!!
