Build a Kafka cluster on Linux: copy the commands and it just works


Today I'm sharing a set of notes for building a Kafka cluster from scratch. I have already stepped on almost all the pitfalls for you; just follow the steps and you'll be fine.

Kafka depends on the JDK and on a ZooKeeper environment

Before we start, one quick note: with the same tutorial, some people succeed and some fail. Most of those who fail hit path problems, so be careful and don't panic.

Install jdk

Check whether your server already has a JDK. If you see output like the following, it is installed:

[root@localhost ~]# java -version
openjdk version "1.8.0_302"
OpenJDK Runtime Environment (build 1.8.0_302-b08)
OpenJDK 64-Bit Server VM (build 25.302-b08, mixed mode)

If it is not installed, install it first.
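
On CentOS/RHEL, something like the following should work (assuming yum; use your distribution's package manager otherwise):

yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel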

Build zookeeper cluster

Although Kafka ships with a bundled zk, it is better to deploy a separate one, for two reasons:

  1. The zk bundled with kafka runs as a standalone instance. You can change its configuration to turn it into a cluster, but that is risky and you might break the kafka installation.
  2. Even though kafka depends on zk, they are two separate components; it is better to keep them independent and reduce coupling.

Installing zookeeper is actually not complicated, but there are plenty of pitfalls. I won't skip any step below. The most error-prone part is the paths, so I recommend using the same paths as mine; that way you can basically copy my commands directly.

  1. Log in to your server, go to the /opt directory, create a zookeeper/ folder, and enter it
cd /opt
mkdir zookeeper
cd zookeeper
  2. Download the zk installation package
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.tar.gz

If you are told the wget command does not exist (wget: command not found), install it first:

yum -y install wget
  3. Create three folders: server1/, server2/, server3/

Because I only have one server, I can only set up a pseudo-cluster. A pseudo-cluster opens three ports on one machine to simulate three servers. A real cluster works the same way; the difference is that a pseudo-cluster uses the same IP with different ports, while a real cluster uses different IPs with the same port.

mkdir server1
mkdir server2
mkdir server3

At this point, your zookeeper directory should look like this

server1
server2
server3
zookeeper-3.4.10.tar.gz

Create two folders, data and datalog, in each of the three server directories, and extract the zk installation package into each server directory:

mkdir server1/data
mkdir server1/datalog
tar -zxvf zookeeper-3.4.10.tar.gz -C server1

mkdir server2/data
mkdir server2/datalog
tar -zxvf zookeeper-3.4.10.tar.gz -C server2

mkdir server3/data
mkdir server3/datalog
tar -zxvf zookeeper-3.4.10.tar.gz -C server3

At this point, each server directory should look like this

data
datalog
zookeeper-3.4.10

Now let's configure the zk cluster; the key steps are here.

Create a myid file under the data/ folder of each of the three server directories. The file contains a single number: 1 for server1, 2 for server2, and 3 for server3.

# create a new file
vi /opt/zookeeper/server1/data/myid
# press i to enter insert mode, type the number 1, then press Esc and type :wq to save and exit
vi /opt/zookeeper/server2/data/myid
# press i to enter insert mode, type the number 2, then press Esc and type :wq to save and exit
vi /opt/zookeeper/server3/data/myid
# press i to enter insert mode, type the number 3, then press Esc and type :wq to save and exit
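
If you prefer not to open vi three times, writing the files with echo gives the same result (same paths as above):

echo 1 > /opt/zookeeper/server1/data/myid
echo 2 > /opt/zookeeper/server2/data/myid
echo 3 > /opt/zookeeper/server3/data/myid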

Then go into the conf directory of each zookeeper installation.

There is a file called zoo_sample.cfg. Don't edit it directly; it is only a sample.

Make a copy of it in the same directory and call it zoo.cfg; this is the file name the startup script looks for by default.

cp -i zoo_sample.cfg zoo.cfg

Current directory structure

configuration.xsl
log4j.properties
zoo.cfg
zoo_sample.cfg

Then modify the zoo.cfg file

Mainly modify the settings shown below; leave the rest at their defaults or adjust them to your own situation.


If you have followed my directory layout all along, just configure it as follows:

#/opt/zookeeper/server1/zookeeper-3.4.10/conf/zoo.cfg
dataDir=/opt/zookeeper/server1/data
dataLogDir=/opt/zookeeper/server1/datalog

clientPort=2181

server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889

#/opt/zookeeper/server2/zookeeper-3.4.10/conf/zoo.cfg
dataDir=/opt/zookeeper/server2/data
dataLogDir=/opt/zookeeper/server2/datalog

clientPort=2182

server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889

#/opt/zookeeper/server3/zookeeper-3.4.10/conf/zoo.cfg
dataDir=/opt/zookeeper/server3/data
dataLogDir=/opt/zookeeper/server3/datalog

clientPort=2183

server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889
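
For reference, once these lines are merged into the copied sample file, the complete zoo.cfg for server1 should look roughly like this (tickTime, initLimit and syncLimit are the defaults from zoo_sample.cfg). In server.X=host:port:port, the first port is used by followers to connect to the leader and the second is used for leader election.

#/opt/zookeeper/server1/zookeeper-3.4.10/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/server1/data
dataLogDir=/opt/zookeeper/server1/datalog
clientPort=2181
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889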

The configuration is complete. Enter the bin directory of each zookeeper installation and start the zk service:

cd /opt/zookeeper/server1/zookeeper-3.4.10/bin
./zkServer.sh start

cd /opt/zookeeper/server2/zookeeper-3.4.10/bin
./zkServer.sh start

cd /opt/zookeeper/server3/zookeeper-3.4.10/bin
./zkServer.sh start

Each command prints its startup information when it runs.

After all three zk instances have been started, run jps to confirm they are up; if everything succeeded you will see three QuorumPeerMain processes.
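
You can also ask each instance for its role with zkServer.sh status; in a healthy ensemble one node reports leader and the other two report follower:

cd /opt/zookeeper/server1/zookeeper-3.4.10/bin && ./zkServer.sh status
cd /opt/zookeeper/server2/zookeeper-3.4.10/bin && ./zkServer.sh status
cd /opt/zookeeper/server3/zookeeper-3.4.10/bin && ./zkServer.sh status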

At this point, the zookeeper cluster is built

Start to build Kafka cluster

  1. Enter the /opt directory and create a kafka folder

    mkdir /opt/kafka
    cd /opt/kafka

  2. Go in and download the kafka installation package

    wget https://archive.apache.org/dist/kafka/1.0.0/kafka_2.11-1.0.0.tgz

  3. Create three folders kafkalogs1, kafkalogs2, and kafkalogs3

    mkdir kafkalogs1
    mkdir kafkalogs2
    mkdir kafkalogs3

  4. Unzip the kafka installation package

    tar -zxvf kafka_2.11-1.0.0.tgz

At this point, your kafka directory should look like this

kafka_2.11-1.0.0
kafka_2.11-1.0.0.tgz
kafkalogs1
kafkalogs2
kafkalogs3

Start configuring kafka

Enter the config directory of kafka

cd /opt/kafka/kafka_2.11-1.0.0/config/

There is a server.properties file inside

Rename this file and make two more copies so that you end up with three files (renaming is not strictly necessary; you just need three broker configuration files, but it looks tidier this way):

mv server.properties server1.properties
cp -i server1.properties server2.properties
cp -i server1.properties server3.properties

Finally got three server configuration files

server1.properties
server2.properties
server3.properties

We need to modify four attributes in each of the three configuration files:

  • broker.id: the node id, corresponding to the three zookeeper myid values above; write 1 in server1.properties, 2 in server2.properties, and so on
  • listeners: the listening address; replace the ip below with your server's private address (if you use 127.0.0.1 or localhost, you may not be able to reach the Kafka service from outside)
  • log.dirs: the log file storage directory
  • zookeeper.connect: the zk cluster ip:port list

When making the changes, note that this file is quite long, so be patient scrolling through it (in vi you can jump to a setting with a search such as /broker.id).

#server1
broker.id=1
listeners=PLAINTEXT://ip:9092
log.dirs=/opt/kafka/kafkalogs1
zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183

#server2
broker.id=2
listeners=PLAINTEXT://ip:9093
log.dirs=/opt/kafka/kafkalogs2
zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183

#server3
broker.id=3
listeners=PLAINTEXT://ip:9094
log.dirs=/opt/kafka/kafkalogs3
zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183
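
As a quick sanity check, you can grep the four keys out of each file (run from /opt/kafka/kafka_2.11-1.0.0; just a sketch):

grep -E '^(broker.id|listeners|log.dirs|zookeeper.connect)=' config/server1.properties config/server2.properties config/server3.properties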

Configuration complete, start the brokers

Go to the decompressed kafka directory and start three kafka services

cd /opt/kafka/kafka_2.11-1.0.0

./bin/kafka-server-start.sh -daemon config/server1.properties
./bin/kafka-server-start.sh -daemon config/server2.properties
./bin/kafka-server-start.sh -daemon config/server3.properties

Run jps to see whether the startup succeeded; each broker should show up as a Kafka process.

If there is no Kafka process, the startup failed. For the specific reason, check the kafkaServer.out log file in the logs directory.

Test

Go to the kafka installation directory

cd /opt/kafka/kafka_2.11-1.0.0

Create a topic first

./bin/kafka-topics.sh --zookeeper 127.0.0.1:2181 --create --topic test-topic --partitions 3 --replication-factor 3

Command explanation: on the zk at 127.0.0.1:2181, create a topic named test-topic with 3 partitions and 3 replicas of each partition.

  • zookeeper: specifies the zk service
  • topic: specifies the topic name
  • partitions: the number of partitions
  • replication-factor: the number of replicas of the data

Start producer

./bin/kafka-console-producer.sh --broker-list youraddress:9092 --topic test-topic

Type a few test messages into the producer console; they will be sent to the topic.
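
To confirm the messages actually reached the cluster, open a console consumer in another terminal (same address placeholder as above):

./bin/kafka-console-consumer.sh --bootstrap-server youraddress:9092 --topic test-topic --from-beginning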

Common mistakes

The most likely cause of failure is a path in one of the configuration files that is not set correctly. Check them carefully.

If you are sure the steps and configuration files above are correct, the machine may simply not have enough memory; by default Kafka starts with a 1 GB heap.

Look at the startup log

tail -111f /opt/kafka/kafka_2.11-1.0.0/logs/kafkaServer.out

If the log shows an error complaining that memory cannot be allocated, your machine does not have enough memory.

The fix is simple: execute the following command to shrink Kafka's startup heap (the export only applies to the current shell):

export KAFKA_HEAP_OPTS="-Xmx256M -Xms128M"

Start the brokers again. If memory is still insufficient, clear the machine's cache:

sync
echo 3 > /proc/sys/vm/drop_caches

If it is none of the above, you will have to debug based on the startup log.
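
One more quick check that often helps: confirm that the three broker ports are actually listening (assuming ss is available; netstat works too):

ss -lntp | grep -E '909[234]'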

Other commonly used kafka commands

View topic list

./bin/kafka-topics.sh --zookeeper localhost:2181 --list

View the details of a topic

./bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic test-topic

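Delete a topic (if delete.topic.enable is not enabled on the brokers, the topic is only marked for deletion):

./bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test-topic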
