Today I'm sharing my notes on building a Kafka cluster from scratch. I have already stepped into almost every pitfall for you, so all you need to do is follow the steps and type along.
Kafka depends on a JDK and ZooKeeper
Before we start, a word of advice. With the same tutorial, some people fail and others succeed, and most of those who fail run into path problems. So be careful and don't panic.
Check whether your server has a JDK. If you see output like the following, it is installed:
[root@localhost ~]# java -version
openjdk version "1.8.0_302"
OpenJDK Runtime Environment (build 1.8.0_302-b08)
OpenJDK 64-Bit Server VM (build 25.302-b08, mixed mode)
If it is not installed, install it first.
Although zk has been bundled with kafka 0.5.x and above, it is better to deploy a separate zk cluster, for two reasons.
Installing zookeeper is actually not complicated, but there are plenty of pitfalls, so I haven't skipped a single step below. Paths cause the most problems. I recommend using the same paths as mine so that you can copy my commands directly.
Go to the /opt/ directory, create a zookeeper/ folder, and go into it:
cd /opt
mkdir zookeeper
cd zookeeper
Download the zk installation package:
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.tar.gz
If you get wget: command not found, install wget first:
yum -y install wget
Create three folders: server1/, server2/, and server3/
Because I only have one server, I can only set up a pseudo-cluster: three ZooKeeper instances running on one machine, each on its own ports, to simulate three servers. A real cluster works the same way. The difference is that in a pseudo-cluster all instances share one IP and use different ports, while in a real cluster each instance has its own IP and they can all use the same ports.
mkdir server1
mkdir server2
mkdir server3
At this point, your zookeeper directory should look like this
server1
server2
server3
zookeeper-3.4.10.tar.gz
In each of the three server directories, create two folders, data and datalog, and extract the zk installation package into each server directory:
mkdir server1/data
mkdir server1/datalog
tar -zxvf zookeeper-3.4.10.tar.gz -C server1
mkdir server2/data
mkdir server2/datalog
tar -zxvf zookeeper-3.4.10.tar.gz -C server2
mkdir server3/data
mkdir server3/datalog
tar -zxvf zookeeper-3.4.10.tar.gz -C server3
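You can also run the nine commands above as a loop. The sketch below is a minimal version that assumes the tarball is already in /opt/zookeeper. setup_zk_dirs and the ZK_BASE override are hypothetical names of mine, and ZK_BASE lets you try it in a scratch directory first. The loop changes tar's -C target on each pass, which is the part that is easy to get wrong when copying the commands by hand.

```shell
# Sketch: build server1..3, each with data/ and datalog/, and unpack ZooKeeper into each.
# ZK_BASE is a hypothetical override; it defaults to the path used in this article.
setup_zk_dirs() {
  base="${ZK_BASE:-/opt/zookeeper}"
  tarball="$base/zookeeper-3.4.10.tar.gz"
  for i in 1 2 3; do
    dir="$base/server$i"
    mkdir -p "$dir/data" "$dir/datalog" || return 1
    if [ -f "$tarball" ]; then
      # Extract into the matching server directory (server1, server2, server3)
      tar -zxf "$tarball" -C "$dir" || return 1
    else
      echo "skipping extraction: $tarball not found" >&2
    fi
  done
}
```

Usage: run setup_zk_dirs as root. Afterwards each server directory should match the listing below.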
Each server directory should now look like this:
data
datalog
zookeeper-3.4.10
Now configure the zk cluster. This is the key step.
Create a myid file under the data/ folder of each of the three server directories. The file contains a single number: 1 for server1, 2 for server2, and 3 for server3.
# create a new file
vi /opt/zookeeper/server1/data/myid
# Press i to enter the editing mode, enter the number 1, esc, colon, wq to save and exit
vi /opt/zookeeper/server2/data/myid
# Press i to enter the editing mode, enter the number 2, esc, colon, wq to save and exit
vi /opt/zookeeper/server3/data/myid
# Press i to enter the editing mode, enter the number 3, esc, colon, wq to save and exit
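If you find vi awkward, echo can write the myid files directly. This is a minimal sketch. write_myids and ZK_BASE are hypothetical names, and the function assumes the data/ folders created earlier already exist.

```shell
# Sketch: write the myid files non-interactively instead of editing them in vi.
# Each myid holds only the server number, which must match the N in server.N of zoo.cfg.
# ZK_BASE is a hypothetical override; it defaults to /opt/zookeeper.
write_myids() {
  base="${ZK_BASE:-/opt/zookeeper}"
  for i in 1 2 3; do
    echo "$i" > "$base/server$i/data/myid" || return 1
  done
}
```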
Then go into the conf directory of each of the three zookeeper installations.
There is a file called zoo_sample.cfg. Leave it as it is; it is only a template.
Copy it in the same directory as zoo.cfg. Stick with that name, because zoo.cfg is the file zkServer.sh reads by default.
cp -i zoo_sample.cfg zoo.cfg
Current directory structure
configuration.xsl
log4j.properties
zoo.cfg
zoo_sample.cfg
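Then modify the zoo.cfg file. The main settings to change are listed below; leave the others at their defaults or adjust them to your own situation. In each server.N=host:port1:port2 line, N must match that instance's myid. port1 is the port followers use to connect to the leader, and port2 is used for leader election. Because all three instances run on one machine, each one also needs its own clientPort.
If you have used my directory layout throughout, configure the files exactly as follows: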
#/opt/zookeeper/server1/zookeeper-3.4.10/conf/zoo.cfg
dataDir=/opt/zookeeper/server1/data
dataLogDir=/opt/zookeeper/server1/datalog
clientPort=2181
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889
#/opt/zookeeper/server2/zookeeper-3.4.10/conf/zoo.cfg
dataDir=/opt/zookeeper/server2/data
dataLogDir=/opt/zookeeper/server2/datalog
clientPort=2182
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889
#/opt/zookeeper/server3/zookeeper-3.4.10/conf/zoo.cfg
dataDir=/opt/zookeeper/server3/data
dataLogDir=/opt/zookeeper/server3/datalog
clientPort=2183
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889
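The three zoo.cfg files differ only in the server number, so you can generate them. This minimal sketch rests on a few assumptions. write_zoo_cfgs and ZK_BASE are hypothetical names. tickTime=2000, initLimit=10 and syncLimit=5 are the values from zoo_sample.cfg. The script overwrites zoo.cfg completely, so it replaces both the cp step and the hand edits.

```shell
# Sketch: generate zoo.cfg for all three instances in one go.
# tickTime/initLimit/syncLimit are the zoo_sample.cfg defaults; the rest mirrors the settings above.
# ZK_BASE is a hypothetical override; it defaults to /opt/zookeeper.
write_zoo_cfgs() {
  base="${ZK_BASE:-/opt/zookeeper}"
  for i in 1 2 3; do
    conf="$base/server$i/zookeeper-3.4.10/conf"
    mkdir -p "$conf" || return 1
    cat > "$conf/zoo.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$base/server$i/data
dataLogDir=$base/server$i/datalog
# each instance needs its own client port: 2181, 2182, 2183
clientPort=$((2180 + i))
# quorum and leader-election ports, identical in all three files
server.1=localhost:2887:3887
server.2=localhost:2888:3888
server.3=localhost:2889:3889
EOF
  done
}
```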
With the configuration done, go into each zookeeper bin directory and start the zk service:
cd /opt/zookeeper/server1/zookeeper-3.4.10/bin
./zkServer.sh start
cd /opt/zookeeper/server2/zookeeper-3.4.10/bin
./zkServer.sh start
cd /opt/zookeeper/server3/zookeeper-3.4.10/bin
./zkServer.sh start
Startup information
After all three zk instances have started, run jps to confirm. If every instance started successfully, you will see three QuorumPeerMain processes (QuorumPeerMain is the main class of the ZooKeeper server).
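A small helper can count the processes for you, which makes the jps check less error-prone. expect_procs is a hypothetical helper of mine, not part of ZooKeeper or the JDK. It assumes jps prints its usual "pid MainClass" lines.

```shell
# Sketch: count processes with a given main-class name in jps output read from stdin.
# $1 is the class name as jps shows it, $2 the number of processes you expect.
expect_procs() {
  name="$1"
  want="$2"
  # jps prints "<pid> <MainClass>"; match the class name exactly in column 2
  got=$(awk -v n="$name" '$2 == n { c++ } END { print c + 0 }')
  if [ "$got" -eq "$want" ]; then
    echo "OK: $got $name process(es)"
  else
    echo "FAIL: expected $want $name process(es), found $got"
    return 1
  fi
}
```

Usage: jps | expect_procs QuorumPeerMain 3. The same helper works for the brokers later with jps | expect_procs Kafka 3, since jps lists each broker under the name Kafka.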
At this point, the zookeeper cluster is built
Go to the /opt directory and create a kafka folder:
mkdir /opt/kafka
cd /opt/kafka
Go into it and download the Kafka installation package:
wget https://archive.apache.org/dist/kafka/1.0.0/kafka_2.11-1.0.0.tgz
Create three folders, kafkalogs1, kafkalogs2, and kafkalogs3, to hold each broker's data:
mkdir kafkalogs1
mkdir kafkalogs2
mkdir kafkalogs3
Unzip the kafka installation package:
tar -zxvf kafka_2.11-1.0.0.tgz
At this point, your kafka directory should look like this
kafka_2.11-1.0.0
kafka_2.11-1.0.0.tgz
kafkalogs1
kafkalogs2
kafkalogs3
Start configuring Kafka. Enter Kafka's config directory:
cd /opt/kafka/kafka_2.11-1.0.0/config/
There is a server.properties file inside.
Rename this file and make two copies of it, so that you end up with three files. Renaming isn't required, but with three broker configurations, matching names are easier to keep track of.
mv server.properties server1.properties
cp -i server1.properties server2.properties
cp -i server1.properties server3.properties
You now have three broker configuration files:
server1.properties
server2.properties
server3.properties
We need to modify four properties in each of the three configuration files:
* broker.id: the broker's id, which must be unique in the cluster. For simplicity, reuse the numbers from the three zookeeper myid files above, although the two are not actually linked: 1 in server1.properties, 2 in server2.properties, and so on.
* listeners: the address and port the broker listens on. Replace ip below with your server's private address. If you use 127.0.0.1 or localhost, the broker listens only on the loopback interface and clients on other machines cannot reach it.
* log.dirs: the directory where the broker stores its data (the message logs, not application logs)
* zookeeper.connect: the zk cluster as a comma-separated list of ip:port pairs
The file is long, so be patient when looking for each property.
# server1.properties
broker.id=1
listeners=PLAINTEXT://ip:9092
log.dirs=/opt/kafka/kafkalogs1
zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183
# server2.properties
broker.id=2
listeners=PLAINTEXT://ip:9093
log.dirs=/opt/kafka/kafkalogs2
zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183
# server3.properties
broker.id=3
listeners=PLAINTEXT://ip:9094
log.dirs=/opt/kafka/kafkalogs3
zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183
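Because the file is long, you can have sed make the four edits. This is a minimal sketch meant to replace the mv/cp step, so run it while the stock server.properties is still in place. As I understand the 1.0.0 defaults, that file contains broker.id=0, #listeners=PLAINTEXT://:9092, log.dirs=/tmp/kafka-logs and zookeeper.connect=localhost:2181. make_broker_configs, KAFKA_CONF and BROKER_IP are hypothetical names.

```shell
# Sketch: derive server1..3.properties from the stock server.properties with sed,
# instead of mv/cp and hand-editing a long file. server.properties itself is left untouched.
# KAFKA_CONF and BROKER_IP are hypothetical variables; BROKER_IP is your private address.
make_broker_configs() {
  conf="${KAFKA_CONF:-/opt/kafka/kafka_2.11-1.0.0/config}"
  ip="$BROKER_IP"
  if [ -z "$ip" ]; then
    echo "set BROKER_IP to the server's private address" >&2
    return 1
  fi
  zk="127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183"
  for i in 1 2 3; do
    # ^#?listeners= also uncomments the default line but leaves #advertised.listeners alone
    sed -E \
      -e "s|^broker\.id=.*|broker.id=$i|" \
      -e "s|^#?listeners=.*|listeners=PLAINTEXT://$ip:$((9091 + i))|" \
      -e "s|^log\.dirs=.*|log.dirs=/opt/kafka/kafkalogs$i|" \
      -e "s|^zookeeper\.connect=.*|zookeeper.connect=$zk|" \
      "$conf/server.properties" > "$conf/server$i.properties" || return 1
  done
}
```

Usage: BROKER_IP=192.168.1.10; make_broker_configs (substitute your own private IP). Before starting the brokers, compare one of the generated files with the block above.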
The configuration is complete. Go to the extracted kafka directory and start the three Kafka brokers:
cd /opt/kafka/kafka_2.11-1.0.0
./bin/kafka-server-start.sh -daemon config/server1.properties
./bin/kafka-server-start.sh -daemon config/server2.properties
./bin/kafka-server-start.sh -daemon config/server3.properties
Run jps to check whether startup succeeded; you should see three Kafka processes.
If there is no Kafka process, startup failed. To find out why, check the kafkaServer.out file in the logs directory.
Go to the kafka installation directory
cd /opt/kafka/kafka_2.11-1.0.0
Create a topic first
./bin/kafka-topics.sh --zookeeper 127.0.0.1:2181 --create --topic test-topic --partitions 3 --replication-factor 3
Command explanation: using the zk at 127.0.0.1:2181, create a topic named test-topic, split into 3 partitions with 3 replicas each. A replication factor of 3 requires all three brokers to be running.
* --zookeeper: the zk service to connect to
* --topic: the topic name
* --partitions: the number of partitions
* --replication-factor: the number of copies of each partition
Start a console producer, replacing youraddress with the address you put in listeners:
./bin/kafka-console-producer.sh --broker-list youraddress:9092 --topic test-topic
Then type a few test messages, one per line.
If startup fails, the most likely cause is a wrong path in a configuration file, so check them carefully.
If you are sure the steps and configuration files above are correct, the machine may not have enough memory. By default, kafka-server-start.sh starts each broker with a 1 GB heap (-Xmx1G -Xms1G), so three brokers on one machine ask for 3 GB.
Look at the startup log
tail -111f /opt/kafka/kafka_2.11-1.0.0/logs/kafkaServer.out
If the log reports that the JVM could not allocate memory, your machine does not have enough memory.
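As far as I know, when the JVM cannot get its heap, the log contains a line such as "There is insufficient memory for the Java Runtime Environment to continue" or one ending in "Cannot allocate memory". This minimal sketch greps kafkaServer.out for those two phrases. check_heap_failure is a hypothetical name, and the exact wording may differ between JVMs.

```shell
# Sketch: look for typical HotSpot out-of-memory-at-startup messages in a Kafka log.
# The two patterns are an assumption; message wording can vary by JVM version.
check_heap_failure() {
  log="${1:-/opt/kafka/kafka_2.11-1.0.0/logs/kafkaServer.out}"
  if [ ! -f "$log" ]; then
    echo "log not found: $log" >&2
    return 2
  fi
  # Print any matching lines, then a short verdict
  if grep -E "insufficient memory for the Java Runtime|Cannot allocate memory" "$log"; then
    echo "=> the JVM could not get enough memory; try a smaller KAFKA_HEAP_OPTS"
    return 0
  fi
  echo "no memory errors found in $log"
  return 1
}
```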
The fix is simple: set the Kafka heap options in the same shell before starting the brokers:
export KAFKA_HEAP_OPTS="-Xmx256M -Xms128M"
Then start Kafka again. If memory is still insufficient, clear the machine's page cache (run as root):
sync
echo 3 > /proc/sys/vm/drop_caches
If neither of these is the cause, you will have to work out the problem from the startup log.
Other commonly used kafka commands
View topic list
./bin/kafka-topics.sh --zookeeper localhost:2181 --list
View the details of a topic
./bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic test-topic