zk starts normally but zkServer.sh status reports Error contacting service

created at 08-31-2021 views: 51

environment

  • zk: 3.4.9 (3 nodes)
  • kubernetes: 1.17.2
  • The container image is self-packaged using the CentOS7 + zk installation package.

Problem Description

All 3 nodes of zk are started, the cluster is also set up, and provides services to the outside world normally. But the execution of zkServer.sh status to query the node status is abnormal, as follows

$ ./bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/zookeeper-3.4.9/bin/../conf/zoo.cfg
Error contacting service. It is probably not running

Use jps to see the zk process

$ jps
26 QuorumPeerMain
120300 Jps

Investigation process

modify bin/zkServer.sh to view the detailed process of the status subcommand.

STAT=`"$JAVA" "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \ -cp "$CLASSPATH" $JVMFLAGS org.apache.zookeeper.client.FourLetterWordMain \ $clientPortAddress $clientPort srvr 2> /dev/null \ | $GREP Mode`

Remove 2> /dev/null of the above statement and the subsequent pipeline filter statement, and check the output of the srvr command, and get the following

Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386)
at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at sun.net.httpserver.ServerImpl.<init>(ServerImpl.java:100)
at sun.net.httpserver.HttpServerImpl.<init>(HttpServerImpl.java:50)
at sun.net.httpserver.DefaultHttpServerProvider.createHttpServer(DefaultHttpServerProvider.java:35)
at com.sun.net.httpserver.HttpServer.create(HttpServer.java:130)
at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer.<init>(HTTPServer.java:179)
at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:31) ... 6 more
FATAL ERROR in native method: processing of -javaagent failed

It's all Address already in use, but it doesn't say which port it is...

And zk is running normally, 2181/3888 is not a normal thing to be occupied...

The following is the process of trying your luck. According to the reference article 2 mentioned, the dataLogDir directory was not successfully created, or the myid path is incorrect...Can these configurations be incorrect to start successfully?

Reference article 3 mentioned a firewall, um...there is no firewall at all in the container. In addition, /etc/hosts is not necessary. The nodes of the address in server.X can communicate with each other (of course I also tried it, as expected ).

Solution

In order to monitor the zk container in the container environment, we used jmx_exporter, so the JVMFLAGS variable in zkServer.sh was modified.

JVMFLAGS="$JVMFLAGS -javaagent:$JMX_DIR/jmx_prometheus_javaagent-0.15.0.jar=19105:$JMX_DIR/zookeeper.yaml"

Comment out this line, and then execute zkServer.sh status, it's fine.

In addition, in addition to this reason, there is another reason that our zoo.cfg configuration is not standard, as follows

clientPort:2181
server.0:zk-ha-test-busi-kafka-0.zk-ha-test-busi-kafka-svc.zjjpt-zk.svc.cs-hua.hpc:2888:3888
server.1:zk-ha-test-busi-kafka-1.zk-ha-test-busi-kafka-svc.zjjpt-zk.svc.cs-hua.hpc:2888:3888
server.2:zk-ha-test-busi-kafka-2.zk-ha-test-busi-kafka-svc.zjjpt-zk.svc.cs-hua.hpc:2888:3888
dataLogDir:/data/zk-ha-test-busi-kafka-0/log
dataDir:/data/zk-ha-test-busi-kafka-0

With the colon : instead of the equal sign =, zk can run normally 😱.

This leads to problems in parsing the clientPort variable from zoo.cfg when executing the status subcommand in zkServer.sh.

clientPort=`$GREP "^[[:space:]]*clientPort[^[:alpha:]]" "$ZOOCFG" | sed -e's/.*=//'`

The following sed command is divided by =, so the result obtained is clientPort: 2181, which is an illegal value.

After modifying the format of zoo.cfg, it will be fine.

created at:08-31-2021
edited at: 08-31-2021: