HBase ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet


Error message

Exception in thread "main" java.lang.IllegalArgumentException: Failed to find metadata store by url: kylin_metadata@hbase
    at org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:99)
    at org.apache.kylin.common.persistence.ResourceStore.getStore(ResourceStore.java:111)
    at org.apache.kylin.rest.service.AclTableMigrationTool.checkIfNeedMigrate(AclTableMigrationTool.java:99)
    at org.apache.kylin.tool.AclTableMigrationCLI.main(AclTableMigrationCLI.java:43)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:92)
    ... 3 more
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=2, exceptions:
Wed Aug 04 11:08:45 CST 2021, RpcRetryingCaller{globalStartTime=1628046524833, pause=100, maxAttempts=2}, org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=2, exceptions:
Wed Aug 04 11:08:45 CST 2021, RpcRetryingCaller{globalStartTime=1628046524855, pause=100, maxAttempts=2}, org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server recbd5.hwwt2.com,16020,1628044355182 is not running yet
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1501)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2440)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)

Wed Aug 04 11:08:45 CST 2021, RpcRetryingCaller{globalStartTime=1628046524855, pause=100, maxAttempts=2}, org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server recbd5.hwwt2.com,16020,1628044355182 is not running yet
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1501)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2440)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)


Wed Aug 04 11:08:45 CST 2021, RpcRetryingCaller{globalStartTime=1628046524833, pause=100, maxAttempts=2}, org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=2, exceptions:
Wed Aug 04 11:08:45 CST 2021, RpcRetryingCaller{globalStartTime=1628046525453, pause=100, maxAttempts=2}, org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server recbd5.hwwt2.com,16020,1628044355182 is not running yet
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1501)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2440)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)

Solution

1) A check shows that Hadoop is in safe mode at this time; make Hadoop leave safe mode:

hadoop dfsadmin -safemode leave 
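
As a side note, you can first confirm whether the NameNode really is in safe mode; on newer Hadoop releases the hdfs dfsadmin form shown below is preferred over hadoop dfsadmin, but both do the same thing:

hdfs dfsadmin -safemode get    # report the current safe-mode state
hdfs dfsadmin -safemode leave  # ask the NameNode to leave safe mode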

2) Restart HBase. HBase is found to still not work normally: both the HMaster and the RegionServers are abnormal, and the exception logs are as follows:

HMaster critical exception log

2018-05-25 10:19:12,737 DEBUG [hadoop001:60000.activeMasterManager] wal.WALProcedureStore: Opening state-log: FileStatus{path=hdfs://beh/hbase/MasterProcWALs/state-00000000000000036689.log; isDirectory=false; length=45760804; replication=3; blocksize=536870912; modification_time=1527123981127; access_time=1527165673882; owner=hadoop; group=hadoop; permission=rw-rw-r--; isSymlink=false}
2018-05-25 10:19:12,742 INFO  [hadoop001:60000.activeMasterManager] util.FSHDFSUtils: Recover lease on dfs file hdfs://beh/hbase/MasterProcWALs/state-00000000000000036690.log
2018-05-25 10:19:12,742 INFO  [hadoop001:60000.activeMasterManager] util.FSHDFSUtils: Recovered lease, attempt=0 on file=hdfs://beh/hbase/MasterProcWALs/state-00000000000000036690.log after 0ms
2018-05-25 10:19:12,742 DEBUG [hadoop001:60000.activeMasterManager] wal.WALProcedureStore: Opening state-log: FileStatus{path=hdfs://beh/hbase/MasterProcWALs/state-00000000000000036690.log; isDirectory=false; length=45761668; replication=3; blocksize=536870912; modification_time=1527123982242; access_time=1527165673883; owner=hadoop; group=hadoop; permission=rw-rw-r--; isSymlink=false}
2018-05-25 10:19:12,767 INFO  [hadoop001:60000.activeMasterManager] util.FSHDFSUtils: Recover lease on dfs file hdfs://beh/hbase/MasterProcWALs/state-00000000000000036691.log
2018-05-25 10:19:12,768 INFO  [hadoop001:60000.activeMasterManager] util.FSHDFSUtils: Recovered lease, attempt=0 on file=hdfs://beh/hbase/MasterProcWALs/state-00000000000000036691.log after 1ms
...
2018-05-25 10:29:29,656 DEBUG [B.defaultRpcServer.handler=31,queue=13,port=60000] ipc.RpcServer: B.defaultRpcServer.handler=31,queue=13,port=60000: callId: 301 service: RegionServerStatusService methodName: RegionServerStartup size: 46 connection: 172.33.2.22:38698
org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
    at org.apache.hadoop.hbase.master.HMaster.checkServiceStarted(HMaster.java:2296)
    at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:361)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
    at java.lang.Thread.run(Thread.java:745)

RegionServer key exception log

2018-05-25 10:31:14,446 WARN  [regionserver/hadoop003] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
2018-05-25 10:31:17,446 INFO  [regionserver/hadoop003] regionserver.HRegionServer: reportForDuty to master=hadoop001,60000,1527214458906 with port=60025, startcode=1527214459823
2018-05-25 10:31:17,447 DEBUG [regionserver/hadoop003] regionserver.HRegionServer: Master is not running yet
2018-05-25 10:31:17,447 WARN  [regionserver/hadoop003] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
2018-05-25 10:31:20,447 INFO  [regionserver/hadoop003] regionserver.HRegionServer: reportForDuty to master=hadoop001,60000,1527214458906 with port=60025, startcode=1527214459823
2018-05-25 10:31:20,448 DEBUG [regionserver/hadoop003] regionserver.HRegionServer: Master is not running yet
2018-05-25 10:31:20,448 WARN  [regionserver/hadoop003] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
2018-05-25 10:31:23,448 INFO  [regionserver/hadoop003] regionserver.HRegionServer: reportForDuty to master=hadoop001,60000,1527214458906 with port=60025, startcode=1527214459823
2018-05-25 10:31:23,449 DEBUG [regionserver/hadoop003] regionserver.HRegionServer: Master is not running yet

DataNode critical exception log

2018-05-25 11:04:20,540 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Likely the client has stopped reading, disconnecting it (hadoop028:50010:DataXceiver error processing READ_BLOCK operation  src: /172.33.2.17:39882 dst: /172.33.2.44:50010);
java.net.SocketTimeoutException: 600000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.33.2.44:50010 remote=/172.33.2.17:39882]
2018-05-25 11:04:20,652 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Likely the client has stopped reading, disconnecting it (hadoop028:50010:DataXceiver error processing READ_BLOCK operation  src: /172.33.2.17:39930 dst: /172.33.2.44:50010);
java.net.SocketTimeoutException: 600000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.33.2.44:50010 remote=/172.33.2.17:39930]
2018-05-25 11:04:21,088 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Likely the client has stopped reading, disconnecting it (hadoop028:50010:DataXceiver error processing READ_BLOCK operation  src: /172.33.2.17:40038 dst: /172.33.2.44:50010);
java.net.SocketTimeoutException: 600000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.33.2.44:50010 remote=/172.33.2.17:40038]

3) Problem analysis

Before fixing anything, rule out an HDFS problem: the DataNode exceptions are caused by the HBase HMaster failing to start normally, since 172.33.2.17 is the active HMaster node (as determined by ZooKeeper).

According to the RegionServer and HMaster logs:

org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet

Master is not running yet

it is determined that the HMaster service cannot start normally.

According to the HMaster exception log:

2018-05-25 10:19:59,868 WARN [hadoop001:60000.activeMasterManager] wal.WALProcedureStore: Unable to read tracker for hdfs://beh/hbase/MasterProcWALs/state-00000000000000040786.log - Missing trailer: size=11 startPos=11

Checking the hdfs://beh/hbase/MasterProcWALs directory shows that its total size is 1.3 TB (a way to check this is sketched below).
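
For reference, the size and file count of that directory can be checked with standard HDFS commands; the hdfs://beh/hbase path below is taken from the logs above, so adjust it to your cluster's HBase root directory:

hdfs dfs -du -s -h hdfs://beh/hbase/MasterProcWALs   # total size of the procedure-store WAL directory
hdfs dfs -count hdfs://beh/hbase/MasterProcWALs      # directory count, file count, and content size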

Reason

When the HMaster becomes active, it has to recover the lease on, open, and read a very large number of these state logs; the huge volume of logs puts heavy pressure on the NameNode, exhausts the TCP buffer space, and leads to a very long service recovery time.

4) The fix: delete the log files under the hdfs://beh/hbase/MasterProcWALs directory and then restart the HBase cluster (a sketch of the commands follows).
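
A minimal sketch of that cleanup, assuming the hdfs://beh/hbase path from the logs above and a tarball-style installation managed with HBase's bundled start/stop scripts; be aware that clearing MasterProcWALs discards any in-flight master procedures, so this is only appropriate when the master cannot start anyway:

$HBASE_HOME/bin/stop-hbase.sh                                  # stop the cluster first
hdfs dfs -rm -r -skipTrash hdfs://beh/hbase/MasterProcWALs/*   # remove the accumulated procedure-store WALs
$HBASE_HOME/bin/start-hbase.sh                                 # bring the cluster back up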

from: https://www.cnblogs.com/laoqing/p/15112134.html
