After configuring the four files core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml, start the cluster and run the wordcount sample program.
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output
The job fails with the following error:
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Run the following command on the command line and copy the classpath it returns.
hadoop classpath
Edit yarn-site.xml
vim yarn-site.xml
Add the following content
<configuration>
<property>
<name>yarn.application.classpath</name>
<value>Paste the output of hadoop classpath here, on a single line</value>
</property>
</configuration>
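As an alternative to pasting the classpath by hand, the property block can be generated from the command output. This is only a sketch: the function name make_classpath_property is made up for illustration, and it simply strips any carriage returns and newlines from the string it is given before wrapping it in the property element.

```shell
# Hypothetical helper (not part of Hadoop): print a yarn.application.classpath
# <property> block for the classpath string passed as $1.
make_classpath_property() {
  # Remove CR/LF characters so the value stays on a single line.
  classpath=$(printf '%s' "$1" | tr -d '\r\n')
  printf '<property>\n<name>yarn.application.classpath</name>\n<value>%s</value>\n</property>\n' "$classpath"
}
```

It would be called as make_classpath_property "$(hadoop classpath)", and the printed block then goes inside the configuration element of yarn-site.xml.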
Apply the above settings on all Master and Slave nodes.
Later, though, I thought about it again. I had already configured the environment variables in yarn-site.xml, so why would I also need to set the classpath here? Most likely something went wrong when I copied the configuration. Looking at yarn-site.xml carefully, I found two line breaks in the middle of the environment variable value. They prevented the value from being read correctly, which caused the class-not-found error above.
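A quick way to look for this kind of damage is to list every line that opens a value element without closing it on the same line. The sketch below assumes each value is meant to sit on one line, as in the configuration shown above. The function name find_broken_values is made up for illustration, and the check does not catch stray spaces inside a value.

```shell
# Hypothetical helper: print "line-number:text" for every line of file $1
# that contains <value> but not the matching </value>.
# Prints nothing (and returns a non-zero status) when no such line exists.
find_broken_values() {
  grep -n '<value>' "$1" | grep -v '</value>'
}
```

For example, find_broken_values yarn-site.xml prints the number of each line where a value has been split by a newline.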
After removing these two newlines, save and exit, distribute yarn-site.xml to all machines, restart the cluster, and run wordcount again. This time it runs successfully.
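The distribution step can be sketched as a small loop over the other nodes. This assumes passwordless SSH between the machines and that the file lives at the same absolute path on every node. The function name sync_conf and the DRY_RUN switch are made up for illustration, and any hostnames passed to it are your own.

```shell
# Hypothetical helper: copy file $1 to the same path on each remaining argument (a host).
# With DRY_RUN=1 the scp commands are only printed, not executed.
sync_conf() {
  file=$1
  shift
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "scp $file $host:$file"
    else
      scp "$file" "$host:$file"
    fi
  done
}
```

A call might look like sync_conf "$HADOOP_HOME/etc/hadoop/yarn-site.xml" slave1 slave2, where slave1 and slave2 stand in for the real hostnames. After that, YARN can be restarted with the stop-yarn.sh and start-yarn.sh scripts in Hadoop's sbin directory.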
The lesson: when pasting long content into a configuration file, maximize the terminal window first, so that hidden newlines or spaces in the pasted content are easy to spot, especially in content that wraps across more than one line.