Install Hadoop, YARN, Flink, and Anaconda on Mac

created at 12-15-2021

Set up passwordless SSH login

cd ~/.ssh
cp id_rsa.pub authorized_keys
# Test whether a password is required
ssh localhost
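
If ~/.ssh/id_rsa.pub does not exist yet, generate a key pair first (a minimal sketch with an empty passphrase; adjust the key type if you prefer):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa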

Install Hadoop using Homebrew

brew install hadoop

# On success, Homebrew prints a summary like:
🍺 /usr/local/Cellar/hadoop/3.3.1: 22,487 files, 1GB
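
Assuming Homebrew has linked the hadoop binaries into /usr/local/bin (the default behaviour), the install can be sanity-checked before any configuration:

hadoop version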

Modify the configuration file

  • core-site.xml

vi /usr/local/Cellar/hadoop/3.3.1/libexec/etc/hadoop/core-site.xml

# Add inside the <configuration> tag
<property>
     <name>hadoop.tmp.dir</name>
     <value>file:/usr/local/Cellar/hadoop/3.3.1/libexec/tmp</value>
</property>
<property>
     <name>fs.defaultFS</name>
     <value>hdfs://localhost:8020</value>
</property>
  • hdfs-site.xml

vi /usr/local/Cellar/hadoop/3.3.1/libexec/etc/hadoop/hdfs-site.xml

# Add inside the <configuration> tag
     <property>
          <name>dfs.replication</name>
          <value>1</value>
     </property>
     <property>
          <name>dfs.namenode.name.dir</name>
          <value>file:/usr/local/Cellar/hadoop/3.3.1/libexec/tmp/dfs/name</value>
     </property>
     <property>
          <name>dfs.datanode.data.dir</name>
          <value>file:/usr/local/Cellar/hadoop/3.3.1/libexec/tmp/dfs/data</value>
     </property>
     <property>
         <name>dfs.permissions</name>
         <value>false</value>
     </property>

The following two XML files configure YARN.

  • yarn-site.xml

vi /usr/local/Cellar/hadoop/3.3.1/libexec/etc/hadoop/yarn-site.xml

# Add inside the <configuration> tag
     <property>
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle</value>
     </property>
     <property>
         <name>yarn.nodemanager.env-whitelist</name>
         <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
     </property>
  • mapred-site.xml

vi /usr/local/Cellar/hadoop/3.3.1/libexec/etc/hadoop/mapred-site.xml

# Add inside the <configuration> tag
   <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
   </property>

Add Hadoop environment variables

export HADOOP_HOME=/usr/local/Cellar/hadoop/3.3.1/libexec
export HADOOP_COMMON_HOME=$HADOOP_HOME
export PATH="$PATH:$HADOOP_HOME/bin"

Initialize HDFS by formatting the NameNode (only required on the first install)

cd /usr/local/Cellar/hadoop/3.3.1/bin
./hdfs namenode -format

Start HDFS and YARN

# Start hdfs
cd /usr/local/Cellar/hadoop/3.3.1/sbin
./start-dfs.sh

# Check whether startup succeeded
jps
6306 SecondaryNameNode
6069 NameNode
6392 Jps
6170 DataNode

# Start yarn
cd /usr/local/Cellar/hadoop/3.3.1/sbin
./start-yarn.sh
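
As a quick check, jps should now also list the YARN daemons (ResourceManager and NodeManager; PIDs will differ):

jps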

HDFS web UI: http://localhost:9870/dfshealth.html#tab-overview

YARN web UI: http://localhost:8088/cluster

If startup fails, explicitly set the Java environment variables in hadoop-env.sh:

vi /usr/local/Cellar/hadoop/3.3.1/libexec/etc/hadoop/hadoop-env.sh


export JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home
export HADOOP_HOME=/usr/local/Cellar/hadoop/3.3.1/libexec
export HADOOP_OS_TYPE=${HADOOP_OS_TYPE:-$(uname -s)}
case ${HADOOP_OS_TYPE} in
  Darwin*)
    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm= "
    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.kdc= "
    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.conf= "
  ;;
esac
# Optional: verbose logging, useful when troubleshooting startup problems
export HADOOP_ROOT_LOGGER=DEBUG,console
export HADOOP_DAEMON_ROOT_LOGGER=DEBUG,RFA

WordCount verification

hadoop jar /usr/local/Cellar/hadoop/3.3.1/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar wordcount /input /output
# The last three arguments: wordcount is the example name, /input is the HDFS input directory, and /output is the HDFS output directory
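
Note that /input must already exist in HDFS and contain at least one text file before the job is submitted; a minimal sketch (sample.txt is only a placeholder for a local file):

hdfs dfs -mkdir -p /input
hdfs dfs -put sample.txt /input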


Note ⚠️: The output directory must not already exist. If you specify an existing directory as a Hadoop job's output, the job will not run; requiring a fresh directory is a safety mechanism that keeps Hadoop from overwriting useful files.


Finally, check where the output is stored: run hadoop fs -ls /output in the terminal to list the result files.


The result is stored in the part-r-00000 file; run hadoop fs -cat /output/part-r-00000 in the terminal to view the word counts.

Download the latest Flink release from the official website and unpack flink-1.14.0-bin-scala_2.11.tgz.
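
For example, fetching and unpacking from the Apache archive (the URL and the /usr/local/develop target directory are assumptions; use whatever mirror and location you prefer):

cd /usr/local/develop
curl -O https://archive.apache.org/dist/flink/flink-1.14.0/flink-1.14.0-bin-scala_2.11.tgz
tar -xzf flink-1.14.0-bin-scala_2.11.tgz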

Copy the Hadoop configuration files into the Flink lib directory

cd /usr/local/Cellar/hadoop/3.3.1/libexec/etc/hadoop
cp hdfs-site.xml yarn-site.xml core-site.xml /usr/local/develop/flink-1.14.0/lib

Add environment variables for Hadoop and Flink

export PATH="/usr/local/develop/flink-1.14.0/bin:$PATH"
export HADOOP_HOME=/usr/local/Cellar/hadoop/3.3.1/libexec
export HADOOP_COMMON_HOME=$HADOOP_HOME
export PATH="$PATH:$HADOOP_HOME/bin"

export HADOOP_CLASSPATH=$(find $HADOOP_HOME -name '*.jar' | xargs echo | tr ' ' ':')
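
Alternatively, assuming the hadoop command is already on PATH, the classpath can be taken from Hadoop itself, which is the approach the Flink documentation usually suggests:

export HADOOP_CLASSPATH=$(hadoop classpath)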

Run the WordCount example on YARN

flink run -m yarn-cluster /usr/local/develop/flink-1.14.0/examples/batch/WordCount.jar
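
The example can also read from and write to HDFS explicitly via its --input and --output arguments; the paths below are placeholders:

flink run -m yarn-cluster /usr/local/develop/flink-1.14.0/examples/batch/WordCount.jar \
  --input hdfs:///input/sample.txt --output hdfs:///flink-output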

Install Anaconda

brew search anaconda
# The cask without a version suffix is the latest version
brew install --cask anaconda
# Configure anaconda environment variables
echo 'export PATH="/usr/local/anaconda3/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
# View version
conda --version
conda 4.10.1

Environment management

# List existing conda environments
conda env list

# Create a new environment named learn
conda create -n learn

# Activate and later deactivate the environment
conda activate learn
conda deactivate
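
To pin a Python version when creating the environment (python=3.8 here is only an example), pass a package spec to conda create:

conda create -n learn python=3.8 -y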

Install TensorFlow

pip install --upgrade pip
pip install tensorflow
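
A quick sanity check (run inside the activated environment) is to import TensorFlow and print its version:

python -c "import tensorflow as tf; print(tf.__version__)"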