While researching tools for an OCR project, I came across gosseract, an open-source Go package that wraps the Tesseract OCR library, and its recognition results were good.
I set up the environment step by step. The first step was to install Tesseract on macOS, since gosseract depends on it:
brew install tesseract
The first installation went smoothly.
As business requirements grew, we needed to train custom language data, which requires Tesseract's training tools. So I uninstalled Tesseract and tried to reinstall it with those tools:
$ brew install --with-training-tools tesseract
Usage: brew install [options] formula|cask [...]

Install a formula or cask. Additional options specific to a formula may be
appended to the command.
...
Error: invalid option: --with-training-tools
Homebrew no longer accepts this option, so I switched to building Tesseract from source. First, install the dependencies:
# Packages which are always needed.
brew install automake autoconf libtool
brew install pkgconfig
brew install icu4c
brew install leptonica
# Packages required for training tools.
brew install pango
# Optional packages for extra features.
brew install libarchive
# Optional package for builds using g++.
brew install gcc
Then download the Tesseract 5.1.0 source release, unpack it, and build it:
cd tesseract-5.1.0
./autogen.sh
mkdir build
cd build
# Optionally add CXX=g++-8 to the configure command if you really want to use a different compiler.
../configure PKG_CONFIG_PATH=/usr/local/opt/icu4c/lib/pkgconfig:/usr/local/opt/libarchive/lib/pkgconfig:/usr/local/opt/libffi/lib/pkgconfig
make -j
# Optionally install Tesseract.
sudo make install
# Optionally build and install training tools.
make training
sudo make training-install
After the installation, building the Go project failed with a linker error:
2022/03/31 15:32:10 ERROR ▶ 0004 Failed to build the application: # ocr
/usr/local/go/pkg/tool/darwin_amd64/link: running clang++ failed: exit status 1
Undefined symbols for architecture x86_64:
  "tesseract::TessBaseAPI::Init(char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector<STRING> const*, GenericVector<STRING> const*, bool)", referenced from:
      Init(void*, char*, char*) in 000023.o
      _Init in 000023.o
      _GetDataPath in 000023.o
  "tesseract::TessBaseAPI::Recognize(ETEXT_DESC*)", referenced from:
      _GetBoundingBoxesVerbose in 000023.o
      _GetBoundingBoxes in 000023.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
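At first I only read the error text and did not realize it was a version problem. After several rounds of uninstalling and reinstalling, I found that the Tesseract version was too new. The missing symbols use GenericVector<STRING>, a Tesseract 4.x type that is no longer part of the 5.x public API, so the code was compiled against a different major version than the library it was linked with. After reinstalling version 4.1.3, the service compiled normally.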
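A quick way to catch this kind of version mismatch is to compare the version of the tesseract command-line tool with the library version that pkg-config reports from tesseract.pc. When two installations overlap in /usr/local, these two can disagree. The Go sketch below assumes pkg-config is on PATH (it was installed as a build dependency earlier). Its expected major version of 4 is an assumption drawn from this article's experience with the gosseract build used here, not a documented gosseract requirement. Also, the pkg-config version is only a proxy for the library that the cgo linker actually picks up.

```go
package main

import (
	"fmt"
	"os/exec"
	"regexp"
	"strconv"
)

// semverRe matches the first "major.minor.patch" triple in a string,
// e.g. "4.1.3" in "tesseract 4.1.3" or in pkg-config output.
var semverRe = regexp.MustCompile(`(\d+)\.(\d+)\.(\d+)`)

// parseMajor returns the major version of the first x.y.z triple in out.
func parseMajor(out string) (int, error) {
	m := semverRe.FindStringSubmatch(out)
	if m == nil {
		return 0, fmt.Errorf("no version found in %q", out)
	}
	return strconv.Atoi(m[1])
}

// majorOf runs a command and parses the major version from its combined output.
func majorOf(name string, args ...string) (int, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	if err != nil {
		return 0, fmt.Errorf("%s: %w", name, err)
	}
	return parseMajor(string(out))
}

// wantMajor is an assumption based on this article: the gosseract build
// used here only linked against Tesseract 4.x. Adjust for other releases.
const wantMajor = 4

func main() {
	cli, err := majorOf("tesseract", "-v")
	if err != nil {
		fmt.Println("tesseract CLI:", err)
		return
	}
	lib, err := majorOf("pkg-config", "--modversion", "tesseract")
	if err != nil {
		fmt.Println("pkg-config:", err)
		return
	}
	fmt.Printf("CLI major %d, library (pkg-config) major %d\n", cli, lib)
	if cli != lib {
		fmt.Println("CLI and library disagree: two installations probably overlap in /usr/local")
	}
	if lib != wantMajor {
		fmt.Printf("expected Tesseract %d.x for this gosseract build\n", wantMajor)
	}
}
```

After the reinstall described below, both checks would be expected to report major version 4.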
To uninstall, you can either delete the installed files manually or run:
brew uninstall tesseract
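However, reinstalling Tesseract afterwards ran into a series of problems. The likely cause is that sudo make install from the 5.1.0 build had left root-owned files in /usr/local, which brew uninstall does not remove: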
$ brew install tesseract==4.1.3
Warning: No available formula with the name "tesseract==4.1.3". Did you mean tesseract?
==> Searching for similarly named formulae...
This similarly named formula was found:
tesseract
To install it, run:
  brew install tesseract
==> Searching for a previously deleted formula (in the last month)...
Error: No previously deleted formula found.
==> Searching taps on GitHub...
Error: No formulae found in taps.
$ brew install tesseract
==> Downloading https://ghcr.io/v2/homebrew/core/tesseract/manifests/4.1.3
Already downloaded: /Users/liumeng/Library/Caches/Homebrew/downloads/9597a8ae2cb676cd25c79cf252f4eb8759b9cf3d472c57f7c764e086c5f8f6e2--tesseract-4.1.3.bottle_manifest.json
==> Downloading https://ghcr.io/v2/homebrew/core/tesseract/blobs/sha256:1b67091dce98b42c6c561981a01738fe01c19ac69a1dc4de6d8e43fe885177f0
Already downloaded: /Users/liumeng/Library/Caches/Homebrew/downloads/cf8d3fbb1aea1cc629c6873a25b11d732c90ff23bfa4c44ba23d0ce5c24e907a--tesseract--4.1.3.big_sur.bottle.tar.gz
==> Pouring tesseract--4.1.3.big_sur.bottle.tar.gz
Error: The `brew link` step did not complete successfully
The formula built, but is not symlinked into /usr/local
Could not symlink include/tesseract/apitypes.h
/usr/local/include/tesseract is not writable.

You can try again using:
  brew link tesseract
==> Caveats
This formula contains only the "eng", "osd", and "snum" language data files.
If you need any other supported languages, run `brew install tesseract-lang`.
==> Summary
🍺  /usr/local/Cellar/tesseract/4.1.3: 65 files, 29.7MB
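Homebrew does not accept pip-style == version pins, so the first command fails. Plain brew install tesseract still installed 4.1.3, the version the formula provided at the time, but its link step failed. Rerunning the link shows the error: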
$ brew link tesseract
Linking /usr/local/Cellar/tesseract/4.1.3...
Error: Could not symlink include/tesseract/apitypes.h
/usr/local/include/tesseract is not writable.
At this point, delete the directory that is not writable:
$ sudo rm -rf /usr/local/include/tesseract
Then retry the link:
$ brew link tesseract
Linking /usr/local/Cellar/tesseract/4.1.3...
Error: Could not symlink share/tessdata/configs/alto
Target /usr/local/share/tessdata/configs/alto
already exists. You may want to remove it:
  rm '/usr/local/share/tessdata/configs/alto'

To force the link and overwrite all conflicting files:
  brew link --overwrite tesseract

To list all files that would be deleted:
  brew link --overwrite --dry-run tesseract
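Homebrew suggests three options: remove the single conflicting file, force the link with --overwrite, or preview what --overwrite would delete with --dry-run.
I tried them in turn: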
$ sudo rm -rf /usr/local/share/tessdata/configs/alto
$ brew link --overwrite --dry-run tesseract
Would remove:
/usr/local/share/tessdata/configs/ambigs.train
...
/usr/local/lib/libtesseract.dylib -> /usr/local/lib/libtesseract.5.dylib
/usr/local/lib/pkgconfig/tesseract.pc
$ tesseract -v
zsh: command not found: tesseract
$ brew install tesseract
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 1 tap (homebrew/cask).
==> Updated Casks
Updated 7 casks.

Warning: tesseract 4.1.3 is already installed, it's just not linked.
To link this version, run:
  brew link tesseract
$ brew link --overwrite tesseract
Linking /usr/local/Cellar/tesseract/4.1.3...
Error: Could not symlink share/tessdata/configs/alto
/usr/local/share/tessdata/configs is not writable.
Keep deleting the directories it reports until the link succeeds:
$ sudo rm -rf /usr/local/share/tessdata/configs
$ brew link --overwrite tesseract
Linking /usr/local/Cellar/tesseract/4.1.3...
Error: Could not symlink share/tessdata/tessconfigs/batch
/usr/local/share/tessdata/tessconfigs is not writable.
$ sudo rm -rf /usr/local/share/tessdata/tessconfigs
$ brew link --overwrite tesseract
Linking /usr/local/Cellar/tesseract/4.1.3... 12 symlinks created.
Finally, confirm that the 4.1.3 version is the one on PATH:
$ tesseract -v
tesseract 4.1.3
 leptonica-1.82.0
  libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.1 : libopenjp2 2.4.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE