Document Prediction
Pre-requisite
Processing document data depends on Tesseract, an optical character recognition (OCR) package.
On Ubuntu, you can install Tesseract by running:
sudo apt install tesseract-ocr
For macOS users with MacPorts, run:
sudo port install tesseract
or, with Homebrew, run:
brew install tesseract
For Windows users, an installer is available from Tesseract at UB-Mannheim. To run Tesseract from any location, you may need to add the directory containing the Tesseract binaries to the Path environment variable.
For additional support, please refer to the official Tesseract installation instructions.
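After installation, it can help to confirm that the tesseract binary is actually reachable from your Python environment, since a missing Path entry is a common cause of OCR failures. The following is a minimal sketch rather than part of AutoMM: the helper name tesseract_version is hypothetical, and it assumes the executable is called tesseract and accepts the --version flag.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(executable: str = "tesseract") -> Optional[str]:
    """Return the first line printed by `<executable> --version`.

    Returns None if the executable is not found on PATH or prints nothing.
    The name `tesseract_version` is a hypothetical helper for illustration.
    """
    # shutil.which searches the directories listed in the PATH variable.
    path = shutil.which(executable)
    if path is None:
        return None

    # Some Tesseract releases print the version to stderr, so check both streams.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract was not found on PATH; check the installation steps.")
    else:
        print(f"Found: {version}")
```

If the helper returns None on Windows even though the installer completed, the install directory is probably missing from the Path variable.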
Quick Start
AutoMM for Scanned Document Classification
How to use AutoMM to build a scanned document classifier.
Classifying PDF Documents with AutoMM
How to use AutoMM to build a PDF document classifier.