Document Prediction
Prerequisites
Processing document data depends on Tesseract, an optical character recognition (OCR) package.
For Ubuntu users, install Tesseract by running:
sudo apt install tesseract-ocr
For macOS users, install with MacPorts:
sudo port install tesseract
or with Homebrew:
brew install tesseract
For Windows users, an installer is available from Tesseract at UB-Mannheim. To run Tesseract from any location, you may need to add the directory containing the Tesseract binaries to the PATH environment variable.
For additional support, please refer to the official Tesseract instructions.
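After installing, it can help to confirm that the Tesseract binary is actually reachable from Python before running any document task. The following is a minimal sketch using only the Python standard library; the helper names `find_tesseract` and `tesseract_version` are hypothetical and not part of AutoMM or Tesseract, and the check assumes the binary is named `tesseract` on your system.

```python
import shutil
import subprocess


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract binary, or None if it is not on PATH."""
    return shutil.which(executable)


def tesseract_version(executable="tesseract"):
    """Return the first line printed by `tesseract --version`, or None if unavailable."""
    path = find_tesseract(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some Tesseract builds print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract was not found on PATH; install it or update PATH.")
    else:
        print(f"Found {version}")
```

If the script reports that the binary was not found even though Tesseract is installed, the most likely cause is that its install directory is missing from PATH, which is the Windows situation described above.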
Quick Start
AutoMM for Scanned Document Classification (document_classification.html)
How to use AutoMM to build a scanned document classifier.
Classifying PDF Documents with AutoMM (pdf_classification.html)
How to use AutoMM to build a PDF document classifier.