Document Prediction

Prerequisites

Processing document data depends on Tesseract, an optical character recognition (OCR) engine.

For Ubuntu users, install Tesseract by running:

sudo apt install tesseract-ocr

For macOS users with MacPorts, run:

sudo port install tesseract

or, with Homebrew, run:

brew install tesseract

For Windows users, an installer is available from Tesseract at UB-Mannheim. To run Tesseract from any location, you may need to add the directory containing the Tesseract binaries to the Path environment variable.

For additional support, refer to the official Tesseract installation instructions.
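After installing, it can help to confirm that the tesseract binary is reachable on your PATH; on Windows, a missing Path entry (see the note above) is the usual reason it is not. The sketch below uses only the Python standard library. The helper name check_tesseract is hypothetical and not part of AutoMM or Tesseract; it simply looks up the binary and runs `tesseract --version`.

```python
import shutil
import subprocess
from typing import Optional


def check_tesseract(binary: str = "tesseract") -> Optional[str]:
    """Return the first line of `<binary> --version`, or None if the binary is not found.

    Hypothetical helper for illustration only; not an AutoMM API.
    """
    # Search the directories listed in the PATH environment variable.
    path = shutil.which(binary)
    if path is None:
        return None
    # Run the version command; read both streams in case the text goes to stderr.
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = check_tesseract()
    if version is None:
        print("tesseract not found on PATH; revisit the installation steps.")
    else:
        print(f"Found: {version}")
```

If the script reports that tesseract is not found even though it is installed, the binary's directory is most likely missing from PATH in the environment where Python runs.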

Quick Start


AutoMM for Scanned Document Classification (document_classification.html)

How to use AutoMM to build a scanned document classifier.

Classifying PDF Documents with AutoMM (pdf_classification.html)

How to use AutoMM to build a PDF document classifier.