iDocuments Intelligent Capture Best Practice
First published on: 20 March 2024
Introduction
Documents are processed more efficiently and effectively when they meet certain criteria.
The following are best practices for documents sent through the Intelligent Capture process (also known as Optical Character Recognition, or OCR):
- PDF type - These should ideally be vector PDFs (PDFs in which text can be highlighted when viewed). Because they're created electronically rather than scanned, they're easier for the system to read.
- Structured documents - These are easier to capture successfully because their data is laid out clearly in columns, with spacing between values.
- Document only - These PDFs should contain only the information to be captured (i.e., the invoice or order without extraneous documentation such as packing lists and timesheets).
- Important pages only - Use as few pages as possible for processing.
  - You can add attachments and associated documents later, separately, via a non-OCR mailbox or the website.
  - One invoice a week with fewer lines is better than one invoice a month with many lines.
- Steady stream of documents - Suppliers and customers should send documents as they're issued.
  - To avoid processing peaks, users shouldn't collect documents and forward them to the mailbox all at once.
- Documents for processing only - Any PDF deleted after OCR should be followed up with the sender, or an exclusion should be added, so that it doesn't keep happening.
  - E.g., if a statement is sent to the invoice mailbox:
    - Inform the sender not to send statements to the OCR mailbox.
    - Set the mailbox to mark them as read automatically so they're not processed.
    - Add an exclusion in the iDocuments mailbox settings (e.g., ignore emails whose Subject contains 'Statement').
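To illustrate how a subject-based exclusion like the one above might filter incoming mail, here's a minimal sketch in Python. The rule list and the `is_excluded` function are hypothetical, and the case-insensitive "contains" match is an assumption; this isn't how iDocuments implements its mailbox settings, only the logic such a rule expresses.

```python
# Hypothetical exclusion rules: emails whose subject contains any of these
# phrases are ignored instead of being sent for OCR.
EXCLUDED_SUBJECT_PHRASES = ("Statement",)


def is_excluded(subject: str, phrases: tuple[str, ...] = EXCLUDED_SUBJECT_PHRASES) -> bool:
    """Return True if the subject contains any excluded phrase.

    Assumption: matching is case-insensitive.
    """
    lowered = subject.lower()
    return any(phrase.lower() in lowered for phrase in phrases)


if __name__ == "__main__":
    for subject in ["Monthly Statement - March", "Invoice INV-1042"]:
        action = "ignore" if is_excluded(subject) else "process"
        print(f"{subject!r}: {action}")
```

Running it prints `ignore` for the statement and `process` for the invoice.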
Document Filtering
PDFs from any company are expected to be processed in under one hour, although this depends on the OCR not being overloaded at peak times.
The public cloud environment applies document filtering to the OCR so that certain documents can be skipped. Without it, difficult files could restrict processing for other clients, because the system would wait for those files to be processed.
Filter Settings
Mailbox extraction - Email extraction is limited to 25 emails per cycle to reduce its workload.
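The effect of the 25-emails-per-cycle limit can be sketched as batching a backlog. This minimal Python example assumes emails are picked up in arrival order; the `cycles` helper and the email names are hypothetical, and only the limit of 25 comes from this article.

```python
from collections.abc import Iterator

MAX_EMAILS_PER_CYCLE = 25  # limit stated in the filter settings


def cycles(queued_emails: list[str], limit: int = MAX_EMAILS_PER_CYCLE) -> Iterator[list[str]]:
    """Yield successive batches of at most `limit` emails, one batch per extraction cycle."""
    for start in range(0, len(queued_emails), limit):
        yield queued_emails[start:start + limit]


if __name__ == "__main__":
    backlog = [f"email-{n}" for n in range(1, 61)]  # 60 emails forwarded at once
    for number, batch in enumerate(cycles(backlog), start=1):
        print(f"cycle {number}: {len(batch)} emails")
```

With 60 emails forwarded at once, the sketch yields batches of 25, 25 and 10, so the last emails wait for a third cycle. This is one reason a steady stream of documents works better than sending them all together.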
Large file sizes - Purchase invoices and sales orders over 3 MB will skip OCR and go straight to Rapid Entry/FastTrack.
These are typically non-vector PDFs or PDFs containing images that aren't part of the document to be captured.
Large number of pages - Purchase invoices and sales orders with 40 or more pages will skip OCR and go straight to Rapid Entry/FastTrack.
These typically include extraneous pages that don't need to be captured.
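The two skip rules above can be expressed as a single check. This is a minimal sketch, not the product's implementation: the `skips_ocr` function and the `"PI"`/`"SO"` type codes are hypothetical, and it assumes 1 MB means 1,048,576 bytes and that "over 3 MB" means strictly greater than 3 MB.

```python
MB = 1024 * 1024              # assumption: 1 MB = 1,048,576 bytes
MAX_FILE_SIZE_BYTES = 3 * MB  # files over 3 MB skip OCR
MAX_PAGES = 40                # 40 or more pages skip OCR
FILTERED_TYPES = {"PI", "SO"}  # hypothetical codes: purchase invoices, sales orders


def skips_ocr(doc_type: str, size_bytes: int, page_count: int) -> bool:
    """Return True if the size or page-count rule sends the document straight to manual entry."""
    if doc_type not in FILTERED_TYPES:
        return False  # these two rules only apply to purchase invoices and sales orders
    return size_bytes > MAX_FILE_SIZE_BYTES or page_count >= MAX_PAGES


if __name__ == "__main__":
    print(skips_ocr("PI", 5 * MB, 2))    # True: over 3 MB
    print(skips_ocr("SO", 400_000, 45))  # True: 40 or more pages
    print(skips_ocr("PI", 400_000, 3))   # False: goes to OCR
```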
Medium number of pages - For purchase invoices, sales orders and goods received notes (PI/SO/GRN) with 6-40 pages, only the first 4 pages and the last page are processed (a maximum of 5 pages).
These documents typically contain extraneous pages, with most of the information on the first few pages and the totals on the last.
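The page selection for medium documents can be sketched as follows. The `pages_to_process` function is hypothetical, pages are numbered from 1, and reading every page of a document under 6 pages is an assumption (the article only describes the 6-40 range). Note that the article's ranges overlap at exactly 40 pages for purchase invoices and sales orders, where the 40-or-more skip rule may apply instead; this sketch follows the medium rule's wording.

```python
FIRST_PAGES = 4        # number of leading pages read
MEDIUM_MIN_PAGES = 6   # the medium rule starts at 6 pages
MEDIUM_MAX_PAGES = 40  # and, as worded in the article, ends at 40


def pages_to_process(page_count: int) -> list[int]:
    """Return the 1-based page numbers OCR would read under the medium-page rule."""
    if not 1 <= page_count <= MEDIUM_MAX_PAGES:
        raise ValueError("this sketch only covers documents of 1-40 pages")
    if page_count >= MEDIUM_MIN_PAGES:
        # First 4 pages plus the last page: 5 pages at most.
        return list(range(1, FIRST_PAGES + 1)) + [page_count]
    # Assumption: documents under 6 pages are read in full.
    return list(range(1, page_count + 1))


if __name__ == "__main__":
    print(pages_to_process(3))   # [1, 2, 3]
    print(pages_to_process(12))  # [1, 2, 3, 4, 12]
```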
Memory usage - When a document uses more memory than the allotted limit, the relevant OCR service is restarted and the document goes to the back of the queue.
These are typically documents that would otherwise block OCR processing for others, so they're moved aside.
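A minimal sketch of why requeuing helps: documents over the memory limit move to the back of the queue, so the documents behind them still get processed in the same pass. The `run_once` function, the memory figures and the 1,024 MB limit are hypothetical, and the sketch only requeues; it doesn't model the OCR service restart.

```python
from collections import deque


def run_once(queue: deque[tuple[str, int]], memory_limit_mb: int) -> list[str]:
    """Make one pass over the queue and return the names of documents processed.

    Each entry is (document name, hypothetical memory needed in MB).
    Documents over the limit are moved to the back of the queue instead of
    blocking the documents behind them.
    """
    processed = []
    for _ in range(len(queue)):
        name, memory_mb = queue.popleft()
        if memory_mb > memory_limit_mb:
            # The real service would restart here; this sketch only requeues.
            queue.append((name, memory_mb))
        else:
            processed.append(name)
    return processed


if __name__ == "__main__":
    queue = deque([("invoice-a.pdf", 200), ("scan-b.pdf", 2500), ("invoice-c.pdf", 150)])
    print(run_once(queue, memory_limit_mb=1024))  # ['invoice-a.pdf', 'invoice-c.pdf']
    print(list(queue))                            # [('scan-b.pdf', 2500)]
```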
Frequently Asked Questions
Do my documents need to be reprocessed if the filter process has skipped the OCR?
No - Any documents that are skipped and go directly to Rapid Invoice/FastTrack can be processed as normal from that point on, with any details entered manually.
Can I resend my document to the OCR mailbox or manually upload it?
Yes - But if you want the document to go through OCR, you'll need to alter it to conform to the requirements above; otherwise, the filter process will skip it again. For example, you could reduce the file size by editing the PDF to contain only the invoice page(s) rather than scanned copies with extraneous attachments.
Are the filter settings applied per company?
No - These are best-practice filter settings based on average processing. They're applied across all public cloud clients so that fair processing limits apply to all documents.
Last modified: 06/03/2025 12:12 pm