Utilising IBM Datacap for Document Data Capture

Written By Luke Minors, Edited By Lewis Fogden
Mon 19 December 2016, in category Business intelligence

Document Capture, IBM


We have recently added IBM’s document capture system, IBM Datacap, to our client offerings, and this post briefly summarises its features and capabilities. Datacap automates and streamlines the extraction and organisation of data from a wide range of document types and formats, from physical paper forms and invoices to PDF files and Microsoft Office documents.

Adding this functionality to an existing data processing pipeline can greatly ease the task of amalgamating data from such sources. This can benefit clients by reducing both labour and paper costs and by improving process efficiency, which in turn speeds up decision making and customer service.

Datacap supports input from multiple channels, including scanners, mobile devices (via its iOS and Android applications), fax, and email. It uses natural language processing, text analytics, and machine learning to identify and register the contents of documents.


Barcode recognition, Optical Character Recognition (OCR), Intelligent Character Recognition (ICR), and Optical Mark Recognition (OMR) are all used to tackle the various difficulties that document capture presents. These include differing handwriting styles, signature and check box detection, and even page and section organisation.

The capture process is made up of separate stages: image enhancement, page identification, data location, validation, and verification. Validation and verification are handled by an isolated service that can connect to other data streams, so that the same rules are applied consistently across all of them. Datacap flags any uncertainties so that a user can verify them manually and ensure accuracy.
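To make the flagging step more concrete, the following is a minimal sketch of how uncertain values might be routed to manual verification. It is not Datacap's API: the `CapturedField` type, the `VALIDATORS` table, the `route_fields` function, and the 0.85 confidence threshold are all hypothetical names and values chosen for illustration. The idea is simply that a field is flagged if its recognition confidence is low or if it fails a validation rule.

```python
from dataclasses import dataclass


@dataclass
class CapturedField:
    """A single value read from a document (hypothetical structure)."""
    name: str
    value: str
    confidence: float  # recognition confidence in the range [0, 1]


def validate_amount(value: str) -> bool:
    """Validation rule: the value must parse as a non-negative number."""
    try:
        return float(value.replace(",", "")) >= 0
    except ValueError:
        return False


# Hypothetical rule table: field name -> validation function.
VALIDATORS = {"total": validate_amount}


def needs_manual_review(field: CapturedField, threshold: float = 0.85) -> bool:
    """Flag a field if confidence is low or a validation rule fails."""
    validator = VALIDATORS.get(field.name)
    fails_rule = validator is not None and not validator(field.value)
    return field.confidence < threshold or fails_rule


def route_fields(fields, threshold: float = 0.85):
    """Split fields into (accepted, flagged) lists."""
    accepted, flagged = [], []
    for field in fields:
        if needs_manual_review(field, threshold):
            flagged.append(field)
        else:
            accepted.append(field)
    return accepted, flagged


fields = [
    CapturedField("invoice_number", "INV-001", 0.98),
    CapturedField("total", "1,250.00", 0.95),
    CapturedField("supplier", "Acme", 0.40),   # low confidence
    CapturedField("total", "12O.5O", 0.97),    # letter O misread for zero
]
accepted, flagged = route_fields(fields)
```

Here the low-confidence supplier name and the malformed total would both be flagged for a person to check, while the other two fields pass straight through.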

As Datacap reads documents, it refines and saves their formats as Fingerprints so that it can easily repeat the identification and capture of similar or equivalent inputs. ‘Accounts Payable’ and ‘Medical Claims’ are examples of standard templates that can be used to automate data entry from invoices and business documents and from medical claim forms, respectively.
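The general idea of matching a new page against stored layouts can be sketched as follows. This is an illustrative assumption, not a description of how Datacap builds or compares its Fingerprints: here a fingerprint is just the set of coarse grid cells that contain text, and two fingerprints are compared using Jaccard similarity. The function names, the 8x8 grid, and the 0.7 threshold are all hypothetical.

```python
def layout_fingerprint(boxes, page_w, page_h, grid=8):
    """Reduce text bounding boxes (x, y, w, h) to a set of grid cells.

    Each box is assigned to the cell containing its centre point, so
    small shifts in position usually leave the fingerprint unchanged.
    """
    cells = set()
    for x, y, w, h in boxes:
        cx, cy = x + w / 2, y + h / 2
        col = min(int(cx / page_w * grid), grid - 1)
        row = min(int(cy / page_h * grid), grid - 1)
        cells.add((row, col))
    return frozenset(cells)


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def match_fingerprint(fingerprint, known, threshold=0.7):
    """Return the name of the best-matching stored layout, or None."""
    best_name, best_score = None, 0.0
    for name, stored in known.items():
        score = jaccard(fingerprint, stored)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# A stored layout and two incoming pages (coordinates in pixels).
known = {
    "invoice_a": layout_fingerprint(
        [(0, 0, 100, 20), (500, 0, 100, 20), (0, 700, 100, 20)], 800, 1000
    )
}
similar_page = layout_fingerprint(
    [(5, 3, 100, 20), (505, 2, 100, 20), (2, 705, 100, 20)], 800, 1000
)
different_page = layout_fingerprint([(400, 500, 100, 20)], 800, 1000)

similar_match = match_fingerprint(similar_page, known)
different_match = match_fingerprint(different_page, known)
```

A page that matches a stored layout can reuse the field locations already defined for it, whereas an unmatched page would need to be set up as a new layout.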

For more information, visit the official IBM Datacap website.