Much less space is needed to store and especially archive digital data, and it is far easier and cheaper to create and retain additional copies in case disaster strikes. Digital workflows also mean the same document can be put into a processing queue that’s served by any number of suitably qualified or authorised people, even if they are spread across multiple locations. These benefits are non-trivial, but the real magic starts when you transform the information in the document into actual data.

For example, when an invoice arrives in your office it is just a piece of paper until a person looks at it and extracts the data – the supplier’s name and address; invoice number; order number; the description, price, and quantity of each line item, and so on. Similarly, a scanned version of that invoice is just an image file until OCR (optical character recognition) is applied to turn the image of the text into machine-readable letters and numbers, and further processing attaches meaning to those characters. Once that is done, processes that previously involved a lot of tedious human labour can be automated.

This is a here-and-now technology, not a vision of the future. For example, even some small-business accounting systems costing just a few tens of dollars a month are able to take an image of a bill and automatically create the corresponding accounting transaction. Once the paper documents have been digitised and loaded into a document management system, the possibilities, as they say, are endless.

At the simplest level, the ability to search documents by their content rather than via predefined indexes can be a big timesaver. That’s easy for in-house documents, but OCR makes it just as easy for documents that arrive on paper or as image files (e.g. PDFs that only contain the bitmapped image, not the corresponding text data). Importantly, because you’re dealing with the documents’ content, you’re not locked into a particular filing scheme – nobody has to decide upfront how to categorise each one of them.

It’s not difficult to imagine how useful this could be to a law firm, for example. All documents (including emails, PDFs and scanned items) from a particular time period and mentioning a particular person or company can be retrieved in seconds.
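As a rough illustration of that kind of query, here is a minimal Python sketch. It assumes each document’s full text is already available (natively for emails, via OCR for scans) along with the date it was received; the `Document` class, the `find_documents` function and the sample archive are hypothetical, and a real document management system would use a proper search index rather than scanning every document.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    title: str
    received: date
    text: str  # full text: native for emails, OCR output for scanned items


def find_documents(docs, name, start, end):
    """Return documents received between start and end (inclusive) that mention name."""
    needle = name.casefold()  # case-insensitive match, e.g. "ACME" also matches "Acme"
    return [
        d for d in docs
        if start <= d.received <= end and needle in d.text.casefold()
    ]


# Hypothetical sample archive mixing an email, a scanned letter and an older contract
archive = [
    Document("Email re settlement", date(2023, 3, 14), "Meeting with Acme Pty Ltd about the settlement."),
    Document("Scanned letter", date(2023, 5, 2), "Dear Sir, regarding ACME PTY LTD and the invoice."),
    Document("Old contract", date(2019, 7, 1), "Agreement between Acme Pty Ltd and the client."),
]

for d in find_documents(archive, "Acme Pty Ltd", date(2023, 1, 1), date(2023, 12, 31)):
    print(d.title)
```

Only the email and the scanned letter are returned; the 2019 contract mentions the company but falls outside the date range.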

Furthermore, storing all documents in one repository makes them equally accessible to everyone in your organisation – providing they have the appropriate permissions – regardless of their location. This is especially important for distributed businesses, or those whose employees routinely work off-site (e.g. at home or at clients’ premises). In addition, more than one person can use a digitised document at a time, so nobody gets held up simply because a colleague has a particular piece of paper on their desk.

The next, and perhaps the most powerful, step is to assign meaning to the text and data in any given document. This is relatively easy when dealing with your own forms, but more advanced software works in a more generalised way, often by applying machine learning to metadata extracted from the document. The idea, in broad terms, is to be able to determine that a particular document is, say, a quotation, and that a quotation comprises various elements, including the potential supplier, the items quoted, and their prices. This can be seen as turning unstructured data into structured data.
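For your own forms, where the layout is known in advance, extraction can be as simple as pattern matching on the OCR output. The minimal Python sketch below assumes a hypothetical in-house quotation form with “Supplier:” and “Total:” lines; the `Quotation` class, `extract_quotation` function, patterns and sample text are illustrative only. This template approach is not the machine-learning method that more advanced software uses for documents of unknown layout.

```python
import re
from dataclasses import dataclass


@dataclass
class Quotation:
    supplier: str
    total: float


# Patterns for a hypothetical in-house form layout (one field per line after OCR)
SUPPLIER_RE = re.compile(r"^Supplier:[ \t]*(.+)$", re.MULTILINE)
TOTAL_RE = re.compile(r"^Total:[ \t]*\$?([\d,]+\.\d{2})$", re.MULTILINE)


def extract_quotation(ocr_text: str) -> Quotation:
    """Turn the OCR text of a known form into a structured Quotation record."""
    supplier = SUPPLIER_RE.search(ocr_text)
    total = TOTAL_RE.search(ocr_text)
    if supplier is None or total is None:
        raise ValueError("document does not match the expected form layout")
    return Quotation(
        supplier=supplier.group(1).strip(),
        total=float(total.group(1).replace(",", "")),  # "1,250.00" -> 1250.0
    )


sample = (
    "Quotation No. Q-1042\n"
    "Supplier: Acme Pty Ltd\n"
    "Item: Office chairs x 10\n"
    "Total: $1,250.00\n"
)
print(extract_quotation(sample))  # Quotation(supplier='Acme Pty Ltd', total=1250.0)
```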

Structured data is comparatively easy for software to handle, so even if 1000 vendors lodged quotations, it would be a snap to identify the five with the lowest total cost and pass them on for human evaluation. That’s certainly much less laborious than physically sorting the quotations, or transcribing their details into a spreadsheet and then sorting on the ‘Total’ column.
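Once quotations are held as structured records, the shortlisting step takes only a few lines of code. A minimal Python sketch, assuming each quotation has already been reduced to a supplier name and a total; the `Quotation` class, `shortlist` function and the generated sample data are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Quotation:
    supplier: str
    total: float


def shortlist(quotations, n=5):
    """Return the n quotations with the lowest total cost, cheapest first."""
    return heapq.nsmallest(n, quotations, key=lambda q: q.total)


# Hypothetical sample: 1000 vendors, each quoting a distinct total between 500 and 1499
quotations = [
    Quotation(f"Vendor {i}", float((i * 7919) % 1000 + 500)) for i in range(1000)
]

for q in shortlist(quotations):
    print(q.supplier, q.total)
```

`heapq.nsmallest` picks the cheapest few without fully sorting the list, though for a thousand entries a plain `sorted(...)[:5]` would work just as well.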

Historically, one of the main hurdles to implementing document processing and management software has been the cost and complexity involved. With the boom in cloud computing and ‘as a service’ pricing, however, it is now more affordable and accessible for businesses to begin their journey to digitisation. The first step is to ask the experts, so how can we help?

