
Key challenges of Intelligent Document Processing

What is Intelligent Document Processing (IDP)?

At KPMG Belgium we understand ‘Intelligent Document Processing’ as an umbrella term for various activities related to the automated extraction, interpretation, classification and handling of digitized documents. The ‘intelligent’ part is that, instead of working only with the usually well-structured metadata about documents, IDP goes further and works extensively with the contents of these documents. The difficulty, and the need for ‘intelligence’, comes from the typically unstructured nature of these documents and the cognitive capabilities required to understand their contents. Advanced activities that extract semantic meaning from documents, rather than just the raw content, require advanced tooling. Conceptually, we consider IDP a subset of unstructured data processing, with audio, images and video being other categories, to name a few.

Even though it might sound like a challenging exercise, there are various reasons to start this journey, if you haven’t already. To start off, the rise of certain core technologies and their recent improvements (cloud, OCR, NLP) puts us at an unprecedented moment in time, where organizations can actually process their piles of semi-structured documents automatically. Most of these technologies have existed for a while, but they originated as highly specialized applications, with dedicated vendors focusing on specific workflows or document types, such as invoice processing. If your use case didn’t fall within those known boundaries, alternative options were limited or expensive. Today, a lot of these technologies are commoditized in the form of cloud services, or even open sourced and maintained by a large community with diverse needs. The specialized efforts still exist, and they are more accurate than ever, but the automation and analysis of the long tail of non-standard documents is being unlocked by these recent changes in technological capabilities.

Despite the digital times, companies still have tons of physical documents, often containing a wealth of information. Even though immense efforts are being put into the development of APIs and automated integrations, many processes still depend on (physical) document exchanges. Real-world examples include KYC verifications, invoice processing, contracts, order forms and insurance paperwork. In an ideal world, everything would be neatly integrated and there would be no issue processing the structured data. However, this form of integration typically requires a solid mutual understanding between the two communicating ends, or a central authority which imposes standards. In practice, there are often more than two parties involved, and mutual understanding becomes harder to achieve as the number of parties increases. Central authorities don’t magically fix the issue either. Furthermore, a lot of legacy systems are not prepared for these kinds of integrations and generate their share of documents. On top of that, regulatory requirements may dictate the use of physical documents.

So we can’t get around it. We are stuck receiving and sending these ancient documents for the foreseeable future. But it’s not the end of the world. On the contrary, we humans, with all our cognitive capabilities, usually have no issues with parsing those documents, so the world keeps on turning. Our weakness, though, is our limited cognitive bandwidth, which restricts how many documents we can parse per hour, all day long.

The challenges of Intelligent Document Processing

So we arrive at IDP, the “silver bullet” to solve all of our issues. But is it? As you might have guessed from the title of this article, there are some obstacles here as well. Based on our experiences, these include:

Not defining a clear strategy and actionable business goal

When starting off with IDP, it might be tempting to immediately start playing around with the technology and build a proof of concept or pilot. The risk is that after a year, you could still be tweaking your models to get to that >95% accuracy, or you end up with a lot of extracted data with no actionable purpose, which becomes out of date and useless sooner than you’d think. Based on our experience, we advise starting off with a solid business case that builds on the technology. It can still be a one-off analysis, a new step in an already automated process, or a new automated process altogether. This clear end goal will help in defending the investment, and having a definition of success will also ease the process of developing a roadmap and building a solution.

Human-in-the-loop as an afterthought

When automating processes with the help of IDP, one must keep in mind that these technologies are still evolving and that human cognitive capabilities remain largely unparalleled by any existing technology. Completely offloading document processing from the people who usually handle it to an automated system might not give the expected results. By contrast, bootstrapping an architecture that keeps the Human-in-the-Loop (HITL) at critical steps has a great impact on the overall quality of the process. It effectively designs your processing pipeline for adaptability, enables iterative improvements to the flow and introduces detailed checking and monitoring of the accuracy and quality of the system. Typical HITL interfaces are review/correction GUIs and dashboards. Low-code application platforms and workflow systems are nowadays considered good candidate technologies for this step.
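As a rough illustration of keeping a human at a critical step, the sketch below routes each extracted field either to automatic acceptance or to a review queue, based on a confidence score. It is a minimal sketch under stated assumptions, not a description of any particular product: the names `ExtractedField`, `RoutingResult`, `route_fields` and `automation_rate` are hypothetical, the extraction step is assumed to report a confidence between 0 and 1, and the threshold of 0.95 is an illustrative value that would need tuning per use case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExtractedField:
    """One value pulled from a document by a (hypothetical) extraction step."""
    name: str
    value: str
    confidence: float  # assumed to lie between 0.0 and 1.0


@dataclass
class RoutingResult:
    """Fields split into those accepted automatically and those sent to a reviewer."""
    auto_accepted: List[ExtractedField] = field(default_factory=list)
    needs_review: List[ExtractedField] = field(default_factory=list)


# Illustrative threshold only; a real value depends on the business case.
REVIEW_THRESHOLD = 0.95


def route_fields(fields: List[ExtractedField],
                 threshold: float = REVIEW_THRESHOLD) -> RoutingResult:
    """Send low-confidence fields to human review, accept the rest."""
    result = RoutingResult()
    for extracted in fields:
        if extracted.confidence >= threshold:
            result.auto_accepted.append(extracted)
        else:
            result.needs_review.append(extracted)
    return result


def automation_rate(result: RoutingResult) -> float:
    """Share of fields that needed no human review (0.0 if there were no fields)."""
    total = len(result.auto_accepted) + len(result.needs_review)
    return len(result.auto_accepted) / total if total else 0.0


if __name__ == "__main__":
    sample = [
        ExtractedField("invoice_number", "INV-001", 0.99),
        ExtractedField("total_amount", "1250.00", 0.80),
    ]
    routed = route_fields(sample)
    print("To review:", [f.name for f in routed.needs_review])
    print("Automation rate:", automation_rate(routed))
```

Tracking a figure such as the automation rate over time is one simple way to feed the monitoring dashboards mentioned above, and the corrections made by reviewers can later serve as input for improving the extraction step.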

Misjudging the technological possibilities

It is clear that with the wrong expectations of IDP technology, results can suffer. Underestimating the technology, and not being ambitious enough, will eventually hinder potential innovation and lead to unremarkable results. Equally, overestimating the technology bears the risk of setting unrealistic goals which cannot be achieved. Asking advice from experienced partners, and exploring the technology to learn its boundaries, can help in this case. As difficult as this balancing act might be (when are you ambitious enough?), this learning curve is something you can build into your IDP roadmap as well. It makes sense to start with small steps and grow into (and with) the technology. Here again, it’s important to pick solid business cases which support this road to Intelligent Document Processing.

What can we do?

Apart from these highlights, there are many more reasons to investigate opportunities for Intelligent Document Processing, just as there are more challenges to discuss. At the KPMG Belgium Lighthouse, we cover many different aspects of this journey: from exploring and implementing the solution with Intelligent Automation and Advanced Analytics, to achieving Business Intelligence and gaining insights from unstructured data. In addition, governance and the broader perspective on the IDP operating model are more organizational aspects where we can also help you maximize the value of the insights and information gained.

With all the perspectives mentioned above, we support our clients from ‘A to Z’ with extracting value from their unstructured data and documents. The Global KPMG platform KPMG Ignite, which specializes in unstructured data processing, and our alliances with software vendors like Appian and BluePrism, help us and our clients in accelerating their path to Intelligent Document Processing and achieving successful outcomes.

Are you eager to start exploring IDP, or are you ready to scale up your current ventures? Do not hesitate to contact us.

