• Mark Meuldijk, Partner
• Francesco Guidi, Expert

Data quality has always been important but is increasingly critical given the continuous expansion and evolution of data and analytics. We share two approaches to improving data quality that will strengthen a data-driven organization's reporting and decision-making.

Every day seems to deliver new solutions and promises in data analytics, data science and artificial intelligence. Before investing in that shiny new proposition or exciting emerging technology, however, ask yourself one question: is my data quality fit for purpose? We all know the phrase 'garbage in, garbage out' and that robust data is critical to achieving the right outcomes, but how should you ensure it is of an appropriately high quality?

Setting reliable values

First, let us define what we mean by data quality. In simple terms, it is how reliable the values and content are in the data fields used by an information system. We know from experience that it is not always easy to define what a reliable value looks like, and that a wrong value can have a severe impact on other aspects of a business's activities or processes. For instance, unreliable values can produce incorrect information on product weights and storage needs, or on the profitability of an investment.

Defining quality rules

The building blocks of good data quality are the quality rules you define that are specific to your organization. Each rule specifies a precise condition for a given field, together with the conditions for generating a tabular outcome that serves as the basis for your reports and dashboards. Keep in mind that quality rules are classified into seven dimensions that are often used to tailor dedicated reporting: relevance, accuracy, credibility, timeliness, accessibility, interpretability and coherence.
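To make this concrete, here is a minimal sketch of how a quality rule and its tabular outcome might look in Python. The QualityRule class, the apply_rules function, the sample product records and the assignment of each rule to a dimension are all hypothetical, chosen only for illustration; your own rules will depend on your organization.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QualityRule:
    """A hypothetical quality rule: one precise condition on one field."""
    rule_id: str
    field: str
    dimension: str                       # one of the seven dimensions
    condition: Callable[[object], bool]  # returns True when the value passes


def apply_rules(records, rules):
    """Return the tabular outcome: one row per (record, rule) pair."""
    outcome = []
    for record in records:
        for rule in rules:
            value = record.get(rule.field)
            outcome.append({
                "record_id": record["id"],
                "rule_id": rule.rule_id,
                "field": rule.field,
                "dimension": rule.dimension,
                "passed": bool(rule.condition(value)),
            })
    return outcome


# Illustrative product records (assumed data, not from a real system).
products = [
    {"id": "P1", "weight_kg": 12.5, "storage_class": "dry"},
    {"id": "P2", "weight_kg": -3.0, "storage_class": "dry"},
    {"id": "P3", "weight_kg": None, "storage_class": ""},
]

# Illustrative rules; the dimension labels are an assumption.
rules = [
    QualityRule("R1", "weight_kg", "accuracy",
                lambda v: isinstance(v, (int, float)) and v > 0),
    QualityRule("R2", "storage_class", "coherence",
                lambda v: v in {"dry", "chilled", "frozen"}),
]

outcome = apply_rules(products, rules)
for row in outcome:
    print(row)
```

Each row of the outcome records which record was checked, against which rule, and whether it passed, so the same table can feed both reports and correction lists.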

Not a one-off activity

We would also note that getting your approach to data quality right is not a one-off activity; rather, it should be part of your daily routines. It should be applied by consistently repeating automated analyses at a suitable frequency, based on the target data quality and the subject of the rule.
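As a rough sketch of what repeating analyses at a suitable frequency could mean in practice, the example below picks the rules that are due on a given day. The rule names, intervals and dates are hypothetical; in a real setup a scheduler or orchestration tool would normally take this role.

```python
from datetime import date, timedelta

# Hypothetical schedule: rule id -> (days between runs, date of last run).
schedule = {
    "R1_weight_positive": (1, date(2024, 3, 10)),   # checked daily
    "R2_storage_class": (7, date(2024, 3, 8)),      # checked weekly
    "R3_supplier_address": (30, date(2024, 2, 1)),  # checked monthly
}


def rules_due(schedule, today):
    """Return the ids of rules whose interval has elapsed, sorted by id."""
    due = []
    for rule_id, (interval_days, last_run) in schedule.items():
        if today - last_run >= timedelta(days=interval_days):
            due.append(rule_id)
    return sorted(due)


due_today = rules_due(schedule, date(2024, 3, 11))
print(due_today)
```

A rule on a volatile or business-critical field would typically get a shorter interval than one on slowly changing reference data.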

Taking two approaches to improve data quality

We suggest there are two interconnected approaches you can adopt. There is no single right answer as to the balance between the two, as this will depend on your organization, the pain points that trigger your need to look at data quality, and the severity of the problems that poor data quality is causing.

Approach one: monitoring

This approach focuses on consulting and checking the charts or dashboards that are generated by applying the automated rules, providing the right dimensions and splits tailored to the report's audience. An operational audience will need granular and specific information, while a grouped, consolidated and less specific view tends to be more acceptable for senior management.
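The difference between the two audiences can be sketched as two groupings of the same rule outcomes. The departments, rule ids and pass/fail values below are assumed sample data, and pass_rate is a hypothetical helper rather than part of any specific reporting tool.

```python
from collections import defaultdict

# Hypothetical rule outcomes, one row per checked record.
outcome = [
    {"department": "Logistics", "rule_id": "R1", "passed": True},
    {"department": "Logistics", "rule_id": "R1", "passed": False},
    {"department": "Logistics", "rule_id": "R2", "passed": True},
    {"department": "Finance", "rule_id": "R1", "passed": True},
    {"department": "Finance", "rule_id": "R2", "passed": False},
    {"department": "Finance", "rule_id": "R2", "passed": False},
]


def pass_rate(rows, keys):
    """Share of passed checks, grouped by the given column names."""
    totals = defaultdict(lambda: [0, 0])  # group -> [passed, checked]
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group][0] += int(row["passed"])
        totals[group][1] += 1
    return {g: passed / checked for g, (passed, checked) in totals.items()}


# Granular view for an operational audience: per department and rule.
operational_view = pass_rate(outcome, ["department", "rule_id"])

# Consolidated view for senior management: per department only.
management_view = pass_rate(outcome, ["department"])

print(operational_view)
print(management_view)
```

Both views come from the same underlying table; only the grouping changes with the audience.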

Approach two: error resolution

This approach focuses on resolving issues detected by your audience after having defined a proper correction process. The correction process should 1) provide the right level of detail on each record that has failed the rule, 2) state what is wrong, and 3) where possible, suggest what should be corrected. At a more advanced stage, there may also be KPIs on the (average) resolution time by department, geographical area or type of object.
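A correction process along these lines could be sketched as a list of correction tickets plus a simple resolution-time KPI. The ticket fields, sample values and the average_resolution_days helper are hypothetical and only illustrate the three elements listed above.

```python
from datetime import date
from statistics import mean

# Hypothetical correction tickets, one per record that failed a rule.
tickets = [
    {"record_id": "P2", "field": "weight_kg", "value": -3.0,
     "problem": "weight must be greater than zero",
     "suggestion": "check the sign of the entered weight",
     "department": "Logistics",
     "opened": date(2024, 3, 1), "resolved": date(2024, 3, 4)},
    {"record_id": "P3", "field": "storage_class", "value": "",
     "problem": "storage class is empty",
     "suggestion": "choose one of: dry, chilled, frozen",
     "department": "Logistics",
     "opened": date(2024, 3, 1), "resolved": date(2024, 3, 6)},
    {"record_id": "I7", "field": "currency", "value": "EURO",
     "problem": "currency is not a three-letter code",
     "suggestion": "use EUR",
     "department": "Finance",
     "opened": date(2024, 3, 2), "resolved": date(2024, 3, 4)},
    {"record_id": "I9", "field": "currency", "value": None,
     "problem": "currency is missing",
     "suggestion": "look up the currency on the source invoice",
     "department": "Finance",
     "opened": date(2024, 3, 5), "resolved": None},  # still open
]


def average_resolution_days(tickets, key):
    """Average days from opening to resolution, grouped by `key`."""
    durations = {}
    for ticket in tickets:
        if ticket["resolved"] is None:
            continue  # open tickets have no resolution time yet
        days = (ticket["resolved"] - ticket["opened"]).days
        durations.setdefault(ticket[key], []).append(days)
    return {group: mean(days) for group, days in durations.items()}


kpi_by_department = average_resolution_days(tickets, "department")
print(kpi_by_department)
```

Passing a different key, such as a region or object type field if your tickets carry one, would give the same KPI along another split.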

Combining the two approaches is best in class

A focus on only the first approach will provide excellent reports, but will leave the same errors in the systems, causing the same problems in your daily work. A focus on only the second approach will not give any visibility of the effort spent and will give the false impression that everything is going well automatically.

Building a cycle with increased numbers of rules

We therefore recommend adopting both approaches at the same time, emphasizing one or the other depending on your specific needs. Progressing both will support continuous improvement by tracking the efforts of operational stakeholders and encouraging actions such as increasing the number of rules, extending the areas over which the rules run, or triggering corrective actions. This produces a cycle: as the number of rules increases, measured output quality drops temporarily while the new rules are being implemented. The corrections then kick in and lift data quality, allowing you to introduce further rules, and so on.

Viewing data quality as an ongoing effort

We recommend establishing a robust process of constantly checking and encouraging data quality, ideally with a C-level sponsor but certainly with dedicated resources who are responsible for setting up the process and its technical execution. In addition, selected stakeholders should be accountable for the quality, good or poor, of the content and results that flow into reports. These stakeholders should be determined through strong data governance managed by a designated data steward.

We operate in a world in which data and analytics are increasingly central and companies aim to become data driven. This is possible only if the quality of your data is based on a solid foundation.