
Trusted analytics and the trust gap

Peter Van den Spiegel, Author

A 2016 KPMG survey reported that only 34% of organizations are very confident in their operational Data and Analytics (D&A), and a similar share trust their D&A to drive customer insights. The survey also found that only 10% of organizations are confident in their data quality, tools and methodologies. People cannot be confident in their data and analytics if they do not understand them, or if they do not trust the researchers performing the analytics or the techniques they use.

As introduced in the first blog post of this series, unexpected behaviors or results from AI initiatives can lead to mistrust in data and analytics. AI systems do not always behave the way they were programmed to: in 2017, for instance, Facebook accidentally ended up with two chatbots communicating in a language they had developed themselves. Even when business processes are run by AI systems, the organization must be able to react to and manage situations in which the AI system breaks down or behaves in unplanned ways. Furthermore, an AI system may have unanticipated consequences if it learns certain decision-making functions from data the AI designer did not consider. For instance, the Microsoft chatbot launched on Twitter in 2016 unexpectedly learned racist vocabulary from offensive tweets posted by other Twitter users. We need to be conscious of the data we provide to a model.

When looking at prescriptive analytics, and more specifically at optimization problems (e.g. optimally assigning patients to hospital rooms, packages to delivery vehicles or students to schools), a correct formulation of the objective is key. Take the example of assigning students to schools, a challenging optimization problem that has been the subject of more than 50 years of scientific research. No single ideal algorithm exists, so design decisions have to be made: which objective(s) do we take into account, and what weight do we give each of them? There is a wide range of potentially conflicting objectives. Do we want to maximize the “benefit” for society as a whole from a regulatory or socio-demographic point of view (e.g. benchmarks for the percentage of indicatorleerlingen[i], pupils who meet certain socio-economic disadvantage indicators, as is the case in certain parts of Flanders, Belgium), or the “benefit” for parents, for instance by minimizing their driving distances? Formulating the objectives differently, or even slightly adjusting the weight of one objective, can significantly change the outcome of the algorithm.
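To make the weighting point concrete, here is a minimal Python sketch (not the authors' method) on entirely hypothetical data: four students, two of them indicator pupils, two schools with two seats each, made-up driving distances and an assumed benchmark of 50% indicator pupils per school. Each assignment is scored as total driving distance plus a hypothetical `mix_weight` times the total deviation from the benchmark, and all feasible assignments are enumerated by brute force.

```python
from itertools import product

# Hypothetical toy data: name -> (is_indicator_pupil, {school: driving distance in km})
STUDENTS = {
    "A": (True, {"X": 1, "Y": 5}),
    "B": (False, {"X": 3, "Y": 1}),
    "C": (False, {"X": 2, "Y": 1}),
    "D": (True, {"X": 1, "Y": 4}),
}
SCHOOLS = {"X": 2, "Y": 2}  # school -> capacity (assumed)
TARGET_SHARE = 0.5          # assumed benchmark share of indicator pupils per school


def total_distance(assignment):
    """Parents' objective: sum of driving distances."""
    return sum(STUDENTS[s][1][school] for s, school in assignment.items())


def mix_penalty(assignment):
    """Societal objective: total deviation of each school from the benchmark share."""
    penalty = 0.0
    for school in SCHOOLS:
        pupils = [s for s, sch in assignment.items() if sch == school]
        if pupils:
            share = sum(STUDENTS[s][0] for s in pupils) / len(pupils)
            penalty += abs(share - TARGET_SHARE)
    return penalty


def best_assignment(mix_weight):
    """Brute-force the assignment minimizing distance + mix_weight * penalty."""
    names = sorted(STUDENTS)
    best, best_score = None, float("inf")
    for choice in product(SCHOOLS, repeat=len(names)):
        # Skip assignments that exceed a school's capacity.
        if any(choice.count(sch) > cap for sch, cap in SCHOOLS.items()):
            continue
        assignment = dict(zip(names, choice))
        score = total_distance(assignment) + mix_weight * mix_penalty(assignment)
        if score < best_score:
            best, best_score = assignment, score
    return best, best_score


if __name__ == "__main__":
    for weight in (3, 5):
        print(weight, best_assignment(weight))
```

With these numbers, raising `mix_weight` from 3 to 5 flips the optimum from the distance-minimizing assignment (both indicator pupils in school X, score 7.0) to a balanced one (one indicator pupil per school, score 8.0). Brute force only works for toy instances; real school-assignment systems rely on dedicated techniques such as deferred acceptance or integer programming.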

In the field of predictive analytics, decisions about how to evaluate the performance of your (classification) model are crucial. Depending on the context, you trade off false negatives against false positives, minimizing one at the expense of the other. In the medical field in particular, we generally minimize the false negative rate at the expense of the false positive rate: it is considered safer not to miss a diseased patient, even if this means that some healthy patients are diagnosed with the disease. As a consequence, when a patient is diagnosed as positive, there is still a probability that the diagnosis is incorrect (strictly speaking, this is the false discovery rate, i.e. one minus the positive predictive value, rather than the false positive rate, which is the share of healthy patients who are wrongly flagged). Conversely, alcohol tests are generally designed to minimize false positives, and people who believe a test has wrongly indicated that they are drunk can still ask for a second test. The downside of this choice is that some truly drunk drivers will go undetected, so the number of drunk people on the road is not minimized.
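The sketch below, on ten hypothetical model scores and labels, shows how moving the decision threshold trades false negatives for false positives, and why the probability that a positive prediction is wrong (the false discovery rate) differs from the false positive rate. The data, thresholds and function names are illustrative assumptions, not taken from any real model.

```python
from dataclasses import dataclass

# Hypothetical model outputs: predicted probability of "positive" and the true label (1 = positive).
SCORES = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1, 0.4, 0.55, 0.05]
LABELS = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]


def ratio(numerator, denominator):
    """Safe division: return 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


@dataclass
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def false_negative_rate(self):
        # Share of truly positive cases that the model misses.
        return ratio(self.fn, self.fn + self.tp)

    @property
    def false_positive_rate(self):
        # Share of truly negative cases wrongly flagged as positive.
        return ratio(self.fp, self.fp + self.tn)

    @property
    def false_discovery_rate(self):
        # Share of positive predictions that turn out to be wrong.
        return ratio(self.fp, self.fp + self.tp)


def confusion_at(scores, labels, threshold):
    """Count outcomes when every score >= threshold is predicted positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, tn, fn)


if __name__ == "__main__":
    for threshold in (0.5, 0.3):
        c = confusion_at(SCORES, LABELS, threshold)
        print(threshold, c, round(c.false_negative_rate, 2),
              round(c.false_positive_rate, 2), round(c.false_discovery_rate, 2))
```

At a threshold of 0.5 this toy model misses one of the four positive cases (false negative rate 0.25) and flags one of the six negatives (false positive rate about 0.17); lowering the threshold to 0.3 misses none but flags three negatives (false positive rate 0.5). A medical screening setting would favor the lower threshold.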

In the coming years, the key differentiator between companies will no longer be model performance, but the trust in analytics they have established among their employees, customers and other stakeholders. Organizations will not be able to leverage automated decisions if their employees do not trust the tools that support those decisions. Customers will not agree to provide their data if they are unsure whether the algorithms operate in their best interest, or if they do not trust the purpose of the data collection.

Authors: Peter Van den Spiegel and Annelies De Corte