30 Voices finalist: Team eleven - 'Data Lake'

Nothing can stop a customer from making a bad investment decision, but AI-enabled robo-advisers will be able to call attention to the risks they run. Conversely, ‘black box’ decision-making systems in banks and insurers are accused of bias against certain groups and of neglecting individual circumstances.

In what other ways might technology help investors articulate and reach their financial outcomes in ways that also fit their risk profile? How far can we personalise interactions while at the same time eliminating bias based on the groups investors belong to or the behaviours they exhibit?

Team ‘Data Lake’ have developed a concept…


As financial services institutions rebuild their trust with consumers, ethical considerations of implementation of AI should be a priority. Team Data Lake have developed a tool to facilitate a clear and complete data set - key in machine learning.
Shamus Rae, Partner and Head of Digital Disruption, KPMG in the UK

The solution:

There are three key areas where bias can occur within machine learning: the input data used to train and develop the model; the algorithm itself; and finally the outcome. Our recommendation focuses on the initial step, the data. Ultimately, if you have rubbish going in, you will have rubbish coming out, no matter what the algorithm is.

Our solution is to create a data lake, contributed to by financial institutions (FIs), to provide a large and varied data set. By focusing on the core data used, we can begin to influence existing decision making, as well as setting the industry up for success with future developments in machine learning, so that nobody is excluded from or overcharged for financial products or services due to a bias, such as one based on ethnicity. This creates the opportunity to accelerate the reduction of bias by addressing the root cause of the issue, alongside longer-term, more incremental change.

Solution Overview:

Each FI would anonymise its data in accordance with regulator-approved guidelines, removing all individual identifiers to ensure privacy. The data would be pushed from the FI to the data lake and cleansed centrally, as this mitigates security risks in relation to open APIs and ensures GDPR compliance. A regular upload of this data would occur, either weekly or monthly, to ensure it is relevant and reflective of current trends.
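As a rough illustration of the first step, the sketch below shows how an FI might strip direct identifiers from application records before pushing them to the data lake. The field names and the identifier list are hypothetical; a real list would come from the regulator-approved guidelines, and removing direct identifiers is only one part of anonymisation.

```python
from typing import Any

# Hypothetical direct identifiers, for illustration only. A real list
# would be defined by the regulator-approved guidelines.
DIRECT_IDENTIFIERS = {"name", "email", "account_number", "address", "date_of_birth"}


def strip_identifiers(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record without direct identifier fields."""
    return {key: value for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS}


def prepare_upload(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Strip identifiers from every record in a weekly or monthly batch."""
    return [strip_identifiers(record) for record in records]
```

The original records are left unchanged, so the FI keeps its own copy while only the stripped batch leaves the institution.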

A cloud platform would be used to host the data lake, as this provides scalability, flexibility, mobility, storage capacity and ease of use, economically and at speed. The solution would start with a few standard products, for example debit and credit card application details and their outcomes (whether these were approved or rejected). This is to ensure stable and consistent progress and service, and to provide baselines to measure KPIs against. As the solution scales up and more products are included, the data lake would be hosted on a multi-cloud platform.

The regulator would have access to the data lake, enabling them to create a tangible framework for data use and AI best practice. With the data lake, they would be able to regulate both the training data and the outcome, permitting a more accurate validation of the algorithm used to determine the customer decision and ensuring that no bias affected the financial outcome, e.g. gender impacting car insurance premiums.
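One simple check a regulator could run on the outcome data is to compare approval rates across groups. The sketch below is illustrative only: it assumes each record keeps a hypothetical `group` attribute and an `outcome` field with the value `"approved"` or `"rejected"`, and it uses the ratio of the lowest to the highest approval rate as a basic indicator, not as the method proposed by the team.

```python
from collections import defaultdict


def approval_rates(records, group_key="group"):
    """Compute the share of approved applications for each group."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record["outcome"] == "approved":
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Ratio of the lowest to the highest approval rate (1.0 means equal)."""
    highest = max(rates.values())
    if highest == 0:
        # Every group has a rate of zero, so the rates are equal.
        return 1.0
    return min(rates.values()) / highest
```

A ratio well below 1.0 would not prove bias on its own, but it would flag a product or institution for closer review of its training data and algorithm.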

FIs would then have access not only to the data they contributed but also to all the data provided by other FIs. In addition, they would have access to tangible guidance from the regulator on how best to use data and machine learning to ensure a fair financial outcome for the customer.

Overall, this gives them a much more representative, timely, comprehensive and complete data set, together with regulatory guidance, on which to conduct analysis for both machine learning and existing business processes.