For those who understand its real-world applications and its potential, artificial intelligence is one of the most valuable tools we have today. From disease detection and drug discovery to climate change models, AI provides consistent insights and solutions that are helping us meet the most important challenges of our time.
In financial services, one of the main problems we face when it comes to financial inclusion is inequality. Although these inequalities are driven by many factors, the common denominator in each case is likely to be data (or the lack of it). Data is the lifeblood of most organizations, but especially of those seeking to implement advanced automation through AI and machine learning. It therefore falls to financial services organizations and the data science community to understand how models can be used to create a more inclusive financial services landscape.
Lending is an essential financial service today. It generates revenue for banks and loan providers, but it also provides core services for both individuals and businesses. Loans can provide a lifeline in difficult times or be a necessary incentive for new start-ups. But in each case, the loan risk must be assessed.
Today, most loan default risk is calculated by automated tools. Increasingly, this automation is provided by algorithms that greatly speed up the decision-making process. The data that informs these models is extensive, but as with any decision-making algorithm, there is a tendency to deliver accurate results for the majority group, which leaves certain individuals and minority groups disadvantaged depending on the model used.
This business model is, of course, unsustainable, which is why loan providers should consider the more subtle factors behind making the “right decision”. Demand for loans is rising, especially as point-of-sale loans such as buy-now-pay-later offer new and flexible ways to obtain financing, and the industry is now highly competitive, with traditional lenders, challengers and fintechs all vying for market share. With regulatory and social pressure mounting around fairness and equal outcomes, organizations that prioritize and codify these principles in their business and data science models will become increasingly attractive to consumers.
Building for fairness
When a loan risk model rejects applications, many unsuccessful applicants will probably understand the rationale behind the decision. They may have applied knowing that they might not meet the acceptance criteria, or they may have miscalculated their eligibility. But what happens when an individual or a member of a minority group is rejected because they fall outside the majority group on which the model was trained?
Consumers do not need to be data scientists to recognize when an injustice has occurred, algorithmic or otherwise. If a small business owner has the means to repay a loan but is denied it for no apparent reason, they will be rightly aggrieved at their treatment and may seek out a competitor to provide the services they need. Furthermore, if customers from similar backgrounds are also being unfairly rejected, there is probably something wrong with the model. The most common explanation is that bias has somehow gotten into the model.
Recent history has shown insurance companies using machine learning to set premiums that discriminate against the elderly, online pricing that discriminates, and product personalization that steers minorities into higher rates. The cost of these errors has been serious reputational damage and a severe loss of customer trust.
This is where the data science and financial services communities now need to refocus their priorities, putting equal outcomes for all above high-performing models that work only for the majority. We should prioritize people in addition to model performance.
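One simple way to check whether customers from one background are being rejected more often than others is to compare approval rates across groups. The following is a minimal sketch under stated assumptions: the group labels "A" and "B", the decision data, and the function names are all made up for illustration, and a low ratio is a signal to investigate rather than proof of bias.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Return the share of approved applications for each group.

    `decisions` is an iterable of (group, approved) pairs, where
    `approved` is a bool. Both names are illustrative assumptions.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        if ok:
            approved[group] += 1
    return {g: approved[g] / totals[g] for g in totals}


def approval_ratio(rates):
    """Ratio of the lowest to the highest group approval rate (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Made-up decisions for two hypothetical groups.
decisions = (
    [("A", True)] * 80 + [("A", False)] * 20
    + [("B", True)] * 10 + [("B", False)] * 10
)

rates = approval_rates(decisions)
print(rates)                            # {'A': 0.8, 'B': 0.5}
print(round(approval_ratio(rates), 3))  # 0.625
```

Here group B is approved at well under the rate of group A, which would be a prompt to look more closely at the data and the model, not a verdict in itself.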
Eliminating bias in models
Although regulations rightly prevent sensitive information from being used in decision-making algorithms, unfairness can still creep in through biased data. To illustrate how this is possible, here are five examples of how data bias can occur:
- Missing data – Data sets are used that are missing particular fields for specific groups in the population.
- Sample bias – The sample data sets selected to train models do not accurately represent the population that users want to model, meaning that the models will be largely blind to certain minority groups and individuals.
- Exclusion bias – Data is deleted or left out because it is considered unimportant. That is why strong data validation and diverse data science teams are essential.
- Measurement bias – The data collected for training does not accurately represent the target population, or faulty measurements distort the data.
- Label bias – A common problem at the data labeling stage of a project, label bias occurs when similar types of data are labeled inconsistently. Again, this is largely a validation issue.
While nothing on this list can be described as malicious bias, it is easy to see how bias can find its way into models if a robust structure that builds in fairness is not in place from the very beginning of the data science project.
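Two of the items above, missing data and sample bias, lend themselves to quick checks before any model is trained. The sketch below assumes records are plain dictionaries with hypothetical "group" and "income" keys and uses made-up reference population shares. It is an illustration of the idea, not a complete data audit.

```python
def missing_rate_by_group(records, group_key, field):
    """Share of records in each group where `field` is absent or None."""
    totals, missing = {}, {}
    for rec in records:
        group = rec[group_key]
        totals[group] = totals.get(group, 0) + 1
        if rec.get(field) is None:
            missing[group] = missing.get(group, 0) + 1
    return {g: missing.get(g, 0) / totals[g] for g in totals}


def representation_gap(records, group_key, reference_shares):
    """Sample share minus reference-population share for each group.

    A negative value means the group is under-represented in the sample.
    Assumes `records` is non-empty.
    """
    counts = {}
    for rec in records:
        counts[rec[group_key]] = counts.get(rec[group_key], 0) + 1
    n = len(records)
    return {g: counts.get(g, 0) / n - share for g, share in reference_shares.items()}


# Made-up training sample: group B is small and half of its income values are missing.
records = (
    [{"group": "A", "income": 50_000}] * 8
    + [{"group": "B", "income": None}]
    + [{"group": "B", "income": 40_000}]
)

print(missing_rate_by_group(records, "group", "income"))  # {'A': 0.0, 'B': 0.5}

# Hypothetical reference shares for the population the model is meant to serve.
print(representation_gap(records, "group", {"A": 0.6, "B": 0.4}))
```

In this toy sample, group B has both a higher missing rate and a smaller share than in the assumed reference population, which are the two patterns the first two bullets describe.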
Data scientists and machine learning engineers are used to very specific pipelines that have traditionally favored high performance. Data is at the heart of modeling, so we start every data science project by exploring our data sets and identifying relationships. We go through exploratory data analysis so that we can understand our data. Then we move on to the pre-processing stage, where we scrutinize and clean up our data before beginning the intensive process of feature generation, which helps us create more useful descriptions of the data. We then experiment with different models, tune parameters and hyperparameters, validate our models and repeat this cycle until we meet our desired performance metrics. Once this is done, we can productionize and deploy our solutions, which we then maintain in the production environment.
It is a lot of work, but there is a significant problem that this traditional process does not address. At no stage in this cycle of activity is model fairness assessed, nor is data bias seriously explored. We need to work with domain experts, including legal and governance teams, to understand what counts as fair for the problem at hand, and to try to reduce bias at the root of our modeling, which is the data.
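One way to add such an assessment to the validation step is to compute an error metric per group alongside the usual overall performance metrics. The sketch below compares, for each group, how often applicants who did repay would have been rejected by the model. The data, the group labels, and the 0.1 threshold are illustrative assumptions. What counts as an acceptable gap is exactly the kind of question to settle with domain, legal and governance experts.

```python
def false_rejection_rates(rows):
    """For each group, the share of applicants who repaid but were rejected.

    `rows` is an iterable of (group, repaid, predicted_approve) triples
    taken from a labeled validation set (names are assumptions).
    """
    repaid_total, wrongly_rejected = {}, {}
    for group, repaid, predicted_approve in rows:
        if not repaid:
            continue  # only applicants who actually repaid are counted
        repaid_total[group] = repaid_total.get(group, 0) + 1
        if not predicted_approve:
            wrongly_rejected[group] = wrongly_rejected.get(group, 0) + 1
    return {g: wrongly_rejected.get(g, 0) / repaid_total[g] for g in repaid_total}


def passes_fairness_gate(rates, max_gap=0.1):
    """True when the spread between the best and worst group is within `max_gap`.

    The default of 0.1 is an arbitrary placeholder, not a recommended value.
    """
    return max(rates.values()) - min(rates.values()) <= max_gap


# Made-up validation results for two hypothetical groups.
rows = (
    [("A", True, True)] * 45 + [("A", True, False)] * 5 + [("A", False, False)] * 10
    + [("B", True, True)] * 6 + [("B", True, False)] * 4 + [("B", False, False)] * 2
)

rates = false_rejection_rates(rows)
print(rates)                        # {'A': 0.1, 'B': 0.4}
print(passes_fairness_gate(rates))  # False
```

A model could score well overall on this data while still wrongly rejecting creditworthy applicants in group B four times as often as in group A, which is why a per-group check belongs next to the performance metrics in the tune-and-validate loop.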
Understanding how bias can find its way into models is a good start when it comes to creating a more inclusive financial services environment. By examining ourselves against the above issues and re-evaluating how we approach data science projects, we can try to create models that work for everyone.
Adam Lieberman heads Artificial Intelligence and Machine Learning at Finastra