The House Financial Services Committee created its Task Force on Artificial Intelligence to examine how AI is used in the financial services industry and to address issues surrounding algorithms, digital identities, and fraud prevention. On February 12, the task force held its latest hearing, “Equitable Algorithms: Examining Ways to Reduce AI Bias in Financial Services.”
Congressman Bill Foster (D-IL), chairman of the task force, used his opening remarks to set out the hearing’s scope: what does it mean to design ethical algorithms that are transparent and fair, and how do we program fairness and have decisions explained to us?
In asking these questions, Chairman Foster also outlined some practical difficulties in regulating algorithmic fairness. To begin, many competing definitions of fairness exist, and policymakers need to specify which type of fairness they want. Next, applying analog-era laws such as the Equal Credit Opportunity Act (ECOA) to machine learning is easier said than done. Further, AI models present new issues for resource-strapped regulators: because these models continue to train on new data, the models themselves adjust and change over time, complicating oversight.
Chairman Foster noted two solutions he had frequently heard: having third parties audit algorithms or their outputs, and requiring companies to submit self-testing and benchmarking analyses to regulators. Both suggestions acknowledge that models are iterative and allow for change.
Five panelists testified at this hearing. Below is a summary of each panelist’s testimony, followed by an overview of some of the post-testimony questions that committee members raised:
Dr. Philip Thomas, co-director of the Autonomous Learning Lab, College of Information and Computer Sciences, University of Massachusetts Amherst, spoke primarily on two topics: (1) a new type of machine learning algorithm, called Seldonian algorithms, that makes it easier for people who use AI to ensure that the systems they create are fair; and (2) the need for a precise definition of fairness. Regarding the latter, Dr. Thomas provided two examples of potential fairness outcomes – that the average GPA predictions be the same for each gender, and that the average error of predictions be the same for each gender – and noted that it is not possible to satisfy both definitions simultaneously. He stated that any system that produces the same average prediction for each gender necessarily over-predicts more for one gender. Dr. Thomas noted that fairness would need to be defined, with that decision likely resting with regulators, but that machine learning researchers could help determine which definitions can be enforced simultaneously.
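Dr. Thomas's incompatibility point can be illustrated with a small numeric sketch. The numbers below are hypothetical and not from the hearing; they simply show that when two groups' true averages differ, equalizing the average prediction across groups forces the average error to differ.

```python
# Hypothetical illustration of Dr. Thomas's point: if two groups have
# different true average GPAs, a predictor that equalizes the average
# prediction across groups cannot also equalize the average error.

group_a_true = [2.8, 3.0, 3.2]  # true GPAs, group A (average 3.0)
group_b_true = [3.2, 3.4, 3.6]  # true GPAs, group B (average 3.4)

prediction = 3.2  # same average prediction for both groups

avg_error_a = sum(prediction - g for g in group_a_true) / len(group_a_true)
avg_error_b = sum(prediction - g for g in group_b_true) / len(group_b_true)

print(round(avg_error_a, 2))  # 0.2  -> systematically over-predicts group A
print(round(avg_error_b, 2))  # -0.2 -> systematically under-predicts group B
```

Satisfying both definitions at once would require the groups' true averages to coincide, which the data here (and in most real settings) do not guarantee.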
Dr. Makada Henry-Nickie, a fellow at the Brookings Institution, addressed market interest in AI. She pointed out that market interest in AI is increasing – 44 percent of Generation Z and 31 percent of millennials have interacted with a chatbot – but that instances of algorithmic discrimination have occurred not only in financial services but also in other domains, such as hiring and image classification. That said, Dr. Henry-Nickie noted that AI has the potential to improve the financial lives of consumers. She cited two examples: the use of micro-savings apps, and a fintech company’s lending algorithm and use of alternative data, which the Consumer Financial Protection Bureau found had increased loan approval rates by nearly 30 percent for some population segments while lowering the price of credit. Regarding bias, Dr. Henry-Nickie said the question of bias in AI could be difficult to untangle. She noted that machine learning research had established a clear link between biased outcomes and flawed training data, and that machine learning bias could shift in response to changes in the underlying data or design processes, thus requiring a flexible system of safeguards to ensure that AI delivers on its potential. Because a solution that mitigates the harms of biased algorithms continues to elude researchers, she concluded, Congress should focus instead on strengthening federal consumer oversight. Dr. Henry-Nickie recommended that the task force encourage the CFPB to develop a consumer-focused model governance framework while monitoring HUD’s proposed rule change amending the disparate impact standard.
Dr. Michael Kearns, a professor in the Department of Computer and Information Science at the University of Pennsylvania, spoke on some of the dangers of using machine learning for algorithmic decision making, while also noting that there is help on the horizon as researchers explicitly seek to modify the classical principles of machine learning to reduce sources of discriminatory behavior. At the outset, Dr. Kearns noted that the potential dangers of machine learning include violations of fairness and privacy, but that, importantly, these harms arise not from deliberate human malfeasance but as unintended consequences of the principles underlying machine learning. Machine learning, he said, proceeds by fitting a statistical model to training data; minority groups frequently bear the brunt of discrimination because they are less represented in the training data. To reduce such sources of discriminatory behavior, he said, one could add a constraint that the model must not have significantly different false rejection rates across different racial groups. This methodology requires specifying which groups we want to protect and which harms we wish to protect them from. Dr. Kearns noted that there are important caveats. First, “bad” definitions of fairness should be avoided; one example is forbidding the use of race as an input to a lending decision in the hope that doing so will prevent discrimination. Explicitly avoiding race is not possible because there are so many proxies for race. Unfortunately, he noted, when consumer financial law incorporates fairness considerations, it is often of this flawed form that restricts model inputs. The focus should instead be on constraining output behavior. Dr. Kearns said that constraining models to be fair will make them less accurate and that stakeholders must decide what the right accuracy-fairness balance is.
Further, different notions of fairness may compete with each other; for example, demanding greater fairness across racial groups may reduce fairness across genders.
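A minimal sketch of the kind of output-side audit Dr. Kearns describes might look like the following. The function names, toy data, and any deployment threshold are illustrative assumptions, not anything proposed at the hearing.

```python
# Illustrative sketch (not an actual regulatory tool): audit a model's
# *outputs* by comparing false rejection rates across groups, rather than
# trying to strip race-correlated inputs from the model.

def false_rejection_rate(decisions, labels):
    """Fraction of truly creditworthy applicants (label 1) who were rejected."""
    creditworthy = [d for d, y in zip(decisions, labels) if y == 1]
    return creditworthy.count("reject") / len(creditworthy)

def fairness_gap(decisions, labels, groups):
    """Largest difference in false rejection rate across groups."""
    rates = {
        g: false_rejection_rate(
            [d for d, grp in zip(decisions, groups) if grp == g],
            [y for y, grp in zip(labels, groups) if grp == g],
        )
        for g in set(groups)
    }
    return max(rates.values()) - min(rates.values()), rates

decisions = ["approve", "reject", "approve", "reject", "approve", "reject"]
labels    = [1, 1, 1, 1, 1, 0]       # 1 = truly creditworthy
groups    = ["A", "A", "A", "B", "B", "B"]

gap, rates = fairness_gap(decisions, labels, groups)
# Group A rejects 1 of 3 creditworthy applicants; group B rejects 1 of 2.
# A regulator could then require the gap to stay below some tolerance.
```

Note that such a constraint must name the protected groups and the harm (here, false rejections), exactly the specification step Dr. Kearns says the methodology requires.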
Bärí A. Williams, an attorney and startup advisor, began her testimony by stating that she sees five main issues with AI in financial services: (1) what data sets are being used – who fact checks the fact-checkers? (2) what hypotheses are being proven using this data – has the narrative that is being written been adequately vetted? (3) how inclusive is the team creating and testing the product – who are you building products with? (4) what conclusions are drawn from the pattern recognition and data that the AI provides – who are you building products for, and who may benefit, or be harmed? and (5) how do we ensure bias neutrality, and what is the benefit of neutrality? Ms. Williams noted that two techniques could also drive fair outcomes: leveraging statistical methods to resample or reweigh data to reduce bias, and adding a “fairness regularizer” (a mathematical constraint that penalizes unfairness) to existing algorithms.
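The reweighing technique Ms. Williams mentions is a standard pre-processing approach in the fairness literature; the sketch below is our own illustration, not something she presented. Each record gets a weight so that, after weighting, group membership and outcome look statistically independent in the training data.

```python
# Illustrative sketch of reweighing training data to reduce bias: weight
# each (group, outcome) pair by P(group) * P(outcome) / P(group, outcome),
# estimated from the sample, so group and outcome become independent
# under the weighted distribution.

from collections import Counter

def reweigh(groups, outcomes):
    """Return one weight per record, computed from sample frequencies."""
    n = len(groups)
    p_g = Counter(groups)                 # counts per group
    p_y = Counter(outcomes)               # counts per outcome
    p_gy = Counter(zip(groups, outcomes)) # counts per (group, outcome) pair
    return [
        (p_g[g] / n) * (p_y[y] / n) / (p_gy[(g, y)] / n)
        for g, y in zip(groups, outcomes)
    ]

groups   = ["A", "A", "A", "B"]
outcomes = [1, 1, 0, 0]
print(reweigh(groups, outcomes))  # [0.75, 0.75, 1.5, 0.5]
```

Under-represented combinations (here, group B with outcome 0 relative to its group size) are up- or down-weighted so a model trained on the weighted data cannot learn the group–outcome correlation as easily.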
Rayid Ghani, a Distinguished Career Professor in the Machine Learning Department and the Heinz College of Information Systems and Public Policy at Carnegie Mellon University, began by noting that AI systems can benefit everyone and result in a better society, but that any AI system that affects people’s lives has to be not just optimized for efficiency but explicitly built to focus on increasing equality. An AI system designed to explicitly optimize for efficiency, he said, has the potential to leave “more difficult or costly to help” people behind, thus increasing inequities; it is critical for government agencies and policymakers to ensure that AI systems are developed responsibly and collaboratively, including incorporating input from all stakeholders. An AI system, Professor Ghani continued, requires us to define precisely what we want to optimize, which mistakes are costlier (financially or socially) than others, and by how much; it therefore forces us to make these ethical and societal values explicit. Professor Ghani concluded by recommending that, rather than creating a federal AI regulatory agency, we should expand the already existing regulatory framework to account for AI-assisted decision-making and that regulations themselves be updated to make them more outcome-focused.
While the post-testimony questions from committee members touched on numerous elements related to the use of AI in fair lending, the committee members particularly focused on a few essential items: what constitutes “fairness” (or by extension what counts as bias) and where regulators should focus.
For example, Chairman Foster asked Dr. Thomas whether the prohibitions against discrimination found within ECOA could be programmed into an AI algorithm. Dr. Thomas replied that yes, this could be done, but the ultimate output would simply reflect a high probability – not a certainty – of fairness. Dr. Kearns responded to the same question by stating that it is the outputs, not the inputs, that should be of concern. Dr. Henry-Nickie replied that constraining for one definition could lead to disparate impacts for other protected classes. Professor Ghani added that we could always achieve fairness, but, he asked, at what cost? Ms. Williams agreed with Dr. Kearns on the importance of outputs; frequently, she said, people choose certain inputs in order to reach a desired output, which she felt creates a need to audit human decision making: solve for the desired output while keeping watch on the human decision-making process.
Congressman Sean Casten (D-IL) asked whether any panelist disagreed with Dr. Kearns’s statement that outputs are more important than inputs. With no panelist opposing that statement, Congressman Casten followed up by asking whether it is more useful to define bias in terms of the outputs or in terms of how the outputs are used. Dr. Kearns replied that it could not hurt to get outputs right because sometimes the output is the use – for example, a credit lending algorithm decides whether to grant credit. Professor Ghani, commenting that “this might be the most critical question asked today,” said it is not possible to get all outputs right; the AI system is going to make mistakes. He said that the crucial question is which mistakes are more important to guard against, and that depends on how the outputs are going to be used. False positives can be worse if the output use is punitive (e.g., more people going to jail); false negatives can be worse if the output use is beneficial (e.g., fewer people receiving the benefit). Congressman Casten asked Ms. Williams for examples of bias that are not negative. She replied that a bias attempting to correct for past biases might be such an example.
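Professor Ghani's point can be made concrete with a small sketch (the error counts and cost weights are hypothetical, chosen only to illustrate the asymmetry): assigning explicit costs to each kind of mistake is what forces the use of the output into the evaluation.

```python
# Hypothetical illustration: the same model errors carry very different
# total costs depending on whether the output's use is punitive or
# beneficial, once each mistake type is given an explicit weight.

def expected_cost(fp, fn, fp_cost, fn_cost):
    """Total cost of a model's false positives (fp) and false negatives (fn)."""
    return fp * fp_cost + fn * fn_cost

errors = {"fp": 5, "fn": 2}  # the model's mistakes are fixed

# Punitive use (e.g., flagging for enforcement): a false positive wrongly
# punishes someone, so it is weighted more heavily.
punitive = expected_cost(**errors, fp_cost=10, fn_cost=1)    # 52

# Beneficial use (e.g., granting a loan or benefit): a false negative
# wrongly denies someone, so it is weighted more heavily.
beneficial = expected_cost(**errors, fp_cost=1, fn_cost=10)  # 25
```

The same error profile scores very differently under the two uses, which is why defining bias purely in terms of outputs, without reference to their use, leaves the costlier mistake unspecified.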
Congressman Trey Hollingsworth (R-IN) asked the panelists what they meant by fairness. Ms. Williams stated that fairness means all similarly situated groups have an equal probability of being assigned favorable outcomes. Dr. Kearns noted that to achieve fairness, it is essential to identify what groups you want to protect and what harm you wish to prevent. Congressman Hollingsworth followed up by stating that AI requires us to make explicit what is now implicit: you must optimize for the outcome you are seeking. Trading off fairness and risk for one group over another is something we are uncomfortable with, he said, because we want fairness in every dimension.
Overall, this hearing highlighted that regulating algorithmic fairness requires careful consideration, not only of how fairness should be defined but also of what the goals should be.
Please continue to follow this space for future updates on the House Financial Services Committee Task Force on Artificial Intelligence as well as other topics related to the use of artificial intelligence.