
Unequal Roots, Unequal Outcomes: The Deepening Bias in Modern AI and How to Uproot It


Abstract:  Artificial intelligence (AI) is a transformative force in industries ranging from healthcare to finance, but its effectiveness and fairness depend heavily on the quality of the data it is trained on. Generative AI (GenAI) models, which learn from vast datasets, can unintentionally perpetuate biases present in data, reinforcing historical inequalities. This article explores the risks of biased data and the importance of ethical AI practices, particularly in sectors where biased decision-making can have severe consequences. It also provides practical suggestions, inspired by Bender, Gebru, et al.'s work on AI ethics, offering best practices for researchers, investors, and AI users to develop fair, transparent, and accountable AI systems.


Introduction: Data Integrity, Time, and Fair AI


“…. most of what goes wrong in systems goes wrong because of biased, late, or missing information.”

- Harvard-trained systems researcher Donella "Dana" Meadows (1941-2001)


In today’s world, artificial intelligence (AI) is not just a promising tool—it’s a transformative force shaping everything from business operations to healthcare diagnostics, hiring practices, and financial services. AI systems, especially those powered by generative AI (GenAI) and advanced machine learning techniques, promise efficiency, innovation, and unprecedented predictive capabilities. Yet, these systems are only as objective, fair, and accurate as the data fueling them. The passage of time adds another layer of complexity to data integrity. The data AI models are trained on reflects historical realities, both their progress and their failings. When data is incomplete, biased, or non-representative, the algorithms risk perpetuating the very inequalities they were designed to eliminate, with potentially far-reaching consequences.


While traditional pre-AI predictive modeling techniques like decision trees and regression focus on minimizing the error between historical data and predicted outcomes, modern AI, particularly neural networks, takes this a step further through more sophisticated approaches like backpropagation. These models can handle vast amounts of data and adjust themselves to learn more effectively. However, without high-quality, diverse training data, even the most powerful GenAI models can fall short, reinforcing biases instead of correcting them.


Understanding the relationship between data, algorithms, and bias is crucial to addressing the unintended consequences of AI-driven decision-making in today’s society. As we continue to rely on AI in areas as critical as lending, healthcare, hiring, and law enforcement, the stakes are higher than ever. Without recognizing the importance of the data used to train these algorithms, and the effects of time on data, we risk amplifying historical inequities and undermining the transformative potential of AI.


To assist researchers, investors, and AI users in navigating these challenges, this article provides a summary of practical suggestions later in the text. These guidelines, inspired by Bender, Gebru, et al. in their paper On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, highlight best practices for fair AI development, including careful data curation, addressing biases, and fostering transparent, accountable AI systems. We encourage readers to explore these key points to gain a deeper understanding of how to implement ethical AI practices and maximize the potential of AI while minimizing harm.


About the author:  Jeff Hulett leads Personal Finance Reimagined, a decision-making and financial education platform. He teaches personal finance at James Madison University and provides personal finance seminars. Check out his book -- Making Choices, Making Money: Your Guide to Making Confident Financial Decisions.


Jeff is a career banker, data scientist, behavioral economist, and choice architect. Jeff has held banking and consulting leadership roles at Wells Fargo, Citibank, KPMG, and IBM.


Data: A Reflection of Our Past


At its core, data is not an abstract set of numbers and facts; it represents the history and reality of the world in which it was collected. Data is a mirror reflecting both the successes and the failings of the past. This becomes especially significant in areas where social, economic, and political inequalities have been entrenched for centuries. In many ways, data tells a story of historical patterns, including both progress and systemic biases.


For instance, credit scoring, a field heavily reliant on data, demonstrates this concept well. As I noted in my article Resolving Lending Bias, much of the data used in credit scoring reflects the racial and economic disparities that have long existed in the U.S. financial system. Even though the algorithms used in credit scoring are theoretically “color-blind,” the data they rely on is not. Historical patterns of unequal access to credit, shaped by systemic racism, are embedded in the credit data itself. As a result, even if an algorithm does not directly consider race, it can still perpetuate racial disparities in lending decisions because the data it processes carries the legacy of those inequalities.


In an ironic twist, by law, data about protected classes like race and gender is excluded from credit data. The intention was commendable—removing this information to prevent bankers from considering protected class indicators. However, this well-meaning approach led to an unintended consequence: removing these factors from the data has made it more challenging to identify and understand the impact of bias embedded in the system.


This issue is not unique to lending. It extends to many other domains, including hiring algorithms, predictive policing, and healthcare diagnostics. Wherever data from a biased system is used, the risk of reproducing and amplifying those biases is significant. As we move into an era of more advanced AI systems like generative AI, this risk becomes even greater. These systems rely on vast amounts of historical data to train their models, and if that data reflects a biased past or contains noise, the models will inherit those imperfections. The time distance between the input data and the predicted outcome adds further instability, or noise, to predictive accuracy. For more examples of long-term, life-impacting decisions vulnerable to bias, see my article Unequal Roots, Unequal Outcomes: Examples of Long-Term Decisions Shaped by AI, which shows how AI integration across various sectors can amplify these risks. Understanding and addressing these issues is crucial to promoting fairer outcomes.


Algorithms: A Reflection of Data


Algorithms, whether simple or complex, are often seen as neutral tools. In reality, they are only as good as the data they are trained on. If the data is flawed, incomplete, or biased, the algorithm’s output will be equally flawed. This is particularly true for machine learning algorithms, where the goal is to find patterns in the data and make predictions based on those patterns. The problem is not the algorithm itself but the data it uses as its foundation.


In traditional models like linear regression, the objective is to minimize the sum of squared errors (SSE), essentially drawing the line that best fits the data by reducing the difference between actual and predicted values. Neural networks, which are commonly used in modern AI systems like generative AI, take this concept further. They use a process called backpropagation, which iteratively adjusts the model's weights to minimize the prediction error. This technique, powered by the chain rule of calculus, allows neural networks to handle massive datasets and fine-tune their predictions more effectively than traditional models.
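
To make these mechanics concrete, here is a minimal sketch in Python of the shared idea: iteratively adjusting model weights to shrink the squared error between actual and predicted values. The toy data and learning rate are invented for illustration; full neural networks apply the same gradient logic, via the chain rule, across many layers of weights.

import numpy as np

# Toy data: one input feature (x) and one outcome (y), invented for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.8])

w, b = 0.0, 0.0           # model weights: slope and intercept
learning_rate = 0.01

for step in range(2000):
    y_pred = w * x + b                 # model prediction
    error = y_pred - y                 # residuals
    sse = np.sum(error ** 2)           # sum of squared errors, the quantity being minimized

    # Gradients of the SSE with respect to each weight (the chain rule in miniature)
    grad_w = 2 * np.sum(error * x)
    grad_b = 2 * np.sum(error)

    # Adjust the weights in the direction that reduces the error
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"fitted slope = {w:.2f}, intercept = {b:.2f}, SSE = {sse:.3f}")

The point of the sketch is the dependency, not the arithmetic: every weight update is driven entirely by the training data, so whatever the data carries, the fitted model carries too.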


Yet, no matter how advanced the algorithm or how sophisticated the learning process, the model is only as good as the data it is fed. If the data is biased or incomplete, the resulting predictions will be biased or incomplete. For example, if an AI system is trained on historical lending data showing that certain racial groups have had less access to credit, the algorithm may deny loans to individuals from those communities, not due to explicit racial bias, but because the data reflects a discriminatory past.


Moreover, neural networks and generative AI can exacerbate this issue by learning more intricate patterns from large datasets. The complexity of these models means biases embedded in the data can become even more difficult to detect and address. In effect, AI models, especially those using backpropagation and other sophisticated techniques, can institutionalize bias, embedding discrimination into seemingly neutral decisions.


The Role of Data in Generative AI: Heightened Risks and Responsibilities


As AI systems, particularly generative AI, become more advanced, their reliance on vast, diverse datasets increases. With this power comes greater responsibility. Generative AI models, unlike traditional models like regression or decision trees, learn from patterns in data to generate outputs, whether text, images, or predictions. However, when trained on biased data, these models risk perpetuating and even amplifying existing biases.


For instance, a generative AI model trained on biased healthcare data could produce diagnostic recommendations disproportionately harming marginalized groups. Similarly, a hiring algorithm based on resumes from predominantly white male candidates may continue to favor those candidates, reinforcing racial and gender disparities in hiring practices.


Another critical concern is the time-based instability in predictive modeling. While time may not significantly affect predictions in contexts like short-term consumer behavior, it can introduce serious risks in long-term decision-making, such as loan approvals or criminal sentencing. Over time, societal changes and environmental factors can distort predictions, adding noise and leading to biased outcomes with long-lasting consequences.
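
One way to make this time-based risk visible is to check how far a model input's distribution has drifted between training time and the time of use. The sketch below uses the population stability index (PSI), a common drift measure in credit modeling; the income figures are simulated purely for illustration and are not drawn from any real portfolio.

import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Rough drift measure: how much has a feature's distribution shifted between two samples?"""
    expected, actual = np.asarray(expected), np.asarray(actual)
    cuts = np.percentile(expected, np.linspace(0, 100, bins + 1))
    cuts[0] = min(expected.min(), actual.min()) - 1e-9    # widen the outer edges so every value lands in a bin
    cuts[-1] = max(expected.max(), actual.max()) + 1e-9
    expected_pct = np.histogram(expected, cuts)[0] / len(expected)
    actual_pct = np.histogram(actual, cuts)[0] / len(actual)
    expected_pct = np.clip(expected_pct, 1e-6, None)      # avoid log(0) for empty bins
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

rng = np.random.default_rng(0)
income_at_training = rng.normal(50_000, 12_000, 10_000)   # simulated training-era incomes
income_years_later = rng.normal(56_000, 15_000, 10_000)   # simulated incomes when the model is actually used

psi = population_stability_index(income_at_training, income_years_later)
print(f"PSI = {psi:.3f}  (rule of thumb: above 0.25 suggests material drift)")

A drift flag like this is only a trigger for review: it says the world the model was trained on no longer matches the world it is scoring, not which individual decisions are now wrong.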


To some degree, AI-driven platforms like Amazon and Netflix have already picked off the low-hanging fruit by focusing on low-stakes, fast-feedback consumer goods. As generative AI expands into areas with higher stakes, such as lending, medical care, and education, the risks increase significantly. Longer feedback timeframes in these fields amplify the potential for bias to have a harmful impact.


Moreover, generative AI models often operate with minimal human intervention, which can make it difficult to detect or address these biases. As these systems become increasingly autonomous, the risks of biased decision-making grow, especially in high-stakes sectors like healthcare, finance, and law enforcement, where both accuracy and fairness are crucial and the impacts of bias can be profound.

[Figure: AI Bias Impact Model]

In my "Unequal Roots, Unequal Outcomes: Examples of Long-Term Decisions Shaped by AI" article, I present more examples of high severity, long timeframe decisions highly impacted by bias.



In my Resolving Lending Bias article, I provide an example of how biased data can affect algorithmic decision-making in finance. I propose two possible solutions to address bias in credit scoring. The first involves creating a "protected class FICO score" that specifically accounts for the historical disadvantages faced by marginalized groups. The second involves incorporating non-traditional data, such as utility and rent payments, to create a more complete and accurate picture of creditworthiness. These solutions highlight the importance of critically examining the data used in AI models and taking proactive steps to mitigate bias.


For more insights into the ethical implications of large language models and a detailed discussion on mitigating biases in AI systems, I encourage readers to explore the appendix. It contains a summary of Bender, Gebru, et al.'s On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?—a key paper that outlines the risks of scaling AI models and offers practical recommendations for responsible AI development.


Breaking the Cycle: Addressing Bias in Data for GenAI


To break the cycle of bias, we must first recognize the inextricable link between data, algorithms, and history. If the data used to train AI models reflects a biased past, the resulting predictions and decisions will also be biased. This is true for all machine learning models, but it is particularly important for generative AI systems, which learn from vast amounts of historical data and generate outputs influencing real-world decisions.


Addressing bias in AI systems requires a multi-faceted approach. First, we must ensure the data used to train these models is as diverse and representative as possible. This involves identifying and mitigating biases in the data itself, as well as incorporating data from historically underrepresented groups. In credit scoring, for instance, my proposal to include non-traditional data sources, such as rent and utility payments, could provide a more comprehensive and fair assessment of creditworthiness for individuals who have been excluded from traditional financial systems.
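
As a sketch of that proposal, the snippet below augments a traditional credit feature set with hypothetical rent and utility payment histories before fitting a simple scoring model. The column names, values, and the use of pandas and scikit-learn are illustrative assumptions, not a description of any production credit system.

import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical applicant data; all column names and values are invented for illustration.
traditional = pd.DataFrame({
    "applicant_id": [1, 2, 3, 4],
    "credit_utilization": [0.45, 0.80, 0.20, 0.65],
    "months_of_credit_history": [60, 8, 120, 24],
})
non_traditional = pd.DataFrame({
    "applicant_id": [1, 2, 3, 4],
    "on_time_rent_share": [0.98, 0.85, 1.00, 0.70],       # share of rent payments made on time
    "on_time_utility_share": [0.97, 0.90, 1.00, 0.75],    # share of utility bills paid on time
})
defaulted = pd.Series([0, 0, 0, 1], name="defaulted")     # illustrative repayment outcomes

# Combine both views of creditworthiness into a single feature set.
features = traditional.merge(non_traditional, on="applicant_id").drop(columns="applicant_id")

# A simple scoring model; a real system would also require validation, fairness testing, and review.
model = LogisticRegression(max_iter=1000).fit(features, defaulted)
print(dict(zip(features.columns, model.coef_[0].round(3))))

The design point is that applicants with thin traditional credit files can still demonstrate consistent payment behavior; whether that actually reduces disparities is an empirical question to be tested, not assumed.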


Second, we need to ensure AI models are trained and tested in ways that account for potential biases. This may involve implementing fairness constraints in the training process, using bias-detection algorithms, and regularly auditing AI systems to ensure they are not perpetuating harmful biases.
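
A basic version of such an audit, sketched below, compares approval rates across groups using the "four-fifths" disparate impact ratio. The group labels, approval decisions, and the 0.8 threshold are illustrative assumptions; a real audit would use held-out outcomes, several fairness metrics, and legal and domain review.

import numpy as np

def disparate_impact_ratio(approved, group, reference_group):
    """Ratio of each group's approval rate to the reference group's approval rate."""
    approved, group = np.asarray(approved, dtype=float), np.asarray(group)
    reference_rate = approved[group == reference_group].mean()
    return {g: approved[group == g].mean() / reference_rate for g in np.unique(group)}

# Simulated audit sample: 1 = approved, 0 = denied, with a made-up group label per applicant.
rng = np.random.default_rng(42)
group = rng.choice(["group_a", "group_b"], size=5_000, p=[0.7, 0.3])
approved = np.where(group == "group_a",
                    rng.random(5_000) < 0.55,    # roughly 55% approval rate for group_a
                    rng.random(5_000) < 0.40)    # roughly 40% approval rate for group_b

for g, ratio in disparate_impact_ratio(approved, group, reference_group="group_a").items():
    flag = "review" if ratio < 0.8 else "ok"     # the common four-fifths rule of thumb
    print(f"{g}: impact ratio {ratio:.2f} ({flag})")

Checks like this do not fix bias on their own, but they make it measurable, which is the precondition for the fairness constraints and retraining steps described above.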


Finally, it is essential to rethink the broader social and economic systems producing the data in the first place. Bias in AI models is often a reflection of bias in society, and addressing these systemic issues will require a concerted effort across industries, governments, and communities.


Conclusion: The Future of Fair AI


There is no question—generative AI presents a massive opportunity to drive business efficiency and improve customer experiences across industries. From streamlining operations to delivering personalized services, the potential for AI to enhance business outcomes and customer delight is undeniable. However, alongside this promise comes a significant responsibility to ensure AI systems are built on a foundation of fairness and inclusivity.


As AI becomes an integral part of business strategies, the relationship between data, algorithms, and bias must not be overlooked. Data is not an objective snapshot; it reflects our history, with all its imperfections. Algorithms, trained on such data, can inadvertently perpetuate existing biases. As generative AI and advanced machine learning models continue to evolve, the risk of reinforcing historical biases grows, which could undermine the efficiency and customer satisfaction businesses aim to achieve.


To truly capitalize on AI’s potential, businesses must proactively address these biases. This involves scrutinizing the data used to train AI systems, implementing safeguards to detect and mitigate bias, and reevaluating the broader systems that produce skewed data. Taking these steps will not only ensure AI drives business growth but also that it fosters equity and fairness.


The future of AI is not just about technological advancements or operational gains. It's about building innovations that contribute to a more just, equitable, and customer-centric world, where the benefits of AI are felt by all, and business success is aligned with societal progress.


Appendix


In their paper On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell raise critical questions about the development and use of large language models (LMs). They argue that, while these models have the potential for widespread benefits, they also pose significant risks that demand careful planning and reflection by researchers. The authors emphasize the importance of ethical considerations, stakeholder engagement, and environmental impacts in the research process. Their paper invites readers to consider how larger language models affect marginalized communities and the broader societal consequences of their deployment. For a deeper understanding of these issues, we encourage readers to explore the full paper, which provides thoughtful insights and practical recommendations.


Their paper discusses the need for responsible research practices in the development of large language models (LMs). It advocates for shifting towards a more thoughtful, inclusive approach that considers the potential risks, costs, and impacts on marginalized populations.


In Section 7 of their paper, "Paths Forward," the key points highlighted are:


  1. Planning and Resources: Researchers should prioritize careful planning across various dimensions before developing datasets or systems. They should treat research time and effort as valuable resources and focus on projects that promote a more equitable technological ecosystem.

  2. Data and Bias: Massive dataset size doesn’t guarantee diverse viewpoints and often leads to failure in inclusion efforts. Instead, researchers should invest significant time in curating appropriate datasets, avoiding reliance on easily-scraped data.

  3. Financial and Environmental Costs: The economic and environmental costs of model training are substantial. Efficiency in energy and computation should be incorporated into model evaluation to reduce inequities.

  4. Documentation and Transparency: Researchers must document data selection and collection processes thoroughly, providing clarity on goals, values, and motivations. Potential stakeholders, especially those at risk of negative impact from model errors, should also be identified and considered.

  5. Re-alignment of Research Goals: Instead of prioritizing larger models and leaderboard scores, efforts should focus on understanding how machines perform tasks and how they integrate into socio-technical systems. Techniques like pre-mortems can help researchers evaluate risks, failures, and alternatives before proceeding with a project.

  6. Stakeholder Engagement and Value-Sensitive Design: Research should involve stakeholders and consider their values, aiming to design systems that support these values. This approach can identify and mitigate harms early in the development process.

  7. Dual-Use Problem: The dual-use nature of LMs (e.g., both helpful and harmful applications) should be addressed by mitigating risks while preserving their beneficial uses, such as in automatic speech recognition (ASR) systems for marginalized communities like Deaf or hard-of-hearing individuals.

  8. Temporal Instability and Long-Term Impact: The time distance between training data and its real-world application can cause predictive instability, especially in sectors with long-term effects like criminal justice and financial lending. Researchers should account for time-based environmental noise and evolving social norms in model development to minimize the risk of outdated or biased predictions over time. (Please note: this 8th point was not part of the original Bender, Gebru paper)


In summary, their paper urges researchers to center their work on the people most affected by the resulting technology, accounting for environmental impacts, engaging stakeholders early, and focusing on harm mitigation and ethical considerations.


Notes


[i] Donella Meadows, Harvard and MIT-trained systems researcher, on the systemic issues with biased, late, or missing information: “…. most of what goes wrong in systems goes wrong because of biased, late, or missing information.” From Thinking in Systems: A Primer.


[ii] Hulett, J. Resolving Lending Bias: A Proposal to Improve Credit Decisions with More Accurate Credit Data. The Curiosity Vine, Sep 7, 2021. Updated Oct 16, 2023. Discusses how biased data leads to biased credit algorithms and proposes solutions for resolving systemic lending bias.


[iii] Scarcity and decision-making: Mullainathan, S., & Shafir, E. Scarcity: Why Having Too Little Means So Much. Times Books, 2013. Explores how scarcity affects decision-making, including in financial situations, and contributes to systemic biases.


[iv] The Urban Institute. “Throughout this country’s history, the hallmarks of American democracy – opportunity, freedom, and prosperity – have been largely reserved for white people through the intentional exclusion and oppression of people of color.” From A Framework for Reducing Racial and Ethnic Disparities in the Juvenile Justice System.


[v] Ludwig, S. "Credit scores in America perpetuate racial injustice. Here's how." Quartz. Explains how the U.S. credit system structurally disadvantages people of color by embedding historical biases into modern data sets.


[vi] Blattner, L., & Nelson, S. “How Costly is Noise? Data and Disparities in Consumer Credit.” The Quarterly Journal of Economics. Discusses the effects of data noise and bias in consumer credit assessments, highlighting disparities in lending outcomes.


[vii] Heaven, W. D. "Bias isn't the only problem with credit scores—and no, AI can't help." MIT Technology Review. Analyzes how bias in data, rather than the algorithms themselves, is the primary issue in AI-driven credit scoring systems.


[viii] Home Mortgage Disclosure Act (HMDA): Federal Reserve Board. A Guide to HMDA Reporting: Getting It Right! Defines data collection requirements under HMDA, which mandate the reporting of protected class data for the purposes of identifying lending disparities.


[ix] Fair Isaac Corporation (FICO) models and credit scoring methodologies: Can Machine Learning Build a Better FICO Score? Fair Isaac Corporation report. Discusses the evolution of credit algorithms and the role of nonlinear modeling techniques in improving credit score accuracy.


[x] In this article, we make a distinction between bias and noise, though the article's focus is mostly on bias. Like accuracy and precision, bias and noise are different and those differences are useful to understand. Please see this article for a primer on the differences between them:


Hulett, J. Good decision-making and financial services: The surprising impact of bias and noise. The Curiosity Vine, May 19, 2021.


