Apr 4, 2023 - 33 min read

The Business and Science of Scale - How Data, Decision Insight, and Economics Drive Success

Updated: Nov 16, 2023

Many industries encourage scale. Scale is growth. Scale drives profitability by leveraging costs. Scale requires structure to support that growth. The question becomes:

"How do you add the necessary structure to support scale-related growth, but still enable the adaptability and creativity to change as the market environment requires?"

As a central theme of this article, we explore "The scale unit" as the answer to this "How to scale" question. Many companies, especially those in industries where scale is a common competitive feature, benefit from a scale unit. For this article, we borrow from banking and other consumer-centric industries to provide scale unit examples. But it relates to most industries. Please read this with an eye toward your favorite industry. The parallels are readily available and the lessons are powerful. We will show you how to make the scale unit a flywheel for innovation and adaptation.

This article examines customer operations, risk management, finance, and related functional organizations in the context of growing and improving information technology. We encourage enterprises to develop, nurture, and properly fund a scale unit. In summary,

The scale unit is:

An analytically focused organization that performs behavioral and automation-based testing and analytics. It applies scientific principles in the service of causality. The scale unit is a flywheel enabling long-term company success.

The scale unit's benefits include:

It helps enterprises understand the economic value of business activities intended to drive business success.
It recommends decisions to optimize risk and improve customer delight.
It results in long-term growth, profitability, and scale.
A well-functioning scale unit is a cultural enabler for creativity and ongoing adaptability.

About the author:

Jeff Hulett is a career banker, data scientist, behavioral economist, and choice architect. Jeff has held banking and consulting leadership roles at Wells Fargo, Citibank, KPMG, and IBM. Today, Jeff is an executive with the Definitive Companies. He teaches personal finance at James Madison University and provides personal finance seminars. Check out his new book -- Making Choices, Making Money: Your Guide to Making Confident Financial Decisions -- at jeffhulett.com.

Table of contents

Implementing a scale unit flywheel - Every scale unit needs Radar O'Reilly
1. The Challenge: How Incrementalism and the Pareto principle impact data science
2. The Solution: Building your scale unit flywheel like a M*A*S*H unit
The scale unit foundation: Risk Management and Data
1. Risk management thinking
2. Data thinking
The big convergence and the scale unit
1. Big Risks and Big Data
2. The scale unit defined
The scale unit - examples and essentials
1. The mortgage servicing example
2. Top 10 essentials for enterprise-level scale unit success
3. Compliance automation example
4. The economics of "I'm sorry"
Conclusion, Resources, Notes, and Appendix

1. Implementing a scale unit flywheel - Every scale unit needs Radar O'Reilly

We start at the end. Section 1 explains how to implement a scale unit. Sections 2 - 4 describe some of the essential enablers for a successful implementation. This article offers a variety of metaphors, analogies, and references to banking and a few other industries. The examples are meant as helpful success stories. A scale unit is a success enabler for almost any industry, especially those with a growth mandate.

The TV sitcom M*A*S*H ran in the 1970s and was a funny show about an Army hospital unit in the Korean War. (In case you are wondering, I only watched the re-runs....) This show still holds records for being one of the most-watched TV shows in history. What could data science possibly have in common with this TV show? Turns out, quite a bit.

But first, we explore common challenges impacting an organization's ability to scale. Then, in the section immediately following, we show how to solve these challenges with a scale unit flywheel. We will show how Radar O'Reilly is foundational to your scale unit.

a. The challenge: How Incrementalism and the Pareto principle impact data science

In this section, we provide scale-limiting examples of incrementalism and the Pareto principle.

Incrementalism: Our example is from the banking industry, though virtually all industries are subject to incrementalism. Incrementalism becomes a challenge because of market uncertainty and volatility. Given a company's need for ongoing change, incrementalism may leave a company without the tools and understanding it needs to adapt or learn from its customers and markets. This occurs for two reasons:

A company did not start in a data-native environment, so transitioning legacy systems to a high-quality, information-based strategy environment is challenging.
Data technologies change so quickly that some companies that started as data-native have fallen into incrementalism by failing to keep up with the latest data technologies.

A banking example: Banks become big banks mostly through consolidations. The consolidation catalyst may occur from many sources:

Often a big economic downturn is the cause,
Sometimes a change in law is the cause (think of the reduced interstate banking restrictions in the 1990s), or
Sometimes a regulatory change is the cause, resulting in different scale economies. (Think of the CFPB requirements that kick in at the $10B bank asset size. Once a bank goes over $10B in assets, it needs to add significantly more risk management infrastructure.)

The following graphic shows consolidations for some of the biggest U.S. banks from 1990 to 2010. Certainly, a similar consolidation trend exists for most industries. "Eat or be eaten!" seems to be the mantra.

So, what does this mean to data science in banking? Banking, like most industries, suffers from "incrementalism." This occurs for a multitude of reasons, including:

Our human nature to think shorter term (e.g., Recency Bias, Availability Bias),
SEC registrant quarterly reporting requirements encouraging short-term reporting and related short-term thinking, and
the consolidation norm specific to most industries.

In the data science world, data is the raw material enabling analytical success. Data is simply defined as:

"Data is a representation of our past reality."

Access to data - our past reality - is critical. Unfortunately, in the incremental organizational context, data can be a challenge to locate, access, and utilize. Next is one of my favorite relevant aphorisms about data.

Where is the wisdom? Lost in the knowledge. Where is the knowledge? Lost in the information.

- T. S. Eliot

Where is the information? Lost in the data. Where is the data? Lost in the damn database!

- Joe Celko

This suggests if an organization wishes to grow from its past learnings and reality - to gain customer knowledge and wisdom - it starts with care and feeding strategies for organizational data. This includes strategies for integrating acquired company data. In the context of an acquisition due diligence, data quality and integrate-ability should be highly weighted criteria.

M&A Due Diligence decision-making example: The best M&A decisions start with the best decision-making process. It starts with the fundamentals of decision-making. The best practice is to consider company goals and the many questions necessary to resolve when preparing for the best decision. There are many benefits to a well-tuned decision process - culture, execution, and the best outcomes. The availability of a board decision framework and resources enables consistently making the best M&A decisions. In the context of building the scale unit flywheel, data quality and integrate-ability of the acquisition target should be highly weighted acquisition decision criteria.

See our article for context and solutions driving the best M&A decision process: Process Not Perfection - The foundation for better decisions

The '80 - 20 rule' for data science: Generally, in larger organizations, data can be heavily siloed across different operating groups and different operating systems (aka, systems of record), with varying levels of care. Also, because of organizational incrementalism, acquired or legacy company systems are not always fully integrated into central company systems. Today, with an increasing focus on information security, data accessibility is generally more restricted and may require special permissions. All this creates friction for the data scientist. Often, doing really interesting data analysis and driving actionable business insight is only about 20% of the data scientist's job. The remaining 80% is spent wrangling data and handling other administrative tasks. So, this is the data scientist's reality. Is it getting better?

Some days, yes --> better data warehousing, APIs, or tool access arrives.
Some days, no --> the next wave of consolidations or more info security rules arrives.

If you are in a data science or economics group, especially groups focused on operational analysis and compliance testing, this reality is likely particularly acute. This occurs because you are closely tied to the operating system's data availability. The goal of a highly functioning scale unit is to flip the script and drive data efficiency. Your science and application organizations want to accelerate insight production. Insight production accelerates long-term organization value. This is a fight worth fighting! An organizational goal for data quality, care, and feeding sets the cultural expectation to treat data as one of its most valued assets.

The compliance testing example: Compliance testing, especially testing specific to customer obligations, requires access to core system-of-record data and documents. The gold standard is to directly test the customer's communication media (letters, texts, statements, online, auto agent, etc.) against the regulatory, investor, or related obligations. Because of organizational complexity, separate systems, third-party involvement, information security requirements, etc., automation-enabled testing of customer media may be challenging.

Please see our article AI and automated testing are changing our workplace for solutions to common testing challenges.

b. The Solution: Building your scale unit flywheel like a M*A*S*H unit

A practical solution to enhance data availability and enable scale unit success may be found in the following analogy. A M*A*S*H unit runs with a couple of primary operating groups. Those include the expert doctors, nurses, and orderlies who attend to the patients - think of Hawkeye or Margaret Houlihan. The M*A*S*H unit also includes leadership, like Colonel Blake or Colonel Potter. Naturally, economics-related data science shops also have both experts and leaders (the data scientists and the data science leadership). So far, so good. But data science shops frequently overlook the single most important factor that makes a M*A*S*H unit run. That is Radar O'Reilly. Radar is not just a company clerk; he is the grease that makes the M*A*S*H unit run. Radar is the one who knows how to get things done, knows all the Army supply sergeants, and knows the company clerks at the other Army M*A*S*H units. As such, Radar knows where to get the raw material to ensure the M*A*S*H unit operates effectively. Radar knows his way around the Army and how to work back channels.

In the data science context, the Radar-type team leader is essential. Radar knows:

Where to find organizational data,
The organization's legacy systems,
Whom to contact to get the data,
How to get the metadata/data dictionary/data ontologies,
The nuances of the infosec rules, and
How to stay ahead of the next big change affecting data availability.

Asking an economist or data scientist to run down data is like asking surgeons to buy their own sutures.
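
One way to make Radar-style knowledge less dependent on a single person is to write it down as a simple data inventory. The Python sketch below is only a hypothetical illustration of that idea. The `DataAsset` fields, the `DataCatalog` class, and the sample entry (including the contact address and file path) are assumptions invented for this example, not a description of any real bank's systems or tools.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataAsset:
    """One entry in a hypothetical data inventory (all field names are illustrative)."""
    name: str                      # what the data is
    system_of_record: str          # where it lives, including legacy systems
    contact: str                   # whom to contact to get the data
    data_dictionary: str           # where the metadata / data dictionary is kept
    infosec_rules: List[str] = field(default_factory=list)  # access restrictions


class DataCatalog:
    """A minimal in-memory lookup of data assets."""

    def __init__(self) -> None:
        self._assets: Dict[str, DataAsset] = {}

    def register(self, asset: DataAsset) -> None:
        self._assets[asset.name] = asset

    def find(self, keyword: str) -> List[DataAsset]:
        """Return assets whose name or system of record mentions the keyword."""
        key = keyword.lower()
        return [
            a for a in self._assets.values()
            if key in a.name.lower() or key in a.system_of_record.lower()
        ]


# Made-up sample entry, for illustration only.
catalog = DataCatalog()
catalog.register(DataAsset(
    name="Mortgage servicing call notes",
    system_of_record="Legacy servicing platform (acquired bank)",
    contact="servicing-data-team@example.com",
    data_dictionary="shared-drive/servicing/data_dictionary.xlsx",
    infosec_rules=["Contains customer PII", "Requires manager approval"],
))

for asset in catalog.find("servicing"):
    print(asset.name, "->", asset.contact)
```

A real data organization would likely keep this in an enterprise metadata tool rather than in code. The point of the sketch is the fields themselves: location, owner, dictionary, and access rules are the things Radar carries in his head.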

The scale unit flywheel

Framework to integrate economics into your organization

For whatever reason, data science organizations do not always hire the Radar O'Reilly types. An essential attribute is that the Radar O'Reilly types are homegrown. These are often long-time employees who know the data, know the business, and know "where the bodies are buried" when it comes to finding data. They often have deep relationships across the organization. If I were starting a new economics-related data science organization in a big enterprise, my first hire would be Radar O'Reilly. Sure, I would eventually hire a crack team of data scientists, economists, junior analysts, business analysts, relationship managers, and application engineers, and I would license Definitive Pro, Python, R, SAS, RPA / OCR engines, or related tools and storage. But Radar and building out my data organization would come first, since it is hard to analyze something if your data raw material is elusive and regularly at risk.

The successful scale unit flywheel

To achieve the full and ongoing value of the scale unit flywheel, organization and culture are critical! The primary organizations within a successful scale unit are described next.

Leadership: This is the Office of the Chief Economist. Your chief economist and office practitioners:

Are competent business economists with connectivity to the academic world;
Possess real-world industry knowledge or the business context;
Are adept at building cross-enterprise relationships;
Have a seat at the table for senior leadership decision-making; and
Collectively, have experience in all four flywheel disciplines - Data, Applications, Science, and Business.

Data: These are your data experts. These are often experienced resources hailing from business organizations. They understand the data needs and how to get around some of the data challenges. They understand legacy systems. Like Radar O'Reilly, a good data leader will have an intuition about future business challenges. They are also technically proficient with data, data storage, ETL transformations, data security needs, and the data requirements of upstream applications.

Enterprise relationships: Enterprise data management and legacy systems data

Applications: These are your most knowledgeable resources about the latest Artificial Intelligence, Machine Learning, Unsupervised learning, Robotic Process Automation, and required model validations. They are "tool people" with technical knowledge about common modeling software applications like R and Python, as well as decision platforms like Definitive Pro. They will also have a background in operational Customer Relationship Management solutions like Salesforce.

Enterprise relationships: Enterprise Risk Management and Technology organizations

Science: These are your economists and researchers. They are steeped in Randomized Control Trial (RCT) test design and the availability of natural experiments. They likely have backgrounds in statistics, theoretical mathematics, and some scientific discipline. This group determines the performance of tests and recommends scale opportunities. This group has close relationships with the relevant academic research community and may publish in academic journals or partner with universities. Process and product patents may also be managed through the science organization.

Enterprise relationships: External academic research and universities

Business Liaisons: These are relationship managers that work with senior operating leaders and their management teams. They develop testing opportunities with the science team and facilitate needed data. As a critical point, the 'voice of the customer' is facilitated by the business liaisons. This is where the scale unit 'keeps it real' to ensure a laser focus on customer and company goals. Scaling opportunities are recommended and planned with the business liaisons.

Enterprise relationships: First-line business organizations.

The idea is to think of the economics organization in terms of the flywheel effect. [ix]

"... the process resembles relentlessly pushing a giant, heavy flywheel, turn upon turn, building momentum until a point of breakthrough, and beyond."

- Jim Collins, Good to Great

A flywheel effect is NOT about a single momentous event, a single innovation, or a lucky break. The flywheel effect is about the business learning process and learning momentum. It is the ability to make smaller but cumulatively substantial good decisions via RCT-informed processes, leading to long-term profitable growth and scale. The RCT-based discipline and results are only the beginning. The economics organization is well-positioned to provide enterprise choice architecture and help the executive team make the best decisions. Your economics organization has the potential to be an organizational flywheel. A highly functioning scale unit will be instrumental and necessary for company success. We provide the five primary scale unit organizations in the earlier graphic. They are all important. Data and the Radar O'Reilly type are necessary flywheel catalysts.

2. The scale unit foundation: Risk Management and Data

Next, we explore risk management and data. We consider organizational challenges impacting functional organizations such as risk management. These are often teams organized outside first-line groups considered central to revenue growth. We also explore the opportunity to curate data for the scale unit's success.

a. Risk management thinking

For this article, we consider risk management on two levels.

Risk management as a company-wide culture. We consider risk management broadly, as a discipline in which all company associates participate regardless of their company group affiliation. Good risk management seeks the proper level of risk to optimize customer contentment, revenues, and business growth.
Risk Management as a corporate group. Related groups are sometimes known as "second-line" organizations, charged with overseeing the implementation of enterprise risk management needs. These groups will evangelize the broader risk management culture.

Many think of risk management as a cost center, meaning an important and necessary cost of doing business - like a constraint, not an objective. Many do not think of overachieving in risk management as necessarily a good thing.

Business operations are different. They are associated with revenue generation. In a business context, overachieving is a code for making more money and making more customers happy. There is nothing wrong with that. Correct?

I have heard risk management and business operations compared in this way:

“The business is responsible for making the money and risk management is responsible for keeping the money.”

This is a reasonable description, but it raises the question central to this article. Revenue generated by business operations has a special advantage. When revenue increases, many are happy. Many may not care to closely inspect the cause of the revenue increase. As John F. Kennedy said: "A rising tide lifts all boats" and many may be content to be part of the flotsam. For many, increasing revenue is all that matters. However, a cost increase to support a cost center like risk management is different. With respect to cost centers, people tend to inspect increased costs more closely. How do you know if a cost center action caused the organization to keep the money or if it would have happened anyway? What if risk-related budgets were more geared toward risk effectiveness? We will explore this more later in the article.

b. Data thinking

Data is critical to company success. There are important differences between non-curated data and curated information. Our modern world is drowning in non-curated data, which makes curated information more difficult to realize. It is getting more challenging to separate the signal from the noise. In fact, even our definition of censorship has changed. It is no longer the "book burning" style of withholding curated information seen in past generations. Today it is the opposite. Censorship is associated with drowning others in non-curated data noise and purposefully creating a confidence-reducing, information-curation-challenged environment. As sociologist Zeynep Tufekci said [i]:

“The most effective forms of censorship today involve meddling with trust and attention, not muzzling speech itself.”

Banks and other information-intensive industries [ii], by their very nature, extensively use customer data for the manufacturing process. To overcome unintended "data drowning," the need to properly curate organizational data is more important than ever.

The bank data example: Just like a car manufacturer uses metal/metal alloys, plastics, and computer components to make cars, banks use credit, income, and wealth customer data to make financial products. The bank's financial product manufacturing process is, in a very real sense, the process of curating (structuring) non-curated customer data to create a curated information-based product. The banker's job is to separate the signal from the noise to confirm the best financial product for the customer. Banks and other information-intensive industries have significantly increased the use of data techniques for operational purposes. In some cases, an operational piece of data, like an unindexed or uningested customer voice recording, can be stored in a form only useful to the production process. However, the information contained in the voice recording may be valuable after the production process concludes. As such, the banks, along with many other industries, have a tremendous opportunity to leverage their own economic natural resource.

In the context of the scale unit, a key success enabler is accessing curated information AFTER the initial financial product has been provided to the customer. It is a matter of transitioning production data into useful curated information. The essential point is to recognize production data as a long-term asset, worthy of the care and feeding expected of all long-term assets.

Just add a few data scientists and you are good to go! Right? Maybe?.....Not really? There is certainly more to it, but this intention is directionally correct. We will explore data organization best practices later in the article.

Next, let’s summarize our perspective on the convergence of data and risk management. These are the building blocks of an operational scale unit.

3. The big convergence and the scale unit

The following passages "connect the dots" between the two converging perspectives: risk management and data. This lays the groundwork for the behavioral economics-focused scale unit.

a. Big Risks and Big Data:

We appreciate it is difficult to quantify the true effectiveness of risk management. At a minimum, we take it on faith that risk management is important and enterprise leadership provides budget funding. The budget allocation is often judgmental, anchored in last year's budget amount. At the other extreme, organizations will react by potentially spending $billions to remediate regulatory mishaps. The number of headline examples is almost endless. Certainly, the $25 billion National Mortgage Settlement associated with the 2007-08 financial crisis is a good example. For those of us who have lived through these regulatory actions, we appreciate the headline regulatory cost is just a starting point. The actual cost, including operating, legal, consultants, etc., is some multiple of the headline cost.
Data is more available than ever. True, there is an increased focus on data security to protect customer data. Even so, data availability has increased dramatically, especially as the organization's ability to manage data efficiently has improved. To wit:
1. Decreased cost of data storage and cloud technologies,
2. Increased bandwidth and data transportation technology like APIs,
3. Improved data analysis effectiveness with data science tools (like Artificial Intelligence), and
4. The ability to convert unstructured data (like documents or voice recordings) into useful structured data with Optical Character Recognition (OCR) and Natural Language Processing (NLP) technologies.

The call for an economics-focused Scale Unit

b. The scale unit defined

The remainder of this article is about the big convergence and company scale unit(s). That is:

Utilizing the scale unit to solve the causal and related economic measurement challenge and
Leveraging fast-growing and improving data technology. All the while,
Driving economic and broader customer objectives.

First, this concept is not new to many industries; only the products and scale are changing. As a banking example, in the 1980s and 1990s, starting primarily with Citibank, Capital One, and other bank data pioneers, credit card companies began realizing the power of data. In fact, Capital One, the brainchild of Richard Fairbank and Nigel Morris, was built specifically to utilize an "Information-Based Strategy" or "IBS." Full disclosure, I previously worked for both organizations (or predecessor organizations). To some degree, the use of data in credit card companies is easier, since the card product is already "data and automation-ready." The change today is that traditionally more "data and automation-challenged" bank products, like mortgages, are now more amenable to analytics and automation. This occurred because of the data management improvements mentioned earlier. In general, banking products fall in the following scale:

Please see our Automation Adaptability Framework as a way to identify your business' automation readiness in the context of common operating features. More context is provided in the appendix. Next, we show how this framework is generalizable across most industries. The framework shows that more complex and regulatory-controlled products are more challenging to apply data-focused techniques. The good news is, newer technologies and increased data availability make the use of automation and information-based strategies more available to these complex products.

What is a scale unit? These are company organizations that:

Quantify the value of risk management (or related organization) actions and
Drive economic effectiveness via the use of Randomized Control Trials (RCT) or other related automation and analytical strategies. [iii]

RCT is considered the gold standard for scientific studies, specifically for establishing causality. In the current data science world, causality is a big challenge. Done correctly, RCT is a foolproof way to determine that x caused y [X → Y]. In this case, a specific economic policy (x) caused the company to better serve a bunch of customers and/or save a bunch of money (y). As will be discussed in the next examples, RCT is not the only way to establish some level of causality, but it is a good way AND is available today!
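
To make the X → Y idea concrete, here is a minimal Python sketch of the two mechanical steps in an RCT: randomly assigning customers to control and treatment groups, then comparing average outcomes. It is an illustration under stated assumptions, not a production test design. The function names, the customer ids, the simulated savings figures, and the built-in true effect of 20 are all hypothetical.

```python
import random
from statistics import mean


def assign_groups(customer_ids, seed=2023):
    """Randomly split customers into equal-sized control and treatment groups."""
    rng = random.Random(seed)        # fixed seed so the assignment is reproducible
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}


def estimated_effect(outcomes, groups):
    """Difference in mean outcome (y) between treatment and control."""
    treated = mean(outcomes[c] for c in groups["treatment"])
    control = mean(outcomes[c] for c in groups["control"])
    return treated - control


# Hypothetical data: 1,000 customers, where the policy (x) truly adds 20
# to an outcome (y) that otherwise averages 100, plus random noise.
groups = assign_groups(range(1000))
treated_ids = set(groups["treatment"])
noise = random.Random(1)
outcomes = {
    c: 100.0 + (20.0 if c in treated_ids else 0.0) + noise.gauss(0, 5)
    for c in range(1000)
}

print(f"Estimated effect of x on y: {estimated_effect(outcomes, groups):.1f}")
```

Because assignment is random, the two groups should be alike on average in everything except the policy, so the difference in means can reasonably be attributed to the policy itself.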

This article takes no position on exactly where such a unit should be organized. This depends on the specific organizational circumstances. With that said, next are considerations to maximize scale unit success chances:

Scope breadth - the broader its organizational scope, the more successful the scale unit will be.
Data resource access - data is the scale unit’s raw material. Broad data access is a success enabler.
Customer access - access to customer operations is a success enabler.
Analytical resources - the scale unit needs access to unique programming and analytical computing platforms. The scale unit will hire top analytical talent from a variety of disciplines.
Testing operations - the scale unit will need its own testing operations.

I do want to point out a significant data science challenge today as it relates to industries subject to regulatory scrutiny. Banking is certainly subject to regulatory influence, but many other industries are as well. In today's platform company world, as led by companies like Google, Amazon, Netflix, etc., the need for causality is sometimes downplayed. In the case of selling, say, a consumer product via Google, a causal determination is not so important. For example, the fact that someone searched for a product before does not cause them to buy the same product in the future; it is just a probabilistic correlation. That correlation is potentially enough to identify a group of customers more likely to buy a product when presented in a search engine advertisement. The marketer does not necessarily know whether some treatment caused an outcome or was only correlated with an outcome. Their incentives relate to the organization making more revenue, and they can show positive marketing effectiveness metrics. Incentive-correlated outcomes may enable marketers to reach their goals and are likely easier to achieve when causation is not required. Marketing effectiveness measures are a standard way by which marketing budgets are allocated in large organizations.

For marketers, that may be fine. Industries subject to higher regulatory oversight may not have that luxury. Banking is one of those industries. This means banking is in the business of why. If a loan applicant is declined for a loan, by law, the bank must provide the customer a causal-based "why" adverse action explanation. If a customer’s security portfolio value drops, the customer is going to demand to know why. If a person applies for a mortgage, the application is declined, and the applicant belongs to a protected class (like race, gender, age, etc.), the bank is required by law to track why the applicant was declined, in order to confirm the decline decision was not a result of disparate treatment. [iv]
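
The gap between correlation and causation can be illustrated with a small simulation. The Python sketch below is hypothetical throughout: the hidden "engaged" customer trait, the response rates, and the true effect of 0.05 are assumptions chosen only to show the mechanism. In the observational world, engaged customers are both more likely to receive the treatment and more likely to respond anyway, so a naive comparison overstates the effect (about 0.17 in expectation under these assumptions). In the randomized world, a coin flip decides treatment, and the estimate should land near the true 0.05.

```python
import random


def simulate(n=20000, true_effect=0.05, seed=7):
    """Compare a naive observational estimate with a randomized estimate."""
    rng = random.Random(seed)
    # Each tally maps treated? -> [number of responses, number of customers]
    observational = {True: [0, 0], False: [0, 0]}
    randomized = {True: [0, 0], False: [0, 0]}

    for _ in range(n):
        engaged = rng.random() < 0.5              # hidden confounder
        base_rate = 0.30 if engaged else 0.10     # engaged customers respond more anyway

        # Observational world: engaged customers are far more likely to be treated.
        treated = rng.random() < (0.8 if engaged else 0.2)
        responded = rng.random() < base_rate + (true_effect if treated else 0.0)
        observational[treated][0] += int(responded)
        observational[treated][1] += 1

        # Randomized world: a coin flip decides treatment, independent of engagement.
        treated = rng.random() < 0.5
        responded = rng.random() < base_rate + (true_effect if treated else 0.0)
        randomized[treated][0] += int(responded)
        randomized[treated][1] += 1

    def rate_difference(tally):
        treated_rate = tally[True][0] / tally[True][1]
        control_rate = tally[False][0] / tally[False][1]
        return treated_rate - control_rate

    return rate_difference(observational), rate_difference(randomized)


naive_estimate, rct_estimate = simulate()
print(f"Naive (correlational) estimate: {naive_estimate:.3f}")
print(f"Randomized (RCT) estimate:      {rct_estimate:.3f}")
```

For a marketer who only needs to find likely buyers, the inflated naive number may be good enough. For a bank that must explain why, it is not.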

4. The scale unit - examples and essentials

In this section, by example, we will provide an explanation of how banks and other industries could use a scale unit (or similar) to drive economic effectiveness, in the causal context. We also provide scale unit essentials. These are characteristics of a successful scale unit based on decades of experience.

The mortgage servicing example
Top 10 essentials for enterprise-level scale unit success
Compliance automation example
The economics of "I'm sorry"

a. The mortgage servicing example

It is no secret that the post-pandemic world is likely to have its share of financial troubles. The Consumer Financial Protection Bureau (CFPB) has been vocal about lenders, and in particular mortgage companies, being ready to help their customers. On April 1, 2021, it issued a bulletin that made those expectations very clear.

"CFPB Compliance Bulletin Warns Mortgage Servicers: Unprepared is Unacceptable"

But borrowers are not always easy to reach. Generally speaking, proactive customer contact can lead to a higher resolution rate and a lower loss rate. So, what can a mortgage lender do to improve its contact rates? The United States has something to learn from its neighbors across the pond.....

The Behavioural Insights Team (BIT), also known unofficially as the "Nudge Unit", is a U.K.-based social purpose organization that generates and applies behavioral insights to inform policy and improve public services. [v]

In the build-up to and aftermath of the 2008 financial crisis, Northern Ireland was hit particularly hard by a housing boom and bust. Many homeowners still face negative equity, delinquency, and ultimately the risk of foreclosure. One of the key behavioral challenges is encouraging homeowners at risk to engage proactively with their lenders so that effective solutions can be found. BIT was commissioned by the Department for Communities Northern Ireland to develop and test a range of behavioral interventions to increase loss mitigation-related customer contact and engagement. A report was created in June 2018 that outlines the results and is summarized in the next section.

Please note, BIT is using Randomized Control Trials (RCTs). In this case, the control group is important to confirm the baseline customer contact business environment, before the new customer contact intervention. That is, RCTs are necessary to determine causality. Meaning, to determine whether the new risk management tactics caused improvement as compared to the existing risk tactics. Please keep in mind, these results need to be validated in your unique environment. While likely a good starting point, these results are not a substitute for performing your own RCT testing.

Given the potential post-pandemic challenges to mortgage servicers globally, this is a particularly interesting study. Customers with payment challenges are often difficult to contact. Plus, many people no longer have a traditional landline-based home phone. As such, if a customer does not wish to be contacted, it is easier today for them to avoid calls from their lenders.

Communicating with collections customers is generally challenging, yet it is a critical step toward encouraging a customer payment or developing a loss mitigation strategy. The tests used several different communication approaches to drive customer response. These approaches included:

Letters, email, and text reminders;
Behaviorally informed calls to action including personalization, loss aversion, and reciprocity; and
Handwritten notes to increase salience.

The response was measured by contact rate for both inbound calling and collections agent outbound calling. The first and third test results demonstrate a significant customer contact improvement over the baseline control results. While a higher contact rate does not automatically lead to a 1-to-1 decrease in loss rates, it will very likely make directional improvements. The results will inform ongoing collections effectiveness by:

Increasing collections customer contact and
Effectively resolving customer delinquency.
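
To make "significant improvement over the baseline control" concrete, here is a minimal sketch of one common way to compare a test group's contact rate to a control group's: a two-proportion z-test. The counts below are hypothetical and are not taken from the BIT report, and the function name is an illustration only. Your own analytics team may prefer a different test.

```python
import math

def two_proportion_z_test(contacts_test, n_test, contacts_control, n_control):
    """Return (lift, z, two-sided p-value) for test vs. control contact rates."""
    p_test = contacts_test / n_test
    p_control = contacts_control / n_control
    # Pooled rate under the null hypothesis that both groups share one contact rate.
    pooled = (contacts_test + contacts_control) / (n_test + n_control)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_control))
    z = (p_test - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_test - p_control, z, p_value

# Hypothetical counts: 260 of 1,000 test customers contacted vs. 200 of 1,000 control customers.
lift, z, p_value = two_proportion_z_test(260, 1000, 200, 1000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be chance alone. It says nothing about whether the lift is large enough to matter economically, which is a separate business judgment.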

b. Top 10 essentials for enterprise-level scale unit success

My experience includes behavioral economics and operating leadership in large banking and consumer products organizations. I have led scale unit teams that integrated behavioral economics. I also led operating divisions that integrated scale unit-like teams and cultures. Naturally, I was VERY fortunate to be surrounded by many talented, dedicated people! We used a wide range of analytical techniques (some AI-related), data sources, Randomized Control Trial (RCT) techniques, and other behavioral techniques to manage bank credit loss exposure and optimize lending program performance. We used the same techniques to optimize the customer experience and deliver long-term growth. We were large enough that we had our own testing operation, which meant we had dedicated customer agents trained for behavioral testing. We also had systems designed to quickly integrate results with the test design parameters. In my career, I have managed or overseen thousands of RCTs. These were some of the coolest jobs I ever had!

Top 10 List for Scale Unit Success

Listen to your customers! Go beyond "The Matrix" data view of your customers. Have customer round tables and focus groups. Great testing ideas come from listening to your customers.
Data scientists should build "real" customer context. Personally, I think all data scientists and people in related roles should have some kind of regular customer interaction. This helps make the messy world of emotions and behavior real for the data scientist.
Seek unique data about your customers. This could be an insight from existing data or it could be new and unique data.
There is a balance between the data scientist and data collection. In general, you want to keep your data scientists focused on building customer insight via the data. The data scientist should not spend too much time collecting and preparing the data.
Some "data digging" is ok. Data Scientists often do not like data digging. Data digging is code for the messy ETL-related data processes needed for less structured data sets. It can be grinding work. I call this the "meta metadata." That is the story behind the data dictionary. It can be time-consuming and take away from primary data analysis. While I hope Data Scientists spend the majority of their time analyzing, some data digging can be instructive and can lead to a "digging for gold" outcome by finding unique competitive insights.
Test new Artificial Intelligence techniques. My observation is that new analytical techniques are often no better than "tried and true" techniques like Regression and Decision Trees. However, we always learn something new and useful in the process, beyond the fact that new AI techniques were not always effective. It is worth the exploration, just not for the reasons you may expect.
Test execution is critical. Commit the resources for proper test execution. Testing systems may include:
1. Testing program guides,
2. Coding to differentiate test and control groups,
3. Collecting performance results,
4. Scripting for agents or customers,
5. Availability of characteristic data and related testing information, and
6. Analytical resources to analyze and provide post-test results and recommendations.
Test with a successful scaling outcome in mind. Meaning, assuming success, how will this test be rolled out and scaled in our base business? Unfortunately, I know of successful tests that failed to impact business results. This occurred because of a failure to scale.
Causality is key! RCT is necessary to drive confidence in the causal nature of your results. It will also help business leaders understand the value of risk testing. Often, a small (but statistically significant) percentage risk test gain will lead to a significant bottom-line improvement. By the way, not every test is suitable for RCT. If you test without a control group, be very explicit about what you hope to learn and the potential learning limitations.
Useful for many organizations. With today's information technology, scale units can be useful for many products or services companies.
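
The "coding to differentiate test and control groups" item in the list above can be sketched in a few lines. This is a minimal illustration, assuming each customer has a stable ID. The function name, the test name, and the 50/50 split are hypothetical. Hashing gives a repeatable, pseudo-random assignment, so a customer lands in the same group every time a system looks them up.

```python
import hashlib

def assign_group(customer_id: str, test_name: str, test_share: float = 0.5) -> str:
    """Deterministically assign a customer to 'test' or 'control'."""
    digest = hashlib.sha256(f"{test_name}:{customer_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex characters to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "test" if bucket < test_share else "control"

# Hypothetical customer IDs and test name.
customer_ids = [f"C{n:05d}" for n in range(10_000)]
groups = [assign_group(cid, "handwritten-note-pilot") for cid in customer_ids]
test_count = groups.count("test")
print(test_count, len(groups) - test_count)
```

Including the test name in the hash means a customer's group in one test does not determine their group in the next test.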

c. Risk and Compliance automation examples

This example is specific to obligation compliance testing. This example will be different from the other article examples for a couple of reasons.

It is newer. Traditionally, analytics has been more focused on credit risk management than compliance risk management organizations, and
While Randomized Control Trials (RCT) are certainly possible, this example is focused more on automating compliance testing.

In the main, control groups are not always necessary or even desired for compliance.

Conceptually, a control in compliance testing could include being out of compliance. It would likely not make sense for an organization to purposely be out of compliance for the sake of an RCT! There are a few creative ways to overcome this, generally known as natural experiments.

Organizational grouping: Two different but similar enough organizations may be comparable from an operating and risk requirements standpoint. But circumstances may provide an environment where these similar organizations face different compliance needs or treatments. For example, what if one of the operating groups was doing a better job implementing certain laws concerning customer communication? You could then compare one group to another.
Data twinning: Within an organization, sometimes mistakes happen. These mistakes may create organizational risks. As organizations scale, the ability to evaluate the operational quality of every customer interaction becomes a challenge. Quality Control is the means by which organizations test the quality of those customer interactions. This testing often goes by a generic-sounding title such as "QC testing," "transactional testing," or "transactional file testing." Almost all quality control requirements have some minimum quality threshold, like 95% error-free. That means that 5% of the customer transactions may have some risk or compliance error. The test group is the group with an error and the control group is the error-free or "good" group. "Data twinning" is where the customer transactions in each group are tailored to only keep those that are very similar. Properly constructed data twinning groups can support causal conclusions. Group similarity could include:
- age of customer relationship
- customer demographics
- risk experience (like the number of times delinquent)
- product type
- geographic location
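
As a rough illustration of data twinning, the sketch below pairs each error transaction with an error-free transaction that shares the same profile. It uses exact matching, which is only one of several possible matching approaches. The field names, the bands, and the sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical similarity characteristics used to define a "twin".
MATCH_KEYS = ("tenure_band", "product", "region", "times_delinquent")

def build_twins(transactions):
    """Pair each error transaction with an error-free one sharing all MATCH_KEYS."""
    good_by_profile = defaultdict(list)
    for txn in transactions:
        if not txn["has_error"]:
            good_by_profile[tuple(txn[k] for k in MATCH_KEYS)].append(txn)

    twins = []
    for txn in transactions:
        if txn["has_error"]:
            candidates = good_by_profile.get(tuple(txn[k] for k in MATCH_KEYS), [])
            if candidates:
                # Use each error-free transaction at most once.
                twins.append((txn["id"], candidates.pop()["id"]))
    return twins

transactions = [
    {"id": "T1", "has_error": True, "tenure_band": "1-3y", "product": "mortgage", "region": "NE", "times_delinquent": 1},
    {"id": "T2", "has_error": False, "tenure_band": "1-3y", "product": "mortgage", "region": "NE", "times_delinquent": 1},
    {"id": "T3", "has_error": False, "tenure_band": "5y+", "product": "mortgage", "region": "NE", "times_delinquent": 0},
    {"id": "T4", "has_error": True, "tenure_band": "5y+", "product": "card", "region": "SW", "times_delinquent": 0},
]
print(build_twins(transactions))
```

Error transactions without a close enough twin (T4 in this sample) drop out of the comparison. That trade-off is typical of matching: tighter similarity rules give cleaner comparisons but fewer pairs.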

In risk testing, it is important to discern potential type II errors (False Negatives) and type I errors (False Positives). Earlier, we suggested there are only 2 outcomes - "Good" or "Error" groups. In fact, there are 4 potential outcomes. The two additional outcomes occur because quality testing uses a model of operational reality and that modeled reality is sometimes wrong or incomplete. The "5% error group" discussed earlier contains the identified errors. If it turns out some of these errors are actually not errors, then this is known as a type I error (False Positive). This occurs when errors initially identified by the risk testing group are challenged and overturned by business operations. Compared to manual human testing, good automation routines are generally able to appropriately identify errors, reducing the incidence of false positives.

But what if an error actually did occur but was not identified in the risk test? This is very different from the false positive. These are more dangerous errors and are known as type II errors or false negatives. In my experience in large companies, hidden errors occur all the time. Corporate testing regimes are good but not perfect. As we discuss next, incentives may discourage uncovering type II errors. Also, type II errors are likely fodder for costly regulatory actions. The discussion in the "Automated customer communication testing example" below provides an approach to uncover previously undiscovered errors.
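
The four outcomes can be tallied with a short sketch. It assumes a second, more thorough review has established what actually happened for each transaction, which is itself an assumption that is hard to meet in practice. The counts are hypothetical.

```python
from collections import Counter

def classify(test_flagged_error: bool, error_actually_occurred: bool) -> str:
    """Label one tested transaction with one of the four possible outcomes."""
    if test_flagged_error and error_actually_occurred:
        return "true positive"
    if test_flagged_error and not error_actually_occurred:
        return "false positive (type I)"
    if not test_flagged_error and error_actually_occurred:
        return "false negative (type II)"
    return "true negative"

# Hypothetical (flagged by test, actually occurred) pairs for 1,000 transactions.
reviewed = (
    [(True, True)] * 40
    + [(True, False)] * 10
    + [(False, True)] * 5
    + [(False, False)] * 945
)
counts = Counter(classify(flagged, actual) for flagged, actual in reviewed)

# Share of error-free transactions wrongly flagged, and share of real errors missed.
false_positive_rate = counts["false positive (type I)"] / (
    counts["false positive (type I)"] + counts["true negative"]
)
false_negative_rate = counts["false negative (type II)"] / (
    counts["false negative (type II)"] + counts["true positive"]
)
print(counts, false_positive_rate, false_negative_rate)
```

Note that a routine QC report would show only the 50 flagged transactions. The 5 missed errors are invisible unless someone goes looking for them.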

Not all errors are created equal! An organizational incentives case study:

Interpretation: You may think of the "operational reality" dimension as what actually happened regarding past customer interactions. You may think of the "testing outcome" dimension as the modeled estimate of what should have happened. Incentives have a way of impacting both what actually happened AND the testing of what should have happened.

Type I errors seem reasonable. Testing is not perfect and the validation of test errors generally turns up important learning about data, processes, testing routines, etc. However, business operations really dislike Type I errors. If risk management criticizes business operations, operational leaders may perceive false positives as unnecessarily "calling my baby ugly." Also, quality errors are often used as input to incentive compensation**. Thus, individual business participants have incentives to aggressively challenge these errors. They are literally fighting for their money.

[** "Incentive compensation" may be in the form of a direct incentive, like a bonus directly tied to certain quality goals. Also, "incentive compensation" may be an indirect incentive, where bonus compensation includes quality goals as one of a basket of goals evaluated for bonus payout.]

Type II errors are dangerous. These are the unseen errors that may lead to regulatory action or even existential challenges to a company. Type II errors can start small, but left unchecked, may metastasize like cancer. But there is generally little incentive to discover Type II errors. People do not like to spend energy on something they do not feel accountable for! In fact, the business unit has a plausible defense for not checking for false negatives, such as: "Hey, the risk management organization did not even find this error and that is their job! I was too busy taking care of customers and chasing the errors they did find."

Also, there could be an incentive misalignment, causing a disincentive to uncover type II errors. This incentive challenge anatomy looks something like this: Business participants may be paid on quality. That quality measure is generally only based on found errors. In this case, they may be incented to "look the other way" on errors not discovered in the formal risk management testing processes. From a practical standpoint, finding undetected type II errors takes work. Most operating folks are already over capacity and "filling a 5-pound bag with 10 pounds." Just from a work capacity standpoint, the addition of "one more thing" is often unwelcome.

Next, we cite a prototypical example of a type II error that may have started small but metastasized to become a near-existential enterprise challenge. At the time of this U.S. Department of Justice legal action, Wells Fargo was one of the largest banks in the United States.

Wells Fargo Agrees to Pay $3 Billion to Resolve Criminal and Civil Investigations into Sales Practices Involving the Opening of Millions of Accounts without Customer Authorization

Those of us who have been involved in responding to similar enforcement actions appreciate that "$3 Billion" is only the headline number. The final cost will be a significant multiple of $3 Billion when you include employee, consulting, new technology, and other related costs.

The facts and circumstances of this scandal have all the trappings of type II error-based misaligned incentives. In a bank the size of Wells Fargo, there is a large and diffuse group of employees who “should have known” that millions of fake accounts were opened over many years for unsuspecting customers. It may seem unbelievable that the “should have known” bank employees were unable to put a stop to this fake account sales practice. It may seem unbelievable that risk management quality testing was unable to detect the fake account sales practice and escalate the issue to put a stop to it.

But such is the power of incentives. Misaligned incentives can be particularly nefarious in large diffuse organizations, where individual accountability is less clear.

Thus, as a rule of thumb, organizational incentives have a tendency to overemphasize type I errors and underemphasize dangerous type II errors. Next are best practice suggestions for overcoming incentive and testing-borne challenges:

Good testing automation processes help to reduce type I and type II errors.
Being VERY thoughtful about organizational incentives is really important. Misaligned incentives have a way of leading to "you get what you pay for" unintended consequences associated with type II errors.
Create a culture that rewards creative risk thinking. Type II errors are detected by "out of the box" thinking. Leaders should encourage the creative thinking necessary for detecting previously unknown challenges.
Finally, utilizing a structured risk portfolio decision process will optimize your limited risk testing resources across your risk portfolio. Optimized risk resources help to reduce the impact of potential type I and type II errors.

We discuss choice architecture in the resource section as a means to enhance your risk portfolio decision process.

Customer compliance communication testing example

In banking and many other industries, disclosures are required to ensure that consumers understand key aspects of the economic transaction. Think of a mortgage or other loan you may have taken out. The disclosures are the mound of paper or digital documents sent to you. If you are like many borrowers, you likely acknowledged receipt without completely understanding the disclosures.

As a testing example, what if communication effectiveness were an objective for compliance disclosures? I know! Crazy talk! Sadly, most consumer disclosures seem to be designed to not be read and to discourage understanding. Behavioral economists describe such “not to be understood disclosures” as containing sludge. Over time, it is expected regulators will get more aggressive with compliance disclosure expectations. If a progressive organization wanted to reduce disclosure sludge, it could do so via an RCT structure. Proactive testing would help the organization learn which communication techniques are best suited to increase long-term customer value. Also, sometimes natural experiments may occur across multiple operating groups performing a similar function. This could create a valid RCT-like test environment. [vi]

Automated customer communication testing example

The benefits of automated testing include:

increase compliance testing coverage,
decrease testing costs, and
improve testing quality.

By the way, this example is a composite of actual recent experiences across multiple company departments or divisions.

From a customer and regulator standpoint, customer communication and documents (letters, statements, emails, texts, promissory notes, disclosures, etc.) are the customer's "system of record." That is, customer communication and documentation are the ultimate confirmation source that the company has met various regulatory, investor, and other obligations. Because customer communication is often stored as unstructured data, it requires cost-effective automation capabilities to interpret documents, ingest data, and evaluate regulatory obligations. See the following graphic to compare the bank or company perspective to the customer's perspective.

Also, an operational complication could arise if third parties are involved in the creation and transmission process of customer communication and documentation. Given this, the ability to structure data and apply obligation tests is critical for testing the “customer view” and is the essence of compliance automated testing.
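
To give a feel for what "structure data and apply obligation tests" can mean, here is a deliberately simple sketch that checks the text of a customer letter against a few rules. The rule names and patterns are hypothetical placeholders, not actual regulatory requirements. A production approach would likely need document parsing and far more robust text interpretation than regular expressions.

```python
import re

# Hypothetical obligation rules; real rules would come from the applicable obligations.
OBLIGATION_RULES = {
    "contact_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "payment_due_date": re.compile(r"\bdue (?:on|by) \d{1,2}/\d{1,2}/\d{4}\b", re.IGNORECASE),
    "dispute_rights": re.compile(r"\bright to dispute\b", re.IGNORECASE),
}

def check_letter(text: str) -> dict:
    """Return a pass/fail flag for each obligation rule."""
    return {name: bool(pattern.search(text)) for name, pattern in OBLIGATION_RULES.items()}

# Hypothetical letter text as the customer would have received it.
letter = (
    "Your payment of $1,250.00 is due by 03/15/2024. "
    "Questions? Call 800-555-0100."
)
results = check_letter(letter)
missing = [name for name, passed in results.items() if not passed]
print(results, missing)
```

Because the same rules can run against every letter rather than a small sample, this style of testing is one way to widen coverage. Flagged letters would still need human validation to separate true errors from false positives.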

In general, automated testing is an updating process as communication, documents, and algorithms are validated. Below are key automation outcome categories, resolutions, and success suggestions depending on the nature of the automated testing outcomes.

For more information, please see our article Making the most of Statistics and Automation. [vii]

d. The economics of "I'm sorry"

John List is a University of Chicago professor and author. Dr. List focuses on using RCT-type field experiments across a variety of industries. Next is an example from the ride-share company Uber. Dr. List was Uber’s Chief Economist. This is an example of a testing approach for managing customer risks. [viii]

All companies make mistakes. This example describes how a company may apologize in an optimal way. That is, a way that makes the customer happy AND costs the company as little as possible to achieve that customer happiness. The bottom line is that cost and long-term value are NOT linearly related. A lower-cost apology may cost less today but may cost the company more value over time. Making the proper apology investment is critical!

As part of its experiment, List’s team analyzed data from 1.5 million Uber riders who had arrived at their destinations late. As part of the test design, the team used the “data twinning” technique. According to Dr. List:

“So what we did was find statistical "identical twins" in the data: two consumers who were identical up to a point in time, but then at that point one of them received a bad trip whereas the other received a good one. Because Uber executes nearly 15 million rides per day, there were plenty of statistical twins to explore.”

The team divided the customers into various twin-based test groups that all had less than satisfactory ride experiences. Each of them was emailed a different type of apology from the company or received no apology at all. Half of the groups also received a $5 coupon along with the email. The researchers found that money spoke louder than words.

Here is the important point:

Test Group A: The groups that received the $5 coupon, along with the apology, actually spent more money than usual on Uber rides over the next few months.

Test Group B: The groups that only received an apology went on to spend less than usual on future rides.

Outcome: Spending more on the appropriate apology was an investment leading to higher long-term customer value.
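
The arithmetic behind "apology as an investment" can be sketched as follows. All numbers here are hypothetical and are not from the Uber study. The margin figure and the function name are assumptions made only to show the calculation.

```python
def net_value_per_rider(spend_after: float, baseline_spend: float,
                        margin: float, apology_cost: float) -> float:
    """Incremental margin from the change in spend, minus the cost of the apology."""
    return (spend_after - baseline_spend) * margin - apology_cost

# Hypothetical inputs: usual spend per rider over the follow-up period and contribution margin.
baseline = 100.0
margin = 0.25

coupon_and_apology = net_value_per_rider(130.0, baseline, margin, apology_cost=5.0)
apology_only = net_value_per_rider(95.0, baseline, margin, apology_cost=0.0)
print(coupon_and_apology, apology_only)
```

Under these made-up inputs, the apology that costs more up front produces the higher net value. Whether that holds for your customers is exactly what an RCT is meant to reveal.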

"What's important here is that the firm needs to make sure that when they apologize, they take proper discretion in that the consumer understands that there was a true cost to that apology," List said.

5. Conclusion

This article considers scaling organizations in the context of improving information technology and economic science. Since companies are in the business of "why," we encourage organizations to have an economics-oriented scale unit. That is, an analytically focused organization that performs behavioral and automation-based testing and analytics. It helps you understand the economic value of business activities and make decisions to optimize risk and improve customer delight. We also described some of the organizational pitfalls and how to implement a data science organization in the context of a M*A*S*H unit, data-focus, and the flywheel effect.

Resources

Definitive Pro: For corporate and larger organizations - This is an enterprise-level, cloud-based group decision-making platform. Confidence is certainly important in corporate or other professional environments. Most major decisions are made in teams. Group dynamics play a critical role in driving confidence-enabled outcomes for those making the decisions and those responsible for implementing the decisions.

Definitive Pro provides a well-structured and configurable choice architecture. This includes integrating and weighing key criteria, overlaying judgment, integrating objective business case and risk information, then providing a means to prioritize and optimize decision recommendations. There is a virtually endless number of uses, just as there is an almost endless number of important decisions. The most popular use cases include M&A, Supplier Risk Management, Technology and strategic portfolio management, and Capital planning.

Next are a few whitepapers and examples of how to make the best organizational decisions:

Notes

[i] Please see the article It's the (Democracy-Poisoning) Golden Age of Free Speech. Tufekci points to the use of Twitter and other social media as an example of non-curated data drowning strategies used by some politicians. Also, see our article Information curation in a world drowning in data noise. This article provides insight and tools to be a responsible information curator in a world drowning in data noise.

[ii] In this article, the use of "bank" is meant broadly as a convenient synonym for all financial services, bank OR non-bank, companies. Certainly, there are nuances between financial product company regulatory charters that may drive the difference in the effectiveness of a scale unit.

[iii] There is much literature on Randomized Control Trials. Please see this article for a nice overview: Randomized Controlled Trials

[iv] See our article for the banking legal structure impacting causality and the use of data:

Hulett, Resolving Lending Bias - a proposal to improve credit decisions with more accurate credit data, The Curiosity Vine, 2021

Judea Pearl, in The Book Of Why, does a nice job of describing the importance of causation in terms of a ladder. Causality needs to get to at least the second rung of the causation ladder, whereas correlation is only at the first rung.

[v] Utilizing behavioral economics and behavioral psychology theory, BIT helps companies, governments, and related organizations improve various socially important goals. The Behavioural Insights Team is headed by psychologist David Halpern. BIT is affiliated with the United Kingdom government. It was originally chartered as a cabinet-level office. Today, it is a UK-based social purpose limited company. BIT has performed over 500 RCTs and runs over 750 projects. The following Mortgage Servicing example is one of BIT's projects.

[vi] Thaler, Sunstein, Nudge, The Final Edition, 2021

Thaler and Sunstein make a great case and provide an approach to reduce disclosure-based sludge. They call it Smart Disclosure. While it is intended more for the government, it could be used in banking and applied to common product disclosures. While they do not advocate for a particular approach, my head goes straight to blockchain-enabled technology. What if people who close financial products provided anonymized disclosures to a central "smart disclosure" engine? This engine would convert anonymized documents into data able to be loaded on a blockchain. The data could be made available to consumer-friendly apps to let users know what to "really" expect in a manner that is much more consumer-friendly than the current disclosure sets.

[vii] Key automation outcome categories, defining False Positives and False Negatives -

False Positives: A false positive error, or false positive, is a result that indicates a given condition exists when it does not. For example, a cancer test indicates a person has cancer when they do not. A false positive error is a type I error where the test is checking a single condition and wrongly gives an affirmative (positive) decision. However, it is important to distinguish between the type I error rate and the probability of a positive result being false. The latter is known as the false-positive risk.

False Negatives: A false negative error, or false negative, is a test result that wrongly indicates that a condition does not hold. For example, when a cancer test indicates a person does not have cancer, but they do. The condition "the person has cancer" holds, but the test (the cancer test) fails to realize this condition, and wrongly decides that the person does not have cancer. A false negative error is a type II error occurring in a test where a single condition is checked for, and the result of the test is erroneous, that the condition is absent.

Implications

Depending on the test context, the error type has significantly different implications. The cancer example is closest to banking transactional testing. That is, a false positive can be annoying or cause the patient or client unnecessary apprehension. A false negative can be deadly, that is, cancer remains and is undetected. In the case of bank risk testing, a false positive can create a customer service problem or a false risk signal. A false negative can enable the very risk the test is trying to detect. That is, not identifying credit, compliance, or fraud risk when it exists. False negatives are often the basis for regulatory enforcement action.

- Excerpt from our article Making the most of Statistics and Automation

[viii] List, The Voltage Effect: How to Make Good Ideas Great and Great Ideas Scale, 2022

[ix] Per Jim Collins: "The Flywheel effect is a concept developed in the book Good to Great. No matter how dramatic the end result, good-to-great transformations never happen in one fell swoop. In building a great company or social sector enterprise, there is no single defining action, no grand program, no one killer innovation, no solitary lucky break, no miracle moment. Rather, the process resembles relentlessly pushing a giant, heavy flywheel, turn upon turn, building momentum until a point of breakthrough, and beyond."

Collins, Good To Great, 2001

Appendix

The following diagram describes typical loan products most adaptable to automation (on the right side of the axis), as opposed to those least adaptable to automation (to the left). Generally, higher volume, homogeneous products will be more adaptable to automation. Below are the loan products and their related features. These features help dimension the products and their relationship to automation adaptability.

Today’s automation and AI-related tools make it easier to unlock adaptability in traditional “Lower Automation Adaptability” products.

Stay Curious.


1. Implementing a scale unit flywheel - Every scale unit needs Radar O'Reilly

a. The challenge: How Incrementalism and the Pareto principle impact data science