
Spotting Accounting Shenanigans with AI

Hey, it’s Matt. You’re reading AI Street, a bi-monthly interview series exploring how investors are adopting AI. This week:

🕵️‍♂️ The CEO of Transparently.ai, Hamish Macalister, PhD, on identifying red flags with AI.


INTERVIEW

I spent six years writing about white-collar crime for Bloomberg News. In that time, I learned that accounting fraud cases were among the longest for the SEC to investigate and the hardest to bring.

I was surprised. I naively thought, “Well, if the company is cooking the books, eventually folks will find out, right?” But that’s not always the case. Bad actors can use events outside their control—like COVID—to bury years of weak numbers.

It’s just hard to police accounting statements. And even if you latch on to what you think is a significant issue, sometimes it doesn’t matter because the company is massive. I once wrote about a company that stuffed six months of revenue into a quarter and the market basically shrugged. Granted, this detail does not inspire confidence.

To dig deeper into these challenges, I spoke with Hamish Macalister, co-founder and CEO of Transparently.ai, which uses traditional AI and large language models to assess signs of accounting manipulation.

Transparently.ai rates the accounting health of 80,000+ public companies on an A-to-F scale, flagging early signs of manipulation and potential failure. Founded in 2021, the Singapore-based company counts among its clients two of the Big Four auditors, as well as money managers overseeing $4 trillion in assets.

Macalister worked as a macro strategist at Citigroup, led quantitative strategy in Asia at Deutsche Bank, and later served as chief data scientist at Firth Investment Management. He also earned a PhD in finance, where his doctoral research on analyst forecasts laid the groundwork for Transparently.ai’s approach.

In this interview, you’ll learn:

  • Why accounting manipulation is more common than most investors think.

  • How avoiding high-risk companies based on these scores can generate meaningful alpha.

  • Why auditors and analysts miss red flags—and how AI can surface them. 

This interview has been edited for clarity and length. 

How does Transparently.ai help investors evaluate accounting nuances across industries?

This is a perfect problem for machine learning because it’s very complex and multidimensional, but also one for which there’s a great deal of data. That combination makes it well suited to machine learning.

There may be relationships a person would struggle to identify, but a machine can. Another advantage is that the machine isn’t wedded to traditional ways of thinking. For example, it might pick up on signals that an activist short seller would look at, but from a different angle.

One of the red flags might be unusually high margins—possibly a sign the company is faking revenue or hiding costs. That’s a classic example of what an activist short seller might look for.

From a machine learning or AI standpoint, the system might learn something similar: unusually high margins can be a warning sign. Our system does flag that from time to time. But it can also flag unusually low margins if it detects that certain combinations of features—low margins alongside other factors—may indicate a company is doing something unusual.
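To make the margin idea concrete, here is a deliberately simplified, hypothetical sketch: flag a margin that sits far from its sector peers in either direction. The function name, inputs, and z-score rule are all illustrative; the actual system combines far richer feature combinations than a single peer comparison.

```python
# Illustrative only: a peer-relative outlier check, not Transparently.ai's model.
import statistics

def margin_flag(margin: float, peer_margins: list[float], z: float = 2.0) -> bool:
    """Flag a margin that is unusually high OR unusually low versus peers."""
    mu = statistics.mean(peer_margins)
    sd = statistics.stdev(peer_margins)
    # Two-sided test: both directions can be a warning sign.
    return sd > 0 and abs(margin - mu) > z * sd
```

Note that the check is two-sided, echoing the point above: unusually low margins, in combination with other features, can be as informative as unusually high ones.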

Machine learning can identify very complicated patterns that may not be intuitively obvious. The one thing I’ll add to that is it cannot just be a black box—unless you’re a quant and all you care about is the black box. In that case, all you want is the numerical output: the risk, the number, the indicator.

But for most of the users we deal with, they want some sort of explanation behind this. So it’s critical to design the system not only to provide an indication that something unusual may be happening in a company’s accounts, but also to explain why and how. It should guide what you need to do next: what questions to ask management, what areas to investigate, and what procedures to implement if you’re an auditor, given the specific features of that company.

What was the a-ha moment for you to start this company?

The a-ha moment came when I was a quant fund manager marketing my fund, talking to private wealth advisors and others. I would casually mention accounting manipulation, since to me it was a very small part of the process.

But the reaction across the table was visceral—people would literally stop me mid-sentence: “Wait, stop there. How do you do that? I didn’t know that was possible.”

I kept hearing it again and again: “I didn’t know it was possible to quantify aspects of accounting manipulation, or to quantify the quality of the accounts.”

I heard it so many times that I started thinking: first, this is amazing, because clearly nobody seems to know about it. And second, while there’s actually quite a significant body of academic research in this space, very few people are aware of it—because very few people read accounting journal articles unless they have serious sleeping problems.

How big is this issue?

It’s a multi-trillion-dollar-a-year problem, and forget about what we say: there’s academic research that shows it. It’s a monster pain point for which, as far as we could tell, nobody else had come up with a solution.

Independent academic research finds that, in the U.S., on average 40%—four zero—of companies manipulate their accounts every year. That’s astonishing. Manipulation can range from something mild and permissible to outright fraud. At the extreme, the same research found that 10% of companies commit securities fraud annually. That’s mind-boggling.

Now, let’s just take that 10%. If you knew in advance which companies were doing this, you wouldn’t touch them with a barge pole. The numbers don’t matter if you can’t trust them. And if you can’t trust the numbers, your investment analysis breaks down.

But what we realized was that there wasn’t much understanding of just how widespread this problem is.

If you’re talking to, for example, the audit assurance team of a Big Four auditor, they know how big a problem it is because they see it firsthand. But for your typical investor or bank asset manager, while they recognize it as a pain point, there isn’t necessarily an appreciation for just how large it really is.

That’s why we started producing research showing, for example, the magnitude of return differentials between high-risk and low-risk companies in our system using true point-in-time data. This isn’t about a backtest.

We generate our risk scores and ratings for companies, then track their performance over the next 1, 3, 6, 9, 12, 24, and 36 months. We looked across all these different periods and compared the performance of high-risk companies versus low-risk companies.
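The tracking described above can be sketched in a few lines. This is a hypothetical reconstruction, not the firm's actual code: it assumes records carrying a scoring date, an A-to-F grade, and forward returns per horizon, and compares low-risk (A/B) against high-risk (E/F) cohorts date by date.

```python
# Hypothetical sketch of the point-in-time cohort comparison described above.
# `records` is a list of dicts with assumed keys: "date", "grade" (A-F),
# and forward returns keyed "fwd_1m", "fwd_3m", etc.
from collections import defaultdict
from statistics import mean

def risk_return_spread(records, horizons=(1, 3, 6, 9, 12, 24, 36)):
    """Per horizon: average forward return of low-risk (A/B) names minus
    high-risk (E/F) names, computed date by date, then averaged over dates."""
    spreads = {}
    for h in horizons:
        key = f"fwd_{h}m"
        by_date = defaultdict(lambda: {"low": [], "high": []})
        for r in records:
            if r["grade"] in ("A", "B"):
                by_date[r["date"]]["low"].append(r[key])
            elif r["grade"] in ("E", "F"):
                by_date[r["date"]]["high"].append(r[key])
        # Spread per scoring date, then averaged across time.
        per_date = [mean(b["low"]) - mean(b["high"])
                    for b in by_date.values() if b["low"] and b["high"]]
        spreads[h] = mean(per_date) if per_date else float("nan")
    return spreads
```

A positive spread at a given horizon means low-risk names outperformed high-risk ones over that holding period.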

What we found was that the alpha was far larger than we expected. This system keeps surprising us—every time we look at it from a new perspective, the impact is even more dramatic.

Importantly, we didn’t design it to do that. We designed it to identify corporate collapse, or the likelihood of collapse, over a two-to-three-year lead time. But what we discovered is that there’s very significant alpha in our work even for one-month holding periods, which was really surprising.

Instead of comparing the best and worst companies, which mimics a long-short portfolio, the question is: what happens if a typical billion-dollar fund simply avoids the worst companies above a certain risk threshold? What difference does that make to return and risk performance? Once again, the numbers are ludicrous. And because it’s just a matter of not holding something, there’s no issue with trading costs or implementation.
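The avoidance idea is simple to express. A hedged sketch, with invented names and an invented 0-to-1 risk scale: drop holdings above a threshold and renormalize what remains.

```python
# Hypothetical screening sketch: exclude names above a risk threshold.
def screened_weights(weights: dict[str, float],
                     risk: dict[str, float],
                     max_risk: float = 0.8) -> dict[str, float]:
    """Zero out high-risk holdings and renormalize the survivors."""
    kept = {t: w for t, w in weights.items() if risk.get(t, 0.0) <= max_risk}
    total = sum(kept.values())
    return {t: w / total for t, w in kept.items()}
```

Because the screen only declines to hold something, there is nothing extra to trade, which is the implementation point made above about costs.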

How are the ratings calculated? Is it deterministic, traditional AI, or is there a generative component?

It’s partly deterministic and partly non-deterministic. The predictive AI component definitely has an element of being non-deterministic, though not as much as the generative AI component.

I remember this Journal story. They highlighted academic work showing that the digit 4 rarely shows up in earnings figures, because if you want a 0.4 to be 0.5, you round up.

This is a very good example, and it’s well known in the academic literature. If you think about the distribution of earnings surprises, imagine a bell curve. The peak is usually around a small positive surprise. Then you get long tails in both directions. That’s what you’d expect.

But in reality, the distribution looks like that with one exception: you don’t see small negatives. Instead, there’s a dive down and then a jump back up around small negative surprises. That goes to your point: companies know you don’t want to report a small negative surprise. If you’re going to miss, make it a big miss, because either way the market will react badly.

So you get these biases in earnings surprises that create very odd mathematical patterns—yet they make intuitive sense. Companies don’t want to disappoint, and if they have to, they’d rather do it in a big way.
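The kink he describes can be checked mechanically. Here is a hypothetical sketch (the bin width and function name are my own, not from the research he cites): count surprises just below zero versus just above it; a ratio well under 1.0 is the "dive down" in the distribution.

```python
# Illustrative sketch: detect the deficit of small negative earnings surprises.
def small_miss_deficit(surprises: list[float], width: float = 0.01) -> float:
    """Ratio of small negative to small positive surprises.
    A ratio well below 1.0 matches the dip described above."""
    just_below = sum(1 for s in surprises if -width <= s < 0)
    just_above = sum(1 for s in surprises if 0 <= s < width)
    return just_below / just_above if just_above else float("nan")
```

On an unmanipulated bell curve the two narrow bins should hold roughly equal counts; the documented pattern is a ratio far below one.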

Can you talk about the landscape now? Are there any particular sectors that the model flags as being higher risk?

We definitely see some variation over time in accounting quality or the extent of manipulation.

I'll give you two examples. During COVID, we saw that for most companies, their accounts really didn't change in terms of the extent of questionable activity going on. It made no difference whatsoever, with one key exception: anything with exposure to travel. For example, we saw cruise liners, airlines, and similar companies (this was a global phenomenon) whose accounts went haywire. You could simply attribute that to their exposure to unusual and adverse circumstances.

I think there is an element of that. But our work also suggested that some of these companies, not all, but some, were using COVID as an opportunity to take a big bath, essentially using the pandemic as an excuse to clean house.

What was particularly interesting was what happened post-COVID. We looked at which airlines saw their accounting quality normalize—where the unusual activity we observed in their accounts returned to normal—and which ones didn't. We definitely saw some that didn't normalize. In fact, we saw companies where their accounts started deteriorating during COVID and continued to get worse afterward.

In terms of sectors, there's one sector that has stood out in our system from day one, with a higher average risk signal than we saw for any other group of companies.

And that's pharmaceuticals and biotech.

Firstly, I should explain that our system explicitly takes into account certain sector features, certain country features, and certain accounting standard features. All of these are embedded, so it's not like every company is being compared in exactly the same way as every other company. But even then, pharmaceuticals and biotech were standing out with higher-than-average risk signals.

So we looked into this and decided that there was no valid argument for us to make some additional adjustment feature for pharmaceuticals and biotech. Why? Because we felt the system was correctly identifying that the accounts of these companies are simply much riskier and much harder for an investor—for an end user—to interpret and to really understand what is fundamentally going on with that company.

What do I mean by that? For example, R&D expenditure: Are they expensing it upfront? Are they spreading it out over time? Are they holding onto it? Revenue: Are they booking revenue for some long contract—the whole lot upfront? Or are they recognizing it over time, or allocating it differently?
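As a toy illustration of that flexibility (the numbers and function names are invented, not from the interview), the same three-year contract can produce very different year-one revenue depending on the recognition choice:

```python
# Toy example: one $12M, three-year contract, reported two permissible ways.
def upfront(total: float, years: int) -> list[float]:
    """Recognize the whole contract in year one."""
    return [total] + [0.0] * (years - 1)

def ratable(total: float, years: int) -> list[float]:
    """Recognize the contract evenly over its life."""
    return [total / years] * years
```

Both describe the same underlying contract, yet one shows 12.0 of year-one revenue and the other 4.0; heavy use of that kind of discretion is what the system reads as a riskier, harder-to-interpret set of accounts.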

Our system has learned, for good reason over several decades of analysis, that when you see companies exercise lots of the flexibility that's available to them—because accounting standards have some degree of flexibility—that this is an indication that the company's accounts are much riskier. It's much harder to understand what's going on.

So consequently, yes, we see on average higher risk signals for pharmaceuticals and biotech. We do not see that as a problem with the system. We see that as the system correctly identifying that they are just harder for you, the end user, to assess what is really fundamentally going on with this company.

Anything I should have asked you about but didn’t?

Yes, there is something that comes to mind. It goes back to you asking me what was the a-ha moment. So, we sometimes get people expressing surprise that we're able to do what we do because they say, oh, you must need access to special data, like the general ledger of the company.

And the answer is no. All we need is the final financial statements. Yes, there are a few additional bits and pieces, but mostly we're just using final financial statements, and yet we can identify these companies with an incredibly high level of accuracy. We have virtually zero false positives once you start getting into the higher risk levels of our system.

The flip side is that we don't guarantee we catch every manipulating company. We deliberately built an inherent bias into the system: it minimizes false positives. We get most manipulating companies, but we can't get every one. But if the system says a company is manipulating, it is, with a very high level of probability, near certainty.

Now, I'm not here to be rude about auditors. But the standard audit process does not do what we do, even though I think most people think it does. If you are hiring an auditor to do a deep forensic dive, a forensic investigation on a company, that's a different story. But that's not what the standard audit process does.

Independent research found that when there is wrongdoing in the accounts, the external auditor finds it 3% of the time. They miss 97% of this activity. This is my all-time favorite statistic.

A lot of analysts take a company’s earnings numbers as gospel, right? I think part of the reason these sorts of shenanigans persist is that the market's not very good at ferreting them out.

I'm not here to bag analysts. I spent a large part of my career working with analysts. And I've been an analyst in the sense I've been a macro strategist and a quant.

When it comes to stock analysts or debt analysts, there's a lot of research showing that most analysts' expectations are anchored to company guidance.

There are not that many analysts who go out on a limb. Actually, the academic research shows that analysts who do go out on a limb are typically more valuable to the end user, in some way, shape, or form, than those who don't.

But most analysts have such a strong anchoring tendency around company guidance, whereas a system like ours couldn't care less what company guidance is. We don't use company guidance in any way, shape, or form. We are just looking at what these accounts look like. Are they a completely valid, normal set of accounts that you would expect from the company?

Or does it look like there's some sort of funny business? And if it looks like there's some sort of funny business, what is it? Where is it in the accounts and how is it that they're doing this?
