
The Open-Source Project That’s Making SEC API Calls Cheap

Hey, it's Matt. Welcome to AI Street Markets, where I highlight AI investing tools.

MATT’S NOTE

Today’s edition is about using an open-source tool, DataMule, to extract data cheaply from the SEC.

I had planned to include a detailed how-to section, but juggling two kids and Easter kept me from that. 😅 

So, in the next Markets edition, out on May 6, I’ll go deeper into how to leverage DataMule.

Hope you’re enjoying your Sunday!

DATAMULE

Making SEC Data Cheap

John Friedman wants his financial data platform, DataMule, to make SEC data dirt cheap.

Friedman, who used to work for MIT economist Simon Johnson, is building the ambitious open-source project with no staff, no pitch deck and no business model.

"I just want it to exist," he tells me.

The idea started when one of his PhD classmates at UCLA couldn’t pursue a research project because the dataset they needed cost $35,000.

So Friedman built a 1-million-row executive officers dataset from 20 years of SEC filings using just $5 of Google Cloud credits.

His approach blends traditional coding with AI.

“I write algorithmic parsers because they are very fast and cheap,” he told me. “Then I feed the parsed data into an LLM if I need context.”
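Friedman didn’t share his parser code, so here is a minimal Python sketch of what an “algorithmic parser” can look like, assuming plain-text 8-K filings. The names `split_8k_items` and `needs_llm`, the regex, and the 2,000-character routing rule are illustrative assumptions, not DataMule’s implementation. One regex pass splits a filing into its numbered items, and only long, free-form sections would be handed to an LLM.

```python
import re

# Matches item headings such as "Item 5.02" or "ITEM 5.02." at the start of a line.
ITEM_HEADING = re.compile(r"^\s*item\s+(\d+\.\d{2})\.?", re.IGNORECASE | re.MULTILINE)


def split_8k_items(text: str) -> dict[str, str]:
    """Split plain-text 8-K content into {item_number: section_text}.

    Fast and deterministic: a single regex pass, no model calls.
    """
    matches = list(ITEM_HEADING.finditer(text))
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        start = match.end()
        # Each section runs until the next item heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[start:end].strip()
    return sections


def needs_llm(section: str, max_chars: int = 2000) -> bool:
    """Hypothetical routing rule: only long, free-form sections go to an LLM."""
    return len(section) > max_chars


sample = (
    "ITEM 5.02. Departure of Directors or Certain Officers.\n"
    "On March 3, Jane Doe resigned as Chief Financial Officer.\n"
    "Item 9.01 Financial Statements and Exhibits.\n"
    "Exhibit 99.1 Press release."
)
items = split_8k_items(sample)
print(sorted(items))  # prints ['5.02', '9.01']
```

The cheap, rule-based pass does most of the work; the LLM step stays optional and is reserved for the text where context actually matters.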

Using DataMule’s SEC archive, you can download every 10-K since 1993 for about $2. For comparison, a similar pull through SEC API, a paid third-party service, costs about $350. Friedman thinks he can cut costs by another 1,000x by next year.
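The cost claims are easy to sanity-check with back-of-the-envelope math. This short sketch uses the article’s rounded figures and assumes the 1,000x goal applies to today’s roughly $2 cost, since the baseline isn’t specified.

```python
# Rounded figures from the article, in US dollars.
datamule_cost = 2.00   # every 10-K since 1993 via DataMule's SEC archive
sec_api_cost = 350.00  # the comparison figure quoted for a similar pull
goal_reduction = 1000  # Friedman's target for further savings by next year

savings_factor = sec_api_cost / datamule_cost
# Assumption: the 1,000x goal is measured against the current $2 figure.
projected_cost = datamule_cost / goal_reduction

print(f"DataMule is about {savings_factor:.0f}x cheaper today")
# prints "DataMule is about 175x cheaper today"
print(f"Projected cost if the goal is met: ${projected_cost:.3f}")
# prints "Projected cost if the goal is met: $0.002"
```

In other words, today’s gap is roughly 175x, and hitting the stated goal would put a full 10-K pull at a fraction of a cent.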

“Once data is cheap,” Friedman says, “I want to build an analytics layer to answer questions like: What companies are affected by tariffs on Canada?”

Unlike many companies pushing AI as the front-end product, Friedman’s stack treats LLMs as a secondary tool: useful, but not essential. He’s more interested in making the data accessible and reproducible first. The analytics layer, he says, comes second.

That’s the idea behind Indicators, a kind of minimum viable product for building structured economic signals from unstructured filings. Right now, it’s basic. But Friedman says he’s planning a full rewrite that will incorporate a wider range of form types beyond just 10-Ks.

He’s also working on LLM-based search for regulatory filings. The plan: build a simple tool where a query like “executive departures citing strategic disagreements” prompts an LLM to generate keywords, which are then used to search a NoSQL database of 8-K Item 5.02 disclosures. Estimated cost per query? Roughly $0.00001.
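Here is a rough Python sketch of that search flow. Everything in it is hypothetical: the `Filing` schema, the canned `generate_keywords` stand-in for the LLM call, and an in-memory list standing in for the NoSQL database. Plain substring matching keeps it short; a real system would lean on the database’s own text index.

```python
from dataclasses import dataclass


@dataclass
class Filing:
    """One 8-K Item 5.02 section stored as a simple document (hypothetical schema)."""
    company: str
    filed: str
    text: str


def generate_keywords(query: str) -> list[str]:
    """Stand-in for the LLM step (hypothetical).

    A real version would prompt a model to expand the query into search terms;
    a fixed mapping keeps this example offline and deterministic.
    """
    canned = {
        "executive departures citing strategic disagreements": [
            "disagreement", "strategic direction", "resigned", "differences",
        ],
    }
    # Fall back to the query's own words when there is no canned expansion.
    return canned.get(query.lower(), query.lower().split())


def keyword_search(filings: list[Filing], keywords: list[str], min_hits: int = 2) -> list[Filing]:
    """Return filings mentioning at least `min_hits` keywords, best matches first."""
    scored = []
    for filing in filings:
        body = filing.text.lower()
        hits = sum(1 for kw in keywords if kw in body)
        if hits >= min_hits:
            scored.append((hits, filing))
    # Sort by hit count only; Python's sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [filing for _, filing in scored]


store = [
    Filing("Acme Corp", "2024-02-01", "The CFO resigned due to a disagreement over strategic direction."),
    Filing("Beta Inc", "2024-03-15", "The CEO will retire at the end of the fiscal year."),
    Filing("Gamma LLC", "2024-04-10", "The COO resigned, citing differences with the board."),
]
results = keyword_search(store, generate_keywords("executive departures citing strategic disagreements"))
print([f.company for f in results])  # prints ['Acme Corp', 'Gamma LLC']
```

The design keeps the LLM’s job small (turning a question into a few search terms), while the matching itself is cheap string work. That split is what makes a per-query cost measured in fractions of a cent plausible.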

Friedman isn’t looking to raise money at the moment. But he’s open to sponsorships or credits, particularly for LLM APIs and database infrastructure.

His users today are mostly PhDs, retired engineers, hedge fund quants, and technical researchers.

If you’re sitting on any unused API credits, feel free to reach out to John via LinkedIn, or I’m happy to make an introduction.

ICYMI

On the first and third Sundays of the month, I publish this Markets edition, which goes through the latest AI tools* for investors.

I’m always looking for more platforms and open-source tools. If you have any ideas, please reach out: [email protected].


*Not investment advice
