Vibe Coding SEC Filings

Hey, it's Matt. Welcome to AI Street Markets, where I highlight AI investing tools.
VIBECODING
When The Vibes Aren’t Vibing
In the last Markets edition two weeks ago, I planned to try some “vibe coding” as the kids say. (Vibe coding = AI that writes code based on plain English requests)
Well, the vibes were not with me.
Despite spending most of the last two days in front of the computer, I don’t have a ton of analysis to show for it. I feel like Will Ferrell in Old School blacking out in the middle of a debate.

So I’m going to walk through what I planned on doing and what I was actually able to get done.
To recap, two weeks ago, we talked about DataMule, an open source project for getting organized SEC data cheap. (Yes, SEC filings are technically free, but organizing tens of thousands of documents — some in SGML, some in XBRL, many with inconsistent formatting — has historically been slow, expensive, and messy.)
My goal, I thought, was straightforward:
Download every 10-K filed last week
Extract the “Management’s Discussion and Analysis” (aka Item 7)
Drop all of it into an Excel or text file
Feed it to a large language model to compare themes across companies
For example:
What are consumer companies saying about inflation?
Which tech firms are citing competition from AI vendors?
Sounds simple enough — but in practice? Not quite.
DataMule is an early-stage software project run by John Friedman. It's a one-person project, and I majored in Philosophy/English with virtually no coding experience, so it's not terribly surprising that I hit some roadblocks. I couldn't even get the example scripts to execute properly at first. After a few hours of back-and-forth with AI, I broke down and emailed John.
He responded quickly, confirming that the documentation was a bit out of date — a side effect of how fast he’s iterating on the project. He shared a working download script and a few pointers, and that helped me finally get a clean pull of the week’s 10-K filings.
(If you want to try this yourself, install the library first: pip install datamule)
from datamule import Portfolio

def main():
    # Create a Portfolio object backed by the 'tenk_last_week' folder
    print("Creating Portfolio object at folder 'tenk_last_week':")
    portfolio = Portfolio('tenk_last_week')

    # Download 10-K submissions filed last week
    print("Downloading submissions:")
    portfolio.download_submissions(
        filing_date=('2025-04-28', '2025-05-03'),
        submission_type=['10-K']
    )

if __name__ == "__main__":
    main()
So after downloading the right filings to my computer, I needed to parse them (read: organize them).
Historically, this has been a tricky problem. Even though SEC filings follow a loose structure, there’s enough variation to make simple rule-based parsing brittle. You might see “Item 7,” “ITEM 7,” “Item: 7,” or “Management’s Discussion and Analysis.” You can see why a parser would struggle with all of this ambiguity.
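To see why, here's a toy example of the rule-based approach: a single regex that tries to spot the Item 7 heading. (This is just an illustration, not how DataMule or any production parser actually works.)

import re

# A naive pattern for the start of Item 7. Purely illustrative.
ITEM7_PATTERN = re.compile(
    r"item\s*[:.]?\s*7[\s.:-]*management[’']?s\s+discussion",
    re.IGNORECASE,
)

samples = [
    "Item 7. Management’s Discussion and Analysis",
    "ITEM 7 - MANAGEMENT'S DISCUSSION AND ANALYSIS",
    "Item: 7 Management's Discussion and Analysis",
    "Item&nbsp;7.&nbsp;Management's Discussion and Analysis",  # raw HTML entity
    "<b>Item 7.</b> Management's Discussion and Analysis",     # a tag splits the heading
]

for heading in samples:
    result = "MATCH" if ITEM7_PATTERN.search(heading) else "MISS"
    print(f"{result}: {heading}")

The first three headings match, but the last two, with an HTML entity and a stray tag in the middle, slip right past it. Every new filing seems to bring a new variation.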
That’s where large language models come in. Unlike traditional parsers, LLMs can handle ambiguity. They don’t need exact formatting. They can infer what you’re asking for, even when the filing structure is inconsistent — which, in practice, it often is.
So I signed up for Cursor, which might be one of the hottest companies in tech right now.
What is Cursor?
Cursor is a coding tool built on top of VS Code, with AI features including smarter autocomplete, bug fixing, and project-aware assistance. It's taken off, reportedly reaching $300M in annualized revenue just two years after launch.
You can sign in with GitHub, open a folder or clone a repo, and press Cmd+K to start prompting the AI directly. I used it to walk through writing a parser, fixing import bugs, and structuring output for export, all things I definitely could not have done by myself.
Then I connected it to Gemini 1.5 Pro, Google's large language model, using my free Google Cloud credits.
Gemini handled the actual parsing step: extracting the Item 7 section from each 10-K. I wrote a script that looped through the downloaded filings, pulled out candidate HTML files, and passed the full text to Gemini with a prompt like:
“Extract the Item 7 (Management’s Discussion and Analysis) section from the following 10-K filing.”
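Roughly, the loop looked something like this (a simplified sketch: the folder names, file matching, and output format are placeholders rather than my exact script, and you'd need your own Gemini API key set as an environment variable):

import os
from pathlib import Path
import google.generativeai as genai

# Sketch only: folder names and output format are placeholders.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")

PROMPT = (
    "Extract the Item 7 (Management's Discussion and Analysis) section "
    "from the following 10-K filing:\n\n"
)

filings_dir = Path("tenk_last_week")   # where the DataMule download landed
output_dir = Path("item7_sections")
output_dir.mkdir(exist_ok=True)

for html_file in filings_dir.rglob("*.htm*"):
    filing_text = html_file.read_text(errors="ignore")
    try:
        response = model.generate_content(PROMPT + filing_text)
        section = response.text
    except Exception as exc:  # rate limits, oversized filings, empty responses
        print(f"Skipped {html_file.name}: {exc}")
        continue
    # Save each extracted MD&A section as its own text file
    (output_dir / f"{html_file.stem}_item7.txt").write_text(section)
    print(f"Extracted Item 7 from {html_file.name}")

One reason Gemini 1.5 Pro suits this job is its long context window, which lets you pass an entire filing in one request instead of chunking it first.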
It worked pretty well. I was able to identify the right sections in many filings, but not all. Some outputs were empty, and others were tripped up by formatting issues.
So, the idea that non-coders can do complex work with the help of AI is not true — at least not for me and not right now.
On a more positive note, I tried learning Python like a decade ago when I was at Bloomberg. I was just getting through the basics when I realized that it would take tons more work to really excel with it. With a full-time reporting job, I just didn’t have enough time to really build those skills.
This week, with AI, I “accomplished” a lot more and I’m actually motivated to learn how to use these tools since AI is basically the most patient teacher ever. You can keep asking questions and you’ll mostly get correct answers.
I’m going to keep at it and share how this progresses.


ICYMI
Check out the last few editions on using AI for investment analysis,* creating customized news feeds, and tracking earnings call mentions.
*Not investment advice
