Building My Personal Operating System — One Workflow at a Time
On trust, design boundaries, and quietly outgrowing software you're still paying for.
Every year, I do the same thing.
I spend a weekend — sometimes two — reconstructing twelve months of financial decisions I already lived through. Digging through email. Searching for receipts. Piecing together context from memory because no system I pay for actually captured it the way it happened.
Not because the data is missing. Because accounting software assumes a version of my financial life that doesn’t exist.
QuickBooks is great — if your finances already look like a QuickBooks demo. Dedicated business card. Single-purpose accounts. Predictable expense flows. Clean inputs, rigid workflows, neat little categories.
Mine don’t work that way. Some expenses go on a business card. Some go through PayPal — deliberately, because it gives you an extra layer of cancellation control when companies make unsubscribing difficult — or a personal card that QuickBooks never sees. Subscriptions renew quietly across two email accounts. Tools get purchased through whichever payment method has the least friction at checkout — and not always the one connected to QuickBooks.
That’s the reality of running a solo business. And if you’ve done it, you know the friction isn’t dramatic — it’s ambient. It accumulates quietly until one weekend a year you pay for all of it at once.
So instead of forcing my behavior into accounting software, I built infrastructure around how I actually operate.
“The friction isn’t dramatic — it’s ambient.”
The Problem
Nothing was technically broken. QuickBooks worked. My taxes got filed. But the process — receipts scattered across two Gmail accounts, subscriptions renewing without structured tracking, manual reconciliation fueled by caffeine and a deadline — assumed inputs that were already clean. Mine only sometimes were.
The real cost wasn’t any single expense. It was the compounding overhead of reconstructing context that no system captured.
The Build
Instead of manually reconstructing tax year 2025, I designed a reusable workflow.
Using Claude CoWork with Gmail MCP and the Chrome Extension — one integration per email account — I queried both inboxes for travel, software subscriptions, annual fees, and equipment purchases.
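For illustration, here is a minimal sketch of how per-category inbox searches could be expressed as Gmail search strings. It is not the workflow's actual code: the keyword lists and the build_query helper are hypothetical, and the only things taken from Gmail itself are the standard after:, before:, and OR search operators.

```python
from datetime import date

# Hypothetical keyword lists per category; tune these to your own vendors.
CATEGORY_TERMS = {
    "travel": ["itinerary", "boarding pass", "hotel confirmation"],
    "software": ["subscription", "renewal", "invoice"],
    "annual_fees": ["annual fee", "membership renewal"],
    "equipment": ["order confirmation", "shipped"],
}


def build_query(category: str, tax_year: int) -> str:
    """Return a Gmail search string for one category within one calendar year."""
    terms = " OR ".join(f'"{t}"' for t in CATEGORY_TERMS[category])
    start = date(tax_year, 1, 1).strftime("%Y/%m/%d")
    end = date(tax_year + 1, 1, 1).strftime("%Y/%m/%d")
    # after:, before:, and OR are Gmail search operators; parentheses group terms.
    return f"after:{start} before:{end} ({terms})"


for name in CATEGORY_TERMS:
    print(name, "->", build_query(name, 2025))
```

The same query strings would be run against each inbox, so the criteria stay identical across both email accounts.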
The agent extracted structured JSON line items, normalized vendors, dates, and amounts, grouped them by category, and generated accountant-ready outputs:
A categorized Excel spreadsheet
A companion PDF summary
Receipt screenshots
Python scripts that regenerate both as new receipts arrive
34 categorized line items. Thousands of dollars in reconciled expenses. Output contracts are explicit: subscriptions auto-total, price increases are handled at the line-item level, and the SUM logic is tested. Nothing is assumed correct by default.
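A rough sketch of what normalizing line items and testing the totals might look like in Python. Everything here is an assumption for illustration: the field names, the VENDOR_ALIASES table, and the sample data are made up, and the real scripts may be structured quite differently.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# Hypothetical alias table mapping raw sender names to one canonical vendor.
VENDOR_ALIASES = {"adobe systems": "Adobe", "adobe inc.": "Adobe"}


def normalize(item: dict) -> dict:
    """Canonicalize the vendor, parse the date, and convert the amount to Decimal."""
    raw_vendor = item["vendor"].strip()
    return {
        "vendor": VENDOR_ALIASES.get(raw_vendor.lower(), raw_vendor),
        "date": datetime.strptime(item["date"], "%Y-%m-%d").date(),
        "amount": Decimal(item["amount"].replace("$", "").replace(",", "")),
        "category": item["category"],
    }


def totals_by_category(items: list[dict]) -> dict[str, Decimal]:
    """Sum amounts per category using exact decimal arithmetic."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for item in items:
        totals[item["category"]] += item["amount"]
    return dict(totals)


# Made-up sample data, including a price increase kept as its own line item.
raw = [
    {"vendor": "Adobe Systems", "date": "2025-03-01", "amount": "$54.99", "category": "software"},
    {"vendor": "Adobe Inc.", "date": "2025-04-01", "amount": "$59.99", "category": "software"},
    {"vendor": "Example Air", "date": "2025-05-12", "amount": "$1,204.50", "category": "travel"},
]
items = [normalize(r) for r in raw]
totals = totals_by_category(items)

# The contract: category totals must add up to the sum of every line item.
assert sum(totals.values()) == sum(i["amount"] for i in items)
print(totals)
```

Using Decimal instead of float keeps the sums exact, which matters when a total has to match a ledger to the cent.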
It runs monthly now. No more annual scramble.
“A human searches for what they remember. An agent searches for what matches the criteria, regardless of whether you remember it exists.”
How I Learned to Trust It (Gradually)
I started with travel expenses over a tight timeline — a narrow enough scope that I could verify every single line item myself. If something was wrong, I’d catch it immediately. If everything checked out, I’d earned a data point about what the workflow could handle.
It checked out. So I expanded to software and tools.
The agent found subscriptions I’d forgotten about: tools I was still paying for, no longer needed, and never would have searched for manually. A human searches for what they remember. An agent searches for what matches the criteria, regardless of whether you remember it exists. That distinction matters more than people realize.
But I also made a deliberate design choice: the agent works from Gmail, not from bank accounts or financial systems.
Using email as the data source was intentional — it gives the agent everything it needs to extract and categorize expenses without introducing access to anything financially sensitive. The agent produces a bulk upload file for QuickBooks, but the actual import is a human step. That’s the handoff point.
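As a sketch of that handoff, the upload file could be as simple as a CSV that a person opens, reviews, and imports by hand. The three-column layout below is an assumption, not a confirmed QuickBooks template, and to_upload_csv is a hypothetical helper.

```python
import csv
import io

# Assumed column layout; check the import template your QuickBooks plan expects.
COLUMNS = ["Date", "Description", "Amount"]


def to_upload_csv(items: list[dict]) -> str:
    """Serialize line items to CSV text for a human to review and import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for item in items:
        writer.writerow({
            "Date": item["date"],
            "Description": f'{item["vendor"]} ({item["category"]})',
            "Amount": item["amount"],
        })
    return buffer.getvalue()


# Made-up sample row; the script only produces text and never touches QuickBooks.
sample = [{"date": "03/01/2025", "vendor": "Adobe", "category": "software", "amount": "-54.99"}]
print(to_upload_csv(sample))
```

The script stops at producing a file. Nothing in it authenticates to or writes into a financial system, which is the point of the boundary.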
The question isn’t “should agents have financial access?” It’s “does the job require it?” Scope the access to the data source that fits the job — not to what’s technically possible.
“Where you build in a checkpoint instead of trusting the vibes.”
Precision Over Blind Autonomy
For anything touching financial data, I operate with one rule: surface what you know, flag what you don’t, and never guess.
Amounts that can’t be confirmed from email get marked TBD with clear next steps. One subscription renewal appeared without a payment amount — the system flagged it rather than estimating it. Known gaps are surfaced, not papered over.
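One way to make "never guess" structural rather than aspirational is to model a missing amount as an explicit empty value instead of a number. The sketch below is illustrative only: LineItem, split_confirmed, and the sample vendors are hypothetical names, not the workflow's real schema.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class LineItem:
    vendor: str
    amount: Optional[Decimal]  # None means the email did not state an amount
    next_step: str = ""

    @property
    def display_amount(self) -> str:
        return "TBD" if self.amount is None else f"{self.amount:.2f}"


def split_confirmed(items: list[LineItem]) -> tuple[list[LineItem], list[LineItem]]:
    """Separate confirmed items from those needing human verification."""
    confirmed = [i for i in items if i.amount is not None]
    pending = [i for i in items if i.amount is None]
    return confirmed, pending


# Made-up sample data: one confirmed charge, one renewal with no stated amount.
items = [
    LineItem("Example Hosting", Decimal("120.00")),
    LineItem("Example Notes App", None, next_step="Look up the renewal charge in Monarch Money"),
]
confirmed, pending = split_confirmed(items)
# Totals only ever include confirmed amounts; nothing is estimated.
total = sum((i.amount for i in confirmed), Decimal("0"))
for item in pending:
    print(f"{item.vendor}: {item.display_amount} -> {item.next_step}")
```

Because an unconfirmed amount is None rather than zero or an estimate, it cannot silently slip into a total; it has to be resolved by a person first.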
For the handful of items that need verification, I use Monarch Money — searching for a specific transaction there takes seconds. The agent gives me exactly what to look for at a glance. At my scale, there’s no reason to build an integrated workflow between the two. That would be engineering for engineering’s sake.
Most “I automated my life with AI” posts skip this part. The interesting design work isn’t in the automation — it’s in deciding where the automation stops. What gets a human review. What earns a TBD instead of a hallucinated number. Where you build in a checkpoint instead of trusting the vibes.
The goal isn’t automation for its own sake. It’s trusted infrastructure.
“At what point am I paying for a spreadsheet with a login page?”
The QuickBooks Question
QuickBooks still runs my books. I’m not canceling it tomorrow. But I’m increasingly aware that the functionality I actually use — the part that earns the subscription fee — is shrinking.
The messy upstream work that QuickBooks was never designed for? That’s what the agent handles now. Receipt extraction, vendor normalization, expense categorization, subscription tracking — all of that happens before QuickBooks ever sees the data. QuickBooks gets a clean upload. It does what it was always meant to do: be a ledger.
But if the agent is doing all the work that used to justify the software... at what point am I paying for a spreadsheet with a login page?
To be responsible, I’m running both in parallel at this point. The agentic workflow is proven for my use case, but “proven for me” and “reliable enough to bet my tax compliance on” are two different thresholds. I’d love to cut the cost. But I’m not going to optimize my way into an audit.
That tension — between what’s working today, what’s safe to depend on, and what might change next week when the model or tooling updates — is the real conversation about agentic workflows replacing SaaS. It’s not a binary. It’s a migration you earn through iteration and understanding what you’re building on.
The Operating System
This expense workflow isn’t a one-off project. It’s one module in something larger.
I’ve been building what I think of as a personal operating system — a set of agentic workflows designed around how I actually live and work, not how any single piece of software assumed I would.
So far, the system includes:
File organization — agents that keep my hard drives, local files, and cloud storage catalogued and navigable.
The vinyl project — an agent that reconciled my physical record collection with purchase history and Discogs listings. (I wrote about that one here.)
Birthday party logistics — an agent that handles planning and booking for my kid’s birthday party. If you think expense tracking is messy, try coordinating availability across multiple skating rinks, children’s dietary restrictions, a bakery order, and twelve conflicting parent schedules.
Coming soon: the Hi-Fi schedule reconciler. I host Hi-Fi deep-listening nights at Shibuya Hi-Fi, and, as at any venue, multiple schedules need to stay in sync for the experience to come together. When they drift, the guest experience is what’s at risk. An agent that monitors for discrepancies and flags them early means I show up prepared. No off-the-shelf tool solves it, which is exactly what makes it an agentic use case and not a software purchase.
Each one follows the same pattern. Start narrow. Prove it. Expand scope. Define output contracts. Build in checkpoints. Make it repeatable.
When Is a Workflow “Ready”?
I keep coming back to five criteria:
1. Context — the agent clearly understands the domain it’s operating in
2. Intent — the purpose of the workflow is well-defined, not vague
3. Output contracts — what the system produces is explicit, testable, and versioned
4. Proven trust — the workflow has been iterated on enough that you’ve seen it handle edge cases
5. Repeatability — it runs without you babysitting it, and produces consistent results
If any one of those is shaky, it’s still a prototype. Which is fine — prototypes are how you get there. But calling a prototype “production” is how you end up debugging your tax return.
What This Is Really About
Most software is designed for the average case. Agentic workflows are designed for your case.
That’s the shift. Not “AI replaces software.” But for solo operators whose lives don’t fit neatly into product assumptions, well-designed agents flip the dynamic. Instead of adapting to the software, the system adapts to you.
Not by prompting and hoping. By designing with boundaries, contracts, and iteration built in.
One workflow at a time.
What’s the first workflow you’d build?
I’d love to hear — reply here or find me on LinkedIn.
Workflow built with Claude CoWork, Claude Chrome Extension, Claude MCP, and Python.



