Work · 2023 to 2026

End-to-end AI systems. Shipped. Running.

Pipelines that turn briefs into deliverables. Agents that replace one-off code with reusable skills. Prompt systems behind apps with $25M+ ARR.

Worked with

Anthropic Red Team

Tested safety systems on unreleased Claude models. Autumn 2024 plus Spring 2025 cohorts.

I was invited to Anthropic's red teaming program, where I tested safety systems on unreleased Claude models. I took part in both the Autumn 2024 and Spring 2025 cohorts.

The job was straightforward: try to break the Constitutional Classifiers and other safety mitigations Anthropic built for ASL-3 deployment. I focused mostly on finding universal bypass vectors across CBRN domains (chemical, biological, radiological, nuclear).

Had access to internal research models before they went public, including prototype safety classifiers. Tested multiple model iterations and defense layers across program rounds. Work fed into Anthropic's broader effort to stress test frontier AI safety before release.

AdEngine

End-to-end automated ad pipeline. Brand brief in, statics plus videos plus compliance out, in about an hour.

AdEngine is a production pipeline I built at SynthWeave. It takes a brand or product brief, including what's being advertised, target audience, voice, and key value props. From there it runs autonomously to produce a complete ad deliverable set.

The output is a full media-buy-ready package: static image ads in multiple creative variations, production-ready video ad scripts with shot direction and pacing, fully rendered video assets, compliance checks, and structured taxonomy data for downstream ad operations.

Once kicked off, it doesn't need a human between steps. The largest single run I've benchmarked produced 100+ static creatives plus 25+ rendered video ads in roughly an hour. Work that traditionally takes a creative team days or weeks now finishes in about an hour.

Statics per run: 100+
Video ads per run: 25+
Time per run: ~1 hr

Video Creation Skill

A skill that turns any input source into a complete output. Built once, used everywhere.

I built this skill once and reuse it across projects: it takes any input source and produces a complete video. The hero video on this page is one of its outputs.

Agentic Workflow Builder

Replaces custom-coded AI apps with skill-based Claude Code workflows. Lower cost. Tighter quality. Faster iteration.

The builder takes a task description as input and outputs a complete Claude Code setup: all the skills the task needs, a CLAUDE.md, and supporting files. The team then runs the workflow on the agent's subscription instead of building a custom application against the API.
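A minimal sketch of the scaffolding step only, not the builder itself. It assumes skills live at `.claude/skills/<name>/SKILL.md` with `name` and `description` frontmatter; `scaffold_workflow` and its arguments are hypothetical names for illustration, and in the real builder the skill set would be decided by the agent, not passed in by hand.

```python
from pathlib import Path
from typing import Dict, List


def scaffold_workflow(root: Path, task: str, skills: Dict[str, str]) -> List[Path]:
    """Write a CLAUDE.md plus one SKILL.md per skill under `root`.

    `skills` maps a skill name to its instruction text (hypothetical shape).
    Returns the paths written, CLAUDE.md first.
    """
    root.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []

    # Project-level file: states the task and lists the available skills.
    skill_list = "\n".join(f"- {name}" for name in skills)
    claude_md = root / "CLAUDE.md"
    claude_md.write_text(
        f"# Workflow\n\nTask: {task}\n\nSkills:\n{skill_list}\n", encoding="utf-8"
    )
    written.append(claude_md)

    # One folder per skill; the layout here is an assumption, not a spec.
    for name, instructions in skills.items():
        lines = instructions.strip().splitlines() or [name]
        skill_file = root / ".claude" / "skills" / name / "SKILL.md"
        skill_file.parent.mkdir(parents=True, exist_ok=True)
        skill_file.write_text(
            f"---\nname: {name}\ndescription: {lines[0]}\n---\n\n{instructions}\n",
            encoding="utf-8",
        )
        written.append(skill_file)

    return written
```

Calling `scaffold_workflow(Path("ticket-triage"), "Summarize support tickets", {"triage": "Sort tickets by urgency."})` would leave a folder the team can open directly with the agent.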

The strategic shift mattered more than the tool itself. We had been building one-off vibe-coded apps whose API costs scaled with usage. Moving the same work onto an agent's fixed-cost subscription cut spend, improved quality (a skill is more tightly specified than an ad-hoc prompt), and made the work reusable across projects.

Pipeline thinking applied at the meta-level: building tools that build tools.

Prompt Engineering

Years of prompts encoded into a system that generates, tests, and benchmarks faster than I ever could by hand.

I spent years writing prompts manually across SynthWeave, Avenor, FlowGPT, Width.ai, and other clients on Upwork. That accumulated expertise is now a pipeline: generate prompts for new use cases, test them automatically across edge cases and quality dimensions, benchmark models against the prompts, and build internal evals tailored to specific systems.
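A rough sketch of the benchmarking step, under simplifying assumptions: each model is a plain function `(prompt, user_input) -> output`, and each test case carries its own pass/fail check. `Case`, `ModelFn`, and `benchmark` are hypothetical names for illustration; the production pipeline is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Case:
    input: str
    check: Callable[[str], bool]  # True when the model output is acceptable


# Hypothetical model interface: (prompt, user_input) -> output text
ModelFn = Callable[[str, str], str]


def benchmark(
    prompts: Dict[str, str],
    models: Dict[str, ModelFn],
    cases: List[Case],
) -> Dict[Tuple[str, str], float]:
    """Return the pass rate for every (prompt, model) pair."""
    if not cases:
        raise ValueError("at least one test case is required")
    scores: Dict[Tuple[str, str], float] = {}
    for prompt_name, prompt in prompts.items():
        for model_name, model in models.items():
            # Run every case through this prompt/model pair and count passes.
            passed = sum(case.check(model(prompt, case.input)) for case in cases)
            scores[(prompt_name, model_name)] = passed / len(cases)
    return scores
```

Keeping the checks next to the cases is what lets an eval track a specific system: the cases come from that system's real inputs, not from a generic benchmark.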

The narrative shift mirrors where I am professionally: I used to write prompts. Now I build systems that write prompts. The pipeline iterates faster than a human can and produces evals that actually map to real production conditions, not cherry-picked benchmarks.

Slides Generator

Skipped the template entirely. Generated each slide as a single image. 15 minutes per deck.

The team needed a fast way to visualize new pipelines and system designs so others could absorb them without reading walls of text. The default solution most engineers would propose: generate slide content with an LLM, inject it into a designed slide template. Templates handle layout, LLM handles text.

That approach can't deliver what real presentations need: different visual treatments per slide, mixed assets, art direction, style consistency without uniformity. Template-injection always looks like template-injection.

My approach: skip the template. Generate each slide as a single image directly through the image model. Let the image model handle content, layout, styling, and visual elements together, in one pass per slide. Then assemble. The first slide acts as a style reference for every subsequent slide, so the deck holds together visually. Slide generation runs in parallel so the whole deck takes roughly 15 minutes.
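The control flow above can be sketched as follows. This is an illustration under stated assumptions, not the production code: `render` stands in for whatever image-model call is used (hypothetical signature: slide brief plus an optional style-reference image, returning image bytes), and `generate_deck` is a made-up name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

# Hypothetical signature: (slide_brief, style_reference_or_None) -> image bytes
RenderFn = Callable[[str, Optional[bytes]], bytes]


def generate_deck(briefs: List[str], render: RenderFn, max_workers: int = 8) -> List[bytes]:
    """Render one image per slide, using the first slide as the style reference."""
    if not briefs:
        return []
    # The first slide is rendered alone because every later slide depends on it.
    first = render(briefs[0], None)
    # The remaining slides depend only on the first, so they can run in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        rest = list(pool.map(lambda brief: render(brief, first), briefs[1:]))
    # pool.map preserves input order, so the deck assembles in slide order.
    return [first, *rest]


def fake_render(brief: str, style_ref: Optional[bytes]) -> bytes:
    """Stand-in renderer for trying the flow without an image model."""
    tag = b"ref" if style_ref is None else b"styled-like:" + style_ref
    return brief.encode() + b"|" + tag
```

Only the first render is sequential; everything after it fans out, which is why total time stays close to two renders rather than growing with slide count.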

This is the kind of edge that comes from knowing the full surface of what a model can do. Most people see one path and follow it. I'd rather route around the obvious approach when there's a better one.

Time per deck: ~15 min
Style consistency: Held by reference

Vert Ventures Apps

Production prompt engineering behind a portfolio of $25M+ ARR mobile apps.

Built and shipped the prompt systems behind multiple high-growth bootstrapped iOS apps. Each app is live on the App Store; the prompts inside them ran in production at scale.

Astra (life advice) and Haven Bible Chat are both public. According to a public LinkedIn post by someone associated with the project, Haven Bible Chat was generating around $300,000/month in revenue.

Prompts were tuned for quality, speed, cost efficiency, and scale. Each app had its own user base, voice, and constraints; the prompt systems had to fit each one.

“You cannot go wrong with hiring him. Not only does he create fantastic prompts, he also does a rigorous examination of the LLMs to figure out which one is best for you. Would absolutely hire him again.”

Verified on Upwork ↗

More work has shipped under NDA. Happy to chat and find a solution that fits your needs.

Start a chat