Spellbook

Summer 2026

Timeline

4 Months, January – May 2026

Team

Scott Stevenson · Mitch Hynes · Matthew Stenback · Patrick Frost · Jack Harrhy · Ethan Denny · Marty Whelan + many others

Tools

TypeScript · Next.js · tRPC · MongoDB · Mixpanel · Anthropic SDK

Overview

As a Software Engineering Intern at Spellbook with a focus on full product ownership, I shipped features end-to-end from design through release. I worked across the stack on AI-powered legal workflows, and the projects I touched were small in surface area but pretty deep in impact!

The team was small and senior, so over the course of the coop I was treated less like an intern and more like a full-time engineer. I shipped projects independently and got real ownership over product surfaces used by thousands of legal professionals.

Spellbook is an AI platform for legal work. It powers contract drafting, review, and negotiation for 4,400+ legal teams, including in-house counsel, enterprise law firms, and mid-sized practices.

It sits at an unusually interesting crossroads: large enough that I worked alongside long-tenured engineers and product managers I learned a lot from, but lean enough that the things I shipped reached a massive base of legal users and shaped features they spent a real portion of their day inside.

The competitive field is sharp: Harvey, Legora, Ivo, and others. But Spellbook is building toward something broader: AI for anyone who touches contracts. Contracts run the world. The bet is that the workflow and speed of contracts should match that of commerce.

Contracts at the speed of commerce.

I had genuinely great managers. The bar for code review and product thinking was high, and my judgment was trusted early. By the back half of the coop I was scoping and shipping features that closed enterprise deals. That's a kind of ownership I didn't expect to have as a coop.

Multi-Step Workflows

Lawyers live in repeated workflows: clause comparison runs, NDA reviews, table population, redline rounds. Spellbook was great at one-shot AI work, but it had no structured way to express "do this, then this, then this." On top of that, the agent's tool-firing at decision points was non-deterministic, which produced unreproducible failure logs in Datadog.

When I audited how power users actually built their templates, three patterns kept showing up. They wanted the AI to Ask for Files, Ask Conditional Questions, or Ask for Variable Inputs to populate documents.

Ask for Files

Pause the workflow until the user uploads a constrained set (exact-count, at-most, at-least, or any) across five file types.
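A minimal sketch of how the four count modes could be checked before the workflow resumes. The FileConstraint shape, the mode names, and satisfiesConstraint are all hypothetical, and treating "any" as satisfied by zero files is an assumption. The five file types are left out because they aren't listed here.

```typescript
// Hypothetical constraint shape; the mode names are illustrative only.
type FileConstraint =
  | { mode: "exact"; count: number }
  | { mode: "at_most"; count: number }
  | { mode: "at_least"; count: number }
  | { mode: "any" };

// Returns true when the number of uploaded files satisfies the constraint.
function satisfiesConstraint(constraint: FileConstraint, uploaded: number): boolean {
  switch (constraint.mode) {
    case "exact":
      return uploaded === constraint.count;
    case "at_most":
      return uploaded <= constraint.count;
    case "at_least":
      return uploaded >= constraint.count;
    case "any":
      // Assumption: "any" places no restriction on the count.
      return true;
  }
}
```

Modelling the constraint as a discriminated union means a mode that needs a count cannot be constructed without one.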

Ask Conditional Questions

Render structured options as chips with a custom-answer escape hatch. The model branches based on the answer with no parsing ambiguity.

Ask for Variable Inputs

Collect form-style inputs from the user that get threaded into downstream steps as template variables.
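One way the threading could work is simple placeholder substitution in a downstream step's prompt. This is only a sketch: the {{name}} placeholder syntax and the fillTemplate helper are assumptions for illustration, not the product's actual format.

```typescript
// Replace {{name}} placeholders with collected values.
// Placeholders with no matching value are left intact so gaps stay visible.
function fillTemplate(prompt: string, values: Record<string, string>): string {
  return prompt.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) => {
    const value = Object.prototype.hasOwnProperty.call(values, name)
      ? values[name]
      : undefined;
    return value !== undefined ? value : match;
  });
}

// Example: only "counterparty" has been collected so far.
const filled = fillTemplate(
  "Review the NDA for {{counterparty}} under {{law}} law.",
  { counterparty: "Acme" }
);
```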

I designed the data model with templates separated from run state, so workflow execution is always queryable and resumable. The reliability work centered on a single agent tool: a step-completion call that advances state and returns the next step's typed config. The frontend then renders the right interaction zone from a structured response instead of guessing. That single change cut the dominant Datadog error class by ~50%!

The shape the model returns when it finishes a step looks roughly like this. The frontend reads nextStep.kind and renders the matching interaction zone, no guessing:

type StepResult =
  | { status: "in_progress"; nextStep: NextStep }
  | { status: "workflow_complete" };

type NextStep =
  | { kind: "autonomous"; prompt: string }
  | { kind: "ask_for_files"; constraint: FileConstraint }
  | { kind: "ask_conditional"; options: ChoiceOption[] }
  | { kind: "ask_variable_inputs"; fields: InputField[] };
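To make the flow concrete, here is a rough sketch of a step-completion call over a template that is kept separate from run state, followed by the frontend-side dispatch on kind. The StepResult and NextStep types mirror the ones above. Everything else (FileConstraint, ChoiceOption, InputField, WorkflowTemplate, WorkflowRun, completeStep, interactionZone, and the zone names) is hypothetical and filled in only so the example compiles.

```typescript
// Placeholder definitions for the types referenced above (assumed shapes).
type FileConstraint = { mode: "exact" | "at_most" | "at_least" | "any"; count?: number };
type ChoiceOption = { id: string; label: string };
type InputField = { name: string; label: string };

type NextStep =
  | { kind: "autonomous"; prompt: string }
  | { kind: "ask_for_files"; constraint: FileConstraint }
  | { kind: "ask_conditional"; options: ChoiceOption[] }
  | { kind: "ask_variable_inputs"; fields: InputField[] };

type StepResult =
  | { status: "in_progress"; nextStep: NextStep }
  | { status: "workflow_complete" };

// The template is the authored list of steps; it never changes during a run.
interface WorkflowTemplate { id: string; steps: NextStep[] }

// Run state lives in its own record, so a run can be queried and resumed.
interface WorkflowRun { templateId: string; currentIndex: number; done: boolean }

// Advance the run by one step and return the next step's typed config.
function completeStep(
  template: WorkflowTemplate,
  run: WorkflowRun
): { run: WorkflowRun; result: StepResult } {
  const nextIndex = run.currentIndex + 1;
  const nextStep = template.steps[nextIndex];
  if (nextStep === undefined) {
    return { run: { ...run, done: true }, result: { status: "workflow_complete" } };
  }
  return {
    run: { ...run, currentIndex: nextIndex },
    result: { status: "in_progress", nextStep },
  };
}

// Frontend side: pick the interaction zone from the discriminant alone.
function interactionZone(step: NextStep): string {
  switch (step.kind) {
    case "autonomous":
      return "none";
    case "ask_for_files":
      return "file-upload";
    case "ask_conditional":
      return "option-chips";
    case "ask_variable_inputs":
      return "input-form";
  }
}
```

Because the switch is exhaustive over kind, adding a new step type without handling it in the renderer becomes a compile-time error instead of a runtime surprise.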

Workflows were cited as a deal-closing feature in enterprise sales calls.


Ask for Files. The user is given the chance to upload the documents the workflow needs.

Conditional Question Input. The agent renders structured options so the workflow can branch based on the user's answer.

The Agent Acts. The AI then makes the identifications and edits across the document.

Natural Language Workflow Builder

Workflows were powerful but intimidating to configure manually. Even Legal Solution Advisors had a hard time demoing them. Reading through support logs and the feedback our LSAs were channeling internally, I realized the highest-leverage move wasn't more workflow features. It was making workflows describable!

I scoped and built this one on my own initiative. I traced the request through logs, Slack signal, and what LSAs were quietly asking for, then pitched it. It shipped behind a feature flag as a premium onboarding accelerator, and was credited with closing a $100k+ enterprise contract later that quarter!

The result is a conversational builder that takes plain-English descriptions of a workflow, generates a complete schema-validated plan via Claude Haiku in a single structured-object call, and lets users iterate on the proposal in natural language before it becomes a live workflow. The system prompt encodes the semantics of agent-mode step execution and the composition rules a useful legal workflow needs. For example, the last step has to produce a deliverable, not leave the user hanging on input. Defaults are filled server-side so the LLM only authors what it should be authoring.
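A rough sketch of what the server-side half of that could look like: check a composition rule on the model's draft, then fill in the fields the model shouldn't author. All names here (DraftStep, PlannedStep, finalizePlan) are hypothetical, and modelling "produces a deliverable" as "the last step is autonomous" is an assumption made only for this example.

```typescript
type StepKind = "autonomous" | "ask_for_files" | "ask_conditional" | "ask_variable_inputs";

// What the model authors: only the fields it should be deciding.
interface DraftStep { kind: StepKind; title: string; prompt?: string }

// What gets stored: the draft plus server-filled defaults.
interface PlannedStep extends DraftStep { id: string; prompt: string }

type PlanResult =
  | { ok: true; steps: PlannedStep[] }
  | { ok: false; error: string };

function finalizePlan(draft: DraftStep[]): PlanResult {
  const last = draft[draft.length - 1];
  if (last === undefined) {
    return { ok: false, error: "plan has no steps" };
  }
  // Assumed rule: only an autonomous step can produce the final deliverable.
  if (last.kind !== "autonomous") {
    return { ok: false, error: "last step must produce a deliverable" };
  }
  // Defaults are filled here so the model never has to invent ids or empty prompts.
  const steps: PlannedStep[] = draft.map((step, index) => ({
    ...step,
    id: `step-${index + 1}`,
    prompt: step.prompt ?? "",
  }));
  return { ok: true, steps };
}
```

A rejected draft can be sent back to the model with the error string, which keeps the iterate-in-natural-language loop grounded in concrete rule violations.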

Company Profiles

Spellbook needed to understand the company using it. Not in a vague personalization way, but in a structured legal-context way. What industry are you in? What's your regulatory environment? Who are your customers? Without that, the AI was generic. With it, the AI suddenly fit.

Zero to GA in ~3 weeks. Became the anchor feature for the in-house enterprise push!

The core is a company profile document model covering name, description, address, industry, size, revenue, customer type, regulatory context, and legal focus areas. Profiles auto-attach to new projects via a pre-save hook, so users never have to wire context manually. Around it sit a full CRUD surface, an AI-powered autofill that uses structured tool-calling to bootstrap profiles from minimal input, and an agent tool that can request a profile switch mid-conversation when the model thinks the user's query needs a different lens. Telemetry attaches the profile ID to every LLM event downstream.
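The auto-attach logic can be pictured as a small function that runs before a project is saved. This is a sketch, not the actual hook: the Project shape, the companyProfileId field, and attachDefaultProfile are hypothetical names. If the models were defined with Mongoose (an assumption; only MongoDB is named here), logic like this could be registered through a schema's pre("save") middleware.

```typescript
interface Project {
  name: string;
  companyProfileId?: string;
}

// Runs before a project is persisted. Only brand-new projects without an
// explicit profile pick up the default, so existing choices are never overwritten.
function attachDefaultProfile(
  project: Project,
  isNew: boolean,
  defaultProfileId: string | undefined
): Project {
  if (!isNew || project.companyProfileId !== undefined || defaultProfileId === undefined) {
    return project;
  }
  return { ...project, companyProfileId: defaultProfileId };
}
```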

Personalization

Not ticketed, not requested by management. I audited support escalations and saw a pattern: one of our enterprise accounts (a massive company in Korea) wanted the AI's behavior to change in account-scoped ways. Korean outputs. Less verbose English. A more assertive legal tone. Rather than patching with one-off prompt hacks per customer, I designed a generalized personalization system.

The system has three preference axes: verbosity, tone register, and a freeform custom-instructions field. Preferences are stored in a dedicated MongoDB document separate from the user record, upserted so each user has exactly one. They are threaded into the system prompt pipeline at agent init and into the document-editing subagent, so style settings apply whether the user is chatting or asking the AI to edit a doc directly. This closed an entire category of recurring support escalations and quietly created the infrastructure for international expansion. No model retraining required!
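As an illustration of the prompt-threading step, the sketch below turns a preferences record into a block of instructions appended to a base system prompt. The enum values, the StylePreferences shape, and both function names are assumptions; the real option names and prompt wording aren't given here.

```typescript
// Hypothetical option sets for the two structured axes.
type Verbosity = "concise" | "balanced" | "detailed";
type Tone = "neutral" | "assertive" | "conciliatory";

interface StylePreferences {
  verbosity: Verbosity;
  tone: Tone;
  customInstructions?: string;
}

// Render the preferences as plain instruction lines.
function buildStyleInstructions(prefs: StylePreferences): string {
  const lines = [`Verbosity: ${prefs.verbosity}.`, `Tone: ${prefs.tone}.`];
  const custom = prefs.customInstructions?.trim();
  if (custom) {
    lines.push(`Additional instructions: ${custom}`);
  }
  return lines.join("\n");
}

// Shared by any prompt that should honor style settings (chat agent,
// document-editing subagent). Users with no saved preferences get the base prompt.
function withStyle(basePrompt: string, prefs: StylePreferences | undefined): string {
  return prefs ? `${basePrompt}\n\n${buildStyleInstructions(prefs)}` : basePrompt;
}
```

Routing every prompt through one helper like withStyle is what makes the settings apply uniformly, rather than depending on each call site remembering to include them.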

Analytics & Event Tracking

Across every feature I shipped, I instrumented the analytics layer underneath it. Before this work, the team had no reliable signal on whether workflows were being started or completed, where the company-profile wizard converted, or how the natural-language builder was being iterated.

The instrumentation has two layers. The first is a backend analytics service wired into the tRPC middleware that resolves identity from request-scoped storage, so events fired from inside LLM tool execution carry the right user. The second is a frontend hook proxied through a first-party domain to dodge ad blockers. I instrumented the full workflow funnel: step completion, workflow completion, and a reliability sentinel event that fires when the agent ends a turn without completing its step, carrying enough diagnostic context to reproduce the failure class.
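Request-scoped identity of this kind is commonly built on Node's AsyncLocalStorage, and the sketch below assumes that approach. The source doesn't confirm which mechanism the service uses. The event names, the in-memory sink, and the track and runTool helpers are all hypothetical stand-ins.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface RequestIdentity { userId: string }
interface TrackedEvent { name: string; userId: string; props: Record<string, unknown> }

// One store per request; middleware would call requestIdentity.run(...) around the handler.
const requestIdentity = new AsyncLocalStorage<RequestIdentity>();

// Stand-in for the real analytics client.
const sink: TrackedEvent[] = [];

// Callers never pass a user id; it is resolved from the ambient request scope.
function track(name: string, props: Record<string, unknown> = {}): void {
  const identity = requestIdentity.getStore();
  if (identity === undefined) {
    // Outside a request scope: drop the event rather than misattribute it.
    return;
  }
  sink.push({ name, userId: identity.userId, props });
}

// A tool running several async hops below the request handler still sees the identity.
async function runTool(): Promise<void> {
  await Promise.resolve();
  track("workflow_step_completed", { stepIndex: 0 });
}
```

In use, the middleware would wrap the handler as requestIdentity.run({ userId }, () => handler()), and any track call made during that handler, including from runTool, would carry that user.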

I authored ~25 unique events tied specifically to the work I shipped (workflows, the NL builder, company profiles, personalization). The Datadog dashboards built on top of those events caught 3 critical drop-offs before they reached users, which felt like the most concrete proof that the instrumentation was actually useful and not just bookkeeping!

I also authored the team's analytics reference document, covering Mixpanel funnels, formulas, event-to-layer mappings, and named gaps, so the team could iterate confidently after my term ended.

Other Features

Outside the four core systems above, I shipped a steady stream of surface work that lifted the product's perceived polish. The Review Tables redesign got a slide-over panel, fullscreen toggle, inline rename, desktop push notifications on run completion, and three-format export. It became one of our most-demoed surfaces. Compare Documents got a four-column PDF export and significance-based filtering implemented at the prompt level, so the model just doesn't generate low-signal output. Advanced search across project names and message content, taskbar status indicators with collapsible per-file progress, and project auto-naming all chipped away at friction in the core chat-and-review loop.

Featured In

A couple of Spellbook's monthly product spotlights where the work I shipped went live for all teams.

Spellbook · March 2026

What's New in Spellbook · March 2026

The release that introduced Company Profiles to the product.

Spellbook · April 2026

The Revision · Spellbook's April Product Release

Covering the Tone & Style settings page and the Multi-Step Workflows file and question step updates.

The Crew

Some fun moments with the team.

What I'll Remember

Spellbook is sitting at an unusual moment: large enough that you're surrounded by senior engineers and PMs worth learning from, lean enough that the things you ship reach a massive base of legal teams who actually depend on them. That balance is rare, and the months I was there were genuinely formative.

I'm proud of what I shipped, but more than that I'm grateful for the trust I got. By the end I was scoping work independently, owning systems end-to-end, and seeing my features cited in sales calls. That's not the coop experience I expected, and it's not one most coops get!

Thanks to Mitch for the mentorship and the bar he set, to Jack, Ethan, and Marty for the code reviews and the standards, and to everyone on the team who treated me as a peer earlier than I deserved.

Abeer Das