Four-Leaf shipped a Model Context Protocol server at four-leaf.ai/api/mcp and an MIT-licensed Skill wrapper at github.com/fourleafai/clover-public. Together they bring eleven job-search and interview-prep tools directly into Claude Code, Claude Desktop, ChatGPT Desktop, Cursor, Cline, Continue, Windsurf, Perplexity, and the OpenAI Codex CLI.
This post is the engineering write-up: why the architecture looks the way it does, what was harder than expected, and which design choices another team building an OAuth-backed MCP server would want to copy or avoid.
Why this exists
Most AI interview prep tools are trapped in their own app. A candidate opens a separate tab, signs in, pastes a job description into a form, and waits. The AI assistant they were already talking to (Claude, Cursor, ChatGPT) isn't part of that loop.
The Model Context Protocol changes the shape of the problem. If the tools live in the assistant itself, the candidate doesn't context-switch. They ask, the tools run, and the conversation continues. The right place for job-search and interview-prep tools is wherever the candidate already is.
That's the product thesis. The architecture follows from it.
The pipeline
Eleven tools, nine free, two paid. The free ones are the read and compute path that makes the MCP earn its install.
search_jobs hits a nightly-scraped pool of 100,000+ active postings from Greenhouse, Lever, Ashby, and Workday. A natural-language parser pulls role, level, location, employment type, and remote-only flags out of the query before the SQL runs.
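As a rough illustration of that pre-SQL parsing step, here is a minimal keyword-based sketch. The `parseJobQuery` function, the `JobQuery` fields, and the keyword lists are all hypothetical; the production parser isn't shown in this post and is likely more robust.

```typescript
// Hypothetical sketch: pull structured filters out of a free-text job query
// before building SQL. Names and keyword lists are illustrative assumptions.
interface JobQuery {
  role: string;
  level?: string;
  location?: string;
  employmentType?: string;
  remoteOnly: boolean;
}

const LEVELS = ["junior", "mid", "senior", "staff", "principal"];
const EMPLOYMENT_TYPES = ["full-time", "part-time", "contract", "internship"];

function parseJobQuery(input: string): JobQuery {
  let text = input.toLowerCase().trim();
  const query: JobQuery = { role: "", remoteOnly: false };

  // Remote flag: the word "remote" anywhere in the query.
  if (/\bremote\b/.test(text)) {
    query.remoteOnly = true;
    text = text.replace(/\bremote\b(\s+only|-only)?/g, " ");
  }

  // Location: whatever follows a trailing "in <place>".
  const loc = text.match(/\bin ([a-z .,'-]+)$/);
  if (loc) {
    const place = loc[1].trim();
    if (place) query.location = place;
    text = text.slice(0, loc.index ?? text.length);
  }

  // Level and employment type: the first matching keyword wins.
  for (const level of LEVELS) {
    const re = new RegExp(`\\b${level}\\b`);
    if (re.test(text)) {
      query.level = level;
      text = text.replace(re, " ");
      break;
    }
  }
  for (const type of EMPLOYMENT_TYPES) {
    if (text.includes(type)) {
      query.employmentType = type;
      text = text.replace(type, " ");
      break;
    }
  }

  // Whatever is left is treated as the role title.
  query.role = text.replace(/\s+/g, " ").trim();
  return query;
}
```

The structured fields can then become parameterized SQL filters, with the leftover role text used for a title match.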
get_role_intelligence and list_roles expose a structured catalog of twenty-three roles, each with a pipeline description, scoring rubric, and resume guidance.
get_interview_questions pulls from a curated question bank. generate_practice_questions produces fresh questions on demand using Claude Haiku with role-calibrated prompts that include tips and key points.
match_score runs a real scoring algorithm against a resume and a job description, returning a 0-100 fit number plus skills, experience, and role-alignment breakdowns. It penalizes bare skills-list mentions that don't show up in bulleted work history, which is the intended behavior: a skill listed without supporting work history is weaker evidence of fit.
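To make the skills-list penalty concrete, here is a small sketch of one way such a discount could work. The `skillsScore` function, the `ResumeEvidence` shape, and the 0.4 discount are assumptions for illustration only; the actual match_score algorithm isn't published here.

```typescript
// Hypothetical sketch of the skills component of a fit score.
// A skill backed by a work-history bullet earns full credit; a skill that
// only appears in a standalone skills list earns a discounted credit.
interface ResumeEvidence {
  skillsSection: string[]; // skills listed in a standalone skills section
  workBullets: string[];   // bullet text from the work history
}

const BARE_MENTION_WEIGHT = 0.4; // assumed discount, not Four-Leaf's value

function skillsScore(required: string[], resume: ResumeEvidence): number {
  if (required.length === 0) return 0;
  const bullets = resume.workBullets.map((b) => b.toLowerCase());
  const listed = new Set(resume.skillsSection.map((s) => s.toLowerCase()));

  let credit = 0;
  for (const skill of required.map((s) => s.toLowerCase())) {
    if (bullets.some((b) => b.includes(skill))) {
      credit += 1; // demonstrated in work history
    } else if (listed.has(skill)) {
      credit += BARE_MENTION_WEIGHT; // claimed but not demonstrated
    }
  }
  return Math.round((credit / required.length) * 100);
}
```

With two required skills, one demonstrated in a bullet and one only listed, this sketch yields 70 rather than 100.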
explain_interview_format synthesizes role intelligence plus the candidate's specified seniority and optional company into a grounded walk-through.
comp_coach and comp_benchmarks are the comp pillar. Both are described in detail below. The two paid tools, start_voice_mock_interview and tailor_resume, return deep-links that open the corresponding pages in the four-leaf.ai app with the candidate's context already pre-filled.
The chain is the moat. Find a job, know the interview, practice it, tailor a resume, run the mock, decode the offer. No general-purpose AI does that end-to-end without dedicated infrastructure.
OAuth 2.1 + PKCE + Dynamic Client Registration
API key copy-paste is the standard MCP authentication pattern. A user generates a key in a dashboard, pastes it into their MCP client config, and hopes they don't accidentally commit it. The candidate audience Four-Leaf serves is not that audience.
The MCP server uses OAuth 2.1 with PKCE and Dynamic Client Registration instead. The flow:
- The user installs the MCP. Their AI client reads the OAuth discovery document at four-leaf.ai/api/mcp/.well-known/oauth-authorization-server.
- The client registers itself via DCR. No pre-registration, no app store, no waitlist. Any MCP-aware client can connect.
- The client opens the browser, and the user authenticates with their existing Four-Leaf account. PKCE protects the authorization code exchange.
- The MCP client stores a bearer token. The Four-Leaf account is the source of truth for paid status going forward.
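The PKCE step in the flow above can be sketched with Node's built-in crypto module. This is a generic illustration of RFC 7636's S256 method, not Four-Leaf's code; the function names and the option names passed to `buildAuthorizeUrl` are hypothetical.

```typescript
import { createHash, randomBytes } from "node:crypto";

// PKCE (RFC 7636): the client keeps a random verifier secret and sends only
// its SHA-256 hash (the challenge) with the authorization request.
function createVerifier(): string {
  return randomBytes(32).toString("base64url"); // 43 URL-safe characters
}

function challengeFor(verifier: string): string {
  return createHash("sha256").update(verifier).digest("base64url");
}

// Client side: build the URL the browser is sent to. The endpoint, client id,
// and redirect URI are supplied by the caller; none are Four-Leaf's real values.
function buildAuthorizeUrl(opts: {
  authorizationEndpoint: string;
  clientId: string;
  redirectUri: string;
  state: string;
  verifier: string;
}): string {
  const url = new URL(opts.authorizationEndpoint);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("client_id", opts.clientId);
  url.searchParams.set("redirect_uri", opts.redirectUri);
  url.searchParams.set("state", opts.state);
  url.searchParams.set("code_challenge", challengeFor(opts.verifier));
  url.searchParams.set("code_challenge_method", "S256");
  return url.toString();
}

// Server side, at the token endpoint: recompute the challenge from the
// verifier the client now reveals and compare it to the stored challenge.
function verifyPkce(verifier: string, storedChallenge: string): boolean {
  return challengeFor(verifier) === storedChallenge;
}
```

Because only the hash travels with the authorization request, an intercepted authorization code is useless without the verifier that stayed on the client.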
The payoff is that the same account that powers the consumer product (subscription tier, voice mock history, saved resumes) is what's authenticated when the MCP tool fires. No double-billing, no second login.
The cost is implementation. OAuth 2.1 with PKCE and DCR is more code than a static API key check. Standard server-side OAuth libraries don't always handle the DCR endpoint correctly. The .well-known discovery endpoint has to be precisely formatted.
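For a sense of what "precisely formatted" means, here is a sketch of a discovery document and a cheap sanity check. The field names come from RFC 8414 and RFC 7591; the endpoint paths, the supported-values lists, and both function names are assumptions, not Four-Leaf's actual configuration.

```typescript
// Sketch of an OAuth authorization server metadata document.
// Field names follow RFC 8414; endpoint paths under the issuer are assumed.
interface AuthServerMetadata {
  issuer: string;
  authorization_endpoint: string;
  token_endpoint: string;
  registration_endpoint: string; // the DCR endpoint (RFC 7591)
  response_types_supported: string[];
  grant_types_supported: string[];
  code_challenge_methods_supported: string[];
  token_endpoint_auth_methods_supported: string[];
}

function buildAuthServerMetadata(issuer: string): AuthServerMetadata {
  return {
    issuer,
    authorization_endpoint: `${issuer}/authorize`,
    token_endpoint: `${issuer}/token`,
    registration_endpoint: `${issuer}/register`,
    response_types_supported: ["code"],
    grant_types_supported: ["authorization_code", "refresh_token"],
    code_challenge_methods_supported: ["S256"],
    token_endpoint_auth_methods_supported: ["none"], // public clients using PKCE
  };
}

// Guard against shipping a malformed document: returns a list of problems.
function metadataProblems(doc: AuthServerMetadata): string[] {
  const problems: string[] = [];
  const endpoints = [
    "authorization_endpoint",
    "token_endpoint",
    "registration_endpoint",
  ] as const;
  for (const key of endpoints) {
    if (!doc[key].startsWith("https://")) {
      problems.push(`${key} must be an absolute https URL`);
    }
  }
  if (!doc.response_types_supported.includes("code")) {
    problems.push("response_types_supported must include code");
  }
  if (!doc.code_challenge_methods_supported.includes("S256")) {
    problems.push("code_challenge_methods_supported must include S256");
  }
  return problems;
}
```

Running a check like this in CI is one way to catch a missing registration endpoint or a relative URL before a client does.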
Server-side web search for comp benchmarks
The hardest tool to get right was comp_benchmarks. The brief sounds simple: a user asks "what's a good salary for a senior backend engineer in Austin" and the tool returns a cited band.
The first implementation tried client-side web search. The MCP tool would respond with an instruction telling the AI client to run its own web search using levels.fyi, Glassdoor, and Payscale. This failed in two ways.
First, MCP clients use their own web search inconsistently. Claude Code with web search enabled would sometimes run a search and sometimes ask a clarifying question instead. Different sessions, same prompt.
Second, MCP clients without web search couldn't do anything. The MCP tool delegating to a capability the client doesn't have just produces dead-end conversations.
The fix was server-side. The new comp_benchmarks tool attaches Anthropic's web_search_20250305 tool to a Sonnet call so the server runs the searches on Four-Leaf's API key, then returns a structured response with cited salary bands, named sources (levels.fyi, Glassdoor, Payscale), and a confidence rating per claim.
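A sketch of what the server-side request body might look like. The `type`, `name`, `max_uses`, and `allowed_domains` fields belong to Anthropic's web search tool definition; everything else here (the function name, the prompt wording, the `max_tokens` value, the exact domain list, and the model id, which is left as a parameter because this post only says "Sonnet") is an assumption.

```typescript
// Hypothetical sketch: build a Messages API request body that lets the model
// run web searches on the server's own API key.
interface CompQuery {
  role: string;
  level: string;
  location: string;
}

function buildCompBenchmarkRequest(model: string, q: CompQuery) {
  return {
    model, // caller supplies the Sonnet model id
    max_tokens: 2048, // assumed value
    tools: [
      {
        type: "web_search_20250305",
        name: "web_search",
        max_uses: 5, // cap on searches per call, matching the 3-5 range above
        allowed_domains: ["levels.fyi", "glassdoor.com", "payscale.com"],
      },
    ],
    messages: [
      {
        role: "user",
        content:
          `Find current salary bands for a ${q.level} ${q.role} in ${q.location}. ` +
          `Cite a source for every band and rate your confidence in each claim.`,
      },
    ],
  };
}
```

The body would be sent as JSON in a POST to the Messages API with the server's key in the `x-api-key` header, so the client never needs a search capability of its own.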
The trade-off is real money per call. Web search bills per query, the Sonnet wrapper costs more than Haiku, and the typical call runs 3-5 searches in 30-60 seconds. A 20-call-per-day per-user cap bounds the cost and is more than enough for a candidate working through one or two competing offers.
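The per-user cap can be sketched as a simple counter keyed by user and UTC day. The `DailyLimiter` class is hypothetical and uses an in-memory map for brevity; a hosted server would presumably keep the count in a shared store so it survives restarts and spans instances.

```typescript
// Hypothetical sketch of a per-user daily cap on an expensive tool.
const DAILY_CAP = 20; // the cap described above

class DailyLimiter {
  private counts = new Map<string, number>();
  private cap: number;

  constructor(cap: number = DAILY_CAP) {
    this.cap = cap;
  }

  // Returns true and records the call if the user is under the cap for the
  // current UTC day; returns false without recording otherwise.
  tryConsume(userId: string, now: Date = new Date()): boolean {
    const day = now.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
    const key = `${userId}:${day}`;
    const used = this.counts.get(key) ?? 0;
    if (used >= this.cap) return false;
    this.counts.set(key, used + 1);
    return true;
  }
}
```

Checking the limiter before the model call means a rejected request costs nothing.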
The architectural lesson generalizes. When an MCP tool needs a capability the client may or may not have, the reliable path is providing it server-side. Don't outsource correctness to whatever the client decided to install.
The 60-second client timeout
Some MCP tools fail in a particularly silent way. The server-side test rig runs the tool, the response comes back, everything looks fine. The tool then ships and fails for every real user.
The culprit is the client's tool-call timeout, which sits around 60 seconds in most MCP clients. This is not the same as the server's function timeout (Vercel allows up to 300 seconds, and Four-Leaf's MCP route is configured at 120). The client gives up on the tool well before the server gives up on the response.
The comp_coach tool ran into this. It's a full negotiation analysis. An offer goes in, a structured memo comes out with total compensation math, market comparison, component-by-component analysis, red flags, and a counter strategy with specific talking points. The original implementation used Sonnet at 8192 max_tokens. Generation took around 103 seconds and produced a clean response every time when tested against the API directly.
In Claude Code, the same call failed every time. The MCP client timed out at 60 seconds and reported the tool unavailable.
The fix had three parts: switching from Sonnet to Haiku, which generates the same structure several times faster; tightening the prompt with explicit caps on talking points, red flags, and prose length; and cutting max_tokens from 8192 to 4096 as a hard guard against a runaway generation.
The new tool returns in around 38 seconds on a fully loaded offer (base, equity, signing bonus, competing offers, priorities, constraints). That leaves about 22 seconds of margin under the client timeout for network jitter and rate-limit retries.
The lesson: verify end-to-end through an actual MCP client every time, not just server-side smoke tests. The 60-second client budget is the real constraint.
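One way to make that budget explicit in code is a server-side deadline set below the client's limit, so a slow generation fails fast with a clear error instead of a silent client timeout. This is a sketch under assumptions: the helper names and the 15-second safety margin are illustrative, and 60 seconds is the approximate client limit described above.

```typescript
// Hypothetical sketch: enforce a server-side deadline below the client timeout.
const CLIENT_TIMEOUT_MS = 60_000; // approximate client limit from this post
const SAFETY_MARGIN_MS = 15_000;  // assumed headroom for network and retries

function serverBudgetMs(
  clientTimeoutMs: number = CLIENT_TIMEOUT_MS,
  marginMs: number = SAFETY_MARGIN_MS,
): number {
  return clientTimeoutMs - marginMs;
}

// Remaining headroom for an observed latency; negative means the client
// would already have given up.
function headroomMs(observedMs: number, clientTimeoutMs: number = CLIENT_TIMEOUT_MS): number {
  return clientTimeoutMs - observedMs;
}

class DeadlineExceeded extends Error {}

// Rejects if `work` has not settled within `ms`. The underlying work is not
// cancelled; this only stops the handler from waiting on it.
async function withDeadline<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new DeadlineExceeded(`exceeded ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

With the numbers above, `headroomMs(38_000)` is 22,000 ms, while the original 103-second generation comes out 43 seconds over budget.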
Single-use stash for heavy text handoff
The MCP tools that return deep-links into the four-leaf.ai app need to pass context across the boundary. The candidate pastes a job description into Claude, the tailor_resume tool builds a URL, and the user clicks. The landing page should pre-fill the form with the same JD, not ask for it again.
URL query parameters are the obvious mechanism. Light context (role, level, interview type) rides in the URL without issue. Heavy text breaks. URLs are only dependable up to about 2,000 characters, but a real job description plus a real resume routinely runs to ten or twenty thousand characters. Stuffing that into a query string is unreliable across browsers, email clients, social link previews, and analytics pipelines.
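The light-versus-heavy split can be sketched as a length check on the finished URL. The `buildDeepLink` function, the `context` parameter name, and the choice to inline heavy text when it happens to fit are assumptions for illustration; the 2,000-character threshold is the figure mentioned above.

```typescript
// Hypothetical sketch: keep light context in the query string and flag heavy
// text for out-of-band handoff when the URL would grow too long.
const MAX_URL_LENGTH = 2000;

function buildDeepLink(
  base: string,
  light: Record<string, string>,
  heavy?: string,
): { url: string; needsStash: boolean } {
  const url = new URL(base);
  for (const [key, value] of Object.entries(light)) {
    url.searchParams.set(key, value);
  }
  if (heavy === undefined) {
    return { url: url.toString(), needsStash: false };
  }

  // Try inlining the heavy text; fall back if the encoded URL is too long.
  const inline = new URL(url.toString());
  inline.searchParams.set("context", heavy);
  if (inline.toString().length <= MAX_URL_LENGTH) {
    return { url: inline.toString(), needsStash: false };
  }
  return { url: url.toString(), needsStash: true };
}
```

Measuring the encoded URL matters because percent-encoding can inflate pasted text well beyond its raw character count.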
The solution is a server-side stash. A new Postgres table, mcp_handoff_stashes, with the following constraints:
- A user_id foreign key with row-level security ensuring auth.uid() = user_id
- A consumed_at timestamp that's null on insert and gets stamped on first read
- A 15-minute expires_at TTL that protects against stale data
- A context JSONB column for the actual payload
When an MCP tool needs to hand heavy text to a landing page, it inserts a row, takes the returned UUID, and appends ?stash=<id> to the deep-link URL. The landing page consumes the row by id under RLS, marks it consumed, and pre-fills the form. The combination of single-use consumption and short TTL means a browser refresh after the first consume keeps the user's edits rather than re-applying the original MCP context.
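The stash semantics can be modeled in a few lines. This sketch uses an in-memory map as a stand-in for the Postgres table so it runs on its own; the `HandoffStash` class and its method names are hypothetical, and the user check only imitates what the RLS policy enforces in the real database.

```typescript
import { randomUUID } from "node:crypto";

// In-memory stand-in for the mcp_handoff_stashes table described above.
const STASH_TTL_MS = 15 * 60 * 1000; // 15-minute TTL

interface StashRow {
  userId: string;
  context: unknown;
  expiresAt: number;         // epoch milliseconds
  consumedAt: number | null; // null until first read
}

class HandoffStash {
  private rows = new Map<string, StashRow>();

  // Called by the MCP tool: store the payload and return the id that is
  // appended to the deep-link as ?stash=<id>.
  insert(userId: string, context: unknown, now: number = Date.now()): string {
    const id = randomUUID();
    this.rows.set(id, {
      userId,
      context,
      expiresAt: now + STASH_TTL_MS,
      consumedAt: null,
    });
    return id;
  }

  // Called by the landing page: returns the payload once, and null if the
  // row is missing, owned by someone else, already consumed, or expired.
  consume(id: string, userId: string, now: number = Date.now()): unknown {
    const row = this.rows.get(id);
    if (!row || row.userId !== userId) return null;
    if (row.consumedAt !== null || now >= row.expiresAt) return null;
    row.consumedAt = now;
    return row.context;
  }
}
```

A second `consume` with the same id returns null, which is what lets a page refresh keep the user's edits instead of overwriting them with the original payload.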
The RLS policy is what makes this safe. The MCP tool runs with the admin client (bearer-token auth) and bypasses RLS for the insert. The landing page runs with the user's session client and is bound by the policy. Even with a leaked stash UUID, another user couldn't read someone else's context.
The whole thing is about forty lines of SQL and two TypeScript helpers. It's the kind of pattern that feels obvious in retrospect.
The open-source Skill wrapper
The MCP server is hosted. The Skill that wraps it is open source.
four-leaf-coach is a Claude-compatible Skill that installs into Claude Code, Cursor, OpenAI Codex CLI, and GitHub Copilot via one npx command:
npx four-leaf-coach add
The CLI detects which tool is in use (or accepts a --tool flag), copies the right bundle into the right place, and prints the MCP install command for live data. The bundle structure varies per tool. Claude Code and Cursor read directories of references. Codex reads AGENTS.md plus a references tree. Copilot reads a single flattened instructions file. The build script generates one bundle per target from a single source.
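The detection step might look something like the sketch below. The marker paths are guesses based on each tool's common project conventions, and the `--tool` values and the `detectTool` function are hypothetical; the real CLI in the repository may detect tools differently.

```typescript
// Hypothetical sketch of tool detection for a multi-target installer.
type Tool = "claude-code" | "cursor" | "codex" | "copilot";

// Assumed marker files or directories, checked in order.
const MARKERS: Array<[Tool, string]> = [
  ["claude-code", ".claude"],
  ["cursor", ".cursor"],
  ["codex", "AGENTS.md"],
  ["copilot", ".github/copilot-instructions.md"],
];

// An explicit flag always wins; otherwise the first marker found decides.
// `exists` is injected so the logic can be tested without touching disk.
function detectTool(
  flag: string | undefined,
  exists: (path: string) => boolean,
): Tool | null {
  if (flag !== undefined) {
    const known = MARKERS.find(([tool]) => tool === flag);
    if (!known) throw new Error(`unknown --tool value: ${flag}`);
    return known[0];
  }
  for (const [tool, marker] of MARKERS) {
    if (exists(marker)) return tool;
  }
  return null;
}
```

In a real CLI, `exists` would wrap a filesystem check against the current project directory, and a null result would prompt the user to pass the flag.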
The Skill itself is a routing and coaching layer. Each workflow (kickoff, find jobs, prep for a role, practice, analyze a JD, negotiate, interview strategy) has a reference file that describes when the workflow fires, which MCP tool to call first, and how to coach around the response. The MCP returns structured JSON. The Skill translates it into a conversation.
The license is MIT. The code is at github.com/fourleafai/clover-public. Pull requests for new tool support, new workflows, or voice improvements are welcome.
Install
The MCP server.
claude mcp add --transport http four-leaf https://four-leaf.ai/api/mcp
The Skill.
npx four-leaf-coach add
A free Four-Leaf account works for the read tools. Daily-limited compute tools (resume scoring, practice question generation, comp analysis, comp benchmarks) are free up to a generous cap. The two paid surfaces (voice mock interviews with rubric-scored feedback, and full AI resume tailoring) are gated by any active Four-Leaf paid plan, including the three-day free trial.
The full surface area, with sample prompts for each tool, lives at four-leaf.ai/oss.
What this changes
When the tools that power a vertical product are accessible from any AI assistant, the assistant becomes the application surface. The candidate stops switching between their AI chat and yet another login wall. The product team stops building chat interfaces that are worse than the AI they already use.
That's the bet behind the Four-Leaf MCP. The four-leaf.ai consumer product still exists, still works, and still has the surfaces that genuinely benefit from a dedicated UI (voice mock interviews, application tracking). The MCP is what makes the rest of the stack feel like part of the AI assistant the candidate was already using.
For anyone building in the MCP space, the architectural patterns generalize. OAuth over API keys for human-facing tools. Server-side capability over delegation when reliability matters. The 60-second client budget as a hard constraint. Stash tables for heavy text handoff. None of it is novel, but the combination is what makes the developer experience actually pleasant.