NotchFit Methodology — How Our AI Builds Plans

We think people deserve to know how an AI generates something as personal as their workout plan. This page is the technical version of "how it works": which model we use, which guardrails apply, and what we check before a plan reaches you.

What model powers NotchFit?

Plans are generated by a reasoning-class large language model. We deliberately chose a smaller model with reasoning capability: workout planning is a constrained domain where consistency and rule-following matter more than open-ended creativity. We continuously test against larger models and upgrade when the quality gain justifies the cost.

How are user inputs sanitized?

Every free-text field you provide (injury notes, preferences, equipment descriptions) is sanitized before reaching the AI. We strip XML tags, role markers, and markdown headings, and we enforce length caps. These are standard prompt-injection mitigations. User-provided text is wrapped in clearly marked [USER_DATA] blocks so the model can tell what is instruction (ours) and what is data (yours).
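To make the sanitization steps concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not NotchFit's actual code: the 500-character cap, the list of role markers, the closing [/USER_DATA] marker, and the names sanitize_field and wrap_user_data are all hypothetical.

```python
import re

MAX_FIELD_CHARS = 500  # assumed cap; the real limit is not published

# Patterns for the four kinds of content described above (all assumptions).
TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")                     # XML/HTML-style tags
ROLE_RE = re.compile(
    r"^[ \t]*(?:system|assistant|user|developer)[ \t]*:[ \t]*",
    re.IGNORECASE | re.MULTILINE,
)                                                               # "system:" style prefixes
HEADING_RE = re.compile(r"^[ \t]{0,3}#{1,6}[ \t]+", re.MULTILINE)  # markdown headings
MARKER_RE = re.compile(r"\[/?USER_DATA\]", re.IGNORECASE)      # our own block markers


def sanitize_field(text: str, max_chars: int = MAX_FIELD_CHARS) -> str:
    """Strip instruction-like markup from one free-text field, then cap its length."""
    previous = None
    # Repeat until nothing changes, so that removing one tag cannot expose another.
    while previous != text:
        previous = text
        text = TAG_RE.sub("", text)
        text = MARKER_RE.sub("", text)
        text = ROLE_RE.sub("", text)
        text = HEADING_RE.sub("", text)
    return text.strip()[:max_chars]


def wrap_user_data(text: str) -> str:
    """Wrap sanitized text in a clearly marked data block."""
    return f"[USER_DATA]\n{sanitize_field(text)}\n[/USER_DATA]"
```

Stripping the block markers from user text matters in a design like this: otherwise a user could close the data block early and have the rest of their note read as instructions.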

How does the AI decide weights?

Weights are required for any dumbbell, kettlebell, cable, or machine exercise. The AI follows location-based caps:

Home: max 25 lbs per dumbbell (a realistic home-gym constraint)
Outdoor / mixed: max 15 lbs
Gym: no cap

If you tell us during onboarding that you own heavier dumbbells, that overrides the default cap. Bodyweight and band exercises don't carry weight assignments.
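The caps and the override rule can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the location keys, and the choice to keep the default cap when the user reports lighter dumbbells are assumptions, not a description of the production code.

```python
from typing import Optional

# Default caps in lbs by training location; None means "no cap" (gym).
DEFAULT_CAPS_LBS = {"home": 25, "outdoor": 15, "mixed": 15, "gym": None}

# Equipment types that must carry a weight assignment.
WEIGHTED_EQUIPMENT = {"dumbbell", "kettlebell", "cable", "machine"}


def needs_weight(equipment: str) -> bool:
    """Bodyweight and band exercises return False: they carry no weight."""
    return equipment in WEIGHTED_EQUIPMENT


def weight_cap_lbs(location: str,
                   heaviest_owned_lbs: Optional[float] = None) -> Optional[float]:
    """Return the cap for a location, or None when there is no cap."""
    if location not in DEFAULT_CAPS_LBS:
        raise ValueError(f"unknown location: {location!r}")
    cap = DEFAULT_CAPS_LBS[location]
    if cap is None:
        return None
    # Heavier dumbbells reported during onboarding override the default cap.
    # Assumption: lighter reported dumbbells leave the default unchanged.
    if heaviest_owned_lbs is not None and heaviest_owned_lbs > cap:
        return heaviest_owned_lbs
    return cap
```

For example, weight_cap_lbs("home") gives 25, while weight_cap_lbs("home", 40) gives 40 because the onboarding answer overrides the default.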

What about the agentic review loop?

NotchFit ships with an optional generate-review-revise loop: a second model reviews the first model's output against the brief — variety, exercise count, recovery balance, equipment match — and asks for revisions until a quality threshold is met. It catches the kind of subtle drift that "looks right at a glance but isn't" — wrong rep ranges for the goal, the same exercise repeated three days in a row, missing weights.

In production we currently rely on the pre-deploy eval gate (below) rather than running the loop on every generation — a tradeoff between latency, cost, and the quality the evals already enforce. We can turn the loop on per-user or globally when the cost/quality tradeoff changes.
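For readers who want to see the shape of a generate-review-revise loop, here is a minimal Python sketch. The two models are represented by plain callables, and the Review type, the 0.8 threshold, and the cap of three revisions are assumptions made for illustration; the actual scoring scheme and limits are not described on this page.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Review:
    score: float                                      # 0.0 to 1.0, higher is better
    issues: List[str] = field(default_factory=list)   # reviewer's revision requests


def generate_review_revise(
    brief: str,
    generate: Callable[[str, List[str]], str],   # (brief, feedback) -> plan
    review: Callable[[str, str], Review],        # (brief, plan) -> Review
    threshold: float = 0.8,                      # assumed quality threshold
    max_revisions: int = 3,                      # assumed cap so the loop always ends
) -> Tuple[str, Review]:
    """Generate a plan, review it, and revise until it passes or the cap is hit."""
    feedback: List[str] = []
    for _ in range(max_revisions + 1):
        plan = generate(brief, feedback)
        result = review(brief, plan)
        if result.score >= threshold:
            break
        feedback = result.issues  # passed to the generator on the next round
    return plan, result
```

Returning the last review alongside the plan lets the caller decide what to do when the cap is reached without a passing score, for example falling back to a previously accepted plan.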

How do we keep exercise quality high?

NotchFit's exercise library is derived from the open-source wger exercise database and evaluated by AI. Each exercise carries a quality rating from 1 to 5. Exercises rated below 3 are excluded from plan generation. Admins re-evaluate the library periodically.
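The rating cutoff amounts to a simple filter. A minimal Python sketch, with a hypothetical Exercise type and function name:

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_RATING = 3  # exercises rated below this are excluded


@dataclass(frozen=True)
class Exercise:
    name: str
    rating: int  # quality rating, 1 to 5


def eligible_exercises(library: Iterable[Exercise]) -> List[Exercise]:
    """Keep only exercises that plan generation is allowed to use."""
    return [ex for ex in library if ex.rating >= MIN_RATING]
```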

How is plan quality measured before deployment?

Every backend deploy passes through a live eval gate — nine canonical workout-generation fixtures must produce passing plans on the production model before code ships. The eval suite checks:

Goal match (right rep ranges, volume, intensity)
Equipment compliance (only uses what the user has)
Weight rules (weights present, within caps)
Variety (no exercise repeated on days close together)
Modification preservation (user-locked workouts respected)

A failure on any of these checks fails the deploy.
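As an illustration of how such a gate can work, the Python sketch below implements two of the checks (weight rules and variety) over a set of fixture plans. The plan shape, the field names, the two-day minimum gap, and the choice to apply one cap to every weighted exercise are all assumptions made for the example, not the actual eval suite.

```python
from typing import Dict, List, Optional

Plan = List[List[dict]]  # hypothetical shape: a list of days, each a list of exercises

WEIGHTED_EQUIPMENT = {"dumbbell", "kettlebell", "cable", "machine"}


def check_weights(plan: Plan, cap_lbs: Optional[float]) -> List[str]:
    """Weighted exercises must have a weight, and it must not exceed the cap."""
    problems = []
    for day_number, day in enumerate(plan, start=1):
        for ex in day:
            if ex["equipment"] not in WEIGHTED_EQUIPMENT:
                continue
            weight = ex.get("weight_lbs")
            if weight is None:
                problems.append(f"day {day_number}: {ex['name']} has no weight")
            elif cap_lbs is not None and weight > cap_lbs:
                problems.append(f"day {day_number}: {ex['name']} exceeds {cap_lbs} lbs")
    return problems


def check_variety(plan: Plan, min_gap_days: int = 2) -> List[str]:
    """Flag an exercise that reappears fewer than min_gap_days after its last use."""
    problems = []
    last_seen: Dict[str, int] = {}
    for day_number, day in enumerate(plan, start=1):
        for ex in day:
            name = ex["name"]
            if name in last_seen and day_number - last_seen[name] < min_gap_days:
                problems.append(f"day {day_number}: {name} repeated too soon")
            last_seen[name] = day_number
    return problems


def run_gate(fixture_plans: Dict[str, Plan],
             cap_lbs: Optional[float]) -> Dict[str, List[str]]:
    """Return problems per failing fixture; an empty dict means the gate passes."""
    failures = {}
    for fixture_name, plan in fixture_plans.items():
        problems = check_weights(plan, cap_lbs) + check_variety(plan)
        if problems:
            failures[fixture_name] = problems
    return failures
```

In a deploy pipeline, a non-empty result from run_gate would translate into a non-zero exit code so that the release stops.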

What data does the AI see (and not see)?

The AI sees: your goal, fitness level, age range, available equipment, weekly availability, training history (last several weeks of completed workouts and feedback), and any explicit preferences or constraints you've shared.

The AI does not see: your email, name, payment information, IP address, or any identifying account data. Plan generation runs against pseudonymized profile data. Conversation history with Sage is stored per user and used to maintain memory across sessions, but it is not used to train any AI model.
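One common way to get this separation is an allowlist plus a keyed pseudonym, sketched below in Python. This is a hedged illustration: the field names, the profile_ref key, and the use of HMAC-SHA256 to derive the pseudonym are assumptions for the example and may differ from how NotchFit pseudonymizes profiles.

```python
import hashlib
import hmac

# Only these profile fields are ever copied into the model's view (assumed names).
ALLOWED_FIELDS = (
    "goal", "fitness_level", "age_range", "equipment",
    "weekly_availability", "training_history", "preferences",
)


def pseudonym(user_id: str, secret: bytes) -> str:
    """Derive a stable reference that cannot be reversed without the secret."""
    digest = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def model_view(account: dict, secret: bytes) -> dict:
    """Build the data the model sees: allowlisted fields plus a pseudonym."""
    view = {key: account[key] for key in ALLOWED_FIELDS if key in account}
    view["profile_ref"] = pseudonym(account["user_id"], secret)
    return view
```

An allowlist fails safe: a new account field such as a phone number stays out of the model's view until someone adds it deliberately.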

Try a plan built this way

The fastest way to evaluate the methodology is to use it. First plan free, no credit card.

Build Your First Plan Free