The product
Fluency tests how people actually work with AI — through real scenarios, judgement calls, and verification tasks. Two ways to run it: an industry-standard baseline, or a fully bespoke sandbox built around your sector and scoring priorities.
Baseline: the fixed test. Same modules, same scoring, every time. The right answer when you need consistency across hires, cohorts, or time.
Sandbox: the same engine, built around your org. Pick the modules that matter, anchor every scenario in your sector, and weight the scoring to your priorities.
Module library
Every module isolates a different way of working with AI — from live collaboration to verification to risk judgement. Run the baseline set, or pick the ones that fit your work.
Not exhaustive · new modules are added regularly
Candidates brief, refine, and direct an AI through a real workplace task.
Genuine judgement calls on when AI should and shouldn’t be used.
How candidates handle a constrained AI — without breaking the rules.
Order AI use-cases by risk. A read on real-world prioritisation.
How candidates brief an AI when budget and accuracy compete.
Spot the errors in AI-generated work — hallucinations, omissions, drift.
Critical review of AI output before it leaves the candidate’s desk.
Which parts of a workflow belong to AI and which belong to a human.
Find the prompt mistake that caused an AI to go wrong.
Tune temperature, length, and tone live to hit a target output.
One module, in depth
A representative example — the same depth applies to every module.
Scenario brief
A six-month engagement has just ended. The client’s COO has asked for a feedback report covering what the engagement achieved, where it fell short, and three recommendations for the next phase. You have an AI assistant — direct it.
AI · draft
Got it. Quick check before I draft: any specific outcomes the COO already knows about? Anything to under-play or emphasise for the board?
AI · draft
Drafting… 380-word report incoming. Want me to keep the closing line warm, or punchy?
What’s being scored
Did they brief with constraints, audience, and structure — or hand-wave?
When the AI asked a clarifying question, was the response useful?
Did they push back, override, or accept everything the AI suggested?
Did they direct toward a tone appropriate to the audience?
How quickly did the brief converge on something usable?
Scoring
Every score lands on a calibrated 100-point scale, split into four capability bands. The same scale across every module, every candidate, every time — so you can build a baseline that means something to your org.
Calibrated scale, not opinion.
Anchored by reference rubrics, not the mood of the reviewer.
Evidence with every score.
Each dimension links back to the exact moment in the candidate’s response.
Comparable across time.
A 62 today means the same a year from now. No grade inflation.
Capability band
Baseline · 44
The report
Review a candidate in under ten minutes. Annotated below.
Alex Rivera
Senior PM candidate · 14 May 2026
Overall
Baseline · 44
62/100 · +18 vs. baseline
By module
Evidence · why this score
Sandbox
Same engine, same scoring rigour — re-shaped around the modules, scenarios, and priorities that matter to your org.
Choose any subset of the ten modules. Build a 20-minute focused test or a deep diagnostic — whatever fits the work.
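As a rough illustration of what weighting the scoring to your priorities could look like, the sketch below combines per-module scores into one overall figure and compares it with a baseline. It is not Fluency's actual scoring method: the weighted-average formula, the module names, the per-module scores, and the weights are all assumptions made for the example. Only the overall 62 and the baseline of 44 come from the sample report above.

```python
def weighted_overall(module_scores, weights):
    """Weighted mean of 0-100 module scores over the selected modules.

    Assumption: a plain weighted average. The platform's real
    aggregation method is not described on this page.
    """
    total_weight = sum(weights[name] for name in module_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(score * weights[name] for name, score in module_scores.items())
    return weighted_sum / total_weight


# Hypothetical module names, scores, and weights (illustrative only).
scores = {"collaboration": 68, "verification": 60, "risk_ranking": 52}
weights = {"collaboration": 2.0, "verification": 1.0, "risk_ranking": 1.0}

overall = weighted_overall(scores, weights)  # (68*2 + 60 + 52) / 4 = 62.0
baseline = 44                                # baseline from the sample report
delta = overall - baseline                   # 18.0, i.e. "+18 vs. baseline"
```

Doubling the weight on one module pulls the overall score toward that module's result, which is the practical effect of prioritising it in a sandbox.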
Trust
Every score anchored to a reference rubric — not a reviewer’s mood.
Every assessment, response, and rationale stored. Re-scoreable on demand.
Candidate data is isolated per org and never used to train models. GDPR-aligned.
Scoring tested across personas to surface — and reduce — systemic bias.
Get started
Request access and we’ll walk you through the platform — or scope a sandbox built around your sector and scoring priorities.