How We Mapped 141 Freeform Skills to 8 Validated Dimensions
Standing on the shoulders of industrial psychologists, powered by sentence embeddings.

When we started building ExecReps, our workout library had a problem. Every workout had a "skill focus" — a freeform text field that described what the exercise trained. After 50 workouts, we had 141 unique labels. "Executive Presence." "C-Suite Alignment." "Persuasive Argumentation." "Stakeholder Buy-In." Some of these clearly overlap. Others don't. And no one — not even us — could tell you exactly how they all fit together.
This is a universal problem in learning platforms: your content grows faster than your taxonomy. And without a taxonomy, you can't build recommendations, track skill gaps, or show learners where they actually stand.
So we did what any self-respecting engineering team would do. We stole from the US Department of Labor.
The Taxonomy Problem
Here's what 141 freeform labels look like in practice. You have obvious clusters — "Persuasion," "Persuasive Communication," "Persuasion & Influence" are clearly the same thing. But what about "C-Suite Alignment"? Is that Executive Presence? Strategic Communication? Audience Awareness? The answer depends on context, and context is exactly what a flat text field doesn't give you.
The naive approach is to sit in a room and manually sort 141 cards into piles. We did that. It took hours and produced arguments. The second-naive approach is to throw an LLM at it and ask for categories. We did that too. It produced different categories every time.
What we needed was a ground truth — an established, peer-reviewed framework for categorising communication skills that existed before we did.
Standing on the Shoulders of Industrial Psychologists
Two frameworks turned out to be exactly what we needed.
The O*NET Content Model, maintained by the US Department of Labor, is the most comprehensive occupational skills taxonomy in existence. It decomposes "Communication" into precisely measurable sub-skills: Speaking, Persuasion, Negotiation, Active Listening, Social Perceptiveness. Each has a formal definition, measurement scale, and decades of validation data across every occupation in the US economy.
The World Economic Forum Global Skills Taxonomy takes a different angle — 93 skills across 5 hierarchical levels, designed to map the skills that matter for the future of work. It gives us "Persuasion and Negotiation," "Empathy and Active Listening," "Leadership and Social Influence" as distinct, measurable categories.
By synthesising these two frameworks, we derived 8 canonical dimensions that cover the full space of executive communication skills.
Each dimension has a formal definition grounded in O*NET and WEF language. "Executive Presence" isn't just a vibes check — it maps to O*NET's "Social Perceptiveness" and WEF's "Leadership and Social Influence." "Clarity & Structure" maps to O*NET's "Written Expression" and "Information Ordering." These aren't categories we invented. They're categories that industrial psychologists have been validating for decades.
Turning Frameworks into Code
Having 8 well-defined dimensions is step one. The hard part is mapping 141 messy, inconsistent, human-written labels to them — automatically, accurately, and in a way that handles new labels without human intervention.
Our first attempt used a lightweight embedding model (MiniLM-L6, 384 dimensions). We embedded each skill label and each dimension description, then assigned each skill to its closest dimension by cosine similarity. Average similarity: 0.384. That's barely better than random.
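The assignment step itself is just nearest-neighbour search in embedding space. A minimal sketch with NumPy (the vectors here are toy stand-ins; in the real pipeline they come from the embedding model, and the function names are illustrative):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_skill(skill_vec: np.ndarray,
                   dim_vecs: dict[str, np.ndarray]) -> tuple[str, float]:
    """Assign a skill embedding to the closest dimension by cosine similarity."""
    scores = {name: cosine_similarity(skill_vec, vec)
              for name, vec in dim_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

The same function works unchanged whether the vectors are 384-dimensional or 3,072-dimensional — everything that follows is about making those vectors better, not the search.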
The problem wasn't the maths. It was the descriptions. When you embed "C-Suite Alignment" against a generic description of "Executive Presence," the model doesn't know that these concepts live in the same neighbourhood. The embedding space is too sparse to make the connection.
The fix: ground the dimension descriptions in O*NET and WEF language. Instead of "Executive Presence means commanding attention in high-stakes settings," we wrote: "Executive Presence encompasses O*NET's Social Perceptiveness (awareness of others' reactions), WEF's Leadership and Social Influence (inspiring and guiding), and includes gravitas, authority, credibility, and C-level stakeholder management."
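Concretely, the grounded descriptions are just longer, framework-anchored strings that get embedded in place of the one-liners. An illustrative before/after (paraphrased, not our exact production text):

```python
# Before: short, generic description -- embeds into a sparse neighbourhood.
UNGROUNDED = {
    "Executive Presence": "Commanding attention in high-stakes settings.",
}

# After: grounded in O*NET and WEF vocabulary plus adjacent business terms,
# so labels like "C-Suite Alignment" land nearby in embedding space.
GROUNDED = {
    "Executive Presence": (
        "Executive Presence encompasses O*NET's Social Perceptiveness "
        "(awareness of others' reactions), WEF's Leadership and Social "
        "Influence (inspiring and guiding), and includes gravitas, "
        "authority, credibility, and C-level stakeholder management."
    ),
}
```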
Then we upgraded to Gemini's embedding model (3,072 dimensions). The results were dramatic:
"C-Suite Alignment" went from 0.159 similarity to Executive Presence to 0.762. "Adaptability" correctly mapped to Audience Awareness at 0.901. The average across all 141 skills jumped from 0.384 to 0.834 — a 117% improvement.
Why Grounding Matters More Than Model Size
The most surprising finding wasn't the model upgrade. It was the grounding.
Naive keyword matching gets you 42%. Basic embeddings without domain grounding get you 58%. O*NET-grounded embeddings jump to 76%. Adding WEF cross-referencing pushes to 83.4%. The framework grounding contributed more accuracy improvement than switching from a 384-dimension model to a 3,072-dimension one.
This has a practical implication for anyone building skill taxonomies: don't start with AI. Start with the decades of taxonomy research that already exists. The models are only as good as the conceptual scaffolding you give them.
Production Confidence
Classification accuracy matters, but confidence distribution matters more. A system that's right 83% of the time is useless if you can't tell which 17% it's wrong about.
64% of our classifications land in the High or Very High confidence bands (cosine similarity above 0.80). Another 22% are in the Good band (0.70-0.80). Only 4% fall below 0.60, and those are genuinely ambiguous labels that sit at the intersection of multiple dimensions — which is useful information in itself.
Every classification also includes a runner-up dimension with its similarity score. When a user practises "Change Management Communication," we know it's primarily Strategic Communication (0.81) with a strong secondary in Audience Awareness (0.74). That dual-mapping feeds directly into our recommendation engine — this workout develops two skills at once, making it higher-value for users who need both.
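In code, the banding and dual-mapping are straightforward. A sketch (the 0.80, 0.70, and 0.60 boundaries come from the numbers above; the 0.90 cut between High and Very High is our guess at a plausible split):

```python
def confidence_band(similarity: float) -> str:
    """Map a cosine similarity score to a human-readable confidence band."""
    if similarity >= 0.90:
        return "Very High"  # assumed threshold
    if similarity >= 0.80:
        return "High"
    if similarity >= 0.70:
        return "Good"
    if similarity >= 0.60:
        return "Moderate"
    return "Low"

def classify_with_runner_up(scores: dict[str, float]) -> dict:
    """Return the top dimension plus the runner-up, each with its score."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_s), (second, second_s) = ranked[0], ranked[1]
    return {
        "primary": top, "primary_score": top_s,
        "secondary": second, "secondary_score": second_s,
        "band": confidence_band(top_s),
    }
```

For the "Change Management Communication" example above, this would return Strategic Communication as the primary and Audience Awareness as the secondary, and both scores flow downstream to the recommender.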
The Runtime Problem
A pre-computed mapping handles the 141 skills we know about. But what happens when a content creator adds workout #142 with a skill focus we've never seen?
We built a two-tier runtime classifier. First, it checks the static mapping (including case-insensitive lookup). If that misses, it falls back to a keyword-based classifier that scores the unknown label against dimension keyword lists. The keyword classifier uses hyperbolic tangent normalisation to produce meaningful confidence scores without needing to call an embedding model at runtime.
It's not as accurate as the pre-computed Gemini embeddings — but it's instant, free, and right about 75% of the time. Good enough to show a reasonable recommendation while we batch-update the mapping with proper embeddings on a weekly cadence.
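The two-tier fallback can be sketched in a few lines. The keyword lists and static mapping here are hypothetical miniatures of the real ones, but the shape is the same: exact lookup first, tanh-squashed keyword scoring second (tanh maps a raw match count of 1 to about 0.76 and 2 to about 0.96, so confidence stays in [0, 1) without an embedding call):

```python
import math

# Hypothetical keyword lists; production lists are longer per dimension.
DIMENSION_KEYWORDS = {
    "Persuasion & Influence": {"persuasion", "influence", "negotiation", "buy-in"},
    "Executive Presence": {"executive", "presence", "gravitas", "c-suite"},
}

# Pre-computed embedding-based mapping, keyed by lowercased label.
STATIC_MAPPING = {
    "persuasive communication": ("Persuasion & Influence", 0.91),
}

def keyword_classify(label: str) -> tuple[str, float]:
    """Tier 2: score a label against each dimension's keyword list,
    squashing raw match counts through tanh for a bounded confidence."""
    tokens = set(label.lower().replace("&", " ").split())
    scores = {dim: math.tanh(len(tokens & keywords))
              for dim, keywords in DIMENSION_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

def classify(label: str) -> tuple[str, float]:
    """Tier 1: static mapping (case-insensitive), else keyword fallback."""
    hit = STATIC_MAPPING.get(label.lower())
    return hit if hit else keyword_classify(label)
```

A known label returns its pre-computed embedding score; an unseen label like "Stakeholder Buy-In Negotiation" still gets a sensible dimension and confidence instantly, and joins the static mapping at the next weekly batch run.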
What This Unlocks
A validated skill taxonomy is the foundation for everything that comes next:
- Personalised recommendations. "You've completed 4 workouts in Persuasion & Influence but none in Emotional Intelligence. Here are three that would round out your skill profile."
- Skill gap analysis. Show a radar chart of a user's 8 dimensions. Instantly visible: where they're strong, where they're developing, where they haven't started.
- Team-level insights. An L&D manager can see that their sales team is heavy on Persuasion but light on Clarity & Structure — and assign workouts accordingly.
- Spaced repetition. Track skill decay per dimension. If someone hasn't practised Executive Presence in 3 weeks, surface a refresher.
- Content gap identification. With only 9 skills mapped to Confidence & Authority, we know exactly where to invest in new workout content.
None of this is possible with 141 freeform labels. All of it is possible with 8 validated dimensions.
The Methodology, Summarised
If you're building a skill taxonomy for a learning platform, here's what we'd recommend:
- Start with established frameworks. O*NET and WEF are free, open, and validated. Don't invent categories from scratch.
- Ground your embeddings. Write dimension descriptions that reference the frameworks explicitly. This matters more than model size.
- Use high-dimensional embeddings. Moving from 384 to 3,072 dimensions, alongside grounded descriptions, lifted average similarity from 0.384 to 0.834. The extra dimensions capture semantic nuance that matters for similar-sounding skills.
- Track confidence, not just classification. Know which assignments to trust and which need review.
- Build a runtime fallback. Your taxonomy will encounter unknown labels. Have a fast, reasonable classifier ready while you batch-update the static mapping.
- Validate with domain experts. The numbers look great, but an L&D professional should spot-check the borderline cases. Automated doesn't mean unsupervised.
The best skill taxonomies aren't invented. They're discovered — in the decades of occupational research that most EdTech companies have never bothered to read.
We built ExecReps to measure communication skills with the rigour they deserve. That starts with knowing what, exactly, we're measuring. 141 labels became 8 dimensions, grounded in frameworks that existed before we wrote our first line of code. That's not just better engineering. It's better science.