How We Mapped 141 Freeform Skills to 8 Validated Dimensions
Standing on the shoulders of industrial psychologists, powered by sentence embeddings.

When we started building ExecReps, our workout library had a problem. Every workout had a "skill focus" — a freeform text field that described what the exercise trained. After 50 workouts, we had 141 unique labels. "Executive Presence." "C-Suite Alignment." "Persuasive Argumentation." "Stakeholder Buy-In." Some of these clearly overlap. Others don't. And no one — not even us — could tell you exactly how they all fit together.
This is a universal problem in learning platforms: your content grows faster than your taxonomy. And without a taxonomy, you can't build recommendations, track skill gaps, or show learners where they actually stand.
So we did what any self-respecting engineering team would do. We stole from the US Department of Labor.
The Taxonomy Problem
Here's what 141 freeform labels look like in practice. You have obvious clusters — "Persuasion," "Persuasive Communication," "Persuasion & Influence" are clearly the same thing. But what about "C-Suite Alignment"? Is that Executive Presence? Strategic Communication? Audience Awareness? The answer depends on context, and context is exactly what a flat text field doesn't give you.
The naive approach is to sit in a room and manually sort 141 cards into piles. We did that. It took hours and produced arguments. The second-naive approach is to throw an LLM at it and ask for categories. We did that too. It produced different categories every time.
What we needed was a ground truth — an established, peer-reviewed framework for categorising communication skills that existed before we did.
Standing on the Shoulders of Industrial Psychologists
Two frameworks turned out to be exactly what we needed.
The O*NET Content Model, maintained by the US Department of Labor, is the most comprehensive occupational skills taxonomy in existence. It decomposes "Communication" into precisely measurable sub-skills: Speaking, Persuasion, Negotiation, Active Listening, Social Perceptiveness. Each has a formal definition, measurement scale, and decades of validation data across every occupation in the US economy.
The World Economic Forum Global Skills Taxonomy takes a different angle — 93 skills across 5 hierarchical levels, designed to map the skills that matter for the future of work. It gives us "Persuasion and Negotiation," "Empathy and Active Listening," "Leadership and Social Influence" as distinct, measurable categories.
By synthesising these two frameworks, we derived 8 canonical dimensions that cover the full space of executive communication skills.
Each dimension has a formal definition grounded in O*NET and WEF language. "Executive Presence" isn't just a vibes check — it maps to O*NET's "Social Perceptiveness" and WEF's "Leadership and Social Influence." "Clarity & Structure" maps to O*NET's "Written Expression" and "Information Ordering." These aren't categories we invented. They're categories that industrial psychologists have been validating for decades.
Turning Frameworks into Code
Having 8 well-defined dimensions is step one. The hard part is mapping 141 messy, inconsistent, human-written labels to them — automatically, accurately, and in a way that handles new labels without human intervention.
Our first attempt used a lightweight embedding model (MiniLM-L6, 384 dimensions). We embedded each skill label and each dimension description, then assigned each skill to its closest dimension by cosine similarity. Average similarity: 0.384. That's barely better than random.
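The assignment step itself is just nearest-neighbour search in embedding space. A minimal sketch with NumPy (the vectors here are toy stand-ins; in the real pipeline they come from the embedding model, and the function names are illustrative):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_skill(skill_vec: np.ndarray,
                   dim_vecs: dict[str, np.ndarray]) -> tuple[str, float]:
    """Assign a skill embedding to the closest dimension by cosine similarity."""
    scores = {name: cosine_similarity(skill_vec, vec)
              for name, vec in dim_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

The same function works unchanged whether the vectors are 384-dimensional or 3,072-dimensional — everything that follows is about making those vectors better, not the search.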
The problem wasn't the maths. It was the descriptions. When you embed "C-Suite Alignment" against a generic description of "Executive Presence," the model doesn't know that these concepts live in the same neighbourhood. The embedding space is too sparse to make the connection.
The fix: ground the dimension descriptions in O*NET and WEF language. Instead of "Executive Presence means commanding attention in high-stakes settings," we wrote: "Executive Presence encompasses O*NET's Social Perceptiveness (awareness of others' reactions), WEF's Leadership and Social Influence (inspiring and guiding), and includes gravitas, authority, credibility, and C-level stakeholder management."
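Concretely, the grounded descriptions are just longer, framework-anchored strings that get embedded in place of the one-liners. An illustrative before/after (paraphrased, not our exact production text):

```python
# Before: short, generic description -- embeds into a sparse neighbourhood.
UNGROUNDED = {
    "Executive Presence": "Commanding attention in high-stakes settings.",
}

# After: grounded in O*NET and WEF vocabulary plus adjacent business terms,
# so labels like "C-Suite Alignment" land nearby in embedding space.
GROUNDED = {
    "Executive Presence": (
        "Executive Presence encompasses O*NET's Social Perceptiveness "
        "(awareness of others' reactions), WEF's Leadership and Social "
        "Influence (inspiring and guiding), and includes gravitas, "
        "authority, credibility, and C-level stakeholder management."
    ),
}
```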
Then we upgraded to Gemini's embedding model (3,072 dimensions). The results were dramatic:
"C-Suite Alignment" went from 0.159 similarity to Executive Presence to 0.762. "Adaptability" correctly mapped to Audience Awareness at 0.901. The average across all 141 skills jumped from 0.384 to 0.834 — a 117% improvement.
Why Grounding Matters More Than Model Size
The most surprising finding wasn't the model upgrade. It was the grounding.
Naive keyword matching gets you 42%. Basic embeddings without domain grounding get you 58%. O*NET-grounded embeddings jump to 76%. Adding WEF cross-referencing pushes to 83.4%. The framework grounding contributed more accuracy improvement than switching from a 384-dimension model to a 3,072-dimension one.
This has a practical implication for anyone building skill taxonomies: don't start with AI. Start with the decades of taxonomy research that already exists. The models are only as good as the conceptual scaffolding you give them.
Production Confidence
Classification accuracy matters, but confidence distribution matters more. A system that's right 83% of the time is useless if you can't tell which 17% it's wrong about.
64% of our classifications land in the High or Very High confidence bands (cosine similarity above 0.80). Another 22% are in the Good band (0.70-0.80). Only 4% fall below 0.60, and those are genuinely ambiguous labels that sit at the intersection of multiple dimensions — which is useful information in itself.
Every classification also includes a runner-up dimension with its similarity score. When a user practises "Change Management Communication," we know it's primarily Strategic Communication (0.81) with a strong secondary in Audience Awareness (0.74). That dual-mapping feeds directly into our recommendation engine — this workout develops two skills at once, making it higher-value for users who need both.
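In code, the banding and dual-mapping are straightforward. A sketch (the 0.80, 0.70, and 0.60 boundaries come from the numbers above; the 0.90 cut between High and Very High is our guess at a plausible split):

```python
def confidence_band(similarity: float) -> str:
    """Map a cosine similarity score to a human-readable confidence band."""
    if similarity >= 0.90:
        return "Very High"  # assumed threshold
    if similarity >= 0.80:
        return "High"
    if similarity >= 0.70:
        return "Good"
    if similarity >= 0.60:
        return "Moderate"
    return "Low"

def classify_with_runner_up(scores: dict[str, float]) -> dict:
    """Return the top dimension plus the runner-up, each with its score."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_s), (second, second_s) = ranked[0], ranked[1]
    return {
        "primary": top, "primary_score": top_s,
        "secondary": second, "secondary_score": second_s,
        "band": confidence_band(top_s),
    }
```

For the "Change Management Communication" example above, this would return Strategic Communication as the primary and Audience Awareness as the secondary, and both scores flow downstream to the recommender.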
The Runtime Problem
A pre-computed mapping handles the 141 skills we know about. But what happens when a content creator adds workout #142 with a skill focus we've never seen?
We built a two-tier runtime classifier. First, it checks the static mapping (including case-insensitive lookup). If that misses, it falls back to a keyword-based classifier that scores the unknown label against dimension keyword lists. The keyword classifier uses hyperbolic tangent normalisation to produce meaningful confidence scores without needing to call an embedding model at runtime.
It's not as accurate as the pre-computed Gemini embeddings — but it's instant, free, and right about 75% of the time. Good enough to show a reasonable recommendation while we batch-update the mapping with proper embeddings on a weekly cadence.
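The two-tier fallback can be sketched in a few lines. The keyword lists and static mapping here are hypothetical miniatures of the real ones, but the shape is the same: exact lookup first, tanh-squashed keyword scoring second (tanh maps a raw match count of 1 to about 0.76 and 2 to about 0.96, so confidence stays in [0, 1) without an embedding call):

```python
import math

# Hypothetical keyword lists; production lists are longer per dimension.
DIMENSION_KEYWORDS = {
    "Persuasion & Influence": {"persuasion", "influence", "negotiation", "buy-in"},
    "Executive Presence": {"executive", "presence", "gravitas", "c-suite"},
}

# Pre-computed embedding-based mapping, keyed by lowercased label.
STATIC_MAPPING = {
    "persuasive communication": ("Persuasion & Influence", 0.91),
}

def keyword_classify(label: str) -> tuple[str, float]:
    """Tier 2: score a label against each dimension's keyword list,
    squashing raw match counts through tanh for a bounded confidence."""
    tokens = set(label.lower().replace("&", " ").split())
    scores = {dim: math.tanh(len(tokens & keywords))
              for dim, keywords in DIMENSION_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

def classify(label: str) -> tuple[str, float]:
    """Tier 1: static mapping (case-insensitive), else keyword fallback."""
    hit = STATIC_MAPPING.get(label.lower())
    return hit if hit else keyword_classify(label)
```

A known label returns its pre-computed embedding score; an unseen label like "Stakeholder Buy-In Negotiation" still gets a sensible dimension and confidence instantly, and joins the static mapping at the next weekly batch run.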
What This Unlocks
A validated skill taxonomy is the foundation for everything that comes next:
- Personalised recommendations. "You've completed 4 workouts in Persuasion & Influence but none in Emotional Intelligence. Here are three that would round out your skill profile."
- Skill gap analysis. Show a radar chart of a user's 8 dimensions. Instantly visible: where they're strong, where they're developing, where they haven't started.
- Team-level insights. An L&D manager can see that their sales team is heavy on Persuasion but light on Clarity & Structure — and assign workouts accordingly.
- Spaced repetition. Track skill decay per dimension. If someone hasn't practised Executive Presence in 3 weeks, surface a refresher.
- Content gap identification. With only 9 skills mapped to Confidence & Authority, we know exactly where to invest in new workout content.
None of this is possible with 141 freeform labels. All of it is possible with 8 validated dimensions.
The Methodology, Summarised
If you're building a skill taxonomy for a learning platform, here's what we'd recommend:
- Start with established frameworks. O*NET and WEF are free, open, and validated. Don't invent categories from scratch.
- Ground your embeddings. Write dimension descriptions that reference the frameworks explicitly. This matters more than model size.
- Use high-dimensional embeddings. Moving from 384 to 3,072 dimensions, alongside grounded descriptions, lifted average similarity from 0.384 to 0.834. The extra dimensions capture semantic nuance that matters for similar-sounding skills.
- Track confidence, not just classification. Know which assignments to trust and which need review.
- Build a runtime fallback. Your taxonomy will encounter unknown labels. Have a fast, reasonable classifier ready while you batch-update the static mapping.
- Validate with domain experts. The numbers look great, but an L&D professional should spot-check the borderline cases. Automated doesn't mean unsupervised.
The best skill taxonomies aren't invented. They're discovered — in the decades of occupational research that most EdTech companies have never bothered to read.
We built ExecReps to measure communication skills with the rigour they deserve. That starts with knowing what, exactly, we're measuring. 141 labels became 8 dimensions, grounded in frameworks that existed before we wrote our first line of code. That's not just better engineering. It's better science.