Building Emoji Search with Claude Code in Half an Hour
Built a semantic emoji search with Claude Code. Fully static, the model runs in the browser. A short writeup below.
1. Requirements
- Search emoji by text in Chinese or English; pasting an emoji should also surface similar ones.
- Show emoji glosses in both languages.
- Skin-tone toggle.
- Tunable retrieval params: top-K, similarity threshold, etc.
- Click to copy; Enter copies the first result.
- Fully static and local-only — privacy first.
- The model loads on first visit and is cached afterwards; offline-capable from the second visit, with first load under 30s.
2. Overall Design
The model is Xenova/multilingual-e5-small: multilingual, 384-dim, ~30 MB after int8 quantization — small enough to live in the browser’s IndexedDB. e5-base would be better quality but isn’t worth the cost in the browser.
The dataset covers 1914 emoji, each annotated with Chinese and English names plus CLDR keywords (flag entries come from cldr-annotations-derived-full). At build time, embeddings are computed once in Node, mean-centered, and quantized to int8, for roughly 1.1 MB of artifacts in total.
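A minimal sketch of what the int8 quantization step could look like. The post does not describe the actual scheme, so this assumes symmetric scaling by 127 on unit-length vectors; `quantizeInt8` and `int8Cosine` are hypothetical names.

```javascript
// Hypothetical helper: quantize unit-length float vectors to int8.
// Assumes every component lies in [-1, 1], which holds after L2 normalization.
function quantizeInt8(floatVecs) {
  const out = new Int8Array(floatVecs.length);
  for (let i = 0; i < floatVecs.length; i++) {
    const scaled = Math.round(floatVecs[i] * 127);
    out[i] = Math.max(-127, Math.min(127, scaled));
  }
  return out;
}

// Approximate cosine between rows a and b of a flat, row-major int8 array.
// Dividing by 127 * 127 undoes the scale applied to both operands.
function int8Cosine(q, a, b, dim) {
  let dot = 0;
  for (let j = 0; j < dim; j++) dot += q[a * dim + j] * q[b * dim + j];
  return dot / (127 * 127);
}

// Toy data: two orthogonal unit vectors in 4 dimensions.
const dim = 4;
const floats = new Float32Array([0.6, 0.8, 0, 0, 0.8, -0.6, 0, 0]);
const q = quantizeInt8(floats);
console.log(int8Cosine(q, 0, 0, dim)); // close to 1
console.log(int8Cosine(q, 0, 1, dim)); // close to 0
```

At one byte per component, 1914 vectors of 384 dimensions come to 734,976 bytes for the vector file alone.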
At runtime the main thread only handles UI; model inference and vector math live in a Web Worker. Queries take one of three paths:
- Pure keyword queries hit a dictionary lookup.
- Text-semantic queries go through model embedding plus cosine similarity.
- Pasted-emoji reverse lookups skip the model entirely and just dot-product against the prebuilt vectors.
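The three paths above could be dispatched roughly as follows. This is only a sketch: the post does not say how a query is classified, so the rules here (any emoji character means reverse lookup, an exact dictionary hit means keyword lookup, everything else goes to the model) are assumptions, and `routeQuery` and `keywordIndex` are hypothetical names.

```javascript
// Matches pictographic emoji plus regional indicators (the halves of flag emoji).
const EMOJI_RE = /\p{Extended_Pictographic}|[\u{1F1E6}-\u{1F1FF}]/u;

// Returns which of the three query paths a string should take.
// `keywordIndex` is assumed to be a Map from lowercase keyword to emoji indices.
function routeQuery(query, keywordIndex) {
  const q = query.trim().toLowerCase();
  if (q === "") return "empty";
  if (EMOJI_RE.test(q)) return "reverse";      // reuse prebuilt vectors, no model call
  if (keywordIndex.has(q)) return "keyword";   // plain dictionary lookup
  return "semantic";                           // embed with the model, then cosine
}

const keywordIndex = new Map([["panda", [0]], ["熊猫", [0]]]);
for (const q of ["🐼", "Panda", "熊猫", "feeling sleepy after lunch"]) {
  console.log(q, "->", routeQuery(q, keywordIndex));
}
```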
3. Notable Bits
3.1 Emoji Reverse Search
The tool needs to support pasting an emoji to find similar ones (e.g. paste 🐼, get back 🦊 🐵 🐻). There’s a free optimization here: since the query is an emoji, its vector was already computed at build time and lives in emoji-vectors.i8.bin. At runtime we just look up the index and use it directly — no need to run the model again.
The implementation is straightforward: strip skin-tone modifiers (U+1F3FB to U+1F3FF) from the input, use Intl.Segmenter to extract the first grapheme cluster (which handles ZWJ sequences like 🤦♂️), look up Map<emoji, index>, and dot-product directly against the 1914 stored int8 vectors.
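Those steps can be sketched as below, on toy data. The names `firstGrapheme` and `reverseSearch`, the three-emoji index, and the 127 × 127 scale factor are illustrative assumptions, not the project's code.

```javascript
// Skin-tone (Fitzpatrick) modifiers, U+1F3FB to U+1F3FF.
const SKIN_TONE_RE = /[\u{1F3FB}-\u{1F3FF}]/gu;
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Strip skin tones, then return the first grapheme cluster (keeps ZWJ sequences whole).
function firstGrapheme(text) {
  const stripped = text.replace(SKIN_TONE_RE, "");
  for (const { segment } of segmenter.segment(stripped)) return segment;
  return "";
}

// Rank every other emoji by dot product against the pasted emoji's stored vector.
// `vectors` is a flat, row-major Int8Array; `emojiIndex` maps emoji to row number.
function reverseSearch(text, emojiIndex, vectors, dim, topK) {
  const idx = emojiIndex.get(firstGrapheme(text));
  if (idx === undefined) return [];
  const n = vectors.length / dim;
  const scores = [];
  for (let i = 0; i < n; i++) {
    if (i === idx) continue; // skip the query emoji itself
    let dot = 0;
    for (let j = 0; j < dim; j++) dot += vectors[idx * dim + j] * vectors[i * dim + j];
    scores.push({ index: i, score: dot / (127 * 127) });
  }
  scores.sort((a, b) => b.score - a.score);
  return scores.slice(0, topK);
}

// Toy index with 2-dimensional vectors.
const emojiIndex = new Map([["🐼", 0], ["🐻", 1], ["🍜", 2]]);
const vectors = Int8Array.from([127, 0, 120, 40, 0, 127]);
console.log(reverseSearch("🐼", emojiIndex, vectors, 2, 2));
```

Because the lookup never touches the model, it also works before the model has finished loading.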
Text search runs the model on every query (tens of milliseconds with q8). Reverse search has none of that overhead: it is just 1914 dot products, and it is noticeably faster.
3.2 The e5 Distribution Bias
Right after wiring up reverse search, dragging the similarity slider had no visible effect. I checked the cosine distribution from 🐼 to every other emoji:
top-1 0.972 (🦊)
top-10 0.955 (🐭)
p50 0.901
p90 0.886 ← 90% of emoji sit above this
A 0.84 threshold basically lets every one of the 1913 other emoji through. Likely cause: e5 is trained with a passage: prefix on every passage, so all doc vectors get pulled toward a shared direction by that common context. Computing the mean confirms it: ||mean|| ≈ 0.7, far from negligible.
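The mean-norm check can be sketched on toy data as follows (`meanVector` and `l2norm` are hypothetical helper names). For unit-length vectors spread evenly around the origin the mean has norm near 0; a norm close to 1 means the vectors mostly point the same way.

```javascript
// Per-dimension mean of n vectors stored row-major in `vec`.
function meanVector(vec, n, dim) {
  const mean = new Float64Array(dim);
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < dim; j++) mean[j] += vec[i * dim + j];
  }
  for (let j = 0; j < dim; j++) mean[j] /= n;
  return mean;
}

// Euclidean length of a vector.
function l2norm(v) {
  let n2 = 0;
  for (let j = 0; j < v.length; j++) n2 += v[j] * v[j];
  return Math.sqrt(n2);
}

// Three unit vectors crowded into one quadrant: the mean is far from zero.
const crowded = new Float64Array([1, 0, 0.8, 0.6, 0.6, 0.8]);
console.log(l2norm(meanVector(crowded, 3, 2)));

// Two opposite unit vectors: the mean vanishes.
const balanced = new Float64Array([1, 0, -1, 0]);
console.log(l2norm(meanVector(balanced, 2, 2)));
```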
This isn’t a new problem. The fix has been around for years: All-but-the-Top (Mu & Viswanath, ICLR 2018) discusses how a few common directions dominate distance in word embeddings. The remedy is to subtract the mean and renormalize:
// `mean` holds the per-dimension average of all n vectors, computed beforehand.
for (let i = 0; i < n; i++) {
  let n2 = 0;
  for (let j = 0; j < dim; j++) {
    vec[i * dim + j] -= mean[j]; // center
    n2 += vec[i * dim + j] ** 2; // accumulate the squared norm
  }
  const nrm = Math.sqrt(n2);
  for (let j = 0; j < dim; j++) vec[i * dim + j] /= nrm; // renormalize to unit length
}
Subtract the mean at build time, store the centered vectors, and at runtime subtract the same mean from query embeddings so that queries and documents stay in the same feature space. Before and after:
🐼 vs 🐻 0.963 → 0.570
🐼 vs 🍜 0.909 → -0.011
🐼 vs 😀 0.920 → 0.201
The distribution opens up. The same threshold now works for both text and reverse search.
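On the query side, the same correction might look like the sketch below. The helper names `centerQuery` and `cosine` and the toy numbers are assumptions; in the real tool `mean` would be the vector saved at build time.

```javascript
// Subtract the build-time mean from a raw embedding and renormalize to unit length.
function centerQuery(embedding, mean) {
  const out = new Float64Array(embedding.length);
  let n2 = 0;
  for (let j = 0; j < embedding.length; j++) {
    out[j] = embedding[j] - mean[j];
    n2 += out[j] * out[j];
  }
  const nrm = Math.sqrt(n2) || 1; // guard against a zero vector
  for (let j = 0; j < out.length; j++) out[j] /= nrm;
  return out;
}

// Cosine similarity, valid for inputs of any length.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let j = 0; j < a.length; j++) {
    dot += a[j] * b[j];
    na += a[j] * a[j];
    nb += b[j] * b[j];
  }
  return dot / Math.sqrt(na * nb);
}

// Two raw vectors that look almost identical only because they share a large offset.
const mean = [0.7, 0.7];
const a = [0.8, 0.6];
const b = [0.6, 0.8];
console.log("raw:", cosine(a, b));
console.log("centered:", cosine(centerQuery(a, mean), centerQuery(b, mean)));
```

The raw cosine is dominated by the shared offset; once it is removed, the two toy vectors turn out to point in opposite directions.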
The All-but-the-Top paper actually recommends going further: subtract the mean and remove the top K = D/100 principal components (D=384 → K≈4). I tried it:
| K | 🐼 top-1 | 🐼 vs 🐻 | 🐼 vs 🍜 |
|---|---|---|---|
| 0 (mean only) | 0.628 | 0.572 | -0.009 |
| 4 (paper default) | 0.531 | 0.478 | -0.090 |
| 8 | 0.376 | 0.343 | 0.004 |
But the result is counterintuitive: as K grows, the cosine to related emoji (🐼-🐻) gets squashed alongside everything else — the distribution narrows instead of widening. My guess is the paper targeted word2vec / GloVe — Zipfian-frequency-dominated word vectors with many shared directions — whereas e5 is a contrastively trained sentence encoder whose training objective already does a lot of the work compressing shared directions. After removing the mean, what’s left is mostly real semantics, and removing more is just dropping signal.
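For reference, removing the top K principal components can be sketched with power iteration and deflation. This is not necessarily how the experiment above was run; `removeTopComponents`, the iteration count, and the deterministic start vector are all assumptions.

```javascript
// Remove the k leading principal directions from n mean-centered rows, in place.
// `vec` is a flat, row-major array of length n * dim.
function removeTopComponents(vec, n, dim, k, iters = 100) {
  for (let c = 0; c < k; c++) {
    // Deterministic, non-degenerate start vector for power iteration.
    let u = new Float64Array(dim).map((_, j) => Math.sin(j + 1 + c));
    for (let it = 0; it < iters; it++) {
      // next = X^T (X u), computed row by row without forming X^T X.
      const next = new Float64Array(dim);
      for (let i = 0; i < n; i++) {
        let proj = 0;
        for (let j = 0; j < dim; j++) proj += vec[i * dim + j] * u[j];
        for (let j = 0; j < dim; j++) next[j] += proj * vec[i * dim + j];
      }
      const nrm = Math.hypot(...next) || 1;
      u = next.map((x) => x / nrm);
    }
    // Deflate: subtract each row's projection onto the direction just found.
    for (let i = 0; i < n; i++) {
      let proj = 0;
      for (let j = 0; j < dim; j++) proj += vec[i * dim + j] * u[j];
      for (let j = 0; j < dim; j++) vec[i * dim + j] -= proj * u[j];
    }
  }
  return vec;
}

// Toy data: four centered rows whose variance lies almost entirely along x.
const rows = new Float64Array([3, 0.1, -3, 0.2, 2, -0.1, -2, -0.2]);
removeTopComponents(rows, 4, 2, 1);
console.log(Array.from(rows));
```

The rows are no longer unit length after deflation, so they need renormalizing (or a cosine that divides by both norms) before scores are compared.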
Final pick: K=0. Subtracting the mean (the first-order correction) is enough.