Building Emoji Search with Claude Code in Half an Hour • WZH

I built a semantic emoji search with Claude Code. It is fully static, and the model runs in the browser. A short write-up follows.

1. Requirements

Search emoji by text in Chinese or English; pasting an emoji should also surface similar ones.
Show emoji glosses in both languages.
Skin-tone toggle.
Tunable retrieval params: top-K, similarity threshold, etc.
Click to copy; Enter copies the first result.
Fully static and local-only — privacy first.
The model downloads on the first visit and is cached afterwards, so search works offline from the second visit; the first load should take under 30 s.

2. Overall Design

The model is Xenova/multilingual-e5-small: multilingual, 384-dim, and ~30 MB after int8 quantization, small enough to live in the browser’s IndexedDB. e5-base would give better quality but isn’t worth the extra cost in the browser.

The dataset covers 1914 emoji, each annotated with Chinese and English names plus CLDR keywords (flag entries come from cldr-annotations-derived-full). At build time, embeddings are computed once in Node, then centered and quantized; the artifacts total roughly 1.1 MB.
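For a sense of the storage arithmetic: at one byte per component, 1914 × 384 int8 values come to 734,976 bytes (about 0.73 MB), so if emoji-vectors.i8.bin holds raw rows, the vectors are the bulk of the ~1.1 MB. The article doesn’t spell out its quantization scheme. The sketch below assumes the simplest symmetric one, round(x × 127) applied to unit-length centered vectors, and the function names are hypothetical.

```javascript
// Sketch: symmetric int8 quantization with a fixed scale of 127 (assumed scheme).
// Inputs are unit-length rows, so every component already lies in [-1, 1].
function quantizeInt8(vec) {
  const out = new Int8Array(vec.length);
  for (let i = 0; i < vec.length; i++) {
    // Clamp defensively, then round to the nearest int8 step.
    out[i] = Math.max(-127, Math.min(127, Math.round(vec[i] * 127)));
  }
  return out;
}

// Approximate cosine between stored rows a and b of a row-major int8 matrix.
function int8Cosine(q8, a, b, dim) {
  let dot = 0;
  for (let j = 0; j < dim; j++) dot += q8[a * dim + j] * q8[b * dim + j];
  return dot / (127 * 127); // undo the scale on both operands
}
```

Because the scale is fixed, the integer dot product divided by 127² approximates cosine directly, so the similarity slider stays in ordinary cosine units.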

At runtime the main thread only handles UI; model inference and vector math live in a Web Worker. Queries take one of three paths:

Pure keyword queries hit a dictionary lookup.
Text-semantic queries are embedded by the model and ranked by cosine similarity.
Pasted-emoji reverse lookups skip the model entirely and just dot-product against the prebuilt vectors.
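The article doesn’t say exactly how a query is assigned to one of these paths, so the router below is a hypothetical sketch. It assumes keywordIndex is a Map from lowercased keyword to matching emoji, sends anything that starts with a pictographic character or flag letter to reverse search, and falls back to the model otherwise.

```javascript
// Hypothetical router; the real app's exact rules are not documented.
// Matches a leading pictographic emoji or a regional-indicator (flag) letter.
// Keycap sequences such as 1️⃣ start with a digit and are not caught here.
const STARTS_WITH_EMOJI = /^[\p{Extended_Pictographic}\p{Regional_Indicator}]/u;

function routeQuery(query, keywordIndex) {
  const q = query.trim();
  if (q === "") return { path: "empty" }; // nothing to search
  if (STARTS_WITH_EMOJI.test(q)) return { path: "reverse" }; // reuse a stored vector
  const hits = keywordIndex.get(q.toLowerCase());
  if (hits) return { path: "keyword", hits }; // exact dictionary hit, no model call
  return { path: "semantic" }; // embed in the Worker, then rank by cosine
}
```

Chinese keywords pass through toLowerCase unchanged, so one lookup table can serve both languages.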

Fig. 1 · Build artifacts (top) feed directly into the runtime Worker (bottom)

3. Notable Bits

3.1 Emoji Reverse Search

The tool should let you paste an emoji to find similar ones (e.g. paste 🐼, get back 🦊 🐵 🐻). This comes with a free optimization: since the query is itself an emoji, its vector was already computed at build time and stored in emoji-vectors.i8.bin. At runtime we only need to look up its index and use that vector directly, without running the model again.

The implementation is straightforward: strip skin-tone modifiers (U+1F3FB to U+1F3FF) from the input, use Intl.Segmenter to extract the first grapheme cluster (which keeps ZWJ sequences like 🤦‍♂️ intact), look up its index in a Map<emoji, index>, and take dot products directly against the 1914 stored int8 vectors.
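The steps above can be sketched as follows. This is a minimal version under stated assumptions, not the project’s code: the names firstEmoji, reverseSearch, indexOf and emojis are hypothetical, the stored keys are assumed to be skin-tone-free, and the int8 rows are assumed to be quantized as round(x × 127) so that dividing by 127² recovers an approximate cosine.

```javascript
// Skin-tone modifiers U+1F3FB..U+1F3FF, removed before lookup.
const SKIN_TONES = /[\u{1F3FB}-\u{1F3FF}]/gu;
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// First grapheme cluster of the input (ZWJ sequences stay whole), or null.
function firstEmoji(input) {
  const stripped = input.replace(SKIN_TONES, "");
  const first = segmenter.segment(stripped)[Symbol.iterator]().next();
  return first.done ? null : first.value.segment;
}

// db: { vectors: Int8Array(n * dim), dim, indexOf: Map<emoji, row>, emojis: string[] }
function reverseSearch(input, db, topK = 10, threshold = 0.2) {
  const { vectors, dim, indexOf, emojis } = db;
  const key = firstEmoji(input);
  if (key === null || !indexOf.has(key)) return []; // not a known emoji
  const q = indexOf.get(key);
  const scale = 1 / (127 * 127); // int8 dot product -> approximate cosine
  const hits = [];
  for (let i = 0; i < emojis.length; i++) {
    if (i === q) continue; // skip the query emoji itself
    let dot = 0;
    for (let j = 0; j < dim; j++) dot += vectors[q * dim + j] * vectors[i * dim + j];
    const score = dot * scale;
    if (score >= threshold) hits.push({ emoji: emojis[i], score });
  }
  hits.sort((a, b) => b.score - a.score); // best match first
  return hits.slice(0, topK);
}
```

The loop is a plain linear scan; with 1914 rows of 384 bytes that is well under a million multiply-adds per query, so no index structure is needed.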

Text search runs the model on every query (tens of milliseconds with q8). Reverse search skips that step entirely: it is just dot products, and noticeably faster.

3.2 The e5 Distribution Bias

Right after I wired up reverse search, I noticed that dragging the similarity slider had no visible effect. I checked the cosine distribution from 🐼 to every other emoji:

plaintext

top-1   0.972  (🦊)
top-10  0.955  (🐭)
p50     0.901
p90     0.886  ← 90% of emoji sit above this

A threshold of 0.84 lets essentially all of the other 1913 emoji through. The likely cause: e5 is trained with a passage: prefix on every passage, so that shared context pulls all document vectors toward a common direction. Computing the mean vector confirms it: ||mean|| ≈ 0.7, far from negligible.
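The check itself takes only a few lines. A minimal sketch, assuming the embeddings sit in a row-major Float32Array of n × dim unit vectors; meanVector, l2Norm and shareAbove are hypothetical helper names, and shareAbove reproduces the "how much does this threshold let through" reading from the distribution above.

```javascript
// Component-wise mean of n row-major vectors of length dim.
function meanVector(vec, n, dim) {
  const mean = new Float64Array(dim);
  for (let i = 0; i < n; i++)
    for (let j = 0; j < dim; j++) mean[j] += vec[i * dim + j];
  for (let j = 0; j < dim; j++) mean[j] /= n;
  return mean;
}

// Euclidean length; near 0 for well-spread unit vectors, large when they share an offset.
function l2Norm(v) {
  let s = 0;
  for (const x of v) s += x * x;
  return Math.sqrt(s);
}

// Fraction of scores at or above a threshold.
function shareAbove(scores, threshold) {
  let count = 0;
  for (const s of scores) if (s >= threshold) count++;
  return count / scores.length;
}
```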

This isn’t a new problem, and the fix has been around for years: All-but-the-Top (Mu & Viswanath, ICLR 2018) shows how a few common directions dominate distances in word embeddings. The first-order remedy is to subtract the mean vector and renormalize:

JavaScript

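// Center each stored embedding and renormalize it to unit length.
// vec: row-major array of n * dim embeddings; mean: array of length dim.
function centerAndRenormalize(vec, mean, n, dim) {
  for (let i = 0; i < n; i++) {
    const row = i * dim;
    let n2 = 0;
    for (let j = 0; j < dim; j++) {
      vec[row + j] -= mean[j]; // remove the shared direction
      n2 += vec[row + j] ** 2;
    }
    const nrm = Math.sqrt(n2) || 1; // guard against an all-zero row
    for (let j = 0; j < dim; j++) vec[row + j] /= nrm; // back to unit length
  }
  return vec;
}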

Subtract the mean at build time and store the centered vectors; at runtime, subtract the same mean from each query embedding so queries and stored vectors stay in the same feature space. After centering:

plaintext

🐼 vs 🐻     0.963  →   0.570
🐼 vs 🍜     0.909  →  -0.011
🐼 vs 😀     0.920  →   0.201

The distribution opens up. The same threshold now works for both text and reverse search.

Fig. 2 · Cosine distribution of 🐼 against the other 1913 emoji, on a shared [−1, 1] x-axis. Solid line = cosine 0 (orthogonal / unrelated); dashed line = default threshold 0.20. Before centering, everything piles up between 0.85 and 0.95, where a 0.20 threshold has no bite; after centering, the scores spread across [−0.24, 0.62] and the threshold actually discriminates.
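On the runtime side, the only extra work is applying the same shift to each text query before scoring. A minimal sketch, assuming the build ships mean as a Float32Array next to int8 rows quantized as round(x × 127); centerQuery and scoreRow are hypothetical names.

```javascript
// Apply the build-time shift to a fresh query embedding, then renormalize.
function centerQuery(embedding, mean) {
  const q = new Float32Array(embedding.length);
  let n2 = 0;
  for (let j = 0; j < q.length; j++) {
    q[j] = embedding[j] - mean[j]; // same mean as the stored vectors
    n2 += q[j] * q[j];
  }
  const nrm = Math.sqrt(n2) || 1; // guard against an all-zero vector
  for (let j = 0; j < q.length; j++) q[j] /= nrm;
  return q;
}

// Approximate cosine between a centered float query and stored int8 row i.
function scoreRow(q, vectors, i) {
  const dim = q.length;
  let dot = 0;
  for (let j = 0; j < dim; j++) dot += q[j] * vectors[i * dim + j];
  return dot / 127; // only the stored side carries the int8 scale
}
```

Reverse search needs no such step, since its query vector is one of the already-centered stored rows.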

The All-but-the-Top paper actually recommends going further: subtract the mean and remove the top K = D/100 principal components (D=384 → K≈4). I tried it:

K (top PCs removed)	🐼 top-1 cosine	🐼 vs 🐻	🐼 vs 🍜
0 (mean only)	0.628	0.572	-0.009
4 (paper default)	0.531	0.478	-0.090
8	0.376	0.343	0.004

But the result is counterintuitive: as K grows, the cosine between related emoji (🐼 vs 🐻) gets squashed along with everything else, so the distribution narrows instead of widening. My guess is that the paper targeted word2vec / GloVe, whose word vectors are dominated by Zipfian frequency effects and share many common directions, whereas e5 is a contrastively trained sentence encoder whose training objective already does much of the work of suppressing shared directions. Once the mean is removed, what’s left is mostly real semantics, and removing more components just throws away signal.

Final pick: K=0; first-order centering is enough.
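For reference, the K > 0 variant from the table can be reproduced with plain power iteration plus deflation, which avoids pulling in a linear-algebra library. This is a minimal sketch, not necessarily the code behind the numbers above: removeTopPCs is a hypothetical helper, it expects vectors that are already mean-centered, the iteration count is a guess (closely spaced eigenvalues converge slowly), and the final renormalization mirrors the centering step rather than the paper.

```javascript
// "All-but-the-Top"-style post-processing on mean-centered, row-major vectors X.
// Removes the top-K principal directions in place, then L2-renormalizes each row.
function removeTopPCs(X, n, dim, K, iters = 100) {
  let seed = 42; // fixed-seed LCG so builds are reproducible
  const rand = () => (seed = (seed * 1664525 + 1013904223) >>> 0) / 2 ** 32 - 0.5;
  const s = new Float64Array(n);
  for (let k = 0; k < K; k++) {
    // Power iteration on X^T X converges to the current top principal direction.
    const u = Float64Array.from({ length: dim }, rand);
    for (let t = 0; t < iters; t++) {
      for (let i = 0; i < n; i++) {
        let d = 0;
        for (let j = 0; j < dim; j++) d += X[i * dim + j] * u[j];
        s[i] = d; // s = X u
      }
      u.fill(0);
      for (let i = 0; i < n; i++)
        for (let j = 0; j < dim; j++) u[j] += s[i] * X[i * dim + j]; // u = X^T s
      let nu = 0;
      for (let j = 0; j < dim; j++) nu += u[j] * u[j];
      nu = Math.sqrt(nu);
      if (nu === 0) break; // no variance left to remove
      for (let j = 0; j < dim; j++) u[j] /= nu;
    }
    // Deflate: project u out of every row so the next pass finds the next component.
    for (let i = 0; i < n; i++) {
      let d = 0;
      for (let j = 0; j < dim; j++) d += X[i * dim + j] * u[j];
      for (let j = 0; j < dim; j++) X[i * dim + j] -= d * u[j];
    }
  }
  // Renormalize every row to unit length (guarding against all-zero rows).
  for (let i = 0; i < n; i++) {
    let n2 = 0;
    for (let j = 0; j < dim; j++) n2 += X[i * dim + j] ** 2;
    const nrm = Math.sqrt(n2) || 1;
    for (let j = 0; j < dim; j++) X[i * dim + j] /= nrm;
  }
  return X;
}
```

With K = 0 the loop is skipped and only the renormalization runs, which matches the mean-only configuration that won out here.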