WZH

Built a semantic emoji search with Claude Code. Fully static, the model runs in the browser. A short writeup below.

1. Requirements

  1. Search emoji by text in Chinese or English; pasting an emoji should also surface similar ones.
  2. Show emoji glosses in both languages.
  3. Skin-tone toggle.
  4. Tunable retrieval params: top-K, similarity threshold, etc.
  5. Click to copy; Enter copies the first result.
  6. Fully static and local-only — privacy first.
  7. The model loads on first visit and is cached afterwards; offline-capable from the second visit, with first load under 30s.

2. Overall Design

The model is Xenova/multilingual-e5-small: multilingual, 384-dim, ~30 MB after int8 quantization — small enough to live in the browser’s IndexedDB. e5-base would be better quality but isn’t worth the cost in the browser.

1914 emoji, each annotated with Chinese and English names plus CLDR keywords (flag entries derive from cldr-annotations-derived-full). At build time, embeddings are computed once in Node, centered, and quantized — totaling roughly 1.1 MB of artifacts.
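The quantization step can be sketched as follows. This is a minimal illustration, assuming a symmetric scheme where a unit-length float vector is multiplied by 127 and rounded (the pipeline figure gives scale = 1/127); the names `quantizeInt8` and `dotInt8` are hypothetical, not taken from the project.

```javascript
// Assumed symmetric int8 scheme: q = round(v * 127), so scale = 1/127.
const SCALE = 1 / 127;

// Quantize a unit-length float vector into int8 values in [-127, 127].
function quantizeInt8(vec) {
  const out = new Int8Array(vec.length);
  for (let i = 0; i < vec.length; i++) {
    const q = Math.round(vec[i] / SCALE);
    out[i] = Math.max(-127, Math.min(127, q)); // clamp to the symmetric range
  }
  return out;
}

// Approximate the float dot product from two int8 vectors.
// Both sides carry one factor of SCALE, hence SCALE squared.
function dotInt8(a, b) {
  let acc = 0;
  for (let i = 0; i < a.length; i++) acc += a[i] * b[i];
  return acc * SCALE * SCALE;
}
```

At one byte per dimension, 1914 × 384 values come to 734,976 bytes, which lines up with the 718 KB vector file shown in the figure.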

At runtime the main thread only handles UI; model inference and vector math live in a Web Worker. Queries take one of three paths:

  1. Pure keyword queries hit a dictionary lookup.
  2. Text-semantic queries go through model embedding plus cosine similarity.
  3. Pasted-emoji reverse lookups skip the model entirely and just dot-product against the prebuilt vectors.

[Figure: pipeline diagram. Build (Node, 14 s, manual): emoji + CLDR (1914 × en/zh) → e5-small embed (1914×384) → center + L2 (subtract mean, renorm) → int8 quantize (scale = 1/127) → vec.bin + data.json (718 KB + (en/zh) 400 KB). Runtime (browser Worker): fetch + model (IndexedDB cached) → route query (kw · text · emoji) → cosine top-K (int8 · int8 · scale²) → render + copy (render w/ current skin tone).]
Fig. 1 · Build artifacts (top) feed directly into the runtime Worker (bottom)
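One way the three query paths could be routed is sketched below. The post does not say how a query is classified, so the rules here are assumptions: an exact hit in the emoji map goes to reverse lookup, an exact case-insensitive hit in the keyword dictionary goes to keyword lookup, and everything else goes to the model. `routeQuery`, `emojiIndex`, and `keywordIndex` are hypothetical names.

```javascript
// Hypothetical router: decides which of the three paths a query takes.
// emojiIndex:   Map<string, number>   emoji -> row in the vector table
// keywordIndex: Map<string, number[]> lowercase keyword -> matching rows
function routeQuery(input, emojiIndex, keywordIndex) {
  const q = input.trim();
  if (q === "") return { path: "none" };

  // Pasted emoji: reuse the prebuilt vector, no model call needed.
  if (emojiIndex.has(q)) return { path: "emoji", index: emojiIndex.get(q) };

  // Exact dictionary hit (assumed rule for a "pure keyword" query).
  const key = q.toLowerCase();
  if (keywordIndex.has(key)) return { path: "keyword", indices: keywordIndex.get(key) };

  // Anything else is embedded by the model and compared by cosine.
  return { path: "text", text: q };
}
```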

3. Notable Bits

The tool needs to support pasting an emoji to find similar ones (e.g. paste 🐼, get back 🦊 🐵 🐻). There’s a free optimization here: since the query is an emoji, its vector was already computed at build time and lives in emoji-vectors.i8.bin. At runtime we just look up the index and use it directly — no need to run the model again.

The implementation is straightforward: strip skin-tone modifiers (U+1F3FB to U+1F3FF) from the input, use Intl.Segmenter to extract the first grapheme cluster (which handles ZWJ sequences like 🤦‍♂️), look up Map<emoji, index>, and dot-product directly against the 1914 stored int8 vectors.
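The lookup steps above can be sketched like this. It is a minimal illustration rather than the project's actual code: `firstBaseGrapheme`, `lookupEmoji`, and `emojiIndex` are hypothetical names, and it assumes the map keys are stored without skin-tone modifiers.

```javascript
// Fitzpatrick skin-tone modifiers, U+1F3FB through U+1F3FF.
const SKIN_TONE = /[\u{1F3FB}-\u{1F3FF}]/gu;
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Return the first grapheme cluster of the input with skin tones removed,
// or null when the input is empty.
function firstBaseGrapheme(input) {
  const stripped = input.replace(SKIN_TONE, "").trim();
  for (const { segment } of segmenter.segment(stripped)) return segment;
  return null;
}

// Look up the stored row index, or -1 when the emoji is unknown.
// emojiIndex is a hypothetical Map<string, number>.
function lookupEmoji(input, emojiIndex) {
  const g = firstBaseGrapheme(input);
  return g !== null && emojiIndex.has(g) ? emojiIndex.get(g) : -1;
}
```

Because segmentation works on grapheme clusters rather than code points, a ZWJ sequence such as 🤦‍♂️ comes back as a single key instead of being split at the joiner.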

Text search runs the model on every query (tens of ms with q8). Reverse search has none of that overhead: it is just dot products, and noticeably faster.
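The dot-product scan itself might look like the sketch below. It assumes the stored vectors sit in one flat `Int8Array`, that both sides were quantized with scale 1/127, and that the query emoji's own row should be skipped. `topK` and its parameters are hypothetical names.

```javascript
// Score every stored int8 row against an int8 query and keep the best k
// results at or above a threshold. vectors is a flat Int8Array of n * dim values.
function topK(vectors, dim, query, k, threshold, skipIndex = -1) {
  const n = vectors.length / dim;
  const scale2 = (1 / 127) ** 2; // assumed: both sides quantized with scale 1/127
  const hits = [];
  for (let i = 0; i < n; i++) {
    if (i === skipIndex) continue; // do not return the query emoji itself
    const base = i * dim;
    let acc = 0;
    for (let j = 0; j < dim; j++) acc += vectors[base + j] * query[j];
    const score = acc * scale2; // approximate cosine for unit-length vectors
    if (score >= threshold) hits.push({ index: i, score });
  }
  hits.sort((a, b) => b.score - a.score);
  return hits.slice(0, k);
}
```

For a reverse lookup, the query is simply a view into the same buffer, for example `vectors.subarray(row * dim, (row + 1) * dim)`.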

3.2 The e5 Distribution Bias

Right after wiring up reverse search, dragging the similarity slider had no visible effect. I checked the cosine distribution from 🐼 to every other emoji:

```plaintext
top-1   0.972  (🦊)
top-10  0.955  (🐭)
p50     0.901
p90     0.886  ← 90% of emoji sit above this
```

A 0.84 threshold basically lets every one of the 1913 other emoji through. Likely cause: e5 is trained with a passage: prefix on every passage, so all doc vectors get pulled toward a shared direction by that common context. Computing the mean confirms it: ||mean|| ≈ 0.7, far from negligible.
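The mean check can be reproduced with a few lines. This is a sketch that assumes the embeddings are unit-length rows in a flat row-major array; `meanVector` and `l2Norm` are hypothetical helper names.

```javascript
// Per-dimension mean of n row-major embeddings of length dim.
function meanVector(vec, n, dim) {
  const mean = new Float64Array(dim);
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < dim; j++) mean[j] += vec[i * dim + j];
  }
  for (let j = 0; j < dim; j++) mean[j] /= n;
  return mean;
}

// Euclidean length of a vector.
function l2Norm(v) {
  let s = 0;
  for (const x of v) s += x * x;
  return Math.sqrt(s);
}
```

For unit vectors spread evenly in all directions the mean has a norm near 0, and the closer the norm gets to 1, the more the vectors share one direction.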

This isn’t a new problem. The fix has been around for years: All-but-the-Top (Mu & Viswanath, ICLR 2018) discusses how a few common directions dominate distance in word embeddings. The remedy is to subtract the mean and renormalize:

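```javascript
// vec: flat row-major array of n embeddings, each of length dim
// mean: per-dimension mean over all n embeddings
for (let i = 0; i < n; i++) {
  let n2 = 0;
  // Subtract the mean and accumulate the squared norm of the centered row
  for (let j = 0; j < dim; j++) {
    vec[i * dim + j] -= mean[j];
    n2 += vec[i * dim + j] ** 2;
  }
  // Renormalize to unit length so dot products are cosines again
  const nrm = Math.sqrt(n2);
  for (let j = 0; j < dim; j++) vec[i * dim + j] /= nrm;
}
```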

Subtract the mean at build time, store the centered vectors, and at runtime subtract the same mean from query embeddings — keeping the feature spaces aligned. After:

```plaintext
🐼 vs 🐻     0.963  →   0.570
🐼 vs 🍜     0.909  →  -0.011
🐼 vs 😀     0.920  →   0.201
```

The distribution opens up. The same threshold now works for both text and reverse search.
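The runtime half of the centering step, applied to a fresh query embedding, could look like this. It is a sketch that assumes the build-time mean ships with the other artifacts as a float array; `centerQuery` is a hypothetical name.

```javascript
// Subtract the build-time mean from a query embedding and renormalize,
// so the query lives in the same centered space as the stored vectors.
function centerQuery(embedding, mean) {
  const out = new Float32Array(embedding.length);
  let n2 = 0;
  for (let j = 0; j < out.length; j++) {
    out[j] = embedding[j] - mean[j];
    n2 += out[j] * out[j];
  }
  const nrm = Math.sqrt(n2) || 1; // guard against a zero vector
  for (let j = 0; j < out.length; j++) out[j] /= nrm;
  return out;
}
```

Reverse lookups do not need this step, since the stored emoji vectors were already centered at build time.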

[Figure: two histograms, "Before centering" and "After centering", on a shared cosine x-axis from −1 to 1]
Fig. 2 · Cosine distribution of 🐼 against the other 1913 emoji, sharing an [−1, 1] x-axis. Solid line = cosine 0 (orthogonal / unrelated), dashed = default threshold 0.20. Before centering everything piles up between 0.85 and 0.95, where a 0.20 threshold has no bite; after centering the spread runs across [−0.24, 0.62] and the threshold actually discriminates.

The All-but-the-Top paper actually recommends going further: subtract the mean and remove the top K = D/100 principal components (D=384 → K≈4). I tried it:

| K | 🐼 top-1 | 🐼 vs 🐻 | 🐼 vs 🍜 |
|---|---|---|---|
| 0 (mean only) | 0.628 | 0.572 | -0.009 |
| 4 (paper default) | 0.531 | 0.478 | -0.090 |
| 8 | 0.376 | 0.343 | 0.004 |

But the result is counterintuitive: as K grows, the cosine to related emoji (🐼-🐻) gets squashed alongside everything else — the distribution narrows instead of widening. My guess is the paper targeted word2vec / GloVe — Zipfian-frequency-dominated word vectors with many shared directions — whereas e5 is a contrastively trained sentence encoder whose training objective already does a lot of the work compressing shared directions. After removing the mean, what’s left is mostly real semantics, and removing more is just dropping signal.
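For reference, the component-removal step from the experiment can be sketched as below. This is not the code used for the table: it assumes the rows are already mean-centered, estimates each principal direction by power iteration with deflation instead of a full PCA, and renormalizes at the end so cosine still applies. All function names are hypothetical.

```javascript
// Dot product of row i of X (row-major, n x dim) with vector u.
function rowDot(X, i, dim, u) {
  let s = 0;
  for (let j = 0; j < dim; j++) s += X[i * dim + j] * u[j];
  return s;
}

// Estimate the leading principal direction of already-centered rows
// by power iteration on X^T X (computed as X^T (X u)).
function topComponent(X, n, dim, iters = 100) {
  let u = new Float64Array(dim);
  for (let j = 0; j < dim; j++) u[j] = Math.sin(j + 1); // deterministic non-zero start
  for (let it = 0; it < iters; it++) {
    const next = new Float64Array(dim);
    for (let i = 0; i < n; i++) {
      const p = rowDot(X, i, dim, u);
      for (let j = 0; j < dim; j++) next[j] += p * X[i * dim + j];
    }
    let nrm = 0;
    for (let j = 0; j < dim; j++) nrm += next[j] * next[j];
    nrm = Math.sqrt(nrm);
    if (nrm === 0) break; // no variance left along any direction
    for (let j = 0; j < dim; j++) next[j] /= nrm;
    u = next;
  }
  return u;
}

// Remove the top k principal directions from centered rows, in place,
// then renormalize each row to unit length. X must be a typed array.
function removeTopComponents(X, n, dim, k) {
  for (let c = 0; c < k; c++) {
    const u = topComponent(X, n, dim);
    for (let i = 0; i < n; i++) {
      const p = rowDot(X, i, dim, u); // projection of row i onto u
      for (let j = 0; j < dim; j++) X[i * dim + j] -= p * u[j];
    }
  }
  for (let i = 0; i < n; i++) {
    const row = X.subarray(i * dim, (i + 1) * dim);
    const nrm = Math.sqrt(rowDot(X, i, dim, row)) || 1;
    for (let j = 0; j < dim; j++) X[i * dim + j] /= nrm;
  }
  return X;
}
```

Calling `removeTopComponents(X, n, dim, 0)` only renormalizes, which corresponds to the mean-only row of the table.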

Final pick: K=0, first-order is enough.