5/7/2026 ML / Search

Multi-Signal Fusion: Intersecting Lab Concepts in Natural-Language Search

Multi-signal fusion in Dotient archive search (hero visual)

Dotient’s Lab lets users define Lab signals: named concepts grounded in exemplar images. Archive search historically treated each text query token with multi-pass RRF, merging semantic passes when the query fragmented into pieces. Multi-signal fusion is a separate track: when the user’s wording matches several signals at once, we want results that satisfy all of those concepts together (e.g. “rusty” and “architecture”), not an unweighted blend that surfaces “rusty tools” beside “modern glass towers.” This note explains detection, per-signal matching, set intersection, ranking, and the in-product affordances users see.

Signals and embeddings

Each signal stores a normalized concept vector computed from starred images. Archive items have vision embeddings in the same space. Matching a single signal is standard cosine similarity against that concept (with stars forced to maximum relevance when appropriate). Negative images can define a directional penalty so that unwanted modes of the embedding are damped via the negative-concept gate.
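For concreteness, a minimal sketch of this per-signal geometry in Python, assuming unit-normalized CLIP-style embeddings; the function names and the subtractive negative penalty (with its 0.5 weight) are illustrative assumptions, not Dotient’s actual implementation:

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """Project a vector onto the unit sphere."""
    return v / np.linalg.norm(v)

def concept_vector(starred_embeddings: np.ndarray) -> np.ndarray:
    """Normalized mean of a signal's starred exemplar embeddings: one
    direction in embedding space representing the concept."""
    return normalize(starred_embeddings.mean(axis=0))

def signal_score(item: np.ndarray, concept: np.ndarray,
                 negative: np.ndarray | None = None,
                 neg_weight: float = 0.5) -> float:
    """Cosine relevance of one archive item against a concept vector.
    If negatives define a direction, similarity toward it is damped;
    the subtractive form and the 0.5 weight are illustrative only."""
    score = float(item @ concept)  # plain cosine: both sides are unit-norm
    if negative is not None:
        score -= neg_weight * max(0.0, float(item @ negative))
    return score
```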

When does fusion activate?

Fusion is triggered only after the broader search pipeline has determined that embedding search is available and the query is non-empty. We scan registered signals whose labels align with the full query text or with decomposed tokens (stop-word filtered): substring matches, token equality, sensible prefix-style overlaps (short label in longer token, longer label containing a substantial token). Heuristics suppress redundant pairs such as activating both “rust” and “rusty” when the longer label already subsumes the shorter.

If two or more signals match, the request takes the fusion path instead of treating each signal label as unrelated RRF passes that would mingle ranked lists (which favors “votes” over true conjunction). A query that matches exactly one signal stays on the enriched RRF route: the signal’s name may be injected as an extra retrieval pass when the wording is fuzzy relative to bare tokens.
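A rough sketch of the detection gate, under assumed simplifications (a toy stop-word list, a three-character minimum for prefix overlaps); the real heuristics may differ in detail:

```python
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "with"}  # illustrative

def matched_signals(query: str, labels: list[str]) -> list[str]:
    """Label detection: full-query substring hits, token equality, and
    prefix-style overlaps on stop-word-filtered tokens."""
    q = query.lower()
    tokens = [t for t in q.split() if t not in STOP_WORDS]
    hits = []
    for label in labels:
        lab = label.lower()
        overlap = any(
            t == lab or t.startswith(lab) or lab.startswith(t)
            for t in tokens
            if min(len(t), len(lab)) >= 3  # skip trivially short overlaps
        )
        if lab in q or overlap:
            hits.append(label)
    # Suppress redundant pairs: "rust" is dropped when "rusty" also fired.
    return [h for h in hits
            if not any(h != o and h.lower() in o.lower() for o in hits)]

def should_fuse(query: str, labels: list[str]) -> bool:
    """Fusion activates only when two or more distinct signals match."""
    return len(matched_signals(query, labels)) >= 2
```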

Intersection, not averaging

Earlier prototypes built a large approximate-neighbor pool and required every candidate to exceed every threshold in parallel. That failed in two ways: approximate retrieval can miss neighbors that would still beat the threshold under exact cosine, and strict conjunctive gating produced empty screens even when users could see plausible overlap across single-signal result sets.

The current design builds, for each matched signal, the same expanded set single-signal search would: all starred IDs plus every embedded archive file whose cosine to that signal’s concept is greater than the signal’s similarity cutoff (respecting exclusions from negative IDs and, where applicable, the negative-concept gate). The fusion result set is the set intersection of those sets across signals: an item must be a plausible instance of signal A and of signal B (and of further signals when more concepts fire).
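A sketch of the conjunctive construction, assuming precomputed unit-norm embeddings; the `Signal` container and its default cutoff are hypothetical, and negative-ID exclusions are elided for brevity:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Signal:
    concept: np.ndarray                   # unit-norm concept vector
    starred_ids: set[str] = field(default_factory=set)
    cutoff: float = 0.25                  # illustrative default strength

def match_set(signal: Signal, archive: dict[str, np.ndarray],
              threshold: float) -> set[str]:
    """Expanded set for one signal: every starred ID, plus every embedded
    archive item whose cosine to the concept clears the cutoff."""
    ids = set(signal.starred_ids)
    ids.update(item_id for item_id, emb in archive.items()
               if float(emb @ signal.concept) > threshold)
    return ids

def fused_ids(signals: list[Signal], archive: dict[str, np.ndarray],
              thresholds: list[float]) -> set[str]:
    """Conjunctive fusion: the intersection of all per-signal match sets."""
    return set.intersection(*(match_set(s, archive, t)
                              for s, t in zip(signals, thresholds)))
```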

If that intersection is empty at the user-configured strengths, we walk a small threshold relaxation ladder, retrying intersection with mildly lower effective thresholds so borderline-but-intended overlaps (like “weathered stonework”) can surface without silently falling back to a disjunctive mash.
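Continuing the sketch above (reusing the hypothetical `Signal` and `fused_ids`), with relaxation factors following the example ladder given in the glossary:

```python
RELAX_FACTORS = (1.0, 0.92, 0.82, 0.72)  # strict first, then the ladder

def fused_with_relaxation(signals: list[Signal],
                          archive: dict[str, np.ndarray]) -> set[str]:
    """Retry the intersection at monotonically relaxed effective cutoffs
    until some overlap exists or the ladder is exhausted; exhaustion
    returns the empty set rather than a disjunctive fallback."""
    for factor in RELAX_FACTORS:
        hits = fused_ids(signals, archive,
                         [s.cutoff * factor for s in signals])
        if hits:
            return hits
    return set()
```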

Ranking inside the intersection

Membership in the intersection is binary; ordering still matters. Each surviving ID collects per-signal cosine scores (stars count as unity for their signal). We combine them with the geometric mean, multiply by cosine alignment between the archive embedding and the query-text embedding, and sort descending. Color filters and optional hubness post-processing follow the ordinary search path so behavior stays coherent with the rest of the app.
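The ordering step, again as a sketch on top of the hypothetical `Signal` container; clamping cosines to a small positive floor before the geometric mean is an assumption to keep the product well defined:

```python
import numpy as np

def rank_intersection(ids: set[str], signals: list[Signal],
                      archive: dict[str, np.ndarray],
                      query_emb: np.ndarray) -> list[str]:
    """Order conjunctive hits: geometric mean of per-signal cosines
    (stars count as 1.0), scaled by cosine alignment with the
    query-text embedding, sorted descending."""
    def combined(item_id: str) -> float:
        emb = archive[item_id]
        per_signal = [1.0 if item_id in s.starred_ids
                      else max(float(emb @ s.concept), 1e-6)  # keep positive
                      for s in signals]
        geo = float(np.prod(per_signal)) ** (1.0 / len(per_signal))
        return geo * float(emb @ query_emb)
    return sorted(ids, key=combined, reverse=True)
```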

Sidebar and pinned signals

The desktop sidebar shows pinned signals for quick launches. Pins use the same fuzzy label-detection heuristic as fusion: highlighting is unified whether the query is an exact match of one label or a longer phrase that merely contains it. When at least two catalog signals match the live search text (not limited to pins), a compact “fusion” badge explains that retrieval is conjunctive. The pinned list still grounds the mental model: users discover which vocabulary is “live” relative to Lab concepts they care about.

Relation to multi-pass RRF search

Default archive search keeps using multi-pass RRF across decomposed tokens, explicit signal-name passes, filenames, etc. Fusion is narrowly scoped: multi-signal AND semantics replace that mixing only when multiple Lab signals fire on the same natural-language query. This preserves familiar behavior for ordinary queries while giving power users a principled notion of overlapping concepts grounded in embeddings rather than lexical tricks alone.
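For contrast, a minimal RRF merge in the 1/(k + rank) form cited below; k = 60 follows the constant used in the Cormack et al. paper and is an assumption about defaults here:

```python
def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each pass contributes 1 / (k + rank) per
    item, so items ranked well by any pass accumulate 'votes'; this is
    the blending behavior that conjunctive fusion deliberately replaces."""
    scores: dict[str, float] = {}
    for ranked in ranked_lists:
        for rank, item_id in enumerate(ranked, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```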

Formal references

RRF originates in work on fused ad-hoc retrieval; cosine similarity on normalized embedding vectors is standard practice for dual-encoder architectures in vision–language retrieval. Dotient applies these ideas in a local-first, personal-archive setting with user-authored concepts instead of curated taxonomies.


References

  1. Lab signals. User-defined visual concepts in Dotient: a name, a set of exemplar images, optional negatives, and a stored concept embedding used to retrieve similar archive items.
  2. concept vector. A normalized mean of the CLIP-style image embeddings of a signal’s starred exemplars, used as a direction in embedding space for similarity search.
  3. cosine similarity. Dot product of two unit-normalized embedding vectors; in [−1, 1], treated as a relevance score when comparing an image to a signal’s concept.
  4. Reciprocal Rank Fusion (RRF). A score-combination method that merges ranked lists from multiple retrieval passes by summing a term like 1/(k + rank); used in Dotient’s default archive search when several query passes run in parallel.
  5. set intersection. The usual mathematical intersection: an item appears in the fused result only if it belongs to every per-signal match set, not if it merely scores well on an average of concepts.
  6. geometric mean. For positive scores s₁…sₙ, (s₁·s₂·…·sₙ)^(1/n); used to rank intersection hits so that weak agreement on any concept pulls the combined score down smoothly.
  7. query-text embedding. The same text encoder used elsewhere in search; the full user query is embedded and combined with per-signal geometry to order intersection results by overall query relevance.
  8. threshold relaxation ladder. If strict per-signal similarity cutoffs yield an empty intersection, we retry with monotonically relaxed effective thresholds (e.g. 92%, 82%, 72% of each signal’s configured strength) until some overlap exists or the ladder is exhausted.
  9. negative exemplars. Images marked as “not this concept”; candidates too similar to the negative concept direction can be filtered out, mirroring the single-signal search path.
  10. pinned signals (sidebar). Signals the user keeps for quick access; the app highlights when the current search text matches a pin’s label (exact or fuzzy) and shows a fusion badge when multiple catalog signals match.

Formal citations: Gordon V. Cormack, Charles L. A. Clarke, and Stefan Büttcher, “Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods,” SIGIR 2009. Embedding-space similarity scoring follows cosine distance on normalized feature vectors as in CLIP-style dual encoders (Radford et al., “Learning Transferable Visual Models From Natural Language Supervision,” ICML 2021).