Models¶

Looking for pipeline architecture or chunking details? See Embedding model.

Presets¶

Use EMBEDDING_PRESET to choose a named model without memorising Hugging Face paths. The default preset is english.

Preset	Model	Dim	Size	License	Notes
`english` (default)	`Xenova/bge-small-en-v1.5`	384	~34 MB	MIT	English, asymmetric (BGE-style — `Represent this sentence for searching relevant passages:` on queries; empty on documents — applied automatically). Best retrieval under a ~60 MB budget.
`english-fast`	`MongoDB/mdbr-leaf-ir`	768	~22 MB	Apache-2.0	Retrieval-tuned 23M-param distillation of `mxbai-embed-large-v1` (Matryoshka student). Asymmetric — `Represent this sentence for searching relevant passages:` query prefix applied automatically. Sister model `MongoDB/mdbr-leaf-mt` is for general/clustering; `-ir` is what we wire here for RAG. v1.7.4: replaced `Xenova/paraphrase-MiniLM-L3-v2`.
`english-quality`	`Xenova/bge-base-en-v1.5`	768	~110 MB	MIT	Highest English CPU quality. Asymmetric. Over the default size budget but worth it when you have the RAM / disk.
`multilingual`	`Xenova/multilingual-e5-small`	384	~135 MB	MIT	94 languages. Asymmetric (E5 prefixes auto-applied).
`multilingual-quality`	`Xenova/multilingual-e5-base`	768	~279 MB	MIT	Highest-quality multilingual preset via transformers.js (no Ollama needed) — but see KNOWN ISSUES below. Asymmetric.
`multilingual-ollama`	`qwen3-embedding:0.6b` (via Ollama)	1024	~600 MB	Apache-2.0	Highest-quality multilingual preset. 100+ languages, 32 768-token context, instruction-aware retrieval. Beats `multilingual-quality` by +5.3pp MTEB-multilingual (64.3 vs 59.0) with 64× the context window. Requires Ollama + `ollama pull qwen3-embedding:0.6b`.

Example MCP client config with a preset:

{
  "mcpServers": {
    "obsidian-brain": {
      "command": "npx",
      "args": ["-y", "obsidian-brain@latest", "server"],
      "env": {
        "VAULT_PATH": "/absolute/path/to/your/vault",
        "EMBEDDING_PRESET": "multilingual-ollama"
      }
    }
  }
}

Quality ranking¶

MTEB scores from the user's research. Higher is better. Preset names highlighted where applicable.

English presets¶

Model	Preset	MTEB eng avg
`onnx-community/embeddinggemma-300m-ONNX`	(BYOM — v1.8.0 preset candidate)	0.6524
`Alibaba-NLP/gte-modernbert-base`	(BYOM — v1.8.0 preset candidate)	0.6421
`jinaai/jina-embeddings-v5-text-nano`	(BYOM — future)	0.6286
`Xenova/bge-base-en-v1.5`	`english-quality`	0.6138
`MongoDB/mdbr-leaf-ir`	`english-fast`	retrieval-tuned, distilled from `mxbai-embed-large-v1`
`Xenova/bge-small-en-v1.5`	`english` (default)	0.5863

Multilingual presets¶

Model	Preset	MTEB multi avg
`Qwen/Qwen3-Embedding-0.6B`	`multilingual-ollama`	64.33
`BAAI/bge-m3`	(BYOM — Ollama route, `ollama pull bge-m3`)	59.56
`intfloat/multilingual-e5-large-instruct`	(BYOM — Ollama route)	—
`Xenova/multilingual-e5-base`	`multilingual-quality`	59.0
`Xenova/multilingual-e5-small`	`multilingual`	—

How model metadata is resolved¶

Per-model metadata (output dim, max tokens, query / document prefix) is resolved through a 6-step layered chain so canonical presets are zero-network and BYOM models still get the correct prefix without you declaring anything:

User overrides — ~/.config/obsidian-brain/model-overrides.json, managed via npx obsidian-brain models add <id> … and models override <id> …. Topmost layer; survives npm update. When a user override fully specifies all three load-bearing fields (maxTokens, queryPrefix, documentPrefix), the resolver short-circuits the rest of the chain entirely — no HF round-trip on first use.
Hot cache — embedder_capability table in your vault DB. Lives forever once written; the cached fields are immutable for a given HF id. Invalidate explicitly with npx obsidian-brain models refresh-cache (or --model <id> for one entry) when you want to pick up an upstream config correction.
Bundled seed — data/seed-models.json shipped in the npm tarball. Regenerated at every release from MTEB's Python registry (zero HF API calls). Covers ~348 dense, text-only, open-weights embedding models — every canonical preset plus every popular community model. Cache miss + seed hit copies the seed entry into the cache (one-time; subsequent boots are cache hits). A user-fetched copy at ~/.config/obsidian-brain/seed-models.json (managed via models fetch-seed) takes priority when present, so users can pull MTEB fixes without waiting for an npm release.
Live HF fetch — for BYOM models not in the seed, getEmbeddingMetadata(modelId) reads config.json + tokenizer_config.json + sentence_bert_config.json + config_sentence_transformers.json + modules.json in parallel, plus the upstream base_model's same JSON when the direct repo lacks prompts. 5s timeout, 2 retries with backoff.
Embedder probe + safe defaults — if HF is unreachable, dim is probed from the loaded ONNX pipeline; max-tokens defaults to 512; symmetry assumed. Stderr warning surfaces the degraded state. Boot continues.

This replaces three hardcoded tables that previously lived in the codebase (a getTransformersPrefix family-pattern if/else, a KNOWN_MAX_TOKENS validation map, and dim / symmetric / sizeMb / lang columns on EMBEDDING_PRESETS). Wrong-prefix bugs become impossible because upstream HF configs are now the source of truth, not us.

Bring Your Own Model (BYOM)¶

The EMBEDDING_MODEL env var accepts any Hugging Face model id supported by transformers.js. Set EMBEDDING_PROVIDER=transformers (default) or EMBEDDING_PROVIDER=ollama depending on the model's available route.

{
  "env": {
    "VAULT_PATH": "/path/to/vault",
    "EMBEDDING_MODEL": "Alibaba-NLP/gte-modernbert-base"
  }
}

BYOM Ollama auto-pull (allowlist + opt-in)¶

For EMBEDDING_PROVIDER=ollama, auto-pull behavior depends on whether the model is in Ollama's official library namespace:

`EMBEDDING_MODEL` shape	Auto-pulls on first boot?	Why
`nomic-embed-text` (no `/`)	✅ Yes	Official `library` namespace (allowlisted)
`library/llama3:8b` (explicit `library/` prefix)	✅ Yes	Official namespace (allowlisted)
`qwen3-embedding:0.6b` (no `/`)	✅ Yes	Official namespace (allowlisted)
`user/custom-fork` (third-party namespace)	❌ No	Requires `OBSIDIAN_BRAIN_OLLAMA_BYOM_AUTO_PULL=1` opt-in
`myregistry.com/team/model:tag` (third-party)	❌ No	Same — requires opt-in
Any model with `EMBEDDING_PRESET` set	✅ Yes	Preset path uses default-ON auto-pull

The allowlist exists because Ollama has no built-in trust gate (ollama/ollama#11941 tracks the proposed "Secure Mode"; CVE-2024-37032 was a real path-traversal exploit via crafted manifests). For BYOM models outside the official namespace, three escape hatches:

Pull manually: ollama pull user/custom-fork
Opt in to auto-pull: set OBSIDIAN_BRAIN_OLLAMA_BYOM_AUTO_PULL=1
Use a preset instead: EMBEDDING_PRESET=multilingual-ollama

The master kill-switch OBSIDIAN_BRAIN_OLLAMA_AUTO_PULL=0 overrides everything and disables auto-pull entirely (including preset paths). See Troubleshooting → Ollama BYOM model not pulling for the actionable error format.

Pre-flight a model before committing:

npx obsidian-brain models check Alibaba-NLP/gte-modernbert-base

models check goes directly to the live HuggingFace API (skipping the cache + seed chain — always fresh) (~2s, no model download). Reports dim, max tokens, query / document prefix, prefix source, base model, and ONNX size. Add --load to also download + load the ONNX weights for end-to-end validation (~30s).

Other models subcommands (full reference in CLI):

Command	What it does
`models list [--all] [--filter <q>]`	Lists the 6 presets by default. `--all` surfaces every entry in the bundled seed (~348 models). `--filter` narrows by id substring. Zero network calls.
`models recommend`	Scans your vault and recommends `english` or `multilingual` based on non-Latin character ratio
`models prefetch [preset]`	Downloads ONNX weights for a preset's model so first `index` doesn't pay download cost
`models check <id>`	Fetches HF metadata directly (~2s). Skips the cache/seed chain — always live HF. Add `--load` to also download + load.
`models add <id>`	Register a new model not in the seed (writes to `~/.config/obsidian-brain/model-overrides.json`). Refuses if id already in seed or overrides
`models override <id>`	Patch fields on an existing model id (writes to overrides file). Survives `npm update`. `--list` dumps every override; `--remove [--field name]` clears
`models fetch-seed`	Download latest seed from `main` branch on GitHub. Bypasses npm-release wait. Schema-version-aware
`models refresh-cache [--model <id>]`	Invalidate cached metadata. Cheap (~0 HF calls for seeded models). Does NOT require `VAULT_PATH`

EmbedderLoadError kinds:

Kind	Cause	Fix
`not-found`	Model id doesn't exist on HF Hub	Check the model id — look under `Xenova/` for ONNX ports
`no-onnx`	Repo exists but has no ONNX weights	Use a Xenova port, or use Ollama for the model
`offline`	Network unavailable at load time	Ensure HF Hub is reachable, or pre-fetch with `models prefetch`

BYOM model catalogue¶

Models documented here are not built-in presets but can be used today (or in the near future) via BYOM configuration.

Model	License	Route	MTEB eng	MTEB multi	Status
`jinaai/jina-embeddings-v5-text-nano`	CC-BY-NC-4.0	—	0.6286	—	Future — no ONNX in official repo (GGUF only); Ollama support pending (#14641)
`intfloat/multilingual-e5-large-instruct`	MIT	Ollama (community)	—	0.7781	Usable today via Ollama
`Alibaba-NLP/gte-modernbert-base`	Apache-2.0	transformers.js	0.6421	—	Usable today; planned to become `english-longctx` preset in a future release
`onnx-community/embeddinggemma-300m-ONNX`	Gemma Terms	transformers.js / Ollama	0.6524	—	Usable today; preset candidate for v1.8.0
`onnx-community/mdbr-leaf-mt-ONNX`	Apache-2.0	transformers.js	—	—	Usable today; best sub-30M on MTEB; preset upgrade candidate for v1.8.0

intfloat/multilingual-e5-large-instruct — exact recipe¶

Highest MTEB multilingual score (0.7781) in the catalogue. MIT licensed. No transformers.js ONNX port, but a community Ollama model works:

ollama pull qllama/multilingual-e5-large-instruct

Then configure obsidian-brain:

{
  "env": {
    "VAULT_PATH": "/absolute/path/to/your/vault",
    "EMBEDDING_PROVIDER": "ollama",
    "EMBEDDING_MODEL": "qllama/multilingual-e5-large-instruct"
  }
}

jinaai/jina-embeddings-v5-text-nano — status¶

CC-BY-NC-4.0, 212M parameters, 8192-token context, MTEB eng 0.6286 (+6.9pp over bge-base). The official HF repo contains only GGUF weights — no ONNX — so it cannot be loaded via transformers.js. Ollama support is tracked at ollama#14641. Not usable via either provider today. Watch that issue for progress.

Alibaba-NLP/gte-modernbert-base — BYOM recipe¶

Apache-2.0, 149M parameters, 8192-token context, MTEB eng 0.6421 (+8.3pp over bge-base). ONNX weights are in the official repo; ModernBERT architecture is supported in transformers.js v3.7.0+.

{
  "env": {
    "VAULT_PATH": "/absolute/path/to/your/vault",
    "EMBEDDING_MODEL": "Alibaba-NLP/gte-modernbert-base"
  }
}

This model is planned to become the english-longctx first-class preset in a future release.

onnx-community/embeddinggemma-300m-ONNX — BYOM recipe¶

Gemma Terms (permissive for embeddings), 308M parameters, MTEB eng 0.6524 (+9.3pp over bge-base, highest English score in this catalogue). ONNX weights available (fp32/q8/q4).

{
  "env": {
    "VAULT_PATH": "/absolute/path/to/your/vault",
    "EMBEDDING_MODEL": "onnx-community/embeddinggemma-300m-ONNX"
  }
}

This model is planned to become a first-class preset in a future release.

onnx-community/mdbr-leaf-mt-ONNX — BYOM recipe¶

Apache-2.0, 23M parameters — the best sub-30M model on MTEB. ONNX weights available (fp32/q8/q4).

{
  "env": {
    "VAULT_PATH": "/absolute/path/to/your/vault",
    "EMBEDDING_MODEL": "onnx-community/mdbr-leaf-mt-ONNX"
  }
}

Preset upgrade candidate for a future release.

Known issues¶

multilingual-quality: token_type_ids bug¶

Xenova/multilingual-e5-base (the multilingual-quality preset) has a known pipeline mismatch in transformers.js (#267, #938): notes longer than approximately 400 words trigger a "Missing the following inputs: token_type_ids" ONNX shape error. When this happens:

The note-level embedding fails and is skipped.
The failure is recorded in the failed_chunks table and surfaced via the index_status tool.
A stderr warning is emitted when EMBEDDING_PRESET=multilingual-quality is resolved, pointing to this issue.

Impact: users with multilingual vaults containing long notes will see those notes missing from semantic search results. The index_status tool will show a non-zero failedChunksTotal.

Recommended workaround: switch to multilingual-ollama (qwen3-embedding:0.6b), which does not have this limitation, handles 32 768-token context, and scores +5.3pp higher on MTEB multilingual benchmarks.

{
  "env": {
    "VAULT_PATH": "/absolute/path/to/your/vault",
    "EMBEDDING_PRESET": "multilingual-ollama"
  }
}

If you cannot run Ollama, multilingual (Xenova/multilingual-e5-small) is more tolerant of longer notes than multilingual-quality because its lower capacity means it hits the length limit less often in practice.

Long-note handling¶

obsidian-brain splits notes at markdown headings (H1–H4) and further splits oversized sections on paragraph and sentence boundaries, preserving code fences and $$…$$ LaTeX blocks. Notes of any length are handled correctly — you do not need to keep notes short for semantic search to work.

Token budget — the chunk size is capped at 90% of the model's advertised model_max_length (read from the tokenizer config). This leaves headroom for the task prefix and avoids silent truncation. For Ollama models the budget is derived from /api/show.

TreeRAG parent-heading prefix (ACL 2025) — each chunk is prefixed with its nearest parent heading path before embedding (e.g. "Project Alpha > Goals > Q2"). This improves retrieval for multi-chunk notes by giving the embedder topical context that would otherwise be lost at chunk boundaries.

Override the budget — if you need a smaller or larger chunk window (e.g. for a model with a stale tokenizer config), set:

{ "env": { "OBSIDIAN_BRAIN_MAX_CHUNK_TOKENS": "512" } }

When set, this value overrides the tokenizer-derived budget entirely.

When your embedder is full¶

Occasionally a chunk is too long for the model even after the budget calculation — for example, a single code block or table that cannot be split further. The fault-tolerant indexer handles this gracefully:

Skip + log, not recurse-halve — the failing chunk is skipped and logged to stderr. obsidian-brain does not attempt to recursively halve the chunk; industry consensus (NAACL 2025) is that repeated halving produces incoherent sub-chunks with worse retrieval than skipping.
failed_chunks table — every skipped chunk is recorded with its note path, chunk offset, and the error message. The table persists across restarts.
Visible via index_status — the index_status MCP tool reports chunksSkippedInLastRun so you can see at a glance how many chunks were skipped and whether the count is growing.
Adaptive budget tightening — each "too long" failure lowers the cached discovered_max_tokens value, so subsequent chunks aim for a smaller budget and the same failure is less likely to repeat.

If you see a non-zero chunksSkippedInLastRun, check the notes listed in failed_chunks — they typically contain unusually large code blocks or tables. Trimming them or setting OBSIDIAN_BRAIN_MAX_CHUNK_TOKENS to a smaller value resolves the issue.

License catalogue¶

Quick reference for every model licensed in this document. obsidian-brain ships only model id strings, not weights — license obligations attach to the user who downloads from HF or Ollama.

License	Models	Interpretation
MIT	`Xenova/bge-small-en-v1.5`, `Xenova/bge-base-en-v1.5`, `Xenova/multilingual-e5-small`, `Xenova/multilingual-e5-base`, `bge-m3`, `intfloat/multilingual-e5-large-instruct`	Permissive. Commercial use, modification, and distribution allowed with attribution. No copyleft.
Apache-2.0	`Alibaba-NLP/gte-modernbert-base`, `MongoDB/mdbr-leaf-ir`, `MongoDB/mdbr-leaf-mt`, `onnx-community/mdbr-leaf-mt-ONNX`, `Qwen/Qwen3-Embedding-0.6B`, `qwen3-embedding:0.6b`	Permissive. Commercial use allowed. Patent grant included. Attribution required in distributed copies.
CC-BY-NC-4.0	`jinaai/jina-embeddings-v5-text-nano`	Non-commercial only without a separate commercial licence from Jina AI. obsidian-brain does not redistribute weights; the user accepts this licence when downloading from HF.
Gemma Terms	`onnx-community/embeddinggemma-300m-ONNX`	Google's Gemma licence. Permissive for most uses including personal and commercial embedding workloads; redistribution of model weights requires attribution and compliance with Gemma's acceptable use policy.