OKF Ecosystem Tools
An honest inventory of what exists today (Jun 2026) around the Open Knowledge Format. For each tool: what it does, how mature it actually is, and whether you should bother.
1. Reference Enrichment Agent (BigQuery → OKF Bundles)
What it is: An agent that pulls metadata from a pluggable source (currently only BigQuery) and emits a complete OKF bundle: a directory of Markdown files with YAML frontmatter, ready for humans, LLMs, and catalog tools.
How it works:
- BQ pass: Generates one OKF doc per concept using BigQuery metadata alone (schemas, descriptions, tables).
- Web pass: The LLM (Gemini via ADK) acts as a crawler — receives seed URLs, decides which links are authoritative documentation, then enriches existing docs or creates new reference docs.
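To make the BQ pass concrete, here is a rough sketch of turning one table's metadata into an OKF-style doc. It is not the agent's code. The input dict shape is hypothetical, and the frontmatter keys (`id`, `title`, `type`, `tags`) are assumptions inferred from what the visualizer searches and filters on, so check SPEC.md for the real required fields.

```python
import json
from pathlib import Path


def table_to_okf(table: dict) -> str:
    """Render one table's metadata as an OKF-style Markdown doc (hypothetical input shape)."""
    front = {
        "id": table["table_id"],
        "title": table["table_id"].replace("_", " ").title(),
        "type": "table",
        "tags": ["bigquery", table["dataset_id"]],
    }
    # json.dumps emits quoted strings and [..] lists, which YAML also accepts,
    # so these simple values need no YAML library.
    lines = ["---", *(f"{k}: {json.dumps(v)}" for k, v in front.items()), "---", ""]
    lines.append(table.get("description") or "_No description in BigQuery._")
    lines += ["", "## Schema", "", "| Column | Type | Description |", "|---|---|---|"]
    for col in table.get("schema", []):
        lines.append(f"| {col['name']} | {col['type']} | {col.get('description', '')} |")
    return "\n".join(lines) + "\n"


def write_bundle(tables: list[dict], out_dir: Path) -> list[Path]:
    """Write one doc per table, mirroring the 'one OKF doc per concept' rule."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for table in tables:
        path = out_dir / f"{table['table_id']}.md"
        path.write_text(table_to_okf(table), encoding="utf-8")
        paths.append(path)
    return paths
```

The web pass would then read these files back and append or rewrite sections, which is why a plain file-per-concept layout is convenient.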
Stack: Python 3.13, Google Agent Development Kit (ADK), Gemini as the model backend.
Running it:
# Install
python3.13 -m venv .venv
.venv/bin/pip install -e '.[dev]' # quoted so zsh doesn't glob the brackets
# Credentials
gcloud auth application-default login
export GEMINI_API_KEY=<your-key> # or Vertex AI
# Run (minimal)
.venv/bin/python -m enrichment_agent enrich \
--source bq \
--dataset <project>.<dataset> \
--web-seed-file seeds.txt \
--out ./bundles/<name>
# BigQuery only (no web crawl)
.venv/bin/python -m enrichment_agent enrich \
--source bq \
--dataset bigquery-public-data.ga4_obfuscated_sample_ecommerce \
--no-web \
--out ./bundles/ga4
Included sample bundles:
- bundles/ga4/: GA4 e-commerce
- bundles/stackoverflow/: Stack Overflow public dataset
- bundles/crypto_bitcoin/: Bitcoin blocks/transactions
Link: github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/okf
Limitations:
- BigQuery is the only implemented source (the Source interface exists, but nothing else plugs in)
- Requires a Gemini API key or a configured Vertex AI project
- Web pass can burn through tokens fast if you feed it too many seeds
- No incremental updates — runs from scratch every time
🟡 Maturity: Functional proof of concept. The bundles it produces are legit and useful. But the agent itself is a demo of what’s possible, not a product. It genuinely works for BigQuery public datasets. For production use, you’ll want to customize prompts and seeds — and probably add caching.
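Since every run starts from scratch, the simplest form of the caching mentioned above is a content hash per concept: regenerate a doc only when its source metadata changed. A sketch under those assumptions; `generate` stands in for the expensive LLM-backed step, and the `.okf-cache.json` file name is made up.

```python
import hashlib
import json
from collections.abc import Callable
from pathlib import Path


def fingerprint(metadata: dict) -> str:
    # sort_keys makes the digest independent of dict ordering.
    blob = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def enrich_incrementally(concepts: dict[str, dict], out_dir: Path,
                         generate: Callable[[dict], str]) -> list[str]:
    """Regenerate only concepts whose metadata hash changed; return their ids."""
    out_dir.mkdir(parents=True, exist_ok=True)
    cache_path = out_dir / ".okf-cache.json"  # hypothetical cache file
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    regenerated = []
    for concept_id, metadata in concepts.items():
        digest = fingerprint(metadata)
        doc_path = out_dir / f"{concept_id}.md"
        if cache.get(concept_id) == digest and doc_path.exists():
            continue  # unchanged: keep the existing doc and spend no tokens
        doc_path.write_text(generate(metadata), encoding="utf-8")
        cache[concept_id] = digest
        regenerated.append(concept_id)
    cache_path.write_text(json.dumps(cache, indent=2, sort_keys=True))
    return regenerated
```

The same idea extends to the web pass by hashing the fetched page content instead of the BigQuery metadata.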
2. Static HTML Visualizer (viz.html)
What it is: A visualize subcommand that takes any OKF bundle and spits out a single HTML file with the bundle data inlined: interactive concept graph, detail panel, search, type filters, backlinks. No backend, no installation on the viewer side.
What you get:
- Force-directed graph (Cytoscape.js) with nodes colored by type
- Side panel with rendered frontmatter + markdown body
- Navigable internal links within the viewer
- “Cited by” section (computed backlinks)
- Search by title, ID, tags
- Alternative layouts (cose, concentric, breadthfirst, circle, grid)
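“Cited by” only needs a reverse index of internal links. A minimal sketch of how such backlinks can be computed, assuming links are ordinary relative Markdown links such as `[Events](../tables/events.md)`; this is not the visualizer's actual code.

```python
import posixpath
import re
from collections import defaultdict

# Inline Markdown links to .md targets, with an optional #fragment.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s#]+\.md)(?:#[^)]*)?\)")


def compute_backlinks(docs: dict[str, str]) -> dict[str, list[str]]:
    """Map each doc to the sorted list of docs that link to it.

    `docs` maps bundle-relative POSIX paths (e.g. "tables/events.md") to
    Markdown bodies; external URLs and links to unknown files are skipped.
    """
    cited_by: dict[str, set[str]] = defaultdict(set)
    for src, body in docs.items():
        base = posixpath.dirname(src)
        for target in LINK_RE.findall(body):
            if "://" in target:
                continue
            resolved = posixpath.normpath(posixpath.join(base, target))
            if resolved in docs and resolved != src:
                cited_by[resolved].add(src)
    return {path: sorted(srcs) for path, srcs in cited_by.items()}
```

The viewer then only has to look up a concept's path in this map to render its “Cited by” list.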
Generating it:
.venv/bin/python -m enrichment_agent visualize --bundle ./bundles/ga4
# Produces bundles/ga4/viz.html
# Customize
.venv/bin/python -m enrichment_agent visualize \
--bundle ./bundles/crypto_bitcoin \
--out /tmp/btc.html \
--name "Bitcoin OKF"
Using it: Open viz.html in any modern browser. Host on a static file server, email it to someone, commit it to the repo. It just works.
Link: Same repo — okf/README.md#visualize
Limitations:
- Large bundles produce heavy HTML files (everything is inlined as JSON)
- The viewer is a minimal SPA — no pagination, no lazy loading
- Depends on CDN for Cytoscape.js and marked.js (not truly offline without tweaks)
🟢 Maturity: This one actually works. It’s simple, does what it promises, and the viz.html files committed to the repo are great for demos. For bundles with 10–50 concepts, it’s perfect. At 500+, you’ll probably hit performance walls.
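The heavy-file limitation follows from the single-file design: every concept is serialized into the page. A minimal sketch of that pattern with a hypothetical template (the real viz.html also loads Cytoscape.js and marked.js and looks different). The one subtle step is escaping `<`, so a doc body containing `</script>` cannot close the data block early.

```python
import html
import json

# Hypothetical, stripped-down page; the data block is read in the browser with
# JSON.parse(document.getElementById("okf-data").textContent).
TEMPLATE = """<!doctype html>
<html><head><meta charset="utf-8"><title>{title}</title></head>
<body><div id="graph"></div>
<script id="okf-data" type="application/json">{payload}</script>
</body></html>
"""


def render_viz(title: str, concepts: list[dict]) -> str:
    """Inline every concept as JSON; output size grows linearly with the bundle."""
    payload = json.dumps(concepts, ensure_ascii=False)
    # "\u003c" is still valid JSON for "<", but the HTML parser no longer sees "</script>".
    payload = payload.replace("<", "\\u003c")
    return TEMPLATE.format(title=html.escape(title), payload=payload)
```

Lazy loading would mean splitting the payload into per-concept files, which in turn gives up the “email it to someone” property.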
3. kcmd CLI + MCP Server (Metadata as Code)
What it is: A bidirectional sync tool between local metadata (YAML/markdown on your filesystem) and Google Cloud Knowledge Catalog (formerly Dataplex). Think “git for metadata” — you edit locally and push/pull to the cloud catalog.
Format: YAML for entries, sidecar .md files for rich content (overviews, descriptions). Hierarchical layout mirroring the resource structure.
Distribution: TypeScript library (npm install kcmd), standalone CLI (kcmd), and an MCP server.
CLI usage:
# Initialize a snapshot from a BigQuery dataset
kcmd init --bigquery-dataset <projectId>.<datasetId>
# Pull metadata from catalog
kcmd pull
# Check local changes
kcmd status
# Push changes to catalog (with dry-run)
kcmd push --dry-run
kcmd push
MCP Server config:
{
  "mcpServers": {
    "kc-mac": {
      "command": "kcmd",
      "args": ["mcp", "--path", "/path/to/root"]
    }
  }
}
Available MCP tools: pull, push, list-entries, lookup-entry, modify-entry.
Where you can plug it in:
- Gemini CLI / Google AI Studio
- Claude Desktop (via MCP config)
- Cursor / VS Code (any editor with MCP support)
- Custom agents (LangChain, ADK, etc.)
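For most of these clients, wiring kcmd in means merging the `mcpServers` entry into the client's JSON config. A small sketch that does this idempotently and keeps any other servers; the config path and metadata root are placeholders, since each client stores its config in a different place.

```python
import json
from pathlib import Path


def add_kcmd_server(config_path: Path, root: str, name: str = "kc-mac") -> dict:
    """Insert or update the kcmd MCP server entry, keeping other servers intact.

    `config_path` is whatever JSON file your MCP client reads; `root` is the
    local metadata directory passed to `kcmd mcp --path`.
    """
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    # Same entry shape as the MCP server config earlier in this section.
    servers[name] = {"command": "kcmd", "args": ["mcp", "--path", root]}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Running it twice leaves the file unchanged, so it is safe to call from a setup script.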
Link: github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/toolbox/mdcode
Limitations:
- Requires a GCP project with Knowledge Catalog enabled
- Auth via gcloud only (no direct service account support)
- The YAML/sidecar format differs from pure OKF (it’s oriented toward the Dataplex catalog)
- Documentation is still sparse
🟡 Maturity: Early product, but well-structured. The CLI works, the MCP server is real, and the push/pull workflow makes intuitive sense. The fact that it ships as library + CLI + MCP shows intent to serve varied environments. Still no versioned npm releases though — so pin your expectations accordingly.
4. Google Cloud Knowledge Catalog (The Backend)
What it is: The GCP product (formerly Dataplex) that acts as an AI-powered metadata catalog. It’s the “official backend” that the tools above sync with.
Relevant features for OKF:
- Automatic harvesting from BigQuery, AlloyDB, Spanner, Cloud SQL, Firestore, Looker
- Third-party integrations: Ab Initio, Anomalo, Atlan, Collibra, Datahub
- Native Gemini enrichment — generates descriptions, glossaries, maps entities
- Sub-second semantic search for agents
- Context APIs + MCP tools for agents to discover assets
- Data products — asset packaging with SLAs and governance
Pricing (summary):
- Free tier: 100 DCU-hours/month + 1 MiB storage + 1M API calls/month
- Standard: $0.06/DCU-hour
- Premium (lineage, quality, profiling): $0.089/DCU-hour
- Storage: $2/GiB/month (above 1 MiB)
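To see how these rates combine, here is a rough monthly estimator. It assumes the 100 free DCU-hours offset Standard usage only and ignores API-call charges (no overage rate is listed above); check the official pricing page before budgeting.

```python
# List prices as quoted above (USD); verify against the current pricing page.
FREE_DCU_HOURS = 100.0
STANDARD_PER_DCU_HOUR = 0.06
PREMIUM_PER_DCU_HOUR = 0.089
STORAGE_PER_GIB_MONTH = 2.00
FREE_STORAGE_GIB = 1 / 1024  # 1 MiB


def monthly_cost(standard_dcu_hours: float, premium_dcu_hours: float,
                 storage_gib: float) -> float:
    """Rough monthly bill in USD, ignoring API-call charges."""
    # Assumption: the free DCU-hours offset Standard usage, not Premium.
    billable_standard = max(0.0, standard_dcu_hours - FREE_DCU_HOURS)
    billable_storage = max(0.0, storage_gib - FREE_STORAGE_GIB)
    cost = (billable_standard * STANDARD_PER_DCU_HOUR
            + premium_dcu_hours * PREMIUM_PER_DCU_HOUR
            + billable_storage * STORAGE_PER_GIB_MONTH)
    return round(cost, 2)
```

For example, 600 Standard DCU-hours, 200 Premium DCU-hours and 5 GiB of metadata come to $57.80 under these assumptions: $30.00 + $17.80 + about $10.00 of storage.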
Link: cloud.google.com/products/knowledge-catalog
Limitations:
- GCP vendor lock-in (that’s literally why kcmd exists: it’s the portability bridge)
- Pricing can scale fast with heavy DCU-hour usage
- The “native format” is NOT OKF — OKF is the portable interop layer
🟢 Maturity: GA Google Cloud product. It’s real, runs in production, and comes with an SLA and enterprise support. The Knowledge Catalog itself is mature; what’s new is the open-source tooling around it.
5. Possible Integrations
5.1 Obsidian
Status: No official plugin. But OKF was deliberately designed to work with Obsidian out of the box.
Why it just works:
- OKF bundles are directories of .md files with YAML frontmatter, exactly what Obsidian expects
- Internal links work as relative paths
- Frontmatter tags show up natively
- The index.md works as an index note
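Because Obsidian follows these relative links as-is, a broken path only shows up when someone clicks it. A quick pre-flight check before opening a bundle as a vault could look like this; it is a sketch that only understands inline `[text](path.md)` links, not wikilinks.

```python
import re
from pathlib import Path

# Inline Markdown links to .md targets, with an optional #fragment.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s#]+\.md)(?:#[^)]*)?\)")


def broken_links(bundle: Path) -> list[tuple[str, str]]:
    """Return (source file, link target) pairs whose relative target is missing."""
    broken = []
    for md in sorted(bundle.rglob("*.md")):
        for target in LINK_RE.findall(md.read_text(encoding="utf-8")):
            if "://" in target:
                continue  # external URL, not a vault-internal link
            if not (md.parent / target).resolve().is_file():
                broken.append((md.relative_to(bundle).as_posix(), target))
    return broken
```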
How to use today:
- Generate a bundle with the enrichment agent
- Open the bundle directory as a vault in Obsidian
- Navigate, edit, use the native graph view
What a proper plugin would add:
- Inline validation (OKF conformance linting)
- Templates for new concepts
- Sync with Knowledge Catalog via kcmd
🟢 Natural compatibility. No plugin needed — works by design. A plugin would be nice for validation, but it’s not blocking anything.
5.2 GitHub Actions
Status: No official Action published. But every command is scriptable.
Possible workflows:
# .github/workflows/okf-validate.yml
name: Validate OKF Bundle
on: [push, pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.13'
      - run: pip install okf-validator # when it exists
      - run: okf validate ./bundles/
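# .github/workflows/okf-enrich.yml (advanced)
name: Enrich on Schedule
on:
  schedule:
    - cron: '0 6 * * 1' # Every Monday, 06:00 UTC
permissions:
  contents: write      # push the PR branch
  pull-requests: write # open the PR
jobs:
  enrich:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.13' # the agent targets Python 3.13
      # BigQuery credentials; assumes a service-account key in a GCP_SA_KEY secret
      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}
      - run: pip install -e './okf[dev]'
      - run: |
          python -m enrichment_agent enrich \
            --source bq --dataset ${{ secrets.BQ_DATASET }} \
            --no-web --out ./bundles/weekly
        env:
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
      - uses: peter-evans/create-pull-request@v6
        with:
          title: "chore: weekly OKF enrichment"
🟡 High potential, zero official implementation. The “enrich → commit → PR” workflow is natural for OKF. Someone should publish a reusable Action; it’s low-hanging fruit.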
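Until a published validator exists, the `okf validate` step could be replaced by a small script committed to the repo. A minimal sketch: `REQUIRED_KEYS` is an assumption rather than the spec's list, and the frontmatter check is a line-level scan, not a full YAML parser.

```python
import re
import sys
from pathlib import Path

# Assumed required keys; take the real list from SPEC.md.
REQUIRED_KEYS = ("id", "title", "type")
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
KEY_RE = re.compile(r"^([A-Za-z_][\w-]*):", re.MULTILINE)
ID_RE = re.compile(r"^id:\s*(.+)$", re.MULTILINE)


def validate_bundle(bundle: Path) -> list[str]:
    """Return human-readable errors for every .md file under `bundle`."""
    errors: list[str] = []
    seen_ids: dict[str, str] = {}
    for md in sorted(bundle.rglob("*.md")):
        rel = md.relative_to(bundle).as_posix()
        match = FRONTMATTER_RE.match(md.read_text(encoding="utf-8"))
        if not match:
            errors.append(f"{rel}: missing YAML frontmatter")
            continue
        block = match.group(1)
        keys = set(KEY_RE.findall(block))  # top-level keys only (no indentation)
        errors += [f"{rel}: missing required key '{k}'" for k in REQUIRED_KEYS if k not in keys]
        id_match = ID_RE.search(block)
        if id_match:
            doc_id = id_match.group(1).strip().strip("'\"")
            if doc_id in seen_ids:
                errors.append(f"{rel}: duplicate id '{doc_id}' (also in {seen_ids[doc_id]})")
            else:
                seen_ids[doc_id] = rel
    return errors


if __name__ == "__main__" and len(sys.argv) > 1:
    problems = validate_bundle(Path(sys.argv[1]))
    print("\n".join(problems) or "OK")
    sys.exit(1 if problems else 0)
```

Saved as, say, `validate_okf.py` (a hypothetical name), the workflow step becomes `- run: python validate_okf.py ./bundles/`, and any error fails the job through the non-zero exit code.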
5.3 Coding Agents (Claude, Codex, Cursor, Gemini)
Status: No official skill published. This is the most obvious gap.
What exists today:
- The OKF README works as documentation for an agent to understand the format
- The SPEC.md is readable enough for an LLM to generate conformant bundles
- The toolbox/enrichment uses tools/skills/ as an agent skill directory
What’s missing:
- A standalone .md skill that teaches any agent to produce OKF
- Generation-time validation (the agent checks conformance before saving)
- Reusable templates for common scenarios (SaaS metrics, analytics, APIs)
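Generation-time validation can be as simple as a gate in front of the write: check the doc, then either save it atomically or hand the errors back to the agent so it can repair and retry. A sketch with assumed required keys; a real skill would take them from SPEC.md.

```python
import os
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("id", "title", "type")  # assumed; take the real list from SPEC.md


def conformance_errors(text: str) -> list[str]:
    """Cheap checks: a leading frontmatter block containing the required keys."""
    if not text.startswith("---\n") or "\n---\n" not in text[4:]:
        return ["missing YAML frontmatter"]
    block = text[4:].split("\n---\n", 1)[0]
    keys = {line.split(":", 1)[0].strip() for line in block.splitlines() if ":" in line}
    return [f"missing required key '{k}'" for k in REQUIRED_KEYS if k not in keys]


def save_if_conformant(path: Path, text: str) -> list[str]:
    """Write `text` only if it passes; otherwise return the errors for the agent."""
    errors = conformance_errors(text)
    if errors:
        return errors
    path.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory + rename, so nobody sees a half-written doc.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(text)
    os.replace(tmp, path)
    return []
```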
Skill pattern already used by the toolbox:
---
name: fileset-source
description: >
  Use the fileset source to find relevant markdown documents...
---
[tool usage instructions]
🟡 Clear opportunity. The OKF format was made to be agent-friendly, but nobody has packaged it as a distributable skill yet.
Maturity Map
| Tool | Maturity | Works today? | Who’s it for? |
|---|---|---|---|
| Enrichment Agent (OKF) | 🟡 Functional PoC | Yes, with setup | Data engineers exploring |
| Visualizer (viz.html) | 🟢 Ready | Yes | Anyone with a bundle |
| kcmd (Metadata as Code) | 🟡 Early product | Yes, with GCP | Teams using Knowledge Catalog |
| Knowledge Catalog (GCP) | 🟢 GA | Yes | Enterprise |
| Obsidian | 🟢 Native | Yes | Anyone |
| GitHub Actions | 🟡 DIY | Scriptable | DevOps/SRE |
| MCP Server (kcmd) | 🟢 Functional | Yes | Agent builders |
| Coding Agent Skills | 🟡 Opportunity | Partially | Up for grabs |
The Two Layers
The OKF ecosystem splits cleanly into two:
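Portable layer (pure OKF): Format spec + enrichment agent + visualizer. The format and the visualizer work standalone with no GCP required; the enrichment agent, as shipped, still needs BigQuery and Gemini (or Vertex AI). This is where community opportunities live.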
Enterprise layer (Knowledge Catalog): kcmd + catalog enrichment + GCP product. Works in production but demands Google Cloud infrastructure.
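If you’re building for the portable layer, you can start today with zero cloud dependencies by writing bundles by hand (or with a coding agent) and browsing them in Obsidian or the visualizer. If you need the enterprise layer, budget for GCP setup and expect a steeper ramp.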