OKF Ecosystem Tools

An honest inventory of what exists today (June 2026) around the Open Knowledge Format (OKF). For each tool: what it does, how mature it actually is, and whether you should bother.

1. Reference Enrichment Agent (BigQuery → OKF Bundles)

What it is: An agent that pulls metadata from a pluggable source (currently only BigQuery) and emits a complete OKF bundle: a directory of Markdown files with YAML frontmatter, ready for humans, LLMs, and catalog tools.

How it works:

BQ pass: Generates one OKF doc per concept using BigQuery metadata alone (schemas, descriptions, tables).
Web pass: The LLM (Gemini via ADK) acts as a crawler — receives seed URLs, decides which links are authoritative documentation, then enriches existing docs or creates new reference docs.
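
To make the output of the BQ pass concrete, here is a minimal sketch of writing one concept document as Markdown with YAML frontmatter. The frontmatter keys (`id`, `title`, `type`, `tags`) and the `render_concept` helper are assumptions for illustration, picked to match the fields the visualizer searches and filters on; the authoritative schema lives in SPEC.md, and the real agent may structure its output differently.

```python
def render_concept(concept_id, title, concept_type, tags, body):
    """Render one Markdown document with YAML frontmatter.

    Field names are illustrative assumptions, not the OKF spec.
    Values are assumed to be plain strings without YAML special characters.
    """
    tag_list = ", ".join(tags)
    frontmatter = [
        "---",
        f"id: {concept_id}",
        f"title: {title}",
        f"type: {concept_type}",
        f"tags: [{tag_list}]",
        "---",
    ]
    return "\n".join(frontmatter) + "\n\n" + body.strip() + "\n"


# Hypothetical concept, for illustration only.
doc = render_concept(
    concept_id="events",
    title="Events table",
    concept_type="table",
    tags=["ga4", "ecommerce"],
    body="Placeholder description of the table.",
)
print(doc)
```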

Stack: Python 3.13, Google Agent Development Kit (ADK), Gemini as the model backend.

Running it:

# Install
python3.13 -m venv .venv
.venv/bin/pip install -e ".[dev]"  # quoted so zsh does not treat [dev] as a glob

# Credentials
gcloud auth application-default login
export GEMINI_API_KEY="<your-key>"  # or configure Vertex AI instead

# Run (minimal)
.venv/bin/python -m enrichment_agent enrich \
    --source bq \
    --dataset <project>.<dataset> \
    --web-seed-file seeds.txt \
    --out ./bundles/<name>

# BigQuery only (no web crawl)
.venv/bin/python -m enrichment_agent enrich \
    --source bq \
    --dataset bigquery-public-data.ga4_obfuscated_sample_ecommerce \
    --no-web \
    --out ./bundles/ga4
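
If you drive the agent from a script rather than a shell, a small wrapper like the following could assemble the same command line. This is a sketch, not part of the repo: `build_enrich_argv` is a hypothetical helper, it uses only the flags shown above, and it assumes that "no seed file" should map to `--no-web`. It also uses the current interpreter instead of `.venv/bin/python`, so run it from inside the virtualenv.

```python
import shlex
import sys


def build_enrich_argv(dataset, out_dir, seed_file=None):
    """Build the argv list for the `enrich` subcommand shown above.

    With no seed file, the web pass is disabled via --no-web.
    """
    argv = [
        sys.executable, "-m", "enrichment_agent", "enrich",
        "--source", "bq",
        "--dataset", dataset,
    ]
    if seed_file is None:
        argv.append("--no-web")
    else:
        argv += ["--web-seed-file", seed_file]
    argv += ["--out", out_dir]
    return argv


argv = build_enrich_argv(
    "bigquery-public-data.ga4_obfuscated_sample_ecommerce", "./bundles/ga4"
)
print(shlex.join(argv))
# To actually run it: subprocess.run(argv, check=True)
```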

Included sample bundles:

bundles/ga4/ — GA4 e-commerce
bundles/stackoverflow/ — Stack Overflow public dataset
bundles/crypto_bitcoin/ — Bitcoin blocks/transactions

Link: github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/okf

Limitations:

BigQuery is the only implemented source (the Source interface exists but nothing else plugs in)
Requires Gemini API key or Vertex AI configured
Web pass can burn through tokens fast if you feed it too many seeds
No incremental updates — runs from scratch every time

🟡 Maturity: Functional proof of concept. The bundles it produces are legit and useful. But the agent itself is a demo of what’s possible, not a product. It genuinely works for BigQuery public datasets. For production use, you’ll want to customize prompts and seeds — and probably add caching.
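
One way to approach the caching mentioned above is to fingerprint each table's metadata and regenerate only what changed. The sketch below is an assumption about how you might bolt that on yourself; the agent has no such feature, and `schema_fingerprint`, `tables_to_regenerate`, and the shape of the metadata dicts are all hypothetical.

```python
import hashlib
import json


def schema_fingerprint(table_metadata):
    """Stable hash of a table's metadata dict (key order does not matter)."""
    canonical = json.dumps(table_metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def tables_to_regenerate(current, cache):
    """Return names of tables whose metadata changed since the cached run.

    `current` maps table name -> metadata dict; `cache` maps table name ->
    fingerprint stored by a previous run.
    """
    return sorted(
        name for name, meta in current.items()
        if cache.get(name) != schema_fingerprint(meta)
    )


# Hypothetical metadata, for illustration only.
current = {
    "events": {"columns": ["event_name", "event_date"]},
    "users": {"columns": ["user_id"]},
}
cache = {"events": schema_fingerprint(current["events"])}  # "users" is new
print(tables_to_regenerate(current, cache))
```

Persisting `cache` as a JSON file next to the bundle would be enough to skip unchanged tables between runs.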

2. Static HTML Visualizer (viz.html)

What it is: A visualize subcommand that takes any OKF bundle and spits out a self-contained HTML file — interactive concept graph, detail panel, search, type filters, backlinks. No backend, no installation on the viewer side.

What you get:

Force-directed graph (Cytoscape.js) with nodes colored by type
Side panel with rendered frontmatter + markdown body
Navigable internal links within the viewer
“Cited by” section (computed backlinks)
Search by title, ID, tags
Alternative layouts (cose, concentric, breadthfirst, circle, grid)
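
The "Cited by" section boils down to inverting the link graph. The viewer does this in the browser; the Python sketch below only illustrates the idea and is not the visualizer's actual code. It assumes internal links are ordinary Markdown links with relative paths ending in `.md`, and `compute_backlinks` is a hypothetical name.

```python
import posixpath
import re
from collections import defaultdict

# Captures the path of a Markdown link to a .md file, ignoring any #fragment.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s#]+\.md)(?:#[^)]*)?\)")


def compute_backlinks(docs):
    """Map each document path to the sorted list of documents linking to it.

    `docs` maps a bundle-relative POSIX path to that file's Markdown text.
    """
    cited_by = defaultdict(set)
    for source, text in docs.items():
        base = posixpath.dirname(source)
        for target in LINK_RE.findall(text):
            if "://" in target:
                continue  # external URL, not an internal link
            resolved = posixpath.normpath(posixpath.join(base, target))
            if resolved in docs and resolved != source:
                cited_by[resolved].add(source)
    return {path: sorted(sources) for path, sources in cited_by.items()}


# Hypothetical three-file bundle.
docs = {
    "index.md": "See [events](tables/events.md) and [users](tables/users.md).",
    "tables/events.md": "Joined with [users](users.md#keys).",
    "tables/users.md": "No links here.",
}
backlinks = compute_backlinks(docs)
print(backlinks)
```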

Generating it:

.venv/bin/python -m enrichment_agent visualize --bundle ./bundles/ga4
# Produces bundles/ga4/viz.html

# Customize
.venv/bin/python -m enrichment_agent visualize \
    --bundle ./bundles/crypto_bitcoin \
    --out /tmp/btc.html \
    --name "Bitcoin OKF"

Using it: Open viz.html in any modern browser. You can host it on a static file server, email it to someone, or commit it to the repo. The only runtime requirement is network access to the CDN that serves Cytoscape.js and marked.js.

Link: Same repo — okf/README.md#visualize

Limitations:

Large bundles produce heavy HTML files (everything is inlined as JSON)
The viewer is a minimal SPA — no pagination, no lazy loading
Depends on CDN for Cytoscape.js and marked.js (not truly offline without tweaks)

🟢 Maturity: This one actually works. It’s simple, does what it promises, and the viz.html files committed to the repo are great for demos. For bundles with 10–50 concepts, it’s perfect. At 500+, you’ll probably hit performance walls.

3. kcmd CLI + MCP Server (Metadata as Code)

What it is: A bidirectional sync tool between local metadata (YAML/markdown on your filesystem) and Google Cloud Knowledge Catalog (formerly Dataplex). Think “git for metadata” — you edit locally and push/pull to the cloud catalog.

Format: YAML for entries, sidecar .md files for rich content (overviews, descriptions). Hierarchical layout mirroring the resource structure.

Distribution: TypeScript library (npm install kcmd), standalone CLI (kcmd), and an MCP server.

CLI usage:

# Initialize a snapshot from a BigQuery dataset
kcmd init --bigquery-dataset <projectId>.<datasetId>

# Pull metadata from catalog
kcmd pull

# Check local changes
kcmd status

# Push changes to catalog (with dry-run)
kcmd push --dry-run
kcmd push
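
If you script the push, it is worth keeping the dry run as a gate. The following sketch wraps the two `kcmd push` invocations shown above; `push_with_dry_run` is a hypothetical helper, and the runner is injectable so the flow can be exercised without kcmd installed.

```python
import subprocess


def push_with_dry_run(confirm, run=subprocess.run):
    """Run `kcmd push --dry-run`, then `kcmd push` only if `confirm()` is true.

    Returns True if the real push was executed.
    """
    run(["kcmd", "push", "--dry-run"], check=True)
    if not confirm():
        return False
    run(["kcmd", "push"], check=True)
    return True


# Demonstration with a fake runner that records calls instead of executing.
calls = []


def fake_run(argv, check):
    calls.append(argv)


pushed = push_with_dry_run(confirm=lambda: False, run=fake_run)
print(pushed, calls)
```

In real use, drop the `run` argument and pass a `confirm` callable that prompts the operator after they have read the dry-run output.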

MCP Server config:

{
  "mcpServers": {
    "kc-mac": {
      "command": "kcmd",
      "args": ["mcp", "--path", "/path/to/root"]
    }
  }
}

Available MCP tools: pull, push, list-entries, lookup-entry, modify-entry.
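
Most MCP clients keep all servers in one JSON file, so in practice you merge the entry above into an existing config rather than replace the file. A minimal sketch of that merge follows; `add_kcmd_server` is a hypothetical helper, and where the config file lives depends on the client.

```python
import json


def add_kcmd_server(config, root_path, name="kc-mac"):
    """Return a copy of an MCP client config with the kcmd server entry added.

    Existing servers are preserved; an entry with the same name is replaced.
    """
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"command": "kcmd", "args": ["mcp", "--path", root_path]}
    return {**config, "mcpServers": servers}


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
merged = add_kcmd_server(existing, "/path/to/root")
print(json.dumps(merged, indent=2))
```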

Where you can plug it in:

Gemini CLI / Google AI Studio
Claude Desktop (via MCP config)
Cursor / VS Code (any editor with MCP support)
Custom agents (LangChain, ADK, etc.)

Link: github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/toolbox/mdcode

Limitations:

Requires a GCP project with Knowledge Catalog enabled
Auth via gcloud only (no direct service account support)
The YAML/sidecar format differs from pure OKF (it’s oriented toward the Dataplex catalog)
Documentation is still sparse

🟡 Maturity: Early product, but well-structured. The CLI works, the MCP server is real, and the push/pull workflow makes intuitive sense. The fact that it ships as library + CLI + MCP shows intent to serve varied environments. Still no versioned npm releases though — so pin your expectations accordingly.

4. Google Cloud Knowledge Catalog (The Backend)

What it is: The GCP product (formerly Dataplex) that acts as an AI-powered metadata catalog. It’s the “official backend” that the tools above sync with.

Relevant features for OKF:

Automatic harvesting from BigQuery, AlloyDB, Spanner, Cloud SQL, Firestore, Looker
Third-party integrations: Ab Initio, Anomalo, Atlan, Collibra, Datahub
Native Gemini enrichment — generates descriptions, glossaries, maps entities
Sub-second semantic search for agents
Context APIs + MCP tools for agents to discover assets
Data products — asset packaging with SLAs and governance

Pricing (summary):

Free tier: 100 DCU-hours/month + 1 MiB storage + 1M API calls/month
Standard: $0.06/DCU-hour
Premium (lineage, quality, profiling): $0.089/DCU-hour
Storage: $2/GiB/month (above 1 MiB)
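
For a rough feel of how these rates combine, here is a back-of-the-envelope estimator. It uses only the numbers in the summary above and makes two loud assumptions: the free tier is simply subtracted before billing (for either tier), and API calls beyond the free 1M are not modelled. Check the official pricing page before budgeting on it.

```python
MIB_PER_GIB = 1024

FREE_DCU_HOURS = 100                 # free tier, per month
FREE_STORAGE_GIB = 1 / MIB_PER_GIB   # 1 MiB expressed in GiB
RATE_PER_DCU_HOUR = {"standard": 0.06, "premium": 0.089}
STORAGE_RATE_PER_GIB = 2.00          # USD per GiB per month


def estimate_monthly_cost(dcu_hours, storage_gib, tier="standard"):
    """Rough monthly cost in USD from the summary rates above.

    Assumes the free tier is subtracted before billing; API-call charges
    are not modelled.
    """
    billable_dcu = max(0.0, dcu_hours - FREE_DCU_HOURS)
    billable_storage = max(0.0, storage_gib - FREE_STORAGE_GIB)
    return (billable_dcu * RATE_PER_DCU_HOUR[tier]
            + billable_storage * STORAGE_RATE_PER_GIB)


print(f"{estimate_monthly_cost(500, 10):.2f}")
```

For 500 DCU-hours and 10 GiB on Standard, that is 400 × $0.06 = $24.00 of compute plus roughly $20.00 of storage, so about $44 a month under these assumptions.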

Link: cloud.google.com/products/knowledge-catalog

Limitations:

GCP vendor lock-in (which is why kcmd exists: it acts as the portability bridge)
Pricing can scale fast with heavy DCU-hour usage
The “native format” is NOT OKF — OKF is the portable interop layer

🟢 Maturity: GA Google Cloud product. It’s real, runs in production, has SLA, has enterprise support. The Knowledge Catalog itself is mature — what’s new is the open-source tooling around it.

5. Possible Integrations

5.1 Obsidian

Status: No official plugin. But OKF was deliberately designed to work with Obsidian out of the box.

Why it just works:

OKF bundles are directories of .md files with YAML frontmatter — exactly what Obsidian expects
Internal links work as relative paths
Frontmatter tags show up natively
The index.md works as an index note
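
Before opening a bundle as a vault, a quick check of these properties can save some confusion. The sketch below tests only two things taken from the list above: that `index.md` exists and that every `.md` file opens with a frontmatter block. `vault_readiness` is a hypothetical helper, and whether `index.md` itself must carry frontmatter is an assumption.

```python
from pathlib import Path


def has_frontmatter(text):
    """True if the text opens with a `---` line and has a closing `---` line."""
    lines = text.splitlines()
    return (
        len(lines) >= 2
        and lines[0].strip() == "---"
        and any(line.strip() == "---" for line in lines[1:])
    )


def vault_readiness(bundle_dir):
    """Report whether index.md exists and which .md files lack frontmatter."""
    root = Path(bundle_dir)
    missing = [
        path.relative_to(root).as_posix()
        for path in sorted(root.rglob("*.md"))
        if not has_frontmatter(path.read_text(encoding="utf-8"))
    ]
    return {
        "has_index": (root / "index.md").is_file(),
        "missing_frontmatter": missing,
    }
```

Calling `vault_readiness("./bundles/ga4")` returns a small dict you can print or assert on.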

How to use today:

Generate a bundle with the enrichment agent
Open the bundle directory as a vault in Obsidian
Navigate, edit, use the native graph view

What a proper plugin would add:

Inline validation (OKF conformance linting)
Templates for new concepts
Sync with Knowledge Catalog via kcmd

🟢 Natural compatibility. No plugin needed — works by design. A plugin would be nice for validation, but it’s not blocking anything.

5.2 GitHub Actions

Status: No official Action published. But every command is scriptable.

Possible workflows:

# .github/workflows/okf-validate.yml
name: Validate OKF Bundle
on: [push, pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.13'
      - run: pip install okf-validator  # hypothetical package; no validator is published yet
      - run: okf validate ./bundles/    # hypothetical command

# .github/workflows/okf-enrich.yml (advanced)
name: Enrich on Schedule
on:
  schedule:
    - cron: '0 6 * * 1'  # Every Monday
jobs:
  enrich:
    runs-on: ubuntu-latest
    steps:
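      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.13'
      # BigQuery credentials must be configured before the enrich step
      # (for example with google-github-actions/auth); omitted here.
      - run: pip install -e "./okf[dev]"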
      - run: |
          python -m enrichment_agent enrich \
            --source bq --dataset ${{ secrets.BQ_DATASET }} \
            --no-web --out ./bundles/weekly
      - uses: peter-evans/create-pull-request@v6
        with:
          title: "chore: weekly OKF enrichment"
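
Until a real validator is published, a stand-in check can fill the validate step. The sketch below only catches internal links that point at missing files, which is far short of OKF conformance; `find_broken_links` and `main` are hypothetical names, and the link convention (relative Markdown links ending in `.md`) is an assumption.

```python
import re
from pathlib import Path

# Captures the path of a Markdown link to a .md file, ignoring any #fragment.
LINK_RE = re.compile(r"\]\(([^)\s#]+\.md)(?:#[^)]*)?\)")


def find_broken_links(bundle_dir):
    """Return (source file, link target) pairs whose target file is missing."""
    root = Path(bundle_dir)
    broken = []
    for md_file in sorted(root.rglob("*.md")):
        for target in LINK_RE.findall(md_file.read_text(encoding="utf-8")):
            if "://" in target:
                continue  # external URL, not checked here
            if not (md_file.parent / target).is_file():
                broken.append((md_file.relative_to(root).as_posix(), target))
    return broken


def main(argv):
    """Check each bundle directory; return 0 if all pass, 1 otherwise."""
    exit_code = 0
    for bundle_dir in argv:
        for source, target in find_broken_links(bundle_dir):
            print(f"{bundle_dir}: {source} links to missing file {target}")
            exit_code = 1
    return exit_code
```

Saved as a script that ends with `sys.exit(main(sys.argv[1:]))`, this could replace the `okf validate` step and fail the job on the first bundle with a dangling link.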

🟡 High potential, zero official implementation. The “enrich → commit → PR” workflow is natural for OKF. Someone should publish a reusable Action — it’s low-hanging fruit.

5.3 Coding Agents (Claude, Codex, Cursor, Gemini)

Status: No official skill published. This is the most obvious gap.

What exists today:

The OKF README works as documentation for an agent to understand the format
The SPEC.md is readable enough for an LLM to generate conformant bundles
The toolbox/enrichment uses tools/skills/ as an agent skill directory

What’s missing:

A standalone .md skill that teaches any agent to produce OKF
Generation-time validation (the agent checks conformance before saving)
Reusable templates for common scenarios (SaaS metrics, analytics, APIs)

Skill pattern already used by the toolbox:

---
name: fileset-source
description: >
  Use the fileset source to find relevant markdown documents...
---

[tool usage instructions]
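
Packaging OKF as a skill would mostly mean producing a file in this shape. As a sketch of how small that is, the helper below renders the same frontmatter pattern; `render_skill` and the `okf-author` skill name are hypothetical, and the body text is a placeholder rather than real instructions.

```python
import textwrap


def render_skill(name, description, instructions):
    """Render a skill file: YAML frontmatter (name, description) plus body.

    The description is written as a folded scalar (`>`), wrapped and
    indented by two spaces, matching the pattern shown above.
    """
    wrapped = textwrap.fill(" ".join(description.split()), width=70)
    folded = textwrap.indent(wrapped, "  ")
    return (
        "---\n"
        f"name: {name}\n"
        "description: >\n"
        f"{folded}\n"
        "---\n\n"
        f"{instructions.strip()}\n"
    )


skill = render_skill(
    name="okf-author",  # hypothetical skill name
    description="Use this skill to write OKF bundles that follow SPEC.md.",
    instructions="[instructions for producing and checking OKF documents]",
)
print(skill)
```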

🟡 Clear opportunity. The OKF format was made to be agent-friendly, but nobody has packaged it as a distributable skill yet.

Maturity Map

Tool	Maturity	Works today?	Who’s it for?
Enrichment Agent (OKF)	🟡 Functional PoC	Yes, with setup	Data engineers exploring
Visualizer (viz.html)	🟢 Ready	Yes	Anyone with a bundle
kcmd (Metadata as Code)	🟡 Early product	Yes, with GCP	Teams using Knowledge Catalog
Knowledge Catalog (GCP)	🟢 GA	Yes	Enterprise
Obsidian	🟢 Native	Yes	Anyone
GitHub Actions	🟡 DIY	Scriptable	DevOps/SRE
MCP Server (kcmd)	🟢 Functional	Yes	Agent builders
Coding Agent Skills	🟡 Opportunity	Partially	Up for grabs

The Two Layers

The OKF ecosystem splits cleanly into two:

Portable layer (pure OKF): Format spec + enrichment agent + visualizer. The format and the visualizer work standalone with no GCP required; the enrichment agent still needs BigQuery as its only source and Gemini as its model. This is where community opportunities live.
Enterprise layer (Knowledge Catalog): kcmd + catalog enrichment + GCP product. Works in production but demands Google Cloud infrastructure.

If you’re building for the portable layer, you can start today with zero cloud dependencies by working from the spec, the committed sample bundles, and the visualizer. If you need the enterprise layer, budget for GCP setup and expect a steeper ramp.