
mcp-bookmarks: Turning a Reading Habit into a Knowledge Engine

I read a lot.

Books and excerpts from technology books, but mostly articles: blog posts, engineering deep dives, architecture postmortems, language design essays, conference talks, research papers someone summarized on their Substack. On a good week I get through a hundred or so. I recently discovered a tab on Chrome Mobile with 2,673 bookmarks. The stream of information I’m interested in moves faster than I can keep up with. I subscribe to the O’Reilly Learning Platform, Medium, Feedly, and more. I don’t want to miss a thing.

The problem isn’t finding things to read. The problem is that I read something genuinely useful — a breakdown of how a distributed system handles clock skew, a pattern for structuring DynamoDB tables, a hard-won lesson about LLM prompt engineering, a useful snippet — and six weeks later I can’t retrieve it. I remember that I read it. I remember it was good. I cannot remember where it was or what it said.

I tried Pocket. I tried Notion. I tried a Raindrop.io subscription I never fully used. The pattern was always the same: save the link, never come back to it, feel vaguely ashamed that my “knowledge base” was a graveyard of unread links.

What I actually wanted wasn’t a bookmark manager. I wanted something that would do the reading for me — extract what mattered, tag it correctly, write a summary worth returning to — so that saving a link created a real artifact of knowledge I could find later. I wanted zero friction at the point of saving and maximum value at the point of retrieval.

So I built Blogmarks.


Version 1: The Lazy Architecture

The first version was an exercise in shipping fast by cutting every corner I could justify.

I needed a way to save links. I needed a way to browse them later. I didn’t want to build an admin interface. The laziest viable path was Sanity.io — a headless CMS with a hosted, configurable admin UI — feeding a Next.js static site that rebuilt itself whenever content changed. I wasn’t going to use Sanity for what it was designed for, but it had a clean API and I didn’t have to write a single admin screen.

The static Next.js build worked fine. I saved links through Sanity’s editor, the site rebuilt, and the links appeared. It was slow. The rebuild pipeline took a minute or two. There was no AI, no tagging intelligence, no content extraction. It was a list of links with titles and whatever snippet of text I’d typed into Sanity manually.

It was better than nothing, barely.

The fundamental mistake wasn’t Sanity — it was treating the saved bookmark as the end of the pipeline. The link went in, stopped moving, and waited for me to do something with it. I never did.


Version 2: The Pipeline Insight

The rebuild started with a different premise: saving a bookmark is the start of a pipeline, not the end.

When a link comes in, the right behavior is:

  1. Store the URL immediately so it’s not lost
  2. Fetch the full article content in the background
  3. Analyze the content with an AI model: summarize it, extract meaningful tags, write a description worth re-reading
  4. Store the enriched result back, now searchable and categorized

The user experience collapses to one gesture: share the link. Everything else is automatic.

I rebuilt on AWS. The PWA frontend registers as a Share Target — on Android, “Share” from any browser offers Blogmarks as a target, same as sharing to WhatsApp or Twitter. Tapping it fires the URL at a Lambda, which writes the raw link to DynamoDB and returns immediately. A DynamoDB Stream triggers a second Lambda — the AI processor — which runs the enrichment pipeline in the background.
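Stripped to its essentials, the ingest Lambda is just "write and return". Here is a sketch with the storage call injected so it runs anywhere; the field names and the pending status flag are illustrative, not Blogmarks' actual schema:

```python
import json
import time
import uuid


def make_save_handler(put_item):
    """Build the ingest handler with storage injected: in production,
    put_item would be a DynamoDB table's put_item; in tests, any callable.
    Field names here are illustrative, not the project's actual schema."""

    def handler(event, context=None):
        body = json.loads(event.get("body", "{}"))
        item = {
            "id": str(uuid.uuid4()),
            "url": body["url"],
            "status": "pending",  # the feed shows a pending indicator until enriched
            "createdAt": int(time.time()),
        }
        put_item(item)
        # Return immediately: enrichment runs later, off the DynamoDB Stream.
        return {"statusCode": 202, "body": json.dumps({"id": item["id"]})}

    return handler
```

The key property is that nothing slow happens on the request path; the handler's only job is durability.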

Sanity was gone. The admin UI was gone. The static build was gone. The app became a real-time PWA reading from DynamoDB directly.


The AI Processor: A Four-Agent CrewAI Pipeline

The enrichment Lambda uses CrewAI to decompose the problem into specialized agents. Each agent is cheap to run and focused on a single responsibility:

URL → Scraper (Haiku) → Analyst (Sonnet) → Tagger (Haiku) → Publisher (Haiku)

Scraper fetches the full page content. It tries Bright Data’s Web Unlocker API first (for paywalled or JS-heavy pages) and falls back to a direct fetch. It outputs the raw article text, word count, title, description, and OG metadata as JSON.
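The scraper's fallback logic can be sketched as a small wrapper with both fetchers injected (Bright Data's actual API call is not shown; both fetchers here are placeholders):

```python
def fetch_content(url: str, unlocker_fetch, direct_fetch) -> str:
    """Try the unlocker first (paywalled or JS-heavy pages),
    fall back to a plain fetch if it fails or returns nothing."""
    try:
        text = unlocker_fetch(url)
        if text:
            return text
    except Exception:
        pass  # unlocker unavailable or errored; try the cheap path
    return direct_fetch(url)
```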

Analyst reads the scraper output and produces three things: a definitive title, a two-to-three sentence summary worth re-reading months later, and the best available image URL. No tools — pure reasoning from the extracted text.

Tagger is the most interesting agent. Before assigning tags, it reads the full tag taxonomy through a GetTagsTool call. The taxonomy isn’t a flat list of strings — each tag has a description field that defines its scope and distinguishes it from semantically adjacent tags. The tagger reads those descriptions before deciding. If no existing tag fits, it calls CreateTagTool to extend the taxonomy. The result is consistent tagging: the same concept gets the same tag across hundreds of saves rather than accumulating near-duplicate variations.

Publisher writes the enriched item back to DynamoDB. The loop closes.

The pipeline runs asynchronously from the user’s perspective. You share a link, see it appear in your feed within seconds (with a pending indicator), and a minute or two later it’s enriched with a real summary and proper tags.


The Tag Taxonomy Problem

Consistent tagging is a harder problem than it looks.

If you ask an LLM to tag a bookmark about React Server Components, it might use react, server-components, nextjs, frontend, ssr, rsc, or some combination. If you ask it again next week about a different RSC article, you might get a different set. After a hundred saves, your tag cloud is meaningless.
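Part of that drift is pure surface variation, which a deterministic normalizer can remove before any taxonomy lookup. A sketch (my own illustration, not part of the project); note that it cannot unify true synonyms like rsc and server-components, which is the job of the taxonomy descriptions:

```python
import re


def normalize_slug(raw: str) -> str:
    """Canonicalize a tag candidate so that 'React Server Components',
    'react_server_components', and 'React-Server-Components' all
    resolve to the same slug before the taxonomy is consulted."""
    slug = raw.strip().lower()
    slug = re.sub(r"[\s_]+", "-", slug)     # spaces and underscores to hyphens
    slug = re.sub(r"[^a-z0-9-]", "", slug)  # drop punctuation
    return re.sub(r"-{2,}", "-", slug).strip("-")
```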

The solution is giving the LLM institutional memory about its own prior decisions. The bookmarks://taxonomy resource returns every tag with its full metadata:

[
  {
    "slug": "server-components",
    "name": "Server Components",
    "description": "React Server Components and related patterns for server-side rendering with React 18+. Covers RSC, streaming SSR, Suspense boundaries, and Next.js App Router.",
    "usage_count": 14
  }
]

Before making any tagging decision, the agent reads this. The description tells it when to use the tag and — implicitly — when not to. An article about general SSR concepts without React specifics wouldn’t match server-components. For an article that genuinely extends the taxonomy, the agent creates a new tag with a description that delimits its future scope.

Over time, the taxonomy becomes a curated ontology of your reading interests. High usage_count tags reveal what topics you actually follow. The tag descriptions capture distinctions you care about, built incrementally through AI curation rather than manual effort.


Introducing mcp-bookmarks

Blogmarks as described above is a personal PWA. It runs on AWS, requires Cognito authentication, and is built for a browser.

But I wanted to interact with my knowledge base differently. I wanted to save a link while in a Claude conversation without leaving the chat. I wanted to ask “what have I read about distributed tracing?” and get an answer from my actual saved content. I wanted the bookmark manager to be a first-class participant in my AI workflow, not a separate tab I’d have to switch to.

Model Context Protocol (MCP) was the answer.

MCP is Anthropic’s open standard for connecting LLMs to external tools and data sources. A conforming MCP server exposes tools (actions), resources (readable data), and prompts (reusable templates) that any compatible client — Claude Desktop, Claude Code, any IDE with MCP support — can use. The model invokes tools the same way it would a function call, but the implementation lives in your server.

mcp-bookmarks is an MCP server for Blogmarks. It exposes the bookmark knowledge base as a set of tools and resources that Claude can read and write directly, without the user leaving their chat context.


The MCP Server Design

The server is built with FastMCP and exposes 14 tools:

  • save_bookmark: save a URL (triggers the full pipeline)
  • extract_content: fetch and parse article text from a URL
  • search_bookmarks: full-text search across titles, summaries, and content
  • read_bookmark: read a single bookmark with full content
  • get_tags: list all tags in the taxonomy
  • create_tag: add a new tag to the taxonomy
  • tag_bookmark: associate tags with a bookmark
  • set_summary: write or update a bookmark’s summary
  • set_bookmark_body: store the full extracted article text
  • delete_bookmark: remove a bookmark
  • update_tag: edit a tag’s name or description
  • delete_tag: remove a tag from the taxonomy
  • merge_tags: combine two tags into one, reassigning all associations
  • get_stats: count of bookmarks, tags, and usage events

Two resources make the knowledge base readable without tool calls:

  • bookmarks://taxonomy — the full tag taxonomy with descriptions and usage counts (what the tagger agent reads before making decisions)
  • bookmarks://recent/{n} — the last N saved bookmarks with their summaries and tags
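Served over in-memory data, the two resources might look like the following sketch. The FastMCP registration is shown only in a comment, and the field names are illustrative rather than the project's actual models:

```python
import json

# In the real server these would be registered with FastMCP, e.g.
#   @mcp.resource("bookmarks://recent/{n}")
# Here they are plain functions over in-memory data.


def taxonomy_resource(tags: list[dict]) -> str:
    """bookmarks://taxonomy: every tag with description and usage count."""
    keys = ("slug", "name", "description", "usage_count")
    return json.dumps([{k: t[k] for k in keys} for t in tags], indent=2)


def recent_resource(bookmarks: list[dict], n: int) -> str:
    """bookmarks://recent/{n}: the last n saves, newest first."""
    newest = sorted(bookmarks, key=lambda b: b["createdAt"], reverse=True)[:n]
    return json.dumps(
        [{"url": b["url"], "summary": b["summary"], "tags": b["tags"]} for b in newest],
        indent=2,
    )
```

Returning JSON strings keeps the resources trivially readable by any MCP client without a tool call.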

The workflow inside Claude looks like this:

User: "Save this article and find related things I've read"

Claude:
1. Calls save_bookmark(url) → stores article, triggers extraction
2. Calls extract_content(url) → gets article text
3. Calls search_bookmarks("distributed systems consensus") → finds related saves
4. Reads bookmarks://taxonomy → surfaces relevant topic tags
5. Synthesizes a response with the new save + related prior reading

One gesture, inside the conversation, no tab switching.


Two Backends, One Interface

The server supports two storage backends behind the same tool interface:

SQLite is the default. The database lives at ~/.mcp-bookmarks/bookmarks.db and requires zero infrastructure. Clone the repo, set an API key, run uv run mcp-bookmarks, and you have a fully functional knowledge base backed by a local file. The schema auto-migrates on startup.

DynamoDB mode activates by setting DYNAMODB_MODE=true. In this mode, the server reads and writes the same tables as the Blogmarks PWA — blogmarks-links and blogmarks-tags. Items saved through mcp-bookmarks with userId=mcp-agent trigger the same DynamoDB Stream, which fires the same CrewAI enrichment Lambda. The AI enrichment happens automatically, serverlessly, even when saving through the MCP interface.

This dual-backend design separates the MCP contract from the storage concern. The tools work identically regardless of backend. Developers who want a self-hosted, zero-cloud setup use SQLite. Blogmarks subscribers use DynamoDB and get the full serverless pipeline behind their MCP tools.
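The contract behind that separation can be sketched as a Python Protocol plus the SQLite side of it (in-memory here for brevity; method names and schema are my own illustration, not the project's actual interface):

```python
import sqlite3
import uuid
from typing import Protocol


class BookmarkStore(Protocol):
    """Contract every backend satisfies; the MCP tools call only this."""

    def save(self, org_id: str, url: str) -> str: ...
    def search(self, org_id: str, query: str) -> list[str]: ...


class SqliteStore:
    """Zero-infrastructure default backend. In-memory here; the real
    server keeps its file at ~/.mcp-bookmarks/bookmarks.db."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS links (id TEXT, org_id TEXT, url TEXT)"
        )

    def save(self, org_id: str, url: str) -> str:
        link_id = str(uuid.uuid4())
        self.db.execute("INSERT INTO links VALUES (?, ?, ?)", (link_id, org_id, url))
        return link_id

    def search(self, org_id: str, query: str) -> list[str]:
        # Every query is scoped to the requesting org, never cross-tenant.
        rows = self.db.execute(
            "SELECT url FROM links WHERE org_id = ? AND url LIKE ?",
            (org_id, f"%{query}%"),
        )
        return [r[0] for r in rows]
```

A DynamoDB-backed class would satisfy the same Protocol, which is what lets the tool layer stay backend-agnostic.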


Multi-Tenancy and Usage Metering

What started as a personal tool is becoming a product, which means multi-tenancy.

The server supports multiple isolation models:

# Single static key + org (simple case)
MCP_STATIC_API_KEY=sk-mykey:org-default

# Multiple keys, multiple orgs (multi-tenant)
MCP_API_KEYS=key1:org-alice,key2:org-bob,key3:org-charlie

Each API key maps to an organization ID. Every database query scopes to the requesting org’s data. One server instance handles multiple tenants without data leakage between them.
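Resolving a request's org from those variables is a small, testable step. A sketch, assuming only the two variable formats shown above:

```python
def parse_api_keys(env: dict) -> dict[str, str]:
    """Build the api_key -> org_id map from either env style
    (MCP_STATIC_API_KEY or MCP_API_KEYS, as in the examples above)."""
    mapping: dict[str, str] = {}
    static = env.get("MCP_STATIC_API_KEY")
    if static:
        key, _, org = static.partition(":")
        mapping[key] = org or "org-default"
    multi = env.get("MCP_API_KEYS", "")
    for pair in filter(None, (p.strip() for p in multi.split(","))):
        key, _, org = pair.partition(":")
        mapping[key] = org
    return mapping


def resolve_org(mapping: dict[str, str], api_key: str) -> str:
    """Every database query is scoped to the org this returns;
    an unknown key is rejected outright."""
    if api_key not in mapping:
        raise PermissionError("unknown API key")
    return mapping[api_key]
```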

Usage metering tracks every tool invocation per tenant:

METERED_EVENTS = [
    "mcp_save_bookmark",
    "mcp_extract_content",
    "mcp_create_tag",
    "mcp_search_bookmarks",
    "mcp_set_summary",
    # ...
]

Monthly totals accumulate in a usage_events table. A configurable MCP_MONTHLY_USAGE_LIMIT rejects tool calls that would exceed the quota. Stripe webhooks update subscription state when plans change. The billing infrastructure is wired in — not bolted on after the fact.
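A minimal in-memory version of that quota check might look like this (the real server persists events in usage_events; class and method names here are illustrative):

```python
from collections import defaultdict


class UsageMeter:
    """Per-org, per-month event counting with a hard quota,
    mirroring MCP_MONTHLY_USAGE_LIMIT from the config above."""

    def __init__(self, monthly_limit: int) -> None:
        self.monthly_limit = monthly_limit
        self.counts: dict[tuple[str, str], int] = defaultdict(int)

    def record(self, org_id: str, event: str, month: str) -> None:
        """Count one metered event; reject the call if it would
        push the org past its monthly quota."""
        used = self.counts[(org_id, month)]
        if used >= self.monthly_limit:
            raise RuntimeError(f"{org_id} exceeded {self.monthly_limit} events in {month}")
        self.counts[(org_id, month)] = used + 1
```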


The RAG-as-a-Service Direction

Here’s the part I’m most excited about.

The MCP server gives Claude access to your bookmark knowledge base during a conversation. That’s already valuable. But it requires you to consciously invoke tools — asking Claude to search your bookmarks, explicitly saving a link during a session.

The more interesting capability is passive context injection: automatically inserting relevant fragments from your knowledge base into the model’s context before it answers, without you asking. You ask Claude about database indexing strategies; the system notices you’ve saved fourteen articles on the topic and surfaces the three most semantically relevant summaries as background context before generating its answer. The model’s response is informed by your specific reading history, not just its training data.

This is RAG — Retrieval-Augmented Generation — but powered by knowledge you built over time, not a static document corpus. The bookmark_embeddings table already exists in the schema, storing vector embeddings per bookmark. The semantic search infrastructure is there. The remaining work is the retrieval layer: a plugin interface that lets any MCP-compatible client inject your bookmark context automatically, without manual invocation.

The product angle is this: your accumulated reading becomes a persistent layer of context that improves every AI conversation you have, on any topic you’ve studied. You don’t configure it per-session. You don’t manually retrieve things. You just read, save, and the knowledge accretes. The friction is at zero.

The architecture for this is straightforward:

User question → Semantic search against bookmark_embeddings
                    → retrieve top-K relevant summaries
                    → inject as system context
                    → model answers with your knowledge as background

The embedding model runs on save (or lazily on first search). The retrieval runs on every query. The user does nothing.
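The retrieval step itself reduces to cosine similarity over stored vectors. A dependency-free sketch, assuming embeddings are plain float lists keyed by bookmark id:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: list[float], embeddings: dict[str, list[float]], k: int) -> list[str]:
    """Return the k bookmark ids most similar to the query embedding;
    a stand-in for the search over bookmark_embeddings."""
    ranked = sorted(
        embeddings, key=lambda bid: cosine(query_vec, embeddings[bid]), reverse=True
    )
    return ranked[:k]
```

In production this would run against real embedding-model output rather than toy vectors, but the ranking logic is the same.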


What It Took to Get Here

The path from “lazy Sanity.io static site” to production-grade MCP server ran through several rewrites and one clarifying insight.

The rewrite away from Sanity replaced editorial curation (me, manually writing descriptions) with automated enrichment (AI doing it on save). The automation had to be invisible — if saving a link felt like triggering a workflow, I’d stop doing it. It had to feel like sending a message: fire and forget.

The MCP rewrite replaced a browser-only PWA with a protocol-level interface. Claude can now be the primary client. The bookmark manager stopped being an app and became an API for intelligence.

The product rewrite is current. Self-hosted SQLite for developers who want full local control. DynamoDB + Lambda + serverless pipeline for cloud users. Multi-tenancy for teams. Usage metering for billing. The personal project infrastructure is becoming product infrastructure, one layer at a time.

None of this would have happened if I’d kept patching the Sanity integration. The breakthrough was recognizing that the problem wasn’t “I need a better bookmark manager.” The problem was “I need a system that builds knowledge on my behalf, with the bookmark as the trigger and not the destination.”


Getting Started

Self-hosted (SQLite):

git clone https://github.com/kayaman/mcp-bookmarks
cd mcp-bookmarks
uv sync
uv run mcp-bookmarks

Add to Claude Desktop’s MCP config:

{
  "mcpServers": {
    "bookmarks": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mcp-bookmarks", "mcp-bookmarks"],
      "env": {
        "MCP_STATIC_API_KEY": "your-key:default"
      }
    }
  }
}

DynamoDB mode (requires Blogmarks AWS infrastructure):

DYNAMODB_MODE=true \
DYNAMODB_LINKS_TABLE=blogmarks-links \
DYNAMODB_TAGS_TABLE=blogmarks-tags \
DYNAMODB_ORG_ID=your-org \
uv run mcp-bookmarks

What’s Next

Embedding-based retrieval. The schema has bookmark_embeddings. The next step is populating it on save and exposing a search_bookmarks_semantic tool that returns results by meaning rather than by keyword.

Automatic context injection. The passive RAG layer — querying your knowledge base before answering, without you asking — is the product’s core differentiator. This requires an MCP resource that Claude can read automatically on session start, plus a retrieval prompt template.

Knowledge graph. Tags are a flat taxonomy. The interesting next representation is a graph: articles can reference other articles, topics have prerequisite relationships, reading sequences emerge from content analysis. The data for this exists in the content and links; the extraction is a pipeline problem.

Team knowledge bases. The multi-tenant infrastructure supports org-level isolation today. Team knowledge bases — where a shared reading culture across an engineering team builds collective institutional memory — is a natural extension.


Reading has always been about building a mental model of a domain, incrementally, over time. The frustration was that the mental model lived only in my head, degraded with time, and couldn’t be queried.

The solution wasn’t a smarter bookmark app. It was an AI-powered pipeline that turns the act of saving a link into the act of adding to a structured, searchable, persistent knowledge base — and then an MCP interface that makes that knowledge base a first-class participant in every AI conversation I have.

Zero friction at save time. Full knowledge at retrieval time.

That’s the idea Blogmarks is being built around. We’re just getting started.


Further Reading