Semantic search ETL pipeline for the MCP server registry. Exposed as both FastAPI and MCP servers
lastmile-ai/mcp-registry-search
Semantic search API for MCP servers using the official Model Context Protocol registry.
Use as:
- REST API (search): https://mcp-registry-search.vercel.app/search?q=kubernetes&limit=2
- REST API (list): https://mcp-registry-search.vercel.app/servers?limit=10&offset=0
- MCP Server (SSE): https://mcp-registry-search.vercel.app/api/sse
A cron job reindexes the entire registry every night.
Built with:
- mcp-agent cloud
- Supabase
- Vercel
Query the registry using the /search endpoint:
Endpoint: https://mcp-registry-search.vercel.app/search?q=kubernetes&limit=2
Example:
curl "https://mcp-registry-search.vercel.app/search?q=kubernetes&limit=2"
Example Response:
{
  "results": [
    {
      "id": 259,
      "name": "io.github.vfarcic/dot-ai",
      "description": "AI-powered development platform for Kubernetes deployments and intelligent automation",
      "version": "0.101.0",
      "repository": {
        "url": "https://github.com/vfarcic/dot-ai",
        "source": "github"
      },
      "packages": [
        {
          "version": "0.101.0",
          "transport": {
            "type": "stdio"
          },
          "identifier": "@vfarcic/dot-ai",
          "registryType": "npm"
        }
      ],
      "remotes": [],
      "similarity_score": 0.606411385574579
    },
    {
      "id": 272,
      "name": "io.github.containers/kubernetes-mcp-server",
      "description": "An MCP server that provides [describe what your server does]",
      "version": "1.0.0",
      "repository": {
        "url": "https://github.com/containers/kubernetes-mcp-server",
        "source": "github"
      },
      "packages": [],
      "remotes": [],
      "similarity_score": 0.451448836663574
    }
  ],
  "query": "kubernetes",
  "limit": 2,
  "count": 2
}
Query Parameters:
- q (required): Search query string
- limit (optional): Maximum number of results (default: 10)
- full_text_weight (optional): Weight for full-text search (default: 1.0)
- semantic_weight (optional): Weight for semantic search (default: 1.0)
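The same request can be built from Python. The sketch below only assembles the URL from the documented query parameters and reads the `name` and `similarity_score` fields shown in the example response; the helper names `build_search_url` and `top_matches` are illustrative and not part of this project.

```python
from urllib.parse import urlencode

# Public deployment named in this README.
BASE_URL = "https://mcp-registry-search.vercel.app"


def build_search_url(query, limit=10, full_text_weight=1.0, semantic_weight=1.0,
                     base_url=BASE_URL):
    """Return the /search URL for the given query parameters."""
    if not query:
        raise ValueError("q is required")
    params = {
        "q": query,
        "limit": limit,
        "full_text_weight": full_text_weight,
        "semantic_weight": semantic_weight,
    }
    # urlencode escapes the query string (spaces become '+').
    return f"{base_url}/search?{urlencode(params)}"


def top_matches(payload):
    """Extract (name, similarity_score) pairs from a decoded /search response."""
    return [(r["name"], r["similarity_score"]) for r in payload["results"]]
```

To run a live query, pass the URL to `urllib.request.urlopen`, decode the body with `json.load`, and hand the result to `top_matches`.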
List all servers with pagination using the /servers endpoint:
Endpoint: https://mcp-registry-search.vercel.app/servers?limit=10&offset=0
Example:
curl "https://mcp-registry-search.vercel.app/servers?limit=5"
Query Parameters:
- limit (optional): Maximum number of results (default: 100)
- offset (optional): Number of results to skip (default: 0)
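Walking the whole index means advancing `offset` by `limit` until a page comes back short. This README does not show the shape of the `/servers` response, so the sketch below takes a hypothetical `fetch_page(limit, offset)` callable that returns the list of server records for one page, and leaves the HTTP call and JSON unwrapping to the caller.

```python
def iter_servers(fetch_page, page_size=100):
    """Yield every server by walking limit/offset pages until one comes back short.

    fetch_page is a hypothetical callable: fetch_page(limit=..., offset=...) -> list.
    """
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        yield from page
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            break
        offset += page_size
```

When the total is an exact multiple of `page_size`, the loop issues one extra request that returns an empty page and then stops.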
Connect to the MCP server via SSE for direct integration with MCP clients:
Endpoint: https://mcp-registry-search.vercel.app/api/sse
Available Tools:
- search_mcp_servers(query, limit, full_text_weight, semantic_weight) - Search servers using hybrid search
- list_mcp_servers(limit, offset) - List all servers with pagination
Add to your MCP client config:
{
  "mcpServers": {
    "registry-search": {
      "url": "https://mcp-registry-search.vercel.app/api/sse",
      "transport": {
        "type": "sse"
      }
    }
  }
}
- 🔍 Hybrid search combining lexical (PostgreSQL full-text) and semantic (pgvector) search
- 🚀 Fast vector similarity using OpenAI embeddings + Supabase pgvector
- 📊 Ranked results using weighted scoring
- 🔄 Automatic ETL pipeline to fetch and index MCP servers
- 🌐 FastAPI REST API for web access
- 🔌 FastMCP server for MCP client integration
- ☁️ Deployable to Vercel (FastAPI) and any MCP-compatible host (FastMCP)
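This README does not say how `full_text_weight` and `semantic_weight` are combined into a single ranking. One common approach for hybrid search is weighted Reciprocal Rank Fusion (RRF), sketched below in plain Python. Treat it as an illustration of the idea only: whether `schema.sql` uses RRF, and the smoothing constant `k=50`, are assumptions.

```python
def rrf_scores(full_text_ranked, semantic_ranked,
               full_text_weight=1.0, semantic_weight=1.0, k=50):
    """Combine two ranked ID lists with weighted Reciprocal Rank Fusion.

    Each list holds item IDs ordered best-first. An item's score is the sum,
    over the lists it appears in, of weight / (k + rank), with rank starting at 1.
    """
    scores = {}
    for weight, ranked in ((full_text_weight, full_text_ranked),
                           (semantic_weight, semantic_ranked)):
        for rank, item_id in enumerate(ranked, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + weight / (k + rank)
    # Highest combined score first.
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

Under this scheme, setting one weight to 0 reduces the ranking to the other search alone, and an item found by both searches outranks one found by only a single search at the same position.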
┌─────────────────┐
│  MCP Registry   │
│  (Source API)   │
└────────┬────────┘
         │
         │ ETL Pipeline
         ↓
┌─────────────────┐
│    Supabase     │
│  (PostgreSQL +  │
│    pgvector)    │
└────────┬────────┘
         │
         │ Search Query
         ↓
┌─────────────────┬─────────────────┐
│  FastAPI REST   │mcp-agent Server │
│     (Web)       │  (MCP Clients)  │
└─────────────────┴─────────────────┘
uv sync
- Create a Supabase account
- Create a new project
- Run the SQL in schema.sql in the Supabase SQL editor
- Get your project URL and anon key from Settings > API
cp .env.example .env
Edit .env and add:
- OPENAI_API_KEY: Your OpenAI API key
- SUPABASE_URL: Your Supabase project URL
- SUPABASE_KEY: Your Supabase anon key
uv run etl.py
This will:
- Fetch all servers from the MCP registry
- Filter to active servers (latest versions only)
- Generate embeddings using OpenAI
- Store in Supabase with vector indices
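The "latest versions only" step can be pictured as a group-by on server name. The sketch below is not the code in `etl.py`. It assumes each record carries the `name` and `version` fields seen in the search response and that versions are dot-separated numbers; the registry may expose its own latest or status flag, which would be simpler to use if available.

```python
def version_key(version):
    """Turn '0.101.0' into (0, 101, 0) so versions compare numerically.

    Non-numeric parts (e.g. 'beta') are treated as 0 in this simplified sketch.
    """
    return tuple(int(part) if part.isdigit() else 0 for part in version.split("."))


def latest_versions(servers):
    """Keep only the highest-versioned record for each server name."""
    latest = {}
    for server in servers:
        current = latest.get(server["name"])
        if current is None or version_key(server["version"]) > version_key(current["version"]):
            latest[server["name"]] = server
    return list(latest.values())
```

Comparing tuples of integers rather than raw strings matters here: as strings, "0.9.0" sorts after "0.101.0".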
FastAPI (REST API):
uvicorn api:app --reload
API available at http://localhost:8000
- Docs: http://localhost:8000/docs
FastMCP (MCP Server):
uv run main.py
Search servers:
curl "http://localhost:8000/search?q=kubernetes&limit=5"
List all servers:
curl "http://localhost:8000/servers?limit=100&offset=0"
Health check:
curl "http://localhost:8000/health"
The FastMCP server provides:
Tools:
- search_mcp_servers(query, limit, full_text_weight, semantic_weight) - Search servers
- list_mcp_servers(limit, offset) - List all servers
Resources:
- mcp-registry://search/{query} - Search results as formatted text
Prompts:
- find_mcp_server(task) - Prompt template to find servers for a task
Add to your MCP client config:
{
  "mcpServers": {
    "registry-search": {
      "command": "uv",
      "args": ["run", "main.py"],
      "cwd": "/path/to/mcp-registry-search",
      "env": {
        "OPENAI_API_KEY": "your-key",
        "SUPABASE_URL": "your-url",
        "SUPABASE_KEY": "your-key"
      }
    }
  }
}
- Install Vercel CLI:
npm i -g vercel
- Add environment variables to Vercel:
vercel env add OPENAI_API_KEY
vercel env add SUPABASE_URL
vercel env add SUPABASE_KEY
vercel env add CRON_SECRET  # Random secret to protect cron endpoint
- Deploy:
vercel
Automatic ETL Updates: The project includes a Vercel Cron job that runs nightly at midnight (UTC) to refresh the server index. The cron job calls /api/cron/etl, which is protected by the CRON_SECRET environment variable.
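The manual trigger shown later in this README sends the secret as `Authorization: Bearer <CRON_SECRET>`, so the protection amounts to comparing that header against the configured value. A minimal sketch of such a check follows; the function name `is_authorized` is hypothetical, and the project's actual handler may differ.

```python
import hmac


def is_authorized(authorization_header, cron_secret):
    """Return True when the header is exactly 'Bearer <cron_secret>'."""
    if not cron_secret or not authorization_header:
        # Reject when the secret is unset so a missing env var never opens the endpoint.
        return False
    expected = f"Bearer {cron_secret}"
    # compare_digest avoids leaking, through timing, how many leading characters matched.
    return hmac.compare_digest(authorization_header.encode(), expected.encode())
```

Rejecting an empty secret explicitly matters: without it, a deployment that forgot to set `CRON_SECRET` would accept the header `Bearer ` followed by nothing.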
Expose an authenticated upstream SSE endpoint publicly by proxying through a Vercel Edge Function that injects the bearer token and streams responses.
- Configure env vars (in Vercel):
vercel env add UPSTREAM_SSE_URL    # e.g. https://<host>/sse
vercel env add UPSTREAM_SSE_TOKEN  # bearer token for upstream
Alternative names supported for the token: LM_API_KEY or LM_API_TOKEN.
- Endpoint path
  - The SSE proxy is available at /api/sse (see api/sse.ts).
  - The MCP messages proxy is available at /api/messages (see api/messages.ts).
  - Rewrites expose root paths: /sse → /api/sse, /messages → /api/messages for MCP clients that expect root-level endpoints.
- CORS and streaming
  - CORS: Access-Control-Allow-Origin: *
  - Streaming: Edge Runtime streams SSE by default; cache disabled.
- Example usage
curl -N https://<your-project>.vercel.app/api/sse
# or using the root rewrite
curl -N https://<your-project>.vercel.app/sse
- Custom upstream per deployment (optional)
  - Override with UPSTREAM_SSE_URL in env without changing code.
  - Messages upstream auto-derives from the SSE URL, or set UPSTREAM_MESSAGES_URL explicitly if needed.
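The proxy itself lives in api/sse.ts and api/messages.ts; the Python sketch below only illustrates how the environment variables above could resolve into an upstream configuration. Two details are assumptions, not documented behavior: the order in which the token names are tried, and the derivation rule that swaps a trailing `/sse` for `/messages`.

```python
def resolve_upstream(env):
    """Resolve the SSE URL, messages URL, and bearer token from an env mapping."""
    sse_url = env["UPSTREAM_SSE_URL"]
    # Assumed precedence: the primary name first, then the documented alternatives.
    token = (env.get("UPSTREAM_SSE_TOKEN")
             or env.get("LM_API_KEY")
             or env.get("LM_API_TOKEN"))
    messages_url = env.get("UPSTREAM_MESSAGES_URL")
    if not messages_url:
        # Assumed derivation rule: swap a trailing "/sse" for "/messages".
        if sse_url.endswith("/sse"):
            base = sse_url[:-len("/sse")]
        else:
            base = sse_url.rstrip("/")
        messages_url = base + "/messages"
    return {"sse_url": sse_url, "messages_url": messages_url, "token": token}
```

Passing `os.environ` as `env` gives the live configuration, while a plain dict makes the resolution easy to check in isolation.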
We use mcp-agent cloud to deploy and host the MCP server. Under the covers, it's a FastMCP server (see main.py).
To do so yourself, you can run:
- uv run mcp-agent login
- uv run mcp-agent deploy
To manually refresh the server index:
Locally:
uv run etl.py
# or
make etl
On Vercel (trigger cron endpoint):
curl -X GET https://your-project.vercel.app/api/cron/etl \
  -H "Authorization: Bearer YOUR_CRON_SECRET"
The automatic nightly cron job handles updates, but you can manually trigger it anytime.
Project structure:
registry/
├── api.py          # FastAPI REST API (hosted on Vercel)
├── main.py         # MCP server (hosted on mcp-agent cloud)
├── search.py       # Search engine
├── etl.py          # ETL pipeline
├── schema.sql      # Supabase schema
├── pyproject.toml  # Dependencies
├── vercel.json     # Vercel config
└── README.md       # This file
Apache 2.0