g1-web 🌐
Web intelligence for Generate One — privacy-first search, browser automation, and content extraction exposed as MCP tools.
✨ Overview
g1-web provides the web intelligence layer for the Generate One platform. SearXNG aggregates results from dozens of search engines without tracking, while Crawl4AI handles browser-based scraping and structured content extraction. Steel Browser adds a fully managed browser session API. All three services are exposed as MCP tools under the g1-web namespace, giving AI agents sovereign web access.
Search queries can optionally be rewritten by the g1-llm-mini model (Qwen3-235B on Cerebras) before they are dispatched to SearXNG. The rewrite step can be bypassed per call.
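The optional rewrite step described above can be sketched roughly as follows. This is a minimal illustration, not the wrapper's actual code: the function name `prepare_query`, the `use_rewrite` flag, and the fall-back-on-error behavior are all assumptions, and `fake_rewrite` stands in for a real call to g1-llm-mini.

```python
from typing import Callable


def prepare_query(
    query: str,
    rewrite: Callable[[str], str],
    use_rewrite: bool = True,  # hypothetical per-call bypass flag
) -> str:
    """Return the query to send to SearXNG, optionally rewritten first."""
    query = query.strip()
    if not use_rewrite or not query:
        return query
    try:
        rewritten = rewrite(query).strip()
    except Exception:
        # Assumed behavior: a failed rewrite falls back to the original query.
        return query
    # An empty rewrite is treated as "no improvement".
    return rewritten or query


def fake_rewrite(q: str) -> str:
    """Stand-in for the LLM call; a real rewriter would query g1-llm-mini."""
    return f"{q} documentation"


print(prepare_query("searxng json api", fake_rewrite))
# searxng json api documentation
print(prepare_query("searxng json api", fake_rewrite, use_rewrite=False))
# searxng json api
```

Passing the rewriter in as a callable keeps the bypass path free of any LLM dependency, which makes the search tool usable even when the model endpoint is down.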
🏗️ Architecture
```mermaid
graph TD
    A[Claude Code / LibreChat] -->|MCP tools| B[MetaMCP<br/>mcp.generate.one]
    B -->|g1-web namespace| C[searxng-mcp<br/>FastMCP wrapper]
    B -->|g1-web namespace| D[crawl4ai-mcp<br/>FastMCP wrapper]
    B -->|g1-web namespace| E[steel-mcp<br/>FastMCP wrapper]
    C -->|query rewrite| F[g1-llm-mini<br/>Qwen3-235B]
    C --> G[SearXNG<br/>Metasearch engine]
    D --> H[Crawl4AI<br/>Browser automation]
    E --> I[Steel Browser<br/>Session API]
```
📦 Services
| Service | Image | Port | Description |
|---|---|---|---|
| SearXNG | `searxng/searxng:latest` | `8080` | Privacy-focused metasearch engine aggregating 40+ sources |
| Crawl4AI | `unclecode/crawl4ai:latest` | `11235` | Async web crawler with browser automation and LLM extraction |
| Steel Browser | (managed) | — | Managed browser session pool with CDP support |
| searxng-mcp | Python FastMCP | SSE | MCP wrapper — searxng_web_search, web_url_read |
| crawl4ai-mcp | Python FastMCP | SSE | MCP wrapper — crawl, get_markdown, extract_links, smart_crawl, etc. |
| steel-mcp | Python FastMCP | SSE | MCP wrapper — steel_scrape, steel_screenshot, steel_create_session, etc. |
🚀 Quick Start
```bash
# Service directory (Coolify-managed)
cd /data/coolify/services/eegd3jgmg2vd5e9lxnew0iqu

# Apply changes
docker compose up -d

# Test SearXNG directly
curl -s "http://localhost:8080/search?q=test&format=json" | jq '.results[:3]'

# Test via MCP namespace
# Use g1-web:searxng_web_search or g1-web:crawl tools via MetaMCP
```
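For scripted checks, the direct SearXNG test can also be done from Python. The sketch below mirrors the `curl ... | jq '.results[:3]'` command using only the standard library; the helper names `build_search_url`, `top_results`, and `search` are illustrative and not part of this repo.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(query: str, base_url: str = "http://localhost:8080") -> str:
    """Build the same URL the curl command uses, with the query URL-encoded."""
    return f"{base_url}/search?{urlencode({'q': query, 'format': 'json'})}"


def top_results(payload: dict, limit: int = 3) -> list:
    """Equivalent of jq '.results[:3]': the first `limit` result entries."""
    return payload.get("results", [])[:limit]


def search(query: str, base_url: str = "http://localhost:8080", limit: int = 3) -> list:
    """Query a running SearXNG instance and return the first few results."""
    with urlopen(build_search_url(query, base_url), timeout=10) as resp:
        return top_results(json.load(resp), limit)


# Requires the SearXNG container from the compose stack to be running:
# for hit in search("test"):
#     print(hit)
```

Keeping URL construction and result slicing separate from the network call means both can be checked without a live instance.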
🔧 Configuration
| Variable | Description | Default |
|---|---|---|
| `SEARXNG_SECRET` | SearXNG instance secret key | (none) |
| `CRAWL4AI_API_TOKEN` | Crawl4AI authentication token | (none) |
| `REWRITE_MODEL` | LLM model for query rewriting | `g1-llm-mini` |
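One way a wrapper process might read these variables is sketched below. The `WebConfig` class and `load_config` function are hypothetical, and treating the two variables without a default as required is an assumption; only the variable names and the `g1-llm-mini` default come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WebConfig:
    searxng_secret: str
    crawl4ai_api_token: str
    rewrite_model: str = "g1-llm-mini"  # default taken from the table above


def load_config(env: Optional[Mapping[str, str]] = None) -> WebConfig:
    """Read settings from the environment (or any mapping, for testing)."""
    env = os.environ if env is None else env
    # Assumption: variables listed without a default must be set.
    required = ("SEARXNG_SECRET", "CRAWL4AI_API_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required variables: {', '.join(missing)}")
    return WebConfig(
        searxng_secret=env["SEARXNG_SECRET"],
        crawl4ai_api_token=env["CRAWL4AI_API_TOKEN"],
        rewrite_model=env.get("REWRITE_MODEL") or "g1-llm-mini",
    )
```

Failing at startup on a missing secret tends to be easier to diagnose than an authentication error on the first crawl request.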
🔗 Dependencies
Depends on:
- `g1-llm`: LiteLLM for query rewriting model access (`g1-llm-mini`)
- `svc-tools`: MetaMCP routes the `g1-web` namespace to these services

Depended on by:
- `g1-gpt`: LibreChat webSearch feature uses SearXNG
- `g1-api`: Fusio `/v1/search` endpoint proxies to SearXNG
- Claude Code: web search and scraping via the `g1-web` MCP namespace
📊 MCP Tools Reference
| Tool | Service | Description |
|---|---|---|
| `searxng_web_search` | SearXNG | Privacy-first web search with optional LLM query rewrite |
| `web_url_read` | SearXNG | Fetch and parse a URL as clean text |
| `crawl` | Crawl4AI | Crawl a URL and return structured markdown |
| `get_markdown` | Crawl4AI | Extract clean markdown from any page |
| `extract_links` | Crawl4AI | Extract all links from a page |
| `smart_crawl` | Crawl4AI | Intelligent multi-page crawl with depth control |
| `batch_crawl` | Crawl4AI | Crawl multiple URLs in parallel |
| `extract_with_llm` | Crawl4AI | LLM-powered structured data extraction |
| `steel_scrape` | Steel | Scrape with full browser rendering |
| `steel_screenshot` | Steel | Capture screenshot of any page |
| `steel_create_session` | Steel | Create a managed browser session |
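The table can also be read as a lookup from a namespaced tool name such as `g1-web:crawl` to the backing service. The sketch below only illustrates that mapping; it is not how MetaMCP performs routing, and `resolve_tool` is a hypothetical helper.

```python
from typing import Tuple

NAMESPACE = "g1-web"

# Tool name -> backing service, copied from the reference table.
TOOL_SERVICES = {
    "searxng_web_search": "SearXNG",
    "web_url_read": "SearXNG",
    "crawl": "Crawl4AI",
    "get_markdown": "Crawl4AI",
    "extract_links": "Crawl4AI",
    "smart_crawl": "Crawl4AI",
    "batch_crawl": "Crawl4AI",
    "extract_with_llm": "Crawl4AI",
    "steel_scrape": "Steel",
    "steel_screenshot": "Steel",
    "steel_create_session": "Steel",
}


def resolve_tool(qualified_name: str) -> Tuple[str, str]:
    """Split 'g1-web:<tool>' and return (tool, service)."""
    namespace, sep, tool = qualified_name.partition(":")
    if not sep or namespace != NAMESPACE:
        raise ValueError(f"expected a '{NAMESPACE}:<tool>' name, got {qualified_name!r}")
    if tool not in TOOL_SERVICES:
        raise KeyError(f"unknown tool: {tool}")
    return tool, TOOL_SERVICES[tool]


print(resolve_tool("g1-web:crawl"))
# ('crawl', 'Crawl4AI')
```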
🔗 Related Repos
| Repo | Relationship |
|---|---|
| svc-tools | MetaMCP routes g1-web namespace |
| g1-llm | LiteLLM for query rewriting |
| g1-api | Fusio search endpoint |
| g1-gpt | LibreChat web search |
🛡️ Part of Generate One
Generate One — AI infrastructure that answers to you.
Self-hosted, sovereign AI platform. generate.one