Search, scraping, web intelligence (web-stack: SearXNG, Crawl4AI)
  • Python 98.4%
  • Dockerfile 1.6%
Find a file
g1admin ad827ae5bb
Some checks failed
Sync web-search compose to Coolify / sync-coolify (push) Successful in 6s
Secret Scan / scan (push) Failing after 4s
fix(crawl4ai): bump 0.8.0→0.8.5, add shm_size=1g + mem_limit=2g
P0: Crawl4AI at 95.2% memory refusing new browsers.
- Image bump picks up upstream memory leak fixes
- shm_size=1g prevents Chromium /dev/shm exhaustion
- mem_limit=2g caps runaway browser processes

Dispatch: d38528b8
2026-03-29 17:48:18 +00:00
.forgejo/workflows ci: add Coolify sync workflow for web-search (SearXNG + Crawl4AI) 2026-03-22 18:49:52 +00:00
.gitea/issue_template chore: add infrastructure change issue template 2026-03-04 13:48:20 +00:00
compose fix(crawl4ai): bump 0.8.0→0.8.5, add shm_size=1g + mem_limit=2g 2026-03-29 17:48:18 +00:00
config Initialize config directory 2026-02-23 17:56:22 +00:00
docs Initialize docs directory 2026-02-23 17:55:38 +00:00
steel-mcp feat: add steel-mcp, Steel Browser, and web-tools compose (Rule 62 fix) 2026-03-22 05:41:45 +00:00
.env.example feat: sync live compose from production (2026-02-27) 2026-02-26 22:09:35 +00:00
.gitignore Add .gitignore 2026-02-25 15:39:27 +00:00
CLAUDE.md Update CLAUDE.md: Web Lead agent naming 2026-02-26 22:12:38 +00:00
LICENSE Add MIT LICENSE 2026-02-25 15:39:51 +00:00
README.md docs: premium README overhaul 2026-03-21 05:10:23 +00:00

g1-web 🌐

Web intelligence for Generate One — privacy-first search, browser automation, and content extraction exposed as MCP tools.

Status License Platform MCP Namespace


Overview

g1-web provides the web intelligence layer for the Generate One platform. SearXNG aggregates results from dozens of search engines without tracking, while Crawl4AI handles browser-based scraping and structured content extraction. Steel Browser adds a fully managed browser session API. All three services are exposed as MCP tools under the g1-web namespace, giving AI agents sovereign web access.

Search queries are optionally enhanced via LLM-powered rewriting using the g1-llm-mini model (Qwen3-235B on Cerebras) before being dispatched to SearXNG. This can be bypassed per-call.

🏗️ Architecture

graph TD
    A[Claude Code / LibreChat] -->|MCP tools| B[MetaMCP<br/>mcp.generate.one]
    B -->|g1-web namespace| C[searxng-mcp<br/>FastMCP wrapper]
    B -->|g1-web namespace| D[crawl4ai-mcp<br/>FastMCP wrapper]
    B -->|g1-web namespace| E[steel-mcp<br/>FastMCP wrapper]
    C -->|query rewrite| F[g1-llm-mini<br/>Qwen3-235B]
    C --> G[SearXNG<br/>Metasearch engine]
    D --> H[Crawl4AI<br/>Browser automation]
    E --> I[Steel Browser<br/>Session API]

📦 Services

Service Image Port Description
SearXNG searxng/searxng:latest 8080 Privacy-focused metasearch engine aggregating 40+ sources
Crawl4AI unclecode/crawl4ai:latest 11235 Async web crawler with browser automation and LLM extraction
Steel Browser (managed) Managed browser session pool with CDP support
searxng-mcp Python FastMCP SSE MCP wrapper — searxng_web_search, web_url_read
crawl4ai-mcp Python FastMCP SSE MCP wrapper — crawl, get_markdown, extract_links, smart_crawl, etc.
steel-mcp Python FastMCP SSE MCP wrapper — steel_scrape, steel_screenshot, steel_create_session, etc.

🚀 Quick Start

# Service directory (Coolify-managed)
cd /data/coolify/services/eegd3jgmg2vd5e9lxnew0iqu

# Apply changes
docker compose up -d

# Test SearXNG directly
curl -s "http://localhost:8080/search?q=test&format=json" | jq '.results[:3]'

# Test via MCP namespace
# Use g1-web:searxng_web_search or g1-web:crawl tools via MetaMCP

🔧 Configuration

Variable Description Default
SEARXNG_SECRET SearXNG instance secret key
CRAWL4AI_API_TOKEN Crawl4AI authentication token
REWRITE_MODEL LLM model for query rewriting g1-llm-mini

🔗 Dependencies

Depends on:

  • g1-llm — LiteLLM for query rewriting model access (g1-llm-mini)
  • svc-tools — MetaMCP routes the g1-web namespace to these services

Depended on by:

  • g1-gpt — LibreChat webSearch feature uses SearXNG
  • g1-api — Fusio /v1/search endpoint proxies to SearXNG
  • Claude Code — web search and scraping via g1-web MCP namespace

📊 MCP Tools Reference

Tool Service Description
searxng_web_search SearXNG Privacy-first web search with optional LLM query rewrite
web_url_read SearXNG Fetch and parse a URL as clean text
crawl Crawl4AI Crawl a URL and return structured markdown
get_markdown Crawl4AI Extract clean markdown from any page
extract_links Crawl4AI Extract all links from a page
smart_crawl Crawl4AI Intelligent multi-page crawl with depth control
batch_crawl Crawl4AI Crawl multiple URLs in parallel
extract_with_llm Crawl4AI LLM-powered structured data extraction
steel_scrape Steel Scrape with full browser rendering
steel_screenshot Steel Capture screenshot of any page
steel_create_session Steel Create a managed browser session
Repo Relationship
svc-tools MetaMCP routes g1-web namespace
g1-llm LiteLLM for query rewriting
g1-api Fusio search endpoint
g1-gpt LibreChat web search

🛡️ Part of Generate One

Generate One — AI infrastructure that answers to you.

Self-hosted, sovereign AI platform. generate.one