
Site Map

Architecture diagrams for the Baseline Labs platform.

Product Architecture

The recursive split pattern: core/ holds shared infrastructure, and each product is independently deletable.

core/ — shared infrastructure  api.baselinelabs.ai
shared/ db, config, errors, r2, gpu_pool
api/ auth, admin, log
templates/ shell, nav, footer, head
pages/ login, account, admin/*
scraper/ httpx + Playwright crawler, domain queue
gpu/ Vast.ai pool, vLLM inference workers
Provides: db, auth, config, templates, error logging
markupschema/  www.markupschema.com
api/ schema, schema_generator, scrape, inference, rolling, vastai, blog, docs
pages/ home, dashboard, schema-generator, rolling-schema, getting-started, pricing, blog, docs
admin/ gpu-monitor, scraper-monitor, schema-playground, rolling-monitor
Consumes: scraper, gpu, R2
geo/  baselinelabs.ai
api/ search, parse, reports, brand_reports, backlinks, mentions, analytics, keywords, screenshots, email
pages/ home, dashboard, getting-started, reports, templates, brand-reports, brand-templates, backlinks, analytics, ranking, analysis, mention-scan, resources, about-us, faq, contact
admin/ errors
Consumes: scraper, DataForSEO, GPT / Gemini / Perplexity
Extraction test: lift out any product/ directory (at any depth) together with every shared/ directory above it. What remains, the other product plus the shared infrastructure, must still work.
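Part of the extraction test can be automated as a static import check: nothing under core/ may import a product, and neither product may import the other. The sketch below is a minimal illustration, assuming absolute imports rooted at top-level packages named `core`, `markupschema`, and `geo` (a hypothetical layout); relative imports are ignored, and the function names are illustrative only.

```python
import ast
from pathlib import Path

# Hypothetical top-level package names for the two products.
PRODUCTS = {"markupschema", "geo"}


def imported_roots(source: str) -> set[str]:
    """Return the top-level package names a module imports (absolute imports only)."""
    roots = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            roots.add(node.module.split(".")[0])
    return roots


def boundary_violations(owner: str, source: str) -> set[str]:
    """Products that a module owned by `owner` imports but must not.

    core/ may import no product; a product may import core/ but never the
    other product, so deleting either product leaves the rest importable.
    """
    forbidden = PRODUCTS - {owner}
    return imported_roots(source) & forbidden


def check_tree(root: Path) -> dict[str, set[str]]:
    """Scan root/<owner>/**/*.py and map each offending file to what it imports."""
    problems = {}
    for path in root.rglob("*.py"):
        owner = path.relative_to(root).parts[0]
        bad = boundary_violations(owner, path.read_text())
        if bad:
            problems[str(path.relative_to(root))] = bad
    return problems
```

Run from the directory that contains only core/, markupschema/ and geo/, an empty result means both products pass this part of the test. It cannot see dynamic imports or shared database tables, so it complements rather than replaces actually deleting a directory and running the suite.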

Docker Services

Six containers on a single VPS. Caddy terminates TLS, two app pools split fast/heavy traffic.

Internet
Ports 80, 443 (HTTPS via 3 domains) + 8080 (scraper direct IP)
caddy  caddy:2-alpine
TLS termination, reverse proxy for 3 domains: www.markupschema.com (MS), baselinelabs.ai (GEO), api.baselinelabs.ai (Core). Routes scrape/inference to bulk, everything else to app. Static asset caching (1hr).
fast → app :8080 | heavy → app-bulk :8081
app  :8080
uvicorn, 4 workers. Starts first: compiles pages, runs migrations. Serves all pages, auth, and fast API endpoints. Health check on /api/core/health.
app-bulk  :8081
uvicorn, 2 workers. Waits for app healthy. Handles scrape submit/claim/complete, inference, and direct :8080 scraper traffic.
postgres  postgres:16-alpine
Database: baseline. Schemas: shared, ms, geo. Volume: pgdata.
redis  redis:7-alpine
128 MB, allkeys-lru. No persistence (appendonly no). Job queue + caching.
backup
pg_dump -Fc every hour. 7-day retention. Saves to ./data/.
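The 7-day retention step can be sketched as a pruning pass over the dump directory. This is a minimal illustration, not the backup container's actual script: it assumes dumps are written as `*.dump` files and that age is judged by file modification time; both the filename pattern and `prune_backups` are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional

RETENTION_SECONDS = 7 * 24 * 3600  # 7-day retention, per the backup service above


def prune_backups(backup_dir: Path, now: Optional[float] = None) -> list[Path]:
    """Delete pg_dump archives whose mtime is older than the retention window.

    Assumes archives are named *.dump (hypothetical). Returns deleted paths.
    """
    now = time.time() if now is None else now
    deleted = []
    for dump in sorted(backup_dir.glob("*.dump")):
        if now - dump.stat().st_mtime > RETENTION_SECONDS:
            dump.unlink()
            deleted.append(dump)
    return deleted
```

At one dump per hour this keeps roughly 7 × 24 = 168 archives on disk at steady state.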
Caddy domain routing
markupschema.com /api/ms/*, /api/core/* → app | scrape/inference → bulk | clean URLs rewrite to /ms/*
baselinelabs.ai /api/geo/*, /api/core/* → app | /admin/* → /geo/admin/* | clean URLs rewrite to /geo/*
api.baselinelabs.ai /api/core/*, /api/geo/*, /api/ms/* → app | core pages + all admin
:8080 direct IP scraper/GPU workers → bulk :8081
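The routing table above can be read as a small decision function. The sketch re-expresses it in Python purely for illustration (the real rules live in Caddy's configuration), and it assumes "scrape/inference" means paths starting with `/api/ms/scrape` or `/api/ms/inference`, which the page does not spell out.

```python
from typing import NamedTuple

APP, BULK = "app:8080", "app-bulk:8081"

# Assumed path prefixes for "heavy" traffic on markupschema.com.
HEAVY_PREFIXES = ("/api/ms/scrape", "/api/ms/inference")


class Route(NamedTuple):
    upstream: str
    path: str  # path after any internal clean-URL rewrite


def route(host: str, path: str, port: int = 443) -> Route:
    """Mirror the Caddy domain-routing table as a pure function (illustrative only)."""
    if port == 8080:  # direct-IP scraper/GPU worker traffic
        return Route(BULK, path)
    if host == "www.markupschema.com":
        if path.startswith(HEAVY_PREFIXES):
            return Route(BULK, path)
        if path.startswith(("/api/ms/", "/api/core/")):
            return Route(APP, path)
        return Route(APP, "/ms" + path)  # clean URL: /X -> /ms/X
    if host == "baselinelabs.ai":
        if path.startswith(("/api/geo/", "/api/core/")):
            return Route(APP, path)
        return Route(APP, "/geo" + path)  # /X -> /geo/X, so /admin/* -> /geo/admin/*
    if host == "api.baselinelabs.ai":
        return Route(APP, path)  # all APIs, core pages, and all admin pages
    raise ValueError(f"unrouted host: {host}")
```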
app depends on postgres + redis (service_started). app-bulk depends on app (service_healthy) + postgres + redis. caddy depends on app + app-bulk.
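These dependencies fix a start order. As a quick check, the standard library's `graphlib.TopologicalSorter` (Python 3.9+) can derive one; the graph below transcribes only the dependencies listed on this page, so the backup container, whose dependencies are not stated, is left out.

```python
from graphlib import TopologicalSorter

# service -> services it waits for (transcribed from the depends_on line above)
DEPENDS_ON = {
    "app": {"postgres", "redis"},              # condition: service_started
    "app-bulk": {"app", "postgres", "redis"},  # waits for app to be service_healthy
    "caddy": {"app", "app-bulk"},
}


def start_order() -> list[str]:
    """Return one valid start order; every service follows its dependencies."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())
```

postgres and redis may come out in either order, but app always precedes app-bulk and caddy is always last.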

Network Topology

WireGuard mesh connecting mainframe, scraper, and GPU workers. Cloudflare R2 for stateless object storage.

Mainframe 10.10.0.1
Hetzner CX22
  • FastAPI (app :8080 + app-bulk :8081)
  • Caddy (80, 443, 8080)
  • PostgreSQL 16
  • Redis 7
  • Backup (pg_dump hourly)
Scraper 10.10.1.1
Hetzner CX53
  • httpx + Playwright
  • 100-domain capacity
  • Polite delays (robots.txt)
GPU Workers 10.10.2.x
Vast.ai spot instances
  • Qwen3-VL-4B vLLM fp8
  • Chisel reverse tunnel
  • Watchdog auto-respawn
Cloudflare R2
  • Cleaned HTML from scraper
  • Screenshots
  • Schema JSON-LD output
All workers are stateless. R2 is the source of truth for crawled content and inference output. Chisel tunnel overhead ~1ms.
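Because workers are stateless, a watchdog only needs to notice that one has gone quiet. A minimal sketch of that detection, assuming it reads `last_heartbeat` from shared.workers and treats a GPU worker silent for 90 seconds as preempted; the threshold and the names are hypothetical, and the Vast.ai respawn call itself is not shown because the page does not describe it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical threshold: a worker silent this long is presumed preempted.
STALE_AFTER = timedelta(seconds=90)


@dataclass
class Worker:
    id: int
    worker_type: str          # e.g. "gpu" or "scraper"
    last_heartbeat: datetime  # mirrors shared.workers.last_heartbeat


def stale_gpu_workers(workers: list[Worker], now: Optional[datetime] = None) -> list[int]:
    """IDs of GPU workers whose last heartbeat is older than STALE_AFTER."""
    now = now or datetime.now(timezone.utc)
    return [
        w.id
        for w in workers
        if w.worker_type == "gpu" and now - w.last_heartbeat > STALE_AFTER
    ]
```

A watchdog loop would call this periodically and request a replacement instance for each returned ID; any job the lost worker held stays in ms.inference_queue and its inputs are still in R2.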

Data Flow — Schema Generation

End-to-end flow from URL submission to schema.org JSON-LD output.

1
User submits URL
User → app :8080 → PostgreSQL
Check schema cache in ms.schema_jobs. If cached and fresh, return immediately.
Cache miss: Insert into shared.scrape_queue (status=pending, trigger_inference=true), return 202 Accepted with job_id.
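Step 1 can be sketched with in-memory dicts standing in for ms.schema_jobs and shared.scrape_queue. This is an illustration under stated assumptions: the freshness window, the use of SHA-256 for `url_hash`, and the response shapes are all hypothetical; only the pending/trigger_inference fields and the 202 response come from the page.

```python
import hashlib
import time
import uuid
from typing import Optional

# Hypothetical freshness window; the page does not define "fresh".
CACHE_TTL_SECONDS = 7 * 24 * 3600

schema_cache: dict[str, dict] = {}  # stand-in for cached results in ms.schema_jobs
scrape_queue: dict[str, dict] = {}  # stand-in for shared.scrape_queue, keyed by url_hash


def url_hash(url: str) -> str:
    """Key for the url_hash unique column (SHA-256 is an assumption)."""
    return hashlib.sha256(url.encode()).hexdigest()


def submit_url(url: str, now: Optional[float] = None) -> tuple[int, dict]:
    """Return (HTTP status, body) for a schema-generation request."""
    now = time.time() if now is None else now
    cached = schema_cache.get(url)
    if cached and now - cached["completed_at"] < CACHE_TTL_SECONDS:
        return 200, {"status": "complete", "result_path": cached["result_path"]}

    key = url_hash(url)
    job = scrape_queue.get(key)
    if job is None:  # reusing an existing row mimics the url_hash unique key
        job = {"job_id": str(uuid.uuid4()), "url": url,
               "status": "pending", "trigger_inference": True}
        scrape_queue[key] = job
    return 202, {"job_id": job["job_id"], "status": job["status"]}
```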
2
Scraper crawls page
Scraper (10.10.1.1) → app-bulk :8081 → PostgreSQL + R2
Scraper polls /api/ms/scrape/next-job via direct IP :8080. Claims job atomically (status=claimed). Fetches page with httpx + Playwright. Uploads cleaned HTML to R2. Reports complete — scrape_queue marked done, pushed to ms.inference_queue via Redis.
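The page does not say how the atomic claim is written. On PostgreSQL a common form is `UPDATE ... WHERE id = (SELECT ... FOR UPDATE SKIP LOCKED) RETURNING id`; the runnable sketch below instead uses the standard library's sqlite3 and a compare-and-set UPDATE guarded by `status = 'pending'`, so a worker that loses the race sees `rowcount == 0` and moves on. Table columns are trimmed to what the example needs.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE scrape_queue (
        id INTEGER PRIMARY KEY, url TEXT, status TEXT DEFAULT 'pending',
        priority INTEGER DEFAULT 0, worker TEXT)""")
    return db


def claim_next(db: sqlite3.Connection, worker: str) -> Optional[int]:
    """Claim the highest-priority pending job and return its id, or None."""
    while True:
        row = db.execute(
            "SELECT id FROM scrape_queue WHERE status = 'pending' "
            "ORDER BY priority DESC, id LIMIT 1").fetchone()
        if row is None:
            return None
        # Compare-and-set: succeeds only if no one claimed the row in between.
        cur = db.execute(
            "UPDATE scrape_queue SET status = 'claimed', worker = ? "
            "WHERE id = ? AND status = 'pending'", (worker, row[0]))
        db.commit()
        if cur.rowcount == 1:
            return row[0]
        # Lost the race to another worker; loop and try the next pending job.
```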
3
GPU runs inference
GPU Worker (10.10.2.x) → Redis → R2 → app-bulk :8081 → PostgreSQL
GPU worker pops from ms.inference_queue. Downloads cleaned HTML from R2. Runs Qwen3-VL-4B vLLM fp8 inference (~1.5s small pages, ~17s large). Uploads schema JSON-LD to R2. Reports complete — logged to shared.inference_log, schema_jobs updated.
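One pass of step 3 can be sketched with the I/O injected as callables, so the control flow is visible without Redis, R2, or vLLM. Every callable name here is a hypothetical placeholder; the reported fields (`result_path`, `error`, `inference_ms`) mirror shared.inference_log.

```python
import time
from typing import Callable, Optional


def process_one(
    pop_job: Callable[[], Optional[dict]],       # pop from ms.inference_queue
    download_html: Callable[[str], str],         # fetch cleaned HTML from R2
    run_inference: Callable[[str], str],         # Qwen3-VL-4B call via vLLM
    upload_result: Callable[[str, str], str],    # put JSON-LD in R2, return its path
    report_complete: Callable[[dict], None],     # completion call to app-bulk
) -> bool:
    """Run one job through the pipeline; return False if the queue was empty."""
    job = pop_job()
    if job is None:
        return False
    started = time.monotonic()
    try:
        html = download_html(job["html_path"])
        jsonld = run_inference(html)
        result_path, error = upload_result(job["url"], jsonld), None
    except Exception as exc:  # report the failure instead of crashing the worker
        result_path, error = None, str(exc)
    report_complete({
        "id": job["id"],
        "result_path": result_path,
        "error": error,
        "inference_ms": int((time.monotonic() - started) * 1000),
    })
    return True
```

Keeping the worker free of local state is what lets the watchdog replace a preempted instance without recovery logic.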
4
User polls for result
User → app :8080 → PostgreSQL
Check ms.schema_jobs status. When complete, return schema URL pointing to R2 result_path.
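On the client side, step 4 is a polling loop. A minimal sketch with exponential backoff, assuming a `fetch_status(job_id)` callable that returns a dict with `status` and, when complete, `schema_url`; those field names, the `failed` status, and the 1s-to-10s backoff are hypothetical.

```python
import time
from typing import Callable


def wait_for_schema(
    job_id: str,
    fetch_status: Callable[[str], dict],
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job completes and return its schema URL.

    Backs off from 1s to a 10s cap, so even a ~17s large-page inference
    costs only a handful of requests.
    """
    delay, waited = 1.0, 0.0
    while waited <= timeout_s:
        status = fetch_status(job_id)
        if status["status"] == "complete":
            return status["schema_url"]
        if status["status"] == "failed":
            raise RuntimeError(status.get("error", "schema generation failed"))
        sleep(delay)
        waited += delay
        delay = min(delay * 2, 10.0)
    raise TimeoutError(f"job {job_id} not complete after {timeout_s}s")
```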
Domain-aware crawling with polite delays: the per-domain delay comes from robots.txt, defaults to 2s, and never drops below 1s. The watchdog auto-respawns GPU workers after Vast.ai spot preemption.
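One reading of "min 1s, default 2s from robots.txt" is: honor a Crawl-delay if robots.txt declares one, fall back to 2s when it does not, and never go below 1s. A sketch of that rule using the standard library's `urllib.robotparser`; the user-agent string is hypothetical, and note that this parser only recognizes integer Crawl-delay values.

```python
from urllib.robotparser import RobotFileParser

MIN_DELAY_S = 1.0
DEFAULT_DELAY_S = 2.0
USER_AGENT = "BaselineLabsBot"  # hypothetical user-agent string


def polite_delay(robots_lines: list[str], user_agent: str = USER_AGENT) -> float:
    """Seconds to wait between consecutive requests to one domain."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    declared = parser.crawl_delay(user_agent)  # None if robots.txt sets no delay
    delay = DEFAULT_DELAY_S if declared is None else float(declared)
    return max(MIN_DELAY_S, delay)
```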

Database Schema

One PostgreSQL database (baseline) with three schemas. All code uses schema-qualified table names. 33 tables in total (13 shared + 4 ms + 16 geo).

shared.* — 13 tables, cross-product
users
  • id PK
  • email UK
  • google_id UK
  • name, role, password_hash
  • is_admin, tier
  • avatar_path
  • created_at, last_login
api_keys
  • id PK
  • user_id FK→users
  • key_hash UK
  • name, rate_limit, active
  • created_at, last_used
scrape_queue
  • id PK
  • url, url_hash UK
  • status, domain, priority
  • html_path, html_hash
  • screenshot_path
  • source, trigger_inference
  • error, retry_count
  • metadata jsonb
inference_log
  • id PK
  • url, html_hash
  • result_path
  • inference_ms
  • error
  • created_at, completed_at
workers
  • id PK
  • worker_type, hostname, ip
  • status, concurrency
  • jobs_completed, jobs_failed
  • cpu_percent, memory_mb
  • gpu_type, gpu_util_percent
  • last_heartbeat
  • metadata jsonb
error_logs
  • id PK
  • service, level, product
  • message, stack_trace
  • request_path, request_method
  • user_id FK→users
  • resolved, resolved_at, resolved_by
  • metadata jsonb
request_logs
  • id PK
  • method, path, status_code
  • response_ms, product
  • user_id FK→users
  • ip, timestamp
usage_logs
  • id PK
  • api_key_id FK→api_keys
  • endpoint, status_code
  • response_ms, cache_hit
  • timestamp
scrape_passes
  • id PK
  • urls_at_start, domains_at_start
  • urls_completed, urls_harvested
  • started_at, completed_at
async_operations
  • id PK
  • operation_id, operation_type
  • user_id FK→users
  • save_id FK→geo_saves
  • status, progress, current_step
  • metadata jsonb
subscriptions
  • id PK
  • user_id FK→users
  • stripe_subscription_id
  • category, plan, status
  • current_period_start/end
sms_messages
  • id PK
  • from_number, to_number
  • body, direction
  • twilio_sid
  • created_at
oauth_codes
  • id PK
  • code UK
  • user_id FK→users
  • client_id, redirect_uri
  • scope, expires_at
ms.* — 4 tables, MarkupSchema only
inference_queue
  • id PK
  • url, html_hash, html_path
  • status, worker_id, priority
  • screenshot_path, result_path
  • domain, input_mode
  • error, retry_count
  • callback_data jsonb
schema_jobs
  • id PK
  • user_id FK→users
  • domain, status
  • pages_discovered, pages_scraped
  • pages_generated
  • urls jsonb
  • error
rolling_domains
  • id PK
  • user_id FK→users
  • domain UK w/ user
  • status, pages_per_hour
  • sitemap_url, last_sitemap_fetch
  • created_at
rolling_pages
  • id PK
  • domain_id FK→rolling_domains
  • url UK
  • last_html_hash, last_schema_at
  • schema_path, status
  • error
geo.* — 16 tables, GEO Studio only
geo_saves
  • id PK
  • user_id FK→users
  • name, primary_site
  • business_info jsonb
  • created_at, updated_at
geo_queries
  • id PK
  • user_id FK→users
  • save_id FK→saves
  • query_text, search_type
  • results jsonb
  • position
report_templates
  • id PK
  • user_id FK→users
  • save_id FK→saves
  • name, description
  • queries jsonb
reports
  • id PK
  • user_id FK→users
  • save_id FK→saves
  • template_id FK→templates
  • status, engines jsonb
report_query_status
  • id PK
  • report_id FK→reports
  • query_text, engine
  • status, position, error
brand_report_templates
  • id PK
  • user_id FK→users
  • save_id FK→saves
  • business_name, industry
  • products, competitors jsonb
backlinks
  • id PK
  • source_url, source_domain
  • target_url, target_domain
  • anchor_text, rel, position
  • first_seen, last_seen
  • times_seen
mention_profiles
  • id PK
  • user_id FK→users
  • save_id FK→saves
  • business_name, industry
  • products, competitors jsonb
mention_scans
  • id PK
  • user_id FK→users
  • profile_id FK→profiles
  • status, total_queries
  • completed_queries, mention_count
  • overall_sentiment_score
mention_queries
  • id PK
  • scan_id FK→scans
  • query_text, layer, intent
  • status, result_count
mention_raw_results
  • id PK
  • scan_id FK→scans
  • query_id FK→queries
  • url, title, snippet
  • rank_in_query, source_domain
mention_records
  • id PK
  • scan_id FK→scans
  • source_url, source_type
  • sentiment_score, sentiment_label
  • relevance_score
mention_summaries
  • id PK
  • scan_id FK→scans UK
  • overview jsonb
  • source_map jsonb
  • executive_summary jsonb
  • recommendations jsonb
rankings
  • id PK
  • query_id FK→queries
  • engine, position
  • url, title
ai_analytics
  • id PK
  • user_id FK→users
  • event_type, feature
  • metadata jsonb
email_queue
  • id PK
  • user_id FK→users
  • template, to_email
  • status, sent_at
shared.users → api_keys, usage_logs, error_logs, request_logs (all ON DELETE CASCADE)
shared.users → ms.schema_jobs (user's schema generation jobs)
shared.users → geo.saves, queries, keywords, reports, mention_profiles, mention_scans (all ON DELETE CASCADE)
geo.saves → queries, keywords, reports, mention_profiles, async_operations (all ON DELETE CASCADE)
geo.mention_profiles → scans → queries → raw_results, records (cascade chain)
geo.mention_scans → mention_summaries (1:1, ON DELETE CASCADE)
shared.* = cross-product (13) | ms.* = MarkupSchema only (4) | geo.* = GEO only (16) | All queries use schema-qualified names
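The cascade chains can be checked on a toy database. This sketch uses the standard library's sqlite3, which has no schemas, so table names drop the shared./geo. prefixes and columns are trimmed to the keys; SQLite also enforces foreign keys only after `PRAGMA foreign_keys = ON`. It follows one chain (users → mention_profiles → mention_scans → mention_summaries) rather than the full schema.

```python
import sqlite3

DDL = """
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE);
CREATE TABLE mention_profiles (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id) ON DELETE CASCADE);
CREATE TABLE mention_scans (
    id INTEGER PRIMARY KEY,
    profile_id INTEGER REFERENCES mention_profiles(id) ON DELETE CASCADE);
CREATE TABLE mention_summaries (
    id INTEGER PRIMARY KEY,
    scan_id INTEGER UNIQUE REFERENCES mention_scans(id) ON DELETE CASCADE);
"""


def build() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    db.executescript(DDL)
    db.execute("INSERT INTO users VALUES (1, 'a@example.com')")
    db.execute("INSERT INTO mention_profiles VALUES (10, 1)")
    db.execute("INSERT INTO mention_scans VALUES (100, 10)")
    db.execute("INSERT INTO mention_summaries VALUES (1000, 100)")  # 1:1 via UNIQUE scan_id
    return db


def count(db: sqlite3.Connection, table: str) -> int:
    # Table names here are trusted constants, so string formatting is safe.
    return db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

Deleting the user row removes the profile, the scan, and the summary in one statement, which is the behavior the cascade lines above describe.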

Pages + APIs

Three domains, each serving one product context. Pages are config.json + content.html compiled into the shell. APIs are auto-discovered FastAPI routers. Caddy rewrites clean URLs to internal /ms/* and /geo/* paths.

api.baselinelabs.ai
PAGE /login
PAGE /account
PAGE /console/dashboard
PAGE /console/api-keys
PAGE /console/mcp
PAGE /admin/dashboard
PAGE /admin/errors
PAGE /admin/users
PAGE /admin/request-logs
PAGE /admin/reports
PAGE /admin/docs
PAGE /admin/log
PAGE /admin/metadata
PAGE /admin/sms
PAGE /admin/editor
PAGE /admin/tests
PAGE /admin/mainframe
PAGE /admin/site-map
API auth
API oauth
API admin
API billing
API sms
API editor
API log
API testing
API mcp
www.markupschema.com
PAGE /home
PAGE /dashboard
PAGE /schema-generator
PAGE /rolling-schema
PAGE /getting-started
PAGE /pricing
PAGE /blog
PAGE /docs
PAGE /admin/gpu-monitor
PAGE /admin/scraper-monitor
PAGE /admin/schema-playground
PAGE /admin/rolling-monitor
API schema
API schema_generator
API scrape
API inference
API rolling
API vastai
API blog
API docs
baselinelabs.ai
PAGE /home
PAGE /dashboard
PAGE /getting-started
PAGE /reports
PAGE /reports/new
PAGE /reports/view
PAGE /templates
PAGE /templates/edit
PAGE /templates/overview
PAGE /brand-reports
PAGE /brand-reports/new
PAGE /brand-reports/view
PAGE /brand-templates
PAGE /brand-templates/edit
PAGE /backlinks
PAGE /analytics
PAGE /ranking
PAGE /analysis
PAGE /mention-scan
PAGE /resources
PAGE /about-us
PAGE /faq
PAGE /contact
PAGE /admin/errors
API search
API parse
API reports
API brand_reports
API backlinks
API mentions
API analytics
API keywords
API screenshots
API email
server.py auto-discovers any .py with a router export. compile.py compiles pages. Caddy rewrites clean URLs: markupschema.com/X → /ms/X, baselinelabs.ai/X → /geo/X internally.
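A minimal sketch of the discovery step, assuming "router export" means a module-level variable named `router` and that modules are loaded by file path; `discover_routers` is a hypothetical name and server.py may work differently. It only collects the objects, so it runs without FastAPI installed.

```python
import importlib.util
from pathlib import Path


def discover_routers(api_dir: Path) -> dict[str, object]:
    """Import every .py under api_dir and collect module-level `router` objects."""
    routers = {}
    for path in sorted(api_dir.rglob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helpers (an assumed convention)
        module_name = ".".join(path.relative_to(api_dir).with_suffix("").parts)
        spec = importlib.util.spec_from_file_location(module_name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        router = getattr(module, "router", None)
        if router is not None:
            routers[module_name] = router
    return routers
```

Mounting the results would then be a loop calling FastAPI's `app.include_router(router)` for each collected `APIRouter`, which is what makes adding an API a matter of dropping in a new file.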

Billing System

Three layers: Stripe subscriptions, tier limits (hard caps), and unified credits (usage-based). Admin users and internal API keys bypass all checks.

1
Stripe Subscriptions
core/api/billing.py → shared.subscriptions
One subscription per user per category. Three categories: ms_api (Schema API), ms_rolling (Rolling Schema), geo (GEO Studio). Checkout, plan changes, and cancellations via Stripe. Webhooks sync status to DB.
2
Tier Limits (Hard Caps)
core/shared/credits.py → check_tier_limit()
Per-category caps enforced at resource creation. Not credits — these gate features.
GEO: Free=1 domain, Essentials=3, Classic=10, Select=100
MS API: Free=100 req/day, Starter=1k, Pro=10k
MS Rolling: 2 domains, 100 pages/hour
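A sketch of a tier-limit check using the caps listed above. `check_tier_limit()` is the name on this page, but its signature is not given, so `within_tier_limit` below is a hypothetical stand-in: it returns a boolean, lets admins bypass the cap, treats an unknown tier or metric as not allowed, and names the rolling tier `"rolling"` by assumption.

```python
# Caps transcribed from the tier list above: category -> tier -> metric -> cap.
TIER_LIMITS: dict[str, dict[str, dict[str, int]]] = {
    "geo": {
        "free": {"domains": 1},
        "essentials": {"domains": 3},
        "classic": {"domains": 10},
        "select": {"domains": 100},
    },
    "ms_api": {
        "free": {"requests_per_day": 100},
        "starter": {"requests_per_day": 1_000},
        "pro": {"requests_per_day": 10_000},
    },
    "ms_rolling": {
        "rolling": {"domains": 2, "pages_per_hour": 100},
    },
}


def within_tier_limit(category: str, tier: str, metric: str,
                      current: int, is_admin: bool = False) -> bool:
    """True if adding one more unit of `metric` stays within the tier's cap."""
    if is_admin:
        return True  # admin/internal bypass
    cap = TIER_LIMITS.get(category, {}).get(tier, {}).get(metric)
    return cap is not None and current + 1 <= cap
```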
3
Unified Credits (Usage-Based)
core/shared/credits.py → require_credits()
Single credit pool per user across all products. Deducted atomically before each operation. 402 response if insufficient.
Sources: Monthly subscription grant (invoice.paid webhook), top-up purchase, admin grant
Sinks: Search results (2-28 credits/result by engine), brand scans (50-200), reports, schema generation (100)
4
Enforcement Points
All auth paths → same user_id → same credit balance
Browser cookies, API keys, and MCP/OAuth tokens all resolve to the same user_id. Credit checks and tier limits apply identically regardless of auth method. Admin users (is_admin=true) and internal API keys bypass all checks — operations are logged but not deducted.
Credit Tables (shared.*)
credit_balances user_id PK, balance, updated_at
credit_ledger append-only audit: amount, source, category, operation, reference_id
subscriptions user_id + category UK, stripe IDs, status, period dates
One balance per user | cross-product | atomic deducts
MS Tiers + Credits
Free 100 req/day, 50 credits/mo
Starter 1k req/day, 500 credits/mo — €29
Pro 10k req/day, 2k credits/mo — €99
Rolling 2 domains, 100 pg/hr — €19
GEO Tiers + Credits
Free 1 domain, 50 credits/mo
Essentials 3 domains, 1k credits/mo — €19
Classic 10 domains, 5k credits/mo — €39
Select 100 domains, 50k credits/mo — €59
Webhook flow: invoice.paid → _grant_monthly_credits() → grant_credits() → credit_balances += monthly_credits.
Usage flow: API call → require_credits(amount) → atomic UPDATE ... WHERE balance >= amount → credit_ledger entry, or 402 if the balance is insufficient.
Top-ups: separate Stripe product → same grant_credits() path → same balance pool.