Fincher Labs RAG Agent Implementation Guide (n8n + Cloudflare)

Internal Use Only — Fincher Labs Confidential

Document Control

  • Document Title: Fincher Labs RAG Agent Implementation Guide (n8n + Cloudflare)
  • Document ID: FL-IMP-002
  • Version: 1.0
  • Last Updated: 2025-08-11
  • Author: Fincher Labs
  • Status: Final
  • Distribution: Internal (Fincher Labs Staff and Contractors Only)
  • Approver: Fincher Labs Founders

Change Log

Version  Date        Author        Changes
1.0      2025-08-11  Fincher Labs  Initial implementation guide for n8n + Cloudflare RAG, Docusaurus embedding, CI/CD sync, security, and runbook.

Table of Contents

  1 Executive Summary
  2 Architecture Overview
  3 Cloudflare Platform Setup
  4 GitHub Content Sync (content/)
  5 Ingestion Workflow (n8n)
  6 Retrieval & Answering
  7 Frontend Chat & Docusaurus Embed
  8 Security
  9 CI/CD & Environments
  10 Observability & Performance
  11 Failure Modes & Runbook
  12 Cost & Scaling Notes
  13 Implementation Steps (Checklist)

1 Executive Summary

This guide describes how to implement a Retrieval-Augmented Generation (RAG) agent for Fincher Labs that indexes the private GitHub repository’s content/ folder and stays continuously up to date. It uses Cloudflare Vectorize for storage, Workers AI for embeddings and responses, AI Gateway for caching and rate limiting, and n8n for the ingestion, retrieval, and chat orchestration. A lightweight chat widget is mounted in Docusaurus so the agent is accessible on our docs site. The design optimizes for speed, safety, and maintainability.

2 Architecture Overview

2.1 Data Flow

  1. Sync: n8n polls GitHub for changes to content/ (compare API for deltas; tree for full resync).
  2. Ingest: For changed files, n8n fetches the raw content, normalizes the Markdown, and chunks it into ~700–1,000-token pieces with 80–120 tokens of overlap.
  3. Embed: n8n calls Workers AI @cf/baai/bge-m3 to create 1024‑dimensional embeddings.
  4. Store: n8n upserts chunks + metadata to Cloudflare Vectorize via NDJSON.
  5. Serve: User asks a question in Docusaurus widget → n8n retrieves matches from Vectorize, optionally re‑ranks, and composes a grounded answer with source links.

2.2 Components

  • Cloudflare Vectorize: Vector DB (indexes + metadata indexes; filterable search).
  • Cloudflare Workers AI: Embeddings (bge-m3) and a text‑gen model (e.g., Llama 3.1 8B Instruct) for final answer drafting.
  • Cloudflare AI Gateway: Caching, rate limits, retries, model fallback; observability.
  • n8n: Workflows for ingestion (GitHub → Vectorize) and Q&A (Query → Rerank → Answer).
  • Docusaurus: Docs website hosting the chat widget; optional Cloudflare Pages deployment.

3 Cloudflare Platform Setup

3.1 Vectorize (Vector DB)

  • Create index with dimension: 1024 (bge‑m3 dense vectors). Add a metadata index on docId/path for fast deletes/filters.
  • Upsert via application/x-ndjson stream; each line contains id, values, and metadata.
  • Query supports filters on metadata; choose topK (≤ 100; 20 if returning full values/metadata) and return score and metadata.
  • Index info exposes processedUpToMutation to safely wait until mutations are searchable.

Sample: create index

curl -X POST "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/vectorize/v2/indexes" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "fincher-docs",
    "config": {
      "dimensions": 1024,
      "metric": "cosine"
    }
  }'

Sample: create metadata indexes

# Create an index on "docId"
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/vectorize/v2/indexes/fincher-docs/metadata_index/create" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"propertyName": "docId", "indexType": "string"}'

# Create an index on "path"
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/vectorize/v2/indexes/fincher-docs/metadata_index/create" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"propertyName": "path", "indexType": "string"}'

3.2 Workers AI (Embeddings & LLM)

  • Embeddings: Use @cf/baai/bge-m3 for multilingual, long‑context embeddings (1024‑dimensional, up to ~8k tokens). Batch chunk arrays where possible.
  • LLM: Use a Workers AI text model for answer drafting (for example @cf/meta/llama-3.1-8b-instruct or the -fast variant for latency).

Sample: embed with Workers AI REST

curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/run/@cf/baai/bge-m3"   -H "Authorization: Bearer $CF_API_TOKEN"   -H "Content-Type: application/json"   -d '{"text": ["Chunk 1 text...", "Chunk 2 text..."]}'

3.3 AI Gateway (Caching, Rate Limits, Retries)

  • Create an AI Gateway and route Workers AI (and any external model calls) through it for analytics, caching, rate limiting, automatic retries (up to 5), and model fallbacks. Enable caching so identical embedding requests (for example, during a reprocess) are served from cache.
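
For reference, a minimal Node 18+ sketch of calling the embedding model through the gateway. The gateway name fincher-gateway is an assumption (create yours in the dashboard); the URL pattern is Cloudflare's documented workers-ai provider route.

// Minimal sketch (assumed gateway name "fincher-gateway"; Node 18+ ESM).
const ACCOUNT_ID = process.env.ACCOUNT_ID;
const CF_API_TOKEN = process.env.CF_API_TOKEN;

const res = await fetch(
  `https://gateway.ai.cloudflare.com/v1/${ACCOUNT_ID}/fincher-gateway/workers-ai/@cf/baai/bge-m3`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${CF_API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text: ["Chunk 1 text..."] }),
  }
);
const { result } = await res.json();
// result.data holds one 1024-dimensional vector per input string.
console.log(result.data.length, result.data[0].length);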

3.4 Zero Trust Access & Service Tokens

  • Protect the n8n instance (and any custom endpoints) behind Cloudflare Access. Issue Service Tokens (Client ID/Secret) for machine‑to‑machine webhook or chat calls.
  • If exposing webhooks, configure bypass or service‑auth rules only for the required routes.
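
A minimal sketch of a machine-to-machine call through Access. The hostname and route are placeholders for our deployment; the two CF-Access-* headers are the standard Cloudflare service-token headers.

// Minimal sketch: invoke an Access-protected n8n webhook with a Service Token.
const res = await fetch("https://YOUR_N8N/webhook/full-resync", {
  method: "POST",
  headers: {
    "CF-Access-Client-Id": process.env.CF_ACCESS_CLIENT_ID,
    "CF-Access-Client-Secret": process.env.CF_ACCESS_CLIENT_SECRET,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ reason: "manual full resync" }),
});
if (!res.ok) throw new Error(`Access or webhook error: ${res.status}`);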

3.5 Optional: R2 Object Storage

  • If needed, store raw text snapshots or large binary assets in R2. Use pre‑signed URLs for secure, time‑bound upload/download. Keep vector store as the source of truth for search.
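
If we adopt R2, a sketch of minting a time-bound download URL via the S3-compatible API. It assumes the AWS SDK v3 packages (@aws-sdk/client-s3, @aws-sdk/s3-request-presigner) and an R2 access key pair; the bucket and key names are illustrative.

// Minimal sketch: 15-minute pre-signed download URL for an R2 object.
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const r2 = new S3Client({
  region: "auto",
  endpoint: `https://${process.env.ACCOUNT_ID}.r2.cloudflarestorage.com`,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID,
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY,
  },
});

const url = await getSignedUrl(
  r2,
  new GetObjectCommand({ Bucket: "fincher-snapshots", Key: "content/styleguide.md" }),
  { expiresIn: 900 } // seconds
);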

4 GitHub Content Sync (content/)

4.1 Authentication

Use either:

  • Fine‑grained PAT with Contents: read (and Metadata: read for compare), or
  • GitHub App installation token with Contents: read on the repo.

4.2 Listing & Fetching Files

  • List tree (recursive) to find Markdown under content/.
  • Fetch content for changed files; response is Base64 for raw file bodies.
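
A minimal sketch of fetching one file and decoding it; OWNER/REPO are placeholders for our private repository.

// Minimal sketch: fetch a file via the GitHub Contents API and decode it.
const res = await fetch(
  "https://api.github.com/repos/OWNER/REPO/contents/content/styleguide.md",
  { headers: { Authorization: `Bearer ${process.env.GITHUB_TOKEN}` } }
);
const file = await res.json();
// The Contents API returns file bodies Base64-encoded.
const markdown = Buffer.from(file.content, "base64").toString("utf8");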

4.3 Change Detection (fast)

  • Use Compare two commits (base...head) to fetch changed files since the last processed commit; paginate if needed.
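
A sketch of the delta step, assuming the last processed SHA is persisted (for example, in n8n workflow static data); OWNER/REPO are placeholders.

// Minimal sketch: list files changed under content/ since the last run.
const lastSha = process.env.LAST_PROCESSED_SHA;
const res = await fetch(
  `https://api.github.com/repos/OWNER/REPO/compare/${lastSha}...main`,
  { headers: { Authorization: `Bearer ${process.env.GITHUB_TOKEN}` } }
);
const { files } = await res.json();
const changed = files.filter((f) => f.filename.startsWith("content/"));
// f.status is "added", "modified", "removed", or "renamed"; route removed and
// renamed files to the deletion path in section 5.4.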

4.4 Full Resync (safe)

  • For cold start or schema changes, walk the Git tree recursively; use ETags with If-None-Match for conditional GETs to minimize rate usage.
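
A sketch of the conditional GET; loadEtag/saveEtag are hypothetical persistence helpers standing in for wherever we store the ETag between runs.

// Minimal sketch: conditional GET of the recursive tree; a 304 means nothing
// changed since the stored ETag, so the resync can be skipped.
const headers = { Authorization: `Bearer ${process.env.GITHUB_TOKEN}` };
const storedEtag = await loadEtag(); // hypothetical persistence helper
if (storedEtag) headers["If-None-Match"] = storedEtag;

const res = await fetch(
  "https://api.github.com/repos/OWNER/REPO/git/trees/main?recursive=1",
  { headers }
);
if (res.status !== 304) {
  await saveEtag(res.headers.get("etag")); // hypothetical persistence helper
  const tree = await res.json();
  // ...walk tree.tree for paths starting with "content/"
}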

5 Ingestion Workflow (n8n)

5.1 Split, Normalize, Chunk

  • Normalize Markdown (join wrapped lines, fix hyphenation, strip artifacts).
  • Chunk to ~700–1,000 tokens with 80–120 tokens of overlap. Store metadata per chunk: docId, path, title, version, lastUpdated (from front‑matter or commit), url (docs site route), hash (content digest), and seq (chunk index).
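
A minimal chunking sketch, under the assumption of roughly 4 characters per token; swap in a real tokenizer if counts must be exact.

// Minimal chunking sketch; token counts approximated at ~4 chars per token.
function chunkMarkdown(text, { maxTokens = 900, overlapTokens = 100 } = {}) {
  const maxChars = maxTokens * 4;
  const overlapChars = overlapTokens * 4;
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxChars, text.length);
    // Prefer to break at a paragraph boundary near the limit.
    const para = text.lastIndexOf("\n\n", end);
    if (para > start + maxChars / 2) end = para;
    chunks.push(text.slice(start, end));
    if (end >= text.length) break;
    start = end - overlapChars; // carry overlap into the next chunk
  }
  return chunks;
}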

5.2 Embeddings (bge-m3)

  • Batch text arrays for throughput. Keep request sizes below provider limits. Capture the response shape and token counts, where available, for telemetry.

5.3 Upsert to Vectorize

Send NDJSON lines with stable IDs like "{docId}:{seq}:{hash}" so updates are idempotent. After the stream completes, poll index/info until processedUpToMutation ≥ the returned mutationId.

Sample: NDJSON upsert

# Each line is a JSON object
{"id":"FL-SG-001:12:ab12","values":[0.12, ... 1024 dims ...],"metadata":{"docId":"FL-SG-001","path":"content/styleguide.md","seq":12,"hash":"ab12","title":"Documentation Styleguide"}}
{"id":"FL-SG-001:13:ab12","values":[0.09, ...],"metadata":{"docId":"FL-SG-001","path":"content/styleguide.md","seq":13,"hash":"ab12"}}

5.4 Deletion on Rename/Remove

  • If a file is deleted or renamed, remove its vectors via delete-by-IDs. To collect the IDs, first query the index with a metadata filter (for example, docId or path) and extract id for each match, then call delete_by_ids in batches. Maintain a stable id scheme (for example, "{docId}:{seq}:{hash}") so you can also reconstruct IDs without querying in emergencies.
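
A sketch of the batch delete. `ids` comes from the filtered query described above or is reconstructed from the "{docId}:{seq}:{hash}" scheme recorded at ingest; the batch size is an assumption to tune.

// Minimal sketch: batch deletion of a document's vectors by ID.
const base = `https://api.cloudflare.com/client/v4/accounts/${process.env.ACCOUNT_ID}/vectorize/v2/indexes/fincher-docs`;
const BATCH = 100; // assumed batch size
for (let i = 0; i < ids.length; i += BATCH) {
  await fetch(`${base}/delete_by_ids`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.CF_API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ ids: ids.slice(i, i + BATCH) }),
  });
}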

6 Retrieval & Answering

6.1 Vectorize Query + Filter

  • Embed the user’s question with bge‑m3 and query Vectorize with the resulting vector.
  • Use metadata filters (for example, only path under content/) to scope results. Choose topK (≤ 100; if returnValues/returnMetadata is true, effective topK for full payload is 20). Return score and the stored metadata for citation.

Sample: query with filter

# The filter emulates a prefix match on "content/" using string ranges,
# per the Vectorize metadata filtering docs.
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/vectorize/v2/indexes/fincher-docs/query" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [ ...1024 floats... ],
    "topK": 12,
    "returnValues": false,
    "returnMetadata": true,
    "filter": {"path": {"$gte": "content/", "$lt": "content0"}}
  }'

6.2 Optional Reranker

  • Rerank the retrieved chunks with a cross‑encoder reranker model (for example, @cf/baai/bge-reranker-base on Workers AI) to improve answer quality at small cost. Keep the final context under the LLM’s context window.
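
A sketch of the rerank step, assuming chunk text was stored in metadata at ingest (add a text field in 5.1 if adopting this). The input/output shape follows the Workers AI model schema but should be verified against the current docs.

// Minimal sketch: rerank Vectorize matches with bge-reranker-base.
const rr = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${process.env.ACCOUNT_ID}/ai/run/@cf/baai/bge-reranker-base`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.CF_API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      query: userQuestion,
      contexts: matches.map((m) => ({ text: m.metadata.text })),
    }),
  }
);
const { result } = await rr.json();
// result.response is a list of { id, score }; id indexes into `contexts`.
const top = result.response
  .sort((a, b) => b.score - a.score)
  .slice(0, 5)
  .map((r) => matches[r.id]);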

6.3 Compose Final Answer

  • Prompt the LLM with the top N chunks (post‑rerank), instructing “answer strictly from the provided context; cite path + title”. Add a safety fallback: if confidence is low or no chunks meet a threshold, respond with “not found in docs”.
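
A sketch of the composition step, continuing from the rerank sketch above (where `top` holds the best matches). MIN_SCORE is an assumed cutoff on the Vectorize similarity score; tune it against real queries.

// Minimal sketch: build a grounded prompt and draft the answer.
const MIN_SCORE = 0.55; // assumed similarity cutoff
const kept = top.filter((m) => m.score >= MIN_SCORE);

const context = kept
  .map((m, i) => `[${i + 1}] ${m.metadata.title} (${m.metadata.path})\n${m.metadata.text}`)
  .join("\n\n");

const answer = kept.length === 0
  ? "Not found in docs."
  : (await (await fetch(
      `https://api.cloudflare.com/client/v4/accounts/${process.env.ACCOUNT_ID}/ai/run/@cf/meta/llama-3.1-8b-instruct`,
      {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.CF_API_TOKEN}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          messages: [
            {
              role: "system",
              content:
                "Answer strictly from the provided context. Cite path and title. " +
                "If the context does not contain the answer, say so.",
            },
            { role: "user", content: `Context:\n${context}\n\nQuestion: ${userQuestion}` },
          ],
        }),
      }
    )).json()).result.response;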

7 Frontend Chat & Docusaurus Embed

7.1 n8n Chat Trigger

  • Create a Chat Trigger in n8n; set Allowed origins to your docs domain(s). Use a Respond to Chat node to connect to the retrieval workflow. Enable streaming for better UX.

Client snippet (Docusaurus or any site; @n8n/chat ships as an ES module bundle):

<link href="https://cdn.jsdelivr.net/npm/@n8n/chat/dist/style.css" rel="stylesheet" />
<div id="fincher-chat"></div>
<script type="module">
import { createChat } from "https://cdn.jsdelivr.net/npm/@n8n/chat/dist/chat.bundle.es.js";

createChat({
  webhookUrl: "https://YOUR_N8N/chat/YOUR_TRIGGER_ID",
  target: "#fincher-chat",
  mode: "embedded",
  allowFileUploads: false,
  enableStreaming: true,
  i18n: { en: { title: "Fincher Labs Assistant" } }, // title shown in the chat header
});
</script>

7.2 Docusaurus Integration

In docusaurus.config.js, load the widget stylesheet globally; the chat bundle itself is imported as an ES module inside the component, as in the snippet above:

export default {
  // ...
  stylesheets: [
    "https://cdn.jsdelivr.net/npm/@n8n/chat/dist/style.css",
  ],
};

Place <div id="fincher-chat"></div> and the createChat() call in a homepage or DocLayout component. For Cloudflare Pages or Workers hosting, set the site url and baseUrl accordingly.

8 Security

8.1 n8n Allowed Origins

Lock chat to your docs domain(s). If n8n sits behind Cloudflare Tunnel/Access, ensure the correct Origin header is passed or rewrite it via a Cloudflare Transform Rule if needed. Prefer Service Tokens for server‑to‑server calls.

8.2 Turnstile Validation

Protect the chat form with Turnstile. On the server side (Cloudflare Pages Function or Worker), validate tokens via the siteverify endpoint before invoking workflow logic. Rate‑limit failed validations, and cache successful ones only briefly (a few minutes).
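
A sketch of the server-side check in a Worker. The siteverify endpoint and its form fields are Cloudflare's documented API; the forwarding URL is a placeholder for our n8n chat webhook.

// Minimal sketch: validate a Turnstile token before forwarding to n8n.
export default {
  async fetch(request, env) {
    const { token, message } = await request.json();
    const form = new FormData();
    form.append("secret", env.TURNSTILE_SECRET);
    form.append("response", token);
    form.append("remoteip", request.headers.get("CF-Connecting-IP"));

    const verify = await fetch(
      "https://challenges.cloudflare.com/turnstile/v0/siteverify",
      { method: "POST", body: form }
    );
    const outcome = await verify.json();
    if (!outcome.success) return new Response("Forbidden", { status: 403 });

    // Token is valid — forward to the n8n chat webhook (URL assumed).
    return fetch("https://YOUR_N8N/chat/YOUR_TRIGGER_ID", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message }),
    });
  },
};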

9 CI/CD & Environments

  • Branching: Production index (fincher-docs) for main. For staging, use fincher-docs-stg and a separate n8n webhook.
  • Deploy: Docusaurus to Cloudflare Pages; n8n via Docker on our preferred host with Cloudflare Tunnel.
  • Secrets: Store all tokens (GitHub, Cloudflare, Service Tokens) in the platform’s secret manager. Never commit to Git.

10 Observability & Performance

  • Route embedding and generation calls through AI Gateway with Caching enabled.
  • Set rate limits sized to expected QPS; enable automatic retries (up to 5) and fallbacks.
  • Track hit rate, latency, token volumes, and top queries in the Gateway analytics to optimize chunk sizes and prompt length.

11 Failure Modes & Runbook

  • GitHub 304 / ETag flow: On 304, skip processing. If 412/409 with conditional ops, retry without precondition.
  • Vectorize upsert/query errors: Backoff and retry. If the index is re‑created, perform a controlled full resync.
  • Embedding timeouts: Shorten batches or use Gateway fallback to a secondary embedding model.
  • Tunnel/Access issues: Verify service token headers and Origin rewrite. Temporarily bypass Access for the chat route only, if necessary.
  • Widget 4xx: Check n8n Allowed origins and CORS. Confirm Turnstile server‑side validation path.

12 Cost & Scaling Notes

  • Vectorize: Cost scales with vector count and queries. Use de‑duplication by chunk hash to avoid re‑embedding unchanged content.
  • Workers AI: Choose -fast variants for interactive chat; cache embeddings via Gateway to reduce repeat costs.
  • Gateway: Tune cache TTL for embeddings (long), generation (short), and enable per‑route rate limits.

13 Implementation Steps (Checklist)

  1. Cloudflare: Create Vectorize index (1024), add metadata indexes (docId, path). Create AI Gateway and Workers AI token/binding.
  2. n8n: Build Ingestion workflow: GitHub → Diff/Tree → Fetch → Normalize → Chunk → Embed (Workers AI) → Vectorize Upsert (NDJSON) → Wait on processedUpToMutation.
  3. n8n: Build Q&A workflow: Chat Trigger → Embed question → Vectorize Query (+ filter) → Optional Reranker → LLM Answer (cite path + title) → Respond.
  4. Security: Put n8n behind Cloudflare Access (Service Token for machine calls). Set Allowed origins. Add Turnstile + server validation.
  5. Docusaurus: Add scripts + stylesheets, mount <div id="fincher-chat">, and link to the chat webhook.
  6. CI/CD: Configure separate staging/prod indexes + webhooks. Add full‑resync workflow callable on demand.
  7. Observability: Route through AI Gateway with caching/limits/retries; review analytics weekly.
