RAG in Python: Build a GDPR-Safe Document Search API with EU-Hosted Inference

Build a production-ready retrieval-augmented generation (RAG) system in Python that keeps all data within the EU. This guide covers document ingestion with PyMuPDF, vector storage with Qdrant, and LLM inference through Juice Factory's private EU API — all wrapped in a FastAPI service.

By the end, you'll have a working document search API that:

  • Extracts text from PDFs using PyMuPDF
  • Generates embeddings and stores them in Qdrant
  • Answers questions using retrieved context + EU-hosted LLM inference
  • Never sends user data outside the EU

Prerequisites

  • Python 3.10+
  • Docker (for Qdrant)
  • A Juice Factory API key (get one here)

Architecture Overview

┌──────────────┐     ┌───────────────┐     ┌──────────────────┐
│  PDF Upload  │────▶│  PyMuPDF      │────▶│  Qdrant          │
│  (FastAPI)   │     │  Text Extract │     │  Vector Store    │
└──────────────┘     └───────────────┘     └──────────────────┘
                                                    │
┌──────────────┐     ┌───────────────┐              │
│  User Query  │────▶│  Embedding    │──── search ──┘
│  (FastAPI)   │     │  (EU API)     │
└──────────────┘     └───────┬───────┘
                             │
                     ┌───────▼───────┐     ┌──────────────────┐
                     │  Context +    │────▶│  LLM Inference   │
                     │  Query        │     │  (EU-hosted)     │
                     └───────────────┘     └──────────────────┘

The system follows a standard RAG pipeline, but every component that touches user data runs within EU infrastructure. Qdrant is self-hosted, and both embeddings and LLM inference route through Juice Factory's EU endpoints.


Step 1: Project Setup

Create the project directory and install dependencies:

mkdir rag-document-search && cd rag-document-search
python -m venv .venv
source .venv/bin/activate

Install the required packages:

pip install fastapi uvicorn pymupdf qdrant-client openai python-multipart

Create the project structure:

rag-document-search/
├── main.py              # FastAPI application
├── ingest.py            # Document ingestion pipeline
├── search.py            # Query and retrieval logic
├── config.py            # Configuration
└── requirements.txt

requirements.txt:

fastapi==0.115.0
uvicorn==0.30.0
pymupdf==1.24.0
qdrant-client==1.11.0
openai==1.50.0
python-multipart==0.0.9

Step 2: Configuration

Set up the configuration with your Juice Factory API credentials:

# config.py
import os

# Juice Factory EU API (OpenAI-compatible)
API_BASE_URL = "https://api.juicefactory.ai/v1"
API_KEY = os.environ.get("JUICEFACTORY_API_KEY", "your-api-key")

# Embedding model
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIMENSIONS = 1536

# Chat model for RAG responses
CHAT_MODEL = "gpt-4"

# Qdrant configuration (self-hosted in EU)
QDRANT_HOST = os.environ.get("QDRANT_HOST", "localhost")
QDRANT_PORT = int(os.environ.get("QDRANT_PORT", "6333"))
COLLECTION_NAME = "documents"

# Chunk settings
CHUNK_SIZE = 500       # words per chunk (a rough proxy for tokens)
CHUNK_OVERLAP = 50     # words of overlap between chunks
TOP_K = 5              # number of chunks to retrieve

Step 3: Start Qdrant with Docker

Run Qdrant locally (or on your EU server):

docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  -v qdrant_storage:/qdrant/storage \
  qdrant/qdrant:latest

Qdrant stores all data locally — no external calls, no telemetry, full control over data location.


Step 4: Document Ingestion with PyMuPDF

The ingestion pipeline extracts text from PDFs, splits it into chunks, generates embeddings via the EU API, and stores everything in Qdrant.

# ingest.py
import fitz  # PyMuPDF
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import uuid
import config


def get_openai_client() -> OpenAI:
    """Create OpenAI client pointing to Juice Factory EU API."""
    return OpenAI(
        api_key=config.API_KEY,
        base_url=config.API_BASE_URL,
    )


def get_qdrant_client() -> QdrantClient:
    """Create Qdrant client."""
    return QdrantClient(host=config.QDRANT_HOST, port=config.QDRANT_PORT)


def extract_text_from_pdf(pdf_bytes: bytes) -> list[dict]:
    """Extract text from PDF, page by page."""
    doc = fitz.open(stream=pdf_bytes, filetype="pdf")
    pages = []
    for page_num, page in enumerate(doc):
        text = page.get_text("text").strip()
        if text:
            pages.append({
                "page": page_num + 1,
                "text": text,
            })
    doc.close()
    return pages


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks by word count."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = start + chunk_size
        chunk = " ".join(words[start:end])
        chunks.append(chunk)
        start = end - overlap
    return chunks


def generate_embeddings(texts: list[str], client: OpenAI) -> list[list[float]]:
    """Generate embeddings using Juice Factory EU API."""
    response = client.embeddings.create(
        model=config.EMBEDDING_MODEL,
        input=texts,
    )
    return [item.embedding for item in response.data]


def ensure_collection(qdrant: QdrantClient):
    """Create Qdrant collection if it doesn't exist."""
    collections = [c.name for c in qdrant.get_collections().collections]
    if config.COLLECTION_NAME not in collections:
        qdrant.create_collection(
            collection_name=config.COLLECTION_NAME,
            vectors_config=VectorParams(
                size=config.EMBEDDING_DIMENSIONS,
                distance=Distance.COSINE,
            ),
        )


def ingest_pdf(pdf_bytes: bytes, filename: str) -> int:
    """Full ingestion pipeline: PDF → chunks → embeddings → Qdrant."""
    openai_client = get_openai_client()
    qdrant = get_qdrant_client()
    ensure_collection(qdrant)

    # Extract text from PDF
    pages = extract_text_from_pdf(pdf_bytes)

    # Chunk all pages
    all_chunks = []
    for page_data in pages:
        chunks = chunk_text(
            page_data["text"],
            chunk_size=config.CHUNK_SIZE,
            overlap=config.CHUNK_OVERLAP,
        )
        for chunk in chunks:
            all_chunks.append({
                "text": chunk,
                "page": page_data["page"],
                "filename": filename,
            })

    if not all_chunks:
        return 0

    # Generate embeddings (batch)
    texts = [c["text"] for c in all_chunks]
    embeddings = generate_embeddings(texts, openai_client)

    # Store in Qdrant
    points = [
        PointStruct(
            id=str(uuid.uuid4()),
            vector=embedding,
            payload={
                "text": chunk["text"],
                "page": chunk["page"],
                "filename": chunk["filename"],
            },
        )
        for chunk, embedding in zip(all_chunks, embeddings)
    ]

    qdrant.upsert(
        collection_name=config.COLLECTION_NAME,
        points=points,
    )

    return len(points)

Key points:

  • PyMuPDF (fitz) extracts text without external dependencies or cloud calls
  • Embeddings are generated through Juice Factory's EU API — same OpenAI SDK, EU endpoint
  • Qdrant stores vectors locally with no telemetry

Step 5: Search and RAG Query

The search module embeds the user query, retrieves relevant chunks, and sends them with the question to the LLM.

# search.py
import config
from ingest import get_openai_client, get_qdrant_client, generate_embeddings


def search_documents(query: str, top_k: int | None = None) -> list[dict]:
    """Search for relevant document chunks."""
    if top_k is None:
        top_k = config.TOP_K

    openai_client = get_openai_client()
    qdrant = get_qdrant_client()

    # Embed the query
    query_embedding = generate_embeddings([query], openai_client)[0]

    # Search Qdrant
    results = qdrant.search(
        collection_name=config.COLLECTION_NAME,
        query_vector=query_embedding,
        limit=top_k,
    )

    return [
        {
            "text": hit.payload["text"],
            "page": hit.payload["page"],
            "filename": hit.payload["filename"],
            "score": hit.score,
        }
        for hit in results
    ]


def rag_query(question: str) -> dict:
    """Full RAG pipeline: embed query → retrieve context → generate answer."""
    # Retrieve relevant chunks
    chunks = search_documents(question)

    if not chunks:
        return {
            "answer": "No relevant documents found. Please upload documents first.",
            "sources": [],
        }

    # Build context from retrieved chunks
    context_parts = []
    for i, chunk in enumerate(chunks, 1):
        context_parts.append(
            f"[Source {i}: {chunk['filename']}, page {chunk['page']}]\n{chunk['text']}"
        )
    context = "\n\n".join(context_parts)

    # Generate answer using EU-hosted LLM
    openai_client = get_openai_client()
    response = openai_client.chat.completions.create(
        model=config.CHAT_MODEL,
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a document assistant. Answer questions based on the "
                    "provided context. Always cite which source and page number your "
                    "answer comes from. If the context doesn't contain enough "
                    "information to answer, say so clearly."
                ),
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}",
            },
        ],
        temperature=0.1,
        max_tokens=1000,
    )

    return {
        "answer": response.choices[0].message.content,
        "sources": [
            {
                "filename": c["filename"],
                "page": c["page"],
                "score": round(c["score"], 4),
                "excerpt": c["text"][:200] + "..." if len(c["text"]) > 200 else c["text"],
            }
            for c in chunks
        ],
        "model": response.model,
        "usage": {
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
        },
    }

The rag_query function is the core of the system:

  1. Embeds the user question via EU API
  2. Retrieves the top-K most relevant chunks from Qdrant
  3. Sends context + question to the EU-hosted LLM
  4. Returns the answer with source citations

Step 6: FastAPI Application

Wire everything together with a FastAPI service:

# main.py
from fastapi import FastAPI, UploadFile, File, HTTPException
from pydantic import BaseModel
from ingest import ingest_pdf
from search import rag_query, search_documents

app = FastAPI(
    title="GDPR-Safe Document Search API",
    description="RAG-powered document search with EU-hosted inference",
    version="1.0.0",
)


class QueryRequest(BaseModel):
    question: str
    top_k: int = 5


class QueryResponse(BaseModel):
    answer: str
    sources: list[dict]
    model: str | None = None
    usage: dict | None = None


@app.post("/upload")
async def upload_document(file: UploadFile = File(...)):
    """Upload a PDF document for indexing."""
    if not file.filename.lower().endswith(".pdf"):
        raise HTTPException(status_code=400, detail="Only PDF files are supported")

    pdf_bytes = await file.read()
    if len(pdf_bytes) > 50 * 1024 * 1024:  # 50MB limit
        raise HTTPException(status_code=400, detail="File too large (max 50MB)")

    num_chunks = ingest_pdf(pdf_bytes, file.filename)

    return {
        "filename": file.filename,
        "chunks_indexed": num_chunks,
        "status": "indexed",
    }


@app.post("/query", response_model=QueryResponse)
async def query_documents(request: QueryRequest):
    """Ask a question about uploaded documents."""
    if not request.question.strip():
        raise HTTPException(status_code=400, detail="Question cannot be empty")

    result = rag_query(request.question)
    return QueryResponse(**result)


@app.post("/search")
async def search_only(request: QueryRequest):
    """Search for relevant chunks without generating an answer."""
    results = search_documents(request.question, top_k=request.top_k)
    return {"results": results}


@app.get("/health")
async def health():
    """Health check endpoint."""
    return {"status": "ok", "data_residency": "EU"}

Step 7: Run and Test

Start the API server:

export JUICEFACTORY_API_KEY="your-api-key"
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Upload a document

curl -X POST http://localhost:8000/upload \
  -F "file=@contract.pdf"

Response:

{
  "filename": "contract.pdf",
  "chunks_indexed": 47,
  "status": "indexed"
}

Ask a question

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the payment terms in the contract?"}'

Response:

{
  "answer": "According to the contract (Source 1, page 4), payment terms are Net 30 from the date of invoice. Late payments accrue interest at 1.5% per month as specified in Section 5.2.",
  "sources": [
    {
      "filename": "contract.pdf",
      "page": 4,
      "score": 0.9234,
      "excerpt": "Payment Terms. The Client shall pay all invoices within thirty (30) days..."
    }
  ],
  "model": "gpt-4-0125-preview",
  "usage": {
    "prompt_tokens": 847,
    "completion_tokens": 89
  }
}

GDPR Compliance Checklist

This architecture satisfies GDPR requirements at each layer:

| Component | Data Handling | GDPR Compliance |
|---|---|---|
| PDF Upload | Files processed in memory, text extracted locally | No external data transfer |
| Embeddings | Generated via Juice Factory EU API | EU data residency, no retention |
| Vector Store | Self-hosted Qdrant, EU infrastructure | Full control over data location |
| LLM Inference | Juice Factory EU API, stateless processing | No query storage, no training use |
| API Server | Your infrastructure, your logging policy | Application-level control |

Key guarantees:

  • User queries never leave the EU
  • No data is used for model training
  • Qdrant stores only embeddings (not raw queries)
  • LLM inference is stateless — queries are not retained
  • You control all logging and data retention policies

Production Considerations

Scaling Qdrant

For production deployments with large document collections:

# Run Qdrant with persistent storage and resource limits
docker run -d \
  --name qdrant \
  -p 6333:6333 \
  --memory=4g \
  -v /data/qdrant:/qdrant/storage \
  qdrant/qdrant:latest

For collections exceeding 10M vectors, consider Qdrant's distributed mode with sharding across multiple EU-hosted nodes.

Chunking Strategy

The simple word-count chunking in this guide works for most documents. For better results with structured documents:

  • Semantic chunking: Split on paragraph or section boundaries
  • Sliding window: Use overlapping chunks to avoid splitting context
  • Metadata enrichment: Include section headers, document titles, and dates in chunk metadata
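To make the first bullet concrete, here is a minimal semantic-chunking sketch that splits on blank lines (paragraph boundaries) and merges short paragraphs up to a word budget. The function name and budget are illustrative, not part of the guide's code:

```python
# Paragraph-boundary chunking: keep paragraphs intact, merging small ones
# until a word budget is reached (illustrative sketch).
def chunk_by_paragraph(text: str, max_words: int = 200) -> list[str]:
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, count = [], [], 0
    for para in paragraphs:
        words = len(para.split())
        # Flush the current chunk if adding this paragraph would exceed the budget.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks

doc = "First paragraph here.\n\nSecond paragraph follows.\n\nThird one."
print(chunk_by_paragraph(doc, max_words=6))
```

Unlike the word-count splitter, this never cuts a paragraph in half, which tends to improve retrieval quality on documents with clear section structure.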

Error Handling

Add retry logic for API calls and handle Qdrant connection failures:

# pip install tenacity (not included in requirements.txt above)
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, max=10))
def generate_embeddings_with_retry(texts, client):
    return generate_embeddings(texts, client)

Authentication

Add API key authentication to your FastAPI endpoints for production:

import os

from fastapi import Depends, HTTPException, Security
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key")

async def verify_api_key(api_key: str = Security(api_key_header)):
    if api_key != os.environ.get("APP_API_KEY"):
        raise HTTPException(status_code=403, detail="Invalid API key")
    return api_key

@app.post("/query", dependencies=[Depends(verify_api_key)])
async def query_documents(request: QueryRequest):
    ...

Summary

This guide demonstrates a complete RAG pipeline that maintains GDPR compliance throughout:

  1. Document ingestion: PyMuPDF extracts text locally, no cloud dependencies
  2. Embeddings: Generated through Juice Factory's EU API with no data retention
  3. Vector storage: Self-hosted Qdrant keeps all indexed data under your control
  4. LLM inference: EU-hosted, stateless processing with no query storage
  5. API layer: FastAPI gives you full control over access, logging, and data handling

The entire system can be deployed on EU infrastructure with no data leaving the region. Switching from a non-compliant setup is straightforward — replace the API base URL, point embeddings at the EU endpoint, and self-host your vector store.


Related Guides