---
title: "Stateless LLM APIs and GDPR: Build AI Apps That Don't Store Your Users' Data"
description: "What stateless inference means, why it matters for GDPR compliance, and how to build a production chat application that gives you full control over data retention."
date: 2026-02-19
slug: stateless-llm-api-gdpr
keywords: ["stateless llm api", "gdpr compliant llm", "zero data retention ai", "stateless inference gdpr", "llm no data retention"]
category: guides
schema_types: ["TechArticle", "FAQPage"]
---

Stateless LLM APIs and GDPR: Build AI Apps That Don't Store Your Users' Data

Most LLM APIs are stateful by default. When you send a request to OpenAI, the prompt and response are retained — typically for 30 days — for abuse monitoring, analytics, and operational logging. That retained data creates GDPR obligations: retention schedules, erasure mechanisms, Article 28 agreements, and audit exposure.

Stateless inference flips this. The API processes your request in memory, generates a response, and immediately discards everything. No logs containing your content, no training data, nothing to erase.

This guide explains how stateless LLM architecture works, when it matters for GDPR compliance, and how to build a production-ready chat application using it. All examples use JuiceFactory's OpenAI-compatible API.

Get your free API key: juicefactory.ai/api-key — no credit card required.


What stateless inference actually means

Statelessness in an LLM API means the provider maintains no conversation state between requests. Every request is independent. The API receives the messages you send, processes them, returns a response, and frees the memory. Nothing persists.

This is different from how you might build a chat application, where you store conversation history in your database and send it with each request. The statefulness is in your application layer — not in the inference service. The API never knows about previous conversations unless you include them in the current request.

The key distinction from a compliance standpoint:

# Stateful pattern — provider maintains history
# Your prompt goes to their server, gets logged, stored, retained
import openai

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "My patient ID is 12345..."}]
)
# That patient ID now sits in OpenAI's logs for 30 days

# Stateless pattern — provider discards everything
from openai import OpenAI

client = OpenAI(
    api_key="jf-...",
    base_url="https://api.juicefactory.ai/v1"
)
response = client.chat.completions.create(
    model="qwen3",
    messages=[{"role": "user", "content": "My patient ID is 12345..."}]
)
# Request processed in memory, response returned, data discarded

The code looks nearly identical. The compliance implications are completely different.


Why this matters for GDPR

GDPR imposes specific requirements on how personal data is handled. For LLM applications, the most relevant are:

Article 5(1)(c) — Data minimization: Process only what's necessary. Retaining prompts for 30 days when you only needed the response for 200ms is hard to justify.

Article 5(1)(e) — Storage limitation: Data shouldn't be kept longer than necessary for its purpose. With stateful APIs, you're relying on the provider's retention schedule, not yours.

Article 17 — Right to erasure: Users can request deletion of their personal data. With stateless inference, there's nothing to delete — the data was never stored. With stateful providers, you need a mechanism to coordinate erasure requests across their infrastructure.

Article 28 — Processor agreements: Any service processing personal data on your behalf needs a DPA. Stateless providers have simpler DPAs because the scope of processing is limited to the immediate request.

The compliance value isn't just theoretical. In a healthcare application handling patient queries, or a legal tool processing case notes, you want to be able to tell your DPO: "The inference provider processes data transiently and retains nothing." That's a clean story. "They keep our prompts for 30 days, here's our DPA" is a harder conversation.


Building a stateless chat application

The slightly counterintuitive part: stateless inference doesn't mean you can't have multi-turn conversations. It means you manage the conversation state, not the provider. You store history in your database, send it with each request, and have full control over what gets retained and for how long.

Here's how to set it up:

Basic setup

from openai import OpenAI

client = OpenAI(
    api_key="your-juicefactory-api-key",
    base_url="https://api.juicefactory.ai/v1"
)

That's the only configuration change from a standard OpenAI setup. Everything else — request format, response parsing, streaming — is identical.

Conversation state management

from openai import OpenAI
from typing import List

client = OpenAI(
    api_key="your-juicefactory-api-key",
    base_url="https://api.juicefactory.ai/v1"
)

def chat(history: List[dict], message: str) -> tuple[str, List[dict]]:
    """
    Send a message with conversation history to the stateless API.
    Returns the response and the updated history.

    History lives in your application — the API never sees it
    except as part of the current request.
    """
    history.append({"role": "user", "content": message})

    response = client.chat.completions.create(
        model="qwen3",
        messages=history,
        max_tokens=500,
        temperature=0.7
    )

    assistant_message = response.choices[0].message.content
    history.append({"role": "assistant", "content": assistant_message})

    return assistant_message, history

# Example usage
conversation = []
reply, conversation = chat(conversation, "What is the GDPR right to erasure?")
print(reply)
reply, conversation = chat(conversation, "Does it apply to AI systems?")
print(reply)

The API processes each request fresh. You control what history you include — and how long you store it in your own database.
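Because you send the full history with every request, long conversations will eventually exceed the model's context window. A minimal sketch of client-side trimming; the function name and the 20-message cap are illustrative choices of ours, not part of JuiceFactory's API:

```python
from typing import List

def trim_history(history: List[dict], max_messages: int = 20) -> List[dict]:
    """Keep any system prompt plus the most recent messages.

    Trimming happens entirely in your application — the stateless API
    only ever sees the window you choose to send.
    """
    if len(history) <= max_messages:
        return history
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    # Reserve slots for the system prompt, fill the rest with the newest turns
    return system + rest[-(max_messages - len(system)):]
```

Call it on `history` just before `client.chat.completions.create` in the `chat` function above; everything the API receives is still discarded after the response.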

Production server with FastAPI

A complete stateless chat server with GDPR-compliant data handling:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Dict
from openai import OpenAI
import uvicorn

client = OpenAI(
    api_key="your-juicefactory-api-key",
    base_url="https://api.juicefactory.ai/v1"
)

app = FastAPI()

class Message(BaseModel):
    user_id: str
    content: str

class Reply(BaseModel):
    content: str
    message_count: int

# In production: replace with a proper database
# Using EU-hosted PostgreSQL or similar
conversations: Dict[str, List[dict]] = {}

@app.post("/chat", response_model=Reply)
async def chat(message: Message):
    history = conversations.get(message.user_id, [])
    history.append({"role": "user", "content": message.content})

    response = client.chat.completions.create(
        model="qwen3",
        messages=history,
        max_tokens=500,
        temperature=0.7
    )

    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    conversations[message.user_id] = history

    return Reply(content=reply, message_count=len(history))

@app.delete("/conversations/{user_id}")
async def delete_conversation(user_id: str):
    """
    GDPR Article 17 — Right to Erasure.
    User requests deletion: remove their conversation history.
    Because the inference provider stores nothing, this is the only
    place their data exists.
    """
    if user_id in conversations:
        del conversations[user_id]
        return {"status": "deleted"}
    raise HTTPException(status_code=404, detail="No conversation found")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

The Right to Erasure endpoint is trivial to implement — because you control the only copy of the data. With a stateful provider, you'd also need to coordinate with their infrastructure.


What zero data retention does and doesn't cover

"Zero data retention" from the inference provider means they don't store your prompts or responses. It doesn't mean your application has no retention — that's your responsibility.

What JuiceFactory doesn't retain:

  • Prompt content
  • Response content
  • Any derived data from your requests
  • Training data

What operational logs do contain (metadata only, not content):

  • Timestamps
  • Request IDs
  • Token counts
  • Latency measurements
  • Error status codes

These metadata fields don't contain personal data. They're what any infrastructure provider logs for billing and reliability purposes.
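The same discipline applies on your side: log metadata, never content. A sketch of pulling loggable fields from a chat completion response; the helper name is ours, and the field set simply mirrors the list above:

```python
import time

def request_metadata(response, started_at: float) -> dict:
    """Extract billing/reliability metadata from a chat completion
    response without ever touching prompt or response content."""
    usage = response.usage
    return {
        "request_id": response.id,
        "model": response.model,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "latency_ms": round((time.time() - started_at) * 1000),
    }
```

Feed the resulting dict to your structured logger; nothing in it is personal data, so it can live under your normal operational retention policy.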

Your application layer is where GDPR obligations live:

  • Store conversation history in an EU-hosted database
  • Implement configurable retention periods
  • Build the erasure endpoints (like the example above)
  • Log what you need for your purposes, under your policies
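Storage limitation (Article 5(1)(e)) then becomes a scheduled job in your own stack rather than a clause in someone else's DPA. A minimal sketch of a purge pass over timestamped conversations; the record shape (`last_active` field) and the 30-day default are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

def purge_expired(conversations: Dict[str, dict],
                  retention_days: int = 30,
                  now: Optional[datetime] = None) -> int:
    """Delete conversations whose last activity predates the retention
    window. Each record is assumed to carry a 'last_active' datetime.

    Returns the number of conversations removed.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    expired = [uid for uid, record in conversations.items()
               if record["last_active"] < cutoff]
    for uid in expired:
        del conversations[uid]
    return len(expired)
```

Run it from a cron job or scheduler against whatever store backs your `conversations`; because the inference provider holds nothing, this one pass is your entire retention mechanism.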

OpenAI vs stateless EU providers: what's different

OpenAI's default API behavior retains request data for 30 days. Enterprise agreements can configure zero retention, but that requires a minimum commitment and explicit negotiation.

Even with zero retention configured on OpenAI's side, the underlying infrastructure is US-based, creating CLOUD Act exposure — US authorities can compel disclosure of data held by US companies regardless of where the servers are.

JuiceFactory is a Swedish company, EU-incorporated, with no US parent. The data never leaves EU jurisdiction and there's no CLOUD Act angle.

For most development projects with non-sensitive data, this distinction doesn't change the day-to-day. For applications handling regulated data — patient information, financial records, legal documents — it's often the deciding factor.


FAQ

Does stateless mean I can't have multi-turn conversations? No. You manage conversation history in your application and include it in each request. The API processes the full history you send but retains nothing after the response.

How do I verify the provider actually doesn't retain data? Review their Data Processing Agreement (available in JuiceFactory's portal under Settings → Legal). It specifies contractual prohibitions on data retention as a GDPR Article 28 obligation. You can also request a technical audit.

Does this replace my need for a GDPR compliance review? No. Stateless inference handles the processor side of the obligation. You still need lawful basis for processing, user transparency, data subject rights infrastructure, and documentation as the controller.

What about embeddings — are those also stateless? Yes. JuiceFactory's embedding endpoint (Qwen3-Embed) has the same zero-retention architecture. Documents you embed are processed transiently.

Is stateless inference slower? No. There's no meaningful performance difference. Stateless processing doesn't require additional computation — if anything, it's slightly faster because there's no state read/write overhead.


Ready to build? Get your free JuiceFactory API key.

Related guides: Migrate from OpenAI to EU API · EU LLM API Comparison 2026 · RAG in Python — GDPR-Safe