
Deployment and Infrastructure

Merlin AI uses Excalibur's own AI models. No data is sent to external LLMs such as ChatGPT, Claude, or other third-party AI services — regardless of the deployment model.


Deployment Options

Merlin AI supports three deployment models:

  • SaaS — Excalibur runs Merlin AI in the Excalibur-managed cluster. You do not need to provision or maintain GPU hardware — Merlin is included as part of the SaaS platform
  • On-premise — deploy the AI service on your own hardware within your data center
  • Private cloud — deploy within your own cloud subscription (Azure, AWS, or other providers)

For on-premise and private cloud deployments, all inference happens locally on your GPU hardware and does not require internet connectivity.

SaaS deployment

With Excalibur SaaS, Merlin AI runs in the Excalibur-managed cluster. You get the full capabilities of Merlin Detect and Merlin Investigate without provisioning any GPU infrastructure.

Diagram needed

A deployment topology diagram showing where Merlin AI sits relative to other Excalibur platform components will be added here.


Hardware Requirements

SaaS customers

If you use Excalibur as a SaaS service, skip this section. Excalibur manages all GPU infrastructure for you.

Merlin AI requires a GPU with sufficient VRAM to run the inference models. The following configurations represent tested reference architectures.

Recommended Configuration — Large Deployments

For organizations with many concurrent users requiring real-time anomaly detection:

Resource        Specification
GPU             1x NVIDIA A100 (80 GB VRAM)
vCPUs           24
System memory   220 GiB
VRAM            80 GB

This configuration supports a high number of concurrent Merlin Detect sessions and parallel Merlin Investigate conversations.

Basic Configuration — Standard Deployments

For smaller environments or initial deployments:

Resource        Specification
GPU             1x NVIDIA L40S (48 GB VRAM)
CPU             x86_64 architecture (e.g., AMD EPYC)
vCPUs           8
System memory   64 GiB
VRAM            48 GB

Minimum VRAM

Merlin AI requires a minimum of 40 GB VRAM. GPUs with less VRAM cannot run the inference models.
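A preflight script can verify the 40 GB floor before installation. The sketch below assumes `nvidia-smi` is available on the GPU host and parses its CSV output; the function names are illustrative, not part of the Merlin installer:

```python
import subprocess

MIN_VRAM_MIB = 40 * 1024  # Merlin AI requires at least 40 GB of VRAM

def parse_vram_mib(csv_output: str) -> list[int]:
    """Parse output of:
    nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
    (one MiB value per GPU, one per line)."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]

def check_gpus(csv_output: str) -> bool:
    """True if at least one GPU meets the 40 GB VRAM minimum."""
    return any(vram >= MIN_VRAM_MIB for vram in parse_vram_mib(csv_output))

if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        print("OK" if check_gpus(out) else "FAIL: no GPU with >= 40 GB VRAM")
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi unavailable; run this on the GPU host")
```

An A100 80 GB reports roughly 81920 MiB and passes; a 24 GB card reports roughly 24576 MiB and fails.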

Cloud Instance Examples

The following cloud instance types match the configurations above:

Large deployment (Azure): NC24ads A100 v4 — 24 vCPUs, 220 GiB RAM, 1x A100 80 GB

Standard deployment (AWS): g6e.2xlarge — 8 vCPUs, 64 GiB RAM, 1x L40S 48 GB


Sizing Guidance

Infrastructure requirements scale based on two factors:

  • Concurrent Merlin Detect sessions — each active session with real-time detection consumes GPU resources for contextual bubble evaluation
  • Concurrent Merlin Investigate conversations — each active investigation session consumes GPU resources for query processing and response generation

Workload Profile   GPU VRAM   Recommended For
Standard           40–48 GB   Small to mid-size deployments with moderate concurrent sessions
Large              80 GB      Enterprise deployments with many concurrent detection sessions and parallel Merlin Investigate usage
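For provisioning scripts, the workload profiles can be captured as a small lookup. The specs below are copied from the reference architectures above; the helper itself is illustrative, not part of any Excalibur tooling:

```python
# Tested reference architectures, keyed by workload profile.
PROFILES = {
    "standard": {"vram_gb": 48, "gpu": "1x NVIDIA L40S", "vcpus": 8, "ram_gib": 64},
    "large":    {"vram_gb": 80, "gpu": "1x NVIDIA A100", "vcpus": 24, "ram_gib": 220},
}

def reference_spec(profile: str) -> dict:
    """Return the tested reference architecture for a workload profile."""
    try:
        return PROFILES[profile]
    except KeyError:
        raise ValueError(f"unknown profile {profile!r}; expected one of {sorted(PROFILES)}")
```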

Right-sizing your deployment

Start with the basic configuration and monitor GPU utilization during typical workloads. Scale to the recommended configuration if you observe sustained high GPU utilization or increased inference latency.
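The monitoring step above can be sketched as a small sampler. It assumes `nvidia-smi` is available on the GPU host; the 80% threshold and 30-minute window are illustrative assumptions, not Excalibur defaults:

```python
import statistics
import subprocess
import time

# nvidia-smi query for per-GPU utilization, one percentage per line.
QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"]

def parse_utilization(csv_output: str) -> list[int]:
    """Parse per-GPU utilization percentages from nvidia-smi CSV output."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]

def sustained_high(samples: list[int], threshold: int = 80) -> bool:
    """True if mean sampled utilization exceeds the (hypothetical) 80% threshold."""
    return bool(samples) and statistics.mean(samples) > threshold

def sample_gpu0(interval_s: int = 60, count: int = 30) -> list[int]:
    """Sample GPU 0 utilization every interval_s seconds (default: 30 min window)."""
    samples = []
    for _ in range(count):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        samples.append(parse_utilization(out)[0])
        time.sleep(interval_s)
    return samples
```

If `sustained_high(sample_gpu0())` holds during typical workloads, that is the signal to move from the basic to the recommended configuration.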


Data Residency

For on-premise and private cloud deployments, all data remains within your infrastructure:

  • Session recordings stay on your storage infrastructure
  • Contextual bubbles are generated and stored locally
  • AI inference runs on your GPU — no API calls to external AI services
  • Model weights are deployed to your infrastructure during installation

No telemetry, training data, or inference results are transmitted outside your environment.

For SaaS deployments, data is processed within the Excalibur-managed cluster. Excalibur uses its own AI models — no data is sent to external LLMs or third-party AI services.


Next Steps