
Deployment and Infrastructure

Merlin AI uses Excalibur's own AI models. No data is sent to external LLMs such as ChatGPT, Claude, or other third-party AI services — regardless of the deployment model.


Deployment Options

Merlin AI supports three deployment models:

  • SaaS — Excalibur runs Merlin AI in the Excalibur-managed cluster. You do not need to provision or maintain GPU hardware — Merlin is included as part of the SaaS platform
  • On-premise — deploy the AI service on your own hardware within your data center
  • Private cloud — deploy within your own cloud subscription (Azure, AWS, or other providers)

For on-premise and private cloud deployments, all inference happens locally on your GPU hardware and does not require internet connectivity.

SaaS deployment

With Excalibur SaaS, Merlin AI runs in the Excalibur-managed cluster. You get the full capabilities of Merlin Detect and Merlin Investigate without provisioning any GPU infrastructure.

Diagram needed

A deployment topology diagram showing where Merlin AI sits relative to other Excalibur platform components will be added here.


Hardware Requirements

SaaS customers

If you use Excalibur as a SaaS service, skip this section. Excalibur manages all GPU infrastructure for you.

Merlin AI requires a GPU with sufficient VRAM to run the inference models. The following configurations represent tested reference architectures.

Recommended Configuration — Large Deployments

For organizations with many concurrent users requiring real-time anomaly detection:

Resource        Specification
GPU             1x NVIDIA A100 (80 GB VRAM)
vCPUs           24
System memory   220 GiB
VRAM            80 GB

This configuration supports a high number of concurrent Merlin Detect sessions and parallel Merlin Investigate conversations.

Basic Configuration — Standard Deployments

For smaller environments or initial deployments:

Resource        Specification
GPU             1x NVIDIA L40S (48 GB VRAM)
CPU             x86_64 architecture (e.g., AMD EPYC)
vCPUs           8
System memory   64 GiB
VRAM            48 GB

Minimum VRAM

Merlin AI requires a minimum of 40 GB VRAM. GPUs with less VRAM cannot run the inference models.
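A preflight script can verify the 40 GB floor before installation. The sketch below assumes `nvidia-smi` is available on the GPU host and parses its CSV output; the function names are illustrative, not part of the Merlin installer:

```python
import subprocess

MIN_VRAM_MIB = 40 * 1024  # Merlin AI requires at least 40 GB of VRAM

def parse_vram_mib(csv_output: str) -> list[int]:
    """Parse output of:
    nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
    (one MiB value per GPU, one per line)."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]

def check_gpus(csv_output: str) -> bool:
    """True if at least one GPU meets the 40 GB VRAM minimum."""
    return any(vram >= MIN_VRAM_MIB for vram in parse_vram_mib(csv_output))

if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        print("OK" if check_gpus(out) else "FAIL: no GPU with >= 40 GB VRAM")
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi unavailable; run this on the GPU host")
```

An A100 80 GB reports roughly 81920 MiB and passes; a 24 GB card reports roughly 24576 MiB and fails.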

Cloud Instance Examples

The following cloud instance types match the configurations above:

Large deployment (Azure): NC24ads A100 v4 — 24 vCPUs, 220 GiB RAM, 1x A100 80 GB

Standard deployment (AWS): g6e.2xlarge — 8 vCPUs, 64 GiB RAM, 1x L40S 48 GB


Sizing Guidance

Infrastructure requirements scale based on two factors:

  • Concurrent Merlin Detect sessions — each active session with real-time detection consumes GPU resources for contextual bubble evaluation
  • Concurrent Merlin Investigate conversations — each active investigation session consumes GPU resources for query processing and response generation

Workload Profile   GPU VRAM   Recommended For
Standard           40–48 GB   Small to mid-size deployments with moderate concurrent sessions
Large              80 GB      Enterprise deployments with many concurrent detection sessions and parallel Merlin Investigate usage
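For provisioning scripts, the workload profiles can be captured as a small lookup. The specs below are copied from the reference architectures above; the helper itself is illustrative, not part of any Excalibur tooling:

```python
# Tested reference architectures, keyed by workload profile.
PROFILES = {
    "standard": {"vram_gb": 48, "gpu": "1x NVIDIA L40S", "vcpus": 8, "ram_gib": 64},
    "large":    {"vram_gb": 80, "gpu": "1x NVIDIA A100", "vcpus": 24, "ram_gib": 220},
}

def reference_spec(profile: str) -> dict:
    """Return the tested reference architecture for a workload profile."""
    try:
        return PROFILES[profile]
    except KeyError:
        raise ValueError(f"unknown profile {profile!r}; expected one of {sorted(PROFILES)}")
```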

Right-sizing your deployment

Start with the basic configuration and monitor GPU utilization during typical workloads. Scale to the recommended configuration if you observe sustained high GPU utilization or increased inference latency.
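The monitoring step above can be sketched as a small sampler. It assumes `nvidia-smi` is available on the GPU host; the 80% threshold and 30-minute window are illustrative assumptions, not Excalibur defaults:

```python
import statistics
import subprocess
import time

# nvidia-smi query for per-GPU utilization, one percentage per line.
QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"]

def parse_utilization(csv_output: str) -> list[int]:
    """Parse per-GPU utilization percentages from nvidia-smi CSV output."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]

def sustained_high(samples: list[int], threshold: int = 80) -> bool:
    """True if mean sampled utilization exceeds the (hypothetical) 80% threshold."""
    return bool(samples) and statistics.mean(samples) > threshold

def sample_gpu0(interval_s: int = 60, count: int = 30) -> list[int]:
    """Sample GPU 0 utilization every interval_s seconds (default: 30 min window)."""
    samples = []
    for _ in range(count):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        samples.append(parse_utilization(out)[0])
        time.sleep(interval_s)
    return samples
```

If `sustained_high(sample_gpu0())` holds during typical workloads, that is the signal to move from the basic to the recommended configuration.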


Data Residency

For on-premise and private cloud deployments, all data remains within your infrastructure:

  • Session recordings stay on your storage infrastructure
  • Contextual bubbles are generated and stored locally
  • AI inference runs on your GPU — no API calls to external AI services
  • Model weights are deployed to your infrastructure during installation

No telemetry, training data, or inference results are transmitted outside your environment.

For SaaS deployments, data is processed within the Excalibur-managed cluster. Excalibur uses its own AI models — no data is sent to external LLMs or third-party AI services.


Next Steps