Deployment and Infrastructure¶
Merlin AI uses Excalibur's own AI models. No data is sent to external LLMs such as ChatGPT, Claude, or other third-party AI services — regardless of the deployment model.
Deployment Options¶
Merlin AI supports three deployment models:
- SaaS — Excalibur runs Merlin AI in the Excalibur-managed cluster. You do not need to provision or maintain GPU hardware — Merlin is included as part of the SaaS platform
- On-premise — deploy the AI service on your own hardware within your data center
- Private cloud — deploy within your own cloud subscription (Azure, AWS, or other providers)
For on-premise and private cloud deployments, all inference happens locally on your GPU hardware and does not require internet connectivity.
SaaS Deployment¶
With Excalibur SaaS, Merlin AI runs in the Excalibur-managed cluster. You get the full capabilities of Merlin Detect and Merlin Investigate without provisioning any GPU infrastructure.
Hardware Requirements¶
SaaS customers
If you use Excalibur as a SaaS service, skip this section. Excalibur manages all GPU infrastructure for you.
Merlin AI requires a GPU with sufficient VRAM to run the inference models. The following configurations represent tested reference architectures.
Recommended Configuration — Large Deployments¶
For organizations with many concurrent users requiring real-time anomaly detection:
| Resource | Specification |
|---|---|
| GPU | 1x NVIDIA A100 (80 GB VRAM) |
| vCPUs | 24 |
| System memory | 220 GiB |
| VRAM | 80 GB |
This configuration supports a high number of concurrent Merlin Detect sessions and parallel Merlin Investigate conversations.
Basic Configuration — Standard Deployments¶
For smaller environments or initial deployments:
| Resource | Specification |
|---|---|
| GPU | 1x NVIDIA L40S (48 GB VRAM) |
| vCPUs | 8 |
| CPU | x86_64 architecture (e.g., AMD EPYC) |
| System memory | 64 GiB |
| VRAM | 48 GB |
Minimum VRAM
Merlin AI requires a minimum of 40 GB VRAM. GPUs with less VRAM cannot run the inference models.
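One way to verify that a host meets the 40 GB floor is to parse the per-GPU memory reported by `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits`. A minimal sketch, assuming that query output has been captured as a string (the sample values below are illustrative, and 40 GB is approximated as 40 GiB):

```python
# Check whether any GPU on the host meets the 40 GB VRAM minimum.
# The sample string stands in for captured nvidia-smi CSV output.

MIN_VRAM_MIB = 40 * 1024  # 40 GB minimum required by Merlin AI (approximated as 40 GiB)

def parse_nvidia_smi(csv_output: str) -> list[tuple[str, int]]:
    """Parse 'name, memory.total' CSV lines into (name, MiB) pairs."""
    gpus = []
    for line in csv_output.strip().splitlines():
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus

def meets_minimum(gpus: list[tuple[str, int]]) -> bool:
    """True if at least one GPU has enough VRAM to run the inference models."""
    return any(mem >= MIN_VRAM_MIB for _, mem in gpus)

sample = "NVIDIA L40S, 46068\nNVIDIA A100 80GB PCIe, 81920"
print(meets_minimum(parse_nvidia_smi(sample)))  # True: both GPUs clear 40 GB
```

A GPU such as a 16 GB T4 would fail this check and cannot run Merlin AI.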
Cloud Instance Examples¶
The following cloud instance types match the configurations above:
- Large deployment (Azure): NC24ads A100 v4 (24 vCPUs, 220 GiB RAM, 1x A100 80 GB)
- Standard deployment (AWS): g6e.2xlarge (8 vCPUs, 64 GiB RAM, 1x L40S 48 GB)
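When evaluating other instance types against these requirements, the key gate is GPU VRAM. A sketch encoding the reference specs above as data, with a hypothetical helper (the `InstanceSpec` type and `suitable_for_merlin` name are illustrative, not part of any Excalibur tooling):

```python
from dataclasses import dataclass

@dataclass
class InstanceSpec:
    vcpus: int
    ram_gib: int
    gpu: str
    vram_gb: int

# Reference instance types from the examples above.
INSTANCES = {
    "NC24ads A100 v4": InstanceSpec(24, 220, "NVIDIA A100", 80),  # Azure, large
    "g6e.2xlarge": InstanceSpec(8, 64, "NVIDIA L40S", 48),        # AWS, standard
}

def suitable_for_merlin(spec: InstanceSpec, min_vram_gb: int = 40) -> bool:
    """Merlin AI's hard requirement is GPU VRAM; 40 GB is the documented floor."""
    return spec.vram_gb >= min_vram_gb

for name, spec in INSTANCES.items():
    print(f"{name}: {'ok' if suitable_for_merlin(spec) else 'insufficient VRAM'}")
```

The same check rules out smaller GPU instances (for example, anything carrying a 16 GB or 24 GB card).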
Sizing Guidance¶
Infrastructure requirements scale based on two factors:
- Concurrent Merlin Detect sessions — each active session with real-time detection consumes GPU resources for contextual bubble evaluation
- Concurrent Merlin Investigate conversations — each active investigation session consumes GPU resources for query processing and response generation
| Workload Profile | GPU VRAM | Recommended For |
|---|---|---|
| Standard | 40–48 GB | Small to mid-size deployments with moderate concurrent sessions |
| Large | 80 GB | Enterprise deployments with many concurrent detection sessions and parallel Merlin Investigate usage |
Right-sizing your deployment
Start with the basic configuration and monitor GPU utilization during typical workloads. Scale to the recommended configuration if you observe sustained high GPU utilization or increased inference latency.
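The "monitor, then scale" approach can be reduced to a simple threshold rule over periodic GPU utilization samples. A sketch, assuming utilization percentages collected at regular intervals; the 80% threshold and the 90% sustained fraction are illustrative defaults, not Excalibur-documented values:

```python
# Decide whether to scale from the basic (L40S) to the recommended (A100)
# configuration based on sampled GPU utilization percentages.

def should_scale_up(samples: list[float],
                    threshold_pct: float = 80.0,
                    sustained_fraction: float = 0.9) -> bool:
    """Scale up when utilization stays above the threshold for most samples."""
    if not samples:
        return False
    hot = sum(1 for s in samples if s >= threshold_pct)
    return hot / len(samples) >= sustained_fraction

print(should_scale_up([95, 92, 88, 97, 91]))   # sustained high load -> True
print(should_scale_up([40, 85, 30, 60, 20]))   # bursty, mostly idle -> False
```

Increased inference latency under load is a second signal worth alerting on alongside raw utilization.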
Data Residency¶
For on-premise and private cloud deployments, all data remains within your infrastructure:
- Session recordings stay on your storage infrastructure
- Contextual bubbles are generated and stored locally
- AI inference runs on your GPU — no API calls to external AI services
- Model weights are deployed to your infrastructure during installation
No telemetry, training data, or inference results are transmitted outside your environment.
For SaaS deployments, data is processed within the Excalibur-managed cluster. Excalibur uses its own AI models — no data is sent to external LLMs or third-party AI services.
Next Steps¶
- Merlin AI overview — capabilities and protocol coverage
- Merlin Detect — understand what Merlin evaluates in real time
- Merlin Investigate — how Merlin Investigate uses historical data