Engineered for Scale

The Cortex Engine Architecture

A serverless, event-driven data pipeline designed to ingest, normalize, and analyze billions of cloud billing records in real time.

Ingestion

Streams Cost and Usage Report (CUR) files and cost API data from AWS, Azure, and GCP.

Normalization

Unified Schema mapping. Standardize resource tags & metadata.

Cortex AI

Anomaly detection models, Forecasting, RI/SP Optimization logic.

Action

Dashboards, Slack Alerts, Jira Tickets, Auto-remediation hooks.

How We Normalize Multi-Cloud Data

Each cloud provider has its own taxonomy: AWS has "Instances", while Azure has "Virtual Machines". Billing granularity, minimum charges, and usage units also vary by provider and service. The Cortex Engine creates a Unified Resource Model (URM) that abstracts these differences.

// Raw AWS Input
{ "lineItem/UsageType": "USW2-BoxUsage:t2.micro", "product/servicecode": "AmazonEC2" }
// Cortex Normalized Output
{
  "provider": "AWS",
  "category": "Compute",
  "resource_type": "Virtual Machine",
  "spec": "t2.micro",
  "region": "us-west-2"
}
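
The mapping above can be sketched in a few lines of Python. This is a minimal illustration, not the Cortex implementation: the function name normalize_aws_line_item and both lookup tables are assumptions made for the example, and a production mapping would cover far more services and regions.

```python
from typing import Dict

# Assumed lookup tables for illustration; the real Cortex mappings are not shown in this document.
REGION_PREFIXES: Dict[str, str] = {
    "USW2": "us-west-2",
    "USE2": "us-east-2",
    "EUW1": "eu-west-1",
}
SERVICE_CATEGORIES: Dict[str, Dict[str, str]] = {
    "AmazonEC2": {"category": "Compute", "resource_type": "Virtual Machine"},
}


def normalize_aws_line_item(item: Dict[str, str]) -> Dict[str, str]:
    """Map a raw AWS CUR line item onto the URM fields from the example above."""
    usage_type = item["lineItem/UsageType"]    # e.g. "USW2-BoxUsage:t2.micro"
    head, _, spec = usage_type.partition(":")  # "USW2-BoxUsage", "t2.micro"
    if "-" in head:
        region = REGION_PREFIXES[head.split("-", 1)[0]]
    else:
        # Assumption in this sketch: usage types without a prefix belong to us-east-1.
        region = "us-east-1"
    service = SERVICE_CATEGORIES[item["product/servicecode"]]
    return {
        "provider": "AWS",
        "category": service["category"],
        "resource_type": service["resource_type"],
        "spec": spec,
        "region": region,
    }
```

Unknown prefixes or service codes raise a KeyError here, which keeps unmapped records visible instead of silently mislabeling them.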

Security & Privacy Architecture

We operate on a Zero-Trust principle. CloudNexus does not require root access or write permissions to your production infrastructure for visibility features.

  • Read-Only IAM Roles: We use cross-account IAM roles with explicit ViewOnlyAccess policies.
  • Ephemeral Processing: Sensitive payload data is processed in memory and never persisted to disk unencrypted.
  • Private Link Support: Enterprise plans support AWS PrivateLink and Azure ExpressRoute connections.
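
For readers setting up the read-only role, the sketch below builds the kind of trust policy a cross-account role typically carries. It is an illustration under assumptions: the account ID and external ID values are placeholders, the build_trust_policy helper is hypothetical, and the use of an sts:ExternalId condition is a common AWS pattern for third-party access that this page does not itself confirm.

```python
import json
from typing import Any, Dict

# AWS-managed read-only job-function policy that would be attached to the role.
VIEW_ONLY_POLICY_ARN = "arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"


def build_trust_policy(vendor_account_id: str, external_id: str) -> Dict[str, Any]:
    """Return a trust policy letting one external account assume the role.

    Both arguments are placeholders supplied by the caller; nothing here is a
    real CloudNexus identifier.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # ":root" names the whole external account as principal;
                # it does not grant root-user access to your account.
                "Principal": {"AWS": f"arn:aws:iam::{vendor_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Assumed safeguard: require a shared external ID on every AssumeRole call.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_trust_policy("111122223333", "example-external-id"), indent=2))
```

The role grants no write actions as long as ViewOnlyAccess is the only permissions policy attached to it.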

Machine Learning Pipeline

Our specialized models are trained on over $5B of cloud spend data.

Forecasting (ARIMA/Prophet)

We use an ensemble of ARIMA and Facebook Prophet models to predict future spend based on seasonality and growth trends, achieving 96% accuracy on 30-day forecasts.
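
As a rough illustration of the ensemble idea, the sketch below blends two forecast series and scores the result. It rests on two assumptions this page does not confirm: that the models are combined by a weighted average, and that "accuracy" means 1 minus the mean absolute percentage error (MAPE). The forecasts are passed in as plain lists, so no ARIMA or Prophet library is involved.

```python
from typing import List, Sequence


def ensemble_forecast(
    forecast_a: Sequence[float],
    forecast_b: Sequence[float],
    weight_a: float = 0.5,
) -> List[float]:
    """Blend two forecasts point by point using a weighted average."""
    if len(forecast_a) != len(forecast_b):
        raise ValueError("forecasts must cover the same number of periods")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(forecast_a, forecast_b)]


def mape_accuracy(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Accuracy defined here as 1 - MAPE; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 1.0 - sum(errors) / len(errors)


if __name__ == "__main__":
    arima = [100.0, 110.0, 120.0]    # illustrative daily spend forecasts
    prophet = [110.0, 110.0, 130.0]
    blended = ensemble_forecast(arima, prophet)
    print(blended, mape_accuracy([100.0, 110.0, 125.0], blended))
```

Under this definition, a 96% accuracy figure would correspond to an average absolute error of 4% of actual spend over the forecast window.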

Anomaly Detection (RCF)

Our Random Cut Forest (RCF) implementation detects statistical outliers in real time, filtering out expected spikes (like scheduled jobs) to reduce alert fatigue.
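
The suppression of expected spikes can be illustrated without a full RCF. The sketch below swaps in a much simpler outlier test, a modified z-score based on the median absolute deviation, purely to show the filtering step; it is not Random Cut Forest and not the Cortex implementation. The function name, the 3.5 threshold, and the idea of marking expected periods by index are all assumptions.

```python
from statistics import median
from typing import List, Sequence, Set


def find_anomalies(
    costs: Sequence[float],
    expected_indices: Set[int],
    threshold: float = 3.5,
) -> List[int]:
    """Return indices of cost outliers, skipping periods flagged as expected.

    Uses the modified z-score (0.6745 * |x - median| / MAD) as a stand-in
    for an RCF anomaly score.
    """
    center = median(costs)
    mad = median([abs(c - center) for c in costs])
    if mad == 0:
        return []  # no spread in the data, so nothing can be scored
    anomalies = []
    for i, cost in enumerate(costs):
        score = 0.6745 * abs(cost - center) / mad
        if score > threshold and i not in expected_indices:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    daily_costs = [10, 11, 10, 12, 50, 11, 10, 48, 11]
    # Index 4 is a known scheduled job, so only the unexplained spike is reported.
    print(find_anomalies(daily_costs, expected_indices={4}))
```

Both spikes score well above the threshold, but only the one outside the scheduled window produces an alert.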

Contextual RCA (LLMs)

We use Llama 3-class LLMs to analyze CloudTrail and audit logs alongside billing data, producing a human-readable root cause analysis (RCA) for every cost spike.

System Status: All Systems Operational
API Latency (p95): 42 ms
Ingestion Throughput: 1.2 GB/sec
Uptime (last 30 days): 99.994%

Engineered for Reliability

Financial data demands absolute accuracy and availability. Our infrastructure is built on a multi-region Kubernetes architecture with automated failover and self-healing capabilities.

  • Multi-AZ Redundancy: Data is replicated synchronously across 3 Availability Zones to withstand data center failures.
  • Polyglot Persistence: We use TimescaleDB for time-series metrics, Postgres for transactional data, and Redis for real-time caching.
  • Chaos Engineering: We randomly terminate instances in our staging environment daily to ensure our recovery scripts are battle-tested.

API-First Design

Build your own custom reports or trigger internal workflows.

Webhooks & Events

Subscribe to cost events like budget_exceeded, anomaly_detected, or cost_spike to trigger functions in your own infrastructure.

POST /webhooks/receive HTTP/1.1
Content-Type: application/json

{
  "event": "cost_spike",
  "data": {
    "service": "AmazonRDS",
    "amount": 450.20,
    "threshold": 100.00,
    "timestamp": "2024-03-15T10:30:00Z"
  }
}
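
On the receiving side, a handler for this payload might look like the sketch below. It assumes only the fields shown in the example above; the handle_cost_event name and the shape of the returned summary are illustrative, and signature verification or retries are left out because this page does not describe them.

```python
import json
from typing import Any, Dict


def handle_cost_event(body: bytes) -> Dict[str, Any]:
    """Parse a webhook body and summarize how far spend exceeded the threshold."""
    payload = json.loads(body)
    data = payload["data"]
    overage = round(data["amount"] - data["threshold"], 2)
    return {
        "event": payload["event"],
        "service": data["service"],
        "overage": overage,
        "should_alert": overage > 0,
        "timestamp": data["timestamp"],
    }


if __name__ == "__main__":
    example = json.dumps({
        "event": "cost_spike",
        "data": {
            "service": "AmazonRDS",
            "amount": 450.20,
            "threshold": 100.00,
            "timestamp": "2024-03-15T10:30:00Z",
        },
    }).encode("utf-8")
    print(handle_cost_event(example))
```

Keeping the parsing in a plain function makes it easy to call from whichever web framework or serverless runtime hosts the /webhooks/receive endpoint.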

GraphQL API

Query your billing data with flexibility. Retrieve exactly the fields you need, nested by account, region, or custom tags.

query {
  account(id: "prod-aws") {
    spend(range: "last_7_days") {
      total
      byService {
        name
        cost
      }
    }
  }
}
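
A query like this is normally sent as a JSON POST body. The sketch below builds such a request with the Python standard library. The endpoint URL and the bearer-token header are placeholders and assumptions, since this page does not give the API address or authentication scheme.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the URL from your account settings.
API_URL = "https://api.example.com/graphql"

QUERY = """
query {
  account(id: "prod-aws") {
    spend(range: "last_7_days") {
      total
      byService {
        name
        cost
      }
    }
  }
}
"""


def build_graphql_request(query: str, token: str, url: str = API_URL) -> urllib.request.Request:
    """Wrap a GraphQL query in a JSON POST request (bearer auth is an assumption)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_graphql_request(QUERY, token="YOUR_API_TOKEN")
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request), then json.load() the response.
```

The request is only constructed here, not sent, so the example runs without network access or credentials.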

Built for Scale

Whether you have 10 instances or 100,000, Cortex handles the load.