Skip to main content

LLM Monitoring & Proxy

Proxy Architecture

Integration Examples

Python Integration

Basic Setup:

import openai
import os

# Configure Noah proxy
NOAH_PROJECT_ID = "your-project-id"
NOAH_API_KEY = os.getenv("NOAH_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Set up client
openai.api_base = f"https://app.hollanoah.com/api/llm/proxy/{NOAH_PROJECT_ID}/chat"
openai.api_key = NOAH_API_KEY

# Make request
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "What is the capital of France?"}
],
headers={
"x-openai-api-key": OPENAI_API_KEY,
"x-cortif-pii-redact": "true" # Enable PII redaction
}
)

print(response.choices[0].message.content)

With Error Handling:

import openai
import time
from typing import Optional

class NoahLLMClient:
def __init__(self, project_id: str, noah_api_key: str, provider_api_key: str):
self.project_id = project_id
self.noah_api_key = noah_api_key
self.provider_api_key = provider_api_key

openai.api_base = f"https://app.hollanoah.com/api/llm/proxy/{project_id}/chat"
openai.api_key = noah_api_key

def chat_completion(
self,
messages: list,
model: str = "gpt-4",
temperature: float = 0.7,
max_retries: int = 3,
pii_redaction: bool = True
) -> Optional[str]:
"""
Make a chat completion request through Noah proxy with retry logic
"""
headers = {
"x-openai-api-key": self.provider_api_key,
"x-cortif-pii-redact": str(pii_redaction).lower()
}

for attempt in range(max_retries):
try:
response = openai.ChatCompletion.create(
model=model,
messages=messages,
temperature=temperature,
headers=headers
)
return response.choices[0].message.content

except openai.error.RateLimitError:
if attempt < max_retries - 1:
wait_time = 2 ** attempt # Exponential backoff
print(f"Rate limit hit. Waiting {wait_time}s...")
time.sleep(wait_time)
else:
raise

except openai.error.APIError as e:
print(f"API error: {e}")
if attempt < max_retries - 1:
time.sleep(1)
else:
raise

except Exception as e:
print(f"Unexpected error: {e}")
raise

return None

# Usage
client = NoahLLMClient(
project_id="your-project-id",
noah_api_key=os.getenv("NOAH_API_KEY"),
provider_api_key=os.getenv("OPENAI_API_KEY")
)

response = client.chat_completion(
messages=[
{"role": "user", "content": "Hello, world!"}
],
pii_redaction=True
)

print(response)

Node.js Integration

TypeScript Implementation:

import OpenAI from 'openai';

interface NoahConfig {
projectId: string;
noahApiKey: string;
providerApiKey: string;
piiRedaction?: boolean;
}

class NoahLLMClient {
private openai: OpenAI;
private providerApiKey: string;
private piiRedaction: boolean;

constructor(config: NoahConfig) {
this.providerApiKey = config.providerApiKey;
this.piiRedaction = config.piiRedaction ?? true;

this.openai = new OpenAI({
apiKey: config.noahApiKey,
baseURL: `https://app.hollanoah.com/api/llm/proxy/${config.projectId}/chat`,
});
}

async chatCompletion(
messages: Array<{role: string; content: string}>,
model: string = 'gpt-4'
): Promise<string> {
try {
const response = await this.openai.chat.completions.create({
model,
messages,
// @ts-ignore - Custom headers
headers: {
'x-openai-api-key': this.providerApiKey,
'x-cortif-pii-redact': this.piiRedaction.toString(),
},
});

return response.choices[0]?.message?.content || '';
} catch (error) {
console.error('Noah LLM Error:', error);
throw error;
}
}

async streamCompletion(
messages: Array<{role: string; content: string}>,
onChunk: (chunk: string) => void,
model: string = 'gpt-4'
): Promise<void> {
const stream = await this.openai.chat.completions.create({
model,
messages,
stream: true,
// @ts-ignore
headers: {
'x-openai-api-key': this.providerApiKey,
'x-cortif-pii-redact': this.piiRedaction.toString(),
},
});

for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || '';
if (content) {
onChunk(content);
}
}
}
}

// Usage
const client = new NoahLLMClient({
projectId: process.env.NOAH_PROJECT_ID!,
noahApiKey: process.env.NOAH_API_KEY!,
providerApiKey: process.env.OPENAI_API_KEY!,
piiRedaction: true,
});

// Regular completion
const response = await client.chatCompletion([
{ role: 'user', content: 'Hello, world!' }
]);
console.log(response);

// Streaming completion
await client.streamCompletion(
[{ role: 'user', content: 'Tell me a story' }],
(chunk) => process.stdout.write(chunk)
);

cURL Examples

Basic Request:

curl https://app.hollanoah.com/api/llm/proxy/{PROJECT_ID}/chat/completions \
-H "Authorization: Bearer ${NOAH_API_KEY}" \
-H "x-openai-api-key: ${OPENAI_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [
{"role": "user", "content": "Hello!"}
]
}'

With PII Redaction:

curl https://app.hollanoah.com/api/llm/proxy/{PROJECT_ID}/chat/completions \
-H "Authorization: Bearer ${NOAH_API_KEY}" \
-H "x-openai-api-key: ${OPENAI_API_KEY}" \
-H "x-cortif-pii-redact: true" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "My email is john@example.com and phone is 555-1234"
}
]
}'

Azure OpenAI:

curl https://app.hollanoah.com/api/llm/proxy/{PROJECT_ID}/chat/completions \
-H "Authorization: Bearer ${NOAH_API_KEY}" \
-H "x-azure-api-key: ${AZURE_API_KEY}" \
-H "x-azure-endpoint: ${AZURE_ENDPOINT}" \
-H "x-azure-deployment: ${AZURE_DEPLOYMENT}" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [
{"role": "user", "content": "Hello!"}
]
}'

Monitoring Request Flow

Projects & Monitoring

Project Dashboard Overview

The Noah platform organizes your AI monitoring around projects, which represent individual AI systems or models that you want to monitor. The project dashboard provides a comprehensive view of all your monitored systems, their current status, and key performance indicators at a glance. Each project card displays critical information including the current operational status, the most recent test run or monitoring session, any active alerts that require attention, and the current drift score if drift detection is enabled. This high-level view enables teams to quickly identify which systems need attention and where to focus their efforts.

Navigation Path: Dashboard → Models (or Endpoints)

The dashboard presents your projects in an organized card-based layout:

  • LLM Projects: Models accessed through the Noah proxy (OpenAI, Azure OpenAI, Google Gemini)
  • Endpoint Projects: Custom ML endpoints for traditional models
  • Status Indicators: Active, Paused, Testing, Draft, Archived
  • Quick Actions: View Details, Run Test, Edit Configuration, Pause/Resume

Project Lifecycle Management

Projects follow a well-defined lifecycle ensuring proper setup, validation, and ongoing monitoring:

Projects start in Draft status, move through Configuring (set parameters and thresholds), Testing (upload golden datasets, establish baselines), then Active monitoring. Projects can be paused temporarily or archived for long-term storage while maintaining historical data.

Detailed Project Workflow

Phase 1: Project Creation

Click "New Model" or "New Endpoint" to create a project. Provide name, description, configure provider settings (OpenAI, Azure OpenAI, Google Gemini), and select model version. Upload golden datasets for drift detection baselines, set drift thresholds for alerts, and add robustness test prompts.

Phase 2: Baseline Establishment

Noah establishes baselines in two ways:

Manual baseline: Upload CSV with prompt-response pairs. Noah generates embeddings and calculates a centroid representing expected behavior. Auto baseline: Noah generates baseline from first 100 production requests, allowing immediate monitoring with later refinement.

Phase 3: Active Monitoring

Noah continuously monitors every request in real-time, displaying uptime, request volume, success rates, latency, costs, and drift events. View live request streams, export detailed reports, and configure metric-based alerts from the monitoring dashboard.

Phase 4: Analysis & Optimization

The analytics dashboard shows performance trends: success rates, latency changes, cost evolution, and drift stability. Interactive charts identify patterns and anomalies. Noah recommends optimizations like cost-effective model alternatives and latency reduction opportunities based on observed patterns.

Key Metrics

Noah collects and analyzes over 40 distinct metrics across multiple categories, providing comprehensive visibility into your AI systems:

Performance Metrics

task_success_rate represents the percentage of requests that complete successfully without errors. This is calculated as successful_requests divided by total_requests. A score of 1.0 (100%) indicates perfect performance with all requests succeeding. Scores between 0.95 and 0.99 are considered excellent with only minor issues. Scores from 0.90 to 0.95 are acceptable for production but warrant investigation. Anything below 0.90 indicates serious problems requiring immediate attention.

Common causes of low success rates include rate limiting from the LLM provider (429 errors), authentication issues with API keys (401 errors), server-side problems (500 errors), request timeouts, and malformed requests (400 errors). Each failure is logged with detailed context to help diagnose and resolve issues quickly.

task_e2e_latency measures the end-to-end response time in milliseconds from when Noah receives your request until the final response is returned. This includes time spent in PII detection, forwarding to the LLM provider, waiting for the model's response, and post-processing. Understanding latency patterns helps identify performance bottlenecks and optimize user experience. Teams can track latency percentiles (p50, p95, p99) to understand typical and worst-case performance.

token_usage tracks the total number of tokens consumed per request, split into prompt tokens (your input) and completion tokens (the model's output). Since LLM costs are directly tied to token consumption, monitoring this metric is essential for cost control and optimization. Teams can identify opportunities to reduce token usage through prompt optimization, response length limits, or model selection.

Cost Metrics

cost represents the total cost per request in USD, calculated based on the specific model's pricing and token usage. For example, with GPT-4, costs are calculated as (prompt_tokens × $0.03/1000) + (completion_tokens × $0.06/1000). Noah automatically tracks pricing for all supported models and updates calculations when providers change their rates. This metric is fundamental for understanding the financial impact of your AI operations and identifying opportunities for cost optimization.

cost_per_token provides a cost efficiency metric by dividing total cost by total tokens, helping you compare efficiency across different models and use cases. This metric is particularly useful when deciding whether to use more capable but expensive models versus faster, cheaper alternatives. For instance, if GPT-4 costs $0.00003 per token but GPT-3.5-turbo costs $0.000001 per token, you can evaluate whether the quality improvement justifies the 30x cost increase for specific use cases.

daily_spend aggregates all costs within a 24-hour period, providing clear visibility into your daily AI expenses. The platform also generates projections for monthly and annual spending based on current usage patterns, helping with budget planning and forecasting. Teams can set budget thresholds and receive alerts before spending limits are exceeded, preventing unexpected bills and enabling proactive cost management.

Quality Metrics

content_drift measures the semantic distance between current model responses and your established baseline, using a scale from 0 to 2. A score of 0.0 to 0.3 indicates excellent similarity to the baseline with responses very close to expected behavior. Scores from 0.3 to 0.7 show minor variations that are typically within acceptable ranges. Scores from 0.7 to 1.0 warrant attention as they indicate notable deviation from baseline behavior. Scores above 1.0 require immediate investigation as they represent significant divergence that could impact user experience or compliance.

The drift calculation uses sophisticated semantic embeddings that capture meaning rather than just matching text. Noah converts each response into a 1536-dimensional vector representation and calculates the cosine distance to the baseline centroid. This allows Noah to detect when your model's behavior changes in meaningful ways, even if the exact words differ. Common causes of drift include model version updates, changes to system prompts, shifts in input data distribution, or the golden dataset becoming outdated relative to current use cases.

language_readability_score uses the Flesch Reading Ease formula to assess how easy your model's responses are to understand, scored from 0 to 100. Scores of 90-100 indicate very easy text suitable for 5th-grade reading level. Scores of 60-69 represent standard text appropriate for general audiences (8th-9th grade level). Scores below 30 indicate very difficult text requiring college-level comprehension. Understanding readability helps ensure your AI systems communicate effectively with your target audience and maintain appropriate complexity for your use case.

sentiment_score analyzes the emotional tone of model responses on a scale from -1 (very negative) to +1 (very positive). This helps identify when models are generating inappropriately negative responses or when sentiment doesn't match the expected tone for your use case. For customer service applications, you might want consistently positive sentiment (0.3 to 0.8), while for analytical applications, neutral sentiment (−0.2 to 0.2) may be more appropriate.

Security Metrics

pii_input_total counts the number of personally identifiable information (PII) entities detected in user prompts. Noah automatically detects multiple PII types including:

  1. EMAIL_ADDRESS - Standard email formats (name@domain.com, user.name+tag@company.co.uk)
  2. PHONE_NUMBER - Various formats including US (555-123-4567) and international (+44 20 7123 4567)
  3. CREDIT_CARD - All major card networks (Visa, Mastercard, Amex, Discover) with proper Luhn validation
  4. US_SSN - Social Security Numbers in xxx-xx-xxxx format
  5. IP_ADDRESS - Both IPv4 (192.168.1.1) and IPv6 addresses
  6. LOCATION - Street addresses, ZIP codes, and geographic identifiers
  7. PERSON - Names with contextual validation
  8. DATE_TIME - Birth dates, appointment times with PII context
  9. US_DRIVER_LICENSE - State-specific formats
  10. US_PASSPORT - 9-digit passport numbers
  11. MEDICAL_RECORD_NUMBER - Healthcare identifiers

This metric helps teams understand how much sensitive data is being sent to AI models, enabling them to implement appropriate data handling policies and ensure compliance with privacy regulations like GDPR, HIPAA, and CCPA.

pii_output_total counts PII entities in model responses, which is a critical security metric that should typically be zero for most applications. If your model is outputting PII, it may be leaking sensitive information that was present in training data, previous context, or inadvertently generated. Any non-zero value requires immediate investigation to determine the source of the leak and implement remediation measures. This could indicate:

  • Data leakage from training data
  • Context retention issues
  • Inadequate output filtering
  • Prompt injection attacks bypassing safety measures

Teams should configure critical alerts for this metric and establish incident response procedures for PII leakage events.

robustness_score measures your model's resistance to adversarial prompts on a scale from 0 to 1, where higher scores indicate better security. Noah tests your model with various prompt injection attempts (trying to override system instructions), jailbreak techniques (attempting to bypass safety constraints), and other adversarial inputs designed to elicit inappropriate or unsafe behavior. A high robustness score indicates that your model appropriately refuses to comply with malicious instructions, which is essential for maintaining security and preventing misuse of your AI systems. Regular robustness testing helps ensure your AI systems remain secure even as attack techniques evolve.

Advanced Proxy Features

Request Modification

Custom System Prompts:

# Add organization-wide system prompt
headers = {
"x-openai-api-key": OPENAI_API_KEY,
"x-cortif-system-prompt": "Always be professional and concise"
}

response = openai.ChatCompletion.create(
model="gpt-4",
messages=[{"role": "user", "content": "Hello"}],
headers=headers
)

Response Filtering

Content Moderation:

headers = {
"x-openai-api-key": OPENAI_API_KEY,
"x-cortif-content-filter": "strict", # Options: strict, moderate, permissive
"x-cortif-profanity-filter": "true"
}

Cost Management

Budget Controls:

headers = {
"x-openai-api-key": OPENAI_API_KEY,
"x-cortif-max-cost": "0.05", # Reject requests exceeding $0.05
"x-cortif-user-id": "user123" # Track per-user costs
}