Metrics & Analytics

Noah tracks over 40 metrics for each LLM request, providing comprehensive insights into performance, cost, quality, and security. These metrics enable data-driven decisions and proactive issue detection.

Metric Categories

Core Metrics

Task Success Rate

Measures the percentage of requests that complete successfully without errors. This metric is calculated as successful_requests / total_requests.

Score	Status	Interpretation
1.0 (100%)	Pass	Perfect - all requests successful
0.95 to below 1.0	Warn	Excellent - minor issues
0.90 to below 0.95	Warn	Good - acceptable for production
Below 0.90	Fail	Poor - requires investigation

Common failure causes include rate limiting (429 errors), API key issues (401 errors), server errors (500 errors), timeouts, and invalid requests (400 errors).
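
As an illustration of the formula and thresholds above, here is a minimal Python sketch. The function names `task_success_rate` and `success_status` are hypothetical and are not part of Noah's API; the sketch also assumes that any rate from 0.90 up to (but not including) 1.0 maps to Warn.

```python
def task_success_rate(successful_requests: int, total_requests: int) -> float:
    """Return successful_requests / total_requests as a fraction in [0, 1]."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return successful_requests / total_requests


def success_status(rate: float) -> str:
    """Map a success rate to the Pass / Warn / Fail status from the table."""
    if rate >= 1.0:
        return "Pass"
    if rate >= 0.90:  # assumption: both Warn rows share this status
        return "Warn"
    return "Fail"


rate = task_success_rate(successful_requests=95, total_requests=100)
print(rate, success_status(rate))
```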

Content Drift

Measures the semantic distance between current responses and a baseline (the golden dataset). Each response is converted to an embedding (a 1536-dimension vector), and the cosine distance from that embedding to the baseline centroid is calculated. Scores range from 0 (identical) to 2 (opposite).

Drift Score	Status	Meaning	Action Required
0.0 - 0.3	Excellent	Responses very similar to baseline	None
0.3 - 0.7	Good	Minor variations, within expected range	Monitor trends
0.7 - 1.0	Warning	Notable deviation from baseline	Review recent responses
1.0 - 1.5	Critical	Significant deviation	Immediate review required
Above 1.5	Severe	Complete divergence from baseline	Emergency investigation
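
The drift calculation described above can be sketched in plain Python. This is an illustration under stated assumptions, not Noah's implementation: the helper names (`centroid`, `cosine_distance`, `drift_status`) are hypothetical, the embeddings are assumed to be already computed, and each shared boundary in the table (for example 0.3) is assumed to belong to the higher band, with 1.5 itself counted as Critical.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of the baseline embeddings."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for identical direction, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def drift_status(score: float) -> str:
    """Map a drift score to the status column of the table."""
    if score < 0.3:
        return "Excellent"
    if score < 0.7:
        return "Good"
    if score < 1.0:
        return "Warning"
    if score <= 1.5:
        return "Critical"
    return "Severe"


# Toy 2-dimensional vectors stand in for 1536-dimension embeddings.
baseline = centroid([[1.0, 0.0], [0.8, 0.2]])
score = cosine_distance([0.0, 1.0], baseline)
print(round(score, 3), drift_status(score))
```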

Cost Metrics

Noah calculates the total cost per request in USD based on token usage and model pricing.

Pricing Example (OpenAI GPT-4):

Input: $0.03 per 1K tokens
Output: $0.06 per 1K tokens
Request with 500 input + 200 output tokens = $0.027
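
The arithmetic behind this example can be reproduced with a short Python sketch. The function name `request_cost_usd` and its parameters are hypothetical; the prices are the example figures above, not a live price list.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Cost of one request: tokens are billed per 1K at separate rates."""
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost


# 500 input tokens at $0.03/1K = $0.015; 200 output tokens at $0.06/1K = $0.012
cost = request_cost_usd(500, 200, input_price_per_1k=0.03, output_price_per_1k=0.06)
print(f"${cost:.3f}")
```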

PII Detection

Tracks the number of PII entities detected in user prompts (pii_input_total) and in model responses (pii_output_total).

Detected PII Types:

Category	Examples	Risk Level
Email	name@domain.com	Medium
Phone	(555) 123-4567, +1-555-123-4567	Medium
Credit Card	4xxx xxxx xxxx xxxx	Critical
SSN	xxx-xx-xxxx	Critical
Address	123 Main St, ZIP codes	Medium
IP Address	192.168.1.1	Low
Medical	Patient IDs, record numbers	Critical
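
To show how a per-text entity count such as pii_input_total might be produced, here is a deliberately simple regex-based sketch covering three of the categories above. It is an assumption for illustration only: the source does not describe Noah's detector, the patterns and the `count_pii` helper are hypothetical, and production-grade PII detection typically needs validation (for example checksum checks) beyond pattern matching.

```python
import re
from typing import Dict

# Illustrative patterns only; they will miss edge cases and can over-match.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def count_pii(text: str) -> Dict[str, int]:
    """Count matches per PII category in a single prompt or response."""
    return {name: len(pattern.findall(text)) for name, pattern in PII_PATTERNS.items()}


prompt = "Contact name@domain.com from 192.168.1.1, SSN 123-45-6789"
counts = count_pii(prompt)
pii_input_total = sum(counts.values())  # total entities found in the prompt
print(counts, pii_input_total)
```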

Language Quality

Readability Score (Flesch Reading Ease, 0-100):

Score Range	Grade Level	Description	Use Case
90-100	5th grade	Very Easy	Simple instructions
80-89	6th grade	Easy	General audience
70-79	7th grade	Fairly Easy	Blog posts
60-69	8-9th grade	Standard	News articles
50-59	High School	Fairly Difficult	Technical docs
30-49	College	Difficult	Academic papers
0-29	Graduate	Very Difficult	Legal documents
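
For reference, the standard Flesch Reading Ease formula is 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). The Python sketch below applies it to pre-computed counts and maps the result to the Description column above. The function names are hypothetical, and how Noah tokenizes text into words, sentences, and syllables is not specified in this document, so those counts are taken as inputs. The raw formula can produce values outside 0-100 for unusual text.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease score from raw text counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def readability_band(score: float) -> str:
    """Map a score to the Description column of the table."""
    if score >= 90:
        return "Very Easy"
    if score >= 80:
        return "Easy"
    if score >= 70:
        return "Fairly Easy"
    if score >= 60:
        return "Standard"
    if score >= 50:
        return "Fairly Difficult"
    if score >= 30:
        return "Difficult"
    return "Very Difficult"


# 100 words in 5 sentences with 150 syllables:
# 206.835 - 1.015 * 20 - 84.6 * 1.5 = 59.635
score = flesch_reading_ease(words=100, sentences=5, syllables=150)
print(round(score, 3), readability_band(score))
```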

Analytics Dashboard

The analytics interface provides visualization and analysis tools for understanding metric trends over time.

Cost Analysis

Track spending patterns and identify optimization opportunities.

Cost Breakdown:

By Model (GPT-4 vs GPT-3.5)
By Time (peak vs off-peak hours)
By User/Department
By Request Type

Optimization Recommendations:

Switch to cheaper models for simple queries
Reduce prompt tokens through optimization
Cache frequently requested responses
Implement request batching

Metric Correlation

Exporting Data

Export metrics in multiple formats for external analysis, reporting, or archival purposes.

Export Formats:

CSV: Raw data for spreadsheet analysis
JSON: Structured data for programmatic access
PDF: Formatted reports for stakeholders
PowerPoint: Executive presentations
