Metrics & Analytics
Noah tracks over 40 metrics for each LLM request, providing comprehensive insights into performance, cost, quality, and security. These metrics enable data-driven decisions and proactive issue detection.
Metric Categories
Core Metrics
Task Success Rate
Measures the fraction of requests that complete successfully without errors, calculated as successful_requests / total_requests. A score of 1.0 corresponds to 100%.
| Score | Status | Interpretation |
|---|---|---|
| 1.0 (100%) | Pass | Perfect - all requests successful |
| 0.95-0.99 | Warn | Excellent - minor issues |
| 0.90-0.95 | Warn | Good - acceptable for production |
| Below 0.90 | Fail | Poor - requires investigation |
Common failure causes include rate limiting (429 errors), API key issues (401 errors), server errors (500 errors), timeouts, and invalid requests (400 errors).
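The calculation and the status thresholds above can be sketched as follows. This is a minimal illustration, not Noah's implementation: the function names are hypothetical, and treating scores between 0.99 and 1.0 as "Warn" is an assumption, since the table does not cover that gap.

```python
def task_success_rate(successful_requests: int, total_requests: int) -> float:
    """Return successful_requests / total_requests as a fraction in [0, 1]."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return successful_requests / total_requests


def success_status(rate: float) -> str:
    """Map a success rate to the Pass/Warn/Fail status from the table.

    Assumption: anything from 0.90 up to (but not including) 1.0 is "Warn".
    """
    if rate >= 1.0:
        return "Pass"
    if rate >= 0.90:
        return "Warn"
    return "Fail"


if __name__ == "__main__":
    rate = task_success_rate(successful_requests=97, total_requests=100)
    print(rate, success_status(rate))
```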
Content Drift
Measures the semantic distance between current responses and a baseline (the golden dataset). Responses are converted to embeddings (1536-dimension vectors), and their cosine distance to the baseline centroid is calculated. Scores range from 0 (identical) to 2 (opposite).
| Drift Score | Status | Meaning | Action Required |
|---|---|---|---|
| 0.0 - 0.3 | Excellent | Responses very similar to baseline | None |
| 0.3 - 0.7 | Good | Minor variations, within expected range | Monitor trends |
| 0.7 - 1.0 | Warning | Notable deviation from baseline | Review recent responses |
| 1.0 - 1.5 | Critical | Significant deviation | Immediate review required |
| Above 1.5 | Severe | Complete divergence from baseline | Emergency investigation |
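The drift calculation can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the embeddings are assumed to be already computed (short toy vectors stand in for the 1536-dimension ones), and scores that land exactly on a shared boundary (0.3, 0.7, 1.0) are assigned to the higher band, which the table leaves unspecified.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of the baseline embeddings."""
    if not vectors:
        raise ValueError("at least one baseline vector is required")
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity: 0 = same direction, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def drift_status(score: float) -> str:
    """Map a drift score to the status column of the table (boundary handling assumed)."""
    if score < 0.3:
        return "Excellent"
    if score < 0.7:
        return "Good"
    if score < 1.0:
        return "Warning"
    if score <= 1.5:
        return "Critical"
    return "Severe"


if __name__ == "__main__":
    baseline = [[1.0, 0.0], [0.0, 1.0]]   # toy golden-dataset embeddings
    response = [1.0, 0.0]                 # toy embedding of a new response
    score = cosine_distance(response, centroid(baseline))
    print(round(score, 3), drift_status(score))
```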
Cost Metrics
Noah calculates the total cost per request in USD based on token usage and model pricing.
Pricing Example (OpenAI GPT-4):
- Input: $0.03 per 1K tokens
- Output: $0.06 per 1K tokens
- Request with 500 input + 200 output tokens = (500 / 1,000 × $0.03) + (200 / 1,000 × $0.06) = $0.015 + $0.012 = $0.027
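The per-request arithmetic can be expressed as a small function. This is a sketch only: the function and parameter names are hypothetical, and the prices are the example figures quoted above rather than values read from any pricing API.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Cost of one request in USD, given per-1K-token prices."""
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost


if __name__ == "__main__":
    # Example prices from the text: $0.03 input, $0.06 output per 1K tokens.
    cost = request_cost_usd(500, 200, input_price_per_1k=0.03, output_price_per_1k=0.06)
    print(f"${cost:.3f}")
```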
PII Detection
Tracks the number of PII entities detected in user prompts (pii_input_total) and model responses (pii_output_total).
Detected PII Types:
| Category | Examples | Risk Level |
|---|---|---|
| Email | name@domain.com | Medium |
| Phone | (555) 123-4567, +1-555-123-4567 | Medium |
| Credit Card | 4xxx xxxx xxxx xxxx | Critical |
| SSN | xxx-xx-xxxx | Critical |
| Address | 123 Main St, ZIP codes | Medium |
| IP Address | 192.168.1.1 | Low |
| Medical | Patient IDs, record numbers | Critical |
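To illustrate how a per-text PII count could be produced, the sketch below counts matches for a few of the categories above with regular expressions. It is not Noah's detector: the patterns and the `count_pii_entities` helper are hypothetical and deliberately simple, and categories such as names, addresses, and medical identifiers usually need more than regex matching.

```python
import re
from typing import Dict

# Hypothetical, simplified patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"(?:\+1[-.\s]?)?(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def count_pii_entities(text: str) -> Dict[str, int]:
    """Return the number of matches per PII category in the given text."""
    return {name: len(pattern.findall(text)) for name, pattern in PII_PATTERNS.items()}


if __name__ == "__main__":
    prompt = "Contact name@domain.com or (555) 123-4567 from 192.168.1.1"
    counts = count_pii_entities(prompt)
    # Summing the per-category counts gives a total comparable to pii_input_total.
    print(counts, sum(counts.values()))
```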
Language Quality
Readability Score (Flesch Reading Ease, 0-100):
| Score Range | Grade Level | Description | Use Case |
|---|---|---|---|
| 90-100 | 5th grade | Very Easy | Simple instructions |
| 80-89 | 6th grade | Easy | General audience |
| 70-79 | 7th grade | Fairly Easy | Blog posts |
| 60-69 | 8-9th grade | Standard | News articles |
| 50-59 | High School | Fairly Difficult | Technical docs |
| 30-49 | College | Difficult | Academic papers |
| 0-29 | Graduate | Very Difficult | Legal documents |
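For reference, the standard Flesch Reading Ease formula is 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). The sketch below applies it to pre-computed counts and maps the result to the descriptions in the table. The function names are hypothetical, how Noah counts words, sentences, and syllables is not specified here, and using each band's lower bound as the threshold is an assumption.

```python
def flesch_reading_ease(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Standard Flesch Reading Ease score from pre-computed text counts."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def readability_band(score: float) -> str:
    """Map a score to the Description column of the table (thresholds assumed)."""
    bands = [
        (90, "Very Easy"),
        (80, "Easy"),
        (70, "Fairly Easy"),
        (60, "Standard"),
        (50, "Fairly Difficult"),
        (30, "Difficult"),
    ]
    for lower_bound, description in bands:
        if score >= lower_bound:
            return description
    return "Very Difficult"


if __name__ == "__main__":
    score = flesch_reading_ease(total_words=100, total_sentences=5, total_syllables=150)
    print(round(score, 3), readability_band(score))
```

The raw formula can produce values below 0 or above 100 for unusual text, so the band lookup treats everything under 30 as "Very Difficult" and everything from 90 up as "Very Easy".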
Analytics Dashboard
The analytics interface provides visualization and analysis tools for understanding metric trends over time.
Cost Analysis
Track spending patterns and identify optimization opportunities.
Cost Breakdown:
- By Model (GPT-4 vs GPT-3.5)
- By Time (peak vs off-peak hours)
- By User/Department
- By Request Type
Optimization Recommendations:
- Switch to cheaper models for simple queries
- Shorten prompts to reduce input token usage
- Cache frequently requested responses
- Implement request batching
Metric Correlation
Exporting Data
Export metrics in multiple formats for external analysis, reporting, or archival purposes.
Export Formats:
- CSV: Raw data for spreadsheet analysis
- JSON: Structured data for programmatic access
- PDF: Formatted reports for stakeholders
- PowerPoint: Executive presentations
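Once exported, the CSV and JSON formats are straightforward to produce or consume with standard tooling. The sketch below serializes a list of metric records to both formats using only the Python standard library. The record fields (`request_id`, `model`, `cost_usd`) and the helper names are hypothetical and do not reflect Noah's actual export schema.

```python
import csv
import io
import json
from typing import Dict, List


def to_csv(records: List[Dict[str, object]]) -> str:
    """Serialize records to CSV text, using the first record's keys as the header."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records: List[Dict[str, object]]) -> str:
    """Serialize records to an indented JSON array."""
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    # Hypothetical metric records for illustration.
    records = [
        {"request_id": "req-1", "model": "gpt-4", "cost_usd": 0.027},
        {"request_id": "req-2", "model": "gpt-3.5", "cost_usd": 0.002},
    ]
    print(to_csv(records))
    print(to_json(records))
```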