
Metrics & Analytics

Noah tracks over 40 metrics for each LLM request, providing comprehensive insights into performance, cost, quality, and security. These metrics enable data-driven decisions and proactive issue detection.

Metric Categories

Core Metrics

Task Success Rate

Measures the fraction of requests that complete without errors, calculated as successful_requests / total_requests. The score is reported on a 0 to 1 scale, where 1.0 corresponds to 100%.

Score | Status | Interpretation
1.0 (100%) | Pass | Perfect - all requests successful
0.95-0.99 | Warn | Excellent - minor issues
0.90-0.95 | Warn | Good - acceptable for production
Below 0.90 | Fail | Poor - requires investigation

Common failure causes include rate limiting (429 errors), API key issues (401 errors), server errors (500 errors), timeouts, and invalid requests (400 errors).
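The calculation and the status bands above can be sketched in a few lines. This is a minimal illustration, not Noah's implementation: the function names are hypothetical, treating any 2xx HTTP status as a success is an assumption, and so is the way scores between the listed bands (for example 0.995) are assigned.

```python
from typing import Iterable


def task_success_rate(successful_requests: int, total_requests: int) -> float:
    """Return successful_requests / total_requests as a value in [0, 1]."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return successful_requests / total_requests


def success_rate_from_status_codes(status_codes: Iterable[int]) -> float:
    """Compute the rate from HTTP status codes.

    Assumption: 2xx counts as success; 400, 401, 429, 500 and similar count as failures.
    """
    codes = list(status_codes)
    successes = sum(1 for code in codes if 200 <= code < 300)
    return task_success_rate(successes, len(codes))


def success_status(rate: float) -> str:
    """Map a rate to the status column of the table.

    Assumption: each lower bound is inclusive, so anything from 0.90 up to
    (but not including) 1.0 is reported as "Warn".
    """
    if rate >= 1.0:
        return "Pass"
    if rate >= 0.90:
        return "Warn"
    return "Fail"
```

For example, a batch with status codes 200, 200, 429, and 500 yields a rate of 0.5, which maps to "Fail".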

Content Drift

Measures the semantic distance between current responses and a baseline (the golden dataset). Each response is converted to an embedding (a 1536-dimension vector), and the cosine distance from that embedding to the baseline centroid is calculated. The score ranges from 0 (identical) to 2 (opposite).

Drift Score | Status | Meaning | Action Required
0.0 - 0.3 | Excellent | Responses very similar to baseline | None
0.3 - 0.7 | Good | Minor variations, within expected range | Monitor trends
0.7 - 1.0 | Warning | Notable deviation from baseline | Review recent responses
1.0 - 1.5 | Critical | Significant deviation | Immediate review required
Above 1.5 | Severe | Complete divergence from baseline | Emergency investigation
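The drift calculation described above can be sketched with plain Python lists standing in for embeddings. The function names are hypothetical, the embedding step itself is omitted, and the handling of boundary values shared by two bands (0.3, 0.7, 1.0) is an assumption.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the baseline embeddings."""
    if not vectors:
        raise ValueError("at least one baseline vector is required")
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 for identical direction, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def content_drift(response: Sequence[float],
                  baseline: Sequence[Sequence[float]]) -> float:
    """Cosine distance from one response embedding to the baseline centroid."""
    return cosine_distance(response, centroid(baseline))


def drift_status(score: float) -> str:
    """Map a drift score to the table's status.

    Assumption: a boundary value belongs to the higher band, except 1.5,
    which stays "Critical" because the last row reads "Above 1.5".
    """
    if score < 0.3:
        return "Excellent"
    if score < 0.7:
        return "Good"
    if score < 1.0:
        return "Warning"
    if score <= 1.5:
        return "Critical"
    return "Severe"
```

In practice the vectors would have 1536 components; the two-component vectors used in quick checks behave the same way.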

Cost Metrics

Noah calculates the total cost per request in USD based on token usage and model pricing.

Pricing Example (OpenAI GPT-4):

  • Input: $0.03 per 1K tokens
  • Output: $0.06 per 1K tokens
  • Request with 500 input + 200 output tokens = $0.027
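The arithmetic behind this example is (500 / 1000) × $0.03 + (200 / 1000) × $0.06 = $0.015 + $0.012 = $0.027. A minimal sketch of the same calculation follows; the function name and its default prices are illustrative and simply mirror the example figures above.

```python
def request_cost_usd(input_tokens: int,
                     output_tokens: int,
                     input_price_per_1k: float = 0.03,
                     output_price_per_1k: float = 0.06) -> float:
    """Cost of one request in USD, given per-1K-token prices."""
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost


# Reproduces the example: 500 input + 200 output tokens.
example_cost = round(request_cost_usd(500, 200), 6)
```

Floating-point arithmetic is adequate for a sketch; for billing-grade totals, `decimal.Decimal` would avoid rounding drift across many requests.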

PII Detection

Tracks the number of PII entities found in user prompts (pii_input_total) and in model responses (pii_output_total).

Detected PII Types:

Category | Examples | Risk Level
Email | name@domain.com | Medium
Phone | (555) 123-4567, +1-555-123-4567 | Medium
Credit Card | 4xxx xxxx xxxx xxxx | Critical
SSN | xxx-xx-xxxx | Critical
Address | 123 Main St, ZIP codes | Medium
IP Address | 192.168.1.1 | Low
Medical | Patient IDs, record numbers | Critical
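To make the counting concrete, here is a minimal regex-based sketch covering three of the categories above. How Noah actually detects PII is not described here, so the patterns, the `PII_PATTERNS` name, and the `count_pii` helper are all assumptions for illustration. Simple patterns like these will miss cases and produce false positives.

```python
import re
from typing import Dict

# Hypothetical, deliberately simple patterns; real detectors are more thorough.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def count_pii(text: str) -> Dict[str, int]:
    """Count matches per category in a prompt or a response."""
    return {name: len(pattern.findall(text))
            for name, pattern in PII_PATTERNS.items()}


prompt = "Contact name@domain.com from 192.168.1.1, SSN 123-45-6789"
counts = count_pii(prompt)
# A total like this is what a counter such as pii_input_total would hold.
pii_total = sum(counts.values())
```

Running the same function over the model's response would give the corresponding output-side count.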

Language Quality

Readability Score (Flesch Reading Ease, 0-100):

Score Range | Grade Level | Description | Use Case
90-100 | 5th grade | Very Easy | Simple instructions
80-89 | 6th grade | Easy | General audience
70-79 | 7th grade | Fairly Easy | Blog posts
60-69 | 8-9th grade | Standard | News articles
50-59 | High School | Fairly Difficult | Technical docs
30-49 | College | Difficult | Academic papers
0-29 | Graduate | Very Difficult | Legal documents
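The standard Flesch Reading Ease formula is 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). The sketch below applies it to pre-computed counts and maps the result to the descriptions in the table. The function names are hypothetical, word and syllable counting is left out, and assigning fractional scores such as 59.6 to the lower band is an assumption.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease score from raw counts.

    The raw formula can fall outside 0-100 for unusual text; no clamping is done.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def readability_band(score: float) -> str:
    """Map a score to the Description column of the table.

    Assumption: each band's lower bound is inclusive.
    """
    bands = [
        (90, "Very Easy"),
        (80, "Easy"),
        (70, "Fairly Easy"),
        (60, "Standard"),
        (50, "Fairly Difficult"),
        (30, "Difficult"),
    ]
    for lower_bound, description in bands:
        if score >= lower_bound:
            return description
    return "Very Difficult"
```

For a response with 100 words, 5 sentences, and 150 syllables, the score is 206.835 − 20.3 − 126.9 = 59.635, which falls in the "Fairly Difficult" band.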

Analytics Dashboard

The analytics interface provides visualization and analysis tools for understanding metric trends over time.

Cost Analysis

Track spending patterns and identify optimization opportunities.

Cost Breakdown:

  • By Model (GPT-4 vs GPT-3.5)
  • By Time (peak vs off-peak hours)
  • By User/Department
  • By Request Type

Optimization Recommendations:

  • Switch to cheaper models for simple queries
  • Reduce prompt tokens through optimization
  • Cache frequently requested responses
  • Implement request batching

Metric Correlation

Exporting Data

Export metrics in multiple formats for external analysis, reporting, or archival purposes.

Export Formats:

  • CSV: Raw data for spreadsheet analysis
  • JSON: Structured data for programmatic access
  • PDF: Formatted reports for stakeholders
  • PowerPoint: Executive presentations
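To illustrate the difference between the CSV and JSON shapes, the sketch below serializes the same records both ways using only the Python standard library. It does not use Noah's export feature, whose interface is not described here. The field names (`request_id`, `cost_usd`, `task_success`) and helper names are hypothetical.

```python
import csv
import io
import json
from typing import Any, Dict, List


def to_csv(records: List[Dict[str, Any]]) -> str:
    """Flat rows with a header line, suited to spreadsheet analysis."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records: List[Dict[str, Any]]) -> str:
    """Structured output that preserves types, suited to programmatic access."""
    return json.dumps(records, indent=2)


# Hypothetical metric records for two requests.
records = [
    {"request_id": "r1", "cost_usd": 0.027, "task_success": True},
    {"request_id": "r2", "cost_usd": 0.015, "task_success": False},
]
csv_text = to_csv(records)
json_text = to_json(records)
```

CSV flattens every value to text, while JSON keeps numbers and booleans typed, which matters when the export feeds another program.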