Alerts & Notifications

Alert System Architecture

Creating Alert Rules

Navigate to Project → Alerts → New Alert Rule, then:

  1. Select the metric to monitor
  2. Choose a condition and set the threshold
  3. Select a severity (Low/Medium/High/Critical)
  4. Configure notification channels (Email/Slack/Teams)
  5. Set a cooldown period to prevent alert spam
  6. Enable auto-resolve so the alert clears when conditions recover

Test each alert before activating it for production use.
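If your deployment exposes an alerts API, the same rule can be created from a script. The endpoint, payload fields, and environment variables below are illustrative assumptions, not a documented contract:

# create_alert_rule.py - illustrative sketch; endpoint and field names are assumptions
import os
import requests

NOAH_API_URL = os.getenv('NOAH_API_URL', 'https://noah.example.com/api')  # hypothetical
NOAH_API_KEY = os.getenv('NOAH_API_KEY')

rule = {
    'name': 'High Request Cost',
    'metric': 'cost',
    'condition': 'gt',          # fire when the metric is greater than the threshold
    'threshold': 0.50,
    'severity': 'high',
    'channels': ['email', 'slack'],
    'cooldown_minutes': 15,     # suppress repeat notifications
    'auto_resolve': True,       # clear automatically when the condition recovers
}

response = requests.post(
    f'{NOAH_API_URL}/projects/my-project/alert-rules',  # hypothetical route
    json=rule,
    headers={'Authorization': f'Bearer {NOAH_API_KEY}'},
)
response.raise_for_status()
print(response.json())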

Pre-Built Alert Templates

Cost Management Templates

1. High Request Cost

  • Triggers: cost > $0.50
  • Severity: High
  • Channels: Email, Slack
  • Use Case: Detect expensive requests that exceed budget

2. Daily Budget Exceeded

  • Triggers: daily_spend > $100
  • Severity: Critical
  • Channels: Email, Slack, Teams
  • Use Case: Prevent runaway costs

3. Cost Spike Detection

  • Triggers: cost > 3x rolling average
  • Severity: Medium
  • Channels: Email
  • Use Case: Early warning for cost anomalies
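The spike template above compares each request's cost to a rolling average of recent requests. A minimal sketch of that check, with the window size and multiplier standing in for the rule's configurable values:

# cost_spike_check.py - minimal sketch of the "3x rolling average" condition
from collections import deque

WINDOW = 100        # number of recent requests in the rolling window
MULTIPLIER = 3.0    # spike threshold relative to the rolling average

recent_costs = deque(maxlen=WINDOW)

def is_cost_spike(cost: float) -> bool:
    """Return True when a cost exceeds MULTIPLIER times the rolling average."""
    if recent_costs:
        average = sum(recent_costs) / len(recent_costs)
        spike = cost > MULTIPLIER * average
    else:
        spike = False   # no baseline yet
    recent_costs.append(cost)
    return spike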

Performance Templates

1. High Latency

  • Triggers: task_e2e_latency > 10000ms
  • Severity: High
  • Channels: Slack, PagerDuty
  • Use Case: Ensure responsive user experience

2. Low Success Rate

  • Triggers: task_success_rate < 0.95
  • Severity: Critical
  • Channels: All
  • Use Case: Detect system failures

3. High Error Rate

  • Triggers: error_count > 50/hour
  • Severity: High
  • Channels: Email, Slack
  • Use Case: Monitor system health
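The error-rate template above counts errors over a sliding one-hour window. A sketch of that evaluation:

# error_rate_check.py - sliding one-hour error window (sketch)
import time
from collections import deque

ERROR_LIMIT = 50        # errors per hour before the alert fires
WINDOW_SECONDS = 3600

error_times = deque()

def record_error_and_check() -> bool:
    """Record one error; return True if the hourly limit is now exceeded."""
    now = time.time()
    error_times.append(now)
    # Drop errors that have aged out of the one-hour window
    while error_times and error_times[0] < now - WINDOW_SECONDS:
        error_times.popleft()
    return len(error_times) > ERROR_LIMIT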

Content Quality Templates

1. Content Drift

  • Triggers: content_drift > 0.7
  • Severity: Medium
  • Channels: Email
  • Use Case: Detect model behavior changes

2. Low Readability

  • Triggers: readability_score < 30
  • Severity: Low
  • Channels: Email
  • Use Case: Maintain content quality standards (see the readability sketch after this list)

3. Negative Sentiment

  • Triggers: sentiment_score < -0.5
  • Severity: Medium
  • Channels: Email, Slack
  • Use Case: Monitor customer satisfaction
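The Low Readability threshold maps naturally onto the Flesch Reading Ease scale, where scores below 30 are considered very difficult to read. A sketch using the textstat package, assuming readability_score is a Flesch-style score:

# readability_check.py - Flesch Reading Ease check (sketch; assumes the textstat package)
import textstat

READABILITY_FLOOR = 30  # Flesch Reading Ease below 30 reads as "very difficult"

def is_low_readability(text: str) -> bool:
    """Return True when the text scores below the readability floor."""
    return textstat.flesch_reading_ease(text) < READABILITY_FLOOR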

Security & Compliance Templates

1. PII Output Leakage

  • Triggers: pii_output_total > 0
  • Severity: Critical
  • Channels: All + Incident Management
  • Use Case: Prevent data breaches (see the detection sketch after this list)

2. Excessive PII Input

  • Triggers: pii_input_total > 5
  • Severity: Medium
  • Channels: Email, Slack
  • Use Case: Monitor PII exposure

3. Robustness Test Failure

  • Triggers: robustness_score < 0.8
  • Severity: High
  • Channels: Email, Slack
  • Use Case: Ensure model safety
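The PII Output Leakage template fires on any detected PII in model output. A minimal regex-based sketch of such a detector; production systems typically use a dedicated PII detection service, and the patterns here are illustrative only:

# pii_output_check.py - illustrative regex-based PII counter
import re

# Illustrative patterns only; real deployments use dedicated PII detectors
PII_PATTERNS = {
    'email': re.compile(r'[\w.+-]+@[\w-]+\.[\w.]+'),
    'ssn': re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
    'phone': re.compile(r'\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b'),
}

def count_pii(text: str) -> int:
    """Count PII matches across all patterns (feeds a pii_output_total-style metric)."""
    return sum(len(pattern.findall(text)) for pattern in PII_PATTERNS.values())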

Alert Automation Examples

1. Auto-Scaling Based on Alerts

# webhook_handler.py
from flask import Flask, request
import boto3

app = Flask(__name__)
ec2 = boto3.client('ec2')  # used by the scaling helpers below

MAX_INSTANCES = 10  # upper bound for auto-scaling

@app.route('/webhook/noah-alerts', methods=['POST'])
def handle_noah_alert():
    alert = request.json

    # High latency detected - scale up
    if alert['metric'] == 'task_e2e_latency' and alert['severity'] == 'high':
        current_instances = get_instance_count()

        if current_instances < MAX_INSTANCES:
            scale_up_instances(current_instances + 2)

            return {
                'action': 'scaled_up',
                'from': current_instances,
                'to': current_instances + 2
            }

    # Cost exceeded - switch to a cheaper model
    if alert['metric'] == 'cost' and alert['value'] > 0.50:
        switch_to_model('gpt-3.5-turbo')

        return {
            'action': 'switched_model',
            'from': 'gpt-4',
            'to': 'gpt-3.5-turbo'
        }

    return {'action': 'no_action_required'}

def get_instance_count():
    # Implementation
    pass

def scale_up_instances(count):
    # Implementation
    pass

def switch_to_model(model_name):
    # Implementation
    pass
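To exercise the handler locally, post a sample alert payload; the field names mirror those the handler reads, and the values are illustrative:

# test_webhook.py - post a sample alert to the handler running locally
import requests

sample_alert = {
    'metric': 'task_e2e_latency',
    'severity': 'high',
    'value': 12500,  # milliseconds, illustrative
}

response = requests.post('http://localhost:5000/webhook/noah-alerts', json=sample_alert)
print(response.json())  # e.g. {'action': 'scaled_up', ...}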

2. Incident Management Integration

# pagerduty_integration.py
import requests
import os

PAGERDUTY_API_KEY = os.getenv('PAGERDUTY_API_KEY')
PAGERDUTY_SERVICE_ID = os.getenv('PAGERDUTY_SERVICE_ID')
PAGERDUTY_FROM_EMAIL = os.getenv('PAGERDUTY_FROM_EMAIL')  # a valid PagerDuty user email

def create_incident(alert):
    """Create PagerDuty incident from Noah alert"""

    # Only create incidents for critical alerts
    if alert['severity'] != 'critical':
        return None

    payload = {
        'incident': {
            'type': 'incident',
            'title': f"Noah Alert: {alert['title']}",
            'service': {
                'id': PAGERDUTY_SERVICE_ID,
                'type': 'service_reference'
            },
            'urgency': 'high',
            'body': {
                'type': 'incident_body',
                'details': format_alert_details(alert)
            }
        }
    }

    response = requests.post(
        'https://api.pagerduty.com/incidents',
        json=payload,
        headers={
            'Authorization': f'Token token={PAGERDUTY_API_KEY}',
            'Content-Type': 'application/json',
            'Accept': 'application/vnd.pagerduty+json;version=2',
            # PagerDuty's REST API requires a From header identifying the requesting user
            'From': PAGERDUTY_FROM_EMAIL
        }
    )
    response.raise_for_status()

    return response.json()

def format_alert_details(alert):
    return f"""
Project: {alert['project_name']}
Metric: {alert['metric']}
Current Value: {alert['value']}
Threshold: {alert['threshold']}
Triggered: {alert['triggered_at']}

View in Noah: {alert['dashboard_url']}
"""

3. Slack Bot Commands

# slack_bot.py
import os

from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

client = WebClient(token=os.getenv('SLACK_BOT_TOKEN'))

# get_active_alerts, mute_alerts, and get_system_status are app-side helpers

def handle_slash_command(command, user_id, channel_id):
    """Handle /noah commands in Slack"""

    if command == '/noah alerts':
        # Get active alerts
        alerts = get_active_alerts()

        blocks = []
        for alert in alerts:
            blocks.append({
                'type': 'section',
                'text': {
                    'type': 'mrkdwn',
                    'text': f"*{alert['title']}* ({alert['severity'].upper()})\n{alert['description']}"
                },
                'accessory': {
                    'type': 'button',
                    'text': {'type': 'plain_text', 'text': 'Acknowledge'},
                    'action_id': f"ack_{alert['id']}"
                }
            })

        client.chat_postMessage(
            channel=channel_id,
            blocks=blocks,
            text=f"You have {len(alerts)} active alerts"
        )

    elif command.startswith('/noah mute'):
        # Mute project alerts
        project_id = command.split()[-1]
        mute_alerts(project_id, duration='1h')

        client.chat_postMessage(
            channel=channel_id,
            text=f"Alerts muted for project {project_id} for 1 hour"
        )

    elif command == '/noah status':
        # Get system status
        status = get_system_status()

        client.chat_postMessage(
            channel=channel_id,
            text=f"""
System Status:
- Active Projects: {status['active_projects']}
- Requests (24h): {status['requests_24h']}
- Active Alerts: {status['active_alerts']}
- System Health: {status['health']}
"""
        )
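The Acknowledge button sends a block_actions payload back to your app. A minimal sketch of handling it, assuming an app-side acknowledge_alert helper:

# Handling the Acknowledge button (sketch; acknowledge_alert is an assumed helper)
def handle_block_action(payload):
    """Process an interactive button click forwarded by Slack."""
    action = payload['actions'][0]
    if action['action_id'].startswith('ack_'):
        alert_id = action['action_id'].removeprefix('ack_')
        acknowledge_alert(alert_id)  # assumed app-side helper
        # Reply in-thread so the original alert message keeps its context
        client.chat_postMessage(
            channel=payload['channel']['id'],
            thread_ts=payload['message']['ts'],
            text=f"Alert {alert_id} acknowledged by <@{payload['user']['id']}>"
        )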

Notification Channels

Email Notifications

  • Customizable templates
  • Recipient lists per severity
  • HTML or plain text format
  • Digest mode available (daily/weekly)

Slack Integration

  • Real-time notifications
  • Interactive buttons (acknowledge, mute)
  • Thread replies for updates
  • Custom channel routing

Microsoft Teams

  • Adaptive cards with rich formatting
  • Action buttons
  • Channel mentions
  • Priority notifications

Webhooks

  • Custom endpoint integration
  • JSON payload
  • Retry logic with exponential backoff
  • Signature verification (see the sketch below)
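Signature verification is typically an HMAC of the raw request body compared against a signature header. A sketch under those assumptions; the header name, secret sourcing, and hash choice depend on your webhook settings:

# verify_webhook.py - HMAC-SHA256 signature check (sketch; header name is an assumption)
import hashlib
import hmac
import os

WEBHOOK_SECRET = os.getenv('NOAH_WEBHOOK_SECRET', '').encode()

def verify_signature(raw_body: bytes, signature_header: str) -> bool:
    """Compare the received signature against an HMAC-SHA256 of the raw body."""
    expected = hmac.new(WEBHOOK_SECRET, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking timing information
    return hmac.compare_digest(expected, signature_header)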

Best Practices

Alert Configuration

  1. Start with pre-built templates
  2. Adjust thresholds based on baseline metrics
  3. Use cooldown periods to prevent alert fatigue
  4. Enable auto-resolve for transient issues
  5. Test alert delivery before production

Notification Strategy

  1. Route critical alerts to multiple channels
  2. Use escalation policies for unacknowledged alerts
  3. Configure quiet hours for non-critical alerts
  4. Set up on-call rotation for critical systems
  5. Document alert response procedures

Alert Tuning

  1. Review alert frequency weekly
  2. Adjust thresholds based on false positive rate
  3. Archive unused alert rules
  4. Consolidate similar alerts
  5. Update documentation as thresholds change