Alerts & Notifications
Alert System Architecture
Creating Alert Rules
Navigate to Project → Alerts → New Alert Rule. Select a metric, choose a condition, set the threshold, pick a severity (Low/Medium/High/Critical), configure notification channels (Email/Slack/Teams), set a cooldown period to prevent alert spam, and optionally enable auto-resolve so the alert clears when the condition does. Test alert delivery before enabling a rule in production.
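The same rule can also be expressed programmatically. The sketch below assumes a hypothetical Noah REST endpoint (POST /api/v1/projects/{id}/alert-rules) and illustrative field names; check them against your deployment's actual API before use.
# create_alert_rule.py -- illustrative sketch; the endpoint and field names are assumptions
import os
import requests

NOAH_API_URL = os.getenv('NOAH_API_URL', 'https://noah.example.com')
NOAH_API_KEY = os.getenv('NOAH_API_KEY')
NOAH_PROJECT_ID = os.getenv('NOAH_PROJECT_ID')

rule = {
    'name': 'High Request Cost',
    'metric': 'cost',
    'condition': 'greater_than',     # assumed condition keyword
    'threshold': 0.50,
    'severity': 'high',              # low | medium | high | critical
    'channels': ['email', 'slack'],
    'cooldown_minutes': 30,          # suppress repeat notifications
    'auto_resolve': True,            # clear the alert when the condition clears
}

response = requests.post(
    f"{NOAH_API_URL}/api/v1/projects/{NOAH_PROJECT_ID}/alert-rules",
    json=rule,
    headers={'Authorization': f'Bearer {NOAH_API_KEY}'},
    timeout=10,
)
response.raise_for_status()
print('Created rule:', response.json())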
Pre-Built Alert Templates
Alert Template Library
Cost Management Templates
1. High Request Cost
- Trigger: cost > $0.50
- Severity: High
- Channels: Email, Slack
- Use Case: Detect expensive requests that exceed budget
2. Daily Budget Exceeded
- Trigger: daily_spend > $100
- Severity: Critical
- Channels: Email, Slack, Teams
- Use Case: Prevent runaway costs
3. Cost Spike Detection
- Trigger: cost > 3x rolling average
- Severity: Medium
- Channels: Email
- Use Case: Early warning for cost anomalies (see the detection sketch after this list)
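For reference, the cost-spike condition compares each request's cost to a rolling average of recent request costs. The sketch below shows one way to evaluate that condition locally; the 3x multiplier mirrors the template above, while the window size and class name are illustrative only.
# cost_spike.py -- illustrative check mirroring the "Cost Spike Detection" template
from collections import deque

class CostSpikeDetector:
    """Flags a request whose cost exceeds 3x the rolling average of recent requests."""

    def __init__(self, window_size: int = 100, multiplier: float = 3.0):
        self.window = deque(maxlen=window_size)
        self.multiplier = multiplier

    def is_spike(self, cost: float) -> bool:
        # Require some history before the comparison is meaningful
        if len(self.window) >= 10:
            rolling_avg = sum(self.window) / len(self.window)
            spike = cost > self.multiplier * rolling_avg
        else:
            spike = False
        self.window.append(cost)
        return spike

detector = CostSpikeDetector()
for cost in (0.02, 0.03, 0.02, 0.02, 0.03, 0.02, 0.02, 0.03, 0.02, 0.02, 0.25):
    if detector.is_spike(cost):
        print(f"Cost spike detected: ${cost:.2f}")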
Performance Templates
1. High Latency
- Trigger: task_e2e_latency > 10000ms
- Severity: High
- Channels: Slack, PagerDuty
- Use Case: Ensure responsive user experience
2. Low Success Rate
- Trigger: task_success_rate < 0.95
- Severity: Critical
- Channels: All
- Use Case: Detect system failures
3. High Error Rate
- Trigger: error_count > 50/hour
- Severity: High
- Channels: Email, Slack
- Use Case: Monitor system health
Content Quality Templates
1. Content Drift
- Trigger: content_drift > 0.7
- Severity: Medium
- Channels: Email
- Use Case: Detect model behavior changes
2. Low Readability
- Trigger: readability_score < 30
- Severity: Low
- Channels: Email
- Use Case: Maintain content quality standards
3. Negative Sentiment
- Trigger: sentiment_score < -0.5
- Severity: Medium
- Channels: Email, Slack
- Use Case: Monitor customer satisfaction
Security & Compliance Templates
1. PII Output Leakage
- Trigger: pii_output_total > 0
- Severity: Critical
- Channels: All + Incident Management
- Use Case: Prevent data breaches
2. Excessive PII Input
- Trigger: pii_input_total > 5
- Severity: Medium
- Channels: Email, Slack
- Use Case: Monitor PII exposure
3. Robustness Test Failure
- Trigger: robustness_score < 0.8
- Severity: High
- Channels: Email, Slack
- Use Case: Ensure model safety
Alert Automation Examples
1. Auto-Scaling Based on Alerts
# webhook_handler.py
from flask import Flask, request
import boto3

app = Flask(__name__)
ec2 = boto3.client('ec2')

MAX_INSTANCES = 10  # upper bound for auto-scaling

@app.route('/webhook/noah-alerts', methods=['POST'])
def handle_noah_alert():
    alert = request.json

    # High latency detected - scale up
    if alert['metric'] == 'task_e2e_latency' and alert['severity'] == 'high':
        current_instances = get_instance_count()
        if current_instances < MAX_INSTANCES:
            scale_up_instances(current_instances + 2)
            return {
                'action': 'scaled_up',
                'from': current_instances,
                'to': current_instances + 2
            }

    # Cost exceeded - switch to a cheaper model
    if alert['metric'] == 'cost' and alert['value'] > 0.50:
        switch_to_model('gpt-3.5-turbo')
        return {
            'action': 'switched_model',
            'from': 'gpt-4',
            'to': 'gpt-3.5-turbo'
        }

    return {'action': 'no_action_required'}

def get_instance_count():
    # Implementation: e.g. count running instances via ec2.describe_instances()
    pass

def scale_up_instances(count):
    # Implementation: launch additional instances up to `count`
    pass

def switch_to_model(model_name):
    # Implementation: update routing config to send requests to `model_name`
    pass
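To exercise the handler locally, post a sample alert to it. The fields below match what the handler reads (metric, severity, value); the rest of the payload shape is an assumption about Noah's webhook format.
# test_webhook.py -- post a sample alert to the handler above (payload shape assumed)
import requests

sample_alert = {
    'metric': 'task_e2e_latency',
    'severity': 'high',
    'value': 12500,
    'threshold': 10000,
    'title': 'High Latency',
    'project_name': 'demo-project',
}

resp = requests.post(
    'http://localhost:5000/webhook/noah-alerts',
    json=sample_alert,
    timeout=5,
)
print(resp.json())  # e.g. {'action': 'scaled_up', ...}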
2. Incident Management Integration
# pagerduty_integration.py
import os
import requests

PAGERDUTY_API_KEY = os.getenv('PAGERDUTY_API_KEY')
PAGERDUTY_SERVICE_ID = os.getenv('PAGERDUTY_SERVICE_ID')
PAGERDUTY_FROM_EMAIL = os.getenv('PAGERDUTY_FROM_EMAIL')  # requester email for the incidents API

def create_incident(alert):
    """Create PagerDuty incident from Noah alert"""
    # Only create incidents for critical alerts
    if alert['severity'] != 'critical':
        return None

    payload = {
        'incident': {
            'type': 'incident',
            'title': f"Noah Alert: {alert['title']}",
            'service': {
                'id': PAGERDUTY_SERVICE_ID,
                'type': 'service_reference'
            },
            'urgency': 'high',
            'body': {
                'type': 'incident_body',
                'details': format_alert_details(alert)
            }
        }
    }

    response = requests.post(
        'https://api.pagerduty.com/incidents',
        json=payload,
        headers={
            'Authorization': f'Token token={PAGERDUTY_API_KEY}',
            'Content-Type': 'application/json',
            'Accept': 'application/vnd.pagerduty+json;version=2',
            'From': PAGERDUTY_FROM_EMAIL  # PagerDuty requires a valid user email when creating incidents
        }
    )
    response.raise_for_status()
    return response.json()

def format_alert_details(alert):
    return f"""
Project: {alert['project_name']}
Metric: {alert['metric']}
Current Value: {alert['value']}
Threshold: {alert['threshold']}
Triggered: {alert['triggered_at']}
View in Noah: {alert['dashboard_url']}
"""
3. Slack Bot Commands
# slack_bot.py
import os

from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

client = WebClient(token=os.getenv('SLACK_BOT_TOKEN'))

# get_active_alerts, mute_alerts, and get_system_status are assumed helpers
# that query the Noah API; their implementations are omitted here.

def handle_slash_command(command, user_id, channel_id):
    """Handle /noah commands in Slack"""
    if command == '/noah alerts':
        # Get active alerts
        alerts = get_active_alerts()
        blocks = []
        for alert in alerts:
            blocks.append({
                'type': 'section',
                'text': {
                    'type': 'mrkdwn',
                    'text': f"*{alert['title']}* ({alert['severity'].upper()})\n{alert['description']}"
                },
                'accessory': {
                    'type': 'button',
                    'text': {'type': 'plain_text', 'text': 'Acknowledge'},
                    'action_id': f"ack_{alert['id']}"
                }
            })
        client.chat_postMessage(
            channel=channel_id,
            blocks=blocks,
            text=f"You have {len(alerts)} active alerts"
        )

    elif command.startswith('/noah mute'):
        # Mute project alerts
        project_id = command.split()[-1]
        mute_alerts(project_id, duration='1h')
        client.chat_postMessage(
            channel=channel_id,
            text=f"Alerts muted for project {project_id} for 1 hour"
        )

    elif command == '/noah status':
        # Get system status
        status = get_system_status()
        client.chat_postMessage(
            channel=channel_id,
            text=(
                "System Status:\n"
                f"- Active Projects: {status['active_projects']}\n"
                f"- Requests (24h): {status['requests_24h']}\n"
                f"- Active Alerts: {status['active_alerts']}\n"
                f"- System Health: {status['health']}"
            )
        )
Notification Channels
Email Notifications
- Customizable templates
- Recipient lists per severity
- HTML or plain text format
- Digest mode available (daily/weekly)
Slack Integration
- Real-time notifications
- Interactive buttons (acknowledge, mute)
- Thread replies for updates
- Custom channel routing
Microsoft Teams
- Adaptive cards with rich formatting
- Action buttons
- Channel mentions
- Priority notifications
Webhooks
- Custom endpoint integration
- JSON payload
- Retry logic with exponential backoff
- Signature verification (see the sketch below)
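Webhook consumers should confirm that an incoming request really came from Noah before acting on it. The sketch below assumes an HMAC-SHA256 signature carried in an X-Noah-Signature header and a shared secret; the header name and signing scheme are assumptions, so verify them against your deployment.
# verify_webhook.py -- HMAC signature check; header name and signing scheme are assumptions
import hashlib
import hmac
import os

WEBHOOK_SECRET = os.getenv('NOAH_WEBHOOK_SECRET', '').encode()

def is_valid_signature(raw_body: bytes, signature_header: str) -> bool:
    """Return True if the request body matches the signature header."""
    expected = hmac.new(WEBHOOK_SECRET, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing attacks
    return hmac.compare_digest(expected, signature_header or '')

# Example use inside a Flask handler:
#   if not is_valid_signature(request.get_data(), request.headers.get('X-Noah-Signature')):
#       return {'error': 'invalid signature'}, 401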
Best Practices
Alert Configuration
- Start with pre-built templates
- Adjust thresholds based on baseline metrics
- Use cooldown periods to prevent alert fatigue
- Enable auto-resolve for transient issues
- Test alert delivery before production
Notification Strategy
- Route critical alerts to multiple channels
- Use escalation policies for unacknowledged alerts
- Configure quiet hours for non-critical alerts
- Set up on-call rotation for critical systems
- Document alert response procedures
Alert Tuning
- Review alert frequency weekly
- Adjust thresholds based on false positive rate
- Archive unused alert rules
- Consolidate similar alerts
- Update documentation as thresholds change