Usage Examples¶
Setup¶
Set these variables for the examples below:
SERVICE_URL=$(gcloud run services describe aimd-inference-mvp \
  --region us-central1 --format 'value(status.url)')
TOKEN=$(gcloud auth print-identity-token)
API_KEY="your-api-key"
cURL Examples¶
Health Check¶
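Check that the service is up before sending work. The path below is an assumption (a /health route is conventional but not documented elsewhere on this page); adjust it to match your deployment:

# Assumed /health endpoint; shown with only the Cloud Run identity token
curl ${SERVICE_URL}/health \
  -H "Authorization: Bearer ${TOKEN}"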
Basic Prediction¶
curl -X POST ${SERVICE_URL}/predict \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ${API_KEY}" \
  -d '{"gcs_uri": "gs://your-bucket/audio.mp3"}'
Prediction with Custom Parameters¶
Analyze only 3 snippets with a higher decision threshold:
curl -X POST ${SERVICE_URL}/predict \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ${API_KEY}" \
  -d '{"gcs_uri": "gs://your-bucket/audio.mp3", "num_snippets": 3, "xgb_threshold": 0.6}'
Python Examples¶
Single Prediction¶
import requests
import subprocess

service_url = "https://aimd-inference-mvp-xxxxx.run.app"
api_key = "your-api-key"

# Get identity token for Cloud Run
token = subprocess.check_output(
    ["gcloud", "auth", "print-identity-token"], text=True
).strip()

response = requests.post(
    f"{service_url}/predict",
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "X-API-Key": api_key,
    },
    json={"gcs_uri": "gs://your-bucket/audio.mp3"},
)
response.raise_for_status()

result = response.json()
print(f"Prediction: {result['prediction']}")
print(f"Probability: {result['probability']:.3f}")
print(f"Confidence: {result['confidence']:.3f}")
Batch Processing with Rate Limit Handling¶
import time
import requests
import subprocess


def get_token():
    return subprocess.check_output(
        ["gcloud", "auth", "print-identity-token"], text=True
    ).strip()


def predict_with_retry(service_url, api_key, gcs_uri, max_retries=3):
    token = get_token()
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "X-API-Key": api_key,
    }
    for attempt in range(max_retries):
        response = requests.post(
            f"{service_url}/predict",
            headers=headers,
            json={"gcs_uri": gcs_uri},
        )
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:
            # Rate limited: wait as long as the server suggests
            retry_after = int(response.headers.get("Retry-After", 1))
            time.sleep(retry_after)
        elif response.status_code == 503:
            # Service unavailable: back off exponentially (1s, 2s, 4s, ...)
            time.sleep(2 ** attempt)
        else:
            response.raise_for_status()
    raise RuntimeError(f"Failed after {max_retries} retries: {gcs_uri}")


# Process a list of files
gcs_uris = [
    "gs://your-bucket/song1.mp3",
    "gs://your-bucket/song2.mp3",
    "gs://your-bucket/song3.mp3",
]

results = []
for uri in gcs_uris:
    result = predict_with_retry(
        service_url="https://aimd-inference-mvp-xxxxx.run.app",
        api_key="your-api-key",
        gcs_uri=uri,
    )
    results.append(result)
    print(f"{result['filename']}: {result['prediction']} ({result['probability']:.3f})")
Interpreting Results¶
Probability¶
The probability field is the ensemble classifier output:
- 0.0 — The model is confident the audio is real (human-made)
- 1.0 — The model is confident the audio is AI-generated
- 0.5 — The model is uncertain
The prediction field is "AI" when probability >= xgb_threshold (default 0.5), otherwise "REAL".
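For example, with the default threshold a probability of 0.62 is reported as "AI", while 0.45 is reported as "REAL".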
Confidence¶
The confidence score measures how far the probability is from the decision boundary (0.5):
- 0.0 — Maximum uncertainty (probability is exactly 0.5)
- 1.0 — Maximum confidence (probability is 0.0 or 1.0)
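Assuming a linear mapping between these endpoints (the exact formula is not documented here), this corresponds to confidence = 2 * |probability - 0.5|.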
Snippet Results¶
Each audio file is split into 30-second snippets. The snippet_results array shows the prediction for each snippet. This is useful for:
- Identifying which parts of a track triggered an AI detection (see the example after this list)
- Understanding if the detection is consistent across the track or localized
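One way to inspect the per-snippet breakdown is to save the response and filter it with jq (a sketch; jq is assumed to be installed):

# Save the full response, then pull out just the per-snippet results
curl -s -X POST ${SERVICE_URL}/predict \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ${API_KEY}" \
  -d '{"gcs_uri": "gs://your-bucket/audio.mp3"}' > response.json
jq '.snippet_results' response.json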
Model Probabilities¶
The model_probabilities object shows the average probability from each model in the ensemble. If models disagree significantly, the overall confidence may be lower.
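Using the response saved in the previous example, the per-model averages can be inspected the same way:

jq '.model_probabilities' response.json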
Adjusting the Threshold¶
The xgb_threshold parameter (default 0.5) controls the decision boundary:
- Lower threshold (e.g., 0.3) — More sensitive, catches more AI content but may increase false positives (a worked example follows this list)
- Higher threshold (e.g., 0.7) — More conservative, fewer false positives but may miss some AI content
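For example, a more sensitive screening pass with a lower threshold:

curl -X POST ${SERVICE_URL}/predict \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ${API_KEY}" \
  -d '{"gcs_uri": "gs://your-bucket/audio.mp3", "xgb_threshold": 0.3}'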
Monitoring¶
Scraping Prometheus Metrics¶
The /metrics endpoint returns Prometheus-formatted metrics:
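For example (it is assumed here that /metrics sits behind the same Cloud Run identity-token authentication as the other endpoints):

curl ${SERVICE_URL}/metrics \
  -H "Authorization: Bearer ${TOKEN}"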
Key metrics to monitor:
- inference_requests_total — Track request volume and error rates
- inference_request_duration_seconds — Monitor latency
- rate_limit_rejections_total — Detect if rate limits are too restrictive