Usage Examples¶
Setup¶
Set these variables for the examples below:
SERVICE_URL=$(gcloud run services describe aimd-inference-mvp \
  --region us-central1 --format 'value(status.url)')
TOKEN=$(gcloud auth print-identity-token)
API_KEY="your-api-key"
cURL Examples¶
Health Check¶
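Check that the service is up before sending work. The path below is an assumption (a /health route is conventional but not documented elsewhere on this page); adjust it to match your deployment:

# Assumed /health endpoint; shown with only the Cloud Run identity token
curl ${SERVICE_URL}/health \
  -H "Authorization: Bearer ${TOKEN}"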
Basic Prediction¶
curl -X POST ${SERVICE_URL}/predict \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ${API_KEY}" \
  -d '{"gcs_uri": "gs://your-bucket/audio.mp3"}'
Prediction with Custom Parameters¶
Analyze only 3 snippets with a higher decision threshold:
curl -X POST ${SERVICE_URL}/predict \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ${API_KEY}" \
  -d '{"gcs_uri": "gs://your-bucket/audio.mp3", "num_snippets": 3, "xgb_threshold": 0.6}'
Python Examples¶
Single Prediction¶
import requests
import subprocess

service_url = "https://aimd-inference-mvp-xxxxx.run.app"
api_key = "your-api-key"

# Get identity token for Cloud Run
token = subprocess.check_output(
    ["gcloud", "auth", "print-identity-token"], text=True
).strip()

response = requests.post(
    f"{service_url}/predict",
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "X-API-Key": api_key,
    },
    json={"gcs_uri": "gs://your-bucket/audio.mp3"},
)
response.raise_for_status()

result = response.json()
print(f"Prediction: {result['prediction']}")
print(f"Probability: {result['probability']:.3f}")
print(f"Confidence: {result['confidence']:.3f}")
Batch Processing with Rate Limit Handling¶
import time
import requests
import subprocess


def get_token():
    return subprocess.check_output(
        ["gcloud", "auth", "print-identity-token"], text=True
    ).strip()


def predict_with_retry(service_url, api_key, gcs_uri, max_retries=3):
    token = get_token()
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "X-API-Key": api_key,
    }
    for attempt in range(max_retries):
        response = requests.post(
            f"{service_url}/predict",
            headers=headers,
            json={"gcs_uri": gcs_uri},
        )
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:
            # Rate limited: wait as long as the server suggests
            retry_after = int(response.headers.get("Retry-After", 1))
            time.sleep(retry_after)
        elif response.status_code == 503:
            # Service unavailable: back off exponentially (1s, 2s, 4s, ...)
            time.sleep(2 ** attempt)
        else:
            response.raise_for_status()
    raise RuntimeError(f"Failed after {max_retries} retries: {gcs_uri}")


# Process a list of files
gcs_uris = [
    "gs://your-bucket/song1.mp3",
    "gs://your-bucket/song2.mp3",
    "gs://your-bucket/song3.mp3",
]

results = []
for uri in gcs_uris:
    result = predict_with_retry(
        service_url="https://aimd-inference-mvp-xxxxx.run.app",
        api_key="your-api-key",
        gcs_uri=uri,
    )
    results.append(result)
    print(f"{result['filename']}: {result['prediction']} ({result['probability']:.3f})")
Interpreting Results¶
Probability¶
The probability field is the ensemble classifier output:
- 0.0 — The model is confident the audio is real (human-made)
- 1.0 — The model is confident the audio is AI-generated
- 0.5 — The model is uncertain
The prediction field is "AI" when probability >= xgb_threshold (default 0.5), otherwise "REAL".
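For example, with the default threshold a probability of 0.62 is reported as "AI", while 0.45 is reported as "REAL".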
Confidence¶
The confidence score measures how far the probability is from the decision boundary (0.5):
- 0.0 — Maximum uncertainty (probability is exactly 0.5)
- 1.0 — Maximum confidence (probability is 0.0 or 1.0)
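Assuming a linear mapping between these endpoints (the exact formula is not documented here), this corresponds to confidence = 2 * |probability - 0.5|.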
Snippet Results¶
Each audio file is split into 30-second snippets. The snippet_results array shows the prediction for each snippet. This is useful for:
- Identifying which parts of a track triggered an AI detection (see the example after this list)
- Understanding if the detection is consistent across the track or localized
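One way to inspect the per-snippet breakdown is to save the response and filter it with jq (a sketch; jq is assumed to be installed):

# Save the full response, then pull out just the per-snippet results
curl -s -X POST ${SERVICE_URL}/predict \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ${API_KEY}" \
  -d '{"gcs_uri": "gs://your-bucket/audio.mp3"}' > response.json
jq '.snippet_results' response.json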
Model Probabilities¶
The model_probabilities object shows the average probability from each model in the ensemble. If models disagree significantly, the overall confidence may be lower.
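Using the response saved in the previous example, the per-model averages can be inspected the same way:

jq '.model_probabilities' response.json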
Adjusting the Threshold¶
The xgb_threshold parameter (default 0.5) controls the decision boundary:
- Lower threshold (e.g., 0.3) — More sensitive, catches more AI content but may increase false positives (a worked example follows this list)
- Higher threshold (e.g., 0.7) — More conservative, fewer false positives but may miss some AI content
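For example, a more sensitive screening pass with a lower threshold:

curl -X POST ${SERVICE_URL}/predict \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ${API_KEY}" \
  -d '{"gcs_uri": "gs://your-bucket/audio.mp3", "xgb_threshold": 0.3}'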
Monitoring¶
Scraping Prometheus Metrics¶
The /metrics endpoint returns Prometheus-formatted metrics:
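For example (it is assumed here that /metrics sits behind the same Cloud Run identity-token authentication as the other endpoints):

curl ${SERVICE_URL}/metrics \
  -H "Authorization: Bearer ${TOKEN}"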
Key metrics to monitor:
- inference_requests_total — Track request volume and error rates
- inference_request_duration_seconds — Monitor latency
- rate_limit_rejections_total — Detect if rate limits are too restrictive