API Reference
Base URL: Your Cloud Run service URL (e.g., https://aimd-inference-mvp-xxxxx.run.app)
All requests to Cloud Run require a GCP identity token in the Authorization header when the service is deployed with --no-allow-unauthenticated. Additionally, POST /predict and GET /status require an API key in the X-API-Key header.
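The two authentication layers can be sketched as a small header-building helper. This is only an illustration: `build_headers` and its parameters are hypothetical names, not part of the service, and the identity token is assumed to have been obtained elsewhere (for example with `gcloud auth print-identity-token`).

```python
from typing import Dict, Optional


def build_headers(identity_token: Optional[str] = None,
                  api_key: Optional[str] = None) -> Dict[str, str]:
    """Assemble request headers for the two auth layers (hypothetical helper)."""
    headers: Dict[str, str] = {}
    if identity_token:
        # Cloud Run IAM layer: needed when the service is deployed
        # with --no-allow-unauthenticated.
        headers["Authorization"] = f"Bearer {identity_token}"
    if api_key:
        # Application layer: needed for POST /predict and GET /status.
        headers["X-API-Key"] = api_key
    return headers
```

Calls to /health and /metrics would pass only the identity token, since those endpoints do not require an API key.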
POST /predict
Run AI detection on an audio file.
Authentication
- Authorization: Bearer <identity-token> (Cloud Run IAM)
- X-API-Key: <api-key> (application-level)
Request Body
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| gcs_uri | string | Yes | — | GCS URI of the audio file (gs://bucket/path/file.mp3) |
| num_snippets | integer or "max" | No | "max" | Number of 30-second snippets to analyze. "max" analyzes all possible snippets. |
| xgb_threshold | float (0.0–1.0) | No | 0.5 | Decision threshold for the ensemble classifier |
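A client may want to check these fields before sending a request. The sketch below is one way to do that; `build_predict_payload` is a hypothetical helper, and the rule that an integer `num_snippets` must be at least 1 is an assumption, since the table does not state a minimum.

```python
from typing import Any, Dict, Union


def build_predict_payload(gcs_uri: str,
                          num_snippets: Union[int, str] = "max",
                          xgb_threshold: float = 0.5) -> Dict[str, Any]:
    """Validate fields client-side and return the JSON body for POST /predict."""
    if not gcs_uri.startswith("gs://") or len(gcs_uri) <= len("gs://"):
        raise ValueError("gcs_uri must look like gs://bucket/path/file.mp3")
    if num_snippets != "max":
        # bool is a subclass of int, so it is rejected explicitly.
        # Assumption: values below 1 are not meaningful.
        if (isinstance(num_snippets, bool)
                or not isinstance(num_snippets, int)
                or num_snippets < 1):
            raise ValueError('num_snippets must be a positive integer or "max"')
    if not 0.0 <= xgb_threshold <= 1.0:
        raise ValueError("xgb_threshold must be between 0.0 and 1.0")
    return {
        "gcs_uri": gcs_uri,
        "num_snippets": num_snippets,
        "xgb_threshold": xgb_threshold,
    }
```

The server remains the authority on what is valid; a check like this only avoids a round trip for obviously malformed input.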
Response (200 OK)
| Field | Type | Description |
|---|---|---|
| filename | string | Original filename extracted from the GCS URI |
| prediction | "AI" or "REAL" | Final ensemble prediction |
| probability | float | Ensemble output probability (0.0 = definitely real, 1.0 = definitely AI) |
| confidence | float | Confidence score (0.0 = uncertain, 1.0 = highly confident) |
| snippet_results | array | Per-snippet breakdown (see below) |
| model_probabilities | object | Average probability from each individual model |
| processing_time_ms | float | Total processing time in milliseconds |
snippet_results items
| Field | Type | Description |
|---|---|---|
| snippet_id | integer | 1-indexed snippet number |
| start_time | float | Snippet start time in seconds |
| end_time | float | Snippet end time in seconds |
| probability | float | Snippet-level AI probability |
| prediction | "AI" or "REAL" | Snippet-level prediction |
model_probabilities keys
The model_probabilities object contains one key per model in the ensemble; each value is that model's average probability across all analyzed snippets. The specific keys are stable within a given release version.
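Because the key names are tied to a release, client code is probably safer iterating over whatever keys are present than hardcoding model names. A minimal sketch, where `rank_models` is a hypothetical helper:

```python
from typing import Dict, List, Tuple


def rank_models(model_probabilities: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (model, probability) pairs, highest probability first."""
    # No model names are hardcoded: the function works with any key set.
    return sorted(model_probabilities.items(),
                  key=lambda item: item[1],
                  reverse=True)
```

For instance, `rank_models(response["model_probabilities"])[0]` would give the model that leaned most strongly toward "AI" for that file.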
Example

    curl -X POST ${SERVICE_URL}/predict \
      -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
      -H "Content-Type: application/json" \
      -H "X-API-Key: ${API_KEY}" \
      -d '{"gcs_uri": "gs://my-bucket/audio/song.mp3"}'

Response:

    {
      "filename": "song.mp3",
      "prediction": "AI",
      "probability": 0.85,
      "confidence": 0.70,
      "snippet_results": [
        {
          "snippet_id": 1,
          "start_time": 0.0,
          "end_time": 30.0,
          "probability": 0.88,
          "prediction": "AI"
        },
        {
          "snippet_id": 2,
          "start_time": 30.0,
          "end_time": 60.0,
          "probability": 0.82,
          "prediction": "AI"
        }
      ],
      "model_probabilities": {
        "aggro_cnn": 0.88,
        "tory_cnn": 0.75,
        "aggro_stt": 0.91,
        "tory_stt": 0.80,
        "udio_lite": 0.85
      },
      "processing_time_ms": 2345.6
    }

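One way a client might use the per-snippet breakdown is to merge adjacent snippets that share a prediction into continuous time ranges. The sketch below assumes the response has already been parsed from JSON into a dict; `flagged_ranges` is a hypothetical helper, not something the service provides.

```python
from typing import Any, Dict, List, Tuple


def flagged_ranges(response: Dict[str, Any],
                   label: str = "AI") -> List[Tuple[float, float]]:
    """Merge consecutive snippets with the given prediction into time ranges."""
    ranges: List[Tuple[float, float]] = []
    snippets = sorted(response["snippet_results"], key=lambda s: s["start_time"])
    for snip in snippets:
        if snip["prediction"] != label:
            continue
        start, end = snip["start_time"], snip["end_time"]
        if ranges and ranges[-1][1] >= start:
            # Touches or overlaps the previous range: extend it.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges
```

Applied to the example response above, both snippets are labeled "AI" and touch at 30.0 seconds, so they collapse into a single range from 0.0 to 60.0.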
Error Responses
| Status | Error | Description |
|---|---|---|
| 400 | Invalid GCS URI | Malformed gs:// path or path traversal detected |
| 400 | Download failed | File not accessible in GCS (permissions or not found) |
| 401 | Unauthorized | Missing or invalid X-API-Key header |
| 422 | Inference failed | Model error during processing |
| 429 | Rate limit exceeded | Too many requests. Check Retry-After header. |
| 503 | Service busy | All GPU slots occupied. Retry after a few seconds. |
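The 429 and 503 rows both describe transient conditions, so a client would typically retry them and treat the other statuses as final. The following is a sketch under several assumptions: `call_with_retry` and the `send` callable are hypothetical, `Retry-After` is assumed to carry a number of seconds, and the backoff parameters are arbitrary defaults rather than values recommended by the service.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, response body)
Response = Tuple[int, Mapping[str, str], str]


def call_with_retry(send: Callable[[], Response],
                    max_attempts: int = 4,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Retry on 429 and 503; return any other response immediately."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        last_attempt = attempt == max_attempts - 1
        if status not in (429, 503) or last_attempt:
            return status, headers, body
        # Fallback: exponential backoff (1s, 2s, 4s, ... by default).
        delay = base_delay * (2 ** attempt)
        # Assumption: headers use the exact key "Retry-After" and the
        # value is in seconds; anything else falls back to the backoff.
        retry_after = headers.get("Retry-After")
        if status == 429 and retry_after is not None:
            try:
                delay = float(retry_after)
            except ValueError:
                pass
        sleep(delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the waiting behavior can be replaced in tests; in normal use the default `time.sleep` applies.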
GET /health
Health check endpoint. Used by Cloud Run for startup and liveness probes.
Authentication
No X-API-Key required. Cloud Run IAM (Authorization header) may still apply.
Response (200 OK)
| Field | Type | Description |
|---|---|---|
| status | "healthy" or "unhealthy" | Service health status |
| device | string | Compute device ("cuda" or "cpu") |
| models_loaded | integer | Number of loaded models |
| xgb_loaded | boolean | Whether the ensemble classifier is loaded |
| gpu_available | boolean | Whether a GPU is detected |
| gpu_memory_allocated_mb | float | Current GPU memory usage in MB |
Example
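A minimal sketch of how a client might interpret a /health body, assuming it has already been fetched and parsed into a dict. The readiness policy and the name `is_ready` are illustrative assumptions, and the sample values in the comment are made up for the illustration rather than taken from a real deployment.

```python
from typing import Any, Dict


def is_ready(health: Dict[str, Any], require_gpu: bool = False) -> bool:
    """Decide readiness from a parsed /health body (hypothetical policy)."""
    ready = (
        health.get("status") == "healthy"
        and health.get("models_loaded", 0) > 0
        and bool(health.get("xgb_loaded"))
    )
    if require_gpu:
        # Stricter check for deployments that must not fall back to CPU.
        ready = (
            ready
            and bool(health.get("gpu_available"))
            and health.get("device") == "cuda"
        )
    return ready


# Illustrative payload only; field names follow the table above.
sample_health = {
    "status": "healthy",
    "device": "cpu",
    "models_loaded": 5,
    "xgb_loaded": True,
    "gpu_available": False,
    "gpu_memory_allocated_mb": 0.0,
}
```

With this sample, `is_ready(sample_health)` is true, while `is_ready(sample_health, require_gpu=True)` is false because the device is "cpu".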
GET /metrics
Prometheus metrics endpoint.
Authentication
No X-API-Key required.
Response
Returns metrics in Prometheus text exposition format.
Available metrics:
| Metric | Type | Labels | Description |
|---|---|---|---|
| inference_requests_total | Counter | status, endpoint | Total requests by status and endpoint |
| inference_request_duration_seconds | Histogram | endpoint | Request latency (buckets: 0.1s to 120s) |
| rate_limit_rejections_total | Counter | — | Total rate limit rejections |
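For quick checks without a Prometheus server, the text exposition format can be read with a few lines of standard-library Python. This is a simplified sketch, not a full parser: `parse_metrics` is a hypothetical helper that ignores comment lines and any optional timestamps, and keeps each series (name plus labels) as a plain string key.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}."""
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        brace = line.rfind("}")
        if brace != -1:
            # Labeled series: everything up to the closing brace is the key.
            series, rest = line[:brace + 1], line[brace + 1:].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if rest:
            # rest[0] is the value; an optional timestamp may follow.
            samples[series] = float(rest[0])
    return samples
```

For example, `parse_metrics(body).get("rate_limit_rejections_total")` would return the current rejection count, or `None` if the metric is absent from the scrape.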
GET /status
Detailed service status including rate limiter state. Useful for debugging.
Authentication
- Authorization: Bearer <identity-token> (Cloud Run IAM)
- X-API-Key: <api-key> (required)
Response (200 OK)
Returns a JSON object with engine (same as /health response) and rate_limiter state.

    curl ${SERVICE_URL}/status \
      -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
      -H "X-API-Key: ${API_KEY}"

Request Tracing
All requests are assigned a unique request ID for tracing. You can provide your own via the X-Request-ID header, or the service will generate one automatically. The request ID is returned in the X-Request-ID response header.
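Supplying your own request ID makes it easier to correlate client logs with service logs. A small sketch of that pattern follows; the helper names are hypothetical, and using a UUID is an assumption, since the service does not state a required ID format.

```python
import uuid
from typing import Dict, Mapping, Optional


def with_request_id(headers: Mapping[str, str],
                    request_id: Optional[str] = None) -> Dict[str, str]:
    """Return a copy of headers carrying an X-Request-ID (generated if absent)."""
    out = dict(headers)
    # Assumption: any unique string is acceptable; a UUID4 is used here.
    out["X-Request-ID"] = request_id or str(uuid.uuid4())
    return out


def same_request(sent: Mapping[str, str], received: Mapping[str, str]) -> bool:
    """Check that the response headers echo the request ID that was sent."""
    sent_id = sent.get("X-Request-ID")
    return sent_id is not None and sent_id == received.get("X-Request-ID")
```

Logging the ID on the client before sending means a failed call can still be traced even if no response arrives.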