API Reference

Base URL: Your Cloud Run service URL (e.g., https://aimd-inference-mvp-xxxxx.run.app)

All requests to Cloud Run require a GCP identity token in the Authorization header when the service is deployed with --no-allow-unauthenticated. Additionally, POST /predict and GET /status require an API key in the X-API-Key header.
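The two header layers described above can be sketched in Python as follows. This is a minimal illustration using only the standard library; the helper names `build_headers` and `build_predict_request` are hypothetical, and the request is built but not sent.

```python
import json
import urllib.request


def build_headers(identity_token, api_key=None):
    """Return HTTP headers for a call to the service.

    identity_token: GCP identity token (Cloud Run IAM layer).
    api_key: application-level key; per this reference only
             POST /predict and GET /status need it.
    """
    headers = {"Authorization": f"Bearer {identity_token}"}
    if api_key is not None:
        headers["X-API-Key"] = api_key
    return headers


def build_predict_request(service_url, identity_token, api_key, gcs_uri):
    """Build (but do not send) a POST /predict request object."""
    body = json.dumps({"gcs_uri": gcs_uri}).encode("utf-8")
    headers = build_headers(identity_token, api_key)
    headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        service_url.rstrip("/") + "/predict",
        data=body,
        headers=headers,
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen(request, timeout=...)` to perform the call. How the identity token is obtained (for example from `gcloud auth print-identity-token`) is left to the caller.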

POST /predict

Run AI detection on an audio file stored in Google Cloud Storage (GCS).

Authentication

Authorization: Bearer <identity-token> (Cloud Run IAM)
X-API-Key: <api-key> (application-level)

Request Body

Field	Type	Required	Default	Description
`gcs_uri`	string	Yes	—	GCS URI of the audio file (`gs://bucket/path/file.mp3`)
`num_snippets`	integer or `"max"`	No	`"max"`	Number of 30-second snippets to analyze. `"max"` analyzes all possible snippets.
`xgb_threshold`	float (0.0–1.0)	No	`0.5`	Decision threshold for the ensemble classifier
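A client can check these fields before sending a request. The sketch below mirrors the table; `build_predict_payload` is a hypothetical helper, and its checks are client-side assumptions (for instance, that an integer `num_snippets` must be at least 1), not the server's documented validation rules.

```python
def build_predict_payload(gcs_uri, num_snippets="max", xgb_threshold=0.5):
    """Build the JSON body for POST /predict, raising ValueError on bad input."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("gcs_uri must start with gs://")
    bucket, _, obj = gcs_uri[len("gs://"):].partition("/")
    if not bucket or not obj:
        raise ValueError("gcs_uri must look like gs://bucket/path/file")
    # The service rejects path traversal; catch the obvious case early.
    if ".." in obj.split("/"):
        raise ValueError("gcs_uri must not contain '..' path segments")

    if num_snippets != "max":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(num_snippets, bool) or not isinstance(num_snippets, int):
            raise ValueError('num_snippets must be an integer or "max"')
        if num_snippets < 1:  # assumption: at least one snippet
            raise ValueError("num_snippets must be >= 1")

    threshold = float(xgb_threshold)
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("xgb_threshold must be between 0.0 and 1.0")

    return {
        "gcs_uri": gcs_uri,
        "num_snippets": num_snippets,
        "xgb_threshold": threshold,
    }
```

Sending the defaults explicitly is equivalent to omitting them, since both optional fields have the defaults shown in the table.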

Response (200 OK)

Field	Type	Description
`filename`	string	Original filename extracted from the GCS URI
`prediction`	`"AI"` or `"REAL"`	Final ensemble prediction
`probability`	float	Ensemble output probability (0.0 = definitely real, 1.0 = definitely AI)
`confidence`	float	Confidence score (0.0 = uncertain, 1.0 = highly confident)
`snippet_results`	array	Per-snippet breakdown (see below)
`model_probabilities`	object	Average probability from each individual model
`processing_time_ms`	float	Total processing time in milliseconds

`snippet_results` items

Field	Type	Description
`snippet_id`	integer	1-indexed snippet number
`start_time`	float	Snippet start time in seconds
`end_time`	float	Snippet end time in seconds
`probability`	float	Snippet-level AI probability
`prediction`	`"AI"` or `"REAL"`	Snippet-level prediction

`model_probabilities` keys

The `model_probabilities` object contains one key per model in the ensemble. Each value is that model's average probability across all analyzed snippets. The set of keys is fixed within a given release version.

Example

curl -X POST ${SERVICE_URL}/predict \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ${API_KEY}" \
  -d '{"gcs_uri": "gs://my-bucket/audio/song.mp3"}'

{
  "filename": "song.mp3",
  "prediction": "AI",
  "probability": 0.85,
  "confidence": 0.70,
  "snippet_results": [
    {
      "snippet_id": 1,
      "start_time": 0.0,
      "end_time": 30.0,
      "probability": 0.88,
      "prediction": "AI"
    },
    {
      "snippet_id": 2,
      "start_time": 30.0,
      "end_time": 60.0,
      "probability": 0.82,
      "prediction": "AI"
    }
  ],
  "model_probabilities": {
    "aggro_cnn": 0.88,
    "tory_cnn": 0.75,
    "aggro_stt": 0.91,
    "tory_stt": 0.80,
    "udio_lite": 0.85
  },
  "processing_time_ms": 2345.6
}
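A response like the one above could be consumed along these lines. This is a sketch only: the `Snippet` dataclass and `summarize` function are illustrative names, and the summary fields are one possible choice rather than anything the service defines.

```python
import json
from dataclasses import dataclass


@dataclass
class Snippet:
    snippet_id: int      # 1-indexed snippet number
    start_time: float    # seconds
    end_time: float      # seconds
    probability: float   # snippet-level AI probability
    prediction: str      # "AI" or "REAL"


def summarize(raw_body):
    """Parse a /predict response body and return a small summary dict."""
    data = json.loads(raw_body)
    snippets = [
        Snippet(
            snippet_id=item["snippet_id"],
            start_time=item["start_time"],
            end_time=item["end_time"],
            probability=item["probability"],
            prediction=item["prediction"],
        )
        for item in data["snippet_results"]
    ]
    models = data["model_probabilities"]
    return {
        "prediction": data["prediction"],
        "probability": data["probability"],
        "snippet_count": len(snippets),
        "ai_snippets": sum(1 for s in snippets if s.prediction == "AI"),
        "analyzed_seconds": sum(s.end_time - s.start_time for s in snippets),
        # Model with the highest average probability, if any are reported.
        "top_model": max(models, key=models.get) if models else None,
    }
```

Fields are read by name rather than unpacked wholesale, so the sketch keeps working if a later release adds fields to a snippet item.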

Error Responses

Status	Error	Description
400	Invalid GCS URI	Malformed `gs://` path or path traversal detected
400	Download failed	File not accessible in GCS (permissions or not found)
401	Unauthorized	Missing or invalid `X-API-Key` header
422	Inference failed	Model error during processing
429	Rate limit exceeded	Too many requests. Check `Retry-After` header.
503	Service busy	All GPU slots occupied. Retry after a few seconds.
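Of the errors above, only 429 and 503 are described as worth retrying. A client-side retry policy might look like the sketch below. The function name `retry_delay`, the exponential backoff, and the `base` and `cap` values are assumptions, not service behavior; only a `Retry-After` value given in whole seconds is honored.

```python
def retry_delay(status, headers, attempt, base=1.0, cap=30.0):
    """Return seconds to wait before retrying, or None if not retryable.

    status:  HTTP status code of the failed response.
    headers: response headers as a dict (any key casing).
    attempt: zero-based count of retries already made.
    """
    backoff = min(cap, base * 2 ** attempt)
    if status == 429:
        lowered = {k.lower(): v for k, v in headers.items()}
        value = str(lowered.get("retry-after", "")).strip()
        # Honor Retry-After when it is a plain number of seconds.
        if value.isdigit():
            return float(value)
        return backoff
    if status == 503:
        # "Service busy": all GPU slots occupied, retry after a short wait.
        return backoff
    # 400, 401 and 422 indicate a problem that retrying will not fix.
    return None
```

A caller would sleep for the returned number of seconds and resend the same request, giving up after a fixed number of attempts.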

GET /health

Health check endpoint. Used by Cloud Run for startup and liveness probes.

Authentication

No X-API-Key required. Cloud Run IAM (Authorization header) may still apply.

Response (200 OK)

Field	Type	Description
`status`	`"healthy"` or `"unhealthy"`	Service health status
`device`	string	Compute device (`"cuda"` or `"cpu"`)
`models_loaded`	integer	Number of loaded models
`xgb_loaded`	boolean	Whether the ensemble classifier is loaded
`gpu_available`	boolean	Whether a GPU is detected
`gpu_memory_allocated_mb`	float	Current GPU memory usage in MB
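A monitoring script could reduce these fields to a single readiness decision. The sketch below is one possible interpretation: treating the service as ready only when `status` is `"healthy"`, at least one model is loaded, and the ensemble classifier is loaded is an assumption, and `is_ready` and `describe_device` are hypothetical names.

```python
def is_ready(health):
    """Return True if a parsed /health response looks ready to serve."""
    return (
        health.get("status") == "healthy"
        and health.get("models_loaded", 0) > 0
        and bool(health.get("xgb_loaded"))
    )


def describe_device(health):
    """Return a short human-readable description of the compute device."""
    device = health.get("device", "unknown")
    if health.get("gpu_available"):
        used = health.get("gpu_memory_allocated_mb", 0.0)
        return f"{device} (GPU memory allocated: {used:.1f} MB)"
    return f"{device} (no GPU detected)"
```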

Example

curl ${SERVICE_URL}/health \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)"

GET /metrics

Prometheus metrics endpoint.

Authentication

No X-API-Key required.

Response

Returns metrics in Prometheus text exposition format.

Available metrics:

Metric	Type	Labels	Description
`inference_requests_total`	Counter	`status`, `endpoint`	Total requests by status and endpoint
`inference_request_duration_seconds`	Histogram	`endpoint`	Request latency (buckets: 0.1s to 120s)
`rate_limit_rejections_total`	Counter	—	Total rate limit rejections
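For quick checks without a Prometheus server, the text format can be read directly. The following is a minimal sketch of a parser for simple sample lines, not a full implementation of the exposition format; `parse_metrics` and `total` are illustrative names.

```python
import re

# metric_name{label="value",...} value [timestamp]
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, labels, value = match.groups()
        samples.append((name, dict(_LABEL.findall(labels or "")), float(value)))
    return samples


def total(samples, name, **labels):
    """Sum all samples of a metric whose labels match the given filters."""
    return sum(
        value
        for sample_name, sample_labels, value in samples
        if sample_name == name
        and all(sample_labels.get(k) == v for k, v in labels.items())
    )
```

For example, `total(samples, "inference_requests_total", endpoint="/predict")` sums request counts for one endpoint across all `status` values. The label values actually emitted by the service are not listed in this reference, so any filter values are the caller's assumption.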

GET /status

Detailed service status including rate limiter state. Useful for debugging.

Authentication

Authorization: Bearer <identity-token> (Cloud Run IAM)
X-API-Key: <api-key> (required)

Response (200 OK)

Returns a JSON object containing `engine` (the same fields as the /health response) and `rate_limiter` (the current rate limiter state).

curl ${SERVICE_URL}/status \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "X-API-Key: ${API_KEY}"

Request Tracing

All requests are assigned a unique request ID for tracing. You can provide your own via the X-Request-ID header, or the service will generate one automatically. The request ID is returned in the X-Request-ID response header.
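Supplying your own ID makes it easier to match client logs with service logs. A minimal sketch follows, assuming any opaque string is accepted as an ID (a UUID is used here for convenience); `with_request_id` and `echoed_request_id` are hypothetical helper names.

```python
import uuid


def with_request_id(headers, request_id=None):
    """Return (headers_with_id, request_id) without mutating the input."""
    rid = request_id or uuid.uuid4().hex
    merged = dict(headers)
    merged["X-Request-ID"] = rid
    return merged, rid


def echoed_request_id(response_headers, sent_id):
    """Check that the response carries the request ID that was sent."""
    for key, value in response_headers.items():
        # HTTP header names are case-insensitive.
        if key.lower() == "x-request-id":
            return value == sent_id
    return False
```

Logging the returned ID alongside each request on the client side gives a single value to search for when debugging a failed call.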
