Getting Started

This guide walks you through pulling the AIMD Docker image, deploying it to Cloud Run, and making your first prediction.

Prerequisites

  • Docker installed
  • Google Cloud SDK (gcloud) installed and authenticated
  • Access granted to the AIMD Artifact Registry repository (provided by Beatdapp)
  • API key for authentication (provided by Beatdapp)
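The commands in this guide reference several shell variables (REGION, GCP_PROJECT, TAG, API_KEY). You might export them once up front; every value below is a placeholder to substitute with your own:

```shell
# Placeholder values -- substitute your own before running the commands below.
export REGION="us-central1"       # Artifact Registry / Cloud Run region
export GCP_PROJECT="my-project"   # your Google Cloud project ID
export TAG="v1.0.0"               # image tag listed in the Release Notes
export API_KEY="example-key"      # API key provided by Beatdapp
```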

Step 1: Pull the Docker Image

Authenticate Docker with Artifact Registry:

gcloud auth configure-docker ${REGION}-docker.pkg.dev

Pull the image using the tag from the Release Notes:

docker pull ${REGION}-docker.pkg.dev/${GCP_PROJECT}/aimd-inference/aimd-inference-mvp:${TAG}

Verify the Image Digest

After pulling, verify the image digest matches the one listed in the Release Notes for your version:

docker inspect --format='{{index .RepoDigests 0}}' \
  ${REGION}-docker.pkg.dev/${GCP_PROJECT}/aimd-inference/aimd-inference-mvp:${TAG}
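If you script this check, the comparison itself is a plain string equality. A minimal sketch, where `check_digest` is a hypothetical helper (not part of the AIMD tooling) and the expected digest is the value you copy from the Release Notes:

```shell
# Hypothetical helper: compare the pulled digest against the Release Notes value.
check_digest() {
  expected="$1"
  actual="$2"
  if [ "$expected" = "$actual" ]; then
    echo "digest OK"
  else
    echo "digest MISMATCH: expected $expected, got $actual" >&2
    return 1
  fi
}
```

Usage would pair it with the inspect command above, e.g. `check_digest "sha256:<from-release-notes>" "$(docker inspect --format='{{index .RepoDigests 0}}' ${REGION}-docker.pkg.dev/${GCP_PROJECT}/aimd-inference/aimd-inference-mvp:${TAG})"`.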

Step 2: Configure Environment Variables

At minimum, you need:

Variable       Description
GCP_PROJECT    Your Google Cloud project ID
API_KEYS       Comma-separated API keys for authentication

See Configuration for the full list of environment variables.

Step 3: Deploy to Cloud Run

The AIMD service requires a GPU. NVIDIA L4 GPUs are available in these Cloud Run regions: us-central1, us-east4, europe-west1, europe-west4, asia-southeast1.

gcloud run deploy aimd-inference-mvp \
  --image ${REGION}-docker.pkg.dev/${GCP_PROJECT}/aimd-inference/aimd-inference-mvp:${TAG} \
  --region us-central1 \
  --gpu 1 \
  --gpu-type nvidia-l4 \
  --cpu 4 \
  --memory 16Gi \
  --min-instances 0 \
  --max-instances 1 \
  --timeout 300 \
  --concurrency 4 \
  --set-env-vars "GCP_PROJECT=${GCP_PROJECT},API_KEYS=${API_KEY}" \
  --execution-environment gen2 \
  --no-allow-unauthenticated \
  --project ${GCP_PROJECT}

Note

  • Models are baked into the Docker image — no GCS paths or volume mounts needed.
  • If prompted about zonal redundancy quota, select Y to deploy without it.
  • Use --min-instances 0 to scale to zero when idle (cost optimization).

Step 4: Verify the Deployment

Get your service URL:

SERVICE_URL=$(gcloud run services describe aimd-inference-mvp \
  --region us-central1 \
  --format 'value(status.url)')

Run a health check (no API key required):

curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  ${SERVICE_URL}/health

Expected response:

{
  "status": "healthy",
  "device": "cuda",
  "models_loaded": 5,
  "xgb_loaded": true,
  "gpu_available": true,
  "gpu_memory_allocated_mb": 1234.5
}
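For scripting (for example, a CI smoke test), the response above can be checked with plain grep, avoiding a jq dependency. A minimal sketch, using a hard-coded sample response where a real script would capture the live curl output into RESPONSE:

```shell
# Sample response; in practice:
# RESPONSE=$(curl -s -H "Authorization: Bearer $(gcloud auth print-identity-token)" ${SERVICE_URL}/health)
RESPONSE='{"status": "healthy", "device": "cuda", "gpu_available": true}'

# Match "status": "healthy" allowing optional whitespace after the colon.
if printf '%s' "$RESPONSE" | grep -q '"status": *"healthy"'; then
  echo "service ready"
else
  echo "service not ready" >&2
fi
```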

Step 5: Make Your First Prediction

curl -X POST ${SERVICE_URL}/predict \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ${API_KEY}" \
  -d '{"gcs_uri": "gs://your-bucket/audio.mp3"}'

See API Reference for the full request/response specification and Usage Examples for more detailed examples.