Getting Started

This guide walks you through pulling the AIMD Docker image, deploying it to Cloud Run, and making your first prediction.

Prerequisites

  • Docker installed
  • Google Cloud SDK (gcloud) installed and authenticated
  • Access granted to the AIMD Artifact Registry repository (provided by Beatdapp)
  • API key for authentication (provided by Beatdapp)
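The commands in this guide reference several shell variables (REGION, GCP_PROJECT, TAG, API_KEY). You might export them once up front; every value below is a placeholder to substitute with your own:

```shell
# Placeholder values -- substitute your own before running the commands below.
export REGION="us-central1"       # Artifact Registry / Cloud Run region
export GCP_PROJECT="my-project"   # your Google Cloud project ID
export TAG="v1.0.0"               # image tag listed in the Release Notes
export API_KEY="example-key"      # API key provided by Beatdapp
```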

Step 1: Pull the Docker Image

Authenticate Docker with Artifact Registry:

gcloud auth configure-docker ${REGION}-docker.pkg.dev

Pull the image using the tag from the Release Notes:

docker pull ${REGION}-docker.pkg.dev/${GCP_PROJECT}/aimd-inference/aimd-inference-mvp:${TAG}

Verify the Image Digest

After pulling, verify the image digest matches the one listed in the Release Notes for your version:

docker inspect --format='{{index .RepoDigests 0}}' \
  ${REGION}-docker.pkg.dev/${GCP_PROJECT}/aimd-inference/aimd-inference-mvp:${TAG}
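If you script this check, the comparison itself is a plain string equality. A minimal sketch, where `check_digest` is a hypothetical helper (not part of the AIMD tooling) and the expected digest is the value you copy from the Release Notes:

```shell
# Hypothetical helper: compare the pulled digest against the Release Notes value.
check_digest() {
  expected="$1"
  actual="$2"
  if [ "$expected" = "$actual" ]; then
    echo "digest OK"
  else
    echo "digest MISMATCH: expected $expected, got $actual" >&2
    return 1
  fi
}
```

Usage would pair it with the inspect command above, e.g. `check_digest "sha256:<from-release-notes>" "$(docker inspect --format='{{index .RepoDigests 0}}' ${REGION}-docker.pkg.dev/${GCP_PROJECT}/aimd-inference/aimd-inference-mvp:${TAG})"`.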

Step 2: Configure Environment Variables

At minimum, you need:

Variable       Description
GCP_PROJECT    Your Google Cloud project ID
API_KEYS       Comma-separated API keys for authentication

See Configuration for the full list of environment variables.

Step 3: Deploy to Cloud Run

The AIMD service requires a GPU. NVIDIA L4 GPUs are available in these Cloud Run regions: us-central1, us-east4, europe-west1, europe-west4, asia-southeast1.

gcloud run deploy aimd-inference-mvp \
  --image ${REGION}-docker.pkg.dev/${GCP_PROJECT}/aimd-inference/aimd-inference-mvp:${TAG} \
  --region us-central1 \
  --gpu 1 \
  --gpu-type nvidia-l4 \
  --cpu 4 \
  --memory 16Gi \
  --min-instances 0 \
  --max-instances 1 \
  --timeout 300 \
  --concurrency 4 \
  --set-env-vars "GCP_PROJECT=${GCP_PROJECT},API_KEYS=${API_KEY}" \
  --execution-environment gen2 \
  --no-allow-unauthenticated \
  --project ${GCP_PROJECT}

Note

  • Models are baked into the Docker image — no GCS paths or volume mounts needed.
  • If prompted about zonal redundancy quota, select Y to deploy without it.
  • Use --min-instances 0 to scale to zero when idle (cost optimization).

Step 4: Verify the Deployment

Get your service URL:

SERVICE_URL=$(gcloud run services describe aimd-inference-mvp \
  --region us-central1 \
  --format 'value(status.url)')

Run a health check (no API key required):

curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  ${SERVICE_URL}/health

Expected response:

{
  "status": "healthy",
  "device": "cuda",
  "models_loaded": 5,
  "xgb_loaded": true,
  "gpu_available": true,
  "gpu_memory_allocated_mb": 1234.5
}
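For scripting (for example, a CI smoke test), the response above can be checked with plain grep, avoiding a jq dependency. A minimal sketch, using a hard-coded sample response where a real script would capture the live curl output into RESPONSE:

```shell
# Sample response; in practice:
# RESPONSE=$(curl -s -H "Authorization: Bearer $(gcloud auth print-identity-token)" ${SERVICE_URL}/health)
RESPONSE='{"status": "healthy", "device": "cuda", "gpu_available": true}'

# Match "status": "healthy" allowing optional whitespace after the colon.
if printf '%s' "$RESPONSE" | grep -q '"status": *"healthy"'; then
  echo "service ready"
else
  echo "service not ready" >&2
fi
```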

Step 5: Make Your First Prediction

curl -X POST ${SERVICE_URL}/predict \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ${API_KEY}" \
  -d '{"gcs_uri": "gs://your-bucket/audio.mp3"}'

See API Reference for the full request/response specification and Usage Examples for more detailed examples.