Custom Container on Vertex AI Returns "405 Method Not Allowed" for predict Endpoint

I'm encountering an issue when deploying my custom container on Vertex AI. Locally my Flask server (running via Gunicorn) works perfectly—both the /predict and /health endpoints respond as expected. However, when Vertex AI calls the prediction API, I always receive a 405 Method Not Allowed error.

My Setup

  • Container: I use a custom Docker container that exposes port 8080.
  • Model Upload: I upload my model to Vertex AI with the following flags:
    • --container-predict-route=/predict
    • --container-health-route=/health
  • Prediction Call: I call the prediction API using the Google Cloud AI Platform client library.

Observations

  • Vertex AI PredictionService sends requests to a URL like:
    /v1/endpoints/<ENDPOINT_ID>/deployedModels/<DEPLOYED_MODEL_ID>:predict
    but my server returns 405.
  • If I perform a GET request to the endpoint (for example, from a terminal), I receive a valid response.
    However, when calling /predict (or even /rawPredict, as described in the Vertex AI rawPredict docs), I still get a 405.
  • The server is running (I think), since I receive a log entry every 10 seconds like:
    "GET /v1/endpoints/<ENDPOINT>/deployedModels/<MODEL> HTTP/1.1" 200 OK
  • I've added multiple route definitions (including catch-all routes) to handle URLs such as /v1/endpoints/<endpoint_id>/deployedModels/<model_id>:predict, but the error persists.

Below is my code:


Dockerfile:

FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04
RUN apt-get update && apt-get install -y --no-install-recommends \
    wget \
    curl \
    python3-dev \
    python3-pip \
    python3-setuptools && \
    rm -rf /var/lib/apt/lists/*
RUN ln -sf /usr/bin/python3 /usr/bin/python
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir "torch>=1.12.0" "torchvision>=0.13.0" && \
    if [ -f requirements.txt ]; then pip install --no-cache-dir -r requirements.txt; fi
COPY . .
EXPOSE 8080
CMD ["gunicorn", "-w", "1", "-b", "0.0.0.0:8080", "main:app"]

Function to call Vertex AI API (call_vertex_ai):

def call_vertex_ai(gcs_uri: str, additional_args: dict):
    client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis"}
    client = aiplatform.gapic.PredictionServiceClient(
        client_options=client_options)

    instance = predict.instance.ImageClassificationPredictionInstance(
        content=gcs_uri  # GCS path for image file
    ).to_value()
    instances = [instance]

    parameters = predict.params.ImageClassificationPredictionParams(
        confidence_threshold=additional_args.get("threshold", 0.5),
    ).to_value()

    endpoint = client.endpoint_path(
        project=PROJECT_ID, location=REGION, endpoint=ENDPOINT_ID
    )

    response = client.predict(
        endpoint=endpoint, instances=instances, parameters=parameters)

    return response.predictions
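
As an aside, the ImageClassificationPredictionInstance/Params helpers come from the AutoML prediction schemata; for a custom container, Vertex AI forwards the instances list to the server as-is, so plain dicts via the high-level SDK should work too. A sketch of the equivalent call (same PROJECT_ID / REGION / ENDPOINT_ID):

# Minimal sketch using the high-level SDK instead of the GAPIC client.
# The AutoML schema classes are only needed for AutoML-trained models;
# a custom container receives the instances unchanged.
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)
endpoint = aiplatform.Endpoint(ENDPOINT_ID)

response = endpoint.predict(
    instances=[{"content": "gs://my-bucket/image.jpg"}],  # placeholder URI
    parameters={"confidence_threshold": 0.5},
)
print(response.predictions)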

main.py (Vertex AI prediction server):

# ... some imports ...
app = Flask(__name__)

def load_model():
    ...

load_model()


def handle_predict():
    # ... code ...
    detections = [{
        "bbox": bbox.tolist() if isinstance(bbox, np.ndarray) else bbox,
        "class": class_name,
        "score": float(score),
    } for bbox, class_name, score in zip(draw_boxes, pred_classes, scores)]

    return jsonify({"predictions": detections})

@app.post("/predict")
def predict():
    return handle_predict()

@app.route("/health", methods=["GET"])
def health():
    return jsonify({"status": "healthy"})


@app.route("/v1/endpoints/<endpoint_id>/deployedModels/<path:deployed_model_path>", methods=["POST"])
def predict_deployed_model(endpoint_id, deployed_model_path):
    if not deployed_model_path.endswith(":predict"):
        return "Not Found", 404
    return handle_predict()

@app.route("/v1/endpoints/<endpoint_id>/deployedModels/<deployed_model_id>:predict", methods=["POST"])
def predict_deployed_model_direct(endpoint_id, deployed_model_id):
    return handle_predict()

@app.route("/v1/endpoints/<endpoint_id>/deployedModels/<deployed_model_id>:rawPredict", methods=["POST"])
def raw_predict_deployed_model(endpoint_id, deployed_model_id):
    return handle_predict()

@app.before_request
def log_request_info():
    logger.info(f"Received request: {request.method} {request.url}")
    logger.info(f"Headers: {dict(request.headers)}")
    logger.info(f"Body: {request.get_data().decode('utf-8')}")

deploy.sh (simplified):

gcloud builds submit \
  --tag "${IMAGE_NAME}:latest" \
  --gcs-source-staging-dir="gs://$BUCKET_NAME/source" \
  --gcs-log-dir="gs://$BUCKET_NAME/logs"

LATEST_IMAGE="${IMAGE_NAME}:latest"

gcloud ai models upload \
  --region="${REGION}" \
  --display-name="weldpredict-model" \
  --container-image-uri="${LATEST_IMAGE}" \
  --container-ports=8080 \
  --container-predict-route=/predict \
  --container-health-route=/health

ENDPOINT_ID=$(gcloud ai endpoints list --region="${REGION}" --format="value(ENDPOINT_ID)")

DEPLOYED_MODEL_ID=$(gcloud ai endpoints describe "${ENDPOINT_ID}" --region="${REGION}" --format="value(deployedModels.id)")
gcloud ai endpoints undeploy-model "${ENDPOINT_ID}" --deployed-model-id="${DEPLOYED_MODEL_ID}" --region="${REGION}" --quiet

gcloud ai endpoints deploy-model "${ENDPOINT_ID}" \
    --model="${MODEL_ID}" \
    --region="${REGION}" \
    --display-name="weldpredict-deployment" \
    --machine-type=n1-standard-4 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --min-replica-count=1 \
    --max-replica-count=1 \
    --traffic-split=0=100
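
To check what the container itself returns (including the status code), I can also go through rawPredict, which forwards the request body to the container unchanged. A sketch using the GAPIC client (same placeholders as above):

# Call the deployed model through rawPredict, which passes the body
# through verbatim and surfaces the container's HTTP response.
import json

from google.api import httpbody_pb2
from google.cloud import aiplatform

client = aiplatform.gapic.PredictionServiceClient(
    client_options={"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
)
endpoint = client.endpoint_path(
    project=PROJECT_ID, location=REGION, endpoint=ENDPOINT_ID
)

body = httpbody_pb2.HttpBody(
    data=json.dumps(
        {"instances": [{"content": "gs://my-bucket/image.jpg"}]}  # placeholder
    ).encode("utf-8"),
    content_type="application/json",
)
response = client.raw_predict(endpoint=endpoint, http_body=body)
print(response.data)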

Issue Summary:

  • Problem: When calling Vertex AI predictions, I receive a 405 "Method Not Allowed" error.

  • Observation: Locally, my Flask server correctly handles /predict and /health, and GET requests to the endpoint return a 200 OK. However, when I call /predict (or /rawPredict) on Vertex AI, I get a 405.

  • Setup: I deploy my custom container on Vertex AI with the --container-predict-route=/predict flag, yet Vertex AI sends requests (e.g., /v1/endpoints/<ENDPOINT_ID>/deployedModels/<DEPLOYED_MODEL_ID>:predict) that are not matched by my routes.

  • Attempts: I have added multiple route definitions—including catch-all routes—to handle URLs like /v1/endpoints/<endpoint_id>/deployedModels/<model_id>:predict, but I still encounter the 405 error.

  • Additional Info:

    • Every 10 seconds the logs show:

      "GET /v1/endpoints/<ENDPOINT>/deployedModels/<MODEL> HTTP/1.1" 200 OK

    • However, when calling /predict or /rawPredict, the server returns 405.

Request:
I need help understanding why Vertex AI's prediction requests are not being handled as expected by my Flask server and how to properly configure my container or routes to resolve the 405 error.

Any guidance or suggestions would be greatly appreciated.

Thanks in advance!


asked Mar 4 at 1:50 by Giulio Manuzzi

1 Answer

Based on the public documentation for custom containers: if your use case requires libraries that aren't included in the prebuilt containers, or you need custom data transformations as part of the prediction request, you can use a custom container that you build and push to Artifact Registry. Custom containers allow for greater customization, but the container must run an HTTP server; specifically, it must listen and respond to liveness checks, health checks, and prediction requests. In most cases, using a prebuilt container, if possible, is the simpler and recommended option. For an example of using a custom container, see the notebook "PyTorch Image Classification Single GPU using Vertex Training with Custom Container".
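
To make that contract concrete: Vertex AI sends a GET to the configured health route (expecting HTTP 200 when the model is ready) and a POST to the configured predict route with a JSON body in a fixed envelope. A minimal sketch of the two payload shapes, with illustrative values (the bbox/class/score keys mirror the question's handler, not a required schema):

# Illustrative request/response envelope for a custom prediction container.
# Vertex AI POSTs this shape to the predict route:
request_body = {
    "instances": [{"content": "gs://my-bucket/image.jpg"}],  # placeholder
    "parameters": {"confidence_threshold": 0.5},             # optional
}
# ...and expects HTTP 200 with a body of this shape in return:
response_body = {
    "predictions": [
        {"bbox": [10, 20, 110, 220], "class": "weld_defect", "score": 0.93},
    ],
}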
