I'm encountering an issue when deploying my custom container on Vertex AI. Locally my Flask server (running via Gunicorn) works as expected: both the /predict and /health endpoints respond correctly. However, when Vertex AI calls the prediction API, I always receive a 405 Method Not Allowed error.
My Setup
- Container: I use a custom Docker container that exposes port 8080.
- Model Upload: I upload my model to Vertex AI with the following flags:
--container-predict-route=/predict
--container-health-route=/health
- Prediction Call: I call the prediction API using the Google Cloud AI Platform client library.
Observations
- Vertex AI PredictionService sends requests to a URL like /v1/endpoints/<ENDPOINT_ID>/deployedModels/<DEPLOYED_MODEL_ID>:predict, but my server returns 405.
- If I perform a GET request to the endpoint (for example, from a terminal), I receive a valid response. However, when calling /predict (or even /rawPredict, as described in the Vertex AI rawPredict docs), I still get a 405.
- The server appears to be running, since I receive a log entry every 10 seconds like: GET /v1/endpoints/<ENDPOINT>/deployedModels/<MODEL> HTTP/1.1" 200 OK
- I've added multiple route definitions (including catch-all routes) to handle URLs such as /v1/endpoints/<endpoint_id>/deployedModels/<model_id>:predict, but the error persists.
Below is my code:
Dockerfile:
FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04
RUN apt-get update && apt-get install -y --no-install-recommends \
        wget \
        curl \
        python3-dev \
        python3-pip \
        python3-setuptools && \
    rm -rf /var/lib/apt/lists/*
RUN ln -sf /usr/bin/python3 /usr/bin/python
WORKDIR /app
COPY requirements.txt .
# Note: the version specifiers must be quoted, otherwise the shell parses ">=" as a redirection.
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir "torch>=1.12.0" "torchvision>=0.13.0" && \
    if [ -f requirements.txt ]; then pip install --no-cache-dir -r requirements.txt; fi
COPY . .
EXPOSE 8080
CMD ["gunicorn", "-w", "1", "-b", "0.0.0.0:8080", "main:app"]
Function to call Vertex AI API (call_vertex_ai):
# Assumed imports for this snippet:
from google.cloud import aiplatform
from google.cloud.aiplatform.gapic.schema import predict

def call_vertex_ai(gcs_uri: str, additional_args: dict):
    # Note: the regional API endpoint needs the .com suffix.
    client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
    client = aiplatform.gapic.PredictionServiceClient(
        client_options=client_options)
    instance = predict.instance.ImageClassificationPredictionInstance(
        content=gcs_uri  # GCS path for the image file
    ).to_value()
    instances = [instance]
    parameters = predict.params.ImageClassificationPredictionParams(
        confidence_threshold=additional_args.get("threshold", 0.5),
    ).to_value()
    endpoint = client.endpoint_path(
        project=PROJECT_ID, location=REGION, endpoint=ENDPOINT_ID
    )
    response = client.predict(
        endpoint=endpoint, instances=instances, parameters=parameters)
    return response.predictions
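For reference, a hypothetical invocation (the bucket path and threshold are placeholders):
predictions = call_vertex_ai("gs://my-bucket/images/sample.jpg", {"threshold": 0.6})
print(predictions)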
main.py (Vertex AI prediction server):
... some imports ...

app = Flask(__name__)

def load_model():
    ...

load_model()

def handle_predict():
    ... code ...
    detections = [{
        "bbox": bbox.tolist() if isinstance(bbox, np.ndarray) else bbox,
        "class": class_name,
        "score": float(score),
    } for bbox, class_name, score in zip(draw_boxes, pred_classes, scores)]
    return jsonify({"predictions": detections})

@app.post("/predict")
def predict():
    return handle_predict()

@app.route("/health", methods=["GET"])
def health():
    return jsonify({"status": "healthy"})

@app.route("/v1/endpoints/<endpoint_id>/deployedModels/<path:deployed_model_path>", methods=["POST"])
def predict_deployed_model(endpoint_id, deployed_model_path):
    if not deployed_model_path.endswith(":predict"):
        return "Not Found", 404
    return handle_predict()

@app.route("/v1/endpoints/<endpoint_id>/deployedModels/<deployed_model_id>:predict", methods=["POST"])
def predict_deployed_model_direct(endpoint_id, deployed_model_id):
    return handle_predict()

@app.route("/v1/endpoints/<endpoint_id>/deployedModels/<deployed_model_id>:rawPredict", methods=["POST"])
def raw_predict_deployed_model(endpoint_id, deployed_model_id):
    return handle_predict()

@app.before_request
def log_request_info():
    logger.info(f"Received request: {request.method} {request.url}")
    logger.info(f"Headers: {dict(request.headers)}")
    logger.info(f"Body: {request.get_data().decode('utf-8')}")
deploy.sh (simplified):
gcloud builds submit \
    --tag "${IMAGE_NAME}:latest" \
    --gcs-source-staging-dir="gs://$BUCKET_NAME/source" \
    --gcs-log-dir="gs://$BUCKET_NAME/logs"

LATEST_IMAGE="${IMAGE_NAME}:latest"

gcloud ai models upload \
    --region="${REGION}" \
    --display-name="weldpredict-model" \
    --container-image-uri="${LATEST_IMAGE}" \
    --container-ports=8080 \
    --container-predict-route=/predict \
    --container-health-route=/health

ENDPOINT_ID=$(gcloud ai endpoints list --region="${REGION}" --format="value(ENDPOINT_ID)")
DEPLOYED_MODEL_ID=$(gcloud ai endpoints describe "${ENDPOINT_ID}" --region="${REGION}" --format="value(deployedModels.id)")

gcloud ai endpoints undeploy-model "${ENDPOINT_ID}" --deployed-model-id="${DEPLOYED_MODEL_ID}" --region="${REGION}" --quiet

gcloud ai endpoints deploy-model "${ENDPOINT_ID}" \
    --model="${MODEL_ID}" \
    --region="${REGION}" \
    --display-name="weldpredict-deployment" \
    --machine-type=n1-standard-4 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --min-replica-count=1 \
    --max-replica-count=1 \
    --traffic-split=0=100
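Since the periodic health probes in the logs hit the default /v1/endpoints/... path rather than /health, it may be worth confirming that the deployed model version actually carries the configured routes. A sketch using the GAPIC ModelServiceClient (PROJECT_ID, REGION, and MODEL_ID are assumed to be set as above):
from google.cloud import aiplatform

client = aiplatform.gapic.ModelServiceClient(
    client_options={"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
)
model = client.get_model(
    name=f"projects/{PROJECT_ID}/locations/{REGION}/models/{MODEL_ID}"
)
# Empty strings here would mean the upload flags were not applied to this model.
print("predict_route:", model.container_spec.predict_route)
print("health_route:", model.container_spec.health_route)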
Issue Summary:
- Problem: When calling Vertex AI predictions, I receive a 405 Method Not Allowed error.
- Observation: Locally, my Flask server correctly handles /predict and /health, and GET requests to the endpoint return 200 OK. However, when I call /predict (or /rawPredict) on Vertex AI, I get a 405.
- Setup: I deploy my custom container on Vertex AI with the --container-predict-route=/predict flag, yet Vertex AI sends requests (e.g., /v1/endpoints/<ENDPOINT_ID>/deployedModels/<DEPLOYED_MODEL_ID>:predict) that are not matched by my routes.
- Attempts: I have added multiple route definitions, including catch-all routes, to handle URLs like /v1/endpoints/<endpoint_id>/deployedModels/<model_id>:predict, but still encounter the 405 error.
- Additional info: A log line appears every 10 seconds: GET /v1/endpoints/<ENDPOINT>/deployedModels/<MODEL> HTTP/1.1" 200 OK. However, calls to /predict or /rawPredict return 405.
Request:
I need help understanding why Vertex AI's prediction requests are not being handled as expected by my Flask server and how to properly configure my container or routes to resolve the 405 error.
Any guidance or suggestions would be greatly appreciated.
Thanks in advance!
1 Answer
Based on the public documentation for using custom containers: if your use case requires libraries that aren't included in the prebuilt containers, or you need custom data transformations as part of the prediction request, you can use a custom container that you build and push to Artifact Registry. While custom containers allow for greater customization, the container must run an HTTP server. Specifically, it must listen for and respond to liveness checks, health checks, and prediction requests. In most cases, using a prebuilt container, if possible, is the recommended and simpler option. For an example of using a custom container, see the notebook "PyTorch Image Classification Single GPU using Vertex Training with Custom Container".
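To make that contract concrete, here is a minimal sketch of the HTTP server such a container must run, using Flask as in the question. The routes mirror the --container-predict-route and --container-health-route values from the upload command; the prediction output is a placeholder:
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/health", methods=["GET"])
def health():
    # Vertex AI sends GET probes to the configured health route.
    return jsonify({"status": "healthy"}), 200

@app.route("/predict", methods=["POST"])
def predict():
    # Vertex AI forwards prediction requests as POSTs with an
    # {"instances": [...]} JSON body to the configured predict route.
    body = request.get_json(force=True)
    instances = body.get("instances", [])
    return jsonify({"predictions": [{} for _ in instances]})  # placeholder output

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)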