I'm trying to use the Phi-4-multimodal-instruct model with audio input through the Azure.AI.Inference C# client, but I'm getting an "invalid input error" when sending an MP3 file. The same error occurs with both GitHub and Azure endpoints.
Error Message
Azure.RequestFailedException: invalid input error
Status: 422 (Unprocessable Entity)
ErrorCode: Invalid input
Content:
{
  "error": {
    "code": "Invalid input",
    "status": 422,
    "message": "invalid input error",
    "details": [
      {
        "type": "string_type",
        "loc": [
          "body",
          "messages",
          0,
          "content",
          "str"
        ],
        "msg": "Input should be a valid string",
        "input": [
          {
            "type": "text",
            "text": "Based on the attached audio, generate a comprehensive text transcription of the spoken content."
          },
          {
            "type": "input_audio",
            "input_audio": {
              "data": "...",
              "format": "mp3"
            }
          }
        ]
      },
      {
        "type": "missing",
        "loc": [
          "body",
          "messages",
          0,
          "content",
          "list[function-after[validate_content_part(), ContentPart]]",
          1,
          "input_audio",
          "url"
        ],
        "msg": "Field required",
        "input": {
          "data": "...",
          "format": "mp3"
        }
      }
    ]
  }
}
Code
using Azure;
using Azure.AI.Inference;

// Azure endpoint configuration
var endpoint = new Uri("https://###.services.ai.azure.com/models");
var credential = new AzureKeyCredential("###");
var model = "Phi-4-multimodal-instruct";

var client = new ChatCompletionsClient(
    endpoint,
    credential,
    new AzureAIInferenceClientOptions());

ChatMessageContentItem[] userContent =
{
    new ChatMessageAudioContentItem(audioFilePath: "sample.mp3", AudioContentFormat.Mp3)
};

var requestOptions = new ChatCompletionsOptions()
{
    Messages =
    {
        new ChatRequestSystemMessage("Based on the attached audio, generate a comprehensive text transcription of the spoken content."),
        new ChatRequestUserMessage(userContent),
    },
    Model = model,
    Temperature = 1,
    MaxTokens = 1000,
};

Response<ChatCompletions> response = client.Complete(requestOptions);
System.Console.WriteLine(response.Value.Content);
What I've Tried
- Tried with both GitHub and Azure endpoints with identical results
- Verified the MP3 is valid
Questions
- Is Phi-4-multimodal-instruct supposed to support audio input via the C# client?
- Is there a different way to format audio input for this specific model?
- Are there any known limitations or requirements for audio files with this model?
Any help would be greatly appreciated!
asked Mar 7 at 20:43 by hdev; edited Mar 10 at 19:24
Comments
- Does the code work with a different type of audio file? The issue could be with the audio codec driver installed on the machine not recognizing the MP3 format. Some apps use the machine's audio driver and others talk directly to the audio hardware; I don't know which method your code uses. There are two types of drivers: 1) the manufacturer's, and 2) Microsoft's generic drivers. Issues like this are often caused by the Microsoft generic driver, and installing the vendor's driver solves them. It could also be a closed connection, or the chat endpoint expecting ASCII data while you are sending binary data. – jdweng, Mar 8 at 10:52
- @jdweng Thank you for your suggestions, but unfortunately they don't address the actual issue: I also tested with WAV format and received the same error. The NuGet package only encodes audio files to Base64 internally, so suggestions regarding audio codecs or hardware drivers aren't relevant here. Given that the NuGet package, the infrastructure, and the LLM are all provided by Microsoft, compatibility between these components was naturally expected. – hdev, Mar 9 at 8:24
- Did you try using the embedding client instead of the chat client? See the following for troubleshooting: github.com/MicrosoftDocs/azure-docs-sdk-python/blob/main/… – jdweng, Mar 9 at 12:36
1 Answer
The fix was to provide the audio as a URL rather than inline binary data. The API expects a publicly accessible URI for the audio file, which is why the inline MP3 file caused a 422 error.
Solution:
Upload your MP3 file to a web-accessible location.
Replace the file path constructor with a URI-based constructor:
// Old, problematic code:
// new ChatMessageAudioContentItem(audioFilePath: "sample.mp3", AudioContentFormat.Mp3)

// Updated, working code:
new ChatMessageAudioContentItem(new Uri("https://example.com/input.mp3"))
This change ensures the model receives the audio in the expected format.
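Putting it together, here is a minimal sketch of the working request. The endpoint, key, and audio URL are placeholders, and I've moved the prompt into the user message alongside the audio as a text content part, since the validation error shows both parts being parsed from the user message's content:

```csharp
using Azure;
using Azure.AI.Inference;

// Placeholder endpoint and key -- substitute your own values.
var endpoint = new Uri("https://###.services.ai.azure.com/models");
var credential = new AzureKeyCredential("###");

var client = new ChatCompletionsClient(
    endpoint,
    credential,
    new AzureAIInferenceClientOptions());

ChatMessageContentItem[] userContent =
{
    new ChatMessageTextContentItem(
        "Based on the attached audio, generate a comprehensive text transcription of the spoken content."),
    // Reference the audio by a publicly reachable URL instead of inline bytes.
    new ChatMessageAudioContentItem(new Uri("https://example.com/input.mp3")),
};

var requestOptions = new ChatCompletionsOptions()
{
    Messages = { new ChatRequestUserMessage(userContent) },
    Model = "Phi-4-multimodal-instruct",
    MaxTokens = 1000,
};

Response<ChatCompletions> response = client.Complete(requestOptions);
Console.WriteLine(response.Value.Content);
```

Note that the URL must be reachable from the service itself, not just from your machine, so a localhost or intranet address will not work.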