I'm trying to use the Phi-4-multimodal-instruct model with audio input through the Azure.AI.Inference C# client, but I'm getting an "invalid input error" when sending an MP3 file. The same error occurs with both GitHub and Azure endpoints.
Error Message
Azure.RequestFailedException: invalid input error
Status: 422 (Unprocessable Entity)
ErrorCode: Invalid input
Content:
{
  "error": {
    "code": "Invalid input",
    "status": 422,
    "message": "invalid input error",
    "details": [
      {
        "type": "string_type",
        "loc": [
          "body",
          "messages",
          0,
          "content",
          "str"
        ],
        "msg": "Input should be a valid string",
        "input": [
          {
            "type": "text",
            "text": "Based on the attached audio, generate a comprehensive text transcription of the spoken content."
          },
          {
            "type": "input_audio",
            "input_audio": {
              "data": "...",
              "format": "mp3"
            }
          }
        ]
      },
      {
        "type": "missing",
        "loc": [
          "body",
          "messages",
          0,
          "content",
          "list[function-after[validate_content_part(), ContentPart]]",
          1,
          "input_audio",
          "url"
        ],
        "msg": "Field required",
        "input": {
          "data": "...",
          "format": "mp3"
        }
      }
    ]
  }
}
Code
using Azure;
using Azure.AI.Inference;

// Azure endpoint configuration
var endpoint = new Uri("https://###.services.ai.azure.com/models");
var credential = new AzureKeyCredential("###");
var model = "Phi-4-multimodal-instruct";

var client = new ChatCompletionsClient(
    endpoint,
    credential,
    new AzureAIInferenceClientOptions());

ChatMessageContentItem[] userContent =
{
    new ChatMessageAudioContentItem(audioFilePath: "sample.mp3", AudioContentFormat.Mp3)
};

var requestOptions = new ChatCompletionsOptions()
{
    Messages =
    {
        new ChatRequestSystemMessage("Based on the attached audio, generate a comprehensive text transcription of the spoken content."),
        new ChatRequestUserMessage(userContent),
    },
    Model = model,
    Temperature = 1,
    MaxTokens = 1000,
};

Response<ChatCompletions> response = client.Complete(requestOptions);
System.Console.WriteLine(response.Value.Content);
What I've Tried
- Tried with both GitHub and Azure endpoints with identical results
- Verified the MP3 is valid
Questions
- Is Phi-4-multimodal-instruct supposed to support audio input via the C# client?
- Is there a different way to format audio input for this specific model?
- Are there any known limitations or requirements for audio files with this model?
Any help would be greatly appreciated!
asked Mar 7 at 20:43 by hdev; edited Mar 10 at 19:24
Comments
- Does the code work with a different type of audio file? The issue could be with the audio codec driver installed on the machine not recognizing the MP3 format. Some apps use the machine's audio driver and others talk directly to the audio hardware; I don't know which method your code uses. There are two types of drivers: 1) the manufacturer's, and 2) Microsoft's generic drivers. Issues like this are often caused by the Microsoft generic driver, and installing the vendor's driver solves them. It could also be a closed connection, or the chat endpoint expecting ASCII data while you are sending binary data. – jdweng, Mar 8 at 10:52
- @jdweng Thank you for your suggestions, but unfortunately they don't address the actual issue: I also tested with WAV format and received the same error. The NuGet package only encodes audio files to Base64 internally, so suggestions regarding audio codecs or hardware drivers aren't relevant here. Given that the NuGet package, the infrastructure, and the LLM are all provided by Microsoft, compatibility between these components was naturally expected. – hdev, Mar 9 at 8:24
- Did you try using the embedding client instead of the chat client? See the following for troubleshooting: github.com/MicrosoftDocs/azure-docs-sdk-python/blob/main/… – jdweng, Mar 9 at 12:36
1 Answer
The fix was to provide the audio as a URL rather than inline binary data. The API expects a publicly accessible URI for the audio file, which is why the inline MP3 file caused a 422 error.
Solution:
Upload your MP3 file to a web-accessible location.
Replace the file path constructor with a URI-based constructor:
// Old, problematic code:
// new ChatMessageAudioContentItem(audioFilePath: "sample.mp3", AudioContentFormat.Mp3)

// Updated, working code:
new ChatMessageAudioContentItem(new Uri("https://example.com/input.mp3"))
This change ensures the model receives the audio in the expected format.
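Putting it together, here is a minimal sketch of the working request. The endpoint, key, and audio URL are placeholders, and I've moved the prompt into the user message alongside the audio as a text content part, since the validation error shows both parts being parsed from the user message's content:

```csharp
using Azure;
using Azure.AI.Inference;

// Placeholder endpoint and key -- substitute your own values.
var endpoint = new Uri("https://###.services.ai.azure.com/models");
var credential = new AzureKeyCredential("###");

var client = new ChatCompletionsClient(
    endpoint,
    credential,
    new AzureAIInferenceClientOptions());

ChatMessageContentItem[] userContent =
{
    new ChatMessageTextContentItem(
        "Based on the attached audio, generate a comprehensive text transcription of the spoken content."),
    // Reference the audio by a publicly reachable URL instead of inline bytes.
    new ChatMessageAudioContentItem(new Uri("https://example.com/input.mp3")),
};

var requestOptions = new ChatCompletionsOptions()
{
    Messages = { new ChatRequestUserMessage(userContent) },
    Model = "Phi-4-multimodal-instruct",
    MaxTokens = 1000,
};

Response<ChatCompletions> response = client.Complete(requestOptions);
Console.WriteLine(response.Value.Content);
```

Note that the URL must be reachable from the service itself, not just from your machine, so a localhost or intranet address will not work.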