reactjs - Couldn't sync the audio and text in openai-realtime-beta

I am using OpenAI’s real-time API (gpt-4o-realtime-preview-2024-12-17) in a React-based application for live transcription and response generation. However, I am facing an issue where the transcribed text and the generated speech output do not align properly. Sometimes the text appears earlier than expected, or the audio plays with a delay.

Implementation Details:

- The application uses WebSockets to stream real-time audio to OpenAI.
- I am using the RealtimeClient from OpenAI's realtime API beta library to send audio and receive live audio responses.
- WavRecorder and WavStreamPlayer handle audio capture and playback, since the audio is in 16-bit PCM format.
- Text responses are updated dynamically as they arrive via the API.

This is the code for connecting to the API:

const connectConversation = useCallback(async () => {
  const client = clientRef.current;
  const wavRecorder = wavRecorderRef.current;
  const wavStreamPlayer = wavStreamPlayerRef.current;

  // Start microphone capture and the output audio player
  await wavRecorder.begin();
  await wavStreamPlayer.connect();

  try {
    const response = await client.connect();
    if (response) {
      setLoading(false);
      client.sendUserMessageContent([{ type: "input_text", text: "Hello!" }]);

      // With server-side voice activity detection, stream microphone audio continuously
      if (client.getTurnDetectionType() === "server_vad") {
        await wavRecorder.record((data) => client.appendInputAudio(data.mono));
      }
    }
  } catch (error) {
    console.error("Error connecting:", error);
  }
}, []);

This is the code that handles the response:

client.on("conversation.updated", async ({ item, delta }) => {
  if (item.role === "assistant" && delta?.audio) {
    // Queue this PCM chunk for playback under the item's id
    wavStreamPlayer.add16BitPCM(delta.audio, item.id);
    textRef.current = item.formatted.transcript; // Text updates immediately
  } else if (delta?.text) {
    textRef.current = item.formatted.transcript;
  }

  // Once the item is complete, decode the full audio into a playable WAV file
  if (item.status === "completed" && item.formatted.audio?.length) {
    const wavFile = await WavRecorder.decode(item.formatted.audio, 24000, 24000);
    setAudiosrc(wavFile.url);
  }
});
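One way to give text and audio a shared clock, sketched under assumptions: count the PCM samples that arrive for each assistant item alongside its transcript, then reveal (or scroll to) the fraction of the transcript that matches the fraction of audio already played. This assumes `delta.audio` is an `Int16Array` of mono samples, as the `add16BitPCM` call suggests. `createTranscriptTimeline` and its methods are hypothetical names, and the character-proportional mapping is an approximation in the absence of word-level timing, not true alignment.

```javascript
// Hypothetical helper: pairs received audio (in samples) with the transcript
// text of each assistant item, so text progress can follow audio progress.
function createTranscriptTimeline() {
  const items = new Map(); // itemId -> { receivedSamples, transcript }

  const entry = (itemId) => {
    if (!items.has(itemId)) {
      items.set(itemId, { receivedSamples: 0, transcript: "" });
    }
    return items.get(itemId);
  };

  return {
    // Call with delta.audio.length (one Int16 element per mono sample).
    addAudio(itemId, sampleCount) {
      entry(itemId).receivedSamples += sampleCount;
    },
    // Call with the full transcript received so far (item.formatted.transcript).
    setTranscript(itemId, transcript) {
      entry(itemId).transcript = transcript;
    },
    // Returns the transcript prefix that corresponds to playedSamples,
    // assuming characters are spoken at a roughly constant rate.
    spokenText(itemId, playedSamples) {
      const e = items.get(itemId);
      if (!e || e.receivedSamples === 0) return "";
      const fraction = Math.min(1, Math.max(0, playedSamples / e.receivedSamples));
      return e.transcript.slice(0, Math.round(e.transcript.length * fraction));
    },
  };
}
```

In the `conversation.updated` handler you would call `timeline.addAudio(item.id, delta.audio.length)` next to `add16BitPCM`, and `timeline.setTranscript(item.id, item.formatted.transcript)` whenever the transcript changes. While a response is still streaming, both totals keep growing (and the transcript usually runs slightly ahead of the audio), so the mapping is most accurate once the item is completed.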

Problem observed

I can't scroll the text in sync with the audio.

Scrolling logic, with the duration estimated at 150 words per minute:

const scrollText = () => {
  if (!scrollContainerRef.current) return;

  const container = scrollContainerRef.current;
  const currentTime = Date.now();
  const elapsed = currentTime - scrollStartTimeRef.current;
  const duration = getScrollDuration(text);

  if (elapsed >= duration) {
    container.scrollTop = container.scrollHeight - container.clientHeight;
    return;
  }

  const progress = elapsed / duration;
  const targetScrollTop = container.scrollHeight - container.clientHeight;

  // Smooth easing function for better scrolling
  const easeInOutQuad = (t) =>
    t < 0.5 ? 2 * t * t : 1 - Math.pow(-2 * t + 2, 2) / 2;

  container.scrollTop = targetScrollTop * easeInOutQuad(progress);
  animationFrameRef.current = requestAnimationFrame(scrollText);
};
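A sketch of the same scrolling loop driven by the player's position instead of `Date.now()`, so buffering or a late playback start cannot make the text run ahead of the audio. It assumes you can read how many samples of the current item have been played and how many were received in total. The wavtools `WavStreamPlayer` used in the openai-realtime-console appears to expose an async `getTrackSampleOffset()` that resolves to `{ trackId, offset, currentTime }`, but treat that as an assumption and check your copy. `scrollTopForProgress`, `startPlaybackSyncedScroll` and `getPlayback` are hypothetical names.

```javascript
// Maps played/total samples to a scrollTop value. The mapping is linear (no
// easing) so the scroll position stays proportional to the audio position.
function scrollTopForProgress(playedSamples, totalSamples, scrollHeight, clientHeight) {
  const maxScroll = Math.max(0, scrollHeight - clientHeight);
  if (totalSamples <= 0) return 0;
  const progress = Math.min(1, Math.max(0, playedSamples / totalSamples));
  return maxScroll * progress;
}

// Hypothetical driver: reads the playback position once per animation frame.
// getPlayback is a function you supply (sync or async) that returns
// { playedSamples, totalSamples }, or null when nothing is playing.
function startPlaybackSyncedScroll(container, getPlayback) {
  let frameId = null;
  let stopped = false;

  const tick = async () => {
    if (stopped) return;
    try {
      const playback = await getPlayback();
      if (playback && !stopped) {
        container.scrollTop = scrollTopForProgress(
          playback.playedSamples,
          playback.totalSamples,
          container.scrollHeight,
          container.clientHeight
        );
      }
    } catch (error) {
      console.error("Could not read playback position:", error);
    }
    // The next frame is requested only after the previous read finished,
    // so slow async position requests never overlap.
    if (!stopped) frameId = requestAnimationFrame(tick);
  };

  tick();
  // Cleanup function, e.g. to return from a useEffect.
  return () => {
    stopped = true;
    if (frameId !== null) cancelAnimationFrame(frameId);
  };
}
```

With the wavtools player, `getPlayback` might await `wavStreamPlayer.getTrackSampleOffset()` and return `{ playedSamples: result.offset, totalSamples }` when `result.trackId` matches the assistant item id, where `totalSamples` is the running sum of `delta.audio.length` for that item. Dropping the easing is deliberate: an eased curve moves the text faster or slower than the speech at different points.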

Approaches taken

Converting the 16-bit PCM into an audio source:

const wavFile = await WavRecorder.decode(item.formatted.audio, 24000, 24000);
setAudiosrc(wavFile.url);

- However, the conversion takes longer the longer the response is, which causes desynchronization.
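If you keep the decoded-file approach, one way to stop the decode delay from shifting the text is to derive scroll progress from the audio element's own clock (`currentTime / duration`) rather than from a timer started elsewhere, so the text only moves while the file is actually playing. This is a minimal sketch; `syncScrollToMedia` is a hypothetical name, and `audio` is assumed to be the `<audio>` element whose `src` is `audiosrc`.

```javascript
// Hypothetical helper: keeps container.scrollTop proportional to the playback
// position of an HTMLMediaElement (e.g. the <audio> playing audiosrc).
// Returns a cleanup function.
function syncScrollToMedia(audio, container) {
  let frameId = null;

  const update = () => {
    const { duration, currentTime } = audio;
    // duration is NaN before metadata loads (and Infinity for live streams).
    if (!Number.isFinite(duration) || duration <= 0) return;
    const maxScroll = Math.max(0, container.scrollHeight - container.clientHeight);
    container.scrollTop = maxScroll * Math.min(1, currentTime / duration);
  };

  const loop = () => {
    update();
    // Keep polling only while the media is actually playing.
    frameId = audio.paused || audio.ended ? null : requestAnimationFrame(loop);
  };

  const start = () => {
    if (frameId !== null) cancelAnimationFrame(frameId); // avoid duplicate loops
    loop();
  };

  audio.addEventListener("playing", start);
  audio.addEventListener("seeked", update);
  start();

  return () => {
    audio.removeEventListener("playing", start);
    audio.removeEventListener("seeked", update);
    if (frameId !== null) cancelAnimationFrame(frameId);
  };
}
```

This does not remove the wait for the decode itself, but it keeps the text from starting before the audio does.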
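Scrolling based on word count (150 WPM rule):

const getScrollDuration = (text) => {
  const wordsPerMinute = 150;
  // Split on any run of whitespace so repeated spaces or newlines don't inflate the count
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return (words / wordsPerMinute) * 60 * 1000; // milliseconds
};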

This works for short responses but fails for longer ones, because the actual speaking rate varies.
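Because the audio is raw 16-bit PCM, its exact length is known without decoding and without a words-per-minute guess: duration = samples / sample rate. Below is a sketch of a replacement for `getScrollDuration`, assuming mono audio at 24 kHz (the rate passed to `WavRecorder.decode` above) and that `item.formatted.audio` is an `Int16Array`; `getScrollDurationFromPcm` is a hypothetical name.

```javascript
const SAMPLE_RATE = 24000; // assumed output rate, matching WavRecorder.decode(..., 24000, 24000)
const BYTES_PER_SAMPLE = 2; // 16-bit PCM

// Returns the playback duration in milliseconds for mono 16-bit PCM audio.
// Accepts an Int16Array (one element per sample) or an ArrayBuffer of raw bytes.
function getScrollDurationFromPcm(pcm, sampleRate = SAMPLE_RATE) {
  const samples =
    pcm instanceof ArrayBuffer ? pcm.byteLength / BYTES_PER_SAMPLE : pcm.length;
  return (samples / sampleRate) * 1000;
}
```

For a completed item, `getScrollDurationFromPcm(item.formatted.audio)` could stand in for `getScrollDuration(text)`. While a response is still streaming the total keeps growing, so a precomputed duration is only exact once the item is complete.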

Questions:

How can I accurately sync the text scroll with the real-time audio playback?
Are there any existing libraries or best practices for handling text-audio synchronization in real-time applications?

Any insights or suggestions would be greatly appreciated!
