I’m looking at GPT-4o and see options for text and vision, but no option for voice. Is that upcoming? Also, will there be an option for audio input and output like you demoed earlier? Very exciting features. I love how you tackled the latency issues.

6 Likes

Found this on another thread:

So it seems like we won’t be able to use audio for now…

1 Like

So bummed!!! I was looking for this, too. UGH! What a tease, OpenAI!

2 Likes

Yea, we need audio in/out. Wondering what would happen if we sent encoded audio rather than an image :thinking: since it’s multimodal. We need audio out, though, to deal with the latency problem.

5 Likes

According to the announcement, video and audio inputs will only be available to a small group of partners for now.

How can we get on that small list :sweat_smile:?

3 Likes

I don’t really know, but I think by “partners” they mean Microsoft and so on.

Bummer. Talking to ChatGPT is mildly interesting, but using the tech to enable my business is what’s most interesting.

It’s not available yet.

gpt-4o is live in the API with support for text and vision modalities. Support for audio is coming in the following weeks to a small group of trusted partners.

3 Likes

And then for everyone I hope?

3 Likes

Good morning, I can’t send voice instructions to ChatGPT-4o.

None of that was mentioned, that I saw, in the presentation. This outright deception is getting old. Every company does this every time: they make a promise or show you something and say “it’s going to be released today,” and then you read the fine print and it’s not actually true. Or it’s true only for people who pay a lot of money.

In other words, what we got was an upgrade in speed and price. Which is great. But it’s not what was implied, and who knows when we will get access to the rest. This is an example of how the rich get richer and the poor get poorer: all the advantages go to those who can pay much, much more than anyone else.

AI is being refined at its current level. I don’t believe we will get much further than this anytime soon. This is not the logarithmic or exponential growth we’ve been promised for two years now; this is crumbs left over by the giant fish in the pond. I’ve been using GPT-4 since it came out, probably about 200 hours over the course of almost two years, and there’s nothing markedly different between GPT-4 then and GPT-4 now. Yes, it’s much better, but these are refinements. Good ones, but incremental. This is why I’ll never use GPT-4 as my main API even though I’m at tier 4. These fake product launches are getting old.

3 Likes

Great points. It kinda felt like that, for sure. Whatever “a small group of trusted partners” means. I hear you. Yea, we do the same: we try to keep our LLM layer flexible and use multiple providers. This latency improvement for audio is a game changer, though.

1 Like

Is there any possibility of becoming a trusted partner?
Thanks in advance.

3 Likes

Can you release the API documentation for voice mode so we can start planning for how to integrate it?

5 Likes

I’m building my iOS app, and all I need is that voice API. Please release it soon.

2 Likes

We are heavy GPT users. How can we become a trusted partner?

If you have to ask here, never. Sorry. But in that case you aren’t important enough to OpenAI.

Is it correct that, for now, to go from speech to speech we need to run speech-to-text transcription → chat completion → text-to-speech synthesis?
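That’s my understanding too. A minimal sketch of that interim chain, with the three stages wired in sequence. The stage callables below are stand-in lambdas so the wiring is runnable as-is; in a real integration each would wrap the OpenAI endpoint noted in the comments (Whisper transcription, a gpt-4o chat completion, and TTS synthesis). Names like `speech_to_speech` are my own, not from any SDK:

```python
# Sketch of the interim speech-to-speech pipeline: three stages chained.
# Each stage is a plain callable so real API wrappers can be swapped in.
from typing import Callable

def speech_to_speech(
    audio: bytes,
    transcribe: Callable[[bytes], str],   # stage 1: e.g. "whisper-1" transcription
    complete: Callable[[str], str],       # stage 2: e.g. a gpt-4o chat completion
    synthesize: Callable[[str], bytes],   # stage 3: e.g. "tts-1" text-to-speech
) -> bytes:
    """Run audio through transcription -> completion -> synthesis."""
    text_in = transcribe(audio)      # speech -> text
    text_out = complete(text_in)     # text -> text (where latency mostly accrues)
    return synthesize(text_out)      # text -> speech

# Wiring check with stub stages (no network involved):
audio_out = speech_to_speech(
    b"fake-wav-bytes",
    transcribe=lambda a: "hello",
    complete=lambda t: t.upper(),
    synthesize=lambda t: t.encode(),
)
print(audio_out)  # b'HELLO'
```

The downside of this chain, and presumably why everyone here wants native audio, is that the three sequential round trips stack up latency that a single audio-in/audio-out model avoids.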

I thought they just announced it was available for everyone?.. I wish they would tell the truth in these hype-filled, back-patting videos.