Sam Altman-run OpenAI on Monday announced it is rolling out new voice and image capabilities in ChatGPT that let the AI chatbot see, hear and speak. These capabilities offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you're talking about, the company said in a statement. The company said it is rolling out voice and images in ChatGPT to Plus and Enterprise users over the next two weeks.
"Voice mode and vision for ChatGPT! Really worth a try," Altman posted on X. "Voice is coming to iOS and Android (opt-in in your settings) and images will be available on all platforms," said the Microsoft-backed company. The new voice capability is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech. "We collaborated with professional voice actors to create each of the voices. We also use Whisper, our open-source speech recognition system, to transcribe your spoken words into text," said OpenAI.
Image understanding is powered by multimodal GPT-3.5 and GPT-4. These models apply their language reasoning skills to a wide range of images, such as photographs, screenshots, and documents containing both text and images. The new voice technology opens doors to many creative and accessibility-focused applications. However, "these capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud," the company noted.
"This is why we are using this technology to power a specific use case: voice chat. Voice chat was created with voice actors we have directly worked with," it added.
Spotify is using this technology to pilot its Voice Translation feature, which helps podcasters expand the reach of their storytelling by translating podcasts into additional languages in the podcasters' own voices. "We've also taken technical measures to significantly limit ChatGPT's ability to analyse and make direct statements about people, since ChatGPT is not always accurate and these systems should respect individuals' privacy," said the company.
— Written with inputs from IANS