ElevenLabs, a startup that provides AI voice cloning and a text-to-speech API, on Monday launched the ability to build conversational AI bots.
The company announced that users can now build complete conversational agents on ElevenLabs’ developer platform, with customizable variables such as tone of voice and response length.
ElevenLabs has mostly worked on providing different voices and AI tools for text-to-speech services. The company’s head of growth, Sam Sklar, told TechCrunch that many of its clients were already using those tools to create conversational AI agents. However, the toughest parts were integrating the knowledge base and handling interruptions from customers. That’s why the company decided to build a full pipeline for conversational bots.
Users can log in to their ElevenLabs account and start building a conversational agent by selecting a template or creating a new project. They can choose the agent’s primary language, first message, and system prompt to determine the agent’s persona. Developers also have to select a large language model (Gemini, GPT, or Claude), the temperature of responses (to determine how creative the response should be), and a token usage limit.
They can also tune other aspects like voice, latency, stability, authentication criteria, and maximum length of conversation with the AI agent.
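To make those settings concrete, here is a minimal Python sketch of how such an agent configuration might be modeled. It is illustrative only: the class, field names, defaults, and the 0 to 1 temperature range are assumptions made for this example and are not taken from ElevenLabs’ SDK or documentation.

```python
from dataclasses import dataclass, asdict

# Hypothetical schema for illustration; NOT the ElevenLabs SDK.
# The article names three model families users can pick from.
SUPPORTED_LLM_FAMILIES = ("gemini", "gpt", "claude")


@dataclass
class AgentConfig:
    language: str                        # agent's primary language, e.g. "en"
    first_message: str                   # what the agent says first
    system_prompt: str                   # defines the agent's persona
    llm: str                             # model name; must start with a supported family
    temperature: float = 0.5             # assumed range 0.0-1.0 (higher = more creative)
    max_tokens: int = 512                # assumed token usage limit per response
    max_conversation_seconds: int = 600  # assumed cap on conversation length

    def validate(self) -> None:
        """Raise ValueError if any setting falls outside the assumed bounds."""
        if not self.llm.lower().startswith(SUPPORTED_LLM_FAMILIES):
            raise ValueError(f"unsupported LLM: {self.llm!r}")
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        if self.max_conversation_seconds <= 0:
            raise ValueError("max_conversation_seconds must be positive")


if __name__ == "__main__":
    config = AgentConfig(
        language="en",
        first_message="Hi, how can I help you today?",
        system_prompt="You are a friendly support agent.",
        llm="claude",
        temperature=0.3,
    )
    config.validate()
    print(asdict(config))
```

Validating the settings up front, as this sketch does, means a bad value such as an out-of-range temperature fails before any conversation starts.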
Users can add their own knowledge base, like a file, URL, or text block, to power the conversational bot. Plus, they can integrate their own custom LLM with the bot. ElevenLabs’ SDK is compatible with Python, JavaScript, React, and Swift. The company also offers a WebSocket API for more customization.
Companies can also define criteria to collect certain data items — for instance, name and email of customers speaking to the agent — along with evaluation criteria in natural language to define the success or failure of the call.
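As a rough illustration of that idea, the Python sketch below pairs data items to collect with evaluation criteria written in natural language. Everything here is hypothetical: the class names, the rule that a call succeeds only if every criterion passes, and the keyword-based `toy_judge` function (a stand-in for whatever model would actually interpret the criteria) are assumptions for the example, not ElevenLabs’ implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataItem:
    name: str         # e.g. "email"
    description: str  # natural-language hint about what to extract


@dataclass
class EvaluationCriterion:
    name: str
    prompt: str       # natural-language definition of success


# A judge takes (transcript, criterion prompt) and returns True on success.
Judge = Callable[[str, str], bool]


def evaluate_call(
    transcript: str, criteria: List[EvaluationCriterion], judge: Judge
) -> Dict[str, bool]:
    """Apply the judge to every criterion and return per-criterion results."""
    return {c.name: judge(transcript, c.prompt) for c in criteria}


def call_succeeded(results: Dict[str, bool]) -> bool:
    """Assumed rule: the call succeeds only if every criterion passed."""
    return all(results.values())


def toy_judge(transcript: str, prompt: str) -> bool:
    """Stand-in for an LLM judge: passes if the customer says 'solved'."""
    return "solved" in transcript.lower()


if __name__ == "__main__":
    items = [
        DataItem("name", "The customer's full name"),
        DataItem("email", "The customer's email address"),
    ]
    criteria = [
        EvaluationCriterion("resolved", "The customer confirms the issue was fixed."),
    ]
    transcript = "Agent: Anything else? Customer: No, that solved it, thanks."
    results = evaluate_call(transcript, criteria, toy_judge)
    print([item.name for item in items], results, call_succeeded(results))
```

In a real system the judge would be a language model reading each criterion, which is what lets companies phrase success and failure in plain language rather than code.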
ElevenLabs is leveraging its existing pipeline for the text-to-speech part. The company had to develop speech-to-text capabilities for the new conversational AI product. The company is not offering its speech-to-text API as a stand-alone product as of now, but it might do that in the future, making it a competitor to Google’s, Microsoft’s, and Amazon’s speech-to-text APIs, as well as specialized APIs, such as OpenAI’s Whisper, AssemblyAI, Deepgram, Speechmatics and Gladia.
The company, which is aiming to raise new funding at a valuation north of $3 billion, also competes with other voice AI startups, such as Vapi and Retell, which are also building conversational agents. More notably, the company will rival OpenAI’s real-time conversational API. However, ElevenLabs believes that its customizations and ability to switch models will give it an edge over OpenAI.