Looking Forward
We are inspired by how the Chatbot Arena has rapidly accelerated research on real-world applications of language models in conversational systems. As we look ahead, we aim to similarly focus the development of speech-enabled language models on user needs, rather than limiting innovation to what current benchmarks can measure.
- Incorporating Human Preferences in Speech Data: We currently store no data other than votes, but in the long term we want to work with the community to build frameworks for consensual data sharing. Speech data requires special care because it can inherently identify individuals and can even be used to train models that mimic their voices. We would love for data from Talk Arena to directly improve open-source and academic speech models, but clear consent processes and careful data handling are prerequisites for making this possible in a way that is both useful and ethical.
- Managing Free-Form Conversational Dynamics: Speech conversations flow differently from text chats; they are more dynamic and less strictly turn-based. These qualities are what make speech compelling for users, but they also present challenges for Arena-style evaluation. As more conversational speech systems are released, we are exploring how to assess these natural speech interactions effectively.
- Developing Robust Static Benchmarks: While interactive feedback from users is invaluable, we recognize that it is often too slow for model developers to use when measuring intermediate progress. By combining qualitative insights from paid participants with general correlations against public ratings, we hope Talk Arena can inform the design of static evaluations that are better aligned with user preferences and that provide faster, less expensive feedback.
Collaboration
We are open to collaboration in many ways! If you are interested in contributing to this project, please feel free to contact us at [email protected], [email protected], or [email protected].