
Meta unveils SeamlessM4T: An AI model for translation and transcription

SeamlessM4T can handle a diverse range of language tasks, including speech-to-text, speech-to-speech, text-to-speech and text-to-text translation.

Meta has introduced its newest AI model, SeamlessM4T, which the company says will transform multilingual translation and transcription.

Meta presents this multilingual, multimodal model as a step towards redefining cross-language communication, with the potential to revolutionise how people interact by offering more accurate translation and transcription.

According to Meta, a single model handles all of these tasks: speech-to-text, speech-to-speech, text-to-speech and text-to-text translation.

The model also performs speech recognition in nearly 100 languages, and its speech-to-text translation supports nearly 100 input and output languages. Speech-to-speech translation goes further, accepting nearly 100 input languages and producing output in 36 languages.

The model also covers text-to-text translation for almost 100 languages.

In addition, SeamlessM4T converts written text into speech in another language, accepting close to 100 input languages and producing 35 output languages, which extends its reach to users worldwide.

Meta has also released the metadata of SeamlessAlign, the dataset behind the model. The company describes it as the largest open multimodal translation dataset to date.

It contains 270,000 hours of curated speech and text alignments. The dataset underpins the development of SeamlessM4T and reflects Meta's continued investment in AI-powered translation and transcription.

SeamlessM4T follows several notable Meta initiatives. Last year's launch of the No Language Left Behind (NLLB) model marked a milestone, enabling text-to-text machine translation across 200 languages. NLLB has since been integrated into Wikipedia as a translation provider, helping to connect languages and improve accessibility.

Meta also introduced Universal Speech Translator, an effort to preserve and promote minority languages. The technology enabled direct speech-to-speech translation for Hokkien, a language without a widely used standard writing system.

Earlier this year, Meta unveiled its Massively Multilingual Speech technology, which provides speech recognition, language identification and speech synthesis for more than 1,100 languages.
