From text to voice: the evolution of AI interaction with Velvet Speech 2B



Artificial Intelligence

19 February 2026

Today, most GenAI interactions happen through written text. Individuals and professionals type requests asking AI to translate content, draft documents, or summarize information, and they receive text-based responses in return.

This interaction model works well in many business contexts. However, there are numerous situations where AI can support processes in different ways. In these environments, communication primarily happens through voice—either between people or between a person and an automated system. Examples include conferences, corporate meetings, digital communication channels, customer care activities, and especially sensitive fields such as healthcare.

In these scenarios, voice can make a real difference. The ability to interact with an AI system that understands spoken commands allows users to retrieve and organize information, ask questions, and launch analyses, translations, or transcriptions without interrupting their ongoing work.

It is no coincidence that the latest AI models are evolving toward multimodal interaction—combining written input with other forms of communication, particularly voice.

Velvet Speech 2B was created to meet this need. It extends the Velvet family of large language models (LLMs) with a new voice-based interaction capability designed for dynamic, real-world professional environments.


Velvet Speech 2B: interacting with AI through voice

Velvet Speech 2B is the first multimodal model in the Velvet family. Compact and versatile, it is built for dynamic interactions and can process and understand spoken language. Users can submit requests either in writing or through voice input, while responses are delivered in text format.

This innovation builds on established expertise: Almawave has been developing speech and voice recognition technologies in its research labs for years. These capabilities now enhance the evolution of its language models.

From a technical standpoint, Velvet Speech 2B retains the strengths of Velvet 2B while adding advanced voice-related features, including automatic speech recognition along with spoken queries and question answering.

The model supports both Italian and English, even within mixed-language conversations. It also integrates speech emotion recognition capabilities, enabling it to analyze tone and vocal patterns to better understand context.

Below are the key features that distinguish Velvet Speech 2B.

Automatic speech recognition

Automatic speech recognition enables the model to listen to recordings or live conversations and convert them into written text. This capability is particularly useful when spoken exchanges—such as meetings, public hearings, or interviews—need to be quickly transformed into structured documents.
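The post-processing step is easy to picture in code. The sketch below assumes ASR output arrives as timestamped, speaker-attributed segments (a common convention, not Velvet Speech 2B's documented output format) and shows how such segments can be folded into a simple structured minutes document:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds from the start of the recording
    speaker: str
    text: str

def to_minutes(title: str, segments: list[Segment]) -> str:
    """Format ASR segments as a simple timestamped minutes document."""
    lines = [f"# {title}", ""]
    for seg in segments:
        mm, ss = divmod(int(seg.start), 60)
        lines.append(f"[{mm:02d}:{ss:02d}] {seg.speaker}: {seg.text}")
    return "\n".join(lines)

segments = [
    Segment(0.0, "Chair", "The session is open."),
    Segment(12.5, "Member A", "I move to approve the agenda."),
]
print(to_minutes("Council meeting, 19 February 2026", segments))
```

In a real deployment, the segments would come from the model's transcription output rather than being constructed by hand.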

Spoken queries and question answering

Users can ask questions verbally—for example, “Show me all open cases from the last 30 days”—and the system processes the request exactly as it would a written command, returning a clear and structured response.

Seamless interaction between voice and text

Whether a request is typed or spoken, the system interprets it consistently and delivers coherent responses. There are not two separate systems for voice and text; the experience remains uniform regardless of the input method.
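Architecturally, this unified behavior can be sketched as a single request handler with transcription as an optional front step. The function names and the stand-in transcriber below are illustrative assumptions, not Velvet Speech 2B's actual API:

```python
from typing import Callable

def handle_request(query: str) -> str:
    # Single entry point: the same logic serves typed and spoken input.
    return f"Processed: {query}"

def handle_voice(audio: bytes, transcribe: Callable[[bytes], str]) -> str:
    # Voice input differs only by a transcription step in front of
    # the very same handler -- there is no parallel "voice system".
    return handle_request(transcribe(audio))

# Stand-in transcriber for illustration; a deployment would call the model.
fake_transcribe = lambda audio: "Show me all open cases from the last 30 days"

typed = handle_request("Show me all open cases from the last 30 days")
spoken = handle_voice(b"<audio>", fake_transcribe)
assert typed == spoken  # identical result regardless of input channel
```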

Bilingual support (Italian and English)

Speech 2B can understand and transcribe both Italian and English, even when the two languages alternate within the same conversation. This makes it particularly suitable for institutional and corporate settings where multilingual exchanges are common, ensuring accuracy within a single, unified information-processing workflow.

Speech emotion recognition

Beyond the literal meaning of words, the model analyzes vocal elements such as tone and rhythm to detect emotional signals. This feature is especially valuable in contexts where emotional nuance plays an important and sensitive role, such as doctor–patient interactions or public-facing customer care services.
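To make "tone and rhythm" concrete, the toy sketch below computes two classic low-level acoustic cues, loudness (RMS energy) and a rough voicing proxy (zero-crossing rate). Real speech-emotion models like the one described here learn far richer representations; this only illustrates the kind of signal such features capture:

```python
import math

def acoustic_cues(samples: list[float]) -> dict[str, float]:
    """Compute RMS energy (loudness) and zero-crossing rate (a crude
    voicing/pitch proxy) from raw audio samples in [-1, 1]."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    zcr = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    ) / (len(samples) - 1)
    return {"rms_energy": rms, "zero_crossing_rate": zcr}

# A louder (e.g. agitated) signal yields a higher RMS value.
quiet = [0.1 * math.sin(i / 5) for i in range(1000)]
loud = [0.9 * math.sin(i / 5) for i in range(1000)]
assert acoustic_cues(loud)["rms_energy"] > acoustic_cues(quiet)["rms_energy"]
```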

Compact and versatile design

One of Velvet Speech 2B’s most distinctive characteristics is its compact architecture and internal optimization. It is a lightweight model that can be integrated into infrastructures with limited computing power, without requiring complex environments or external dependencies.

This makes it especially well-suited for organizations where data must remain on-premises—such as public administrations, healthcare institutions, or companies managing sensitive information—ensuring strong data governance and control.


From public administration to operational environments: putting voice to work

Velvet Speech 2B can be successfully deployed in a wide range of public and private settings. Its lightweight design makes it ideal for local infrastructures and edge devices. Combined with its focus on data protection and data quality, it is particularly promising in sectors where personal data management is critical.

These include public administration and healthcare, where sensitive information is handled daily. In such contexts, it is essential to maintain full control over where data resides and who has access to it.

With Velvet Speech 2B, organizations can enable voice interaction without changing their existing infrastructure. Spoken input is converted into text and managed according to established policies, without introducing additional layers of data exposure.

Here are some potential application scenarios.

Public administration: automatic transcription and summarization of public sessions

During city council meetings or public hearings, Velvet Speech 2B can transcribe public discussions and generate official minutes or summaries of key points. This leads to significant time savings, greater administrative efficiency, and immediate access to transparent documentation.

Healthcare: written summaries of doctor–patient consultations

In healthcare settings, physicians often have to manually enter information from consultations into digital systems. With Velvet Speech 2B, doctors can focus entirely on the patient while the model accurately records the conversation and produces summaries that support medical reporting.

Healthcare: structured pre-triage

Pre-triage typically involves collecting basic patient information—an activity that AI models can effectively support. Patients answer guided questions verbally about symptoms, duration, and medical history. Velvet Speech 2B transcribes responses and can automatically generate a preliminary report to be validated by medical staff.

Field operations: regulatory consultation

Construction sites and other high-intensity operational environments are often noisy and require hands-free work. In these settings, voice-based document consultation becomes not just beneficial but essential.

Technicians can verify regulations and procedures by speaking directly to the system, accessing critical information quickly without relying on printed manuals.

Additional use cases

The potential applications of Velvet Speech 2B extend well beyond these examples: it can streamline workflows, reduce wait times, enhance professional productivity, and minimize errors.

In citizen services or public customer care, for example, it can automatically transcribe calls and organize incoming requests. In corporate operational meetings or technical briefings, it can transform discussions into structured minutes and key takeaways.

Voice does not replace writing—it enhances it. With Velvet Speech 2B, AI expands its interaction capabilities and better adapts to real-world professional environments, where flexibility, speed, and data control are essential.

In a landscape where security, governance, and reliability are central priorities, integrating voice makes AI not only more powerful but also more aligned with the real needs of businesses and institutions.

Would you like to learn more about the Velvet family?

Visit our website