Speech technologies

Automatic Speech Recognition

Speech recognition technology is widely used, from text document processing and dictation of legal or medical reports to automated creation of closed captions (for Czech Television), among other applications. Research outcomes may also be used by state security and emergency services, such as the police, the military and rescue forces, including in their crisis management. In the field of security, the research focuses on keyword detection and on speaker identification or verification.

Text-to-Speech (Speech Synthesis)

Automatic text-to-speech technology has many practical applications: it is employed, for example, in voice announcement systems, in applications that read e-mails or text documents aloud, in automated dialogues, and in intelligent assistant applications.

Dialogue Systems

Dialogue systems enable voice communication between humans and machines. Typical uses include automated contact-centre services based on interactive voice response and access to information over the telephone.

Voice Biometrics

A speaker can be identified or verified based on the unique characteristics of an individual's voice. This can be used for automatic identity verification or as an additional factor combined with an access password.
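
The difference between verification and identification can be illustrated with a minimal sketch. It assumes that each voice has already been reduced to a fixed-length feature vector by some front end that is not shown here; the cosine-similarity scoring, the 0.8 threshold and all function names are hypothetical choices made for illustration and do not describe the method used in this research.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no voice information
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled, sample, threshold=0.8):
    """Verification: does the sample match ONE claimed identity?

    The threshold is an assumed value; a real system would tune it.
    """
    return cosine_similarity(enrolled, sample) >= threshold


def identify_speaker(enrolled_speakers, sample):
    """Identification: which of SEVERAL enrolled speakers is closest?"""
    return max(
        enrolled_speakers,
        key=lambda name: cosine_similarity(enrolled_speakers[name], sample),
    )


def grant_access(password_ok, enrolled, sample, threshold=0.8):
    """Voice as an additional factor: both checks must pass."""
    return password_ok and verify_speaker(enrolled, sample, threshold)
```

For example, `grant_access(True, [1.0, 0.0], [0.9, 0.1])` succeeds only because the password check has passed and the two vectors are sufficiently similar; if either condition fails, access is refused.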

Automated Video Archive Processing

We can search large audiovisual archives and provide automated transcription of their content. This technology also makes it possible to search for geographical names, or names of persons or products, that have not previously been entered in a dictionary. The database search is very quick, and results are available as soon as you enter a query.
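
One common way to make such lookups fast is to build an inverted index over time-stamped transcripts in advance, so that a query becomes a single dictionary lookup. The sketch below illustrates that general idea only; the segment format, the function names and the sample data are assumptions, and it does not cover the search for names missing from the dictionary, which requires a different technique.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'()"


def build_index(segments):
    """Map each word to the places where it was spoken.

    segments: iterable of (recording_id, start_seconds, text) tuples,
    an assumed format for the output of automated transcription.
    """
    index = defaultdict(list)
    for recording_id, start_seconds, text in segments:
        # Use a set so a word repeated within a segment is indexed once.
        words = {w.strip(PUNCTUATION) for w in text.lower().split()}
        for word in words:
            if word:
                index[word].append((recording_id, start_seconds))
    return index


def search(index, query):
    """Return (recording_id, start_seconds) hits for a one-word query."""
    return index.get(query.lower(), [])


# Hypothetical sample data for illustration.
segments = [
    ("news_01", 12.5, "Flooding reported in Plzeň."),
    ("news_02", 3.0, "Plzeň hosts a conference"),
]
index = build_index(segments)
hits = search(index, "Plzeň")
```

Because the index is built once when the archive is transcribed, the cost of answering a query does not grow with the length of the recordings.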

Speech Technologies for Public Health

The goal of this research is to improve the living conditions of people with disabilities. Apart from speech synthesis and speech recognition, it also includes automatic translation into and from sign language. Automated voice dialogue with a computer can facilitate faster and more comprehensive integration of people with disabilities into society and help improve the efficiency of medical staff. The target groups are people with sensory (vision, hearing and voice) or physical disabilities, as well as hospitals and other medical facilities.

Audio-Visual Speech Recognition and Synthesis

These research activities focus on processing spontaneous speech and multimodal recordings, i.e. recordings that combine acoustic data and images. The synthesis itself combines automatic speech generation with individual faces (head and lip movements, etc.) to form a complete avatar.