How to use speech synthesis. Speech synthesizers with Russian voices

05.11.2021

Modern speech synthesis applications are far better in quality than their first counterparts of ten years ago. A striking example is the Balabolka program. The application is free, with no conditions or restrictions. It has so many capabilities that its creators built a full-fledged help file into the program, with a detailed description of all functions.

Installation and configuration of the Balabolka program

The easiest way to get the program is directly from the developer's website, where you can also download the additional software it needs. Installation is simple: the application is copied to the selected directory, and system folders are not used. The interface supports many languages, including Russian. Out of the box, however, text is read aloud only in English. To use Russian or any other language, such as Ukrainian, you must additionally install a computer voice for it. Many free and commercial voices are available online, and they install simply and quickly.

You may also need to install the Microsoft Speech API 4.0 package.

The application is ready to work immediately after launch. If free Russian voices are installed, you must select one of them on the SAPI4 tab. Now you just need to type or paste text in the window and press the F5 key - the text fragment will begin to be read aloud. The cursor must be at the beginning of the text.

Features of the Balabolka program

But voicing text is not the only purpose of the program. For example, you can use it to create audio books. The pronunciation of any text in the Balabolka program can be recorded in a sound file. The application supports the following formats: .wav, .mp3, .ogg, .wma, .mp4, .m4a, .m4b, .awb.

Thus, the text you need is easily converted into an audiobook.

By the way, the program can automatically split one large audiobook file into several small ones, according to the selected settings.
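
The idea of splitting one large book into several smaller parts can be sketched in a few lines. This is only an illustration, not Balabolka's own algorithm: the function name `split_text` and the rule "break at sentence ends, at most `max_chars` characters per part" are assumptions made for the example.

```python
import re


def split_text(text: str, max_chars: int) -> list[str]:
    """Split text into parts of at most max_chars characters.

    Parts are broken at sentence ends where possible (hypothetical rule,
    not taken from any particular program).
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # A sentence ends with '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    parts: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut hard.
        while len(sentence) > max_chars:
            if current:
                parts.append(current)
                current = ""
            parts.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            parts.append(current)
            current = sentence
    if current:
        parts.append(current)
    return parts
```

Each returned part could then be voiced and saved as its own audio file.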

The settings for saving audiobook files are far from the only options available to the user. You can also set the volume, voice timbre, and reading speed. After installing additional (free) modules, spell checking becomes possible; the user can also correct the pronunciation by creating custom "dictionaries".
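
A pronunciation "dictionary" boils down to replacing words the voice reads badly with a respelling before the text reaches the synthesizer. The sketch below shows that general idea only; the function name and the plain word-to-respelling mapping are assumptions for illustration and do not reproduce Balabolka's actual dictionary format.

```python
import re


def apply_pronunciation_dictionary(text: str, dictionary: dict[str, str]) -> str:
    """Replace whole words with their respelled forms (case-insensitive)."""
    if not dictionary:
        return text
    # Longest keys first, so longer entries win over shorter ones.
    keys = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(key) for key in keys) + r")\b",
        re.IGNORECASE,
    )
    lookup = {key.lower(): value for key, value in dictionary.items()}
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)
```

For example, mapping "SAPI" to "sappy" changes only the standalone word and leaves longer words that merely contain it untouched.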

Sometimes you need to voice a text that is written on a computer. But how do you do that? You need special software called a speech synthesizer, which turns written text into spoken speech. There are many desktop speech synthesizers on the web. However, it is often more convenient to use online services: in that case you do not have to download software to your PC and clutter its storage. In this article we will look at the best online talkers.

Speech synthesizers have a fairly wide range of applications. First of all, such programs are useful to people with disabilities. Speech synthesizers were originally intended for people who have vision problems and cannot read text from a monitor.

Talking books can be a good assistant in the learning process. For example, they can be used to listen to foreign speech and thus train perception. A speech synthesizer is also used to listen to books while doing household chores.

The best online talkers

Today, web talkers are in no way inferior to desktop programs in terms of playback quality. Internet utilities are capable of reading at different speeds, timbres, etc. Let's consider the most popular text-to-speech converters.

But first, it’s worth noting that most online speech synthesizers limit the possibility of free playback. Web utilities allow you to listen to a couple of hundred characters so that the user can evaluate the quality of the service. For full functionality you will have to pay a certain amount.

Acapela

Acapela is one of the most popular speech synthesizers. The web utility supports more than 30 languages. One of the main advantages of this online resource is its huge number of voices. For English alone, about 20 timbres are available (woman, man, child, teenager, joyful, etc.). Unfortunately, Russian has been shortchanged: only one female voice is available for reading text in Russian.

The web program has a minimum number of settings. Thanks to this, anyone can understand the controls. To reproduce the text you just need to:

Click on the first field. A list will appear in which you need to select a playback language.
Click on the next field. In the list you need to select one of the proposed tones.
In the large field, enter the text that you want to turn into an audio track.
Then agree to the terms of service by checking the box next to the appropriate item. A Listen button will appear; click it to hear the text you entered.

The sound of the web program is quite decent. The maximum number of characters that can be spoken is 300.

Linguatec

It is also worth paying attention to a service called Linguatec. This is a German Internet resource that is extremely popular outside its homeland. And this is not at all surprising. The web service supports more than 40 languages (of course, Russian is one of them). Interestingly, Linguatec is capable of reproducing different dialects. For example, there are several versions of English: British, American, Australian, Irish, etc. Thanks to this feature, Linguatec is an excellent program for those who want to learn the correct pronunciation of a word in a foreign language.

The text can be played in both male and female voices. The free play limit is 250 characters. To get full functionality, you will have to purchase a desktop speech synthesizer. Its cost is 30 euros.

How to use an online speech synthesizer? The following instructions must be followed:

Click on the drop-down list under Voice Reader and select the playback language.
In the drop-down list, which is located a little lower, define the voice. For example, for the German language there are only a few pronunciation options: male voices - Yannick and Markus, female voices - Petra and Anna.
Now enter the text you want to play in the appropriate field. Remember that its size should not exceed 250 characters (including spaces).
To convert symbols to audio, you need to click on the arrow button.

Oddcast

Oddcast is a fairly well-known company that creates interactive companions for various brands. The company also has its own speech synthesizer, which can be used to read text aloud. The web utility supports about 30 languages, most of them with several female and male voices. The program can read text of up to 170 characters.

A distinctive feature of this service is its animated model. She follows the cursor and moves her lips as the text plays. The model does not contain any useful functionality. Its purpose is to demonstrate the capabilities of Oddcast.

Working with Oddcast is very simple. It is necessary to configure the basic characteristics of the web utility. In total, the program provides 5 parameters:

Enter Text. Here we write the text that needs to be converted into speech.
Language. Here you need to select the language in which the text will be reproduced.
Voice. Select a voice for reading (their number depends on the selected language).
Effect. Oddcast allows you to add voice effects to spoken text. The choice is quite large. There is a function for acceleration, echo, pitch, etc.
Level. Allows you to customize the selected effect. For example, if you use acceleration, then using this field you can set how fast the text will be played.

By changing the characteristics to suit your needs, you can launch the talker. To do this, click on the Say it button.

iSpeech

Another service that is worth paying attention to is iSpeech. The web utility has a good voice engine, which has a positive effect on audio quality. The service supports about 30 languages. The maximum number of characters that can be spoken is 150.

The service interface is designed in a minimalist style. Everything is done very clearly. To select a language, click on the corresponding flag. If you need to determine the timbre, click on the female or male icon. In addition, the program has three playback modes. You can listen to the text at a slow, normal or accelerated pace. Having set the necessary parameters, you need to click on the Play button. The text to speech conversion will begin.

iSpeech is ideal for learning foreign languages. During playback, the utility highlights the words that were spoken out loud. Thanks to this, you can find out the correct sound of a particular word without being distracted from the topic of the text. Another feature of the service is that the voiced fragment can be downloaded to your PC as an audio track. However, this service is only available to owners of paid accounts, the cost of which is quite high. The cheapest subscription costs $500.

Text-To-Speech

Text-To-Speech is a speech synthesizer that boasts a good voice engine. The Internet utility has a very simple, uncomplicated interface. The program supports about 10 of the most popular languages. Of course, this includes Russian. To work with this web resource it is enough:

Select options for reading text. There are only two of them. To select a language, click on the drop-down list next to Language. Nearby you can see the Speed parameter. It is responsible for reading speed and is set in the same way.
Now you need to enter the text in the appropriate field. The web utility is capable of processing fragments whose size does not exceed 1000 characters.
Next, you need to click on the Say it button. The program will produce an audio file with your text. You can listen to it directly on the website.

Google Translate

A web service called Google Translate includes a talker. It is very easy to use: enter the text in the appropriate field and click on the speaker icon, and the robot reads the specified fragment. Google Translate has a limit on the size of the text. You cannot enter more than 5000 characters.

The main advantage of Google Translate is that it supports a huge number of languages. However, there is a fly in the ointment. Firstly, you cannot change the voice timbre, reading speed or other parameters. Secondly, the playback quality leaves much to be desired.

From-Text-To-Speech

A large amount of text can be processed by a web service called From-Text-To-Speech. The utility is capable of converting up to 50 thousand characters at a time. This is an order of magnitude higher than that of competitors. The web program supports 10 languages, which are the most popular. These include Russian.

To use the web service, you first need to configure the voiceover settings. Luckily there aren't many of them here. First of all, you need to set the language and choose the voice that will read the text. For Russian, only one timbre is available: female. Then you need to adjust the reading speed. There are four options in total: slow, normal, fast and very fast. Having set the appropriate parameters, click on the Create Audio File button.

The conversion process will begin. As a rule, this takes no more than a minute. Once the conversion is complete, you will be taken to a new page. There will be a hyperlink in the form of the inscription Download audio file. You need to right-click on it and select the “Save link as” option from the drop-down list. Choose a location on your PC and download audio. The file is saved in MP3 format.

2ukha

It is impossible not to mention domestic services for converting text into audio. One of the best at this is a website called 2ukha. The main advantage of the service is the ability to work with large volumes of text. If other resources voice small fragments of up to 200-300 characters, then 2ukha is capable of processing 100 KB of text. This is about 100 thousand characters. And, most importantly, everything is completely free.

How to work with the 2ukha website? Everything is very simple. To convert text into spoken language you just need to:

This service definitely deserves attention. The quality of the voiced text is quite decent, and the ability to process huge files is also good news. However, the 2ukha web service also has a disadvantage: the number of available languages. The service only works with Russian.
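
The character limits quoted in this review can be collected into a small lookup that tells you which services will accept a given text in one go. The numbers below are simply the ones stated above (the 2ukha figure is the approximate "about 100 thousand characters"); the function name is made up for the example, and real limits may have changed since the review was written.

```python
# Maximum text length per request, as quoted in this review.
FREE_LIMITS = {
    "Acapela": 300,
    "Linguatec": 250,
    "Oddcast": 170,
    "iSpeech": 150,
    "Text-To-Speech": 1000,
    "Google Translate": 5000,
    "From-Text-To-Speech": 50_000,
    "2ukha": 100_000,  # approximate; the service works with Russian only
}


def services_for(text: str) -> list[str]:
    """Return the services whose limit fits the text, smallest limit first."""
    length = len(text)
    fitting = [name for name, limit in FREE_LIMITS.items() if length <= limit]
    return sorted(fitting, key=FREE_LIMITS.get)
```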

Speech synthesis is the conversion of arbitrary text, not known in advance, into speech. Speech output implements a voice interface that makes a system easier to use. In effect, speech synthesis provides another channel for passing data from a computer or mobile phone to a person, alongside the monitor. Of course, a drawing cannot be conveyed by voice, but listening to email or a daily schedule is sometimes quite convenient, especially when your eyes are busy with something else. For example, on arriving at work in the morning and preparing for negotiations, you could straighten your tie or hair in the mirror while the computer reads aloud the latest news or mail, or reminds you of information important for the negotiations.

Figure 2.2 - Acoustic signal processing

Speech synthesis technology has found wide application for people with vision problems. For everyone else, it creates a new dimension of ease of use of technology and significantly reduces the load on vision and the nervous system, and allows the use of auditory memory.

Figure 2.3 - Speech synthesis

Any text consists of words separated by spaces and punctuation marks. The pronunciation of words depends on their position in a sentence, and the intonation of a phrase depends on punctuation marks. Finally, pronunciation also depends on the meaning of the word. Accordingly, for synthesized speech to sound natural, a whole range of problems must be solved: making the voice natural in terms of smoothness of sound and intonation, placing stresses correctly, and expanding abbreviations, acronyms, numbers and special characters, all while taking the peculiarities of Russian grammar into account.
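
To make the "expanding abbreviations and numbers" step concrete, here is a minimal text-normalization sketch. It uses English rather than Russian to keep the example short, and the abbreviation table, the function names and the 0-999 number range are all assumptions chosen for illustration; a real front end also has to handle case endings, dates, decimals and much more.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Hypothetical abbreviation table for the example.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999."""
    if not 0 <= n < 1000:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text: str) -> str:
    """Expand known abbreviations and standalone numbers of up to 3 digits."""
    for abbreviation, full in ABBREVIATIONS.items():
        text = text.replace(abbreviation, full)
    # Longer numbers (years, phone numbers) are left untouched here.
    return re.sub(r"\b\d{1,3}\b", lambda m: number_to_words(int(m.group())), text)
```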

There are several approaches to solving the problems:

1) allophone synthesis systems - provide a stable, but insufficiently natural, robotic sound;

2) systems based on the Unit Selection approach - provide a much more natural sound, but may contain fragments of speech with sharp dips in quality, up to loss of intelligibility;

3) hybrid technology based on the Unit Selection approach and supplemented with allophone synthesis units.

Based on this technology, the VitalVoice system was created, which provides stable and natural sound at an acoustic level.
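
The Unit Selection idea can be illustrated with a toy search. For every target sound the database holds several recorded candidates; the algorithm picks the sequence that minimizes the sum of a target cost (how well a unit matches what is wanted) and a join cost (how smoothly neighbouring units connect). The sketch below is a generic textbook-style formulation, not the VitalVoice implementation: the `Unit` fields, the pitch-only cost functions and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phoneme: str
    start_pitch: float  # pitch at the beginning of the recorded fragment
    end_pitch: float    # pitch at the end of the recorded fragment


def target_cost(unit: Unit, desired_pitch: float) -> float:
    return abs((unit.start_pitch + unit.end_pitch) / 2 - desired_pitch)


def join_cost(left: Unit, right: Unit) -> float:
    return abs(left.end_pitch - right.start_pitch)


def select_units(targets: list[tuple[str, float]], database: list[Unit]) -> list[Unit]:
    """Pick one unit per (phoneme, desired pitch) target with minimal total cost."""
    if not targets:
        return []
    lattice = [[u for u in database if u.phoneme == ph] for ph, _ in targets]
    if any(not candidates for candidates in lattice):
        raise ValueError("no recorded unit for some target phoneme")
    # best[i][j] = (cost of the cheapest path ending in lattice[i][j], predecessor index)
    best = [[(target_cost(u, targets[0][1]), -1) for u in lattice[0]]]
    for i in range(1, len(targets)):
        row = []
        for unit in lattice[i]:
            cost, prev = min(
                (best[i - 1][k][0] + join_cost(p, unit), k)
                for k, p in enumerate(lattice[i - 1])
            )
            row.append((cost + target_cost(unit, targets[i][1]), prev))
        best.append(row)
    # Trace the cheapest path back from the last position.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(lattice[i][j])
        j = best[i][j][1]
    return path[::-1]
```

Because the join cost is part of the total, the search may prefer a unit that matches its own target slightly worse if it connects to its neighbour without a jump. Such jumps are one source of the audible dips in quality mentioned above.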

Speech communication is natural and convenient for humans. The goal of speech recognition is to remove the middleman in communication between a person and a computer. Controlling a machine with your voice in real time, as well as entering information through human speech, will greatly simplify the life of a modern person. Teaching a machine to understand without an intermediary the language that people speak among themselves is the task of speech recognition.

Scientists and engineers have been solving the problem of verbal communication between humans and machines for many years. The first speech recognition device appeared in 1952; it could recognize numbers spoken by a person. Commercial speech recognition programs appeared in the early nineties.

All speech recognition systems can be divided into two classes:

1) Speaker-dependent systems - are tuned to the speaker's speech during the learning process. To work with another speaker, such systems require complete reconfiguration.

2) Speaker-independent systems - their operation does not depend on the speaker. Such systems do not require preliminary training and are capable of recognizing the speech of any speaker.

Figure 2.4 - Speech recognition

Systems of the first type were the first to appear on the market. In them, the sound image of a command was stored whole, as a reference template (standard). Dynamic programming techniques were used to compare the unknown utterance with the command template. These systems worked well when recognizing small sets of 10-30 commands and understood only one speaker. To work with another speaker, these systems required complete reconfiguration.
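
The dynamic programming comparison used by such template-based systems is commonly known as dynamic time warping (DTW). The sketch below is a minimal version under simplifying assumptions: each utterance is a list of single numbers (real systems compare sequences of spectral feature vectors), and the function and template names are invented for the example.

```python
def dtw_distance(a: list[float], b: list[float]) -> float:
    """Dynamic time warping distance between two feature sequences."""
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = cost of the best alignment of a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(utterance: list[float], templates: dict[str, list[float]]) -> str:
    """Return the name of the stored command template closest to the utterance."""
    return min(templates, key=lambda name: dtw_distance(utterance, templates[name]))
```

The warping lets a command spoken more slowly or quickly still match its template, but every command needs its own stored template, which is why the approach only suits small vocabularies.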

In order to understand continuous speech, it was necessary to move on to much larger dictionaries, from several tens of thousands to hundreds of thousands of words. The methods used in systems of the first type were not suitable for solving this problem, since it is simply impossible to create reference templates for such a number of words.

In addition, there was a desire to make a system independent of the speaker. This is a very difficult task, since each person has an individual manner of pronunciation: speech rate, voice timbre, pronunciation features. Such differences are called speech variability. To take it into account, new statistical methods were proposed, based mainly on the mathematical apparatus of Hidden Markov Models (HMMs) or Artificial Neural Networks. Instead of creating a template for each word, models are created for the individual sounds that make up the words, the so-called acoustic models. Acoustic models are formed by statistically processing large speech databases containing recordings of hundreds of people.
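
How a Hidden Markov Model turns a sequence of acoustic observations into the most likely sequence of sounds can be shown with the standard Viterbi algorithm. The example is a toy: the two states, the two observation symbols and all probabilities are invented, and plain probabilities are multiplied directly, which is acceptable only for short sequences (practical systems work with log probabilities).

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    if not observations:
        return []
    # prob[t][s] = probability of the best path that ends in state s at time t
    prob = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        prev = prob[-1]
        row, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prev[p] * trans_p[p][s])
            row[s] = prev[best_prev] * trans_p[best_prev][s] * emit_p[s][obs]
            pointers[s] = best_prev
        prob.append(row)
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    state = max(states, key=lambda s: prob[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy acoustic model: the sound "s" mostly produces hiss, "a" mostly a tone.
STATES = ["s", "a"]
START = {"s": 0.8, "a": 0.2}
TRANS = {"s": {"s": 0.6, "a": 0.4}, "a": {"s": 0.1, "a": 0.9}}
EMIT = {"s": {"hiss": 0.9, "tone": 0.1}, "a": {"hiss": 0.2, "tone": 0.8}}
```

With this model, the observation sequence hiss, hiss, tone, tone decodes to the states s, s, a, a.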

Existing speech recognition systems use two fundamentally different approaches:

Lexical recognition

Note that creating speech recognition systems is an extremely difficult task.

Recently I was faced with the problem of choosing a speech synthesizer. The main requirements were support for the Russian language and more or less normal pronunciation.
For those who do not know what a speech synthesizer is: it is a special program whose purpose is to convert written text into spoken speech. This conversion is what is called synthesis.
Why is this necessary? Well, for example, when you need to record a voice message in someone else's voice. It can be useful for foreigners to hear the pronunciation of a particular word. A speech synthesizer is also convenient when you want to play your child a fairy tale that is not available as an audiobook. And in general, there are all sorts of situations.
So, in the selection process I found several very useful tools, including ones that work online and support the Russian language, and now I will tell you about them.

Google Translate

This is a truly multi-purpose product that can be used in completely different ways. Main advantages:
- it is a completely free service;
- it works online without installation. All you need is Internet access;
- in my opinion, this speech synthesizer has the best voice module, the closest to natural;
- probably the best team of developers and technical support in the world;
- the largest number of supported languages.
Unfortunately, there is only one voice option: female. I did not find a way to choose another.

RHVoice

An excellent multilingual speech synthesizer from the Russian developer Olga Yakovleva. There are versions for both Windows and Linux operating systems. The program is distributed completely free of charge and is available on the official website in two versions: as a SAPI5-compatible standalone version and as a module for the free NVDA screen reader. This voice synthesizer can voice Russian texts in three voices: Elena, Irina and Alexander.

Acapela

Acapela is perhaps one of the most popular and widespread voice synthesizers in the world. Its main feature is the voicing of texts in more than thirty languages. For Russian, two voices are available: Nikolai and Alena. The latter is more polished and natural in terms of pronunciation. In demo mode, only Alena's voice is available on the site.
The program is available for download on the official website and supports all popular modern operating systems - Windows, Linux, Mac. There are even versions for Android and iOS.

Vocalizer

The female voice Milena is another very popular speech synthesis engine from Nuance; it is very high quality and natural sounding. You can hear it in call centers and in various network speech systems, as well as in applications such as Moon+ Reader Pro, Full Screen Caller ID and Cool Reader, and in the navigation programs TomTom and iGo Primo.
Among the advantages are the ability to install various dictionaries, adjust volume, emphasis and reading speed.
The program code is open, you can download it for free on the official website, just like the installer of the program itself.

Festival

Festival is not just another speech synthesizer, but an entire speech synthesis framework with various APIs. It was developed by the Centre for Speech Technology Research at the University of Edinburgh.
Festival is designed to support multiple languages. Out of the box it supports English, Welsh and Spanish, and voice packages for other languages can be connected: Czech, Finnish, Hindi, Italian, Marathi, Polish, Russian and Telugu.
The program code is open, the voice synthesizer itself is distributed under an open source license and is available only for Linux operating systems. True, there is a ported version for Macintosh.

eSpeak

The last speech synthesis system in my review, eSpeak, has been in development for about 8 years. The latest version is 1.48.04, dated April 6, 2014. This speech synthesizer is cross-platform: there are versions for Windows, Linux, Mac OS X, and even RISC OS, although the last two have not been supported for a long time.
Separately, I note that eSpeak is also used on the Android mobile operating system, where, however, it has a number of significant bugs.
The program supports fifty different languages; the ones you need are selected when installing the program.
One of the main disadvantages of this voice synthesizer is that it generates voices only in a WAV file. You can download the program for free on the official website.
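
Saving speech to a WAV file with eSpeak is usually done from the command line. The sketch below builds such a command from Python. It assumes the `espeak` executable is on the PATH and relies on the `-v` (voice), `-s` (speed in words per minute) and `-w` (output WAV file) options; the function names and default values are made up for the example, so check them against your installed version.

```python
import shutil
import subprocess


def build_espeak_command(text: str, wav_path: str, voice: str = "ru",
                         speed_wpm: int = 150) -> list[str]:
    """Build an eSpeak command line that writes the spoken text to a WAV file."""
    return ["espeak", "-v", voice, "-s", str(speed_wpm), "-w", wav_path, text]


def synthesize_to_wav(text: str, wav_path: str, voice: str = "ru",
                      speed_wpm: int = 150) -> bool:
    """Run eSpeak if it is installed; return False when it is not available."""
    command = build_espeak_command(text, wav_path, voice, speed_wpm)
    if shutil.which(command[0]) is None:
        return False
    subprocess.run(command, check=True)
    return True
```

The resulting WAV file can afterwards be converted to MP3 or another format with any audio converter.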

For my part, I will only add that I liked RHVoice and Vocalizer, although this is largely a matter of taste and depends on what you want to get. So install them, try them and listen. I think that one of the presented options should definitely suit you.
