How to Transcribe English

Save time by automatically transcribing your English audio or video content to text. Use the best speech-to-text recognition software on the market.

Step 1: Upload a new file

On your account files page, press the green "New file" button. In the modal that pops up, press the "Upload" input and select the file you want to upload from your computer.

There is also an easier way to upload a file: while on your account files page, drag and drop the file onto the page. That's it!

Step 2: Select the language

After you have selected the file to upload, by either the drag-and-drop method or the button, you need to choose the language of the file. We will assume it is English, since this tutorial is about transcribing English audio or video. Press the input below the "Select model for transcription" text and choose English in the dropdown that appears.

Step 3: Select the model

We are constantly developing new models for both new and existing languages. Usually, the model names are one or two descriptive words.

For example, "General" is a model for everyday speech. "Legal" is a model tuned for legal vocabulary, suited to lawyers' meetings or trials. "Medical" is a model that specializes in medical terms, such as those used in medical exams, operating rooms, or medical school courses. And so on. A model might also have v1, v2, v3, etc., attached to its name. That is the model version; the higher the number, the better the model.

Step 4: Model options

Please note that not all models have these features.

"Post-processing" switch: If enabled (i.e., the switch is green), this automatically adds punctuation (,.!?) and capitalization (the first letter of each sentence is capitalized) to your transcript. It adds entity recognition (i.e., brand names, person names, national holidays, etc., are capitalized). It also adds a numerals conversion model, which tries, based on context, to rewrite numbers from words to digits (e.g., "thirteen" becomes 13 and "two" becomes 2). Finally, it removes disfluencies (e.g., grunts or non-lexical utterances such as "huh", "uh", "erm", "um", "hmm", etc.).
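To make the effect of this switch concrete, here is a toy sketch of three of the steps: removing disfluencies, converting numerals, and capitalizing the sentence. It is only an illustration. The fixed word lists and the `post_process` function are assumptions made for this example; the real feature relies on trained models and on context, which this sketch ignores.

```python
# Toy lookup tables (assumptions for this example, not the real models).
DISFLUENCIES = {"huh", "uh", "erm", "um", "hmm"}
NUMBER_WORDS = {"two": "2", "thirteen": "13"}


def post_process(raw: str) -> str:
    """Apply toy versions of the post-processing steps to a raw transcript."""
    # Drop non-lexical utterances such as "um" and "uh".
    words = [w for w in raw.lower().split() if w not in DISFLUENCIES]
    # Rewrite number words as digits (no context check in this toy version).
    words = [NUMBER_WORDS.get(w, w) for w in words]
    if not words:
        return ""
    sentence = " ".join(words)
    # Capitalize the first letter and close the sentence with a period.
    return sentence[0].upper() + sentence[1:] + "."


print(post_process("um we have uh thirteen files and two folders"))
# We have 13 files and 2 folders.
```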

"Speakers Diarization" switch: If enabled (i.e., the switch is green), a speaker recognition model is applied to your file. This means that the paragraphs of the resulting transcript are split according to who is speaking at a given time.

"Multiple Channels" switch: If enabled (i.e., the switch is green), speech recognition is applied per audio channel. This means that the paragraphs of the resulting transcript are split according to the channels of the file. This is very useful for files that come from call centers, as these usually have two channels: the client channel and the agent channel. Please note that you can enable only one of the "Speakers Diarization" and "Multiple Channels" switches at a time.
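Both switches split the transcript into paragraphs by a label: the speaker in one case, the channel in the other. The sketch below shows that grouping idea on made-up data. The `(label, word)` input format and the `split_into_paragraphs` function are assumptions for illustration and do not describe the actual output format of the service.

```python
from itertools import groupby

# Hypothetical labelled output: (speaker or channel label, word) in spoken order.
words = [
    ("Speaker 1", "Hello,"),
    ("Speaker 1", "how"),
    ("Speaker 1", "are"),
    ("Speaker 1", "you?"),
    ("Speaker 2", "Fine,"),
    ("Speaker 2", "thanks."),
    ("Speaker 1", "Great."),
]


def split_into_paragraphs(labelled_words):
    """Start a new paragraph every time the label changes."""
    paragraphs = []
    for label, group in groupby(labelled_words, key=lambda pair: pair[0]):
        text = " ".join(word for _, word in group)
        paragraphs.append((label, text))
    return paragraphs


for label, text in split_into_paragraphs(words):
    print(f"{label}: {text}")
```

For a call-center file, the labels would simply be the two channels (for example "Client" and "Agent") instead of speakers.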

"Add words to your custom vocabulary" input: This is useful when you want our models to watch out for certain words and, when they appear in your transcript, keep them as they are. For example, "ate" and "eight" sound the same. By writing "ate" in this input, you tell our model that when it hears either "ate" or "eight", you want to keep "ate". Note that you can add multiple words.

"Select a boost param for these words" selector/input: This is tied to the previous option. The higher this number is, the higher the chance that the word you want will be kept. Once again, we will use the words "ate" and "eight". When our model evaluates the sound of a word, it gives each candidate word an accuracy. If the accuracy of the competing word is lower than your boost param, the word you added to the above input is chosen. For example, our model gives "ate" an accuracy of 4.73 and "eight" an accuracy of 8.31. You add the word "ate" in the above input and give it a boost of 9. In this case, the model chooses "ate", with an accuracy of 4.73, over "eight", with an accuracy of 8.31. Had you set the boost to 6, the model would have chosen "eight". Note that all words share the same boost.
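The "ate" versus "eight" example can be written down as a small decision rule. The sketch below is one reading of that example, namely that the custom word wins when the accuracy of the best competing word is below the boost. The `choose_word` function and its parameters are hypothetical names for this illustration, not the model's actual implementation.

```python
def choose_word(scores, custom_vocabulary, boost):
    """Pick one word from same-sounding candidates.

    scores: mapping of candidate word -> accuracy given by the model.
    custom_vocabulary: words the user asked to keep.
    boost: the single boost value shared by all custom words.
    """
    # The candidate the model would pick on its own.
    best = max(scores, key=scores.get)
    if best in custom_vocabulary:
        return best
    # Custom words that are among the candidates.
    preferred = [w for w in scores if w in custom_vocabulary]
    # Assumed rule: the custom word wins if the best competitor is below the boost.
    if preferred and scores[best] < boost:
        return max(preferred, key=scores.get)
    return best


scores = {"ate": 4.73, "eight": 8.31}
print(choose_word(scores, {"ate"}, boost=9))  # ate
print(choose_word(scores, {"ate"}, boost=6))  # eight
```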

"Save as default configuration" checkbox: If you check this box, everything in the upload modal (from language to boost param) is saved for the next time you upload a file, so you won't have to set the switches, add custom vocabulary, and so on again.
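As a summary of the options in this step, the sketch below collects them in a single structure and enforces the one rule stated above: only one of "Speakers Diarization" and "Multiple Channels" can be enabled. The `UploadOptions` class and all of its field names are hypothetical; they mirror the labels in the upload modal and are not part of any documented API.

```python
from dataclasses import dataclass, field


@dataclass
class UploadOptions:
    """Hypothetical container for the upload modal settings."""
    language: str = "English"
    model: str = "General"
    post_processing: bool = False
    speakers_diarization: bool = False
    multiple_channels: bool = False
    custom_vocabulary: list[str] = field(default_factory=list)
    boost: float = 0.0
    save_as_default: bool = False

    def __post_init__(self):
        # The tutorial allows only one of these two switches.
        if self.speakers_diarization and self.multiple_channels:
            raise ValueError(
                'Enable only one of "Speakers Diarization" or "Multiple Channels".'
            )


# Example: a two-channel call-center recording with one custom word.
options = UploadOptions(
    post_processing=True,
    multiple_channels=True,
    custom_vocabulary=["ate"],
    boost=9,
    save_as_default=True,
)
print(options)
```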

Step 5: Send the file with its options

After setting the options you want, press the green "Upload" button and wait for the file to be sent to our servers, where it is transcribed automatically. After that, you can sit back and relax. Note that a file is usually transcribed in about a quarter of its length (e.g., a 4-hour file will be done in about 1 hour).
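The quarter-of-the-length rule of thumb is easy to turn into a quick estimate. This is a rough sketch only: the `estimate_transcription_time` function is a hypothetical helper, and actual processing time may vary.

```python
from datetime import timedelta


def estimate_transcription_time(duration: timedelta) -> timedelta:
    """Rough estimate: about a quarter of the file's length."""
    return duration / 4


print(estimate_transcription_time(timedelta(hours=4)))  # 1:00:00
```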

Experience the Future of Voice Recognition Today

Try Vatis now, no credit card required.
