GUIDELINES

On this page we provide our general guidelines of how to format and structure your scripts so that you can make the best use of our technology.

SUBMITTING A SCRIPT
SELECTING VOICES
CREATING DIALOGS
ABOUT THE CONTENT

Below is a video version of these guidelines:

SUBMITTING A SCRIPT

For purposes of submitting to VoicedScripts.com, a script includes the text that you want to create a text to speech mp3 (TTS file) of, with natural voices. It can be a file (which we'll call a manuscript), a website address, or any text that you enter into our submission textbox.

You can upload a manuscript as a plain text file (*.txt), as a Microsoft Word document (*.docx), as an Adobe PDF file (*.pdf), or as an e-Book in EPUB (*.epub) format. Please make sure before submitting it that it is a text document and does not contain any photographs, images, charts, diagrams, graphs, captions, tables, embedded audio or video, macros, or hyperlinks, etc. While our system is able to simply ignore these and process only the textual content, multi-media elements can significantly slow down natural text to speech generation.

If rarely, it can happen that PDF documents and e-Books, especially old ones, are so poorly structured, that our system cannot read them. In this case, we recommend that you copy the document's content and insert it as plain text. Obviously, we cannot read password protected and encrypted documents and these will cause the natural TTS production to halt with an error message.

There is a 2MB limit to the size of the manuscript file. This corresponds roughly to the text file size of a novel with 1,200 pages. In addition, your manuscript should not be smaller than 36 Bytes (i.e. it should be a text with at least 36 letters).

Alternatively, you can specify a web page address, a URL that starts with http:// or with https://. Please do note in this case, that our system is not configured to be used as a screen-reader and attempts to extract the actual content of the web page rather than reading navigation menu items, page headers or footers etc. Unfortunately, many web designers have not yet adopted HTML5 content structuring rules, so that still, we cannot guarantee that all unnecessary elements will be eliminated leaving behind clean content. But, you can make use of our system's ability to generate a downloadable text file, which you can edit, eliminating unwanted elements.

Some websites prevent automated processing of their content, when our system will fail completely with website to audio conversion. In this case, we recommend that you copy the website's content and insert it as plain text. For this purpose and the general, simplest use case, we have provided a textbox where you can just enter some text between 36 and 36,000 characters and submit it for text to mp3 processing.

Please make sure that your script does not contain spelling or punctuation mistakes, as these will lead to reading errors.

We currently create downloadable TTS mp3 of documents written in English only, which also means that they should not contain foreign language words that are not in the English dictionary.

SELECTING VOICES

What makes VoicedScripts.com unique is that you can assign different voices to different characters in your script. Please note that you have to enter character names exactly as they appear in your script, except that capitalization does not matter. For example, if you enter Mister Bing as a character name, and in your script, he appears as Mr. Bing, our system cannot make a match. In addition, there is the restriction that character names must start with a letter or the hash sign (#) and contain only letters of the English Alphabet, numbers, spaces, or one of the following: . - ' ( ) & # which means, names, numbers, and titles, such as "Dr. Phil's (First) Secretary" or "Handy-men #1 & #2". Where names in your script include diacritics, such as in Martina Navrátilová, Björn Borg, Renée Zellweger, or Zoë Baird, please replace the diacritics with English counterparts, which means Martina Navratilova, Bjorn Borg, Renee Zellweger, Zoe Baird.

If you do not wish to designate character voices, you don't have to. In this case, a narrator voice, either designated by you or assigned by default, will read all sections. Provide the character name you wish to assign in the corresponding field; leave the other fields blank.

There are four pre-defined identifiers that you can assign to any voice with any script. These are Narrator, Header, Quote, and Direction. In relation to web pages, a header is a text segment enclosed in <h1> to <h6> HTML tags. Otherwise, a header is recognized where text not assigned to a character starts, continues, and ends in ALL CAPS. In relation to web pages, a quote is a text segment enclosed in <q> or <blockquote> HTML tags. Otherwise, a quote is recognized where text not assigned to a character appears enclosed by double quotation marks ("quotation"). All text sections not assigned to a character, to header, or quote are assigned to the narrator.

There is one exception to this rule in relation to correctly formatted screenplays, where short directives are inserted in parentheses within character dialogs. These are called parenthetical directions and usually instruct an actor to read the text in a specific way. In voiced scripts, these segments are assigned to the voice identified with the keyword: Direction. In case no voice has been marked with the keyword: Direction, parenthetical directions are ignored and skipped over. In relation to correctly formatted screenplays, our software also knows how to interpret reading instructions following character names such as (V.O.) for "voice over", (O.S.) for "off-screen", and (O.C.) for "off-camera", and can also spell out location acronyms such as INT. for "interior" and EXT. for "exterior".

The nationality attribute provided with each voice determines the accent that the English text is read with. Currently available are American, British, Australian, and Indian accents. Voices designated as tourist voices will not read text in a foreign language but read English with a typical accent of someone from the corresponding country. We do NOT recommend choosing a tourist voice as a narrator voice.

Each voice also has a gender attribute. We offer a large number of standard, high, and low pitched, female and male, natural voices.

Please pay attention that our system will remember voice selections and suggest the same voices for a subsequent round of processing, within the same session. If you start creating a new voiced script and the clear all earlier selections button appears, you should click it to remove any old assignments.

CREATING DIALOGS

When you assign a character name to a voice while completing our submission form, our TTS generation system understands that it should look for that character name in your script. Once it identifies the character name, it designates the subsequent portion of the text to the corresponding voice.

Our text to speech software is able to interpret the following as dialog segments:

A character name followed by a space and double quotes, or a colon and double quotes, or a colon, a space and double quotes, in each case ending in double quotes. Examples:

John "This is something I said."
John:"This is something I said."
John: "This is something I said."

These can appear at the beginning of a new line or somewhere within text following a punctuation mark. For example:

Jane was looking out. John "What do you see?"

The first sentence will be read by the Narrator, while the voice assigned to John will read the quoted text.

A character name starting a new line, followed by a colon and ending in a new line, or a colon and a new line and ending in a new line, or a new line and ending in double new lines. Examples:

John: This is something I said.
Jane: Yes, you said that.
John:
This is something I said.
Jane:
Yes, you said that.
John
This is something I said.
But I said it a long time ago.

Jane
Yes, you said that.

These different structures can be mixed to achieve a number of different results, such as quotations within character dialogs or narrations interspersed with dialogs. If you have quotations within character segments, make sure to choose a structure from the latter options.

Below screenshot image might apply to the script that follows it:

It was a sunny afternoon. Minnie and her mother Jane were walking through the park. Minnie asked her mother:

Girl: "Where are we going?"

Mother: "To the playground."

In this example, the narration (meaning the text that starts with "It was a sunny afternoon.") will be read by Joanna, since she has been assigned the role of Narrator. She will NOT read "Girl:" and will NOT read "Mother:" as these characters have been assigned to voices. Ivy will read the text that is enclosed by the double quotation marks (" ") following "Girl: " and Sandra will read the text that is enclosed by the double quotation marks (" ") following "Mother: ". As you can see, it is not the real names of the characters that matter but how they are addressed in the dialog.

If no voice is assigned to a character or characters in general, the Narrator will read the dialog, but in this case speak out the character names as well, such as "Girl:" and "Mother:" in the above example. Where, on the other hand, more than one voice is assigned to a character or the Narrator by mistake, the last assignment will be the significant one.

You can easily insert pauses of up to nine seconds, simple melodies between fifteen and twenty-five seconds, and common sound effects into your voiced scripts. To insert a pause anywhere in your text, simply write {PAUSE-x} where x is between 1 and 9. To insert a melody or sound effect, simply write {SND-xxx} where xxx is a three digit code. Sounds are inserted sequentially between audio segments, which means that the text-to-speech is paused while the inserted file is playing. For example: "They heard a loud noise. {SND-600} Lightning had struck." inserts a short thunder sound between the two speech segments. Codes for melodies and sound effects are listed on a separate page.

Make sure to avoid the use of quotation marks except to demarcate dialogs. This also means that text such as 5" (five inches) should be avoided, as this may inadvertently lead to erroneous interpretations. Where these do occur, write them out (for example writing out 5 inches).

ABOUT THE CONTENT

Even though our system can read headers and segmented text, content lists and tables, especially those embedded in PDF documents or EBooks, can at times lead to false interpretations. PDF files that have been obtained by scanning old documents, those with poor paragraph and line structure, and badly coded web pages are especially prone to errors. That's why we offer a confirmation page and plain text downloads of your scripts, so that you can easily edit them before producing a downloadable TTS mp3. Please do note, that the confirmation text may slightly differ from your original, since our system converts some symbols to their readily readable textual counterparts, and replaces some punctuation marks by others, in an effort to make your audio easier to synthesize by an AI.

Our natural text to speech system is not trained to handle mathematical equations or formulae, such that a minus sign may be interpreted as a dash and a division sign as a slash, etc. Therefore, where mathematical expressions or any other domain-specific special characters have to be used, these should be written out. For example, instead of " < " write "less than" and instead of " > " write "greater than", etc. Furthermore, numbers are read as cardinals, which means that, for example, "12345" will be converted to speech as "twelve thousand three hundred forty five". If you want it to be read as an ordinal, you should write it out as "one two three four five", etc. Something similar applies to the spelling of characters: Where you want the letters of the word "can" to be read, you should write "C A N".

As for dates and times, these will be spoken out in the customary way of reading dates and times based on the nationality of the voice selected. If you have a specific way in mind how these should be read, you should write it out instead of assuming. For example, writing "September Third, 2020" specifies better how you want the date to be read than writing "3.9.2020".

If your manuscript relies mainly on charts, diagrams, data, symbols, etc. it will not convert well to audio. Poems and scripts that are based on unusual grammer or sentence structures are also not ideal candidates for creating voiced scripts.

HTML tags are ignored by our website to audio conversion software and so are text enclosed in brackets or braces, as these are used for inserting reading instructions and file codes.

If you look at the text version of the content of a website, you will notice that links are followed by double exclamation marks, which causes most voices to read text enclosed within anchor tags with emphasis. In general, double exclamation marks following a word or text segment is interpreted as requiring strong intonation when reading, whether followed by a full stop or not. In the case of the former, a short break will follow naturally; in the case of the latter, speech will simply return to normal. Please use this feature sparingly, only where needed, as emphasizing words or text segments this way often disrupts the natural flow of human-like speech.

Parenthesis, ellipsis, underscores, and em-dashes are handled similar to en-dashes flanked by spaces, as is customary in reading text aloud (which means, the voice will NOT say, for example, "open parenthesis" or something similar. If your intention is that this should be spoken out, then you should write it out.)

Please do NOT expect our system to know the expansions of uncommon abbreviations. If you want these to be spelled out, you have to write them out. A dot at the end of an abbreviation will be interpreted as a full stop.

A final warning: Our web application was not designed for creation of multiple voiced scripts simultaneously. If you start creating a new voiced script in a tab in the same browser window, where another one is already being edited, the two processes may interfere with one another, with unexpected results.