Using Amazon Polly to read a Dialogue from Julius Caesar

Jan 30, 2023 # Amazon Polly, dotNet, AWS, Text to Speech, AI

I’ve written a couple of posts about Amazon Polly, the text-to-speech service. In those, I used Polly to read text that I typed in at the command line.

In this post, I read a file containing text from the play Julius Caesar, in which Cassius and Brutus discuss Caesar. Amazon Polly then converts the text to speech, using a different voice for each character. Finally, I play the dialogue.

Prerequisites

An AWS account with the “AmazonPollyReadOnlyAccess” managed policy attached to the user or role that runs the code.

The Code

If you don’t want to copy/paste along with this post, see the attached zip file for the full source code.

The code is not complicated. I set up two Polly voices, read the text from the file, and, depending on who is speaking, ask Polly to convert the text to speech with the appropriate voice.

Then I append that line of audio to a file that contains the full dialogue.

Create a console application.

dotnet new console -o TwoVoices

NuGet Packages

Add four NuGet packages to the project: AWSSDK.Polly, CsvHelper, NAudio, and NetCoreAudio.

Model for lines of dialogue

Create a file named PlayLine.cs and add the following code. This represents a line of dialogue from the play as stored in the CSV file.

public class PlayLine 
{
    public string Id {get;set;}
    public string Play {get;set;}
    public string CharacterLineNumber {get;set;}
    public string ActSceneLine {get;set;}
    public string Character {get;set;}
    public string Line {get;set;}
}

Using Statements

Add the following using statements to the Program.cs file.

using Amazon.Polly;
using Amazon.Polly.Model;
using NetCoreAudio;
using CsvHelper;
using System.Globalization;
using CsvHelper.Configuration;
using NAudio.Wave;

Polly Voices

Create the Polly client and set up two voices, one for Cassius and one for Brutus.

var pollyClient = new AmazonPollyClient();

// Cassius
var matthewRequest = new SynthesizeSpeechRequest
{
    OutputFormat = OutputFormat.Mp3,
    VoiceId = VoiceId.Matthew,
    Engine = Engine.Neural,
};

// Brutus
var brianRequest = new SynthesizeSpeechRequest
{
    OutputFormat = OutputFormat.Mp3,
    VoiceId = VoiceId.Brian,
    Engine = Engine.Neural,
};

Create a file to store the full dialogue

Each line of audio will be concatenated to the end of this file.

File.Create("dialogue.mp3").Dispose(); // create the file and release it

Read the dialogue from the file

The attached zip contains a semicolon-delimited CSV file, dialogue.csv, that holds the dialogue.

var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = ";"
};

// Dispose the reader and the CSV parser when the program ends
using var reader = new StreamReader("dialogue.csv");
using var csv = new CsvReader(reader, config);

var linesFromPlay = csv.GetRecords<PlayLine>().ToList();
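To make the expected file layout concrete, here is a minimal sketch that parses two lines from an in-memory string instead of dialogue.csv. It assumes the header names match the PlayLine properties, which is what CsvHelper expects by default. The Id, CharacterLineNumber, and ActSceneLine values are made up for illustration and are not copied from the attached file.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using CsvHelper.Configuration;

// Hypothetical sample rows: the numeric and act/scene values are illustrative only.
const string sample =
    "Id;Play;CharacterLineNumber;ActSceneLine;Character;Line\n" +
    "1;Julius Caesar;1;1.2.25;CASSIUS;Will you go see the order of the course?\n" +
    "2;Julius Caesar;2;1.2.26;BRUTUS;Not I.\n";

var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = ";" // the fields are separated by semicolons, not commas
};

using var reader = new StringReader(sample);
using var csv = new CsvReader(reader, config);

// Materialize the records before the reader is disposed
var lines = csv.GetRecords<PlayLine>().ToList();

foreach (var line in lines)
{
    Console.WriteLine($"{line.Character}: {line.Line}");
}

// Same shape as the model defined earlier in the post
public class PlayLine
{
    public string Id { get; set; } = "";
    public string Play { get; set; } = "";
    public string CharacterLineNumber { get; set; } = "";
    public string ActSceneLine { get; set; } = "";
    public string Character { get; set; } = "";
    public string Line { get; set; } = "";
}
```

If the headers in your own file differ from the property names, CsvHelper will need a class map or attributes to match them up.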

Convert the text to speech

Pass each line of the dialogue to the Polly client, and depending on who is speaking, use the appropriate voice.

foreach (var lineFromPlay in linesFromPlay)
{
    if (lineFromPlay.Character == "BRUTUS")
    {
        await GetLineAsSpeechFromPolly(lineFromPlay.Line, brianRequest);
    }
    else
    {
        await GetLineAsSpeechFromPolly(lineFromPlay.Line, matthewRequest);
    }
    CombineLineWithDialogue("line.mp3", "dialogue.mp3");
}

async Task GetLineAsSpeechFromPolly(string line, SynthesizeSpeechRequest request)
{
    request.Text = line;

    var response = await pollyClient.SynthesizeSpeechAsync(request);

    // Overwrite line.mp3 with the audio for this line; disposing the file stream flushes it
    using (var fileStream = new FileStream("line.mp3", FileMode.Create, FileAccess.Write))
    {
        await response.AudioStream.CopyToAsync(fileStream);
    }
}
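The if/else above works well for two speakers. For a larger cast, one option is a lookup table from character name to voice. The following is only a sketch of that idea: the CASCA entry, the choice of the Amy voice, and the helper names VoiceFor and BuildRequest are hypothetical and not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using Amazon.Polly;
using Amazon.Polly.Model;

// Hypothetical mapping: CASCA and the Amy voice are illustrative additions
var voicesByCharacter = new Dictionary<string, VoiceId>(StringComparer.OrdinalIgnoreCase)
{
    ["CASSIUS"] = VoiceId.Matthew,
    ["BRUTUS"] = VoiceId.Brian,
    ["CASCA"] = VoiceId.Amy,
};

// Fall back to a single voice for any character without an entry
VoiceId VoiceFor(string character) =>
    voicesByCharacter.TryGetValue(character, out var voice) ? voice : VoiceId.Matthew;

// Build a fresh request per line instead of reusing one request per voice
SynthesizeSpeechRequest BuildRequest(string character, string text) => new()
{
    OutputFormat = OutputFormat.Mp3,
    VoiceId = VoiceFor(character),
    Engine = Engine.Neural,
    Text = text,
};

var request = BuildRequest("brutus", "Not I.");
Console.WriteLine(request.VoiceId.Value);
```

The case-insensitive comparer means "BRUTUS" and "brutus" resolve to the same voice, which helps if the source text is inconsistent about capitalization.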

Combine new line with the full dialogue

The Polly client returns the audio as a stream. I save that stream to a file and then use NAudio to append the new line to the full dialogue in the existing file.

static void CombineLineWithDialogue(string sourceFileName, string destFileName)
{
    using (var destFileStream = new FileStream(destFileName, FileMode.Append, FileAccess.Write))
    {
        using (Mp3FileReader sourceFile = new Mp3FileReader(sourceFileName))
        {
            Mp3Frame frame;

            // Copy the line one MP3 frame at a time onto the end of the dialogue file;
            // the stream is flushed when it is disposed
            while ((frame = sourceFile.ReadNextFrame()) != null)
            {
                destFileStream.Write(frame.RawData, 0, frame.RawData.Length);
            }
        }
    }
}
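The append step relies on FileMode.Append, which opens a file positioned at its end, and on the earlier File.Create call, which empties the file so that audio from a previous run is not kept. Here is a minimal sketch of that behaviour using plain bytes rather than MP3 frames. The file name and the AppendChunk helper are hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical file name used only for this demonstration
var path = Path.Combine(Path.GetTempPath(), "append-demo.bin");

// Create (or truncate) the file so earlier runs leave nothing behind
File.Create(path).Dispose();

AppendChunk(path, new byte[] { 1, 2, 3 });
AppendChunk(path, new byte[] { 4, 5 });

// Both chunks are now in the file, in the order they were written
Console.WriteLine(File.ReadAllBytes(path).Length); // prints 5

static void AppendChunk(string fileName, byte[] chunk)
{
    // FileMode.Append opens the file at its end, so existing bytes are kept
    using var stream = new FileStream(fileName, FileMode.Append, FileAccess.Write);
    stream.Write(chunk, 0, chunk.Length);
}
```

Without the initial File.Create call, running the program twice would leave two copies of the data in the file.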

Play the audio

Finally, play the audio using the NetCoreAudio package.

var player = new Player();
await player.Play("dialogue.mp3");
Console.WriteLine("Press any key to exit.");    
Console.ReadKey();

Conclusion

Polly makes it easy to read text in multiple voices. This example could be extended to read the whole play in a variety of voices.

Download full source code.
