A Simple Example of Amazon Transcribe with .NET

Want to learn more about AWS Lambda and .NET? Check out my A Cloud Guru course on ASP.NET Web API and Lambda.

Download full source code.

This is the first of a few posts on Amazon Transcribe, a service that converts audio to text.

Here you will see how to send a file for transcription. This process can take a while, so you will poll for completion, then download the transcription.

The downloaded transcript is in JSON format. At the top of the file is a transcript that does NOT differentiate between speakers; it is a single paragraph with everything everyone said and no speaker attribution.

In a subsequent post, I will show how to parse the JSON to identify who said what, and when.

But for now, the basics.

Transcription has a few steps.

  1. Upload the file to S3
  2. Start the transcription job
  3. Poll the job until it is complete
  4. Download the transcription

0. Some Setup

Create a new console application and add the AWSSDK.TranscribeService and AWSSDK.S3 NuGet packages.

Add a few using statements, and create the Transcribe client and S3 TransferUtility.

```csharp
using System.Net;
using Amazon.S3;
using Amazon.S3.Transfer;
using Amazon.TranscribeService;
using Amazon.TranscribeService.Model;

string bucketName = "your-bucket-name";
string fileToTranscribe = "your-audio-file.mp3";

// Both clients take their credentials and region from the default AWS configuration (profile or environment variables)
IAmazonTranscribeService amazonTranscribeService = new AmazonTranscribeServiceClient();
TransferUtility transferUtility = new TransferUtility(new AmazonS3Client());
```

1. Upload the file to S3

Before you can upload a file to S3, you need to create a bucket.

See this blog post for more information on creating an S3 bucket.

I will use the S3 TransferUtility to upload the file. This is a simple wrapper around the S3 client that makes uploading files to S3 very easy.

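Uploading the file to S3 is a single line of code. Because no key is specified, TransferUtility uses the file's name as the object key, so the S3 URI is predictable.

```csharp
await transferUtility.UploadAsync(fileToTranscribe, bucketName);
// The object key is the file name only, so strip any directory part before building the URI
string s3Uri = $"s3://{bucketName}/{Path.GetFileName(fileToTranscribe)}";
```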

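After the file is uploaded, call the StartTranscriptionJob local function shown in the next step. Every job needs a name that is unique within your AWS account, and a GUID is an easy way to get one.

```csharp
string jobName = Guid.NewGuid().ToString();
var startTranscriptionJobResponse = await StartTranscriptionJob(s3Uri, jobName);
```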
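A GUID already satisfies the naming rules, but GUID job names are hard to tell apart in the AWS console. The sketch below is a hypothetical alternative, not part of the original code: `MakeJobName` and `IsValidTranscriptionJobName` are illustrative names, and the check assumes the documented constraint that a TranscriptionJobName is 1 to 200 characters drawn from letters, digits, `.`, `_` and `-`.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper: checks a name against the assumed TranscriptionJobName
// constraints (1 to 200 characters from [0-9a-zA-Z._-]).
bool IsValidTranscriptionJobName(string name) =>
    name.Length is >= 1 and <= 200 &&
    Regex.IsMatch(name, "^[0-9a-zA-Z._-]+$");

// Hypothetical helper: builds a readable, unique job name from the audio file's
// name plus a GUID, replacing disallowed characters (such as spaces) with '-'.
string MakeJobName(string audioFile, Guid id)
{
    string stem = Regex.Replace(Path.GetFileNameWithoutExtension(audioFile), "[^0-9a-zA-Z._-]", "-");
    string name = $"{stem}-{id:N}"; // "N" format: 32 hex digits, no hyphens
    // Keep the last 200 characters so the unique GUID suffix always survives
    return name.Length <= 200 ? name : name[^200..];
}
```

With these helpers, `string jobName = MakeJobName(fileToTranscribe, Guid.NewGuid());` would produce a name like `your-audio-file-` followed by 32 hex digits, which still works as the `{jobName}.json` output key.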

2. Start the transcription job

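To start the transcription job, call StartTranscriptionJobAsync on the amazonTranscribeService client, passing a StartTranscriptionJobRequest that specifies the job name, the language of the audio, where the audio file is located, and the bucket where the transcript should be stored. Because no OutputKey is set, Transcribe writes the transcript to the root of that bucket as `{jobName}.json`, which is the key you will download in step 4.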

```csharp
// Starts an asynchronous transcription job. The returned status code reports whether
// the request was accepted, not whether the transcription itself succeeded.
async Task<HttpStatusCode> StartTranscriptionJob(string s3Uri, string transcriptionJobName)
{
    var startTranscriptionJobRequest = new StartTranscriptionJobRequest()
    {
        TranscriptionJobName = transcriptionJobName,
        LanguageCode = LanguageCode.EnUS,
        Media = new Media()
        {
            MediaFileUri = s3Uri
        },
        // Transcribe writes the transcript JSON to this bucket
        OutputBucketName = bucketName
    };

    var startTranscriptionJobResponse = await amazonTranscribeService.StartTranscriptionJobAsync(startTranscriptionJobRequest);
    return startTranscriptionJobResponse.HttpStatusCode;
}
```

3. Poll the job until it is complete

The transcription job can take a while. In this example, you will poll for completion with a simple while loop; in a subsequent post, I’ll show how to get notified upon completion.

The GetTranscriptionJob method returns the status of the job. While the job is waiting or running, the status is QUEUED or IN_PROGRESS; it finishes as either COMPLETED or FAILED.

This is the polling loop.

```csharp
// Checks the job every five seconds until it reaches a terminal state.
// Returns true if the job completed, false if it failed.
async Task<bool> PollTranscriptionJob(string jobName)
{
    while (true)
    {
        var getTranscriptionJobResponse = await amazonTranscribeService.GetTranscriptionJobAsync(new GetTranscriptionJobRequest()
        {
            TranscriptionJobName = jobName
        });

        var transcriptionJobStatus = getTranscriptionJobResponse.TranscriptionJob.TranscriptionJobStatus;
        if (transcriptionJobStatus == TranscriptionJobStatus.COMPLETED)
        {
            Console.WriteLine("Transcription job completed");
            Console.WriteLine($"Output is available at {getTranscriptionJobResponse.TranscriptionJob.Transcript.TranscriptFileUri}");
            return true;
        }
        else if (transcriptionJobStatus == TranscriptionJobStatus.FAILED)
        {
            // FailureReason describes why Transcribe could not process the job
            Console.WriteLine($"Transcription job failed: {getTranscriptionJobResponse.TranscriptionJob.FailureReason}");
            return false;
        }
        else
        {
            // Still QUEUED or IN_PROGRESS
            Console.WriteLine("Transcription job not completed yet");
        }
        await Task.Delay(5000);
    }
}
```
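The `while (true)` loop keeps polling for as long as the job stays queued or in progress. If you want an upper bound, here is a minimal sketch of one approach, assuming a fixed timeout is acceptable. `PollUntilTerminalAsync` is a hypothetical helper, not part of the AWS SDK; it takes the status lookup as a delegate, so the loop logic can be exercised without calling AWS.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: calls getStatus until it returns a terminal status
// ("COMPLETED" or "FAILED"). If the timeout elapses first, the cancelled
// delay throws an OperationCanceledException (TaskCanceledException).
async Task<string> PollUntilTerminalAsync(
    Func<CancellationToken, Task<string>> getStatus,
    TimeSpan interval,
    TimeSpan timeout)
{
    // This token source cancels itself once the timeout has passed
    using var cts = new CancellationTokenSource(timeout);
    while (true)
    {
        string status = await getStatus(cts.Token);
        if (status is "COMPLETED" or "FAILED")
        {
            return status;
        }
        await Task.Delay(interval, cts.Token);
    }
}
```

Wired to Transcribe, `getStatus` would be a lambda along the lines of `async ct => (await amazonTranscribeService.GetTranscriptionJobAsync(new GetTranscriptionJobRequest { TranscriptionJobName = jobName }, ct)).TranscriptionJob.TranscriptionJobStatus.Value`. Passing the token through also cancels an in-flight status request when the timeout fires.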

4. Download the transcription

When polling tells you the job is complete, download the file.

```csharp
if (await PollTranscriptionJob(jobName))
{
    await transferUtility.DownloadAsync($"{jobName}.json", bucketName, $"{jobName}.json");
}
else
{
    Console.WriteLine("Transcription job failed");
}
```

You can open the downloaded JSON file in a text editor to see the transcription.
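If you would rather read the transcript from code, here is a small sketch using System.Text.Json. It assumes the output layout in which the speaker-agnostic paragraph sits at `results.transcripts[0].transcript`; `ReadTranscriptText` is a hypothetical helper name, and it ignores the per-word `items` array that a later post would need for speaker attribution.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: returns the single, speaker-agnostic paragraph
// stored at results.transcripts[0].transcript in a Transcribe output file.
string ReadTranscriptText(string json)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    return doc.RootElement
        .GetProperty("results")
        .GetProperty("transcripts")[0]
        .GetProperty("transcript")
        .GetString() ?? string.Empty;
}
```

After the download, `Console.WriteLine(ReadTranscriptText(File.ReadAllText($"{jobName}.json")));` would print that paragraph to the console.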

Conclusion

It’s not very difficult to send a file for transcription and get the results back. In the next post, I’ll show how to parse the JSON when there is more than one speaker and produce a document that shows who said what, with timestamps. Download full source code.
