Indexing the Works of Shakespeare in ElasticSearch – Part 4, Searching via Web API in .NET 5

Full source code available here.

This is part four of my four-part series on indexing the works of Shakespeare in ElasticSearch.

In this post I’ll show how to use the ElasticSearch “low level client” to perform the search. In the past I wrote a blog post showing how to use a HttpClient to perform the search using Json, and this works fine, but Steve Gordon suggested I try the Elastic client as it supports things like connection pooling and still lets me use Json directly with ElasticSearch.

Alongside the Elastic “low level client” there is a “high level client” called NEST. I have tried both and prefer to stick with Json, but you may find them more useful.

Because I develop in a few languages, Json is the natural choice for me. I use it when querying from Node.js, inside a HTTP client (Fiddler, Rest Client, etc.) when figuring out my queries, and I want to use it in .NET.

But Json and C# don’t go together very well; you have to jump through hoops to make it work with escaping. Or, as I have done, use a creative approach to deserializing via dynamic objects (I know some people won’t like this). I find this much more convenient than converting my Json queries to the Elastic client syntaxes.

This example shows how to use a Web API application to search for a piece of text in isolation or within a specific play.

The Setup
There is very little to do here.

In Startup.cs add the following to the ConfigureServices(..) method –

services.AddSingleton<ElasticLowLevelClient>(new ElasticLowLevelClient(new ConnectionConfiguration(new Uri("http://localhost:9200"))));

In the SearchController add the following to pass the ElasticSearch client in via dependency injection –

public class SearchController : ControllerBase
{
    private readonly ElasticLowLevelClient _lowLevelClient;
    public SearchController(ElasticLowLevelClient lowLevelClient)
    {
        _lowLevelClient = lowLevelClient;
    }
//snip ..

I have two action methods, one to search for a play and line, and one to search for a line across all plays (I know they could be combined into a single action method, but I want to keep things simple) –

[HttpGet("Line")]
public ActionResult Line(string text)
{
    string queryWithParams = GetLineQuery(text);
    var lines = PerformQuery(queryWithParams);
    
    return Ok(lines);
}

[HttpGet("PlayAndLine")]
public ActionResult PlayAndLine(string play, string text)
{
    string queryWithParams = GetPlayAndLineQuery(play, text);
    var lines = PerformQuery(queryWithParams);

    return Ok(lines);
}
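With the application running you can exercise both endpoints from any HTTP client. For example – the port and the /search route prefix are assumptions based on the controller setup in the earlier posts –

GET http://localhost:5000/search/line?text=to be or not to be

GET http://localhost:5000/search/playandline?play=Hamlet&text=to be or not to be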

All very straightforward so far, but now comes the “creative” approach to handling the Json problems.

I put my ElasticSearch queries into their own files. The first is Line.json –

{
    "query": {
        "match_phrase_prefix" :{
            "Line": ""
        }
    }
} 

And the second is PlayAndLine.json –

{
    "query":{
        "bool": {
            "must": [
                { "match": { "Play": "" } }
               ,{ "match_phrase_prefix": { "Line": "" } }
            ]
        }
    }
}

These Json queries are loaded into dynamic objects and the relevant values are set in C# – see the assignments to workableElasticSearchQuery in the two methods below.

private string GetLineQuery(string text)
{
    string elasticSearchQuery = System.IO.File.ReadAllText("Queries/Line.json");
    dynamic workableElasticSearchQuery = JsonConvert.DeserializeObject(elasticSearchQuery);
    workableElasticSearchQuery.query.match_phrase_prefix.Line = text;

    return workableElasticSearchQuery.ToString();
}

private string GetPlayAndLineQuery(string play, string text)
{
    string elasticSearchQuery = System.IO.File.ReadAllText("Queries/PlayAndLine.json");
    dynamic workableElasticSearchQuery = JsonConvert.DeserializeObject(elasticSearchQuery);
    workableElasticSearchQuery.query.@bool.must[0].match.Play = play;
    workableElasticSearchQuery.query.@bool.must[1].match_phrase_prefix.Line = text;

    return workableElasticSearchQuery.ToString();
}

The strings the above methods return are the queries that will be sent to ElasticSearch.

The below method makes the request, and deserializes the response into the ESResponse class. That class was generated by using https://json2csharp.com/.

private ESResponse PerformQuery(string queryWithParams)
{
    var response = _lowLevelClient.Search<StringResponse>("shakespeare", queryWithParams);
    ESResponse esResponse = System.Text.Json.JsonSerializer.Deserialize<ESResponse>(response.Body);
    return esResponse;
}
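For reference, here is a trimmed-down sketch of what that generated class might look like. Treat it as an illustration rather than the exact generated code – the property names follow the standard ElasticSearch response shape, and the _source fields mirror the mapping from part 3 –

using System.Collections.Generic;

// Illustrative only – generate your own from a sample response with https://json2csharp.com/
public class ESResponse
{
    public int took { get; set; }
    public HitsWrapper hits { get; set; }
}

public class HitsWrapper
{
    public List<Hit> hits { get; set; }
}

public class Hit
{
    public string _index { get; set; }
    public string _id { get; set; }
    public LineDocument _source { get; set; }
}

public class LineDocument
{
    public int Id { get; set; }
    public string play { get; set; }
    public string actSceneLine { get; set; }
    public string character { get; set; }
    public string line { get; set; }
}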

You might have noticed that I use both System.Text.Json and Newtonsoft; this is because System.Text.Json does not support dynamic deserialization, see this discussion – https://github.com/dotnet/runtime/issues/29690.

That’s it – searching and parsing of ElasticSearch results via a Web API application. It feels a bit messy, but I hope it helps.

Full source code available here.

Indexing the Works of Shakespeare in ElasticSearch – Part 3, Sending the Lines to Kinesis

Full source code available here.

In this, the third part of the series, I show how to read from the Shakespeare CSV file, where each row represents a line from a play (download here), and send these lines to Kinesis. The lambda in AWS will pick up the lines from Kinesis and forward them to ElasticSearch for indexing.

You need to configure the script to point to your ElasticSearch server (the node setting at the top of the script) and to the Kinesis stream (the stream name passed to seed(..) at the bottom).

The script itself is fairly simple: it checks if the ElasticSearch index for the plays already exists and, if not, creates one using the mapping document provided.

Next, it reads from the CSV file and, row by row, converts the lines from the play to Kinesis records and sends them to Kinesis.

I could have written a script that sends the lines directly to ElasticSearch, but there are a few drawbacks –

  1. If I have a small ElasticSearch server (as is the case if you are following along from part 1 where I used Pulumi to set up the infrastructure), sending tens of thousands of index requests directly to ElasticSearch could overwhelm it – I have managed to do this a few times. To alleviate this, I could send in bulk, but I wanted something with one more piece of resilience – retries.
  2. If ElasticSearch is temporarily down or there is a networking issue, Kinesis and the lambda will retry the indexing request to ElasticSearch. This is taken care of out of the box; all I had to do was specify how many retries should be performed by the lambda (see the sketch after this list).
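With Pulumi, that retry limit can be set on the Kinesis-to-lambda event source mapping. A minimal sketch, assuming the resources from part 1 – MaximumRetryAttempts is the relevant Pulumi AWS provider property, and the value here is just an example –

var mapping = new Aws.Lambda.EventSourceMapping("elasticsearch_indexing_event_source", new Aws.Lambda.EventSourceMappingArgs
{
    EventSourceArn = elasticsearch_kinesis.Arn,
    FunctionName = elasticsearch_indexing_function.Arn,
    StartingPosition = "LATEST",
    MaximumRetryAttempts = 5, // example value – how many times the lambda retries a failed batch
});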

The lambda and this code are coupled to an ElasticSearch index named “shakespeare”, but it would be a simple thing to break this coupling. In the code below, all you would need to do is add an index name to the kinesisRecord, and in the lambda, pass this name to the bulk indexing function (see part 2 for the code).

Unzip the attached file and run this to install the necessary modules –

npm install

This will look at the package.json file and download what is needed.

Below is the code needed to send the rows of the CSV to Kinesis (this is included in the zip).

const AWS = require('aws-sdk');
const { Client } = require('@elastic/elasticsearch');
const elasticSearchClient = new Client({
    node: "https://your elastic search server"
});

const csv = require('csvtojson');
const awsRegion = 'us-east-1';

let kinesis = new AWS.Kinesis({region: awsRegion});

function readCSVAndSendToKinesis(streamName){
    csv({delimiter:';'})
    .fromFile('./shakespeare_plays_small.csv')
    .then((json) => {
        json.forEach((row) =>
        {   
            sendEntryToKinesisStream(row, streamName);
        });
    })
}

// send a single entry to the Kinesis stream
function sendEntryToKinesisStream(entry, streamName){
    var kinesisRecord = {
        Data: JSON.stringify(entry),
        PartitionKey: entry.Id.toString(),
        StreamName: streamName
    };
    kinesis.putRecord(kinesisRecord, function (err, data){
        if(err){
            console.log('ERROR ' + err);
        }
        else {
            console.log(entry.Id + ' added to Kinesis stream');
        }
    });
}

async function createIndexIfNotExists(indexName) {
    let result = await elasticSearchClient.indices.exists({
        index: indexName
    });
    if (result.body === true) {
        console.log(indexName + ' already exists');
    } else {
        console.log(indexName + ' will be created');
        await createIndex(indexName);
    }
}

async function createIndex(indexName) {
    let result = await elasticSearchClient.indices.create({
        index: indexName,
        body: {
            "mappings": {
                "properties": {
                    "Id": {
                        "type": "integer"
                    },
                    "play": {
                        "type": "text",
                        "fields": {
                            "raw": {
                                "type": "keyword" 
                            }
                        }
                    },
                    "characterLineNumber": {
                        "type": "integer"
                    },
                    "actSceneLine": {
                        "type": "text"
                    },
                    "character": {
                        "type": "text",
                        "fields": {
                            "raw": {
                                "type": "keyword"
                            }
                        }
                    },
                    "line": {
                        "type": "text"
                    },
                }
            }
        }
    });
    console.log(result.statusCode);
}

async function seed(indexName, streamName) {
    await createIndexIfNotExists(indexName);
    readCSVAndSendToKinesis(streamName);
}

seed("shakespeare", "you kinesis stream name");

That’s it – now you have infrastructure as code, a lambda to bulk index in ElasticSearch, and a small application to send data to Kinesis. All that’s left is to add an API to perform a search. I have done this before using HttpClient as shown in this post, but in the next post I’m going to use an ElasticSearch client for .NET to perform the search.

Full source code available here.

Indexing the Works of Shakespeare in ElasticSearch – Part 2, Bulk Indexing

Full source code available here, look inside the lambda folder.

This is part two of my series on indexing the works of Shakespeare in ElasticSearch. In part one I set up the infrastructure as code where I created all the necessary AWS resources, including the lambda that is used for indexing data in bulk. But in that post I didn’t explain how the lambda works.
Indexing in bulk is a more reliable and scalable way to index, as it is easy to overwhelm small ElasticSearch instances if you are indexing a high volume of documents one at a time.

The Lambda
The code provided is based on an example from Elastic Co.

I add the AWS SDK and an ElasticSearch connector. The connector makes it possible to use IAM authentication when calling ElasticSearch.

'use strict'
const AWS = require('aws-sdk');

require('array.prototype.flatmap').shim();

const { Client } = require("@elastic/elasticsearch"); // see here for more  https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/bulk_examples.html
const elasticSearchConnector = require('aws-elasticsearch-connector');

const client = new Client({
    ...elasticSearchConnector(AWS.config),
    node: "https://" + process.env.esUrl
});

The handler is simple: it reads the incoming records from Kinesis, adds them to an array, and calls the bulk indexing method –

exports.handler = async (event) => {
    let dataToIndex = [];

    if(event.Records != null){
        event.Records.forEach(record => {
            let rawData = Buffer.from(record.kinesis.data, 'base64').toString("ascii");
            let obj = JSON.parse(rawData);
            dataToIndex.push(obj);
        });

        if(dataToIndex.length > 0) {
            // await here, otherwise the lambda can finish before indexing completes
            await indexDataInElasticSearch(dataToIndex, 'shakespeare'); // the index name could be passed in via the stream data too
        }
    }
    return "data indexed";
};

async function indexDataInElasticSearch(dataToIndex, indexName) {
    console.log('Seeding...' + dataToIndex[0].Id + " - " + dataToIndex[dataToIndex.length - 1].Id);
    const body = dataToIndex.flatMap(entry => [{ index: { _index: indexName, _id: entry.Id, _type: '_doc' } }, entry]);
    const { body: bulkResponse } = await client.bulk({ refresh: true, body });
    // production code should check bulkResponse.errors for per-document failures
}

That’s it, now all we need is a way of sending data to Kinesis, and that will be in the next post.

Full source code available here, look inside the lambda folder.

Indexing the Works of Shakespeare in ElasticSearch – Part 1, Infrastructure as Code

Full source code available here.

WARNING – be careful when using this, Kinesis costs money and is not on the AWS free tier. At the time of writing a couple of ElasticSearch instance types are included with the free tier, but you can only have one instance running at a time. I made a mistake and spun up two ElasticSearch instances for a few days and ran up a small bill. I got in touch with AWS support, explained what I was doing, and they refunded me – very helpful and understanding.

This is part one of a three-parter where I’m going to show how to index the complete works of Shakespeare in ElasticSearch. This first part will set up the infrastructure on AWS. The second will go through the lambda that bulk loads data into ElasticSearch. The third will show how to, in Node.js, create the index on the ElasticSearch domain, read the works of Shakespeare from CSV, and send them to Kinesis.

Introduction
A few weeks ago I wrote a post describing how to get ElasticSearch up and running on AWS using Pulumi. It was a simple approach with the expectation that the user would send documents directly to ElasticSearch for indexing. This is fine if all you are doing is some experiments, but if you are loading a lot of data, and especially if you are sending the data in one document at a time, you can easily overwhelm a small ElasticSearch instance and get socket timeouts or simply run out of sockets.

One way to reduce the likelihood of this occurring is to make bulk requests to index documents in ElasticSearch – instead of sending one document per request, send 100 or 1,000.
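For reference, a bulk request interleaves an action line and a document line for each entry, something like this (the host and documents shown are just illustrative) –

POST https://your-elasticsearch-host/_bulk
Content-Type: application/x-ndjson

{ "index": { "_index": "shakespeare", "_id": "1" } }
{ "Id": 1, "play": "Henry IV", "line": "So shaken as we are, so wan with care" }
{ "index": { "_index": "shakespeare", "_id": "2" } }
{ "Id": 2, "play": "Henry IV", "line": "Find we a time for frighted peace to pant" }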

Another approach is to use AWS Kinesis in combination with bulk indexing.

Kinesis is a reliable service that you can send thousands (or millions) of individual documents to; these documents are picked up in batches by a lambda that in turn sends them in bulk to ElasticSearch.

If for some reason the lambda fails to process the documents, Kinesis will deliver them to the lambda again to retry indexing.

What Infrastructure is Needed
Quite a lot is needed to get this up and running.

On the IaC side –

    An AWS Role
    An AWS policy attachment
    A Kinesis stream
    An ElasticSearch instance
    An AWS Lambda
    The code the Lambda will execute
    A mapping between the Kinesis stream and the Lambda

Outside of IaC, the following is needed and will be shown in upcoming posts –

    An ElasticSearch mapping document
    A tool to send data to Kinesis for indexing

I also want to limit access to the ElasticSearch service to my IP address; this is easy to figure out with a call to an API like api.ipify.org.

The lambda needs a zip file with all my Node.js code. Normally this would be part of your CI/CD pipeline, but I want to do this all in one place so it’s included here. BEWARE – I was not able to create a valid zip for the AWS lambda with ZipFile.CreateFromDirectory, instead I used Ionic.Zip.

By default, Pulumi suffixes the names of your resources with a random string. I don’t like this, so I explicitly set names on everything. There is a little method at the top of the class to help add a prefix to resource names like “test-01-”, or whatever you want.
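That method isn’t shown here, but a minimal sketch of what it might look like (the prefix value is an assumption) –

private const string Prefix = "test-01-"; // assumed value – use whatever you want
private static string PrefixName(string name) => $"{Prefix}{name}";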

The Stack

The IaC code starts with a query to a third party API to get the IP address my computer is using. There doesn’t seem to be an easy way to avoid calling .Result on the httpClient.GetStringAsync call, since a stack constructor cannot be async.

public MyStack()
{
    HttpClient httpClient = new HttpClient()
    {
        BaseAddress = new System.Uri("https://api.ipify.org/")
    };
    string myIPAddress  = httpClient.GetStringAsync("?format=text").Result;

I then zip up the lambda source. This is not what you would normally do when deploying a serious application, but it’s useful for this demo. As mentioned above, I’m using Ionic.Zip because I could not get the zip file created by System.IO.Compression.ZipFile.CreateFromDirectory(..) to work with the lambda.

File.Delete("index.zip");
using (ZipFile zip = new ZipFile())
{
    zip.AddDirectory("lambda");
    zip.Save("index.zip");
}

IP address figured out, zip file in place, now I can start building the infrastructure.

Here it all is – this creates the role, policy attachment, Kinesis stream, ElasticSearch domain, lambda, and Kinesis mapping to the lambda, and outputs the URL of the ElasticSearch domain.

var elasticsearch_indexing_role = new Aws.Iam.Role(PrefixName("elasticsearch_indexing_role"), new Aws.Iam.RoleArgs
{
    Name = PrefixName("elasticsearch_indexing_role"),
    AssumeRolePolicy = @"{
                            ""Version"": ""2012-10-17"",
                            ""Statement"": [
                                {
                                ""Action"": ""sts:AssumeRole"",
                                ""Principal"": {
                                    ""Service"": ""lambda.amazonaws.com""
                                },
                                ""Effect"": ""Allow"",
                                ""Sid"": """"
                                }
                            ]
                        }",
});

var lambdaKinesisPolicyAttachment = new Aws.Iam.PolicyAttachment(PrefixName("lambdaKinesisPolicyAttachment"), new Aws.Iam.PolicyAttachmentArgs
{
    Name = PrefixName("lambdaKinesisPolicyAttachment"),
    Roles =
    {
        elasticsearch_indexing_role.Name
    },
    PolicyArn = "arn:aws:iam::aws:policy/service-role/AWSLambdaKinesisExecutionRole",
});

var elasticsearch_kinesis = new Aws.Kinesis.Stream(PrefixName("elasticsearch_kinesis"), new Aws.Kinesis.StreamArgs
{
    Name = PrefixName("elasticsearch_kinesis"),
    RetentionPeriod = 24,
    ShardCount = 1,
    ShardLevelMetrics =
    {
        "IncomingBytes",
        "OutgoingBytes",
    },
});

string esDomainName = PrefixName("elasticsearch");
var config = new Config();
var currentRegion = Output.Create(Aws.GetRegion.InvokeAsync());
var currentCallerIdentity = Output.Create(Aws.GetCallerIdentity.InvokeAsync());
var esDomain = new ElasticSearch.Domain(esDomainName, new ElasticSearch.DomainArgs
{
    DomainName = esDomainName,
    ClusterConfig = new ElasticSearch.Inputs.DomainClusterConfigArgs
    {
        InstanceType = "t2.small.elasticsearch",
    },
    EbsOptions = new DomainEbsOptionsArgs()
    {
        EbsEnabled = true,
        VolumeSize = 10,
        VolumeType = "gp2"
    },
    ElasticsearchVersion = "7.8",
    AccessPolicies = Output.Tuple(currentRegion, currentCallerIdentity, elasticsearch_indexing_role.Arn).Apply(values =>
    {
        var currentRegion = values.Item1;
        var currentCallerIdentity = values.Item2;
        return $@"
        {{
            ""Version"": ""2012-10-17"",
            ""Statement"": [
                {{
                    ""Effect"": ""Allow"",
                    ""Principal"": {{
                        ""AWS"": ""{values.Item3}""
                    }},
                    ""Action"": ""es:*"",
                    ""Resource"": ""arn:aws:es:{currentRegion.Name}:{currentCallerIdentity.AccountId}:domain/{esDomainName}/*""
                }},
                {{
                    ""Action"": ""es:*"",
                    ""Principal"": {{
                        ""AWS"": ""*""
                    }},
                    ""Effect"": ""Allow"",
                    ""Resource"": ""arn:aws:es:{currentRegion.Name}:{currentCallerIdentity.AccountId}:domain/{esDomainName}/*"",
                    ""Condition"": {{
                        ""IpAddress"": {{""aws:SourceIp"": [""{myIPAddress}""]}}
                    }}
                }}
            ]
        }}
        ";
    }),
});
this.ESDomainEndpoint = esDomain.Endpoint;

var lambdaEnvironmentVariables = new Aws.Lambda.Inputs.FunctionEnvironmentArgs();
lambdaEnvironmentVariables.Variables.Add("esUrl", esDomain.Endpoint);

var elasticsearch_indexing_function = new Aws.Lambda.Function(PrefixName("elasticsearch_indexing_function"), new Aws.Lambda.FunctionArgs
{
    Handler = "index.handler",
    MemorySize = 128,
    Name = PrefixName("elasticsearch_indexing_function"),
    Publish = false,
    ReservedConcurrentExecutions = -1,
    Role = elasticsearch_indexing_role.Arn,
    Runtime = "nodejs12.x",
    Timeout = 4,
    Code = new FileArchive("index.zip"),
    Environment = lambdaEnvironmentVariables
});

var elasticsearch_indexing_event_source = new Aws.Lambda.EventSourceMapping(PrefixName("elasticsearch_indexing_event_source"), new Aws.Lambda.EventSourceMappingArgs
{
    EventSourceArn = elasticsearch_kinesis.Arn,
    FunctionName = elasticsearch_indexing_function.Arn,
    StartingPosition = "LATEST",
});

Finally I want to print out the URL of the ElasticSearch domain.

[Output]
public Output<string> ESDomainEndpoint { get; set; }

That’s it, now all that is left is to deploy and wait…quite…a…while…for ElasticSearch to start up, sometimes as much as 20 minutes.

To deploy run –

pulumi up

And now you wait.
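When you are finished experimenting, remember the warning at the top of this post and tear everything down so Kinesis and ElasticSearch stop costing you money –

pulumi destroy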

Part two coming soon.

Full source code available here.

Working with JSON in .NET, a better way?

Full source code available here.

Two recent experiences with C# and JSON frustrated me with how difficult it is to work with JSON inside an application. I have also been learning Node.js, and contrasting the ease of use there with C# is shocking. In C# the developer is generally expected to create class structures that represent the JSON they want to produce or consume, and for most of my career that has been fine – I usually had to work with quite fixed JSON, with quite fixed classes.

An example might be JSON that represents customers, orders and order items. It is easy enough to make C# classes that represent them, and having classes means it is easy to work with the customer, order or order item inside your code.

But more recently I have been working with ElasticSearch and Pulumi.

In the case of ElasticSearch, querying it is done through HTTP requests with complex JSON that can change significantly between requests. The JSON can be many layers deep and combine searching across multiple fields, sorting, paging, specifying fields to return, and other functionality.

Here is a simple query I built using Visual Studio Rest Client. To use this inside a C# application I have to escape all the “, {, and } characters, and I have to do it in such a way that allows me to substitute in the values I want.

This is the raw JSON –

{
    "query": {
        "match_phrase_prefix": {
            "fullName" : "Joe"
        }
    },
    "from": 0,
    "size": 2
}

Escaping and getting it to work with a request from HttpClient took a while, and to my mind it looks awful –

string query = @"
                {{
                    ""query"": {{
                        ""match_phrase_prefix"": {{
                            ""fullName"" : ""{0}""
                        }}
                    }},
                    ""from"": {1},
                    ""size"": {2}
                }}";

Here is a more realistic and not so complicated query with ElasticSearch – now try to escape that in a way that supports substitutions for each value!

{
    "query":{
        "bool": {
            "must": [
                { "match": { "address.city": "New York" } }
               ,{ "match_phrase_prefix": { "lastName": "Sanders" } }
            ]
            ,"must_not": [
                {"range": {"dateOfBirth" : {"gte": "1980-01-01", "lte": "2000-01-01" }}}
            ]
        }
    }
    ,"sort": { "customerId" : {"order": "asc"} }
    ,"size": 100
    ,"from": 0 
    ,"_source": ["firstName", "lastName"]
}

You might rightly ask why I don’t use the provided libraries from the Elastic company. Well, I am working on a system that uses multiple languages. I do my experiments and testing with a HTTP client, and the last thing I want to do is convert everything from JSON to a significantly different format for each programming language. JSON is also the first-class citizen of ElasticSearch – I don’t want to find out later that the .NET client has not kept up with features provided by ElasticSearch. JSON is also very easy to share with colleagues.

What To Do
I am going to store my JSON in a file that becomes part of my source code, deserialize it into a dynamic object, set the values on the fields I want to change, serialize it back to a string and use that string in my requests. It is not as complicated as that might sound and way better than escaping the JSON.

Let’s take the first ElasticSearch query. Here again is the raw JSON, which I save to a file named ElasticSearchQuery.json –

{
  "query": {
      "match_phrase_prefix": {
          "fullName" : ""
      }
  },
  "from": 0,
  "size": 0
}

And here is how I read, set values and serialize it again –

private string GetElasticSearchQuery(string fullName, int from, int size)
{
    string elasticSearchQuery = File.ReadAllText("ElasticSearchQuery.json");
    dynamic workableElasticSearchQuery = JsonConvert.DeserializeObject(elasticSearchQuery);

    workableElasticSearchQuery.query.match_phrase_prefix.fullName = fullName;
    workableElasticSearchQuery.from = from;
    workableElasticSearchQuery.size = size;

    return workableElasticSearchQuery.ToString();
}

File.ReadAllText reads the JSON file into a string.
JsonConvert.DeserializeObject turns the string into a dynamic object.
The three assignments set the values I want.
ToString() returns a nice JSON string that can be used with a HttpClient to make a request to ElasticSearch.
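
Using it is then as simple as –

string query = GetElasticSearchQuery("Joe", 0, 2);
// query now holds the JSON above with fullName, from and size filled in,
// ready to be used as the body of a request to ElasticSearch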

But some ElasticSearch queries are a little harder to work with because a query can include a bool. This example is in the file ElasticSearchQuery2.json.

{
    "query": {
        "bool": {
            "must": [
                {"match_phrase_prefix": { "lastName" : "" } }
                ,{"match": { "address.state" : ""} } 
            ]
        }
    }
}

The dynamic object will not allow us to use “bool” because it is a reserved word in C#, but you can put an “@” in front of it, and now it will work. The “address.state” field has a different problem – the dot means it cannot be set through dynamic member access, so a JObject is created for it instead –

private string GetElasticSearchQuery2(string lastName, string state)
{
    string elasticSearchQuery2 = File.ReadAllText("ElasticSearchQuery2.json");
    dynamic workableElasticSearchQuery2 = JsonConvert.DeserializeObject(elasticSearchQuery2);

    workableElasticSearchQuery2.query.@bool.must[0].match_phrase_prefix.lastName = lastName;
    workableElasticSearchQuery2.query.@bool.must[1].match = new JObject(new JProperty("address.state", state));

    return workableElasticSearchQuery2.ToString();
}

And again the string produced can be used with a HttpClient.

Full source code available here.

Getting Started with ElasticSearch, Part 3 – Deploying to AWS with Pulumi

Full source code available here.

This is part 3 of my short introduction to ElasticSearch. In the first part I showed how to create an ElasticSearch index and mapping, and seeded it with data. In the second I used HttpClientFactory and a typed client to query the index. In this part I’m going to show you how to set up ElasticSearch in AWS using infrastructure as code. Be careful, AWS charges for these things.

A few months ago Pulumi added C# to their list of supported languages. If you haven’t heard of them, they are building a tool that lets you write your IaC in a familiar programming language; at the time of writing they support TypeScript, JavaScript, Python, Go and C#. Writing in a programming language makes it easy to work with things like loops and conditionals – if you are unfamiliar with IaC, those two simple things can be extremely challenging or impossible with other tools.

I’m going to write my IaC in C#.

I’m not going to walk you through installing Pulumi; their site has all the info you need for that.

The IaC Project
Once you have installed Pulumi and tested that the command works, create a new directory called ElasticSearchDeploy.

Change to that directory and run –

pulumi new aws-csharp

Follow the instructions and open the project in VS Code or Visual Studio.

Delete the MyStack.cs file.
Create a file named MyElasticSearchStack.cs.

Paste in the below code –

using Pulumi;
using ElasticSearch = Pulumi.Aws.ElasticSearch;
using Aws = Pulumi.Aws;
using Pulumi.Aws.ElasticSearch.Inputs;

class MyElasticSearchStack : Stack
{
    public MyElasticSearchStack()
    {
        string myIPAddress = "x.x.x.x" you need to put your IP address here;
        string esDomainName = "myelasticesearch";
        var config = new Config();
        var currentRegion = Output.Create(Aws.GetRegion.InvokeAsync());
        var currentCallerIdentity = Output.Create(Aws.GetCallerIdentity.InvokeAsync());
        var esDomain = new ElasticSearch.Domain(esDomainName, new ElasticSearch.DomainArgs
        {
            DomainName = esDomainName,
            ClusterConfig = new ElasticSearch.Inputs.DomainClusterConfigArgs
            {
                InstanceType = "t2.small.elasticsearch",
            },
            EbsOptions = new DomainEbsOptionsArgs()
            {
                EbsEnabled = true,
                VolumeSize = 10,
                VolumeType = "gp2"
            },
            ElasticsearchVersion = "7.7",
            AccessPolicies = Output.Tuple(currentRegion, currentCallerIdentity).Apply(values =>
            {
                var currentRegion = values.Item1;
                var currentCallerIdentity = values.Item2;
                return @$"
                {{
                    ""Version"": ""2012-10-17"",
                    ""Statement"": [
                        {{
                            ""Action"": ""es:*"",
                            ""Principal"": {{
                                ""AWS"": ""*""
                            }},
                            ""Effect"": ""Allow"",
                            ""Resource"": ""arn:aws:es:{currentRegion.Name}:{currentCallerIdentity.AccountId}:domain/{esDomainName}/*"",
                            ""Condition"": {{
                                ""IpAddress"": {{""aws:SourceIp"": [""{myIPAddress}""]}}
                            }}
                        }}
                    ]
                    }}
                ";
            }),
        });
        this.ESDomainEndpoint =  esDomain.Endpoint;
    }
    [Output]
    public Output<string> ESDomainEndpoint { get; set; }
}

Note the myIPAddress variable – you need to put in the IP address you are using. You can check this with a site like https://ipstack.com/.

In Program.cs change the reference from MyStack to MyElasticSearchStack.
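It’s a one-line change; Main should end up looking something like this –

static Task<int> Main() => Deployment.RunAsync<MyElasticSearchStack>();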

That’s it.

Deploying
Go to the command line, run –

pulumi up

Select ‘yes’ and then wait about 10 to 15 minutes as AWS gets your ElasticSearch domain up and running. In the output of the command you will see the URL of the ElasticSearch domain you just created; use that in the scripts from part 1 of this series.

You can also go to the AWS console, where you should see the new domain listed under the ElasticSearch service.

There you go – ElasticSearch index creation, seeding, querying, and infrastructure as code.

In a follow up post I’ll show you how to deploy ElasticSearch with Terraform.

The JSON Problem
For those of you that dislike horribly escaped blocks of JSON inside C#, as I do, I am working on a post that will make this much nicer to look at, and to work with.

Full source code available here.

Getting Started with ElasticSearch, Part 2 – Searching with a HttpClient

Full source code available here.

In the previous blog post I showed how to set up ElasticSearch, create an index, and seed the index with some sample documents. That is not a lot of use without the ability to search it.

In this post I will show how to use a typed HttpClient to perform searches. I’ve chosen not to use the two libraries provided by the Elasticsearch company because I want to stick with JSON requests that I can write and test with any tool like Fiddler, Postman or Rest Client for Visual Studio Code.

If you haven’t worked with HttpClientFactory you can check out my posts on it or the Microsoft docs page.

The Typed HttpClient
A typed HttpClient lets you, in a sense, hide away that you are using a HttpClient at all. The methods the typed client exposes are more business related than technical – the type of request, the body of the request, and how the response is handled are all hidden away from the consumer. Using a typed client feels like using any other class with exposed methods.

This typed client will expose three methods: one to search by company name, one to search by customer name and state, and one to return all results in a paged manner.

Start with an interface that specifies the methods to expose –

public interface ISearchService
{
    Task<string> CompanyName(string companyName);
    Task<string> NameAndState(string name, string state);
    Task<string> All(int skip, int take, string orderBy, string direction);
}

Create the class that implements that interface and takes a HttpClient as a constructor parameter –

public class SearchService : ISearchService
{
    private readonly HttpClient _httpClient;
    public SearchService(HttpClient httpClient)
    {
        _httpClient = httpClient;
    }
    //snip...

Implement the search functionality (and yes, I don’t like the amount of escaping I need to send a simple request with a JSON body) –

public async Task<string> CompanyName(string companyName)
{
    string query = @"{{
                        ""query"": {{
                            ""match_phrase_prefix"": {{ ""companyName"" : ""{0}"" }} 
                        }}
                    }}";
    string queryWithParams = string.Format(query, companyName);
    return await SendRequest(queryWithParams);
}

public async Task<string> NameAndState(string name, string state)
{
    string query = @"{{
                        ""query"": {{
                            ""bool"": {{
                                ""must"": [
                                    {{""match_phrase_prefix"": {{ ""fullName"" : ""{0}"" }} }}
                                    ,{{""match"": {{ ""address.state"" : ""{1}""}}}} 
                                ]
                            }}
                        }}
                    }}";
    string queryWithParams = string.Format(query, name, state);
    return await SendRequest(queryWithParams);
}

public async Task<string> All(int skip, int take, string orderBy, string direction)
{
    string query = @"{{
                    ""sort"":{{""{2}"": {{""order"":""{3}""}}}},
                    ""from"": {0},
                    ""size"": {1}
                    }}";

    string queryWithParams = string.Format(query, skip, take, orderBy, direction);
    return await SendRequest(queryWithParams);
}

And finally send the requests to the ElasticSearch server – note that no RequestUri is set on the request, so the BaseAddress registered in Startup.cs (shown below) is used –

private async Task<string> SendRequest(string queryWithParams)
{
    var request = new HttpRequestMessage()
    {
        Method = HttpMethod.Get,
        Content = new StringContent(queryWithParams, Encoding.UTF8, "application/json")
    };
    var response = await _httpClient.SendAsync(request);
    var content = await response.Content.ReadAsStringAsync();
    return content;
}

The Setup
That’s the typed client taken care of, but it has to be added to the HttpClientFactory; that is done in Startup.cs.

In the ConfigureServices(..) method add this –

services.AddHttpClient<ISearchService, SearchService>(client =>
{
    client.BaseAddress = new Uri("http://localhost:9200/customers/_search");
});

I am running ElasticSearch on localhost:9200.

That’s all there is to registering the HttpClient with the factory. Now all that is left is to use the typed client in the controller.

Searching
The typed client is passed to the controller via constructor injection –

[ApiController]
[Route("[controller]")]
public class SearchController : ControllerBase
{
    private readonly ISearchService _searchService;
    public SearchController(ISearchService searchService)
    {
        _searchService = searchService;
    }
    //snip...

Add some action methods to respond to API requests and call the methods on the typed client.

[HttpGet("company/{companyName}")]
public async Task<ActionResult> GetCompany(string companyName)
{
    var result = await _searchService.CompanyName(companyName);
    return Ok(result);
}

[HttpGet("nameAndState/")]
public async Task<ActionResult> GetNameAndState(string name, string state)
{
    var result = await _searchService.NameAndState(name, state);
    return Ok(result);
}

[HttpGet("all/")]
public async Task<ActionResult> GetAll(int skip = 0, int take = 10, string orderBy = "dateOfBirth", string direction = "asc")
{
    var result = await _searchService.All(skip, take, orderBy, direction);
    return Ok(result);
}

That’s it. You are up and running with ElasticSearch, a seeded index, and an API to perform searches.

In the next post I’ll show you how to deploy ElasticSearch to AWS with some nice infrastructure as code.

Full source code available here.

Getting Started with ElasticSearch, Part 1 – Seeding

Full source code available here.

This is the first in a short series of blog posts that will get you started with ElasticSearch. In this one you will deploy, seed, and query ElasticSearch on your own computer. The next will add a .NET Core API into the mix as a ‘frontend’ for ElasticSearch. And the last will show how to deploy an ElasticSearch domain on AWS using an infrastructure as code tool.

This post will show you how to create a simple document mapping, seed the ElasticSearch index, and perform some simple queries. It is not a substitute for reading the docs; it is more of a step up to help you get going.

Getting Started
Download and install the latest version of ElasticSearch. This post was written when 7.7 was the current version – I mention this because if you are reading this when a version > 7 is available the steps may not work; ElasticSearch is known for making major breaking changes in major releases.

Start up ElasticSearch by changing to its directory and running –

bin/elasticsearch

It will start on localhost:9200 by default (on Windows, run bin\elasticsearch.bat instead).

If you are using Visual Studio Code I suggest installing the Rest Client extension. The elasticSearch.http file in the attached zip contains examples of how to create and delete indexes, add mappings, and perform queries.

At the top of my elasticSearch.http I have two variables that will be used throughout the rest of the file; these define the host where ElasticSearch is running and the name of the index I’m working with –

@elasticSearchHost = http://localhost:9200
@index = customers

To see the indexes that are already in place run this –

GET {{elasticSearchHost}}/_cat/indices?v&pretty

Adding an Index with Rest Client

Let’s add the customer index with Visual Studio Rest Client – of course you can use Postman, Fiddler or any tool of your choosing –

PUT {{elasticSearchHost}}/{{index}}
Content-Type: application/json

{
  "mappings": {
    "properties": {
      "companyName": {
        "type": "text"
      },
      "customerId": {
        "type": "integer"
      },
      "dateOfBirth": {
        "type": "date"
      },
      "email": {
        "type": "text"
      },
      "firstName": {
        "type": "text",
        "copy_to": "fullName"
      },
      "middleName": {
        "type": "text",
        "copy_to": "fullName"
      },
      "lastName": {
        "type": "text",
        "copy_to": "fullName"
      },
      "fullName": {
        "type": "text"
      },      
      "mobileNumber": {
        "type": "text"
      },
      "officeNumber": {
        "type": "text"
      },
      "address": {
        "properties": {
          "line1": {
            "type": "text",
            "copy_to": "fullAddress"
          },
          "line2": {
            "type": "text",
            "copy_to": "fullAddress"
          },
          "city": {
            "type": "text",
            "copy_to": "fullAddress"
          },
          "state": {
            "type": "text",
            "copy_to": "fullAddress"
          },
          "zip": {
            "type": "text",
            "copy_to": "fullAddress"
          }
        }
      },
      "fullAddress": {
          "type": "text"
      }
    }
  }
}

Run the request to list indexes again and you will see the customers index.

GET {{elasticSearchHost}}/_cat/indices?v&pretty

Now you have an index full of nothing, docs.count is 0 –

health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customers RDWD3Q75TVqf7VkvO932mA   1   1          0            0       208b           208b

Seeding
Time to switch to the seeder. This is a Node.js program that checks if the customers index exists, creates it if it does not, and seeds the index with 5,000 customer documents.

I’m not going to go into how it works as I am learning Node.js now and I’m sure it is not as good as it should be. You can execute it by running –

npm install
node seed.js

Now you should see a different result when you look at the indexes on the ElasticSearch server.

GET {{elasticSearchHost}}/_cat/indices?v&pretty

health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customers wsdQcJ1IQNOXQ-QQIX_a_Q   1   1       5000            0      2.4mb          2.4mb

Here are a few examples of requests you can make to ElasticSearch.

###
# delete an index, BE CAREFUL WITH THIS ONE
DELETE {{elasticSearchHost}}/{{index}}?pretty

###
# retrieve a document from the index by its id
GET {{elasticSearchHost}}/{{index}}/_doc/1

###
# search the index with no query, this will match all documents, but return only the first few
GET {{elasticSearchHost}}/{{index}}/_search

###
# retrieve a page of results with no query
GET {{elasticSearchHost}}/{{index}}/_search
Content-Type: application/json

{
  "sort":{"dateOfBirth": {"order":"asc"}},
  "from": 0,
  "size": 10
}

###
# search for company names that match the word 'Turcotte', you might need to change this name
GET {{elasticSearchHost}}/{{index}}/_search
Content-Type: application/json

{
    "query": {
        "match_phrase_prefix": {
            "companyName" : "Turcotte" 
        } 
    }
}

###
# search for people in Utah with the name Keith (first, middle or last), you might need to change these params.
GET {{elasticSearchHost}}/{{index}}/_search
Content-Type: application/json

{
    "query": {
        "bool": {
            "must": [
                {"match_phrase_prefix": { "fullName" : "Keith" } }
               ,{"match": { "address.state" : "Utah"}} 
            ]
        }
    }
}

That’s it for now.

In the next blog post I will show you how to use `HttpClientFactory` and typed clients to perform searches in .NET Core.

Full source code available here.