Indexing the Works of Shakespeare in ElasticSearch – Part 2, Bulk Indexing

Full source code available here, look inside the lambda folder.

This is part two of my series on indexing the works of Shakespeare in ElasticSearch. In part one I set up the infrastructure as code, creating all the necessary AWS resources, including the lambda that is used for indexing data in bulk. But in that post I didn’t explain how the lambda works.
Indexing in bulk is a more reliable and scalable approach; it is easy to overwhelm a small ElasticSearch instance if you index a high volume of documents one at a time.

The Lambda
The code provided is based on an example from Elastic Co.

I add the AWS SDK and an ElasticSearch connector. The connector makes it possible to use IAM authentication when calling ElasticSearch.

'use strict'
const AWS = require('aws-sdk');

require('array.prototype.flatmap').shim();

const { Client } = require("@elastic/elasticsearch"); // see here for more  https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/bulk_examples.html
const elasticSearchConnector = require('aws-elasticsearch-connector');

const client = new Client({
    ...elasticSearchConnector(AWS.config),
    node: "https://" + process.env.esUrl
});

The handler is simple: it reads the incoming records from Kinesis, adds them to an array, and calls the bulk indexing method –

exports.handler = (event, context, callback) => {
    let dataToIndex = [];

    if(event.Records != null){
        event.Records.forEach(record => {
            let rawData = Buffer.from(record.kinesis.data, 'base64').toString("ascii");
            let obj = JSON.parse(rawData);
            dataToIndex.push(obj);
        });

        if(dataToIndex.length > 0) {
            indexDataInElasticSearch(dataToIndex, 'shakespeare'); // this could be passed in via the stream data too
        }
    }
    callback(null, "data indexed");
};
async function indexDataInElasticSearch(dataToIndex, indexName) { 
    console.log('Seeding...' + dataToIndex[0].Id + " - " + dataToIndex[dataToIndex.length - 1].Id);
    const body = dataToIndex.flatMap(entry => [{ index: { _index: indexName, _id: entry.Id, _type: '_doc' } }, entry]);
    const { body: bulkResponse } = await client.bulk({ refresh: true, body });
}

That’s it, now all we need is a way of sending data to Kinesis, and that will be in the next post.

Full source code available here, look inside the lambda folder.

Indexing the Works of Shakespeare in ElasticSearch – Part 1, Infrastructure as Code

Full source code available here.

WARNING – be careful when using this, Kinesis costs money and is not on the AWS free tier. At the time of writing a couple of ElasticSearch instance types are included in the free tier, but you can only have one instance running at a time. I made a mistake and spun up two ElasticSearch instances for a few days and ran up a small bill. I got in touch with AWS support, explained what I was doing, and they refunded me, very helpful and understanding.

This is part one of a three-parter where I’m going to show how to index the complete works of Shakespeare in ElasticSearch. This first part sets up the infrastructure on AWS. The second goes through the lambda that bulk loads data into ElasticSearch. The third shows how to, in Node.js, create the index on the ElasticSearch domain, read the works of Shakespeare from a CSV file, and send the rows to Kinesis.

Introduction
A few weeks ago I wrote a post describing how to get ElasticSearch up and running on AWS using Pulumi. It was a simple approach with the expectation that the user would send documents directly to ElasticSearch for indexing. This is fine if all you are doing is some experimenting, but if you are loading a lot of data, and especially if you are sending it one document at a time, you can easily overwhelm a small ElasticSearch instance and get socket timeouts or simply run out of sockets.

One way to reduce the likelihood of this occurring is to make bulk requests to index documents in ElasticSearch – instead of sending one document per request, send 100 or 1,000.
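To give a sense of what that means, the _bulk API takes newline-delimited JSON, where each document is preceded by an action line naming the index and document id – something like this (the field values here are only illustrative):

{ "index": { "_index": "shakespeare", "_id": "1" } }
{ "Id": 1, "Play": "Henry IV", "Line": "So shaken as we are, so wan with care" }
{ "index": { "_index": "shakespeare", "_id": "2" } }
{ "Id": 2, "Play": "Henry IV", "Line": "Find we a time for frighted peace to pant" }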

Another approach is to use AWS Kinesis in combination with bulk indexing.

Kinesis is a reliable service that you can send thousands (or millions) of individual documents to; the documents are picked up in batches by a lambda that in turn sends them in bulk to ElasticSearch.

If for some reason the lambda fails to process the documents, Kinesis will deliver them to the lambda again to retry indexing.

What Infrastructure is Needed
Quite a lot is needed to get this up and running.

On the IaC side –

    An AWS Role
    An AWS policy attachment
    A Kinesis stream
    An ElasticSearch instance
    An AWS Lambda
    The code the Lambda will execute
    A mapping between the Kinesis stream and the Lambda

Outside of IaC, the following is needed and will be shown in upcoming posts –

    An ElasticSearch mapping document
    A tool to send data to Kinesis for indexing

I also want to limit access to the ElasticSearch service to my IP address; the address is easy to figure out with a call to an API like api.ipify.org.

The lambda needs a zip file with all my Node.js code. Normally creating this would be part of your CI/CD pipeline, but I want to do it all in one place so it’s included here. BEWARE, I was not able to create a valid zip for the AWS lambda with ZipFile.CreateFromDirectory, so I used Ionic.Zip instead.

By default, Pulumi suffixes the names of your resources with a random string. I don’t like this, so I explicitly set names on everything. There is a little method at the top of the class that helps add a prefix to resource names, like “test-01-“, or whatever you want.
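The helper isn’t shown in this post, but a minimal sketch of it might look like this (the field name and prefix value are just examples):

private string resourcePrefix = "test-01-";

private string PrefixName(string name)
{
    return $"{resourcePrefix}{name}";
}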

The Stack

The IaC code starts with a query to a third-party API to get the IP address my computer is using; there doesn’t seem to be an easy way to avoid calling .Result on the httpClient.GetStringAsync call.

public MyStack()
{
    HttpClient httpClient = new HttpClient()
    {
        BaseAddress = new System.Uri("https://api.ipify.org/")
    };
    string myIPAddress  = httpClient.GetStringAsync("?format=text").Result;

I then zip up the Lambda source; this is not what you would normally do when deploying a serious application, but it’s useful for this demo. As mentioned above, I’m using Ionic.Zip because the zip file created by System.IO.Compression.ZipFile.CreateFromDirectory(..) did not work for the lambda.

File.Delete("index.zip");
using (ZipFile zip = new ZipFile())
{
    zip.AddDirectory("lambda");
    zip.Save("index.zip");
}

IP address figured out, zip file in place, now I can start building the infrastructure.

Here it all is – this creates the role, policy attachment, Kinesis Stream, ElasticSearch domain, Lambda, Kinesis mapping to Lambda, and outputs the URL of the ElasticSearch domain.

var elasticsearch_indexing_role = new Aws.Iam.Role(PrefixName("elasticsearch_indexing_role"), new Aws.Iam.RoleArgs
{
    Name = PrefixName("elasticsearch_indexing_role"),
    AssumeRolePolicy = @"{
                            ""Version"": ""2012-10-17"",
                            ""Statement"": [
                                {
                                ""Action"": ""sts:AssumeRole"",
                                ""Principal"": {
                                    ""Service"": ""lambda.amazonaws.com""
                                },
                                ""Effect"": ""Allow"",
                                ""Sid"": """"
                                }
                            ]
                        }",
});

var lambdaKinesisPolicyAttachment = new Aws.Iam.PolicyAttachment(PrefixName("lambdaKinesisPolicyAttachment"), new Aws.Iam.PolicyAttachmentArgs
{
    Name = PrefixName("lambdaKinesisPolicyAttachment"),
    Roles =
    {
        elasticsearch_indexing_role.Name
    },
    PolicyArn = "arn:aws:iam::aws:policy/service-role/AWSLambdaKinesisExecutionRole",
});

var elasticsearch_kinesis = new Aws.Kinesis.Stream(PrefixName("elasticsearch_kinesis"), new Aws.Kinesis.StreamArgs
{
    Name = PrefixName("elasticsearch_kinesis"),
    RetentionPeriod = 24,
    ShardCount = 1,
    ShardLevelMetrics =
    {
        "IncomingBytes",
        "OutgoingBytes",
    },
});

string esDomainName = PrefixName("elasticsearch");
var config = new Config();
var currentRegion = Output.Create(Aws.GetRegion.InvokeAsync());
var currentCallerIdentity = Output.Create(Aws.GetCallerIdentity.InvokeAsync());
var esDomain = new ElasticSearch.Domain(esDomainName, new ElasticSearch.DomainArgs
{
    DomainName = esDomainName,
    ClusterConfig = new ElasticSearch.Inputs.DomainClusterConfigArgs
    {
        InstanceType = "t2.small.elasticsearch",
    },
    EbsOptions = new DomainEbsOptionsArgs()
    {
        EbsEnabled = true,
        VolumeSize = 10,
        VolumeType = "gp2"
    },
    ElasticsearchVersion = "7.8",
    AccessPolicies = Output.Tuple(currentRegion, currentCallerIdentity, elasticsearch_indexing_role.Arn).Apply(values =>
    {
        var currentRegion = values.Item1;
        var currentCallerIdentity = values.Item2;
        return $@"
        {{
            ""Version"": ""2012-10-17"",
            ""Statement"": [
                {{
                    ""Effect"": ""Allow"",
                    ""Principal"": {{
                        ""AWS"": ""{values.Item3}""
                    }},
                    ""Action"": ""es:*"",
                    ""Resource"": ""arn:aws:es:{currentRegion.Name}:{currentCallerIdentity.AccountId}:domain/{esDomainName}/*""
                }},
                {{
                    ""Action"": ""es:*"",
                    ""Principal"": {{
                        ""AWS"": ""*""
                    }},
                    ""Effect"": ""Allow"",
                    ""Resource"": ""arn:aws:es:{currentRegion.Name}:{currentCallerIdentity.AccountId}:domain/{esDomainName}/*"",
                    ""Condition"": {{
                        ""IpAddress"": {{""aws:SourceIp"": [""{myIPAddress}""]}}
                    }}
                }}
            ]
        }}
        ";
    }),
});
this.ESDomainEndpoint = esDomain.Endpoint;


var lambdaEnvironmentVariables = new Aws.Lambda.Inputs.FunctionEnvironmentArgs();
lambdaEnvironmentVariables.Variables.Add("esUrl", esDomain.Endpoint);

var elasticsearch_indexing_function = new Aws.Lambda.Function(PrefixName("elasticsearch_indexing_function"), new Aws.Lambda.FunctionArgs
{
    Handler = "index.handler",
    MemorySize = 128,
    Name = PrefixName("elasticsearch_indexing_function"),
    Publish = false,
    ReservedConcurrentExecutions = -1,
    Role = elasticsearch_indexing_role.Arn,
    Runtime = "nodejs12.x",
    Timeout = 4,
    Code = new FileArchive("index.zip"),
    Environment = lambdaEnvironmentVariables
});

var elasticsearch_indexing_event_source = new Aws.Lambda.EventSourceMapping(PrefixName("elasticsearch_indexing_event_source"), new Aws.Lambda.EventSourceMappingArgs
{
    EventSourceArn = elasticsearch_kinesis.Arn,
    FunctionName = elasticsearch_indexing_function.Arn,
    StartingPosition = "LATEST",
});

Finally I want to print out the URL of the ElasticSearch domain.

[Output]
public Output<string> ESDomainEndpoint { get; set; }

That’s it, now all that is left is to deploy and wait…quite…a…while…for ElasticSearch to start up, sometimes as much as 20 minutes.

To deploy run –

pulumi up

And now you wait.

Part two coming soon.

Full source code available here.

VS Code Bug – Interpolation and Commented Lines, Workaround

This is a quick post with a workaround to a bug in VS Code that occurs when you use a string with @$ – the verbatim and interpolation characters.

If you have a piece of code that looks like this –

string text = @$"any text as long as there is a new line
/*";

Everything that comes after it will look commented out, like so –

This is a bit annoying because all coloring is lost below that /*.

You might not have hit this, but if you start using Pulumi for IaC in AWS you will, because permissions on resources often end with /*, or if you want to print some slashes and stars to the console.

Fortunately, there is a very easy workaround! Use $@ instead of @$.

string text = $@"any text as long as there is a new line
/*";

Much better!

Reading CSV Files into Objects with Node.js

Full source code available here.

As I am learning Node.js I am constantly surprised by how easy it is to do some things, how difficult it is to do others, and how poor the examples out there are.

Case in point, I want to read some data from a CSV file and then send it off to ElasticSearch for indexing. The ElasticSearch part I have figured out already, and in Node.js that is easy.
I expected it to be trivial to find an example of reading a file, creating objects based on the rows, processing the objects, and doing all that in chunks of say 1000 rows. This kind of stuff is easy in C# and there are plenty of examples. Not the case in Node.js.

Here are two ways of doing it.

You can download a CSV with all of Shakespeare’s plays, but the sample included with the source code above has just the first 650 lines from that file.

I want to read the file 100 rows at a time, put the rows into objects and then process the rows.

csv-parse
The first module I came across was csv-parse.

Run this from the console –

npm install csv-parse

I have an Entry class that represents a row from the CSV file. Its constructor takes six parameters, one for each of the columns in the CSV file.

In this example processing the data in chunks is not necessary, but when I process the full file with over 130,000 rows, chunking becomes important.

var fs = require('fs');
var parse = require('csv-parse');


function readCSV() {
    let entries = [];
    let count = 0;

    fs.createReadStream('shakespeare_plays_sample.csv')
        .pipe(parse({ delimiter: ';', from_line: 2 }))
        .on('data', function (row) {
            count++;
            entries.push(new Entry(row[0], row[1], row[2], row[3], row[4], row[5]))

            if (count % 100 == 0) {
                processEntries(entries);
                count = 0;
                entries = []; // clear the array
            }
        })
        .on('end', function () {
            processEntries(entries);
        });
}

function processEntries(entries) {
    console.log(entries[0].Id + "  to " + entries[entries.length - 1].Id);
}

class Entry {
    constructor(id, play, characterLineNumber, actSceneLine, character, line) {
        this.Id = id;
        this.Play = play;
        this.CharacterLineNumber = characterLineNumber;
        this.ActSceneLine = actSceneLine;
        this.Character = character;
        this.Line = line;
    }
}

readCSV();

Note how I have to make sure that the last chunk is also processed, in the 'end' handler.

This approach seems fine, but then I found the csvtojson module.

csvtojson
This module makes what I’m trying to do a little easier by skipping over the need to explicitly construct an object with the data from the rows in the file.

First install the module –

npm install csvtojson

This is the code –

const csv=require('csvtojson');

function readCSV(){
    let entries = [];
    let count = 0;

    csv({delimiter:';'})
    .fromFile('./shakespeare_plays_sample.csv')
    .then((json)=>{
        json.forEach((row)=>
        {   
            count++;
            entries.push(row);
            if(count % 100 == 0){
                processEntries(entries);
                count = 0;
                entries = []; // clear the array
            }
        });
        processEntries(entries);
    })
    return entries;
}

function processEntries(entries){
    console.log(entries[0].Id + "  to " + entries[entries.length - 1].Id);
}

readCSV();

Again note how I process the chunks of 100, and then the final chunk after the forEach loop.

All of this is in aid of indexing all of Shakespeare’s works in ElasticSearch, which I will show in another post.

Full source code available here.

Working with JSON in .NET, Infrastructure as Code with Pulumi

Full source code available here.

This is a follow-up to my previous post where I used dynamic and JSON files to make querying ElasticSearch with a HttpClient much easier.

To deploy my ElasticSearch domain on AWS I used Pulumi. ElasticSearch requires a JSON policy to define the permissions; in the post linked above I used a heavily escaped version of that policy. The policy can be complex and needs values substituted into it. In the example below I need to pass in the region, account id, domain name, and allowed IP address.

Here is a very simple policy with four substitutions –

"{{
""Version"": ""2012-10-17"",
""Statement"": [
    {{
        ""Action"": ""es:*"",
        ""Principal"": {{
            ""AWS"": ""*""
        }},
        ""Effect"": ""Allow"",
        ""Resource"": ""arn:aws:es:{currentRegion.Name}:{currentCallerIdentity.AccountId}:domain/{esDomainName}/*"",
        ""Condition"": {{
            ""IpAddress"": {{""aws:SourceIp"": [""{myIPAddress}""]}}
        }}
    }}
]
}}"

Just escaping this is not easy, and very prone to error. A more realistic policy would be significantly longer and would need more substitutions.

Using a JSON file
Here is what I think is an easier way. As in the previous post, the JSON file becomes part of my source code. It is deserialized into a dynamic object and the required values are set.

Here is the AWS policy as it appears in my JSON file. The resource (made up of region, account, and domain name) and IpAddress are left blank, but the structure of the policy is the same as you would paste into the AWS console.

{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "AWS": "*"
        },
        "Action": "es:*",
        "Resource": "",
        "Condition": {
          "IpAddress": {
            "aws:SourceIp": ""
          }
        }
      }
    ]
}

In my C# code I read the file, deserialize it, and set the values.

Here is an example –

private string GetAWSElasticSearchPolicy(string region, string account, string elasticSearchDomainName, string allowedIPAddress)
{
    string blankPolicy = File.ReadAllText("AWSPolicy.json");
    dynamic awsElasticSearchPolicy = JsonConvert.DeserializeObject(blankPolicy);

    awsElasticSearchPolicy.Statement[0].Resource = $"arn:aws:es:{region}:{account}:domain/{elasticSearchDomainName}/*";
    awsElasticSearchPolicy.Statement[0].Condition.IpAddress = new JObject(new JProperty("aws:SourceIp", allowedIPAddress));

    return awsElasticSearchPolicy.ToString(); // this is correctly formatted JSON that can be used with Pulumi.
}

File.ReadAllText reads the JSON file into a string.
JsonConvert.DeserializeObject turns the string into a dynamic object.
The next two lines set the values I want.
ToString() returns a nice JSON string that can be used with Pulumi.
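To tie it together, here is roughly how the helper could be plugged into the domain definition from the stack in the earlier post – a sketch, assuming the surrounding variables exist as before:

// Assumes currentRegion, currentCallerIdentity, esDomainName and myIPAddress
// are defined as in the earlier stack.
var esDomain = new ElasticSearch.Domain(esDomainName, new ElasticSearch.DomainArgs
{
    DomainName = esDomainName,
    ElasticsearchVersion = "7.8",
    AccessPolicies = Output.Tuple(currentRegion, currentCallerIdentity).Apply(values =>
        GetAWSElasticSearchPolicy(values.Item1.Name, values.Item2.AccountId, esDomainName, myIPAddress)),
});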

This is much cleaner than the heavily escaped version in this post.

Full source code available here.

Working with JSON in .NET, a better way?

Full source code available here.

Two recent experiences with C# and JSON frustrated me with how difficult it is to work with JSON inside an application. I have also been learning Node.js, and the contrast in ease of use with C# is shocking. In C# the developer is generally expected to create class structures that represent the JSON they want to produce or consume, and for most of my career that has been fine; I usually had to work on quite fixed JSON, with quite fixed classes.

An example might be JSON that represents customers, orders, and order items. It is easy enough to make C# classes that represent them, and having classes means it is easy to work with the customer, order, or order item inside your code.

But more recently I have been working with ElasticSearch and Pulumi.

In the case of ElasticSearch, querying it is done through HTTP requests with complex JSON that can change significantly between requests. The JSON can be many layers deep and combine searching across multiple fields, sorting, paging, specifying fields to return, and other functionality.

Here is a simple query I built using the Visual Studio Rest Client. To use this inside a C# application I have to escape all the “, {, and } characters, and I have to do it in a way that allows me to substitute in the values I want.

This is the raw JSON –

{
    "query": {
        "match_phrase_prefix": {
            "fullName" : "Joe"
        }
    },
    "from": 0,
    "size": 2
}

Escaping and getting it to work with a request from HttpClient took a while, and to my mind it looks awful –

string query = @"
                {{
                    ""query"": {{
                        ""match_phrase_prefix"": {{
                            ""fullName"" : ""{0}""
                        }}
                    }},
                    ""from"": {1},
                    ""size"": {2}
                }}";

Here is a more realistic, and not especially complicated, ElasticSearch query – now try to escape that and support substitutions for each value!

{
    "query":{
        "bool": {
            "must": [
                { "match": { "address.city": "New York" } }
               ,{ "match_phrase_prefix": { "lastName": "Sanders" } }
            ]
            ,"must_not": [
                {"range": {"dateOfBirth" : {"gte": "1980-01-01", "lte": "2000-01-01" }}}
            ]
        }
    }
    ,"sort": { "customerId" : {"order": "asc"} }
    ,"size": 100
    ,"from": 0 
    ,"_source": ["firstName", "lastName"]
}

You might rightly ask why I don’t use the provided libraries from the Elastic company. Well, I am working on a system that uses multiple languages, I do my experiments and testing with a HTTP client, and the last thing I want to do is convert everything from JSON to a significantly different format for each programming language. JSON is also the first-class citizen of ElasticSearch; I don’t want to find out later that the .NET client has not kept up with features provided by ElasticSearch. JSON is also very easy to share with colleagues.

What To Do
I am going to store my JSON in a file that becomes part of my source code, deserialize it into a dynamic object, set the values on the fields I want to change, serialize it back to a string, and use that string in my requests. It is not as complicated as it might sound, and it is way better than escaping the JSON.

Let’s take the first ElasticSearch query. Here again is the raw JSON; I save it to a file named ElasticSearchQuery.json.

{
  "query": {
      "match_phrase_prefix": {
          "fullName" : ""
      }
  },
  "from": 0,
  "size": 0
}

And here is how I read, set values and serialize it again –

private string GetElasticSearchQuery(string fullName, int from, int size)
{
    string elasticSearchQuery = File.ReadAllText("ElasticSearchQuery.json");
    dynamic workableElasticSearchQuery = JsonConvert.DeserializeObject(elasticSearchQuery);

    workableElasticSearchQuery.query.match_phrase_prefix.fullName = fullName;
    workableElasticSearchQuery.from = from;
    workableElasticSearchQuery.size = size;

    return workableElasticSearchQuery.ToString();
}

File.ReadAllText reads the JSON file into a string.
JsonConvert.DeserializeObject turns the string into a dynamic object.
The next three lines set the values I want.
ToString() returns a nice JSON string that can be used with a HttpClient to make a request to ElasticSearch.

But some ElasticSearch queries are a little harder to work with because a query can include a bool. This example is in the file ElasticSearchQuery2.json.

{
    "query": {
        "bool": {
            "must": [
                {"match_phrase_prefix": { "lastName" : "" } }
                ,{"match": { "address.state" : ""} } 
            ]
        }
    }
}

The dynamic object will not allow us to use “bool” because it is a reserved word in C#, but you can put an “@” in front of it and it will work –

private string GetElasticSearchQuery2(string lastName, string state)
{
    string elasticSearchQuery2 = File.ReadAllText("ElasticSearchQuery2.json");
    dynamic workableElasticSearchQuery2 = JsonConvert.DeserializeObject(elasticSearchQuery2);

    workableElasticSearchQuery2.query.@bool.must[0].match_phrase_prefix.lastName = lastName;
    workableElasticSearchQuery2.query.@bool.must[1].match = new JObject(new JProperty("address.state", state));

    return workableElasticSearchQuery2.ToString();
}

And again the string produced can be used with a HttpClient.
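For completeness, here is a rough sketch of sending one of these query strings to ElasticSearch with a HttpClient – the URL and index name are assumptions, and in a real application the HttpClient would come from HttpClientFactory:

private async Task<string> SearchCustomers(string lastName, string state)
{
    using var httpClient = new HttpClient();
    string query = GetElasticSearchQuery2(lastName, state);

    // ElasticSearch accepts a GET request with a JSON body for _search
    var request = new HttpRequestMessage(HttpMethod.Get, "http://localhost:9200/customers/_search")
    {
        Content = new StringContent(query, Encoding.UTF8, "application/json")
    };

    var response = await httpClient.SendAsync(request);
    return await response.Content.ReadAsStringAsync();
}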

Full source code available here.

Getting Started with ElasticSearch, Part 3 – Deploying to AWS with Pulumi

Full source code available here.

This is part 3 of my short introduction to ElasticSearch. In the first part I showed how to create an ElasticSearch index and mapping, and seed it with data. In the second I used HttpClientFactory and a typed client to query the index. In this part I’m going to show you how to set up ElasticSearch in AWS using infrastructure as code. Be careful, AWS charges for these things.

A few months ago Pulumi added C# to their list of supported languages. If you haven’t heard of them, they are building a tool that lets you create your IaC in a familiar programming language; at the time of writing they support TypeScript, JavaScript, Python, Go, and C#. Writing in a programming language makes it easy to work with things like loops and conditionals – if you are unfamiliar with IaC, those two simple things can be extremely challenging or impossible with other tools.
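As a trivial illustration of that point, a C# loop can stamp out several similar resources – this hypothetical snippet is not part of the stack in this post, it just shows the idea:

// Hypothetical example: create three Kinesis streams with a single loop.
for (int i = 1; i <= 3; i++)
{
    var stream = new Aws.Kinesis.Stream($"example-stream-{i}", new Aws.Kinesis.StreamArgs
    {
        Name = $"example-stream-{i}",
        ShardCount = 1,
        RetentionPeriod = 24,
    });
}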

I’m going to write my IaC in C#.

I’m not going to walk you through installing Pulumi, their site has all the info you need for that.

The IaC Project
Once you have installed Pulumi and tested that the command works, create a new directory called ElasticSearchDeploy.

Change to that directory and run –

pulumi new aws-csharp

Follow the instructions and open the project in VS Code or Visual Studio.

Delete the MyStack.cs file.
Create a file named MyElasticSearchStack.cs.

Paste in the below code –

using Pulumi;
using ElasticSearch = Pulumi.Aws.ElasticSearch;
using Aws = Pulumi.Aws;
using Pulumi.Aws.ElasticSearch.Inputs;

class MyElasticSearchStack : Stack
{
    public MyElasticSearchStack()
    {
        string myIPAddress = "x.x.x.x"; // you need to put your IP address here
        string esDomainName = "myelasticesearch";
        var config = new Config();
        var currentRegion = Output.Create(Aws.GetRegion.InvokeAsync());
        var currentCallerIdentity = Output.Create(Aws.GetCallerIdentity.InvokeAsync());
        var esDomain = new ElasticSearch.Domain(esDomainName, new ElasticSearch.DomainArgs
        {
            DomainName = esDomainName,
            ClusterConfig = new ElasticSearch.Inputs.DomainClusterConfigArgs
            {
                InstanceType = "t2.small.elasticsearch",
            },
            EbsOptions = new DomainEbsOptionsArgs()
            {
                EbsEnabled = true,
                VolumeSize = 10,
                VolumeType = "gp2"
            },
            ElasticsearchVersion = "7.7",
            AccessPolicies = Output.Tuple(currentRegion, currentCallerIdentity).Apply(values =>
            {
                var currentRegion = values.Item1;
                var currentCallerIdentity = values.Item2;
                return @$"
                {{
                    ""Version"": ""2012-10-17"",
                    ""Statement"": [
                        {{
                            ""Action"": ""es:*"",
                            ""Principal"": {{
                                ""AWS"": ""*""
                            }},
                            ""Effect"": ""Allow"",
                            ""Resource"": ""arn:aws:es:{currentRegion.Name}:{currentCallerIdentity.AccountId}:domain/{esDomainName}/*"",
                            ""Condition"": {{
                                ""IpAddress"": {{""aws:SourceIp"": [""{myIPAddress}""]}}
                            }}
                        }}
                    ]
                    }}
                ";
            }),
        });
        this.ESDomainEndpoint =  esDomain.Endpoint;
    }
    [Output]
    public Output<string> ESDomainEndpoint { get; set; }
}

Note that on the myIPAddress line you need to put in the IP address you are using. You can check this with a site like https://ipstack.com/.

In Program.cs change the reference from MyStack to MyElasticSearchStack.
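If you started from the standard template, Program.cs will end up looking something like this:

using System.Threading.Tasks;
using Pulumi;

class Program
{
    // Run the stack defined in MyElasticSearchStack.cs
    static Task<int> Main() => Deployment.RunAsync<MyElasticSearchStack>();
}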

That’s it.

Deploying
Go to the command line, run –

pulumi up

Select ‘yes’ and then wait about 10 to 15 minutes as AWS gets your ElasticSearch domain up and running. In the output of the command you will see the URL of the ElasticSearch domain you just created; use that in the scripts from part 1 of this series.

You can also go to the AWS console, you should see something like –

There you go – ElasticSearch index creating, seeding, querying, and infrastructure as code.

In a follow-up post I’ll show you how to deploy ElasticSearch with Terraform.

The JSON Problem
For those of you that dislike horribly escaped blocks of JSON inside C#, as I do, I am working on a post that will make this much nicer to look at, and to work with.

Full source code available here.

The Simplest Hello World in Node.js

Full source code available here.

I am learning Node.js and have found it a bit of a struggle to locate good, simple documentation. It feels like most people writing in the space assume a lot of existing knowledge, like you know plenty of JavaScript, how to effectively use web browser debug tools, and have a good understanding of HTML and CSS. If you go down the rabbit hole of asynchronous programming in Node.js, the writers often assume you know about the earlier Node.js approaches to asynchrony; try reading about async/await and you’ll be pointed to examples with promises, promisify, microtasks and callbacks, again assuming a lot of knowledge on the part of the reader.

I even asked a friend who was using Node.js to build a complex web site to write a simple Hello World program for me that I could run from the command line, but he didn’t know how. This was a very experienced developer who has worked in the industry for almost thirty years, but he started with Node.js inside an existing web site.

If you search for how to write a “Hello world” in Node.js, you’ll find examples that set up web servers, take requests, return responses, and probably use callbacks.

Because of what I perceive as a lack of documentation, I’m going to write some simple blog posts about things I learned but could not find a simple example of.

To kick off, here is Hello World in Node.js.

I’m going to assume you have Node and the Node Package Manager installed, and that you can run node and npm from the command line. If you do not have them installed there is plenty of help out there for that!

Hello World
Create a new directory called SimpleHelloWorld.

Open a terminal/console inside that directory.

Run –

npm init -y

This creates a package.json file; you will edit it in a moment.

Create a new file called helloworld.js.

Add a single line to the file –

console.log("Hello World");

At this point you can run –

node helloworld.js

And you will see “Hello World” printed. There you go, a simple Hello World program.

Let’s do one more thing, open package.json.

Replace the scripts section with –

 "scripts": {
    "start": "node helloworld.js"
  },

Now you can run –

npm start

Full source code available here.

Entity Framework Core 3.1 Bug vs 2.2, Speed and Memory During Streaming

Full source code available here.

A while ago I wrote a blog post about the advantages of streaming results from Entity Framework Core as opposed to materializing them inside a controller and then returning the results. I saw very significant improvements in memory consumption when streaming, as shown in the chart below.

For this post I updated the code to use Entity Framework Core 3.1.8 on .NET Core 3.1 and was very surprised to see that streaming data did not work as before. Memory usage was high and the speed of the response was much poorer than I expected.

I tried Entity Framework Core 2.2.6 on .NET Core 3.1, and it behaved as I expected, fast and low memory.

Below is how I carried out my comparison.

The Setup
A simple Web API application using .NET Core 3.1.

A local db seeded with 10,000 rows of data using AutoFixture.

An API controller with a single action method that returns data from the database. Tracking of Entity Framework entities is turned off.

[Route("api/[controller]")]
[ApiController]
public class ProductsController : ControllerBase
{
    private readonly SalesContext _salesContext;

    public ProductsController(SalesContext salesContext)
    {
        _salesContext = salesContext;
    }

    [HttpGet("streaming/{count}")]
    public ActionResult GetStreamingFromService(int count)
    {
        IQueryable<Product> products = _salesContext.Products.OrderBy(o => o.ProductId).Take(count).AsNoTracking();
        return Ok(products);
    }
}

It’s a very simple application.
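For contrast, the non-streaming approach from the earlier post materializes the results inside the action method before returning them – roughly like this (a sketch, not the exact code from that post):

[HttpGet("materialized/{count}")]
public async Task<ActionResult> GetMaterializedFromService(int count)
{
    // ToListAsync pulls the full result set into memory before the response is written
    List<Product> products = await _salesContext.Products.OrderBy(o => o.ProductId)
        .Take(count).AsNoTracking().ToListAsync();
    return Ok(products);
}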

The Test
I “warm up” the application by making a few requests.

Then I use Fiddler to hit the products endpoint 20 times, each request returns 8,000 products.

I do the same for both Entity Framework Core 3.1.8 and Entity Framework Core 2.2.6.

The Results
Entity Framework Core 2.2.6 outperforms Entity Framework Core 3.1.8 in both speed of response and memory consumption.

Here is the comparison of how quickly they both load the full set of results.

Time to First Byte (TTFB) –

  • EF 2.2.6 returns data before we even reach 10ms in all cases and before 2ms in most cases.
  • EF 3.1.8 is never faster than 80ms and six requests take more than 100ms.

Overall Elapsed –

  • EF 2.2.6 returns most requests in the 26ms to 39ms range, with only two taking more than 40ms.
  • EF 3.1.8 returns four requests in less than 100ms. The rest are in the 100ms to 115ms range.

The speed difference between EF Core 2.2.6 and EF Core 3.1.8 is significant, to say the least.

Memory Usage –

Here is the graph of memory usage during the test.

  • EF Core 2.2.6 maintains low memory usage.
  • EF Core 3.1.8 consumes significantly more memory to do the same job.

Remember, they are both using the same application code, it’s only the versions of Entity Framework that differ.

I also performed tests with other versions of Entity Framework Core 3.1.* and Entity Framework Core 2.*, and saw very similar results. It seems something in the 3.1 stack is done very differently than in the 2.* stack.

As always, you can try it for yourself with the attached zip.

Full source code available here.

Getting Started with ElasticSearch, Part 2 – Searching with a HttpClient

Full source code available here.

In the previous blog post I showed how to set up ElasticSearch, create an index, and seed the index with some sample documents. That is not a lot of use without the ability to search it.

In this post I will show how to use a typed HttpClient to perform searches. I’ve chosen not to use the two libraries provided by the Elasticsearch company because I want to stick with JSON requests that I can write and test with any tool like Fiddler, Postman or Rest Client for Visual Studio Code.

If you haven’t worked with HttpClientFactory you can check out my posts on it or the Microsoft docs page.

The Typed HttpClient
A typed HttpClient lets you, in a sense, hide away that you are using a HttpClient at all. The methods the typed client exposes are more business related than technical – the type of request, the body of the request, and how the response is handled are all hidden away from the consumer. Using a typed client feels like using any other class with exposed methods.

This typed client will expose three methods, one to search by company name, one to search by customer name and state, and one to return all results in a paged manner.

Start with an interface that specifies the methods to expose –

public interface ISearchService
{
    Task<string> CompanyName(string companyName);
    Task<string> NameAndState(string name, string state);
    Task<string> All(int skip, int take, string orderBy, string direction);
}

Create the class that implements that interface and takes a HttpClient as a constructor parameter –

public class SearchService : ISearchService
{
    private readonly HttpClient _httpClient;
    public SearchService(HttpClient httpClient)
    {
        _httpClient = httpClient;
    }
    //snip...

Implement the search functionality (and yes, I don’t like the amount of escaping I need to send a simple request with a JSON body) –

public async Task<string> CompanyName(string companyName)
{
    string query = @"{{
                        ""query"": {{
                            ""match_phrase_prefix"": {{ ""companyName"" : ""{0}"" }} 
                        }}
                    }}";
    string queryWithParams = string.Format(query, companyName);
    return await SendRequest(queryWithParams);
}

public async Task<string> NameAndState(string name, string state)
{
    string query = @"{{
                        ""query"": {{
                            ""bool"": {{
                                ""must"": [
                                    {{""match_phrase_prefix"": {{ ""fullName"" : ""{0}"" }} }}
                                    ,{{""match"": {{ ""address.state"" : ""{1}""}}}} 
                                ]
                            }}
                        }}
                    }}";
    string queryWithParams = string.Format(query, name, state);
    return await SendRequest(queryWithParams);
}

public async Task<string> All(int skip, int take, string orderBy, string direction)
{
    string query = @"{{
                    ""sort"":{{""{2}"": {{""order"":""{3}""}}}},
                    ""from"": {0},
                    ""size"": {1}
                    }}";

    string queryWithParams = string.Format(query, skip, take, orderBy, direction);
    return await SendRequest(queryWithParams);
}

And finally send the requests to the ElasticSearch server –

private async Task<string> SendRequest(string queryWithParams)
{
    var request = new HttpRequestMessage()
    {
        Method = HttpMethod.Get,
        Content = new StringContent(queryWithParams, Encoding.UTF8, "application/json")
    };
    var response = await _httpClient.SendAsync(request);
    var content = await response.Content.ReadAsStringAsync();
    return content;
}

The Setup
That’s the typed client taken care of, but it has to be added to the HttpClientFactory; that is done in Startup.cs.

In the ConfigureServices(..) method add this –

services.AddHttpClient<ISearchService, SearchService>(client =>
{
    client.BaseAddress = new Uri("http://localhost:9200/customers/_search");
});

I am running ElasticSearch on localhost:9200.

That’s all there is to registering the HttpClient with the factory. Now all that is left is to use the typed client in the controller.

Searching
The typed client is passed to the controller via constructor injection –

[ApiController]
[Route("[controller]")]
public class SearchController : ControllerBase
{
    private readonly ISearchService _searchService;
    public SearchController(ISearchService searchService)
    {
        _searchService = searchService;
    }
    //snip...

Add some action methods to respond to API requests and call the methods on the typed client.

[HttpGet("company/{companyName}")]
public async Task<ActionResult> GetCompany(string companyName)
{
    var result = await _searchService.CompanyName(companyName);
    return Ok(result);
}

[HttpGet("nameAndState/")]
public async Task<ActionResult> GetNameAndState(string name, string state)
{
    var result = await _searchService.NameAndState(name, state);
    return Ok(result);
}

[HttpGet("all/")]
public async Task<ActionResult> GetAll(int skip = 0, int take = 10, string orderBy = "dateOfBirth", string direction = "asc")
{
    var result = await _searchService.All(skip, take, orderBy, direction);
    return Ok(result);
}

That’s it. You are up and running with ElasticSearch, a seeded index, and an API to perform searches.

In the next post I’ll show you how to deploy ElasticSearch to AWS with some nice infrastructure as code.

Full source code available here.