Getting Started with ElasticSearch, Part 1 – Seeding

Full source code available here.

This is the first in a short series of blog posts that will get you started with ElasticSearch. In this post you will deploy, seed, and query ElasticSearch from your own computer. The next will add a .NET Core API into the mix as a ‘frontend’ for ElasticSearch. The last will show how to deploy an ElasticSearch domain on AWS using an infrastructure as code tool.

This post will show you how to create a simple document mapping, seed the ElasticSearch index, and perform some simple queries. It is not a substitute for reading the docs; it is more of a step up to help you get going.

Getting Started
Download and install the latest version of ElasticSearch. This post was written when 7.7 was the current version. I mention this because if you are reading this when a major version later than 7 is available, the steps may not work; ElasticSearch is known for making breaking changes in major releases.

Start up ElasticSearch by changing to its directory and running –

bin/elasticsearch

It will start on localhost:9200 by default.
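Because the steps here were written against 7.x, it can help to confirm which version is answering on port 9200. The root endpoint (`GET /`) returns a small JSON document that includes `version.number`. The Node.js sketch below pulls the major version out of such a response. The helper names and the trimmed sample response are illustrative assumptions, not part of the attached source.

```javascript
// Extract the major version from the JSON that GET / returns.
// Only the version.number field is used; everything else is ignored.
function majorVersion(rootInfo) {
  const number = rootInfo && rootInfo.version && rootInfo.version.number;
  if (typeof number !== "string") {
    throw new Error("response has no version.number field");
  }
  return parseInt(number.split(".")[0], 10);
}

// Hypothetical helper: true when the server is on the 7.x line this post targets.
function isSevenX(rootInfo) {
  return majorVersion(rootInfo) === 7;
}

// Trimmed, made-up sample of a root endpoint response.
const sampleRootInfo = {
  name: "my-node",
  cluster_name: "elasticsearch",
  version: { number: "7.7.0" },
  tagline: "You Know, for Search",
};

console.log(majorVersion(sampleRootInfo)); // 7
console.log(isSevenX(sampleRootInfo)); // true
```

Against a live server on Node 18 or later, something like `majorVersion(await (await fetch("http://localhost:9200")).json())` should do the same job, assuming security has not been enabled on the node.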

If you are using Visual Studio Code I suggest installing the Rest Client extension. The elasticSearch.http file in the attached zip contains examples of how to create and delete indexes, add mappings, and perform queries.

At the top of my elasticSearch.http I have two variables that are used throughout the rest of the file. They define the host where ElasticSearch is running and the name of the index I’m working with.

@elasticSearchHost = http://localhost:9200
@index = customers

To see the indexes that are already in place run this –

GET {{elasticSearchHost}}/_cat/indices?v&pretty

Adding an Index with Rest Client

Let’s add the customers index with the Visual Studio Code Rest Client. Of course, you can use Postman, Fiddler, or any tool of your choosing –

PUT {{elasticSearchHost}}/{{index}}
Content-Type: application/json

{
  "mappings": {
    "properties": {
      "companyName": {
        "type": "text"
      },
      "customerId": {
        "type": "integer"
      },
      "dateOfBirth": {
        "type": "date"
      },
      "email": {
        "type": "text"
      },
      "firstName": {
        "type": "text",
        "copy_to": "fullName"
      },
      "middleName": {
        "type": "text",
        "copy_to": "fullName"
      },
      "lastName": {
        "type": "text",
        "copy_to": "fullName"
      },
      "fullName": {
        "type": "text"
      },
      "mobileNumber": {
        "type": "text"
      },
      "officeNumber": {
        "type": "text"
      },
      "address": {
        "properties": {
          "line1": {
            "type": "text",
            "copy_to": "fullAddress"
          },
          "line2": {
            "type": "text",
            "copy_to": "fullAddress"
          },
          "city": {
            "type": "text",
            "copy_to": "fullAddress"
          },
          "state": {
            "type": "text",
            "copy_to": "fullAddress"
          },
          "zip": {
            "type": "text",
            "copy_to": "fullAddress"
          }
        }
      },
      "fullAddress": {
        "type": "text"
      }
    }
  }
}
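The `copy_to` entries in the mapping above tell ElasticSearch to copy the values of several fields into one combined field (`fullName`, `fullAddress`), so a single query against the combined field can match any of the parts. Since the same `text` plus `copy_to` shape repeats eight times, the request body can also be built in code. This is a minimal Node.js sketch of that idea; the helper name `textFieldsCopiedTo` is made up for illustration and is not part of the attached source.

```javascript
// Build a set of "text" properties that all copy their values into one combined field.
function textFieldsCopiedTo(fieldNames, target) {
  const properties = {};
  for (const name of fieldNames) {
    properties[name] = { type: "text", copy_to: target };
  }
  return properties;
}

// The same mapping as the PUT request above, assembled as a JavaScript object.
const customerMapping = {
  mappings: {
    properties: {
      companyName: { type: "text" },
      customerId: { type: "integer" },
      dateOfBirth: { type: "date" },
      email: { type: "text" },
      ...textFieldsCopiedTo(["firstName", "middleName", "lastName"], "fullName"),
      fullName: { type: "text" },
      mobileNumber: { type: "text" },
      officeNumber: { type: "text" },
      address: {
        properties: textFieldsCopiedTo(
          ["line1", "line2", "city", "state", "zip"],
          "fullAddress"
        ),
      },
      fullAddress: { type: "text" },
    },
  },
};

// Print the body that would be sent with PUT {{elasticSearchHost}}/{{index}}.
console.log(JSON.stringify(customerMapping, null, 2));
```

The printed JSON can be pasted into the Rest Client request, or sent from Node.js as the body of a `PUT` with `Content-Type: application/json`.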

Run the request to list indexes again and you will see the customers index.

GET {{elasticSearchHost}}/_cat/indices?v&pretty

Now you have an empty index; docs.count is 0 –

health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customers RDWD3Q75TVqf7VkvO932mA   1   1          0            0       208b           208b
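If you want to check `docs.count` from a script rather than by eye, the table above is easy to turn into objects. The sketch below is one way to do that in Node.js, assuming the whitespace-separated layout shown above; `parseCatIndices` is a hypothetical helper, not something shipped with ElasticSearch. It would misalign columns if a row had an empty cell, so treat it as a starting point only.

```javascript
// Parse the plain-text table returned by GET /_cat/indices?v into an array of objects.
// Assumes every row has a value in every column, as in the output shown above.
function parseCatIndices(text) {
  const lines = text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines.length === 0) {
    return [];
  }
  const headers = lines[0].split(/\s+/);
  return lines.slice(1).map((line) => {
    const cells = line.split(/\s+/);
    const row = {};
    headers.forEach((header, i) => {
      row[header] = cells[i];
    });
    return row;
  });
}

// The output from the request above, copied verbatim.
const sampleOutput = [
  "health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size",
  "yellow open   customers RDWD3Q75TVqf7VkvO932mA   1   1          0            0       208b           208b",
].join("\n");

const rows = parseCatIndices(sampleOutput);
console.log(rows[0].index, rows[0]["docs.count"]); // customers 0
```

All values come back as strings. The cat APIs can also return JSON directly if you add `format=json` to the query string, which avoids the parsing altogether.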

Seeding
Time to switch to the seeder. This is a Node.js program that checks if the customers index exists, creates it if it does not, and seeds the index with 5000 customer documents.

I’m not going to go into how it works as I am learning Node.js now and I’m sure it is not as good as it should be. You can execute it by running –

npm install
node seed.js

Now you should see a different result when you look at the indexes on the ElasticSearch server.

GET {{elasticSearchHost}}/_cat/indices?v&pretty

health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customers wsdQcJ1IQNOXQ-QQIX_a_Q   1   1       5000            0      2.4mb          2.4mb
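The seeder is not walked through in this post, but to give a rough idea of what seeding involves, here is a minimal sketch of loading documents through the `_bulk` endpoint. It is not the code from seed.js. The function names, the sample customers, and the choice of `customerId` as the document id are all assumptions made for illustration.

```javascript
// Build the newline-delimited JSON body that the _bulk endpoint expects:
// one action line followed by one document line per customer.
function buildBulkBody(indexName, customers) {
  const lines = [];
  for (const customer of customers) {
    // Assumption: the document id is the customerId.
    lines.push(
      JSON.stringify({ index: { _index: indexName, _id: String(customer.customerId) } })
    );
    lines.push(JSON.stringify(customer));
  }
  // The bulk body must end with a newline.
  return lines.join("\n") + "\n";
}

// Send one batch to ElasticSearch. Defined here but not called, so the sketch
// runs without a server. Uses the global fetch available in Node 18 and later.
async function sendBulk(host, indexName, customers) {
  const response = await fetch(`${host}/_bulk`, {
    method: "POST",
    headers: { "Content-Type": "application/x-ndjson" },
    body: buildBulkBody(indexName, customers),
  });
  const result = await response.json();
  if (result.errors) {
    throw new Error("one or more documents failed to index");
  }
  return result.items.length;
}

// Two made-up customers, for illustration only.
const sampleCustomers = [
  { customerId: 1, firstName: "Ann", lastName: "Example", address: { state: "Utah" } },
  { customerId: 2, firstName: "Bob", lastName: "Sample", address: { state: "Ohio" } },
];

console.log(buildBulkBody("customers", sampleCustomers));
```

With ElasticSearch running locally, `await sendBulk("http://localhost:9200", "customers", sampleCustomers)` would index the two sample documents. For 5000 documents it is common to send several smaller batches rather than one large request.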

Here are a few examples of requests you can make to ElasticSearch.

###
# delete an index, BE CAREFUL WITH THIS ONE
DELETE {{elasticSearchHost}}/{{index}}?pretty

###
# retrieve a document from the index by its id
GET {{elasticSearchHost}}/{{index}}/_doc/1

###
# search the index with no query, this will match all documents, but return only the first 10 by default
GET {{elasticSearchHost}}/{{index}}/_search

###
# retrieve a page of results with no query
GET {{elasticSearchHost}}/{{index}}/_search
Content-Type: application/json

{
  "sort":{"dateOfBirth": {"order":"asc"}},
  "from": 0,
  "size": 10
}

###
# search for company names that match the word 'Turcotte', you might need to change this name
GET {{elasticSearchHost}}/{{index}}/_search
Content-Type: application/json

{
    "query": {
        "match_phrase_prefix": {
            "companyName" : "Turcotte" 
        } 
    }
}

###
# search for people in Utah with the name Keith (first, middle or last), you might need to change these params.
GET {{elasticSearchHost}}/{{index}}/_search
Content-Type: application/json

{
    "query": {
        "bool": {
            "must": [
                {"match_phrase_prefix": { "fullName" : "Keith" } }
               ,{"match": { "address.state" : "Utah"}} 
            ]
        }
    }
}
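If you end up issuing these searches from code, the request bodies above can be produced by small functions instead of being written by hand. The following Node.js sketch builds the paged request and the name-plus-state request shown above. The function names and the 1-based page numbering are my own choices for illustration and do not come from the attached source.

```javascript
// Translate a 1-based page number into the from/size pair used by _search,
// sorted by dateOfBirth as in the paging request above.
function pageQuery(pageNumber, pageSize) {
  if (!Number.isInteger(pageNumber) || pageNumber < 1) {
    throw new RangeError("pageNumber must be an integer of 1 or more");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer of 1 or more");
  }
  return {
    sort: { dateOfBirth: { order: "asc" } },
    from: (pageNumber - 1) * pageSize,
    size: pageSize,
  };
}

// Build the bool query from the last request: both clauses must match.
// The name is matched as a prefix against the combined fullName field.
function nameInStateQuery(namePrefix, state) {
  return {
    query: {
      bool: {
        must: [
          { match_phrase_prefix: { fullName: namePrefix } },
          { match: { "address.state": state } },
        ],
      },
    },
  };
}

console.log(JSON.stringify(pageQuery(3, 10)));
// {"sort":{"dateOfBirth":{"order":"asc"}},"from":20,"size":10}

console.log(JSON.stringify(nameInStateQuery("Keith", "Utah"), null, 2));
```

Either object can be serialized with `JSON.stringify` and sent as the body of a `_search` request. Note that `from` plus `size` is limited to 10,000 by default, which is not a concern with 5000 documents but matters on larger indexes.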

That’s it for now.

In the next blog post I will show you how to use `HttpClientFactory` and typed clients to perform searches in .NET Core.


DynamoDb, Reading and Writing Data with .Net Core – Part 2

Full source code available here.

A few weeks ago I posted about reading and writing data to DynamoDb. I gave instructions on how to create tables on localstack and how to use the AWS Document Model approach. I also pointed out that I was not a big fan of this. Reading data looked like –

[HttpGet("{personId}")]
public async Task<IActionResult> Get(int personId)
{
    Table people = Table.LoadTable(_amazonDynamoDbClient, "People");
    var person = JsonSerializer.Deserialize<Person>((await people.GetItemAsync(personId)).ToJson());
    return Ok(person);
}

You have to convert the item to JSON, then deserialize it. I think you should be able to do something more like – people.GetItemAsync(personId), but you can’t.

And writing data looked like –

[HttpPost]
public async Task<IActionResult> Post(Person person)
{
    Table people = Table.LoadTable(_amazonDynamoDbClient, "People");
    
    var document = new Document();
    document["PersonId"] = person.PersonId;
    document["State"] = person.State;
    document["FirstName"] = person.FirstName;
    document["LastName"] = person.LastName;
    await people.PutItemAsync(document);
    
    return Created("", document.ToJson());
}

For me this feels even worse. Having to name the keys in the document is very error prone and hard.

Luckily there is another approach that is a little better. You create a class with attributes that indicate which table the class represents and which properties represent the keys in the table.

using Amazon.DynamoDBv2.DataModel;

namespace DynamoDbApiObjectApproach.Models
{
    [DynamoDBTable("People")]
    public class PersonDynamoDb
    {
        [DynamoDBHashKey]
        public int PersonId {get;set;}
        public string State {get;set;}
        public string FirstName {get;set;}
        public string LastName {get;set;}
    }
}

Because of these attributes I don’t want to expose this class too broadly, so I create a simple POCO to represent a person.

public class Person
{
    public int PersonId {get;set;}
    public string State {get;set;}
    public string FirstName {get;set;}
    public string LastName {get;set;}
}

I use AutoMapper to map between the two classes, never exposing the PersonDynamoDb to the outside world. If you need help getting started with AutoMapper I wrote a couple of posts recently on this.

Here’s how reading and writing looks now –

[HttpGet("{personId}")]
public async Task<IActionResult> Get(int personId)
{
    var personDynamoDb = await _dynamoDBContext.LoadAsync<PersonDynamoDb>(personId);
    var person = _mapper.Map<Person>(personDynamoDb);
    return Ok(person);
}

[HttpPost]
public async Task<IActionResult> Post(Person person)
{
    var personDynamoDb = _mapper.Map<PersonDynamoDb>(person);
    await _dynamoDBContext.SaveAsync(personDynamoDb);
    return Created("", person.PersonId);
}

This is an improvement, but I am still not thrilled with the .NET SDK for DynamoDb.
