Seeding a Large Database with Entity Framework
Download full source code.
You will often see that Entity Framework is not suitable for bulk operations, this has been the case since I started using it back when it was Entity Framework 5 back around 2012.
But with many of my posts, I like to seed a database, and sometimes I want to seed it with a few hundred thousand or even more rows.
This is where Entity Framework can be slow, run out of memory, or timeout.
As you add more and more rows to the context, it will keep track of them all. The memory allocated to the application grows and grows.
Turning off tracking won’t help as it applies only to the entities that are returned from the database, not the ones you add to the context.
My usual seeding pattern is very simple -
- Create a new context
- Make sure the database and tables exist
- Generate the data
- Add the data to the context
- Save the changes
This works fine when the amount of data is small, but not when generating large amounts of seed data.
When I tried seeding a table with 200,000 rows, I timed out after the default connection timeout period for SQL Server.
Attempt 1
I made a small change to the pattern, starting at 3 -
Start a loop
- Generate a subset of the data
- Add the data to the context
- Save the changes
This worked, but it was slow as I added more data, and I noticed that my memory usage grew and grew as the loop progressed. The memory kept on growing because I kept adding data to the context. It doesn’t matter that I was saving data, because I was reusing the same context, so of course it was tracking more and more data.
Turning off tracking doesn’t help here either because the context always tracks entities added to it.
Attempt 2
I made another small change to the pattern, starting at 3 -
Start a loop
- Generate a subset of the data
- Add the data to the context
- Save the changes
- Dispose of the context
- Create a new context
This worked well, it was faster, and the memory profile stayed flat.
The code
The full source code is attached, so I’m not going into everything here, just the two main parts.
The first part creates the database, and tables, calls the seeder in a loop, and creates/disposes/recreates the context every 10 loops.
1using System.Diagnostics;
2using Microsoft.EntityFrameworkCore;
3using SeedDatabase.Data;
4
5SalesContext salesContext = new SalesContext();
6
7await salesContext.Database.EnsureDeletedAsync(); // optional
8await salesContext.Database.EnsureCreatedAsync();
9
10int productsToInsertPerLoop = 1000;
11int seedLoops = 20;
12Stopwatch sw = new Stopwatch();
13
14for (int i = 1; i <= seedLoops; i++)
15{
16 sw.Start();
17 Console.WriteLine($"Seeding loop {i} of {seedLoops}");
18 await salesContext.SeedAsync(productsToInsertPerLoop).ConfigureAwait(false);
19 sw.Stop();
20 Console.WriteLine($"Written {i * productsToInsertPerLoop} rows in {sw.ElapsedMilliseconds} ms");
21 sw.Reset();
22
23 if(i % 10 == 0)
24 {
25 salesContext.Dispose();
26 Console.WriteLine("Context disposed");
27 salesContext = new SalesContext();
28 }
29}
30
31Console.WriteLine($"Total rows {await salesContext.Products.CountAsync()}");
The second part is the seeder which generates the data, adds it to the context, and saves it to the database. The SeedAsync
method is an extension method on the SalesContext
class.
1using Bogus;
2
3namespace SeedDatabase.Data
4{
5 public static class Seeder
6 {
7 public static async Task SeedAsync(this SalesContext salesContext, int productCount)
8 {
9 var faker = new Faker<Product>()
10 .RuleFor(p => p.Name, f => f.Commerce.ProductName())
11 .RuleFor(p => p.Description, f => f.Commerce.ProductDescription())
12 .RuleFor(p => p.Price, f => f.Random.Decimal(0, 100))
13 .RuleFor(p => p.SKU, f => f.Commerce.Ean13())
14 .RuleFor(p => p.Code, f => f.Commerce.Ean8())
15 .RuleFor(p => p.ProductCategory, f => f.PickRandom<ProductCategory>());
16
17 var products = faker.Generate(productCount);
18 await salesContext.AddRangeAsync(products);
19 await salesContext.SaveChangesAsync().ConfigureAwait(false);
20 }
21 }
22}
There you go, a simple way to seed a large database with Entity Framework without timing out, slowing down, or running out of memory.
Download full source code.