23 October 2018

I did some performance tuning tasks recently and have summarized a performance-tuning checklist below:

Checklist

Before You Start

These are the things you need to prepare or keep in mind before you start working.

  • Prepare test data of the same size as the production database.
  • Understand user behaviour, and simulate user operations to reproduce the issue.
  • Use tools to identify the issue; do not make assumptions.
  • Validate often during tuning.
  • Speed is not the only thing; also check CPU, memory, disk I/O, etc.
  • Sometimes it is simply because the server is not powerful enough.

Infrastructure

This part is about how to improve performance without changing any code :)

  • Upgrade the CPU, memory, or hard disk.
  • Upgrade the database version or framework.
  • Split the application across different servers. For example, put time-consuming background tasks on a separate server.
  • Add load balancing for heavy-traffic applications.

Database

SQL Server is the database used in most of the projects I have worked on, but some of these points apply to other databases as well. (I am not a database expert; these are just the things I usually check first.)

  • Check whether an index is missing or needs a rebuild.
  • Check the execution plan to get more information.
  • Select just enough data; don't select unrelated data.
  • Beware of scalar-valued functions; they hurt when handling huge amounts of data. There are scalar-valued functions, inline table-valued functions, and multi-statement table-valued functions; try to avoid the first and the third. Check Here and Here.
  • Avoid N+1 selects (see the sketch after this list).
  • The COMPATIBILITY_LEVEL of SQL Server sometimes matters; check the details.
  • MAXDOP: usually you don't need to change this, and don't change it if you are not sure. I did find MAXDOP set to 2 on one production database instance. Check Here and Here.
  • Prefer ORM over stored procedures.
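
As an illustration of the N+1 point above, here is a minimal C# sketch assuming Entity Framework Core (the Department and Employee entities are hypothetical, not from the project):

using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore; // assumption: EF Core; EF6 has an equivalent Include

// Hypothetical entities and context, for illustration only.
public class Department
{
    public int Id { get; set; }
    public List<Employee> Employees { get; set; }
}

public class Employee
{
    public int Id { get; set; }
    public int DepartmentId { get; set; }
}

public class AppDbContext : DbContext
{
    public DbSet<Department> Departments { get; set; }
}

public static class NPlusOneDemo
{
    public static void Run(AppDbContext context)
    {
        // N+1: one query for the departments, then (with lazy loading enabled)
        // one more query per department when Employees is touched in the loop.
        var departments = context.Departments.ToList();
        foreach (var department in departments)
        {
            var employeeCount = department.Employees.Count; // extra query each time
        }

        // Better: eager-load the related rows in a single query.
        var withEmployees = context.Departments
            .Include(d => d.Employees)
            .ToList();
    }
}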

Front End

There are some basic rules for good front-end performance, and you can easily find such a list on Google.

There are also many front-end frameworks like Angular, React, etc., and they each have their own performance analysis tools.

There are also two very useful console functions that can record timings:

console.time('A');
// ... the code being measured ...
console.timeEnd('A');
// output> A: 9308.794921875ms
// The reported time is the interval between the two calls.

I created a wrapper function:

function timing(fn, parentName) {

    // Build a unique label: [timestamp  ] + [parentName-] + [function name]
    var name = new Date().toJSON() + "  ";
    if (parentName) {
        name = name + parentName + "-";
    }
    name = name + fn.name;

    return function () {
        console.time(name);
        const result = fn.apply(this, arguments);
        // If fn is async, stop the timer only when the promise settles.
        if (result && typeof result.then === "function") {
            return result.finally(function () { console.timeEnd(name); });
        }
        console.timeEnd(name);
        return result;
    };
}

// You can test it like this (sleep is a promise-based delay function).
// The label is composed of: [timestamp  ] + [parentName-] + [method name]
// The timestamp is used to avoid duplicated names.
await timing(sleep)(1)
// output> 2018-10-26T11:59:43.402Z  sleep: 1001.0029296875ms
await timing(sleep, 'Second')(1)
// output> 2018-10-26T12:01:00.057Z  Second-sleep: 1000.207275390625ms

Back End

  • Use caching if needed.
  • Move expensive operations out of loops.
  • Avoid N+1 queries (usually when executing DB queries through an ORM).
  • Prefer Dictionary over List for item lookups.
  • Run multiple time-consuming queries in one request asynchronously and in parallel (in a non-blocking fashion); see the sketch below.
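
For the last point, here is a minimal C# sketch; the two query methods are hypothetical stand-ins for slow DB calls, and Task.WhenAll runs them concurrently instead of one after the other:

using System.Threading.Tasks;

public class ReportService
{
    // Hypothetical stand-ins for real, time-consuming DB queries.
    private Task<int[]> LoadOrdersAsync() => Task.Run(() => new[] { 1, 2, 3 });
    private Task<string[]> LoadCustomersAsync() => Task.Run(() => new[] { "a", "b" });

    public async Task BuildReportAsync()
    {
        // Sequential version: total time ≈ sum of both query times.
        // var orders = await LoadOrdersAsync();
        // var customers = await LoadCustomersAsync();

        // Parallel version: start both queries first, then await them together;
        // total time ≈ the slower of the two.
        var ordersTask = LoadOrdersAsync();
        var customersTask = LoadCustomersAsync();
        await Task.WhenAll(ordersTask, customersTask);

        var orders = await ordersTask;       // already completed here
        var customers = await customersTask; // already completed here
    }
}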

A real-world performance tuning example

There is a download API that downloads data for some departments and stores the result in an Excel file. We noticed that the API is slow even for a single department, and it gets slower as the result contains more data. It never finishes when downloading all departments' data. Before we looked into the issue, we thought it might be a database query performance problem, because there are lots of joins and calculations in the SQL query.

This is my local machine configuration:

Property          Value
Operating System  Windows 10 Pro
Memory            32 GB
CPU               i7 2.8 GHz, 8 cores
Hard Disk         1 TB Toshiba SSD
Database          SQL Server 2017

Step 1 - Identify the bottleneck

  • Use a backup of the production DB.
  • Check the performance for different data sizes.
  • Use dotTrace to get the time distribution.

There are 3 main steps in the API:

  • SQL: execute SQL scripts to load the data from the DB
  • C#: reorganize the result to match the Excel data format
  • Excel Writer: use the Open XML SDK to write the data to Excel
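
dotTrace gives the full time distribution; a rough per-step split like the following Stopwatch sketch (the method names are hypothetical, not the actual project code) is enough to produce the timings below:

using System;
using System.Diagnostics;

public static class DownloadTimer
{
    // Hypothetical stand-ins for the three real steps.
    static object LoadData() => new object();
    static object Reorganize(object rows) => rows;
    static void WriteExcel(object sheet) { }

    public static void Run()
    {
        var sw = Stopwatch.StartNew();

        var rows = LoadData();                                   // SQL
        var sqlMs = sw.ElapsedMilliseconds;

        var sheet = Reorganize(rows);                            // C#
        var csharpMs = sw.ElapsedMilliseconds - sqlMs;

        WriteExcel(sheet);                                       // Excel Writer
        var excelMs = sw.ElapsedMilliseconds - sqlMs - csharpMs;

        Console.WriteLine($"SQL: {sqlMs} ms, C#: {csharpMs} ms, Excel Writer: {excelMs} ms");
    }
}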

Here is the timing for different data sizes:

Sample No            Excel Size   Total Time(ms)   SQL(ms)   C#(ms)   Excel Writer(ms)
Size 1 (2019-1-001)  4335 * AO    32336            20141     8913     3234
Size 2 (2019-1-008)  4148 * AL    25963            16033     7171     2733
Size 3 (2019-1-012)  6036 * CR    63652            17399     35465    10721
Size 4 (2019-1-ALL)  93000 * HG   -                65065     -        -

From the results, we can see:

  • SQL gets slower as the data size increases, but it is not the bottleneck.
  • C# degrades dramatically as the data size increases and needs to be fixed.
  • Excel Writer degrades dramatically as the data size increases and needs to be fixed.

So the action was to fix the C# and Excel Writer parts first.

Step 2 - Fix C#

After looking into the code, we found the following bad smells:

  • Some reference data lists are used inside several levels of nested loops. Use dictionaries to index that data by the lookup key, and build them outside the loops (see the sketch after this list).
  • Some operations are repeated inside the loops even though their values never change there; extract them out of the loops.
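
Here is a minimal sketch of both fixes, with hypothetical types standing in for the real reference data:

using System.Collections.Generic;
using System.Linq;

// Hypothetical types, for illustration only.
public class ReferenceItem { public string Code { get; set; } public string Label { get; set; } }
public class CellData { public string Code { get; set; } }

public static class Reorganizer
{
    public static void Fill(List<ReferenceItem> referenceList, List<List<CellData>> rows)
    {
        // Before: a List lookup inside the nested loops is O(n) per cell:
        //   var item = referenceList.FirstOrDefault(r => r.Code == cell.Code);

        // After: index the reference data once, outside the loops; O(1) per cell.
        var referenceByCode = referenceList.ToDictionary(r => r.Code);

        foreach (var row in rows)
        {
            foreach (var cell in row)
            {
                if (referenceByCode.TryGetValue(cell.Code, out var item))
                {
                    // ... use item to fill the Excel data structure ...
                }
            }
        }
    }
}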

This is the result after this round of refactoring:

Sample No            Excel Size   Total Time(ms)   SQL(ms)   C#(ms)   Excel Writer(ms)
Size 4 (2019-1-ALL)  93000 * HG   435788           37205     12854    385061

This is amazing! We reduced the time dramatically by using dictionaries and moving code out of the loops. Now the download finishes in ~7 min.

Step 3 - Fix Excel Writer

The library used for the Excel generation is DocumentFormat.OpenXml.Spreadsheet, and I found that cell.DataType = CellValues.InlineString; costs a lot of time, so I tried a few different ways:

Set CellType By                                                    Excel Writer(ms)
cell.DataType = CellValues.InlineString                            385061
(don't set the cell type at all)                                   150316
cell.SetAttribute(new OpenXmlAttribute("", "t", "", "inlineStr"))  168932

Based on the above results, I chose the third approach.

But it still cost ~3 minutes, and at the same time I found that the memory consumption was very high (~13 GB). After further investigation, I found that there are two ways to write Excel files: the Open XML SDK DOM Approach and the Open XML SDK SAX-Like Approach, and it is said that the latter consumes less memory [1].
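
Here is a minimal sketch of the SAX-like approach with OpenXmlWriter (the file name, sheet name, and data are illustrative, not the actual project code):

using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class SaxExcelWriter
{
    public static void Write(string path, string[][] rows)
    {
        using (var document = SpreadsheetDocument.Create(path, SpreadsheetDocumentType.Workbook))
        {
            var workbookPart = document.AddWorkbookPart();
            var worksheetPart = workbookPart.AddNewPart<WorksheetPart>();

            // OpenXmlWriter streams elements straight to the part instead of
            // building the whole worksheet DOM in memory first.
            using (var writer = OpenXmlWriter.Create(worksheetPart))
            {
                writer.WriteStartElement(new Worksheet());
                writer.WriteStartElement(new SheetData());

                foreach (var rowData in rows)
                {
                    writer.WriteStartElement(new Row());
                    foreach (var value in rowData)
                    {
                        var cell = new Cell(new InlineString(new Text(value)));
                        // The cheapest way (per the table above) to mark the
                        // cell as an inline string.
                        cell.SetAttribute(new OpenXmlAttribute("", "t", "", "inlineStr"));
                        writer.WriteElement(cell);
                    }
                    writer.WriteEndElement(); // Row
                }

                writer.WriteEndElement(); // SheetData
                writer.WriteEndElement(); // Worksheet
            }

            // Wire the worksheet into the workbook (one sheet, named "Data").
            var sheets = new Sheets(new Sheet
            {
                Id = workbookPart.GetIdOfPart(worksheetPart),
                SheetId = 1,
                Name = "Data"
            });
            workbookPart.Workbook = new Workbook(sheets);
            workbookPart.Workbook.Save();
        }
    }
}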

Here is the result for different approaches:

Approach                         Excel Writer(ms)   CPU     Memory
Open XML SDK DOM Approach        168932             >60%    ~13 GB
Open XML SDK SAX-Like Approach   85478              <30%    ~2 GB

Now the Excel Writer only costs ~85 s. The total download time is ~2 min, which is much better than before. After talking with the PO, we agreed that this is an acceptable result.

Result

Here is the comparison between the old code and the tuned code for downloading all departments' data:

Measurement   Before                              After
Time          Still not finished after one hour   ~2 min
CPU           >90%                                <30%
Memory        ~13 GB                              ~2 GB

Looking back at the whole performance-tuning process, we found that the issue was totally different from what we had thought at the beginning. By following the right process, we identified and fixed the issues quickly.


