At Bennetts we’ve often been approached to help out on projects with difficulties, and more often than not we are required to ‘optimize’ the application because of speed related issues. Performance is important, as any user who has waited for some application to give them the data they require will tell you.

Waiting is just not an option. If you had to add up all the time you spend waiting for things to happen on your computer, I think you’d be unpleasantly surprised to find out how much of your life you’ve wasted.

This is a problem for employers as well. If you have a hundred employees who wait on average 15 minutes each day for their programs to respond or websites to be displayed, that’s 25 man hours per day.

Developers are notorious for not thinking about performance. I know, I’ve been on projects where performance was an afterthought, and tons of extra effort had to go into optimization to get the program working. We often think that performance is all about hardware. If the application is slow, then put in some more memory, or upgrade the CPU’s, or add another server. It’s easy to fall into this trap, but I believe that if we spent a little extra time up front thinking about performance, we’d have better applications and happier customers, which in turn means happy developers.

I think that there are 3 areas we need to consider whenever we develop an application:

1. Database Design and Optimization
2. Application Design and Optimization
3. Load testing

In this post I’ll visit each of these areas and illustrate a few techniques and tips you can implement to make sure your application performs well from the get-go saving you a ton of time in the long run.

Database Design and Optimization

For the purposes of this post I am going to assume you are using Microsoft SQL Server 2000 or later, but some of the principles still apply to any relational database.

1. Indexes

One of the things that I see way too often on databases I have worked on is the complete lack of indexing. You can see the developer has just created tables using the table designer and assigned a primary key (if you’re lucky).

So, why are indexes important? Lets start with what an index is and what it does…

An index is basically a data structure, often stored in memory, that is made up of one or more columns of a database table, with pointers to specific rows in a table. So, instead of opening the whole table and looping through it to find every record that matches the criteria in your SQL statement (called a full table scan) the query optimizer will open the index, go through it, and find the pointers to the data that match your criteria and fetch only those rows from the table.

This makes lookups and joins on tables much faster, and in my experience, properly tuning your indexes makes a massive difference to your applications performance. It’s often said that 20% of your effort will yield 80% of the results, and it is definitely true in this case.

For an in-depth look at indexes, visit http://www.odetocode.com/Articles/70.aspx

Also, if you have an existing database that you want to optimize, use the SQL Server Database Tuning Advisor (SQL Index Tuning Wizard in SQL 2000). For a very nice tutorial on this, visit: http://www.exforsys.com/tutorials/sql-server-2005/sql-server-database-tuning-advisor.html

2. Choose the correct data types

Another thing that I find fascinating are databases that have every column defaulted to Varchar(100). So, no matter what is stored in a specific column, it’s going to be stored as character data. This is not very efficient, and worse, the query optimizer can’t optimize queries properly.

We once worked on a project where the client had written an entire application to use tab delimited text files as data storage, and we were tasked to convert these text files into SQL and update the application to read the data from the database. The instruction though was to make every column CHAR(100) (as thats what it was in the text files) and not to change the way we read data as he wanted to maintain the system himself and he didn’t understand databases very well. Imagine returning every record from the database, looping through each record and finding data the same way he used to with his text files. Not pretty. Functions that took a few seconds to execute took hours on the ‘SQL Version’ of the file (not to mention the size of the database), but he only changed his mind once we’d demonstrated this. Once we’d converted the table to use the correct datatypes, indexed the tables properly and implemented SQL queries instead of treating tables like files, the execution time consistently beat his text file lookups by a large margin (from 4 seconds to sub 1 second times).

3. Avoid unnecessary user defined functions

An application I worked on recently had a function called GetCustomerName which accepted the Customers primary key and returned the customers name. This was used in a query that returned 15000 rows. So, for each row, the system executed the GetCustomerName function when the CustomerName could quite easily have been added using a standard join. Once again, after the change, the resulting query was orders of magnitude faster.

4. Don’t over-normalize your data

Or rather, think carefully about how your data is normalized. We’re all kinda brainwashed at University that the correct design is always 3NF, but this is not always the case.There are some instances where not normalizing your data could seriously improve performance.

Read a bit more here:

http://www.codinghorror.com/blog/archives/001152.html

and

http://www.25hoursaday.com/weblog/CommentView.aspx?guid=cc0e740c-a828-4b9d-b244-4ee96e2fad4b

Application Design and Optimization

1. Caching

This relates to the database in a way because what we’re trying to do with caching is essentially minimize the number of times we need to fetch data from the database. Do you really need to hit the database to get a users name every time your application displays his name? How often does this guy change his name anyway? Hardly ever. So, if the first time you read his name from the database you store it in the cache, and only purge it from the cache when it changes, you’re bound to save a few trips to the database.

On heavily used applications caching frequently used data can make a big difference. We have an in house developed system for our invoicing, and we had performance issues when processing large batches of invoices. After looking at the code, we realized that if we simply cached the customers information, the General Ledger codes and a few other small bits of data that never changed we could take our processing time from 15 minutes to 22 seconds. It makes that big a difference. Design your application to take advantage of caching from the beginning.

One thing to consider when using caching though is that the cached data needs to be stored somewhere, and this is normally in memory. Make sure that when you design your cache, you take this into account.

2. Measure

The only way you’re ever going to know how your application performs and where its bottlenecks are is to measure how long things take, so time your code using System.Diagnostics.Stopwatch and write the execution time to some sort of log (log4net is awesome). This will be a great help when you’re load testing your application.

Look into professional profiling tools as well. These not only show you where things are getting bogged down, but help you identify memory leaks, where garbage collection is causing problems and a host of other issues.

3. Think

This is probably the most important. I’ve seen some really stupid code (a lot of which was written by me) over the last few years. Coding can often be a monotonous task, especially if its fairly easy work, and it’s here that problems creep in. I once wrote an a bit of code that would filter a combobox depending on the data that was entered. This was fine when there were only a few entries in the database, but when it grew it slowed down to a crawl. The fix was simply to delay the filter based on whether the user was still typing or not instead of on every keypress. On average the user was typing 4 characters before they found what they wanted, which meant 4 hits on the database. Just by delaying the filter by a few milliseconds after a keypress this was reduced to 1 hit on the database. Multiply that by the 40 concurrent users using the system and it made a huge difference to the application overall.

Some awesome resources to read:

http://msdn.microsoft.com/en-us/library/ms998544.aspx

http://media.datadirect.com/download/docs/dotnet2_2/dotnetref/dotnetdesign.html

http://blogs.msdn.com/alikl/archive/2008/11/03/asp-net-performance-by-design-takeaways-from-pdc.aspx

Load testing

It’s very unlikely that your application is going to be used by just one user at a time, so it’s really important that you make sure your application performs well under the expected load. Load testing an application (especially a Windows application) is quite difficult, but it can be done. What I prefer doing is to deploy the application at the clients site for a parallel run. Here the client gets to test the system and become comfortable with it, and you get a bunch of data you can use to optimize and improve your applications performance. This isn’t always possible though, so I normally will try and simulate load as best I can using the applications existing unit tests, SQL profiling (to record an replay workloads) and a few extra bodies around the office :)

Conclusion

Performance is easy if you design and your develop your application with performance in mind. It’s important that you spend your time doing the 20% of the things that are going to bring you the 80% of the results, and I hope I’ve illustrated what I consider to be my 20%. You can go much further with optimization if you want to, but the laws of diminishing returns applies. Implement the basics, and spend your time making a awesome application and optimize only when absolutely necessary.