A checklist for a better SQL Server setup (2012-2016)
There are a lot of performance best practices for Microsoft SQL Server that can add up to a significant impact. Some involve the machine, some involve Windows, some involve the SQL Server instance, and some involve the database: many of the defaults when setting up a new SQL instance and database are suboptimal, probably for legacy reasons. (Thankfully, SQL 2016 finally improves much of this!)
I've collected as much as I can here into a "one stop shopping" checklist. And you can always contact us for help!
Hardware/OS
These two data storage sections can't be changed without re-creating the partitions and file system, so if it's too late for that to be practical, skip ahead to OS Security Options. That said, getting the alignment right (the OS section) can have a significant performance impact. Even if it's a headache, strongly consider putting in the effort to correct it, even if that involves stashing a backup somewhere, nuking the original disk layout from orbit, and restoring from backup. (Test the restore first!)
Data Storage (Physical)
If possible, separate the Data, Log, and tempdb onto different physical drives (and certainly have it all separate from C:\, even if it's another partition on the same hardware).
If you didn't separate the data from C:, you can shrink the Windows partition and create a new one afterwards.
Log files: these are primarily sequentially accessed. RAID 1 is a pretty good fit.
Data files: RAID 1+0 is generally better than RAID 5 or RAID 6, because the write performance of the latter isn't great. Data access tends to be more random, but that depends on workload.
tempdb: similar characteristics to data files, but often a crucial bottleneck because every database on the instance shares the single tempdb. SSD would be nice here; otherwise, something high-performance. Note that reliability is unimportant since the data doesn't need to persist; it only matters from an availability perspective. You could get away with RAID 0 as long as you can fail over in a pinch.
Note that tempdb also has its own log and data files, so it might be worth putting the tempdb log alongside the other log files, since, again, it isn't as "random-access" -- but if you're on e.g. an SSD, it's probably not worth splitting them up, since the SSD will be fast either way.
The server/controller should have a battery-backed cache. Otherwise write caching must *not* be enabled on the drive containing the log files. This is another reason why it's good to use separate drives for the separate data: if your log files are on a separate drive, you can get away with enabling write caching on the data and (especially tempdb).
Data Storage (OS/Controller)
Partition Alignment (Critical)
Make sure the partitions are aligned on disk. New installations of Windows Server 2008 and later shouldn't have problems with this, but you might if the partitions were created under an older version of Windows. You can check with
wmic partition get BlockSize,StartingOffset,Name,Index
Pasted from <https://technet.microsoft.com/en-us/library/dd758814(v=sql.100).aspx>
The partition offset should be evenly divisible by the RAID stripe size and by the disk's physical sector size. For example, an offset of 1 MB and a stripe size of 256 KB are evenly divisible. An offset of 31.5 KB (Windows Server 2003 and earlier) is not, and neither is an offset of 128 KB with a 256 KB stripe (128 KB is not evenly divisible by 256 KB).
File System (Less Critical)
The file system cluster size should be larger than the default of 4K. The general recommendation is 64 KB. (In certain cases the data volumes would be better as 32 KB). You can check with
fsutil fsinfo ntfsinfo c:
Pasted from <https://technet.microsoft.com/en-us/library/dd758814(v=sql.100).aspx>
(While you're there, also note the reported Disk Physical Sector Size.)
The rationale for this recommendation is that SQL Server issues I/O in units of extents. Since SQL will be doing I/O in 64 KB chunks, there's little point in subdividing the file system further than that (especially since the SQL data shouldn't be commingled with other data on the same partition).
Extents are the basic unit in which space is managed. An extent is eight physically contiguous pages, or 64 KB.
Pasted from <https://msdn.microsoft.com/en-us/library/ms190969(v=sql.105).aspx>
Other Alignment Considerations
Be sure the RAID stripe size is a multiple of 64 KB (the extent size above). I've read larger is better (to a point), e.g. 256 KB.
If you're using a SAN, you should also check the alignment against the LUN divisions.
If you're using a VM, you must consider that filesystem layer too. And avoid dynamically expanding VHDs, by the way.
More information about all of this here: http://www.blueshiftblog.com/?p=300
Unfortunately none of this can easily be changed after the fact without recreating the volumes.
OS Security Options
There are some permissions that you may wish to give the SQL Server service account.
"Perform volume maintenance tasks" -- this allows SQL to grow data files without waiting for the OS to zero out the data (aka "Instant File Initialization") -- note that this doesn't apply to log files. This is generally a good thing -- the only concern is that previously deleted data won't be zeroed out
Turn it on with gpedit.msc: Computer Configuration -> Security Settings -> Local Policies -> User Rights Assignment: "Perform volume maintenance tasks"
You can enable this during SQL 2016 setup
Lock pages in memory -- this allows SQL Server to have more control over memory, and possibly to use large pages. If the server is dedicated to SQL and you've set the memory limits appropriately, it's usually good to enable this. It's also documented in various places that this lets SQL Server use the more efficient AWE API calls for allocating memory (e.g. https://blogs.msdn.microsoft.com/psssql/2009/09/11/fun-with-locked-pages-awe-task-manager-and-the-working-set/). Make sure Max Server Memory is set. This may be inappropriate in a virtual environment.
Turn it on with gpedit.msc: Computer Configuration -> Security Settings -> Local Policies -> User Rights Assignment: "Lock pages in memory"
SQL Server itself
During the installation
Security
Try not to use SQL authentication if you don't need it: defense in depth!
Choose a very strong password for sa if you must use SQL authentication, and consider disabling the account after installation.
It's better to handle administrative permissions with Windows accounts, e.g. give 'Administrators' access to the SQL instance rather than using sa.
Data
As mentioned earlier: try to use different drives for the different data directories
Finally, SQL 2016's setup includes an additional tempdb configuration screen, with reasonably sane defaults. The number of files is arguably a bit high and the initial size a bit low, but they're okay.
Splitting tempdb into multiple files helps only with a specific kind of wait: PAGELATCH contention on tempdb pages (2:x:x). Half or a quarter of the number of cores is probably fine; more files can spread data across files in a way that, depending on the disk setup, is actually slower. If tempdb isn't on SSDs, you might want to lower this to 4 or 2 (otherwise, more seeking).
Note that these allocations can be reduced by explicitly declaring temp tables rather than using SELECT INTO. Think of tempdb as having a pool of tables that can be reused: when they're dropped (or go out of scope), they return to the pool.
After installation
Memory
Be sure to set the server's max memory option. It should probably be most of main memory, but leave some room for the OS and anything else that's running (web server? Reporting Services? etc.)
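For example, on a dedicated box with 64 GB of RAM you might leave roughly 8 GB for the OS and friends. (The numbers here are just an illustration; size for your own hardware.)

```sql
-- 'max server memory' is an advanced option, so expose it first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- 64 GB machine, leaving ~8 GB for the OS: cap SQL at 56 GB
EXEC sp_configure 'max server memory (MB)', 57344;
RECONFIGURE;
```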
Trace Flags
Trace flags control certain behaviors in SQL Server. To enable a trace flag globally, go to SQL Server Configuration Manager and add -T#### as an additional startup parameter.
Here are some trace flags to consider. SQL 2016 minimizes the need for many of them.
| Flag | Info | Applicable Versions | Enable? | Considerations |
| --- | --- | --- | --- | --- |
| -T460 | Improves the "string or binary data would be truncated" message | 2016 SP2 CU6+, 2017 CU12+ | Yes | The message id itself changes from 8152 to 2628, but it now contains the table, the column, and the value affected -- praise be! |
| -T1117 | Auto-grow all files in a filegroup together (mostly important for multi-file tempdb) | 2012, 2014 | Yes | No longer needed in SQL 2016 (controlled per filegroup by AUTOGROW_SINGLE_FILE / AUTOGROW_ALL_FILES) |
| -T1118 | Allocate full extents instead of mixed extents (reduces allocation contention, especially in tempdb) | 2012, 2014 | Yes | No longer needed in SQL 2016 (controlled by SET MIXED_PAGE_ALLOCATION) |
| -T2312 | Force-enables the SQL 2014 cardinality estimator even when the database compatibility level is 110 (2012) | 2014, 2016 | Probably* | *SQL 2014+ has a generally improved cardinality estimator, but if the database is still set to '2012' for compatibility's sake, the old behavior is used. May be worth trying when upgrading the SQL instance. If you enable it, you can use flag 9481 to force the old behavior in queries that need it. In SQL 2016 SP1+ you can use a query hint to force the old behavior |
| -T2371 | Lowers the auto-update-statistics threshold for large tables | 2012, 2014 | Yes | Not necessary in SQL 2016 if the database is set to compatibility level 130 |
| -T2453 | Lets table variables trigger recompiles based on actual row counts | All | Maybe | Can be helpful if you use big table variables a lot. See article. |
| -T2562 | Runs DBCC CHECKDB as a single internal batch | All | Probably | Depends on whether your environment needs a smaller impact during DBCC CHECKDB, or whether you're checking giant databases (tempdb can grow to 5% of the checked database) |
| -T4199 | Enables cumulative query optimizer hotfixes | 2012, 2014 | Maybe | In SQL 2016, using the latest compatibility level enables the cumulative query optimizer improvements. May be worth enabling for SQL 2012/2014 to opt in to optimizer hotfixes, but requires testing. |
| -T6532 -T6533 -T6534 | Performance improvements for spatial data types | 2012, 2014 | Probably | Not needed in SQL 2016. If you use spatial queries on 2012/2014, this opts in to the improvements |
| -T7412 | Enables the lightweight query execution statistics profiling infrastructure | 2016 SP1, 2017 | Probably | CPU overhead is minimal (~2%), and it allows viewing the live execution plan of any running query |
| -T8048 | Improved NUMA behavior on big machines (8+ CPUs per NUMA node) | 2012, 2014* | Probably not* | *Specific workloads on specific classes of machines. No effect starting with 2014 SP2 |
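Besides the startup parameter, you can toggle a flag at runtime with DBCC (it lasts until the next restart), and check which flags are currently active. For example, with flag 1117:

```sql
DBCC TRACEON (1117, -1);   -- -1 = enable globally, not just for this session
DBCC TRACESTATUS (-1);     -- list all globally enabled trace flags
DBCC TRACEOFF (1117, -1);  -- turn it back off globally
```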
Database Files
Background Info:
SQL 2016 starts with more sensible defaults, finally, but unless all the databases were newly created from a fresh 2016 instance, make sure your databases have a reasonable file structure.
The default up through 2014 (inherited from the model database: 1 MB autogrowth for data files, 10 percent for the log) is pretty much horrible. These settings lead to a lot of fragmentation, both at the file-system level and in SQL's internal structures. The "10 percent" also makes each log growth larger (and slower) than the last -- remember that the log can't be "instant-initialized".
SQL 2016 has much saner defaults: 64 MB autogrowth for both data and log files.
Autogrowth action steps
You'll almost certainly want autogrowth in chunks of at least 64 MB. If you anticipate larger database sizes, consider increasing that, e.g. to 512 MB or 1 GB (possibly more, but 1 GB is probably plenty). One special note: don't use a 4 GB increment unless you're on SQL 2012+.
If this is a new database, consider what the initial size should be. Better to allocate it now, especially the transaction log (which cannot use instant file initialization).
Consider setting a max size, even if it's something unrealistically large (e.g. 100 GB, assuming the disk is larger than that and the workload smaller), particularly for the transaction log. It's a really bad day when a transaction log fills up the disk by mistake.
Be sure to check this for tempdb too.
For tempdb, be sure to set multiple data files (2-8 -- see section above under 'setup'), and that their "initial size" and autogrowth configurations are the same.
You only need one log file, even for tempdb.
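All of these settings can be applied in T-SQL instead of the SSMS UI. A sketch, with placeholder database and logical file names:

```sql
-- grow the data file in fixed 256 MB increments
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_data, FILEGROWTH = 256MB);

-- same for the log, plus a safety cap so it can't fill the disk
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_log, FILEGROWTH = 256MB, MAXSIZE = 100GB);
```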
Correcting existing databases
After you've fixed the autogrowth settings, let's remedy existing databases, if they weren't set up optimally (i.e. the growth size was less than 64 MB).
Fix the internal fragmentation of the transaction log -- each growth chunk that was added to the transaction log is broken down into several "VLFs" (virtual log files), depending on the size of the chunk.
Too many VLFs particularly affects backup and recovery performance, and can also hurt normal write-query performance (insert/update/delete). More info here:
https://www.sqlskills.com/blogs/kimberly/8-steps-to-better-transaction-log-throughput/
https://www.sqlskills.com/blogs/kimberly/transaction-log-vlfs-too-many-or-too-few/
1000 VLFs: okay
10000 VLFs: noticeable recovery degradation
100000 VLFs: severe performance degradation
You can check your database with DBCC LOGINFO.
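DBCC LOGINFO returns one row per VLF, so the row count is the VLF count; on SQL 2016 SP2 and later, the sys.dm_db_log_info DMF is a friendlier alternative. For example:

```sql
-- one row per VLF in the current database
DBCC LOGINFO;

-- SQL 2016 SP2+: count VLFs directly
SELECT COUNT(*) AS vlf_count
FROM sys.dm_db_log_info(DB_ID());
```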
How do we fix it?
Don't do this while the server is busy
clear the active log
FULL recovery mode: take a log backup
SIMPLE recovery mode: call CHECKPOINT
shrink the log, either with SQL or via the SSMS UI
set a new "initial" size, either with SQL or via the SSMS UI
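A sketch of those steps in T-SQL (placeholder database/file names and backup path; run during a quiet period):

```sql
-- 1. Clear the active log
BACKUP LOG MyDb TO DISK = N'X:\Backups\MyDb_log.trn';  -- FULL recovery mode
-- or, in SIMPLE recovery mode:
-- CHECKPOINT;

-- 2. Shrink the log file down
USE MyDb;
DBCC SHRINKFILE (MyDb_log, TRUNCATEONLY);

-- 3. Grow it back to a sensible initial size in one step
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_log, SIZE = 8GB);
```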
For tempdb, simply restart the service: the tempdb files are recreated at their configured initial size on startup.
Defragment the disks, if possible. This will also pretty much require downtime. Once the autogrowth is fixed, it shouldn't be much of a problem going forward
Other database concerns
Don't use auto-shrink if at all possible
You should almost certainly have PAGE_VERIFY set to CHECKSUM. This is the default for databases in SQL 2005 and later. But note that SQL Server Data Tools does not have it on by default in SSDT projects!
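Checking and fixing this is a one-liner each (placeholder database name):

```sql
-- see the current setting for every database
SELECT name, page_verify_option_desc FROM sys.databases;

-- fix any stragglers
ALTER DATABASE MyDb SET PAGE_VERIFY CHECKSUM;
```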
Consider enabling Snapshot Isolation, and use that instead of NOLOCK to avoid blocking; unless you're intimately familiar with database locking, pretend NOLOCK is shorthand for "ERROR-PRONE GARBAGE". I'll write about this at some point, but for now, see Thought 2 here: http://blogs.msdn.com/b/davidlean/archive/2009/04/06/sql-server-nolock-hint-other-poor-ideas.aspx.
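Enabling it is straightforward; note that turning on READ_COMMITTED_SNAPSHOT (which makes row versioning the default READ COMMITTED behavior) requires no other active connections to the database, hence the rollback clause in this sketch (placeholder name):

```sql
-- allow explicit SET TRANSACTION ISOLATION LEVEL SNAPSHOT
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- optional: make versioned reads the default for READ COMMITTED
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
```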
Maintenance
Consider setting up periodic maintenance plans
Backups
Reorganize indexes
Update statistics
Run consistency checks
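If you'd rather script these than use the maintenance plan designer, each step has a T-SQL equivalent. A sketch with placeholder names -- not a complete solution (tools like Ola Hallengren's maintenance scripts handle the details properly):

```sql
-- backup
BACKUP DATABASE MyDb TO DISK = N'X:\Backups\MyDb.bak' WITH COMPRESSION;

-- reorganize indexes (per table; a real job would iterate and check fragmentation)
ALTER INDEX ALL ON dbo.MyTable REORGANIZE;

-- update statistics across the database
EXEC sp_updatestats;

-- consistency check
DBCC CHECKDB (MyDb) WITH NO_INFOMSGS;
```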
So, that about wraps things up. If you're struggling with SQL performance, or this list seems a bit too daunting, we can help!