Microsoft – SQL Server – Fast DB Inserts and Error Suppression
As a DBA, one of one’s utmost desires in a Database is fast insertion of data. And, there are many ways to progress towards that goal:
- Best available Hardware
- Bulk Inserts
- Table Partitioning
- Auto-commit off
- Minimal Table Indexes
- Judicial choice of Clustered Index – Does one choose auto-incrementing keys so as to avoid page splits (that occur when new data needs to be placed in the middle of existing pages). The downside of auto-incrementing keys is that placing all new data at the tail end causes hot-pages in the new pages.
One other important decision is how to handle occurs. For example, when an attempt is made to add a “new” record and that record already exists in the Table.
Note that the new record might not be the same exact match, but simply share the same values in your primary key; whether the primary key is singular (one column) or composite (many columns).
So such as is our fate as DBAs and Application Developers…
How do you deal with error such as duplicate key, etc…
Well, there are a few ways (to deal with this particular error):
- Not have primary keys
- Extend your Insert Statement – Perform a select based on key columns and only insert if the select comes back empty)
- Use SQL Server’s new merge statement (which has the insert/update facility built in)
As one can not define a primary key with the IGNORE_DUP_KEY clause, we chose to add a new index and as part of its definition, added the “ignore duplicate values” operand:
CREATE UNIQUE NONCLUSTERED INDEX [idx_Unique_Key] ON [dbo].[Sales] ( [SaleId] ASC ) WITH ( PAD_INDEX = OFF , STATISTICS_NORECOMPUTE = ON , SORT_IN_TEMPDB = OFF , IGNORE_DUP_KEY = ON , DROP_EXISTING = OFF , ONLINE = ON , ALLOW_ROW_LOCKS = ON , ALLOW_PAGE_LOCKS = ON , FILLFACTOR = 75 ) GO
There are a couple of key points in the Create Index statement pasted above:
- The “Unique” operand in the create index
- The “Ignore_Dup_Key=ON” which creates a constraint that states before attempting to perform the actual record insertion, check to determine whether there are existing matches
- The “ONLINE=ON” which states that as we are running the Enterprise Edition, we can and should create the index “online” and insulate against disrupting ongoing access \ modifications to the table
- The “ALLOW_ROW_LOCKS=ON” which says it is OK for the DB Engine to provide individual row level locks to requestors
- The “ALLOW_PAGE_LOCKS=ON” which says it is OK for the DB Engine to provide page level locks to requestors
- And, the fill factor which is based on your use case and says how much would you like index pages to be filled and utilized. Again, if your indexes are not ever increasing then weigh\reduce this number so as not to experience too much page fragmentation
The “key” to this helping is that the constraint is validated before the values are inserted.
As with anything, try to get performance numbers and also review SQL Plans using :
- SET STATISTICS IO
- SET SHOWPLAN_ALL
- The Merge statement weighed in at “Scan count 21, logical reads of 100, physical reads 7″. Whereas, the Insert statement weighed in at “Scan count 0, Logical reads 27, physical reads 4″. Conclusively, in terms of pure IO, the Insert Statement is more expensive
- In terms of the Query Plans comparison, the merge is at 67% and the Insert is at 33%. The Merge simply has to do more work.
Per the Merge Statement:
- You really want to pay attention to your determinate clause — By determinant I refer to the qualifying clause that decides whether you ‘re doing an Insert or an Update. In our case, because of the in-availability of computed columns we did not get our best qualifier and thus ended up with a non-clustered index seek rather than the speedier clustered index seek
- As the “Clustered Index” is the first actual insertion of index\data we paid a bit of attention to it and determined that for the “Merge” operation it is a “Clustered Index Merge”; as compared to a “Clustered Index Insert”
Please keep in mind, that the constraint (idx_Unique_key) we added, adds to the overall cost and it is actually the first operation performed (in the case of the regular Insert Statement).