Microsoft – SQL Server – Database Snapshots (in VLDB)

Feb 15th, 2012

Let us go back more than three years.

Here is a post I started, but never published.

Preface

Is it just me, or is it true that SQL Server Snapshots’ applicability does not hold up in VLDB environments?

It is probable that the culprit is not SQL Server itself, but the MS Windows OS, at least as of MS Windows 2008/R2.

Imagine this scenario: you have a database that is a bit big; shall we say a few hundred GB, close to a TB, or into the TBs.

And, you are doing your thing – inserting, updating, and deleting data.

If I am reading the web docs correctly, once the changes tracked in any of the data files reach 16 GB, we are toast!

We will start seeing Error 665.

Blogs Referenced:

  1. SQL Server reports operating system error 1450 or 1452 or 665 (retries)
    http://blogs.msdn.com/b/psssql/archive/2008/07/10/sql-server-reports-operating-system-error-1450-or-1452-or-665-retries.aspx
  2. SQL Server Snapshot File Size
    http://claysql.blogspot.com/2009/07/sql-server-snapshot-sparse-file-size.html
  3. View the Size of a Sparse File of a Database Snapshot
    http://msdn.microsoft.com/en-us/library/ms175823.aspx
  4. Sparse File
    http://en.wikipedia.org/wiki/Sparse_file

Please let me know your thoughts….

 

June 14th, 2015

I checked my email upon waking up, and a nice engineer had replied to a question I posted at SQL Server Central:

The operating system returned error 665(The requested operation could not be completed due to a file system limitation) to SQL Server during a write at offset…
http://www.sqlservercentral.com/Forums/Topic1252840-391-1.aspx#bm1694417

DBCC CHECKDB uses internal database snapshots, which are created in the same location as the corresponding database data files and grow as data is changed in the original data files. If transactional activity continues on the database, the snapshots created by DBCC commands may experience huge internal fragmentation. Keeping track of such a high level of fragmentation requires more space than the default “Bytes Per FileRecord Segment” of 1 KB provides.

Suggestions for Remedy (in order of effort required):

  • Suggest that users use a combination of other DBCC CHECK commands and avoid DBCC CHECKDB (see the sketch after this list)
  • Avoid running DBCC CHECKDB at a time when other/major data modifications are taking place
  • Divide the database into multiple files. The limitation is per sparse file, and each database snapshot creates a matching sparse file for each data file
  • Find out which tables/indexes result in the most write activity during the lifetime of the snapshot (a query for spotting write hot spots is sketched later, under “Review File Growth Causation”)
  • Separate them out into a different filegroup with multiple files of comparatively smaller sizes
  • Identify and revise the index Fill Factor and PAD_INDEX values
  • Use DBCC CHECKTABLE, or other checks as appropriate, on those
  • Format the disks with /L to increase the “Bytes Per FileRecord Segment” to 4 KB from 1 KB
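
Here is a minimal T-SQL sketch of the first and second-to-last suggestions, assuming a hypothetical database named YourDb: run the narrower DBCC checks individually rather than one monolithic DBCC CHECKDB, so each short-lived internal snapshot sees less change while it exists.

-- Hypothetical database name: YourDb
USE YourDb;
GO

DBCC CHECKALLOC;      -- allocation consistency
DBCC CHECKCATALOG;    -- catalog consistency

-- Check each user table on its own; each DBCC command gets its own
-- short-lived internal snapshot instead of one long-lived one.
DECLARE @table nvarchar(520);
DECLARE table_cursor CURSOR FAST_FORWARD FOR
    SELECT QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
    FROM sys.tables AS t
    JOIN sys.schemas AS s ON s.schema_id = t.schema_id;
OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @table;
WHILE @@FETCH_STATUS = 0
BEGIN
    DBCC CHECKTABLE (@table);
    FETCH NEXT FROM table_cursor INTO @table;
END;
CLOSE table_cursor;
DEALLOCATE table_cursor;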

More details at :

 

Personal Testimony

As it has been over three years since the original problem was cited, the specifics are no longer fresh in my mind.

In all honesty, we have replaced the original SQL Server and single box solution, with Hadoop and clusters.

And, as such, any thoughts and/or approaches will be purely conceptual and will not see actual implementation.

As it will not be implemented, it will not be vetted.

 

What things can you try?

MS Windows patches

Microsoft Windows 7, 2008, 2008R2

If you happen to be running MS Windows 2008 or 2008R2, as we were in 2013, I will suggest that you download and apply KB 967351.

Actual:

 

Microsoft Windows 8, 2012

Not Addressed

 

Microsoft Windows 8.1, Windows 2012 R2

Pre-requisites

Actual

 

Use “Bytes Per FileRecord Segment” > 1K

The default “Bytes Per FileRecord Segment” is 1 KB.

Let us determine a logical disk’s current setting:

Syntax:


fsutil fsinfo ntfsinfo [logical-disk]

Sample:


fsutil fsinfo ntfsinfo C:

Output:

Here is what things look like on my laptop:

[Screenshot: output of fsutil fsinfo ntfsinfo, showing “Bytes Per FileRecord Segment”]

 

 

Quick Explanation:

  • You want to pay attention to “Bytes Per FileRecord Segment”.

To correct this, you want to back up the data somewhere else, reformat the drive using the /L argument, and restore the data.

 

Normal Data Files

For normal data files, not Snapshot files, one can keep an eye on the following areas:

More files is better

As the problem occurs at the file level, and the chance that you will see it increases with file size, one of the things you can do is revisit your file size planning.

To attack this problem, I suggest that you:

  • Partition your tables as much as possible (a minimal sketch follows this list)
    • Use partition keys that trend toward equal distribution
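
Here is a minimal sketch of that idea, using hypothetical names (pfSaleDate, psSaleDate, dbo.Sale); in practice you would map the partitions to real filegroups rather than PRIMARY:

-- Partition by a date key that trends toward an even spread of rows.
CREATE PARTITION FUNCTION pfSaleDate (date)
    AS RANGE RIGHT FOR VALUES ('2012-01-01', '2013-01-01', '2014-01-01', '2015-01-01');

CREATE PARTITION SCHEME psSaleDate
    AS PARTITION pfSaleDate ALL TO ([PRIMARY]);   -- map to real filegroups in practice

CREATE TABLE dbo.Sale
(
    SaleId   bigint NOT NULL,
    SaleDate date   NOT NULL,
    Amount   money  NOT NULL,
    CONSTRAINT PK_Sale PRIMARY KEY CLUSTERED (SaleDate, SaleId)
)
ON psSaleDate (SaleDate);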

For each File Group, presize files

SQL Server uses proportional fill, which means that within a filegroup the engine writes to each file in proportion to the free space that file has available.

We will use that knowledge to our advantage by doing the following:

  • Preallocate the file sizes (see the sketch below)
  • Use the same file growth size for every file in the filegroup
  • Periodically review file sizes, utilization, and growth patterns
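
A minimal sketch of the presizing idea, with hypothetical names and paths (YourDb, FG_Data, E:\SQLData\...); the point is simply that every file in the filegroup starts at the same size and grows by the same increment, so proportional fill spreads writes evenly:

-- Hypothetical filegroup with equally presized files and identical growth.
ALTER DATABASE YourDb ADD FILEGROUP FG_Data;
GO
ALTER DATABASE YourDb ADD FILE
    (NAME = FG_Data_01, FILENAME = 'E:\SQLData\YourDb_FG_Data_01.ndf', SIZE = 64GB, FILEGROWTH = 4GB),
    (NAME = FG_Data_02, FILENAME = 'E:\SQLData\YourDb_FG_Data_02.ndf', SIZE = 64GB, FILEGROWTH = 4GB),
    (NAME = FG_Data_03, FILENAME = 'E:\SQLData\YourDb_FG_Data_03.ndf', SIZE = 64GB, FILEGROWTH = 4GB),
    (NAME = FG_Data_04, FILENAME = 'E:\SQLData\YourDb_FG_Data_04.ndf', SIZE = 64GB, FILEGROWTH = 4GB)
TO FILEGROUP FG_Data;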

 

Review File Growth Causation

What is causing your file to grow?

If you are dealing with Snapshots, I think one wants to see how what is essentially a sparse file gets populated with data.

  • Data Provisioning and Upkeep (a sketch of creating a Snapshot follows this list)
    • When initially created, a Snapshot is reported with the same size as its source data file
    • The Snapshot file itself is created hollow, as a sparse file
    • When a request is received to change or remove contents of the originating file, the impacted data is first copied to the Snapshot file, and the change is then applied to the originating file
    • This twin process ensures that we preserve a true point-in-time copy of the data
  • Request for Data
    • Most queries for data are sourced from the originating file
    • The only data served from the Snapshot file is data that has been updated or removed in the originating file since the snapshot was created
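
To make the mechanics concrete, here is a minimal sketch of creating a Database Snapshot, with hypothetical names and paths (YourDb, YourDb_Snap, E:\SQLSnap\...); the logical names must match the source database’s data files, and every data file must get a matching sparse file:

-- Every data file of the source database needs a matching sparse file entry.
CREATE DATABASE YourDb_Snap ON
    (NAME = YourDb_Data, FILENAME = 'E:\SQLSnap\YourDb_Data.ss'),
    (NAME = FG_Data_01,  FILENAME = 'E:\SQLSnap\YourDb_FG_Data_01.ss'),
    (NAME = FG_Data_02,  FILENAME = 'E:\SQLSnap\YourDb_FG_Data_02.ss')
AS SNAPSHOT OF YourDb;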

 

Again, changed and deleted data are copied into the Snapshot file.  With more changes, the Snapshot file continues to grow.
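
Per the “View the Size of a Sparse File of a Database Snapshot” reference above, you can watch that growth with sys.dm_io_virtual_file_stats; a sketch, assuming the hypothetical snapshot name YourDb_Snap:

-- size_on_disk_bytes shows how much of each sparse file is actually materialized.
SELECT DB_NAME(vfs.database_id)           AS snapshot_name,
       mf.name                            AS logical_file,
       mf.physical_name,
       vfs.size_on_disk_bytes / 1048576.0 AS size_on_disk_mb
FROM sys.dm_io_virtual_file_stats(DB_ID(N'YourDb_Snap'), NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id;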

To stem the tide:

  • Target non-transactional data changes to a different database
  • Review and compare in-place versus forwarded updates
  • Revisit your architecture
    • How are out-dated records expired?
      • Are they updated in place with new values?
      • Are they kept in place while new rows are added, with the delineation being that new rows carry increasing key values (identity columns, effective dates, sequential GUIDs)?
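
The forum reply above also suggested finding out which tables and indexes see the most write activity. A sketch using sys.dm_db_index_usage_stats (counters reset when the instance restarts), run in the database of interest:

-- Rank tables/indexes in the current database by writes recorded since startup.
SELECT TOP (20)
       OBJECT_SCHEMA_NAME(us.object_id)                AS schema_name,
       OBJECT_NAME(us.object_id)                       AS table_name,
       i.name                                          AS index_name,
       us.user_updates                                 AS writes,
       us.user_seeks + us.user_scans + us.user_lookups AS reads
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i
  ON i.object_id = us.object_id
 AND i.index_id  = us.index_id
WHERE us.database_id = DB_ID()
ORDER BY us.user_updates DESC;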

Sparse Data Files

The areas stated above are worth exploring for normal data files.  But what about the data files created during Snapshot creation?

Google on the right things

Google is really good.  But one has to get into trouble first, and then find concise names for what ails one.

Here are some pertinent names:

  • Large size file records (the /L NTFS format option)
  • “Bytes Per FileRecord Segment”
  • ATTRIBUTE_LIST_ENTRY

 

Related Technology, Books, and Blog Postings

Book – Optimizing Hyper-V Storage by Mitch Tulloch

https://books.google.com/books?id=mrlCAwAAQBAJ&pg=PT61&dq=ATTRIBUTE_LIST_ENTRY+file+info&hl=en&sa=X&ved=0CCkQ6AEwAGoVChMIqe2Ni5mQxgIVVyuICh0nFwI1#v=onepage&q=ATTRIBUTE_LIST_ENTRY%20file%20info&f=false

In his book, Optimizing Hyper-V Storage, Mitch Tulloch mentions that one should pay attention to the “Bytes Per File Record Segment” when using “Virtual hard disk format types”.

He attributes the quote to Carlos Mayol Berral, a Premier Field Engineer with Microsoft.

 

Summary

In summary, the problem is not so much SQL Server.

It is reflective of how we build software these days.

Everything is a bit like Lego; each piece builds on the other.

In many cases, the underlying plumbing is very good.

In a few others, the use case outpaces the original intent of the technology.

For the benefit of our current discussion, Database Snapshots, and later the Database Consistency Checker (DBCC), rely on sparse files.

To conserve storage and allow quick provisioning, Database Snapshots rest quite heavily on the original data files.  As data in an original data file is changed, the corresponding sparse file gets a copy of the changed data.

Because of this means of duplication, we run into problems of volatility and fragmentation.

Fragmentation results in the need to keep more ATTRIBUTE_LIST_ENTRY structures.  The more of these structures we keep, the more we find that the original 1 KB allocation is insufficient.

 

Big data

Big data is a good problem to have.  The only downside is that it takes a while for the solutions to make their way out of the labs.

Your only triumph is that you know what is causing some of your migraine.

 

