Wednesday, September 30, 2015

Checking Your Digital Content: What is Fixity, and When Should I be Checking It?

Checking Your Digital Content: What is Fixity, and When Should I be Checking It? Paula De Stefano, et al. NDSA. October 2014.
     A fundamental goal of digital preservation is to verify that an object has not changed over time or during transfer processes. This is done by checking the “fixity” or stability of the digital content. The National Digital Stewardship Alliance provides this guide to help answer questions about fixity.

Fixity, the property of a digital file or object being fixed or unchanged, is synonymous with bit-level integrity and offers evidence that one set of bits is identical to another. PREMIS defines fixity as "information used to verify whether an object has been altered in an undocumented or unauthorized way." The most widely used tools for fixity are checksums (CRCs) and cryptographic hashes (MD5 and SHA algorithms). Fixity is a tool, but by itself it is not sufficient to ensure long-term access to digital information. Fixity information must be put to use, for example in audits of the objects and in replacement or repair processes, alongside other methods that show the object is or will be understandable. Long-term access means the ability to "make sense of and use the contents of the file in the future".
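
As a rough illustration of the tools named above, the following Python sketch computes a CRC32 checksum plus MD5 and SHA-256 hashes for one file, using only the standard library. The function name `file_digests` and the chunk size are assumptions made for this example and do not come from the NDSA guide.

```python
import hashlib
import zlib


def file_digests(path, chunk_size=1024 * 1024):
    """Return CRC32, MD5 and SHA-256 values for a file, reading it in chunks.

    Hypothetical helper for illustration; the name and chunk size are arbitrary.
    """
    crc = 0
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
            md5.update(chunk)
            sha256.update(chunk)
    return {
        "crc32": format(crc & 0xFFFFFFFF, "08x"),
        "md5": md5.hexdigest(),
        "sha256": sha256.hexdigest(),
    }
```

Two files with identical bits produce identical values. A CRC is cheap and catches accidental corruption, while a cryptographic hash such as SHA-256 is the more usual choice when deliberate alteration is also a concern.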

Fixity information helps answer three primary questions:
  1. Have you received the files you expected?
  2. Is the data corrupted or altered from what you expected?
  3. Can you prove the data/files are what you intended and are not corrupt or altered? 
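
One way to picture these three questions is a small audit that compares a delivery against the digests the sender supplied. This is only a sketch: `audit_delivery`, the `expected` mapping of relative path to SHA-256 digest, and the result keys are all hypothetical names chosen for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash one file in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_delivery(root, expected):
    """Compare files under `root` with `expected` ({relative path: sha256 hex}).

    Hypothetical helper: reports files that are missing, files that were not
    expected, and files whose digest differs from the expected value.
    """
    root = Path(root)
    found = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    wanted = set(expected)
    altered = sorted(
        name for name in found & wanted if sha256_of(root / name) != expected[name]
    )
    return {
        "missing": sorted(wanted - found),     # question 1: did everything arrive?
        "unexpected": sorted(found - wanted),  # question 1: did anything extra arrive?
        "altered": altered,                    # question 2: is anything changed?
    }
```

An empty result in all three lists, kept together with the expected digests, is the kind of evidence the third question asks for.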
Fixity has other uses and benefits as well, which include:
  • Support the repair of corrupt or altered files by knowing which copy is correct 
  • Monitor hardware degradation: Fixity checks that fail at high rates may be an indication of media failure.
  • Provide confidence to others that the file or object is unchanged
  • Meet best practices such as ISO 16363/TRAC and NDSA Levels of Digital Preservation
  • Support monitoring of content integrity as content moves through processes
  • Document provenance and history by maintaining and logging fixity information
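
The first benefit in the list, knowing which copy is correct, can be sketched as a comparison of each copy's current digest with the digest recorded earlier. The function `classify_copies` and the copy names in the usage note are hypothetical, and the sketch assumes the recorded digest itself is trustworthy.

```python
def classify_copies(digests_by_copy, recorded_digest):
    """Split copies into those matching the recorded digest and those that do not.

    Hypothetical helper: `digests_by_copy` maps a copy label to the digest
    computed for that copy today; `recorded_digest` is the value stored earlier.
    """
    good = sorted(name for name, d in digests_by_copy.items() if d == recorded_digest)
    bad = sorted(name for name, d in digests_by_copy.items() if d != recorded_digest)
    return good, bad
```

For instance, with copies labelled "site-a", "site-b" and "tape", any copy returned in the second list could be replaced from a copy in the first list. If the first list is empty, no copy can be confirmed as correct from fixity information alone.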
Workflows for checking the fixity of digital content include:
  • Generating/Checking Fixity Information on Ingest
  • Checking Fixity Information on Transfer
  • Checking Fixity at Regular Intervals
  • Building Fixity Checking into Storage Systems
Considerations for Fixity Check Frequency include:
  • Storage Media: Fixity checks increase media use, which could increase the rate of failure
  • Throughput: Your rate of fixity checking will depend on how fast you can run the checks
  • Number and Size of Files or Objects: Resource requirements change as the scale of objects increases
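
The throughput consideration comes down to simple arithmetic: the time for one full pass is the total volume divided by the sustained checking rate. The sketch below shows that calculation. The collection size and rate are made-up numbers for illustration, not figures from the guide, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def days_for_full_pass(total_bytes, bytes_per_second):
    """Estimate how many days one complete fixity pass would take."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return total_bytes / bytes_per_second / SECONDS_PER_DAY


# Assumed example: 100 TB of content checked at a sustained 200 MB/s.
example_days = days_for_full_pass(100 * 10**12, 200 * 10**6)
print(round(example_days, 2))  # prints 5.79
```

If the estimate comes out longer than the interval you hoped to check at, the options are a longer interval, checking a sample of the content, or more checking capacity.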
Fixity information may be stored in different ways, which will depend on your situation, such as:
  • In the object metadata records
  • In databases and logs
  • Alongside content, such as with BagIt
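
As an illustration of storing fixity information alongside content, the sketch below writes a plain-text manifest of SHA-256 digests into the same directory as the files and reads it back. The one-line-per-file layout (digest, whitespace, relative path) is loosely modelled on BagIt manifests, but this is not a BagIt implementation; the function names and the default manifest file name are assumptions for the example.

```python
import hashlib
from pathlib import Path


def write_manifest(root, manifest_name="manifest-sha256.txt"):
    """Write '<sha256>  <relative path>' lines for every file under `root`.

    Hypothetical helper. Reads each file fully into memory, which is fine for
    a sketch but not for very large files.
    """
    root = Path(root)
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if rel == manifest_name:
            continue  # do not list the manifest inside itself
        lines.append(f"{hashlib.sha256(path.read_bytes()).hexdigest()}  {rel}")
    (root / manifest_name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def read_manifest(root, manifest_name="manifest-sha256.txt"):
    """Return {relative path: sha256 hex} parsed from the manifest file."""
    entries = {}
    text = (Path(root) / manifest_name).read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, rel = line.split(None, 1)
        entries[rel] = digest
    return entries
```

Keeping the manifest next to the content makes the package self-describing when it is moved. A copy of the same digests in a database or log gives an independent record in case the manifest itself is damaged.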
