Thursday, August 20, 2009

Digital Preservation Matters - 20 August 2009

The Next-Generation Architecture for Format-Aware Characterization: About JHOVE2. Website. August 18, 2009.

Because of limitations in the original JHOVE program, NDIIPP, the California Digital Library, Portico, and Stanford University are collaborating on a new project. An alpha prototype is available for download. The project looks at identification, validation, feature extraction, and policy-based assessment for simple digital files and for potentially complex digital objects that may span multiple files.


The Digital Continuity Action Plan. Website. Archives New Zealand. 10 August 2009.

A unique, inclusive, and unified initiative in New Zealand aims to prevent important public records from being lost and to ensure information will be available tomorrow. A brochure gives an overview of the plan. It includes a note that “Sixty-seven percent of New Zealand public sector agencies hold some information that they can no longer access.” The full plan is set out in a 48-page PDF. The plan is to make the information available, authentic, and trusted. If no action is taken, digital information will be lost. A proactive approach is needed to maintain digital information for the future. “Failure to implement digital continuity strategies will result in irretrievable loss of information.” Six goals (explained in detail in the longer document) are:

  1. Understanding: Communicate effectively and have a common understanding of the problem.
  2. Digital information is well-managed from the point of creation onwards.
  3. Infrastructure exists to support the interoperability of systems and efficient digital continuity.
  4. High-value information is identified, so critical information is not lost.
  5. Digital information is accessible now and in the future, and protected from unauthorized use.
  6. Information management is characterized by good governance, leadership and accountability.


Sony to back open e-book format. BBC News. 14 August 2009.

Sony has announced it will use the open ePub format instead of its proprietary standard. This will allow Sony the option of making its e-book store compatible with other readers.




Long Term Digital Preservation of Web Sites. Mikael Tylmad. Thesis. Royal Institute of Technology for the Swedish National Archive. May 31, 2009. [38p. PDF]

Websites have become a standard way for organizations to present information to the public. There are a number of archival concerns in keeping this information long term. Few web pages are written in standard HTML anymore; they use a number of different technologies, such as Flash, and many formats. “The fewer file types the better and if they are human readable it is even better.” This requires archivists to keep the software as well as the entire website. Besides the textual and graphical parts of a web page, the relationship of the parts and how they are presented are important (content and context). Archived sites lose interactivity and become static. Links in Flash and similar technologies can be hidden from crawlers, and important parts will be lost. Heritrix, used by the Internet Archive, is a powerful solution for web archiving. Emulation through virtualization is another powerful solution. Another solution is SWAT (Snappy Web Archiving Tool). The tool, written in Ruby, is available at: http://swat-archiving.sourceforge.net/. It does the following:

  1. Harvests all files from the website and analyzes them with DROID for future compatibility.
  2. Creates screenshots of all web pages as TIFFs to show the page design.
  3. Creates XML metadata about files, links, etc. (METS standard).
  4. Puts the web archive with its documentation in a tar package with an ADDML description.
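
The inventory and packaging steps above can be illustrated with a minimal Python sketch. It is not SWAT's code (SWAT is written in Ruby) and it skips harvesting, DROID analysis, and screenshots. It assumes the site has already been saved to a local directory, and it writes a simplified, hypothetical XML manifest rather than real METS or ADDML. The names `build_manifest`, `package_site`, and `manifest.xml` are chosen only for illustration.

```python
import hashlib
import io
import tarfile
import xml.etree.ElementTree as ET
from pathlib import Path


def build_manifest(site_dir: Path) -> bytes:
    """Return a small XML inventory (path, size, SHA-256) of harvested files.

    The element and attribute names are hypothetical; this is not METS.
    """
    root = ET.Element("manifest")
    files = sorted(p for p in site_dir.rglob("*") if p.is_file())
    for path in files:
        data = path.read_bytes()
        ET.SubElement(
            root,
            "file",
            path=path.relative_to(site_dir).as_posix(),
            size=str(len(data)),
            sha256=hashlib.sha256(data).hexdigest(),
        )
    return ET.tostring(root, encoding="utf-8")


def package_site(site_dir: Path, archive_path: Path) -> None:
    """Write the harvested files plus the manifest into one tar package."""
    manifest = build_manifest(site_dir)
    with tarfile.open(str(archive_path), "w") as tar:
        # Store the harvested files under a fixed top-level folder.
        tar.add(str(site_dir), arcname="site")
        # Add the manifest from memory, next to the site folder.
        info = tarfile.TarInfo(name="manifest.xml")
        info.size = len(manifest)
        tar.addfile(info, io.BytesIO(manifest))
```

Calling `package_site(Path("harvested_site"), Path("archive.tar"))` would produce a tar file holding the site files and the manifest. The checksums let a later reader confirm that the archived files are unchanged.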





Amazon Erases Orwell Books From Kindle. Brad Stone. The New York Times. July 17, 2009.

Amazon remotely deleted some digital editions of Orwell's books from the Kindle devices of readers who had bought them. It also appears to have recently deleted other purchased e-books from Kindles.


Chrysler Destroys Its Historical Archives; GM to Follow? Bob Elton. The Truth About Cars. July 26, 2009.

Archives are the foundation of historical research. Without access to primary material (documents, photographs, financial statements, engineering, test reports, etc.), historians lack the sources needed to understand the past. Some automakers have worked to preserve and protect their historical documents. However, Chrysler and GM have recently closed their library, and the librarian was laid off. All materials were “offered to anyone who could carry them away.” Many of the GM divisions no longer know the location of their historical documents, how they are organized, or how researchers can gain access.


Digital Archives That Disappear. Inside Higher Ed. April 22, 2009.

As digital archives have become more important and more popular, there are different opinions about how best to guarantee that they will be available long term. Some think the creators of the archives should keep control, while others believe larger organizations with more resources would be better. The article looks at the example of "Paper of Record," a digital archive of early newspapers with a strong collection of Mexican newspapers. The archive was purchased secretly by Google in 2006; shortly thereafter, the archive disappeared from view. Historians and others complained to Google about the loss of their ability to work. It appears from other sources that the articles are now partially available in the Google news reader.

