Monday, April 24, 2017

Three Keys to Digital Preservation: Management, Technology, and Content.

Three Keys to Digital Preservation: Management, Technology, and Content. Edward Corrado, Heather Moulaison Sandy. ACRL Webinar.  Apr 12, 2017.
     This is a webinar by Edward Corrado and Heather Moulaison Sandy that examines the basics of digital preservation, starting with what it is and what it is not. They then examine three fundamental and interrelated concerns in digital preservation: management, technology, and the content. The webinar also looks at:
  • The life cycle of digital objects
  • Things to know before starting digital preservation projects
  • Preservation techniques designed to endure changes in technology, as well as models and technical resources currently available
Some notes from the webinar:
  • Digital preservation is the active management of digital content over time to ensure ongoing access.
  • Digital objects are mediated by technology
  • It is not possible to leave the digital object alone and expect it to survive
  • By definition, digital preservation is a long-term activity. It requires policies to support this
  • A preservation plan must balance priorities over time
  • The greatest danger to digital materials is that we forget the meaning of them
  • Preservation metadata supports the long-term access and use of content
  • It is important to get content creators on board with preserving and describing the content, since they know the field and the content, and they will potentially be the content users
  • Important steps to take now:
    • Identify and organize content
    • Manage multiple copies of the content
    • Do a risk assessment of your digital operations
    • Document your processes and decisions
Digital preservation is an opportunity that can be both challenging and exciting.
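One way to picture the "manage multiple copies" step is a periodic fixity check that confirms all copies of an object are still identical. The sketch below is only an illustration, not something shown in the webinar; the function names and the choice of SHA-256 are assumptions.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(copies: list[Path]) -> bool:
    """Return True when every copy has the same checksum (hypothetical helper)."""
    return len({sha256_of(path) for path in copies}) == 1
```

Running a check like this on a schedule, and recording the result, also supports the "document your processes and decisions" step.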

Tuesday, April 18, 2017

Understanding PREMIS

Understanding PREMIS. Priscilla Caplan. Library of Congress Network Development and MARC Standards Office. 2017.
     PREMIS stands for "PREservation Metadata: Implementation Strategies". This document is a relatively brief overview of the PREMIS preservation metadata standard. It can also serve as a "gentle introduction" to the much larger document, the PREMIS Data Dictionary for Preservation Metadata. PREMIS defines preservation metadata as "the information a repository uses to support the digital preservation process." Preservation metadata also supports activities "intended to ensure the long-term usability of a digital resource."

The Data Dictionary defines a core set of metadata elements needed to perform preservation functions, so that digital objects can be read from the digital media and can be displayed or played. For each element it gives a definition, the reason it is part of the metadata, and examples and notes about how the value might be obtained and used. The elements address the information needed to manage files properly and to document any changes made. PREMIS defines only the metadata elements commonly needed to perform preservation functions on the materials to be preserved. The focus is on the repository and its management, not on the content authors or the associated staff, so it can serve as a guide or checklist for those developing or managing a repository or software applications. Some of the information needed is:
  • Provenance: The record of the chain of custody and change history of a digital object. 
  • Significant Properties: Characteristics of an object that should be maintained through preservation actions. 
  • Rights: Knowing what you can do with an object while trying to preserve it.
The Data Model defines several kinds of Entities:
  • Objects (including Intellectual Entities)
  • Agents
  • Events
  • Rights
PREMIS provides an XML schema that "corresponds directly to the Data Dictionary to provide a straightforward description of Objects, Events, Agents and Rights."
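
To give a feel for what such a description might look like, here is a minimal sketch that builds a PREMIS-style Event record with Python's standard library. It assumes the PREMIS version 3 namespace and uses only a few of the Event elements; the identifier, event type, and date values are made up, and the output has not been validated against the official schema.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for PREMIS version 3.
PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def premis_event(event_id: str, event_type: str, when: str) -> ET.Element:
    """Build a small PREMIS-style <event> element (illustrative subset only)."""

    def tag(name: str) -> str:
        return f"{{{PREMIS_NS}}}{name}"

    event = ET.Element(tag("event"))
    identifier = ET.SubElement(event, tag("eventIdentifier"))
    ET.SubElement(identifier, tag("eventIdentifierType")).text = "local"
    ET.SubElement(identifier, tag("eventIdentifierValue")).text = event_id
    ET.SubElement(event, tag("eventType")).text = event_type
    ET.SubElement(event, tag("eventDateTime")).text = when
    return event


# Hypothetical example values.
example = premis_event("evt-001", "fixity check", "2017-04-18T12:00:00Z")
print(ET.tostring(example, encoding="unicode"))
```

A real repository would also link the Event to the Object it acted on and to the Agent that performed it.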

Monday, April 17, 2017

Rosetta Knowledge Center

Rosetta Knowledge Center. Ex Libris. April 17, 2017.
     One of the things that I like about Rosetta is the Ex Libris commitment to an open system. While the software may be proprietary, the essential content is open. The permanent objects and metadata are stored openly, so that they can be accessed or managed outside of the Rosetta software.

Another area that Ex Libris has opened is their Knowledge Center. This is very helpful in training new employees, learning new things about the software, or refreshing my memory. The open website includes:
  • Product Documentation
  • Training: Learn new skills with tutorials, recorded training and other materials
  • Release Notes about the features and capabilities of each product version
  • Implementation Guides that explain the methodology and requirements
  • Knowledge Articles that answer common questions

Saturday, April 15, 2017

ETD+ Toolkit

ETD+ Toolkit. Dr. Katherine Skinner, et al. Educopia Institute. April 10, 2017.
     A very helpful website for dealing with ETDs (electronic theses and dissertations). The Toolkit is an open set of six modules to help students create, store, and maintain their research outputs. It was designed to:
  • Help administrators understand the digital research outputs students are creating
  • Help administrators assess what to collect and care for as part of the institutional memory
  • Help students make sure that research outputs are in durable formats and on durable devices
  • Help students make informed decisions about file formats, documentation, and rights.
The Modules, which include "Learning Objectives, a one-page Handout, a Guidance Brief, a Slideshow with full presenter notes, and an evaluation Survey", are:
  1. Copyright: How can students gain appropriate permissions and how can students signal copyright for their own works?
  2. Data Organization: How can students structure, describe, store, and deposit data and other research files for reuse and/or future access?
  3. File Formats: How will the formats students choose make future access to their research easier or more difficult?
  4. Metadata: How can students store information describing their files to make sure they can tell what they are in the future?
  5. Storage: How can students make well informed choices about where to store their research materials?
  6. Version Control: What mechanisms can students use to make it easier to see the history of a file with multiple versions?
"In a 2014 survey of nearly 800 students across nine universities, students reported that non-PDF files - including research data, video, digital art, and software code - are either as important or more important than the Electronic Thesis and Dissertation (ETD) PDF as research outputs and evidence. Fully 80% of these students are producing non-PDF research outputs, most commonly tabular data (43%), digital images (38%), software code (29%), and digital text (28%)."
The ETD+ Toolkit provides introductory training for data curation and digital longevity techniques. It helps students identify and offset risks and threats to their digital research.

Tuesday, April 11, 2017

It’s not just a word

It’s not just a word. Helen Hockx. Things I cannot say in 140 characters.  April 7, 2017.
     A post about her new job: coordinating and developing a campus-wide strategy and overseeing its implementation. Digital assets are already being managed, but the new role provides an opportunity to revisit the topic and address the gaps. "A key finding is the strong focus on “now” – archiving and preservation are routinely overlooked. As a result, some digital assets have been lost and some are at risk." A recommendation, considering "the 3 pillars of policy, process and technology", is to add “digital resources” to the university's goals where superb stewardship is required. Adding the word “digital”, or calling out “digital resources” specifically, may not seem necessary to some, but it emphasizes the need to "do a much better job with digital assets, if we applied the same rigor and coordinated approach." We still have a ways to go with digital archiving and preservation.

"So it is not just a word. Digital assets are a new class of resources which requires active care and management over time.  Adding it to the strategic mix is a recognition of their value, and of digital stewardship as a strategic priority. No. it is not just a word, it will have to come with commitment, ownership and resources." Some day we can remove the word “digital” from our strategic plan, "when preservation of digital assets is embedded in the organisational culture and operations, when there is no need to even mention it."

Monday, April 10, 2017

Encoding and Wrapper Decisions and Implementation for Video Preservation Master Files

Encoding and Wrapper Decisions and Implementation for Video Preservation Master Files. Mike Casey. Indiana University. March 27, 2017.
     "There is no consensus in the media preservation community on best practice for encoding and wrapping video preservation master files." Institutions preserving video files long term generally choose from three options:
  • 10-bit, uncompressed, v210 codec, usually with a QuickTime wrapper
  • JPEG 2000, mathematically lossless profile, usually with an MXF wrapper
  • FFV1, a mathematically lossless format, with an AVI or Matroska wrapper
The few institutions digitizing and preserving video for the long term are roughly evenly divided among the three options above. This report examines in detail a set of choices and an implementation that has worked well at Indiana University. Originally they chose the first option, but with recent advances in FFV1 they reopened the decision and initiated a research and review process:
  • Exit strategy research and testing
  • Capture research (use FFmpeg within their system to generate FFV1 files).
  • Comparison of issues
  • Consultation with an outside expert
Results: In their research into exit strategies, they were able to move FFV1 files to another lossless codec with no loss of data. They decided to capture using FFmpeg, which required developing a simple capture tool, and they developed specifications for a minimal capture interface with FFmpeg for encoding and wrapping the video data.
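
As a rough illustration of FFmpeg producing FFV1 in a Matroska wrapper, the sketch below builds a command line for transcoding an existing file. It is not the capture tool or the specification described in the report: the file names are hypothetical, and the encoder settings (FFV1 version 3, intra-frame only, 16 slices with slice CRCs) are common community choices that I am assuming here.

```python
import shlex


def ffv1_command(source: str, target: str) -> list:
    """Build an FFmpeg command that encodes video as FFV1 (assumed settings)."""
    return [
        "ffmpeg",
        "-i", source,        # input file
        "-c:v", "ffv1",      # FFV1 video codec
        "-level", "3",       # FFV1 version 3
        "-g", "1",           # every frame is a keyframe
        "-slices", "16",     # split each frame into 16 slices
        "-slicecrc", "1",    # store a CRC with each slice
        "-c:a", "copy",      # leave the audio stream untouched
        target,              # a .mkv extension selects the Matroska wrapper
    ]


# Hypothetical file names; the command is printed, not executed.
print(shlex.join(ffv1_command("tape_0001.mov", "tape_0001.mkv")))
```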

Technical: They identified a number of key advantages to FFV1, including:
  • roughly 65% less data than a comparable file using the v210 codec
  • open source, non-proprietary, and hardware independent
  • largely designed for the requirements of digital preservation
  • employs CRCs for each frame, allowing any corruption to be associated with a much smaller digital area than the entire file
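The value of per-frame CRCs can be shown with a small conceptual sketch. FFV1 stores its CRCs inside the video bitstream itself; the code below only imitates the idea with byte strings standing in for frames, and the function names are hypothetical.

```python
import zlib


def frame_crcs(frames):
    """Compute one CRC-32 value per frame."""
    return [zlib.crc32(frame) for frame in frames]


def damaged_frames(frames, expected_crcs):
    """Return the indexes of frames whose CRC no longer matches."""
    return [
        index
        for index, (frame, crc) in enumerate(zip(frames, expected_crcs))
        if zlib.crc32(frame) != crc
    ]


# Stand-in data: three "frames", the second of which is later corrupted.
frames = [b"frame-0", b"frame-1", b"frame-2"]
stored = frame_crcs(frames)
corrupted = [frames[0], b"frame-X", frames[2]]
print(damaged_frames(corrupted, stored))
```

A single checksum over the whole file would only say that something changed; the per-frame values point to where.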
FFV1 appears to be "trending upwards among developers and cultural heritage organizations engaged in preservation work". They also chose the Matroska wrapper, which is an audiovisual container or wrapper format in use since 2002, and which is a more flexible wrapper option.

As more and more archives undertake video digitization, "they will not accept older and limited formats" (AVI or MOV); instead they will look for standards-based, open source options developed specifically for archival preservation. "Both FFV1 and Matroska are open source and are more aligned with preservation needs than some of the other choices and we believe they will see rapidly increasing adoption and further development."

Implementation: They developed a quality control program that checks the FFV1/Matroska preservation master files and validates that the output meets their specification for long-term preservation. These files are viewed using the VLC media player, a free, open source, cross-platform multimedia player that supports FFV1 and Matroska.

Currently, they have created over 38,000 video files using FFV1 and Matroska. "We have chosen two file formats that are open source, developed in part with preservation in mind, and on the road to standardization with tools in active development. We have aligned ourselves with the large and active FFmpeg community rather than a private company. While the future is ultimately unknowable, we believe that this positions us well for long-term preservation of video-based content."


Saturday, April 08, 2017

New Home and Features for Sustainability of Digital Formats Site

New Home and Features for Sustainability of Digital Formats Site.  Kate Murray, Jaime Mears. The Signal. April 6, 2017.
     The Library of Congress web site, Sustainability of Digital Formats, contains "the technical aspects of digital formats with a focus towards strategic planning regarding formats for digital content, especially collection policies." The formats are grouped by type of object, which includes:
  • still image, sound, textual, moving image, web archive, datasets, geospatial and generic formats
The website shows the relationships between formats, as well as the quality and functionality factors for each content category. It also evaluates formats against these sustainability factors:
  • Disclosure
  • Adoption
  • Transparency
  • Self-documentation
  • External dependencies
  • Impact of patents
  • Technical protection mechanisms
The new website is at loc.gov/preservation/digital/formats and it now includes:
  • The PRONOM ID and the Wikidata Title ID, both of which help to document the formats, and
  • The Library of Congress Recommended Formats Statement
The digital formats site continues to evolve to meet the Library’s and the digital preservation community’s changing needs.

Friday, April 07, 2017

How a Browser Extension Could Shake Up Academic Publishing

How a Browser Extension Could Shake Up Academic Publishing. Lindsay McKenzie. The Chronicle of Higher Education. April 06, 2017.
     There are several open-access initiatives. One of them, Unpaywall, is just a browser extension. Unpaywall is an open-source, nonprofit organization "dedicated to improving access to scholarly research". It has created a browser extension that aims to do one thing really well: instantly deliver legal, open-access, full text as you browse. "When an Unpaywall user lands on the page of a research article, the software scours thousands of institutional repositories, preprint servers, and websites like PubMed Central to see if an open-access copy of the article is available. If it is, users can click a small green tab on the side of the screen to view a PDF." A legally uploaded open-access copy is delivered to users more than half the time.
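
The lookup described in the quote can be pictured with a toy sketch. This is not Unpaywall's code or its API: the source names, DOIs, and URLs below are invented stand-ins, and a real service would query live indexes rather than an in-memory dictionary.

```python
from typing import Optional

# Hypothetical indexes mapping an article DOI to an open-access PDF.
SOURCES = {
    "institutional repository": {
        "10.1000/example.1": "https://repository.example.edu/1.pdf",
    },
    "preprint server": {
        "10.1000/example.2": "https://preprints.example.org/2.pdf",
    },
}


def find_open_copy(doi: str, sources: dict = SOURCES) -> Optional[str]:
    """Return the first open-access URL found for a DOI, or None."""
    for index in sources.values():
        url = index.get(doi)
        if url:
            return url
    return None


print(find_open_copy("10.1000/example.2"))
print(find_open_copy("10.1000/not-available"))
```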

"It’s the scientists who wrote the articles, it’s the scientists who uploaded them — we’re just doing that very small amount of work to connect what the scientists have done to the readers who need to read the science." Open-access papers have the information but don’t always look like the carefully formatted articles in academic journals. Some users might not feel comfortable citing preprints or open-access versions obtained through Unpaywall, "without the trappings and formatting of traditional paywalled publishing," even if the copy is credible.