Friday, September 30, 2016

Victoria University of Wellington Selects Ex Libris Rosetta for Preserving and Managing Digital Assets

Victoria University of Wellington Selects Ex Libris Rosetta for Preserving and Managing Digital Assets. Press release. August 2016.
     "Victoria University of Wellington has selected Rosetta as its digital preservation and asset management solution. The Victoria University Library serves as the custodian of over 15,000 digitized historical cultural works, part of the New Zealand Electronic Text Collection, and over 11,000 born-digital theses and research projects in institutional repositories, including several other smaller digital collections. Rosetta will be a key element of the Library’s digital assets management and preservation processes and will enable researchers in any location to read or view the digital objects in the Library's extensive collections."

"Adopting Rosetta will enable us to manage, maintain, and preserve these collections in the long term, as well as grow them in the future. As our collections increase, standards of digital preservation and description become more vital to the continuity and discovery of materials for future knowledge creation. It’s not just about the students and researchers of today. It’s about the students and researchers of the future, too.”

Wednesday, September 28, 2016

The Document Life Cycle Road to Digital Preservation and Archiving

The Document Life Cycle Road to Digital Preservation and Archiving. Brett Claffee. Document Strategy. Aug. 18 2016.
     What is the difference between documents and records in today’s digital enterprises? Documents do not become records until they are declared a record. "When a document is first created, it is under its author’s control and typically goes into a workflow, put simply—a document life cycle".  When a document is declared a record, it moves from the author’s control to corporate control under the retention schedule, which determines what eventually happens to the record. Typically, a document’s life cycle involves these phases:
  • Creation
  • Management
  • Storage
  • Retrieval
  • Distribution
  • Disposal
When a document is declared a record, it "becomes subject to corporate control and cannot be destroyed until it meets all of its retention obligations, including being released from any legal, financial, or regulatory holds". Record types with long-term retention requirements may be kept permanently.
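The disposal rule quoted above can be read as a simple check. The following is a minimal Python sketch under stated assumptions, not something taken from the article: the `Record` class, its fields, and the `can_dispose` function are hypothetical names, and a retention date of `None` is assumed to mean "keep permanently".

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Record:
    """A declared record under corporate control (hypothetical model)."""
    title: str
    retain_until: Optional[date]  # None is assumed to mean permanent retention
    holds: set = field(default_factory=set)  # e.g. {"legal", "financial", "regulatory"}


def can_dispose(record: Record, today: date) -> bool:
    """Return True only when retention has expired and no hold remains."""
    if record.holds:
        return False  # any outstanding hold blocks destruction
    if record.retain_until is None:
        return False  # permanent records are never destroyed
    return today >= record.retain_until


invoice = Record("Invoice 42", date(2020, 1, 1), {"legal"})
print(can_dispose(invoice, date(2021, 1, 1)))  # blocked by the legal hold
invoice.holds.clear()
print(can_dispose(invoice, date(2021, 1, 1)))  # retention met, hold released
```

A real retention schedule would carry more detail (record types, triggering events, approvals); the sketch only shows that both conditions must be satisfied before disposal.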

Digital preservation and archiving differ in how they treat document and record life cycles. The records life cycle adds retention and archiving as a phase, which includes document destruction as part of the life cycle.

Digital archiving and preservation is a multi-layered process that ranges from "provenance and authentication practices, to chain of custody and accountability, to format transformations—all designed to keep information legitimate, useful, and, if required for long-term retention, preserved". With the large volume of data in recent years, data archiving has come to the forefront. It is estimated that more than 30 billion documents are used each year in the United States. Archiving provides five critical advantages:
  1. Ensuring regulatory compliance for data retention, data immutability, and audit trails
  2. Improving performance and productivity of current business applications
  3. Making archived records widely available and easy to retrieve by authorized users
  4. Removing the problems of maintaining obsolete systems just for the data
  5. Reducing IT costs and time for back-up, upgrades, and other needs
For records, organizations can follow a retention schedule, but a retention schedule for documents and data is not as clear. Their life span is often looked at in terms of use, informatics, and analytics. A life cycle approach can assure consistent control.

Tuesday, September 27, 2016

Digital Preservation File Names

Digital Preservation File Names. Chris Erickson. September 27, 2016. Updated 31 Oct. 2016.
     While processing some collections, we had difficulty creating the METS XML files because of some characters in the file names. The characters may be valid in some systems, but may cause difficulties in others. From comments on the internet it appears that only a few characters are forbidden, but the experiences of a number of people suggest that some systems may not support all characters in file names. We decided that it was better to use only alphanumeric characters, with underscores as a separator and a full stop (period) before the extension.  When preserving digital files it is important to remember that the files may be used by a variety of computer systems over their lifetime. To have the greatest chance of keeping the files usable in the future it is best to follow some basic standards when naming files.

Here are some suggestions we are considering:
  1. Decide on file naming conventions so that file names have meaning.
  2. Use file extensions, which help identify the type of file (such as .txt, .doc, .wav, .jpg).
  3. File name length limits vary between operating systems, so generally stay under 30 characters.
  4. Avoid spaces in file names. Spaces are an acceptable character for most file names, but they can cause difficulty when processing. Underscores may be used as a separator.
  5. Avoid punctuation and special characters. The safest characters to use are numbers and letters. Some operating systems are case sensitive and others are not, so do not rely on capitalization alone to tell files apart. Some characters to avoid for our preservation system are spaces, ampersands, brackets, and commas.
  6. Keep file names to a reasonable length; under 30 characters is best.
  7. Don’t start or end the file name with a space, special character, or punctuation mark.
  8. These conventions apply to folders as well as files
Characters that others have had difficulties with and which should not be used in filenames:

#  pound            <  left angle bracket     $  dollar sign          +  plus sign
%  percent          >  right angle bracket    !  exclamation point    `  backtick
&  ampersand        *  asterisk               '  single quotes        |  pipe
{  left bracket     ?  question mark          "  double quotes        =  equal sign
}  right bracket    /  forward slash          :  colon
\  back slash          blank spaces           @  at sign
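
The naming suggestions above can be checked mechanically. The following is a minimal Python sketch, not the tool we use: the function names `is_safe_name` and `suggest_name` are hypothetical, and treating "under 30 characters" as a hard limit on the whole name (extension included) is an assumption.

```python
import os
import re

# Starts with a letter or digit, then letters, digits, or underscores,
# then one period and an alphanumeric extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_]*\.[A-Za-z0-9]+")
MAX_LENGTH = 30  # assumed hard limit, from the "under 30 characters" suggestion


def is_safe_name(name: str) -> bool:
    """Return True if the name uses only the recommended characters and length."""
    return len(name) < MAX_LENGTH and SAFE_NAME.fullmatch(name) is not None


def suggest_name(name: str) -> str:
    """Replace runs of other characters with one underscore (does not shorten)."""
    stem, ext = os.path.splitext(name)
    stem = re.sub(r"[^A-Za-z0-9]+", "_", stem).strip("_")
    ext = re.sub(r"[^A-Za-z0-9]", "", ext)
    return f"{stem}.{ext}" if ext else stem


print(is_safe_name("box1_folder2_0001.tif"))
print(is_safe_name("letter & reply.txt"))
print(suggest_name("Annual Report (final) & notes.pdf"))
```

The suggestion step is deliberately simple. It can produce the same name for two different originals, so any renaming should be logged and checked for collisions before files are moved.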

It appears that XML in general has a specific problem with ampersands and brackets in file names. Some other resources of information:

Monday, September 26, 2016

Selection and Appraisal in the OAIS Model

Selection and Appraisal in the OAIS Model. Ed Pinsent. DART Blog. 7 September 2016.
     The post asks whether the OAIS Model can accommodate the skills of selection and appraisal, then suggests that it cannot.  The Model presents an over-simplified view in which content arrives in a state that is already ready to preserve, which ignores the processes that come first. There is a need to define the pre-ingest stage in OAIS, and there needs to be greater recognition that archivists' Selection and Appraisal skills can have tremendous value in digital preservation. Archivists assess the value of the content in a contextual framework, based on other records in the archive and on provenance. This requires an understanding of context, provenance, and record series to help identify the potential value of content. A Series model is the "foundation for all Archival arrangement, and is the cornerstone of our profession". It is difficult to see where the record / archival series is in all this.  "The integrity and contextual meaning of a collection is being overlooked, in favour of this atomised digital-object view.

OAIS, if strictly interpreted, could bypass the Series altogether in favour of an assembly line workflow that simply processes one digital object after another."  The blog post asserts that there is a need to rediscover the value of Appraisal and Selection and its importance in the digital realm. 

Assessing and Quantifying Risk to Digital Media Materials

Assessing and Quantifying Risk to Digital Media Materials. Lance Thomas Stuchell. Bits and Pieces. August 31, 2016.
     A post written by Sarah Breen, Alix Norton, and Alexa Hagen. Archives are increasingly facing challenges in preserving digital media materials; creating digital processing workflows and workstations is one often discussed challenge. This article discusses a framework for assessing risk of loss to digital archival materials and shows that the methodology can highlight materials most susceptible to loss. This will help administrators demonstrate the need for immediate intervention and processing.

The methodology used a formula for calculating risk to physical collections:
"The formula yields a calculation of the magnitude of a given risk (MR) by multiplying the factors of the fraction of the collection that is susceptible (FS), loss of value (LV), probability of risk (P), and extent of the risk (E). By giving each of these factors a value between 0 and 1, we calculated MR values for the overall magnitude of a variety of risks, also between 0 and 1. While this formula is often used to assess risks over a 100 year period, due to the nature of the short lifespan and rapid obsolescence of digital media, we have used this formula to assess risks over a 10 year period".
External risks would affect the collection as a whole, and would include fire, theft, damage, and lack of funding to continue preservation projects. Internal risks are more specific to the physical digital media format, such as obsolescence of format and media degradation. Management, funding, administrative decisions and the storage environment can also be areas of high risk.
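
The risk formula quoted above is a straight product of four factors, MR = FS × LV × P × E. The following is a minimal Python sketch of that calculation; the function name `magnitude_of_risk` and the example values are illustrative assumptions and do not come from the article.

```python
def magnitude_of_risk(fs: float, lv: float, p: float, e: float) -> float:
    """Multiply the four factors, each of which must lie between 0 and 1.

    fs: fraction of the collection that is susceptible
    lv: loss of value
    p:  probability of the risk over the assessment period
    e:  extent of the risk
    """
    factors = {"fs": fs, "lv": lv, "p": p, "e": e}
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return fs * lv * p * e


# Made-up values for one risk, assessed over a 10 year period.
score = magnitude_of_risk(fs=0.5, lv=0.8, p=0.5, e=1.0)
print(round(score, 3))
```

Because every factor is at most 1, the result is also between 0 and 1, and any single factor near 0 pulls the whole score down. Scores are therefore most useful for ranking risks against each other rather than as absolute measures.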

The highest risks assessed include:
  1. degradation and obsolescence, 
  2. lack of funding, and 
  3. potential loss of management support. 
The article recommends actions be taken to mitigate these risks early by:
  • migrating digital content to a stable content management system, 
  • lowering relative humidity of the storage environment, 
  • securing the lowest cost digital storage option that remains aligned with the library’s policy, and 
  • advocating to library and university administration for the need for preservation
These recommendations should significantly reduce the highest risks and help ensure the preservation of the digital information.

Saturday, September 24, 2016

Digital Preservation Projects in 2016

Digital Preservation Projects in 2016. Chris Erickson. 6 September 2016.
     For the past several months we have been working on a number of digital preservation projects, which include:
  • Reworking the ContentDM to Rosetta ingest interface. Originally it was just for images or simple objects. It has been expanded to also include compound objects, particularly those with page level metadata, page by page transcriptions, and such.
  • Improving our unstructured data ingest process. It uses a spreadsheet template for metadata related to files to be ingested into Rosetta. The content creator can enter the metadata, or we have a file discovery tool that can traverse a directory structure and enter file and folder metadata into the spreadsheet template. The collection I am just finishing with this tool totaled about 45,000 tiff images.
  • Restructuring our digital ingest workflows from project based into a digital pipeline.  We now have a shared drive between Rosetta and our content creators and more disk storage space, which makes it easier to transfer files at the end of a project or, for a long project, as it progresses.
  • Using all this to keep up with all new projects being created and adding them to Rosetta, which allows more time to ingest the backlog of projects waiting for preservation. The rate of ingest now, depending on preparation of the collections, is usually a couple of TBs each week.
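
A file discovery step like the one described above can be sketched in a few lines of Python. This is not our actual tool or the Rosetta template: the function name `discover_files` and the column names are hypothetical, and a CSV file stands in for the spreadsheet.

```python
import csv
import os
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical column names; a real template would define its own.
COLUMNS = ["folder", "file_name", "extension", "size_bytes", "modified_utc"]


def discover_files(root, out_csv):
    """Walk the directory tree under root and write one CSV row per file."""
    root = Path(root)
    count = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        for folder, subfolders, files in os.walk(root):
            subfolders.sort()  # visit folders in a predictable order
            for name in sorted(files):
                path = Path(folder) / name
                info = path.stat()
                modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
                writer.writerow({
                    "folder": Path(folder).relative_to(root).as_posix(),
                    "file_name": name,
                    "extension": path.suffix.lower(),
                    "size_bytes": info.st_size,
                    "modified_utc": modified.isoformat(timespec="seconds"),
                })
                count += 1
    return count
```

Calling `discover_files("path/to/collection", "template.csv")` returns the number of files listed. The output file should be written outside the folder being scanned so that it does not list itself.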

Thursday, September 22, 2016

Content Delivery Drives The Move To The Cloud

Content Delivery Drives The Move To The Cloud. Tom Coughlin. Forbes. Sep 13, 2016.
     The growing reliance on the Internet is also increasing the use of cloud-based services for collaborative workflows and content delivery in the media and entertainment industry. This is causing a shift from capital expenses to operating expenses for media and entertainment content storage.  Cloud storage for the media and entertainment industry is projected to grow from $2.5 billion in 2016 to over $20 billion by 2021.  Archiving and preservation is a large part of this, seen in this chart.

Thursday, September 15, 2016

The Secret Libraries of History

The Secret Libraries of History. Fiona Macdonald. 19 August 2016.
     Religious or political pressures have meant that books have been hidden throughout history – whether in secret caches or private collections. This article looks at libraries that have been preserved over time, either to keep them hidden, or because of neglect.
  • Syria’s secret library, currently beneath the streets of a suburb of Damascus
  • The Library Cave on the edge of the Gobi Desert in China, sealed for almost 1000 years
  • The Vatican Secret Archives, with papal correspondence going back over 1000 years
  • The Cairo Genizah, in a wall of the Ben Ezra synagogue, containing almost 280,000 Jewish manuscript fragments from the ninth to the nineteenth centuries
  • A Hidden Medieval Archive found in papers used in binding medieval books
[Note: This is a good reminder of what it is we are trying to do, to keep important content for future generations. Chris]

Tuesday, September 06, 2016

The Pathways of Research Software Preservation: An Educational and Planning Resource for Service Development

The Pathways of Research Software Preservation: An Educational and Planning Resource for Service Development. Fernando Rios. D-Lib Magazine. July/August 2016.
     A great deal of effort has gone into preserving digital research data, but not as much into preserving the software and code needed to use the data. The computer programs used to view, process, analyze, and create data are an integral part of the research workflow. However, "many issues remain in regards to identifying and capturing metadata, dependencies, support for attribution and citation, infrastructure development, and developing appropriate workflows to enable service provision." The article looks at the development of a visual representation of software preservation at Johns Hopkins University, which looks at:
  1. major approaches to software preservation for research software and data
  2. a need to evaluate our capacity to offer software preservation services
  3. and the need for a road-map
One way to look at this is to view the development and use of software in the research process in general phases:
  • developing concepts and theory
  • writing the software 
  • obtaining all objects required for execution 
  • collecting inputs and setting parameters, and 
  • making use of the results
With a visual approach for evaluating the possible Pathways of Software Preservation, a user can better understand the capacity for software preservation activities, as well as gain "a better appreciation of the nuances of research software preservation and sharing".

Monday, September 05, 2016

Preservation Challenges in the Digital Age

Preservation Challenges in the Digital Age. Bernadette Houghton. D-Lib Magazine. July/August 2016.
     The rapidly evolving digital preservation field has many preservation challenges:
  • Digital materials are more at risk than analogue
  • Preserving digital materials also means providing access to the material
  • Ensuring the infrastructure that renders the file is preserved or replicated
  • Focal areas are changing and best practices are still under debate.
"The optimal preservation strategy for individual organisations will differ according to their requirements, resources and data type. Each strategy comes with its own set of challenges, many of which are dependent on, or impacted in some way by, other challenges. This article will cover what the author sees as the major challenges for digital preservation at this point in time, covering a range of technical, administrative, logistical and legal aspects."

Other challenges:
  • Data volumes. Digital storage is becoming cheaper, but not every file and every version of it can and should be stored or preserved. Selecting what to preserve and when to take preservative action becomes more complex with a larger volume of data and a wider range of storage media. This increases the risk of failing to preserve materials of historical value. There is also a higher risk of not finding data because of poor metadata.
  • Archivability. One of the most fundamental challenges in archiving is determining what should be preserved and the extent of preservation.
  • Multiplicities. Materials born digital today are likely to have multiple copies in multiple versions stored in multiple locations, possibly under multiple filenames and in multiple file formats.
  • Hardware and storage. Obsolescence, deterioration of media and hardware mechanical failure increase the risk of loss. The cloud is increasingly used for storage, but there are also significant issues with using it.
  • File formats. File formats were considered a big risk in digital preservation, but they have not proven to be the overwhelming danger that they were initially perceived to be. Proprietary file formats continue to pose a challenge.
  • Metadata. Metadata is probably the most important aspect of digital preservation. Materials with poor metadata may be undiscoverable, and their authenticity, verifiability, and context unclear.
  • Legalities. Digital preservation presents some complex legal issues.
  • Privacy. Material chosen for preservation may contain private and confidential information, and its unauthorised release may lead to legal action.
  • Resourcing. Preservation costs involve not just the actual digitisation, but also storage, infrastructure, staff resourcing and training, ongoing maintenance and auditing of the digitised materials. There are also costs associated with providing access.
The challenge is to use scarce resources to preserve the most important materials, using the most cost-effective and efficient methods. Even choosing not to preserve materials involves costs. Those who will benefit most from current preservation programs are future generations, which makes it difficult to justify expenditure on digital preservation, since there is little current benefit. The "best that the preservation community can do with digital material is to make educated guesses based on a few decades of mostly anecdotal experience".

"The challenges in digital preservation involve dealing with not just the technologies of the past, but also those to come". The digital preservation field is developing rapidly and the people working with digital materials need to keep up with the changes.

Friday, September 02, 2016

TRAC Certified Long-term Digital Preservation: DuraCloud and Chronopolis for Institutional Treasures

TRAC Certified Long-term Digital Preservation: DuraCloud and Chronopolis for Institutional Treasures. Website. 1 September 2016.
     "An institution’s identity is often formed by what it saves for current and future access. Digital collections curated by the academy can include research data, images, texts, reports, artworks, books, and historic documents help define an academic institution’s identity."

DuraSpace and the Chronopolis service at the University of California at San Diego have announced the DuraCloud Enterprise Chronopolis subscription plan for digital preservation. It stores digital content in Amazon and in the Chronopolis network. It provides geographic replication and synchronization of content between three storage locations, and has content integrity monitoring in a dark storage option. Plan options are a combination of Amazon S3, Amazon Glacier, and SDSC.

Pricing and Plan details

Plan                                  Subscription Fee   Storage
DuraCloud Preservation                $1,175             $700/TB
DuraCloud Preservation Plus           $1,175             $825/TB
DuraCloud Enterprise                  $5,250             $500/TB
DuraCloud Enterprise Plus             $5,250             $625/TB
DuraCloud Enterprise Plus (Option 2)  $5,550             $1,200/TB
DuraCloud Enterprise Chronopolis      $2,750             $500/TB (ingest and retrieval fees extra)
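
To compare the plans for a given collection size, the fees listed above can be combined in a rough estimate. The following is a minimal Python sketch that assumes the cost is simply the subscription fee plus the per-TB rate times the number of TB stored. The announcement as summarized here does not state the billing period, how partial TBs are charged, or the ingest and retrieval fees, so those are left out, and the function name `estimate_cost` is hypothetical.

```python
# (subscription fee, storage rate per TB) in US dollars, as listed above.
PLANS = {
    "DuraCloud Preservation": (1175, 700),
    "DuraCloud Preservation Plus": (1175, 825),
    "DuraCloud Enterprise": (5250, 500),
    "DuraCloud Enterprise Plus": (5250, 625),
    "DuraCloud Enterprise Plus (Option 2)": (5550, 1200),
    "DuraCloud Enterprise Chronopolis": (2750, 500),
}


def estimate_cost(plan: str, terabytes: float) -> float:
    """Assumed model: subscription fee + rate per TB * TB stored."""
    if terabytes < 0:
        raise ValueError("terabytes must not be negative")
    fee, rate_per_tb = PLANS[plan]
    return fee + rate_per_tb * terabytes


# Rank the plans for a hypothetical 10 TB collection, cheapest first.
for name in sorted(PLANS, key=lambda plan: estimate_cost(plan, 10)):
    print(f"{name}: ${estimate_cost(name, 10):,.0f}")
```

Under this model the higher subscription fee of the Enterprise plans is offset by the lower per-TB rate as the collection grows, so the cheapest plan depends on how much is stored.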

Thursday, September 01, 2016

Digital Preservation: Keep calm and get on with it!

Digital Preservation: Keep calm and get on with it! Matthew Addis. Archives and Records Association 2016. 30 August 2016.
     This is a presentation about simple and practical steps towards digital preservation using open source tools and best practices. The benefits of a digital preservation strategy are increasingly clear, but implementing the strategy can be overwhelming. The presentation lists resources and tools, such as the Digital Preservation Coalition handbook, the COPTR tool website, DROID, and the Data Assessment Framework. Sometimes complex resources can also be overwhelming and make decisions more difficult. "If you think that you’re not able to ‘do enough’ or ‘do it properly’, then this can result in doing nothing because this feels like the next best thing." But doing nothing has serious consequences in the digital world. "It’s almost always better to get on and do something than it is to do nothing." The presentation also refers to ‘parsimonious preservation’, or starting with minimal actions. Understand what you have and try to keep it safe through safe copies. It is important to understand formats and to use the tools to keep the content safe. "File format identification gives the information needed to make decisions." Another important part is to start simple and add functionality as you go. The maturity model from the National Digital Stewardship Alliance is a good guide.
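
One small step in the spirit of starting with minimal actions is to confirm that a safe copy really is identical to the original. The following is a minimal Python sketch that compares SHA-256 checksums. The presentation as summarized here does not prescribe this method, so the technique and the function names `sha256_of` and `copies_match` are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """Return True if both files have the same size and the same checksum."""
    original, copy = Path(original), Path(copy)
    if original.stat().st_size != copy.stat().st_size:
        return False  # sizes differ, so there is no need to hash
    return sha256_of(original) == sha256_of(copy)
```

Storing the checksum alongside the file when it is first received makes it possible to repeat the check later, even when the original is no longer at hand.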