January 3, 2007
[CSG Winter 2007] MBooks at University of Michigan
- Project partnership with Google publicly announced in December 2004: scanning 7 million print volumes over 4-6 years. Direct scanning costs are borne by Google.
UM receives a copy of all digital files, including OCR and metadata, which can be used to build services. UM can share them, with some restrictions. Each volume page produces 2.01 files on average, so the total will be about 2.2 billion files and 380 TB of data. That is a sustained rate of 3.16 MB per second for four years.
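As a rough check on those figures, here is a minimal sketch of the arithmetic. It assumes "TB" and "MB" are binary units (2^40 and 2^20 bytes) and a 365.25-day year; neither assumption is stated in the talk. Under those assumptions the quoted rate of about 3.16 MB per second is reproduced. With decimal units the result would be closer to 3.01 MB per second. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the storage figures quoted above.
# Assumptions (not from the talk): binary units, 365.25-day years.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
BYTES_PER_TIB = 2**40
BYTES_PER_MIB = 2**20
BYTES_PER_KIB = 2**10


def sustained_rate_mib_per_s(total_tib: float, years: float) -> float:
    """Average ingest rate needed to receive total_tib over the given years."""
    total_bytes = total_tib * BYTES_PER_TIB
    return total_bytes / (years * SECONDS_PER_YEAR) / BYTES_PER_MIB


def average_file_size_kib(total_tib: float, file_count: float) -> float:
    """Mean file size implied by the total volume and the file count."""
    return total_tib * BYTES_PER_TIB / file_count / BYTES_PER_KIB


if __name__ == "__main__":
    rate = sustained_rate_mib_per_s(380, 4)
    size = average_file_size_kib(380, 2.2e9)
    print(f"sustained rate: {rate:.2f} MiB/s")
    print(f"average file size: {size:.0f} KiB")
```

The implied average file size is under 200 KiB, so the collection is dominated by a very large number of small files.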
Data characteristics - well-defined file formats: image files are TIFF or JPEG 2000, OCR files and metadata are UTF-8 text. Indefinite retention. Files are largely static. Much of the material is in copyright, so it requires security practices.
MBooks service - users can search and view books online.
There's interest in using the OCR data for textual analysis research.
Technorati Tags: CSG-Winter-2007, google, higher-ed, digital-libraries, storage
Posted by oren at January 3, 2007 3:32 PM