January 2007 Archives
Being gone on work trips for two weeks in December and two weeks in January has burned me out on travel (though the almost two weeks in Hawaii in the middle helped resurrect my sanity).
I was supposed to be at the CalConnect meeting in Provo at Novell this week, and people had asked me to come to net@edu and CNI, and I was registered for the O'Reilly ETech conference in March, and I did want to attend SXSW this year... but I've decided to skip them all and stay home and get some work done and spend as much time as I can with my family.
I've volunteered to be on the Program Committee for Educause 08 (in Orlando - ugh) so I'll probably have to travel to the initial meeting of that group in April, and I've got an IT Leaders Program meeting in May in Minneapolis. But it'll be good to have a few months at home and stay out of airports and hotels!
Technorati Tags: travel
A group of 30 of us from the UW are spending a couple of days at a workshop on Service Oriented Architecture presented by Pete Lacey from the Burton Group.
It's a high level overview of a lot of detailed technical concepts, but it's very useful, and should provide a common starting point for a group of us who are likely to be discussing implementation of services orientation in the technical framework of the UW.
Interestingly enough, while the material in the Burton presentation is almost entirely oriented towards the use of SOAP to implement web services, Pete was surprised to find that we're also interested in ReST style interfaces. He says we're the first group he's given this workshop to that's even knowledgeable about the SOAP/ReST discussions, and (happily for us), he really knows this area like the back of his hand. He's a self-acknowledged ReST partisan, being the author of the much-cited S Stands for Simple.
Technorati Tags: soa, web-services
A bunch of us from UW spent last Thursday and Friday getting an extended executive briefing at Microsoft. We got to hear about new and future development efforts on a bunch of the product and technology lines and got the opportunity to meet with some of the product development folks. While the content of what we discussed is under NDA (not that we were privy to any great secrets that I noticed), I do want to note that, as is always the case when I get to spend time with the actual development folks from Microsoft, I came away impressed by the intelligence and knowledge of the folks that work there. Many thanks to Frank Lobisser, our local MS rep, for putting together a couple of interesting and informative days.
Technorati Tags: microsoft
The city of San Francisco has just released its draft (200-page PDF file) of a major feasibility study for building a "fiber to the premise" network for the city, titled Fiber Optics for Government and Public Broadband: A Feasibility Study.
The press release is here.
What did they find?
FTTP is the holy grail of broadband: a fat pipe all the way into the home or business--but in the near future only available for a privileged few located in the limited areas of private-sector deployment.
But private-sector networks are not meeting this growing demand for bandwidth and speed in an affordable manner. Though there are private-sector FTTP deployments underway in some, limited areas of the United States, none is planned or foreseen for San Francisco.
In this context of private sector disinterest, municipal FTTP would rank San Francisco among the world’s most far-sighted cities -- by creating an infrastructure asset with a lifetime of decades that is almost endlessly upgradeable and capable of supporting any number of public or private sector communications initiatives.
The report proposes building a fiber network first to provide capacity for city government, then to targeted "enterprise zones" in the city, and then to expand it city-wide. The report details various fiber networking technologies and topologies for deployment in San Francisco, examines costs and financing alternatives, and looks at operational options.
This is definitely worth a look, and it will be fascinating to see how the report is received and what comes next in SF.
Technorati Tags: broadband, networking, municipal-networks, fiber
I'm honored to say that I've joined the Citizens' Telecommunications and Technology Advisory Board (CTTAB) for the city of Seattle.
CTTAB has the responsibility to study and make recommendations to the Mayor and the City Council on issues including cable franchising, municipal networking, technology access, and others.
As part of the process I had to go have my nomination to the Advisory Board approved by the Energy and Technology Committee of the City Council today. They told me to be prepared to talk for a couple of minutes. What actually ended up happening was that Committee Chair Jean Godden and Vice Chair David Della asked a couple of questions about my background and what I thought about providing equitable access to technology for all citizens (you can watch the video (requires RealPlayer) if you're really interested - that part of the meeting starts at about 27 minutes into the video).
What I had prepared to say is, I think, more interesting than what I ended up talking about, so here it is:
I am honored to have an opportunity to serve on the CTTAB.
It will surprise nobody in this room to say that the future grows out of the conversations of the present, or to observe that those conversations are increasingly taking place in ways that we could not have imagined a couple of decades ago, in venues that are enabled by the telecommunications and technology infrastructure that is the very subject matter of this Advisory Board.
The innovative technologies that empower those conversations (such as email, instant messaging, blogs, wikis, social networks, virtual immersive environments, etc) did not grow out of any grand government or corporate scheme, but are the product of thousands of individual innovators - engineers and academics, business people and students, people with ideas and the will to make them happen. It's important that those of us who are knowledgeable and enthusiastic about technology participate in the conversation and work to ensure that those innovations that enhance connections between humans continue to be nurtured and encouraged, and that the environment for individuals to innovate be allowed to flourish.
Thank you for this opportunity and I look forward to working with you on the CTTAB.
Technorati Tags: policy, seattle, technology
I'm happy to say that my ECAR Research Bulletin on social software, titled Digital Rendezvous: Social Software in Higher Education is now available to folks at ECAR member institutions. This bulletin grew out of a workshop on social software at the CSG meeting in Spring of 2006, and talks a bit about what features define social software and make its use interesting in higher education, and what the current state of adoption of some of the social software technologies was at the CSG institutions at that time.
It was fun to get to write this piece - the ECAR Research Bulletins are short (12 page) pieces aimed at executive management in higher education institutions. Toby Sitko at ECAR was great to work with on this project, helping me get the bulletin down to the allowable size from my original draft, which was twice as long. It did lose some detail in the process, so if any of you from ECAR institutions are interested in seeing the original draft I'd be happy to share it.
Those of you who are not from ECAR institutions can at least see the slides (pdf) from the CSG workshop presentation.
One thing I'm not happy about is the way that ECAR's pdf publications don't allow copying of text to the clipboard. If these publications hope to be influential (and I know they do), then making it easy for people to quote sections in other venues is essential - and having to retype text in order to quote a publication in this day and age is simply a barrier to reuse. Given that one of the most stirring sessions at the recent ECAR Symposium was given by UBC's John Willinsky on Sustaining Access to Knowledge and Scholarly Publishing, the use of copy protection on ECAR publications seems antithetical to ECAR's own aims. I know that I myself have been dissuaded from blogging about ECAR publications because of the extra effort involved in copying text into my blog. I urge those of you from ECAR member institutions to let Richard Katz and his crew know how you feel about this - I know I have and will continue to.
Anyway, if you have comments on the bulletin, feel free to leave them attached to this post or send them to me - I'd love to hear them!
Technorati Tags: higher-ed, ecar, social-software
UW Provost Phyllis Wise has set up a blog for discussion and comment on the UW's new vision statement, which includes the great tag line:
Discovery is at the heart of our university
It's a terrific way of using blog technology, and I look forward to following the conversation!
While leading a policy discussion on the mis-named Digital Rights Management technologies (aka copy protection).
"...these technologies have the shelf life of sushi."
Technorati Tags: CSG-Winter-2007, DRM
We had a great workshop on Thursday on collaboration tools and how to approach them in higher education. I was part of the panel that led the presentation, so I wasn't taking notes, but I'm sure the notes will be posted to the CSG web site after the meeting.
For my part in the presentation, I reiterated some of the points I made at last spring's discussion of this topic, and went on to comment that what we're now experiencing in the collaborative tools space is somewhat analogous to the Cambrian explosion, where we have a tremendous proliferation of new species of software appearing almost on a daily basis and combining and evolving at a very rapid rate, making it very difficult to figure out which ones we should engage with at an enterprise level, or even how to construct a meaningful taxonomy of these applications.
Technorati Tags: collaboration, CSG-Winter-2007, social-software
Managing very large files in research computing at IU.
Task force two years ago on research cyberinfrastructure had recommendations concerning storage - Continuing to deliver centralized facilities to support research computing as well as dependable archival storage were identified as important. Large file storage is just a piece of the storage strategy for IU.
They have about a petabyte of spinning disk available for researchers, as well as 4 petabytes of archival storage (the Massive Data Storage System). The "Data Capacitor" captures data from instrumentation.
Data Capacitor uses the Lustre file system.
MDSS designed to provide a deep store for large files. Runs HPSS. Interfaces include FTP, Samba, and tar. Radiology is one of the biggest users. Also working with digital library programming. They give the researchers 500 GB for free, and after that they want to discuss it.
Preservation, curation, and long term management of data are a big issue - need to link librarians, computing support staff, and IT professionals. Serge notes that finding ways of accomplishing persistent URIs for data is important.
Backup by mirroring alone is a serious problem if you accidentally delete something or introduce bad data in big data sets, since the mirror faithfully copies the mistake.
Technorati Tags: CSG-Winter-2007, cyber-infrastructure, storage
- Project partnership with Google publicly announced in December 2004 - scanning 7 million print volumes over 4-6 years. Direct scanning costs are borne by Google.
UM receives a copy of all digital files, including OCR and metadata which can be used to build services. UM can share, with some restrictions. Each volume page produces 2.01 files on average - will be about 2.2 billion files, 380 TB of data. Sustained rate of 3.16 MB per second for four years.
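A quick sanity check of those numbers, as a minimal sketch. The inputs come from the notes above; the unit conventions are my assumption (binary TB and MB, a 365.25-day year), since the talk didn't say which were used.

```python
# Back-of-the-envelope check of the storage figures quoted in the notes.
# Assumptions (not from the talk): "TB" and "MB" are binary units
# (2**40 and 2**20 bytes) and a year is 365.25 days.

TOTAL_TB = 380          # total data expected, from the notes
YEARS = 4               # period the sustained rate is quoted over
TOTAL_FILES = 2.2e9     # total files expected, from the notes
FILES_PER_PAGE = 2.01   # average files per page, from the notes
VOLUMES = 7e6           # print volumes to be scanned, from the notes

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def sustained_rate_mb_per_s(total_tb: float, years: float) -> float:
    """Average ingest rate in MB/s needed to land total_tb within years."""
    total_mb = total_tb * 2**20  # 1 TB = 2**20 MB in binary units
    return total_mb / (years * SECONDS_PER_YEAR)


rate = sustained_rate_mb_per_s(TOTAL_TB, YEARS)
pages = TOTAL_FILES / FILES_PER_PAGE
avg_file_kb = TOTAL_TB * 2**30 / TOTAL_FILES  # 1 TB = 2**30 KB

print(f"sustained rate: {rate:.2f} MB/s")
print(f"implied pages: {pages / 1e9:.2f} billion")
print(f"implied pages per volume: {pages / VOLUMES:.0f}")
print(f"implied average file size: {avg_file_kb:.0f} KB")
```

Under those assumptions the rate comes out at about 3.16 MB/s, which matches the quoted figure; with decimal units it would be closer to 3.0 MB/s. The page and file-size numbers are only what the quoted totals imply, not figures from the presentation.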
Data characteristics - well defined file formats - image files are TIFF or JPEG 2000, OCR files and metadata are UTF-8 text. Indefinite retention. Files are largely static. Much material is in copyright, so requires security practices.
Mbooks service - can search and look at books online.
There's interest in using the OCR data for textual analysis research.
Technorati Tags: CSG-Winter-2007, google, higher-ed, digital-libraries, storage
Small files are normal for lots of people - people write apps using files as a database substitute - this comes from the desktop computing world. This problem has existed for years - but now people have discovered HPC, but they don't want to rewrite their programs. Small files are deadly to most file systems - some more than others. Creates even more problems with clusters.
People are expecting cheap disk at commodity prices, but that's not fast disk. Virtualization can be deadly as it adds overhead due to the levels of abstraction.
An example - an 1800 compute node cluster at USC. If they're accessing small files, you have to have ways to coordinate file locking and synchronization across the nodes. 3-4 terabits of bandwidth capacity get slowed to nothing if there's lots of small file access going on.
Right now the base file system is QFS (Sun). The directory metadata is on separate disks from the data itself, which is great on big files, but hard with small files because of the single metadata catalog. There are local parallel file systems on the nodes, which work better for small files. NFS has its own issues with small file access because of the overhead. They've set up "condo disk" as well as condo nodes, so they can have their own file space instead of a virtualized environment.
Some examples of small-file file systems -
Genomics Group - tens of thousands of files in a single directory.
Natural Language Group - 50-250k files in a directory. Many nodes accessing the same dictionaries.
Backups are slower and harder - can't keep the tape spinning if you're doing lots of directory accesses - takes hours instead of minutes.
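To get a feel for the per-file overhead described above, here's a minimal sketch that writes the same number of bytes twice: once as many small files and once as a single file. The workload size (2,000 files of 1 KB) is a made-up number for illustration, and this runs against a local temp directory, not a cluster file system, so the timings only hint at the effect.

```python
import os
import tempfile
import time


def write_small_files(directory: str, count: int, size: int) -> float:
    """Write `count` files of `size` bytes each; return elapsed seconds."""
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        # Every file costs an open, a directory update, and a close.
        with open(os.path.join(directory, f"part-{i:06d}.dat"), "wb") as f:
            f.write(payload)
    return time.perf_counter() - start


def write_one_file(directory: str, count: int, size: int) -> float:
    """Write the same bytes into one file; return elapsed seconds."""
    payload = b"x" * size
    start = time.perf_counter()
    with open(os.path.join(directory, "combined.dat"), "wb") as f:
        for _ in range(count):
            f.write(payload)
    return time.perf_counter() - start


def total_bytes(directory: str) -> int:
    """Sum the sizes of all regular files directly inside `directory`."""
    return sum(e.stat().st_size for e in os.scandir(directory) if e.is_file())


COUNT, SIZE = 2000, 1024  # hypothetical workload: 2,000 files of 1 KB

with tempfile.TemporaryDirectory() as many, tempfile.TemporaryDirectory() as one:
    t_many = write_small_files(many, COUNT, SIZE)
    t_one = write_one_file(one, COUNT, SIZE)
    bytes_many, bytes_one = total_bytes(many), total_bytes(one)

print(f"{COUNT} small files: {t_many:.4f} s for {bytes_many} bytes")
print(f"one combined file: {t_one:.4f} s for {bytes_one} bytes")
```

Both runs store identical byte counts; the difference is the number of metadata operations. On a shared file system with a single metadata catalog and cross-node locking, that per-file cost is the part the notes describe as deadly.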
Ways to help -
- faster disk (helps metadata/directory space)
- distributed file access (qfs)
- no free lunch
Next generation -
- nfs 4 doesn't cut it
- gpfs helps some
- 10 gbps hosts on data plane - nothing but jumbo frames, which might make it worse.
- ram disk for metadata? san diego does it - might help.
- storage management solutions - performance for small files is in question.
Technorati Tags: CSG-Winter-2007, higher-ed, research-computing, storage
The afternoon workshop, coordinated by Kitty, is on data storage.
There is, of course, a survey to present. Most of the schools are offering multiple kinds of file services with ever-increasing quotas. Only two schools are offering replication technologies (like Apple's or Microsoft's). The predominant technology is direct attached storage, but there is use of Fibre Channel SAN, and some use of iSCSI SAN. Most folks are using TSM for backup.
Most people said that the Library does not provide any data archiving services.
Unsolved problems include (of course) funding, smart data storage, multi-platform access, replacing current distributed file systems - what's next?, virtualization and tiering, more-more-more - keeping up with demand.
Summary - growth in data is a huge problem and an unfunded mandate. Federal requirements for keeping and protecting data for longer periods and unmanaged data are huge issues. Inefficiency is a problem - we're not aligning data with the right solutions. The technologies for storage don't knit together well - there's a duct-tape feeling to the solutions.
Ron Thielen from the University of Chicago is talking about storage.
SAN vs NAS is the wrong question - they're converging anyway. The real question is what APIs do you want to use to provide access to data - files, blocks, objects.
A File System is really a metadata repository and related APIs. Once a vendor understands that, it enables really interesting things to happen - Xythos is an example of someone who gets that. Typical storage growth figures are quoted as 39% annually - even more worrisome is the percentage of budget devoted to storage. At U Chicago, in the last few years they've seen a 96% compound annual growth rate.
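To see how different those two growth rates really are, here's a minimal sketch of the compounding arithmetic. The 39% and 96% rates are from the talk; the 100 TB starting point and the five-year horizon are hypothetical numbers picked only for illustration.

```python
import math


def capacity_after(start_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after compounding `annual_growth` (0.39 = 39%) for `years`."""
    return start_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for capacity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


START_TB = 100.0  # hypothetical starting capacity, not from the talk
HORIZON = 5       # hypothetical planning horizon in years

for label, growth in (("typical (39%)", 0.39), ("U Chicago (96%)", 0.96)):
    final = capacity_after(START_TB, growth, HORIZON)
    doubling = doubling_time_years(growth)
    print(f"{label}: doubles every {doubling:.2f} years, "
          f"{START_TB:.0f} TB becomes {final:.0f} TB after {HORIZON} years")
```

At 39% a year storage doubles roughly every two years; at 96% it doubles about once a year, which is the difference between a fivefold and a nearly thirtyfold increase over five years.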
Gartner predicts "By 2008, nearly 50% of data centers worldwide will lack the necessary power and cooling capacity to support high-density equipment."
What's the storage buzz?
- SMI-S 1.2 (an ANSI standard for storage management) & Aperi (an open source storage management project - part of the Eclipse project).
- Continuous data protection - backs up files as they change.
- Virtualization - heterogeneous (the holy grail), switch-based (Cisco and Brocade - moving virtualization into the SAN itself), HBA (for VMware or blade centers).
- Global Name Spaces (File Virtualization) - put something in front of a bunch of NAS devices that looks like a single name. EMC and Brocade have purchased technologies in this area.
- Clustered File Systems and Storage (like Isilon)
- Archival file systems (Archivas, Permabit) - a specialized example of clustered file system.
- Database archiving
- Wide Area File Systems
- Object Based Storage Devices - when you're storing data on storage devices, some metadata can be managed by the device not the storage system. (why would you want to do this?)
- TPM (Trusted Platform Module) in storage devices - TPM in devices and servers exchange certificates - storage devices can be made to not give up access if they're not matched with the appropriate servers.
- Solid State & Hybrid
- Intelligent Storage Grids & Storage Autonomics - do self-provisioning based on access to policy rules.
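Of the items in that list, the global name space idea is the easiest to illustrate in a few lines. This is a toy sketch of the concept only, not how any of the vendors mentioned implement it: a table maps path prefixes to back-end NAS devices, and lookups pick the longest matching prefix. The class names, device names, and export paths are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Backend:
    name: str    # hypothetical NAS device name
    export: str  # hypothetical export root on that device


class GlobalNamespace:
    """Presents several NAS back ends under one logical directory tree."""

    def __init__(self) -> None:
        self._mounts: Dict[str, Backend] = {}

    def mount(self, prefix: str, backend: Backend) -> None:
        """Attach a back end at a logical path prefix such as /research."""
        self._mounts[prefix.rstrip("/") or "/"] = backend

    def resolve(self, path: str) -> Tuple[str, str]:
        """Return (device name, physical path) for a logical path."""
        best = None
        for prefix in self._mounts:
            # Match whole path components only, so /research does not
            # capture /researchers.
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix  # longest matching prefix wins
        if best is None:
            raise KeyError(f"no back end mounted for {path}")
        backend = self._mounts[best]
        remainder = path[len(best):].lstrip("/")
        physical = backend.export.rstrip("/")
        if remainder:
            physical += "/" + remainder
        return backend.name, physical


ns = GlobalNamespace()
ns.mount("/", Backend("nas-01", "/vol/general"))
ns.mount("/research", Backend("nas-02", "/vol/research"))

print(ns.resolve("/research/genomics/a.dat"))
print(ns.resolve("/home/alice/notes.txt"))
```

Clients only ever see the logical paths, so data can be moved between devices by changing the table rather than every client's mount points.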
Regulatory Effects on Storage -
New Federal Rules of Civil Procedure causing much FUD.
- "rules also mean that colleges that are in litigation or that suspect they may soon be in litigation cannot destroy electronic evidence they know would be relevant to a lawsuit." (Chronicle of Higher Education)
- means universities will have to keep much better track of data.
Greg Jackson notes that this is a risk management issue where we need to be careful about going to great lengths to solve problems technologically instead of planning on some basic procedures that we might take when or if we have to perform under this law.
Use case - VoIP and Unified Messaging - talk about unstructured data!
Technorati Tags: CSG-Winter-2007, cyber-infrastructure, storage, research-computing
I'm in Los Angeles for the Winter meeting of the Common Solutions Group, at USC.
The first workshop is on building cyber-infrastructure for research. Bill Clebsch from Stanford frames the discussion by noting that this effort will make ERP implementations seem cheap and easy, and he's told his provost that.
There was a survey of the CSG membership on context for research computing.
In the survey 78% of the membership see value in having governance/oversight for research computing, though only 26% have such a body.
The top issue is data center facilities, more than networking. Storage is a major short-term concern.
The predominant support model is raw hosting, with the data center only providing floor space, cooling, and power.
About half of the membership do some central staffing for research computing, but most of that is monitoring facilities and power. 45% of the respondents are doing some support for the technology portion of grant development. 68% offer options for system administration support.
Cost is the major factor influencing central data center use, especially when it's not part of the indirect costs.
In a panel on key drivers and changes, Tim Gleason from Harvard notes that the data center they built two years ago is now full, and they're about to start building another 10,000 sq. ft. data center right behind it.
Jim Pepin from USC is talking about hosting, co-location, and condo-ing - in condo-ing they put together the machine into the cluster, but the researcher has the exclusive use of it. They're seeing lots more use of that (as opposed to traditional hosting or colo), because of the complexity involved in building the machines. About 60% of the machines in the cluster are centrally owned, 40% owned by the researchers. Researchers can also trade cycle futures with each other. There's a faculty committee of senior faculty that allocates the central resources annually. They've never had to say no to any request in the six years they've been doing this kind of allocation.
Jim notes that the design of networks for high-end research is very important, and that there is some tension between that and the desires of campus security to build barriers into the network.
Pat Dreher talks about a physics project that will be amassing an exabyte (1,000 petabytes) of data over the next ten years.
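For a sense of scale, here's a minimal sketch of what that works out to as a sustained rate. It assumes decimal units (an exabyte as 10**18 bytes, per the 1,000 petabytes above), a 365.25-day year, and perfectly even accumulation over the ten years, none of which was stated in the talk.

```python
# Sustained data rate implied by an exabyte collected over ten years.
# Assumptions (not from the talk): decimal units, 365.25-day years,
# and data arriving at a constant rate.

EXABYTE = 10**18  # bytes, i.e. 1,000 petabytes
YEARS = 10
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def sustained_rate_bytes_per_s(total_bytes: float, years: float) -> float:
    """Average bytes per second needed to accumulate total_bytes in years."""
    return total_bytes / (years * SECONDS_PER_YEAR)


rate_bytes = sustained_rate_bytes_per_s(EXABYTE, YEARS)
rate_gbit = rate_bytes * 8 / 1e9

print(f"{rate_bytes / 1e9:.2f} GB/s sustained")
print(f"{rate_gbit:.1f} Gbit/s sustained")
```

That's a bit over 3 GB per second, around the clock, for a decade, which helps explain why network design for high-end research came up in the same panel.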
Pat quotes Larry Smarr as saying that networking is becoming cheaper than storage, and storage is becoming cheaper than compute power - this is the first such major shift in a generation.
The folks from Penn State note that for planning purposes a kilowatt of power per square foot is a good number for the next few years.
Bill notes that at Stanford they believe that in fifteen years they won't be hosting anything (they'll be buying the services) so that the data center investment should be thought of in that time frame.
There's a bunch of discussion about whether every institution needs to build a lot of data center capacity, or whether there are ways to collaborate across organizations. Kitty Bridges from UMich points out that we need to learn how to be nimble on our feet and agile in our own collaborations. Jim Phelps from Wisconsin proposes that if we can offer ways to support virtual organizations for cross-institutional research that might be a place to start.
It's pointed out that despite the talk of virtual organizations, most research today is performed by single PIs working alone with a bunch of grad students within an institution.
After the break the discussion moves on to talking about sustainable funding models for research computing.
Kevin Morooney is talking about the history of research computing at Penn State - until 1988 research computing was in the Research organization, but in 1988 it moved into the Center for Academic Computing. They maintained three FTE for research support. In 1997 they created a new shop, which now has 15 FTE and a director for doing high performance computing and visualization. Kevin points out that cyber-infrastructure is not only happening in the central organization, but all over the campus. Looking ahead he sees another round of central IT investment coming, with campus coordination that goes beyond what happens in central IT, but that it's important that the central IT work at understanding and providing for the needs of faculty researchers while coordinating all these other conversations.
Bill Clebsch is talking about how at Stanford the institution is charging schools and other units for power, which has changed the paradigm for research computing - schools have had to pay for power for research computing. This year, for the first time, schools have to pay for space. In the last six months these factors, plus faculty realizing that they could spend more time on research than on the "plumbing", have them looking at building a new data center.
One of the unexpected side effects of coordinating this activity is groups wanting to co-locate research assistants, which they hope will build a community around computational research.
Jim Jokl from UVa is talking about their Linux clusters model - they contribute 20% of the cost. They charge $13.75/GB/yr for storage, but they provide a Hierarchical Storage Manager for archiving at no charge. As for everybody else, data center facility space is a large issue.
They hear a lot about getting people to support researchers. They've had a task force on computational science that's recommended senior-level leadership, the need for grant development support, seed funding for promising programs, expert support for computational science (algorithms, data and security, visualization, etc).
Technorati Tags: cyber-infrastructure, higher-ed, CSG-Winter-2007, research-computing
