August 2006 Archives
I'm off for a week of vacation - badly needed!
Tomorrow I'll be participating in a Seattle Jazz Guitar Society clinic with the great Robben Ford - whom I first heard back in 1972 playing with Charlie Musselwhite, and who went on to play with Joni Mitchell, Miles Davis, and slews of others. I'm really excited about the clinic, which is going to be shot on video for a future instructional DVD. It's always fun to be the token bass player at these guitar clinics :)
Sunday we're off to Bend for some rest and relaxation in the high desert, then Labor Day weekend with friends in the Willamette Valley, the heart of Oregon wine country. What could be better?
The following quips were all mentioned at a recent meeting I was at:
"The plural of anecdote is not data."
"Hope is not a strategy." (via Sara Gomez)
"The definition of fanaticism is redoubling your efforts when you've lost sight of your goal." (via Tom Colwell)
Tom and Kevin at KEXP sent over a copy of the latest CD release of live performances recorded at KEXP. It's got great stuff on it, including people you probably know about (Patti Smith, Gang of Four, Death Cab for Cutie) as well as people you've probably never heard of (like Skulbot, a hard rockin' trio of high school students from Stanwood, WA).
There are lots of great performances - go forth and pick it up!
And I was totally surprised and honored to see my name mentioned in the "shout out to the rest of the KEXP crew and community" section in the credits - awww, shucks, you guys < grin >

I was impressed by Amazon's S3 online storage service (web-services based storage priced aggressively - $0.15 per gigabyte/month, $0.20 per gigabyte of data transfer), and now they've topped that with their "Elastic Compute Cloud", aka EC2.
EC2 lets you set up virtualized Linux computing power, where you have complete control over the machine image. It's a remarkable and powerful concept. As Amazon says:
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers.
Just as Amazon Simple Storage Service (Amazon S3) enables storage in the cloud, Amazon EC2 enables "compute" in the cloud. Amazon EC2's simple web service interface allows you to obtain and configure capacity with minimal friction. It provides you with complete control of your computing resources and lets you run on Amazon's proven computing environment. Amazon EC2 reduces the time required to obtain and boot new server instances to minutes, allowing you to quickly scale capacity, both up and down, as your computing requirements change. Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use.
Given the issues that universities, including the UW, are having with trying to keep up with the space/power/cooling demands in our data centers, the vision of being able to easily outsource computing power at reasonable prices is very attractive.
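For a rough sense of what the S3 prices quoted above add up to, here's a back-of-the-envelope sketch. The per-gigabyte prices are the ones from this post; the 1000 GB stored and 200 GB transferred are made-up numbers for illustration, not figures from Amazon or the UW.

```python
# Prices as quoted in this post (August 2006).
STORAGE_USD_PER_GB_MONTH = 0.15
TRANSFER_USD_PER_GB = 0.20


def monthly_s3_cost(stored_gb, transferred_gb):
    """Estimate one month's bill: storage charge plus data transfer charge."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    transfer = transferred_gb * TRANSFER_USD_PER_GB
    return storage + transfer


# Hypothetical workload: 1000 GB stored, 200 GB transferred in the month.
estimate = monthly_s3_cost(1000, 200)
print(f"Estimated monthly cost: ${estimate:.2f}")
```

This ignores anything beyond the two prices mentioned here, so treat it as an estimate rather than a bill.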
Thanks to Rael for pointing this out.
Technorati Tags: amazon, data-centers, web-services
While on the topic of the Open Science Grid conference, Gordon Watts, who is a scientist as well as the organizer of this year's conference, has much better notes from the opening plenary speakers than I do - they're on his blog.
Thanks, Gordon, for setting up a couple of very interesting days!
Tim Bray's blog pointed me to this interesting post where a Polish blogger who calls himself Stiff interviews several well-known programmers, including Tim, Linus Torvalds, Guido van Rossum, Dave Thomas, etc., about programming. It's an interesting read.
In it he asks them what their favorite tools are, and Linus says:
Other than those three parts, the only thing I care deeply about is my email reader. I use „pine” - not because it’s necessarily the greatest email reader ever, but because I’m used to it, and it does what I need it to do with a minimum of fuzz.
While I've mostly been using the Mac mail application, there always seems to be something I can only do in Pine - lately that's been bouncing messages (where you want to resend a message from your email to another address, but have it arrive at that other address as if it came from the original address, not as a forward from you).
Update 23 August 2006
Jim Gaynor points out that the Mac mail app does indeed do bounces:
Apple's Mail.app can do that, they call it Redirect.
Select the message you wish to resend, go to the Message menu, and select Redirect (shift-command-E).
When the message arrives at its new destination, the From, To, and Date headers will be unchanged from the original. Mail.app adds Resent-From, Resent-To, and Resent-Date headers to denote the Redirect information.
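To make the header behavior Jim describes concrete, here's a minimal sketch using Python's standard email module. It is not how Mail.app or Pine implement the feature; the sample message, the addresses, and the redirect function are all hypothetical. It only shows the idea: From, To, and Date are left alone and Resent-* headers are added.

```python
from email import message_from_string
from email.utils import formatdate

# Hypothetical sample message; the addresses are placeholders.
ORIGINAL = (
    "From: alice@example.com\n"
    "To: me@example.org\n"
    "Date: Mon, 21 Aug 2006 09:00:00 -0700\n"
    "Subject: Lunch?\n"
    "\n"
    "Are you free on Friday?\n"
)


def redirect(raw_message, resent_from, resent_to):
    """Parse the message and add Resent-* headers, leaving the originals untouched."""
    msg = message_from_string(raw_message)
    msg["Resent-From"] = resent_from
    msg["Resent-To"] = resent_to
    msg["Resent-Date"] = formatdate(localtime=True)
    return msg


redirected = redirect(ORIGINAL, "me@example.org", "bob@example.net")
print(redirected.as_string())
```

One simplification: this appends the Resent-* headers after the existing ones, whereas the mail format standard recommends that they be placed at the top of the header block.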
Thanks, Jim!
Technorati Tags: email
So apparently, Writely can now do blog posts too. If you can read this then it works with my Movable Type blog. I added some tags - let's see if it manages to get those added to the blog post.
Writely is looking a whole lot better as an online word processor, too!
Bob Jones from CERN is talking about EGEE, an EU initiative which has 91 partners in 32 countries, encompassing 13 federations. Asia is a new federation, and the US partners are U Chicago, USC, Wisconsin (Condor), and RENCI.
The objective is large-scale production-quality infrastructure for e-science.
The infrastructure operation includes sites in 39 countries, with monitoring of grid services and automated site configuration/management.
They distribute production-quality middleware (gLite), which they plan to release under the Apache 2 license.
Interoperability between grids is essential - EGEE works with national grid projects and peer projects around the world. There are excellent relations with OSG on technical, operational, and policy issues. Further work is needed, and the Grid-Interoperability-Now effort is providing a good environment for this.
The WISDOM project used grid for drug discovery. http://wisdom.healthgrid.org
They calculate how much money they're saving by doing this research "in silico" instead of in-vitro.
There's an EGEE conference in Geneva 25-29 September.
Simon Lin from Academia Sinica, Taiwan is now talking about the TWGrid infrastructure in Taiwan. The consortium started in 2002. They now have two 2.5 Gbps connections to Amsterdam, which they use to connect to CERN. One link lands in Chicago, another on the US West Coast. There's a new link to Australia that reduces latency from 380 ms to 138 ms.
There are 12 LCG sites and 3 EGEE sites in Asia Pacific. Academia Sinica Grid Computing Centre (ASGC) is acting as the coordinator and the WLCG Tier-1 Centre and WLCG/EGEE operation Centre for Asia/Pacific.
16 sites in 7 countries: Australia, Japan, India, Korea, Pakistan, Singapore, Taiwan. 700 CPUs, growing to more than 1000 by end of year.
Konya Balazs from Lund University in Sweden is giving a tele-presentation on the NorduGrid. It started in 2001-2002 with a research project to enable Grid in the Nordic countries. Since 2002 it's been a research collaboration, focusing now on middleware. It develops its own Grid middleware. There are 13 countries participating, with 50 sites and about 5000 CPUs.
When the Scandinavian High Energy Physics Institutes wanted to share computing resources and jointly contribute to CERN/LHC computing, they needed a Grid and there was no production-ready middleware.
Their design philosophy followed Scandinavian design philosophy - lightweight, portable & modular, non-intrusive on the resource side. They wanted something flexible and powerful on the client side - easily installable, trivial tasks must be trivial to perform, no dependency on central services.
The goal was to have no single point of failure.
He goes on to give some details of the ARC middleware, which is being positioned as general purpose open source European grid middleware. Many national grids in Europe are using this middleware.
He notes that the major grid middleware providers need to become more dedicated to creating standards for interoperability. Standards are needed for the job description language, the representation of grid-related objects, and a standard interface to computing resources.
Jayanta Sircar from Harvard is talking about the CrimsonGrid. CrimsonGrid is an attempt to bring a (school-based) IT organization approach to supporting grids for science.
The motivation - the future of the University's research vision is intimately connected to cyber-infrastructure. Interdisciplinary faculty collaborations are a high priority, and IT support must align itself to meet new needs. Research environments cannot be separated from personal productivity environments.
The approach: work at interface innovation and production. Build an ecosystem. Establish role of faculty as stakeholders. Build roles for industry. Serve as a 'sandbox' for campus technology test beds - zero penalty for failure.
But what is a campus grid? Many vertical grids? Every cluster tethered to a GT appliance? One fabric hosting many virtual organizations? All of the above, and then some. Last year saw the first international workshop on campus grids as part of the Global Grid Forum.
Aspirations - don't re-invent the wheel whenever possible. Want to leverage the contributions of the OSG community to develop a model for building switched (virtualized) campus cyberinfrastructure - a campus grid.
The Crimson Grid initiative started in April 2004 - to engineer a technology fabric. Though the Crimson Grid is housed in Engineering and Applied Sciences, they are reaching out to the rest of campus and bringing others in - the medical school grid is being established as part of the Crimson Grid. They are also collaborating with other campuses, including GLOW at Wisconsin. They're running about 750 procs in Crimson Grid, and linking up with GLOW, who are running about 1000 procs, for testing resource sharing.
There's a question on how they determined the sizing of compute power for the campus grid, given the local processing power in departments and the global power available in the open grids. The question didn't really get answered.
Dogan Seber from SDSC is talking about the GEON geosciences grid. GEON's vision is to enable new discoveries in the geosciences by utilizing an easy-to-use "integration environment". The requirements were developed by a group of geoscientists, and then they brought in the SDSC folks, which has been working well for them. They want to bring in multiple datasets and information to integrate and interpolate data in their analysis. The goal is to build a system where people can register datasets and search across them. They wanted the system to be friendly for teachers at all levels.
Their Cyberinfrastructure Principles:
- An equal partnership: IT works in close conjunction with science. This turns out to be challenging in social environments where people speak different languages.
- The "two-tier" approach - Technology needs are now - so use what's available now including commercial tools and standards where applicable (this keeps the community involved and helps solve real problems, even if you're not doing things quite the right way)...
...while developing advanced technology, and doing CS research
There are nodes in the US, India, China, and Japan.
There is a portal (using GridSphere) that allows people to search and access resources. They have a resource registration service that allows people to describe data sets. Biggest research now is how to describe resources based on ontologies, not just text matching. I'm not sure whether he's talking about tagging by users, or something deeper than that. The search allows textual, temporal, and geospatial searches.
GEON is a TeraGrid science gateway. The example is how to build software that anybody can use to run seismic simulations. They built an app called SYNSEIS that does this, using a small cluster for simple 2D jobs or TeraGrid for larger jobs, and producing Flash animation output.
I ran out of battery yesterday and didn't get these posted, but here are some notes from a series of presentations on how people are actually doing science on grids. I probably got some of these details wrong, as about 90% of what was said was over my head - and people say us computer types are incomprehensible :)
David Baker, a biochemist from U Washington, talked about the Rosetta@home project where they're using distributed desktops to help create lowest-energy protein structures. There are about 100k machines currently enrolled in the project.
Robert Riggleman from Wisconsin (the other UW), is talking about the use of distributed parallel computing on anti-plasticization of polymers. They've used over 75 years of CPU time since April of 2006. They use the GLOW facilities, a centralized high-performance computing facility at Wisconsin.
Margaret Romine, from PNNL, is talking about the problems of dealing with all the data generated by rapid sequencing of genomes. She's using Gnare/Puma2 software developed at Argonne Lab. The software runs every genome that's out there to gather evidence. Sequencing a genome is slow, annotating it is slow - typically a year by manual methods. Looking for ways to better automate the annotations, particularly in identifying possibly bad matches.
Oliver Gutsche from Fermilab is talking about high energy physics and the Large Hadron Collider used to study proton-proton collisions. They compare simulated data to real data - they're talking about 6 petabytes of data in 2008. Core CMS infrastructure includes a data bookkeeping service (DBS - catalog of available datasets) and a data location service (which data is stored at what site), and the Trivial File Catalog.
Tony is Corporate VP for Technical Computing at Microsoft.
We're going back to Licklider's original vision for computing - "all the stuff linked together throughout the world"
We're entering a new era of science - we'll be overwhelmed by data. The need to mine data from all over the place - satellites, telescopes, etc. Grad students have been given "database 101" and then told to go build things to scale to terabytes - not a good use of science talent. Hence the need for:
e-Science - data-driven multidisciplinary science and the technologies to support such distributed, collaborative scientific research. A shorthand for a set of technologies to support collaborative networked science. High performance computing and information management are two of the key technologies.
Vision for scientific workflow - instead of having scientists doing the data plumbing, you want to have a data workbench that combines visual programming tools with persistent distributed storage along with distributed computation. Legacy programs can be wrapped in xml and exposed via web services.
Scholarly communication is also changing the nature of research. Documents increasingly will be linked to data which can be updated, streams of comments, etc.
Two examples of e-science
- astronomy data grid IVO
- Comb-e-chem project
Vision of the grid - set of middleware services supported on top of high bandwidth academic research networks.
A set of services that allow scientists - and industry - to routinely set up 'Virtual Organizations' for their research - or business.
- the 'Microsoft Grid' vision is as much about integrating and managing data and information as about compute cycles.
Federated Trusts are a big issue - using institutional authentications.
Service-orientation for building distributed systems.
Progress in grid standards?
- We need to agree on a set of grid service standards - the GGF/EGA merger into the Open Grid Forum is a great opportunity. The grid research community needs to propose and explore new features in real experiments. What services? 1. Very simple HPC job submissions and simple scheduling; 2. Security - federation; 3. Data storage, metadata. Can we standardize in these three areas by end of 2007?
Scholarly Communication
- global movement towards permitting 'open access' to scholarly publications. Principle that results of publicly funded research should be available to all. The Cornyn-Lieberman bill is supported by most top US research universities.
Tony notes that Microsoft is working with researchers to understand where Microsoft tools can help - databases are one example. Another is the possible use of Sharepoint technology to share data in communities - does that fit with the way these science communities work? Another is the use of Visual Studio to help write and debug code, whatever platform it runs on.
This is the fourth consortium meeting
Few f2f meetings - this is the all hands meeting, which takes place every six months.
There are more than 15 user organizations. There are 32 Virtual Organizations.
OSG is perceived as being a "mainly physics" grid.
They're expecting about $6 million in funding from DOE and NSF. This will fund the "OSG Project", which will fund about 33 FTE to maintain the grid and expand its use in new communities. Partnership with the TeraGrid is part of the plan.
Must support LHC and LIGO scaling
Over the next few years:
- Data distribution must exceed 1 GB/sec at 10-20 sites
- Workflow must support > 110k batch jobs per client
- Accessible storage greater than 10 PB
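To get a feel for the scale of those targets, here's a quick bit of arithmetic. It assumes the distribution target is meant as 1 gigabyte per second (my reading of the notes, not something confirmed by the speaker), uses decimal units, and treats the rate as a single sustained stream, which is a simplification.

```python
# Assumed reading of the distribution target: 1 gigabyte per second.
RATE_GB_PER_SEC = 1.0
# Storage target from the notes above.
STORAGE_PB = 10.0

SECONDS_PER_DAY = 24 * 60 * 60
GB_PER_TB = 1000          # decimal units
GB_PER_PB = 1000 * 1000   # decimal units

# Data moved in one day at the sustained rate.
tb_per_day = RATE_GB_PER_SEC * SECONDS_PER_DAY / GB_PER_TB

# Days needed to move the full storage target at that rate.
days_to_fill = STORAGE_PB * GB_PER_PB / RATE_GB_PER_SEC / SECONDS_PER_DAY

print(f"{tb_per_day:.1f} TB per day at a sustained {RATE_GB_PER_SEC:.0f} GB/sec")
print(f"{days_to_fill:.0f} days to move {STORAGE_PB:.0f} PB at that rate")
```

Under these assumptions a single stream moves roughly 86 TB a day, and moving 10 PB takes a bit under four months.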
OSG software stack (from bottom up):
Built on NSF middleware - Condor, Globus, MyProxy
Virtual Data Toolkit common services
OSG Release Cache - VDT + configuration, validation, vo management
Apps: LIGO grid, LHC services & framework, bio services & framework, OSG VO framework
I'm at the Open Science Grid meeting, hosted here at the UW by the Physics Department. I don't know much about grid computing, but there's a lot of it going on in science research, and I'm looking forward to getting a glimpse into what these folks are doing, and I'll be blogging it as we go.
If you can read this, then I've successfully written a posting to my Movable Type weblog using the new Windows Live Writer beta version.
Looks like the editing basics are there, but I don't see any way to add categories or tags.
Doesn't look like anything to give up Ecto for just yet.
Yesterday my buddy Ed and I went down to the KEXP summer BBQ - it was a perfect Seattle late summer afternoon, the beer flowed freely, and the music (as you'd expect) was great! We got down there too late to catch Thee Emergency, but we did hear the end of Devotchka's very cool electronica meets eastern European roots-rock set.
The Austin duo Ghostland Observatory sounded to me like Robert Plant meeting Echo and the Bunnymen. Singer/guitarist Aaron Behrens definitely had the energy and moves to drive the crowd wild and synthesist/drummer Thomas Turner was the consummate nerdly foil in his baby blue satin cape (we wondered if perhaps he sleeps in the cape).
The day ended with local Seattle power-pop favorites The Long Winters, providing, as Ed put it, intelligent short pop songs with great harmonies.
My photos from the day are on Flickr, tagged with kexp2006bbq.
All in all a great day out under the sun - thanks to the KEXP gang and all of the volunteers for putting the day together!
Technorati Tags: kexp, kexp2006bbq, music
It's amazing what people have to do to get calendaring to work the way they want.

(from Ian Forrester)
Technorati Tags: Calendaring
Today's calendaring news from Apple gets even better!
Cyrus Daboo from Apple (yes, he's the Cyrus of the IMAP server and Mulberry) writes in an email:
Hi folks,
FYI today Apple announced full support for CalDAV in both the iCal client and a new open source calendar server for its upcoming OS X 10.5 client and server products. What's more we support the latest version of the scheduling specification for scheduling support.
iCal is available as a developer preview right now, and of course we will be bringing it to the next [calconnect] interop - on the Apple campus - next month.
The calendar server is being released as open source under an Apache license. This is available here:
http://collaboration.macosforge.org
Of course we will have this at the interop too.
In addition, of interest to other server developers, is a CalDAV server test suite with over 500 'unit' tests for testing various aspects of a CalDAV server implementation. This too is available as open source on the site linked above.
That link doesn't bring up anything for me currently, but I'm anxious to take a look at the open source server - it'll be interesting to see how platform-specific it is or isn't.
Technorati Tags: apple, Calendaring, open-source, osx, wwdc06

Though it only got the briefest of mentions in Steve Jobs' keynote speech this morning at the Apple World-Wide Developers Conference, the next major release of OS X (codenamed Leopard) will have a new version of iCal that supports the CalDAV standard for multi-user scheduling. This means that iCal should be able to be a client to any CalDAV compliant calendaring server, which will include Oracle Calendar and OSAF's Cosmo, with others hopefully following suit (is anyone in Redmond listening?).
There's also mention on Apple's web pages of an iCal server showing up in Leopard, but the web page for that isn't there yet - I imagine that will be a CalDAV server, which will allow other clients, including Chandler, to use that server.
And Apple has joined the CalConnect calendaring consortium, and will be helping to drive industry-wide interoperability along with the rest of the members (truth in advertising: the UW was one of the original founding members of CalConnect and I sit on the steering committee).
Great news all around!
Technorati Tags: caldav, calendaring, osx, wwdc06
This is a good post in Kathy's Creating Passionate Users blog:
In this Web 2.0-ish world we're supposed to be all about the users being in control. Where the "community" drives the product. But the user community can't create art. (And I use "art" with a lowercase "a" as in software, books, just about anything we might design and craft.) That's up to us.
Our users will tell us where the pain is. Our users will drive incremental improvements. But the user community can't do the revolutionary innovation for us. That's up to us.
I haven't been blogging much lately, due to my trip to New York to help my parents start to get ready to move out of their house that they've lived in for the last 35 years (take-away lesson: get rid of unused stuff now) and then being immersed in writing a Research Bulletin for ECAR on social software, which I'll have more to say about soon.
In the meantime, it's worth reading this little piece from Doc Searls:
More to the point, why trust building the "first mile" of the Net to people who never wanted it in the first place, who have always felt threatened by it, who can imagine their customers as nothing other than "consumers" of one-way "content", and who want to create scarcities and insert billing valves everywhere they can? Because they're the only ones in a position to do it? That's not a good enough reason. It's also not true.
The phone and cable companies will be the only ones in a position to do it if we let them lobby that privilege into law. That's their real agenda, and that's the important story here. And it's a lot bigger than Net Neutrality.
