Open Source Virtualization with VirtualBox

There are many technologies I am very much on top of because I use them on a regular basis, and there are others I interact with periodically, enough to stay abreast of developments and do basic troubleshooting. From time to time, though, there are technologies I’m only peripherally aware of and have only a basic understanding of. One such technology is virtualization, or virtual machine software.

For almost ten years I’ve been hearing about software like that made by VMware, which allows a virtual computer to run inside a host operating system. To this day I haven’t done anything more with this type of software than fire it up and see that it does indeed work. It’s not that I don’t see the advantages; I just haven’t personally encountered a situation where I could justify the time and effort it would take to set up. That said, I do like to know what’s going on in all areas of technology, and lately I’ve been hearing about some movement in the open source virtualization arena.

For some years now I’ve known about projects such as Xen, Bochs and QEMU. The problem with these solutions is that they are not really open source replacements for commercial virtual machine software like VMware. I’ve heard great things about Xen and its ability to virtualize Linux systems (on Linux systems). While this is valuable in many cases, it doesn’t cover most of what I want to do, which is to run a guest OS on an entirely different host OS. Bochs is more on target, but it is an effort to emulate the x86 platform entirely in software: a bit heavy duty (and with significant speed costs) for what I would normally want to do, which is run an x86 guest OS on an x86 host, for example a Windows guest on a Linux host. QEMU has the upper hand here. While it’s still a big, heavy emulator, there is some closed source accelerator code which can help in x86-on-x86 situations. Of course, the closed source part is a bit of a drag. Still, the real problem with all of these is that they are far more difficult to configure (and especially to set up a new guest OS on) than their commercial counterparts.

Well, the world may be changing. What I’ve been hearing recently is that an open source project from Sun called VirtualBox looks like it will give some of the commercial vendors a run for their money (so to speak). There is no doubt that VirtualBox is still in the early stages of life, but the development team seems to be putting real effort into it and new releases have been timely. I’m excited to follow the continuing development of this product.

The Use of S3 and EC2 for Remote Backup

Even before the introduction of the Amazon S3 storage service I was intrigued by the possibilities of secure backup over the Internet. Over the years I’ve evaluated a number of options, such as using rsync and Unison, either to my own remote servers or to a service. I’m really not too interested in the commercial vendors, as most of their software runs on Windows or maybe Mac, and my files reside on a Linux fileserver. It only makes sense that my backup solution should run on the Linux server as well.

None of these solutions quite fit the bill for me because of expense, concerns about data security, or speed. Since the introduction of S3 I have been playing around with some of the scripts and software developed to take advantage of these powerful services. I was still disappointed, though, mostly because of data encryption concerns (on the storage system, not in transit) and the potential charges associated with backing up data to the S3 service. Ideally I would want something rsync-like, which would transfer only the changed parts of files instead of recopying an entire file or directory. Unfortunately there is no built-in support for anything like this in the low-level S3 system. So after playing with many scripts that claimed to do something along these lines and remaining unimpressed, I decided to put things on hold a while longer.
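The rsync-like behavior wished for above boils down to comparing a file block by block and sending only the blocks that differ. Since S3 offers nothing like this itself, a backup script would have to keep the previous version's block hashes locally and do the comparison on its own. Below is a minimal sketch in JavaScript (Node.js), assuming fixed-size blocks and SHA-256 hashes; the function names are hypothetical, and this is a simplification rather than rsync's actual algorithm.

```javascript
const crypto = require("crypto");

// Split a buffer into fixed-size blocks and hash each block with SHA-256.
function blockHashes(buf, blockSize) {
  const hashes = [];
  for (let offset = 0; offset < buf.length; offset += blockSize) {
    const block = buf.subarray(offset, offset + blockSize);
    hashes.push(crypto.createHash("sha256").update(block).digest("hex"));
  }
  return hashes;
}

// Return the indices of blocks in newBuf that differ from oldBuf,
// including blocks that exist only because the new version is longer.
function changedBlocks(oldBuf, newBuf, blockSize = 4096) {
  const oldHashes = blockHashes(oldBuf, blockSize);
  const newHashes = blockHashes(newBuf, blockSize);
  const changed = [];
  for (let i = 0; i < newHashes.length; i++) {
    if (newHashes[i] !== oldHashes[i]) changed.push(i);
  }
  return changed;
}

// Example: change one byte in a 16-byte buffer split into 4-byte blocks.
const before = Buffer.alloc(16, "a");
const after = Buffer.from(before); // independent copy
after[9] = 0x62; // "b"
console.log(changedBlocks(before, after, 4)); // [ 2 ]
```

With fixed blocks, inserting a single byte near the start of a file shifts every later block and marks them all as changed; rsync avoids that with a rolling checksum that can find matching blocks at any offset.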

Eventually Amazon released the EC2 cloud computing platform, but that still didn’t seem particularly useful for my purposes because of the lack of persistent storage between sessions. Once Elastic Block Store (EBS) volumes became available things got more interesting. Now that I could retain data between sessions, I had visions of a backup script which would launch an EC2 instance, mount an EBS volume and run rsync or Unison to back up directories on my local server to the remote site. I started playing around with EC2 and soon discovered that although it is very powerful, it is a monster to control unless you are writing your own application from the ground up. For a simple job like this, which should be easily accomplished by a script, it can be a nightmare, with several shell variables to set and paths to keep straight, never mind the several encryption keys and the changing SSH host identifier. Eventually, with some help from two fantastic blog entries (Ereblog and Free Wisdom Online), I was able to get something working…mostly.

It’s quite a fragile thing: you have to make sure commands are executed in the correct paths and with the correct environment variables set. In addition, the data returned from the control commands is simply parsed out of their text output with awk, so it could easily break if the control package were updated. The final nails in the coffin for me were my growing backup storage requirements for photos, audio and video, which are huge and can quickly change the economics of remote backup. Even for a slimmed-down set of documents I found the process too slow and fragile for my needs. In the end I have gone back to hauling hard drives with data backups off site and using rsync locally to sync them periodically with my live storage.

*Edited 2/2/09 to fix the several times I mistakenly called EC2 EC3 although I knew better. Thanks to the commenter for pointing this out!

Remembering our Media Past

One of my more recent pastimes, when I have a few minutes to spare, am already caught up on the news, and either need to relax and unwind a bit or don’t have time to dive into a more substantial project, is to browse around YouTube (much as I do on Wikipedia) and see what turns up. Among the more interesting things I have found are old “airchecks” from Twin Cities area television stations.

Being a media geek, I’m fascinated by how news has changed over the years, particularly in my market. I’ve known about many of the private collectors of radio airchecks for some time, but thanks to the fine people at radiotapes.com there are now many TV airchecks from the area available online as well. Some of my favorites are actually the TV news reports on some of the area radio stations (which is how I found the archive in the first place). It’s amazing to see just how different news reporting looked even 15 years ago. While one can debate whether the content is any better, there is no doubt that the production quality has drastically improved.

The Coldplay/Satriani Issue

If you’re not yet familiar with the Coldplay/Satriani issue, the basic premise is that guitarist Joe Satriani is suing the band Coldplay because he believes that a member of the band who attended one of his concerts copied some of his music and used it in one of their songs. We’re not talking about sampling or copying part of a recording, but about copying the musical idea behind one part of the song. You can get a quick idea of what I mean by watching one of the many YouTube videos that play the two sections back to back. To further complicate things, it seems the original idea may have come from a third artist.

Being a bit of a music geek and copyright activist, I find this all rather fascinating. After all, there are a limited number of original chord progressions. My personal feeling is that you should not be able to own a chord progression at all. Music is built collectively over time by different artists listening to and learning from one another; it’s just how it works. In classical and solo piano music we have this as “variations on…”, and the entire development of the Jazz genre is about artists hearing each other’s “sound” and tweaking it.

I would argue that allowing ownership of chord progressions is similar to allowing ownership of the writing concept that “the butler did it” or one of the many standard plots found in movies and TV. This is plain silly and should not be allowed. Clearly, despite having similar sections and feelings, they are different songs (they are not identical). While it is nice, courteous and polite to acknowledge your inspirations it should not be legally required, nor should you be required to get permission nor should you have to pay for these rights.

Much of the YouTube coverage of this is the simple laying of one track over (or next to) the other. If you’re interested in a much more in-depth look at the music theory of chord progressions and how the two melodies and harmonies relate to each other, I highly recommend the two-part series on the music theory behind the accusations put together by the Creative Guitar Studio in Canada, which also has an accompanying web site.

Change In Site URL

In an effort to make the articles on this site slightly easier for people to find, and to make it easy for me to give the site URL to people verbally, I have moved the blog portion of my site to its own URL, bensbits.com. Thanks to the magic of mod_rewrite, all of the old URLs should redirect to the new site. I don’t guarantee this will be the case forever, though, so if you link to any pages on the blog please find the page again and update your link.

This move was somewhat spurred by the coverage I’ll be doing from CES this year which will include blog postings as well as some A/V material which will be available through a separate site that will be launching soon. Another reason for the move is that the ben.franske.com site has finally received a makeover that has been more than three years in the works and was started over from scratch several times (what can I say, I’m busy). That site now contains some introductory material about myself as well as some information on the many other projects and things I have going on.

Deconstructing the Car Talk Jukebox

The great folks at National Public Radio’s Car Talk recently switched from Real Player to a Flash-based MP3 media player for online listening. I think this is a fantastic change, as the only thing I was still using Real Player for was listening to Car Talk online. I do realize that a podcast version of the show has been available through NPR for some time, but I tend to listen to it spread out over several days, and the Real Player (and now the Flash-based player) lets me jump directly to specific segments of the show, a big advantage over one long MP3 file for my purposes.

The only problem with the new player is that I initially couldn’t get it to work on my system. The player would never fully load and would not play the show, which is a real problem if you want to listen to the show. Of course I emailed Click and Clack about the problem; apparently they’ve been receiving quite a bit of email about the new player, for better or worse, because it took about five days to even get a form response back. Like most of what Click and Clack have to say, it wasn’t that helpful (install the latest version of Flash, etc.). Since I already waste time on Flash websites regularly, I was sure Flash wasn’t my problem. This led me to start deconstructing their player architecture to find and fix the problem myself, in true Car Talk fashion.

To make a long story short for the impatient reader, I’ll cut to the chase. Ads are loaded and played from a third-party site (NPR) and require cookies (third-party cookies) to play. The ad must play before the player will load the show audio. I had third-party cookies disabled, hence no show. I fixed this by explicitly allowing cookies from both cartalk.com and npr.org. Of course, there are more interesting ways of solving the problem, and more that can be learned by total deconstruction, so the reader looking for further edification may want to read on.
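Roughly speaking, a browser treats a cookie as third-party when it belongs to a site other than the one shown in the address bar, which is why an npr.org ad embedded in a cartalk.com page fell foul of the cookie setting. The sketch below illustrates that comparison in deliberately simplified form; the helper names are hypothetical, and real browsers use the Public Suffix List and their own policy rules rather than this "last two labels" shortcut.

```javascript
// Naive "site" of a hostname: its last two labels (www.cartalk.com -> cartalk.com).
// This is wrong for names such as example.co.uk; it only illustrates the idea.
function naiveSite(hostname) {
  return hostname.split(".").slice(-2).join(".");
}

// A request counts as third-party when its site differs from the page's site.
function isThirdParty(pageUrl, requestUrl) {
  return naiveSite(new URL(pageUrl).hostname) !==
    naiveSite(new URL(requestUrl).hostname);
}

console.log(isThirdParty("http://www.cartalk.com/", "http://u.npr.org/xserver/")); // true
console.log(isThirdParty("http://www.cartalk.com/",
  "http://www.cartalk.com/Radio/WeeklyShow/showAllsmil.xml")); // false
```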

By looking at the page source and link formats I fairly quickly determined that Car Talk was using the JW FLV Media Player, and that it was loading a playlist file called showAllsmil.xml which likely contained the asset (MP3 audio) URLs to be played. The trick would be to find this file and figure out why my player wouldn’t work. The source of the player page also showed that before the player fully loaded it needed to play an ad from NPR. That gave me quite a clue as to why things weren’t working, and eventually led me to the cookie solution described above, but let’s explore the JavaScript code that selects the MP3 ad and gives its URL to the player:

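var site = 'CARTALK';
var area = 'Cartalk.Player';
// Two random numbers, presumably so each ad request is unique (not cached)
var pageNum = Math.round(Math.random() * 100000000);
var randomNum = Math.round(Math.random() * 100000000);
// Assemble the ad's MP3 URL (the variable name adUrl is illustrative)
var adUrl = 'http://u.npr.org/xserver/site=' + site + '/area=' + area +
    '/utype=player/aamsz=MP3/pageid=' + pageNum +
    '/random=' + randomNum + '.mp3';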

It’s interesting that this is all done in client-side JavaScript instead of randomly serving an ad from a static server-side URL, but I guess doing things in JavaScript is the Web 2.0 way! Now you know what to do if you want to listen to NPR ads all day long: generate a bunch of random numbers and load up some URLs. What if you want to listen to an actual Car Talk show, perhaps on an unsupported player/OS like Linux without Flash installed? For this you’re going to want to get your hands on that XML playlist file. First you’ll want to find its URL, which as it turns out is also generated by some bits of JavaScript, something that could also be done server side:

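// gup() is the page's helper that reads a parameter from the page URL
var f = gup('play');   // playlist file, e.g. showAllsmil.xml
var s = gup('show');   // show directory
if (s == null || s == "") s = "WeeklyShow";
var file2 = 'http://www.cartalk.com/Radio/' + s + '/' + f;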

Here gup is a function which pulls variables out of the page URL, again something really easy to do in a server-side language like PHP; oh well…JavaScript it is. If you want to listen to the entire most recent weekly show you’ll end up with a URL that looks something like:

http://www.cartalk.com/Radio/WeeklyShow/showAllsmil.xml

If you just want to hear the last segment (segment 10 of the show) you’d end up with:

http://www.cartalk.com/Radio/WeeklyShow/10smil.xml

Of course, it’s similar for 01smil.xml through 09smil.xml. Note yet again that this could all be handled without creating a million files if it were done server side, but I digress. When you open up that XML playlist file, it’s easy to see that the MP3 asset files can be found at:





and so on. Note that these streaming MP3s can be played in any MP3 player, so you could play them on Linux or just about anything else that plays MP3 files.
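The URL-building logic walked through above can be reproduced outside the browser. This is a minimal sketch in JavaScript (Node.js): getParam is a hypothetical stand-in for the page's gup() helper, and the player-page URL used in the example (http://www.cartalk.com/player.html) is made up purely for illustration. Unlike the original snippet, it refuses to build a URL when the play parameter is missing.

```javascript
// Hypothetical stand-in for the page's gup() helper: read one parameter
// from the query string of a page URL (returns null if it is absent).
function getParam(pageUrl, name) {
  return new URL(pageUrl).searchParams.get(name);
}

// Same logic as the page's snippet: the "show" directory defaults to
// WeeklyShow. Unlike the original, a missing "play" value is an error.
function playlistUrl(pageUrl) {
  const f = getParam(pageUrl, "play");
  if (!f) throw new Error("missing 'play' parameter");
  let s = getParam(pageUrl, "show");
  if (s === null || s === "") s = "WeeklyShow";
  return "http://www.cartalk.com/Radio/" + s + "/" + f;
}

// Playlist file names: the whole show plus segments 01 through 10.
function playlistFiles() {
  const files = ["showAllsmil.xml"];
  for (let i = 1; i <= 10; i++) {
    files.push(String(i).padStart(2, "0") + "smil.xml");
  }
  return files;
}

console.log(playlistUrl("http://www.cartalk.com/player.html?play=10smil.xml"));
// http://www.cartalk.com/Radio/WeeklyShow/10smil.xml
```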

An interesting project would be a script which dynamically generates an advertisement MP3 URL, pulls the SMIL file, strips out the asset URLs and spits out a more standard M3U playlist file. If this were done in server-side scripting (PHP, anyone?) you could easily create a link which would feed any player a playlist of the most recent show segments (plus an opening ad to keep NPR happy). Such an M3U playlist would be useful because it would let you play streaming Car Talk MP3s from just about any player/OS without manually collecting all the segment URLs.
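Here is a rough sketch of that project, written in JavaScript for Node.js rather than PHP to match the snippets above. It assumes the MP3 asset URLs appear as src="…" attributes in the SMIL file, since the exact markup isn't reproduced here, and all function names are hypothetical. Downloading the SMIL file is left out so the sketch stays self-contained; it only converts playlist text that has already been fetched.

```javascript
// Collect every src="..." (or src='...') attribute value in a SMIL/XML playlist.
// Assumption: the MP3 asset URLs appear as src attributes; a real XML parser
// would be sturdier than this regular expression.
function extractAssetUrls(smilText) {
  const urls = [];
  const re = /\bsrc\s*=\s*["']([^"']+)["']/g;
  let m;
  while ((m = re.exec(smilText)) !== null) {
    urls.push(m[1].replace(/&amp;/g, "&")); // undo the most common XML escape
  }
  return urls;
}

// Build an ad URL the same way the Car Talk page's JavaScript does.
function adUrl() {
  const pageNum = Math.round(Math.random() * 100000000);
  const randomNum = Math.round(Math.random() * 100000000);
  return "http://u.npr.org/xserver/site=CARTALK/area=Cartalk.Player" +
    "/utype=player/aamsz=MP3/pageid=" + pageNum +
    "/random=" + randomNum + ".mp3";
}

// A plain M3U playlist is one URL per line: the ad first, then the segments.
function smilToM3u(smilText) {
  return [adUrl(), ...extractAssetUrls(smilText)].join("\n") + "\n";
}

// Example with a made-up SMIL snippet (example.com URLs are placeholders):
const sample =
  '<smil><body><seq>' +
  '<audio src="http://example.com/seg01.mp3"/>' +
  "<audio src='http://example.com/seg02.mp3'/>" +
  '</seq></body></smil>';
console.log(smilToM3u(sample));
```

Served from a web server with an M3U content type such as audio/x-mpegurl, output like this should open in most desktop media players.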

Computer Collecting

Friends who have seen my electronics warehouse, err... basement, know that I’m an avid collector of “antique” electronics. From the 8-Track recorder (yes, you heard that right: not just an 8-Track player, but a recorder) to my collection of cell phones and landline phones, my interest in history seems to manifest itself in collecting bits of it.

As an information technology professional I think it’s both important and useful to remember how I got to where I am. For me this means both the people, like “Mr. C”, my elementary school computer teacher who showed me the inside of an Apple //e and taught me the fundamentals of computing, and the early machines I worked with. So one of my personal goals has been to collect some of those influential machines from my early years. A fun side benefit is being able to play the games and software I remember from my youth on real hardware instead of an emulator.

As a result I also have quite a collection of computers in my basement, primarily Motorola 68k Macs and a few Commodores. I’ve even gone so far as to have like-minded geek friends over for a LAN party with these early Macs on a LocalTalk network. Nothing like a good game of Wagon Train 1848 (multiplayer Oregon Trail) to get things going!

Because of these interests I try to stay on top of what’s going on in vintage computing circles, subscribe to several mailing lists and visit quite a few websites devoted to the topic. There’s something to be said for experimenting with computers just to see what can be done, even when it may not be practical (LocalTalk to Ethernet bridge for Internet access from a 512K Mac, anyone?), though it seems to happen less frequently these days.

I recently ran across 1000BiT, a website devoted to vintage computing which I had not seen before. 1000BiT gathers everything related to a specific vintage computer in one place. From system specs to original advertising, brochures and manuals, they’ve got it covered. It’s a great stroll through personal computing history and an easy place to get lost for hours as you pore over the specs and adverts which built an empire.

The Open Source Microsoft Access Alternative

Databases are a wonderful tool for organizing all those bits of information in your life. While open source took the database backend world by storm (MySQL, anyone?), there remains a gap in desktop database tools. Let’s say you wanted to create a database for your address book. You could certainly do it in MySQL, write a PHP front end for it and make it web based, but this seems like overkill for a personal address book, and it also seems like a lot of work.

You could also do it in a spreadsheet program, but you give up many of the advantages of a database (especially a relational database) when you do so. To fill this gap between the massive SQL database with a frontend application on one side and the spreadsheet on the other, Microsoft offers Microsoft Access. It is both a backend database engine and a frontend design package in one, allowing you to build forms for updating data as well as reports. As a bonus, if your database grows too big for its engine you can connect via ODBC to a bigger SQL database server as the backend.

Unfortunately, this segment of database tools has been largely overlooked by open source software, especially in the Windows environment. This is probably not without reason, as mid-level database tools like this, even Microsoft Access, are often too complicated for most end users and too limiting for most developers. In fact, if you asked many Microsoft Office users what the “Access” program does, they probably couldn’t tell you. Still, if you need a quick database form for entering data, it’s tough to beat this type of application. Perhaps the best-known open source office suite, OpenOffice, has made an attempt at an Access alternative in its “Base” tool, but, frankly, it leaves a lot to be desired.

A better choice is the KOffice program Kexi. Like Microsoft Access, Kexi can serve as a combined backend/frontend or as a frontend to a remote backend database. In addition to the basic tables, forms and reports, Kexi provides scripting through the Python and Ruby languages. In fact, the only real problem with Kexi is that it is not available in an open source version for Windows.

Because KOffice relies on the Qt toolkit, and the Windows version of Qt was not available under an open source license, KOffice was not made available in an open source version on the Win32 platform. Recognizing the interest in an Access alternative, Kexi was ported to Windows and a commercial version is available for $72. The winds of change are in the air, though: Trolltech, which makes the Qt toolkit, has released the Windows version of its toolkit under the GPL, meaning Qt-based apps can now be made available on Windows under an open source license.

Based on this development, the KDE developers have started porting applications, including KOffice and Kexi, over to Windows. Because of the large codebase and complex nature of KOffice, it’s going to take a while to get things stable on Windows (they’re currently at Alpha 10), but someday in the not too distant future there will be a good open source alternative to Microsoft Access on Windows. You can see the progress being made and check out the alpha on the KDE for Windows site. In the meantime, KOffice/Kexi is available for use on Linux and Mac.

Movable Type Goes Open Source

For reasons I can only speculate about, two of my most popular articles to date remain “The Next Big Thing In Blogging Software” and “a year later: an overview of multiblog software options”. The first was written over four years ago and the second just under three years ago. In the online world that is eons.

One might ask why, if these have proven to be such popular articles, I don’t update them more frequently. To be honest, this blog is as much for me to remember and track my interests and solutions to technical problems as it is to share knowledge and information with you, the reader. Given the significant amount of time invested in installing, testing and reviewing the blog software choices, and the limited return on that investment, it simply doesn’t make sense to do an annual or even semi-annual update. This is primarily because I have been extremely happy with my chosen solution, b2evolution. Despite the continued prevalence of WordPress in the blogosphere, I see no compelling reason to change and one good reason to stay with b2evolution: multiblogging. Although WordPress MU continues to be developed, it remains a sort of kludge which may or may not work in your specific instance. b2evolution, on the other hand, was built from the ground up to support multiple users and blogs, so support exists throughout the product. That is reason enough for me to stick with b2evolution, the blogging software that I still believe is undervalued and an excellent choice for the vast majority of independent blogging sites.

For those who have forgotten: once upon a time the independent blogging software market was ruled by Greymatter and, after its discontinuation, by Movable Type. There were no other serious contenders. All was good in the land of the blogger; then the sky fell. As I wrote four years ago…

On May 13, 2004 Six Apart, the company behind Movable Type, announced the long-awaited version 3.0. With this blog entry they also single handedly managed to start the demise of the Movable Type monopoly and changed the face of blogging software forever.

What they did was try to commercialize what had been free software while maintaining a crippled free version to placate complainers. As it turned out, this was perhaps the biggest mistake Six Apart ever made. As bloggers such as myself spoke out about these changes and promoted alternatives that were improving on a daily basis, the vast majority of independent bloggers abandoned Movable Type for other platforms such as WordPress and b2evolution. I have an unsubstantiated hunch that my prediction of the demise of Six Apart became a haunting reality for the company, which saw customers fleeing by the thousands. Although they retained some market share, particularly among commercial bloggers, it would never be the same for Movable Type, once the king of blogging software.

Despite attempts to rectify the situation and improve the pricing structure it seems that eventually the stubborn Six Apart came to realize the gravity of their mistake. In December 2007, more than three years after that infamous day, Six Apart made what I believe to be one last ditch effort to regain the market share they once had. It was then that Six Apart announced “as of today, and forever forward” Movable Type would be open source. Finally a victory for those who complained so mightily about that initial pricing structure.

How does this change things? It doesn’t, really. Movable Type will never again see the market penetration it once had. The decision to go open source comes far too late to have that kind of transformational effect. The market has become too diluted, and there is no single competitor (WordPress would be closest) to try to overtake. Had the change been made shortly after the original backlash, we would probably all still be running Movable Type for our blogging needs, as many of the other contenders would never have seen the development influx they did in the weeks and months after the MT 3.0 announcement. Certainly it is possible that over time Movable Type will innovate and become a serious contender, but for the time being it will remain a WordPress (and b2evolution) world. I applaud the move made by Six Apart, and it will probably keep the Movable Type software alive and viable for now, but it’s too bad this lesson was such a hard one for Six Apart. Better late than never. At least the sentiment is right.

2TB and growing

About a year ago I built several 1.2TB fileservers for a number of my consulting clients, using RAID5 arrays for redundancy with LVM running on top for expandability. One of my clients, which does some media work, has exhausted the storage space and called a few weeks ago about expanding it.

The four hard drives in the server were already using all the onboard SATA-II ports. I certainly could have replaced the drives with larger ones (which I did for another client), but that would have entailed some careful shuffling of data and wouldn’t provide much future expandability. For a client who uses space much more slowly I could have added a two-port SATA expansion card and two drives in RAID1, but here I expect to keep adding space, so I proposed an external storage tower with a multiport SATA link. I was looking for a PCI Express controller which would support eight drives on a single card and be supported in Debian Linux. I ended up selecting a Highpoint RocketRAID 2322, which seemed to fit the bill.
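The trade-off described above comes down to simple capacity arithmetic: RAID1 mirrors drives, so any number of mirrored drives yields the space of one, while RAID5 spends one drive's worth of space on parity, so n drives yield n - 1 drives' worth. A small sketch in JavaScript, assuming identical drives and ignoring filesystem and LVM overhead; the function name is hypothetical. Four 400 GB drives in RAID5 would, for example, give the 1.2TB mentioned above, but the actual drive sizes aren't stated, so those figures (and the eight 500 GB drives in the second example) are purely illustrative.

```javascript
// Usable capacity (same units as driveSize) for common RAID levels,
// assuming identical drives and ignoring filesystem/LVM overhead.
function usableCapacity(level, drives, driveSize) {
  switch (level) {
    case 1: // mirror: every drive holds the same data
      if (drives < 2) throw new Error("RAID1 needs at least 2 drives");
      return driveSize;
    case 5: // one drive's worth of space goes to distributed parity
      if (drives < 3) throw new Error("RAID5 needs at least 3 drives");
      return (drives - 1) * driveSize;
    default:
      throw new Error("unsupported RAID level: " + level);
  }
}

console.log(usableCapacity(5, 4, 400)); // 1200
console.log(usableCapacity(5, 8, 500)); // 3500
```

Because LVM runs on top of the arrays, a new array built on the external tower can be added to the existing volume group, which is what makes this route easier to grow than swapping out drives.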

As it turns out, packaged driver support for Linux is only available for Fedora, Red Hat and SuSE. Luckily I found great instructions at this University of Northern Iowa site for building the drivers from source provided by Highpoint. Although there is some grumbling in the open source community about these drivers being non-free licensed (hence no Debian package), just about everything else is great. The kernel module built without any problems and without a huge number of dependencies, and I was able to get the drives up and running without too much work.

Unfortunately, I did not get the module into the initramfs as I had intended, so on reboot it all came crashing down. Fixing this entailed a trip to the customer and several hours of work, because the entire system, including the root filesystem, is LVM on RAID. Luckily, I was able to boot off an Ubuntu CD, build the RocketRAID kernel module again, start the RAID and then the LVM, which finally allowed me to mount the filesystem. After doing this a few times I was finally able to get the initramfs straightened out and things working again. Needless to say it was a long night, but a successful one nonetheless.