GarageBand Dumps it to XML

Saving data in an XML format has been all the rage for many years now and is the basis for both of the highly debated OOXML and ODF document formats. Part of the reason for its popularity is that many claim it is much easier to understand than the binary file formats of the previous generation. They might even say it is self-documenting.

But as Håkon Wium Lie, the CTO of Opera, once said:

…I’m no fan of either specification. Both are basically memory dumps with angle brackets around them.

He was talking specifically about the document formats competing for standardization at the time. More generally, he was talking about the whole idea of XML file formats. I remember thinking about Jokosher’s own file format after hearing this quote. It is an XML-based format that I largely designed myself.

I will admit that our format is in a lot of ways a memory dump with angle brackets. The main nodes <Project>, <Instrument> and <Event> have the same name and structure as the classes that implement them. Each of these nodes also has a subnode called <Parameters>, which is simply a list of the variable name, type and value of all the class variables. This only works well for simple data types; for lists and dictionaries we have more complex structures.
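To make the "memory dump" idea concrete, here is a minimal Python sketch of that pattern. It is not Jokosher's actual code: the Instrument class, its attributes, and the exact layout of the children under <Parameters> (tag as the variable name, with type and value attributes) are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET


class Instrument:
    """Hypothetical stand-in for a class whose state gets saved."""

    def __init__(self, name, volume, is_muted):
        self.name = name
        self.volume = volume
        self.is_muted = is_muted


def dump_object(obj):
    """Build <ClassName><Parameters>...</Parameters></ClassName> from obj."""
    node = ET.Element(type(obj).__name__)
    params = ET.SubElement(node, "Parameters")
    for attr_name, value in vars(obj).items():
        # One child per instance variable (assumed layout): the tag is the
        # variable name, the attributes carry its type name and value.
        ET.SubElement(
            params, attr_name, type=type(value).__name__, value=str(value)
        )
    return node


xml_text = ET.tostring(
    dump_object(Instrument("Guitar", 0.8, False)), encoding="unicode"
)
print(xml_text)
```

The node name comes straight from the class name and the children come straight from vars(), which is exactly why the resulting file mirrors what is in memory. It also shows why only simple types work: str(value) on a list or dictionary would not round-trip cleanly.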

In previous versions of Jokosher, I needed an easy way to store the very large list of volume levels used to draw the waveform. I figured that putting each number in its own XML node would take a lot longer to parse and consume much more memory than necessary, so I created one simple node that looked like this:

<Levels value="0.911560058594,0.913299560547"/>

Imagine the above except with three thousand floating point levels, as is required for a typical five minute song. At this point it starts to get messy and stops looking like proper XML. It is also harder to decipher, and much more like a memory dump. Luckily the current version of Jokosher does not use this anymore. It stores the level data in a separate file that is just a raw dump of the bytes from the array.
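For comparison, here is a rough Python sketch of both approaches: parsing the comma-separated attribute, then writing the same numbers out as raw bytes and reading them back. The file name levels.raw and the choice of 8-byte doubles are my assumptions for the example, not a description of what Jokosher actually writes.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from array import array

# Old style: every level packed into one comma-separated attribute.
node = ET.fromstring('<Levels value="0.911560058594,0.913299560547"/>')
levels = array("d", (float(text) for text in node.get("value").split(",")))

# New style: dump the array's bytes into a separate file (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "levels.raw")
with open(path, "wb") as f:
    levels.tofile(f)

# Reading it back is a single call, with no string parsing at all.
restored = array("d")
with open(path, "rb") as f:
    restored.frombytes(f.read())

print(len(restored), "levels,", os.path.getsize(path), "bytes on disk")
```

The trade-off is that a raw dump like this depends on the machine's byte order and float size, so it is even less self-documenting than the XML attribute was.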

Even though I have just shown Jokosher’s file format to be essentially a memory dump with angle brackets, I don’t feel bad about it. It makes our job as programmers easier, and the whole program is still simple enough that you can figure out most parts of the file format without documentation.

Today I had the chance to look at the file format used by Apple’s GarageBand, a program with a very similar purpose to Jokosher. It was interesting to find that, like Jokosher, GarageBand has a folder for the project which contains the project file as well as a separate folder for all the audio files. Also, just like Jokosher, the project file is XML-based.

After a few <dict> nodes obviously representing a dictionary data structure, there is a <data> node containing (wait for it…) 2584 lines of base-64 encoded data, and the entire XML file is only 2794 lines! Talk about a memory dump with angle brackets. And you thought your file format was difficult to interoperate with! If you really want to peer at the impurity, you can download the entire file.
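If you want to poke at a file like this yourself, the <dict> and <data> nodes suggest it is an Apple XML property list, in which case Python's plistlib can read it. The sketch below works on that assumption and builds a tiny stand-in document rather than using the real project file; the key names Version and Blob are invented for illustration.

```python
import plistlib

# Build a tiny stand-in plist; bytes values are written as base-64 <data> nodes.
sample = plistlib.dumps({"Version": 1, "Blob": bytes(range(16))})

# Loading decodes the base-64 text back into a bytes object.
project = plistlib.loads(sample)
for key, value in project.items():
    if isinstance(value, bytes):
        print(f"{key}: {len(value)} bytes of opaque binary data")
    else:
        print(f"{key}: {value!r}")
```

Decoding the base-64 is the easy part. What you get back is still an undocumented blob of bytes, which is the real interoperability problem.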
