Open Data Formats and Standards

Good day blog readers.  It’s Wednesday morning and the class of teachers that was supposed to show up has not (which is actually a surprise, since they are usually early!  Wait, I have been informed there are some national exams going on…), so I have decided to introduce the layman to a very specific technical issue that plagues Information Technology in the developing world: open data formats and other open standards.  Don’t worry, I know it sounds boring, but hopefully you will be interested by the end, and maybe you will even have learned something!  I can only hope.

First off, what are open data formats?  What is a data format?  What is data?  Where am I!? Who are you?!  Data in this context is everything created on a computer that is actually important to you, the person using the computer.  I am talking about documents, movies, pictures, emails, all of that jazz.  And while we are at it, we can even throw in other types of data such as the websites you browse on a regular basis.  A data format is the special set of rules describing how your data is laid out inside that Word document file or JPEG photo file from your camera.  Markers written into the file tell the computer, hey, I am a JPEG picture, here’s how you read me.  Without a specified data format, those 1’s and 0’s making up that latest Springsteen song on your computer are just that: 1’s and 0’s.  With the proper format, and a program that knows how to read it, all of a sudden you’ve got the Boss pumping out of your Bose and all is well.
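For the curious, here is a minimal sketch of that “hey, I am a jpeg picture” idea in Python.  Many formats begin with a few fixed bytes, often called a signature, and a program can peek at them to guess what it is looking at.  The table and the function name guess_format are my own illustration, not how any particular program does it, and real software checks far more than the first few bytes.

```python
# Illustrative table: the leading bytes of a file and the format they announce.
# The byte values are the published signatures for these formats; the table
# itself and the function below are a hypothetical sketch.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
]


def guess_format(data: bytes) -> str:
    """Return a format name based on the first bytes of the data."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    # No rulebook matched: to this program the data is only 1's and 0's.
    return "unknown"


if __name__ == "__main__":
    print(guess_format(b"\x89PNG\r\n\x1a\n" + b"rest of the picture"))
    print(guess_format(b"some bytes nobody told us how to read"))
```

Recognizing the signature is only the first step.  Actually decoding the picture or the song requires the rest of the rulebook, which is exactly where the trouble starts.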

The problem with data formats is that the people who create the rules don’t always share the rulebook with others.  This causes a problem called lock-in.  Perfect example, though slightly generalized: a few court systems around the US recently realized that all of their government documents were created using Microsoft Word and other Microsoft Office products.  Well that’s nice, Microsoft Word is a great word processing application, capable of skillfully manipulating documents from the simplest essay to the largest book.  However, the .doc format is proprietary.  Microsoft Corporation owns the rulebook on how to create and manipulate documents stored as .doc (which, by default, are all documents created using Word).  Other computer programmers who want to make applications that can use .doc documents must pay royalty fees to Microsoft, and then they are given the rulebook with permission to read it and implement their own solution.  The only alternative is to reverse engineer the rules by reading the files in binary itself, those pesky 1’s and 0’s that are so unreadable to the average human.  This is what some free software teams have done, though the legality of reverse engineering is questionable and the results are not always perfect.  Should Microsoft go belly-up or start doing interesting things with their licensing fees, what would the courts do?  They were locked in to a Microsoft-only solution.

Word documents are not the only prolific proprietary formats out there.  .GIF images are in a questionable state at any given time, with the original license being held by CompuServe (though I would need to do some Wikipedia searching to know its current status).  Adobe Creative Suite formats are proprietary (though PDFs are fairly open now).  And the biggest?  MP3 is a proprietary format, and every program and piece of hardware that legally uses it needs to pay licensing fees.  The same holds true for the formats your DVD movies come in.  Many, many “everyday” data formats are proprietary.

How does this harm the developing world?  Well, the second part of the term “licensing fee” is the little word “fee.”  You could also use “royalty fee,” or my preferred term, “stupid fee.”  This fee trickles upwards into software cost.  As many of us know, there are plenty of free software projects out there that can easily substitute for paid-for software in terms of functionality, but being free software projects, they are unable to pay the licensing fees and therefore do not always support proprietary formats.  Let’s continue to look at the trickle effect through a case example:

A non-profit in the United States emails a .doc file to an NGO in Kenya.  It is the application they requested for a grant that would enable them to complete a successful AIDS-prevention project.  To open it, the organization needs a copy of Microsoft Word, which as far as they know only runs on Microsoft Windows (the organization is unaware of other options, Microsoft being so entrenched in Africa).  They cannot afford Microsoft Windows and therefore get a pirated copy, as well as a pirated copy of Word.  Pirated, as in illegally copied and technically stolen.  They open the document, fill out and return the application, and their grant request is fulfilled.  They begin collecting all of their data and begin seeing trends that will allow them to seriously assist People Living With AIDS in their area.  A successful NGO!

In possession of pirated copies of software, the NGO is unable to practice proper computer security by updating their software to protect against the latest viruses.  Even the simplest viruses, such as those transmitted through USB flash drives, wreak havoc upon systems across Kenya.  Of course, also being low on budget, and barely affording the computer itself, they are unable to purchase hard drives to back up information regularly.  A virus sweeps in and destroys all of their work.  All just to be able to open a .doc emailed from a group in America who themselves did not know better.

There exist alternatives, however.  There exist alternatives to every major proprietary format.  Instead of MP3, use FLAC or Ogg.  Instead of GIF, use PNG.  Instead of .doc, use RTF or better yet, the Open Document Format (ODF).  If it’s a document that need only be read, use PDF.  This will have a trickle-down effect for developing nations.  People like me can come in and start promoting the use of Open Source and Free alternatives to software, including Windows.

An alternative operating system called Linux runs fantastically, especially on older hardware, but one of its drawbacks is that its creators cannot always bundle applications that read proprietary formats, because they must avoid licensing fees (or lawsuits, which might ensue should they use less-than-legal reverse-engineered technologies).  It seems like a stretch, but I promise you, the effect would be real.  Ultimately people do not care about how their computer operates, as long as it does.  From an infrastructure support point of view, the only thing preventing people in the developing world from switching is a Microsoft lock-in directly tied to a data format lock-in.  We need to break out, because data lock-in is holding back development.  There, I said it.


One response to “Open Data Formats and Standards”

  1. scientes

    Good post.

    One point however. You stated:

    “Pirated, as in illegally copied and technically stolen.”

    The Supreme Court of the United States would disagree with you. In Dowling v. United States (1985), which dealt with criminal charges linked to copyright infringement (a civil matter), the Court ruled:

    “interference with copyright does not easily equate with theft, conversion, or fraud. The Copyright Act even employs a separate term of art to define one who misappropriates a copyright: … ‘an infringer of the copyright.’ …

    The infringer invades a statutorily defined province guaranteed to the copyright holder alone. But he does not assume physical control over the copyright; nor does he wholly deprive its owner of its use. While one may colloquially link infringement with some general notion of wrongful appropriation, infringement plainly implicates a more complex set of property interests than does run-of-the-mill theft, conversion, or fraud.”