Interoperability in the real world

Too many users save their documents in binary formats that are both proprietary and transitory.

The justification for this practice is that the proprietary formats are 'de facto' standards. "De facto they may be," according to Jeremy Allison, a lead developer on the Samba team, "but standards they are not." The Samba server allows Linux and Unix to serve and share files on Windows networks, using Microsoft's SMB/CIFS (Server Message Block/Common Internet File System) protocol. SMB/CIFS is the protocol Microsoft uses for file and printer sharing between computers running Windows. Linux and Samba drive most of the dedicated print and file servers on the market.

SMB/CIFS is described by Allison as "a bizarre hybrid between an open and a proprietary standard." The implementation on Windows deviates widely from the published standard. "Microsoft engineers tend to see the protocol as unique to their own systems", says Allison. "They view the implementation on Windows as correct even where it deviates from the X/Open standard." The Samba team regards these deviations as bugs, but has to reproduce them faithfully: Allison describes Samba as "bug for bug compatible" with Microsoft. "The spec was a fiction, and made no sense. The complexity of SMB is one of the main reasons that NT was so unstable."

Although SMB/CIFS is nominally a published set of networking protocols for exchanging data between computers, the Samba team was left with a choice between signing an NDA and reverse engineering the Windows implementation. An NDA was out of the question, because Samba is free software and its source code is open for anyone to read. "Samba can exist because Microsoft has to provide backwards compatibility with all their older systems," Allison notes. "We live in their backwards compatibility shadow, which is very long indeed. We can emulate those file-sharing protocols because they can't change the clients that much and break the backend servers." This solution is far from satisfactory. In tests conducted by IT Week Labs in October 2003, Samba on Red Hat Linux outperformed Windows Server 2003 as a Windows domain controller by a factor of 2.5, but the success of Samba does not disguise the fact that the absence of an open standard impedes interoperability.

"The SMB protocol is disgusting", says Allison. "It grew like a wart. You can tell. It has a 39-byte header, which might have been important when bytes were important on the wire, but now it's crazy. Eventually, I'd like Samba to go away."

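Allison does not say how he counts his 39 bytes, but one plausible reading can be reconstructed from the published CIFS message layout: a fixed 32-byte SMB header, preceded by a 4-byte session header on the transport and followed by the 1-byte WordCount and 2-byte ByteCount fields that a message carries even when it has no parameters or data, giving 4 + 32 + 1 + 2 = 39. The Python sketch below packs such an empty message with the standard `struct` module. The function name `build_empty_smb_message` is hypothetical, the command byte is arbitrary, and the result is not a valid request for any real SMB command; it only illustrates the fixed overhead.

```python
import struct

# SMB1 fixed header, little-endian with no padding ("<"): 32 bytes in total.
# Field order follows the published CIFS layout:
#   Protocol (4) Command (1) Status (4) Flags (1) Flags2 (2) PIDHigh (2)
#   SecurityFeatures (8) Reserved (2) TID (2) PIDLow (2) UID (2) MID (2)
SMB_HEADER = struct.Struct("<4sBIBHH8sHHHHH")


def build_empty_smb_message(command: int) -> bytes:
    """Hypothetical helper: an SMB1 message with no parameter words and no
    data bytes, wrapped in a 4-byte session header. Illustrative only."""
    header = SMB_HEADER.pack(
        b"\xffSMB",   # protocol signature
        command,      # command code (arbitrary here)
        0,            # status
        0,            # flags
        0,            # flags2
        0,            # PIDHigh
        b"\x00" * 8,  # security features / signature
        0,            # reserved
        0, 0, 0, 0,   # TID, PIDLow, UID, MID
    )
    body = struct.pack("<BH", 0, 0)  # WordCount = 0, ByteCount = 0
    smb = header + body
    # Session header: a zero type byte followed by the message length,
    # big-endian, which packs as a single 32-bit big-endian integer here.
    session = struct.pack(">I", len(smb))
    return session + smb


message = build_empty_smb_message(0)
print(SMB_HEADER.size, len(message))
```

Of the 39 bytes returned, 35 are the SMB message proper and 4 are transport framing, all before a single byte of useful payload.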
I can't talk your language

Proprietary data formats, published or not, offer little long-term data security and are an unreliable choice for transferring data to prospective clients. A few years from now, the contents of today's word processor documents, for instance, may well be hidden from view, and messy and expensive to retrieve. Reliance is too often placed on backups and short-term solutions; inevitably the data will be lost and forgotten, and any later search will be frustrated by the inability to read the binary format that was the "de facto standard" a few short years before.

Microsoft Office, for instance, currently dominates large sectors of the market for productivity suites, but this has only been true for the last ten years or so. Back in the '80s, Lotus 1-2-3 had over 90 per cent of the spreadsheet market and Excel was a clunky no-hoper. WordPerfect was a billion-dollar giant, and Word had less than 10 per cent of the market. The subsequent success of Word and Excel probably owes more to the success of DOS and Windows than to the inherent virtues of the products themselves.

The purpose of open standards is to promote interoperability between different applications on different operating systems. The effect of proprietary data formats is to encourage reliance on single-vendor applications and to discourage the implementation of competitive products. Proprietary data formats give us no assurance of permanence or diversity, force dependence on the continuing popularity of a particular product, and are liable to change from one version of the software to the next. The user is locked into an involuntary upgrade cycle with an individual vendor, with few guarantees of consistency, and has little long-term control over the viability of the data. This is the significance of the troubles that Microsoft is facing from the European Commission over anti-competitive practices, the current fuss over Microsoft's attempt to push its Office Open XML (OOXML) office document "standard" through the standards bodies, and the coincidental patent action Microsoft is facing from Alcatel-Lucent over MP3 compression technologies.

Patents, "de facto standards" and trade secrets embedded in file formats are anti-competitive and impede innovation and interoperability. They also go against the spirit of cooperation that has been responsible for the vast advances in information technology during the last half-century.

Knitting the web

Interoperability, or the simple notion that computer systems should produce outputs in common formats which allow one computer to talk to another, has been a goal of computing since the beginning of the electronic era. The prime example of network computing built on open standards is the web. Tim Berners-Lee's concept, first proposed in 1989, was that "a global hypertext space be created in which any network-accessible information could be referred to by a single 'Universal Document Identifier'."

On the server side, there were web pages written in the HyperText Markup Language (HTML), which followed simple conventions and rules. On the client side, there was a browser that rendered HTML code as a readable page. The web of browsable HTML pages was knitted together by hypertext links, each pointing to an address that became known as a URL (Uniform Resource Locator), and everything was carried over TCP/IP running on IPv4. As Berners-Lee put it: "The dream behind the web is of a common information space in which we communicate by sharing information. Its universality is essential: the fact that a hypertext link can point to anything."
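Part of a browser's job can be sketched in a few lines: parse the HTML, find each hypertext link and resolve its target into a full URL. The example below uses only Python's standard `html.parser` and `urllib.parse` modules; the class name `LinkCollector`, the page content and the base address are illustrative assumptions, not taken from any real browser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> elements, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Relative links are resolved against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


page = """<html><body>
<p>See the <a href="History.html">project history</a> and
<a href="http://www.w3.org/">the W3C</a>.</p>
</body></html>"""

collector = LinkCollector("http://info.cern.ch/hypertext/WWW/TheProject.html")
collector.feed(page)
for link in collector.links:
    parts = urlsplit(link)
    print(parts.scheme, parts.netloc, parts.path)
```

Running it prints the scheme, host and path of each resolved link. The relative link `History.html` is resolved against the base page, so a link can point anywhere on the web using nothing more than an open, published address format.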

The web has worked well because the protocols and standards have remained open, universal, consistent and simple, and the players, for the most part, have had to play ball and follow the rules. Every computer can talk to every other computer and share information at a fairly basic level, and we are able to share Berners-Lee's "common information space" as a universal resource. These principles have been applied to the current trend for web services and Service Oriented Architectures, both of which depend entirely on open standards such as SOAP, XML and WSDL to bridge the divide between pieces of software, to feed data between departments, services and consumers, and to find new uses for that data.
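To show what open protocols mean in practice, here is a minimal sketch of a SOAP 1.1 request built and parsed with Python's standard `xml.etree.ElementTree`. The envelope namespace is the one defined by SOAP 1.1; the service namespace `urn:example:inventory`, the `GetStockLevel` operation and both function names are hypothetical. In a real deployment a WSDL document would describe the operation and its message types, so that toolkits on any platform could produce and consume the same XML.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:inventory"  # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("inv", SVC_NS)


def build_request(item_id: str) -> bytes:
    """Wrap a hypothetical GetStockLevel call in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    request = ET.SubElement(body, f"{{{SVC_NS}}}GetStockLevel")
    ET.SubElement(request, f"{{{SVC_NS}}}ItemId").text = item_id
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def parse_request(payload: bytes) -> str:
    """Recover the item id from the envelope, as a receiving service would."""
    root = ET.fromstring(payload)
    path = f"{{{SOAP_ENV}}}Body/{{{SVC_NS}}}GetStockLevel/{{{SVC_NS}}}ItemId"
    return root.find(path).text


payload = build_request("A-42")
print(payload.decode("utf-8"))
print(parse_request(payload))
```

Because both sides agree only on published namespaces and element names, the sender and receiver could just as well be written in different languages on different operating systems.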

In an ideal world, this model should apply to all aspects of the network. But the reality is sometimes different, as illustrated by the $1.52 billion judgement against Microsoft for infringing a patent taken out by Bell Laboratories and held by Alcatel-Lucent on MP3 audio compression technologies. MP3 was acknowledged as a standard by the International Organization for Standardization (ISO) in 1993, but contains a number of patented technologies owned by a variety of corporate entities, several of which are staking claims on the relative success of the format. If Alcatel-Lucent succeed in their claims against Microsoft, we are all the losers, because the "de facto standard" format for compressed music turns out to be privately owned.

1900 was a leap year?

Interoperability is not just about how computers talk to each other over the network, but also about their ability to access and share documents, irrespective of the origin of those documents. It has long been recognised that an open standard was desirable for the storage and sharing of office documents, to provide continuity, interoperability, choice, access and control for end users. In December 2002, an industry-wide Technical Committee of the OASIS consortium was convened to "create an open, XML-based file format specification for office applications." The resulting specification, the OpenDocument format, is cross-platform, clear and comprehensive, and runs to 600 pages. It was submitted to the ISO/IEC Joint Technical Committee 1 (JTC1) in November 2005, approved for "release as an ISO and IEC International Standard" in May 2006, and published as ISO/IEC 26300 in November 2006.
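Part of what makes OpenDocument open is that a document is a ZIP archive of XML files that any programming language can read. The sketch below, using only Python's standard library, writes a minimal text document package and reads its paragraphs back. It includes only the `mimetype` entry and `content.xml`; a conforming document also carries a `META-INF/manifest.xml` manifest, styles and metadata, which are omitted here, so an office suite may reject or repair the result. The function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MIMETYPE = "application/vnd.oasis.opendocument.text"
OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

ET.register_namespace("office", OFFICE)
ET.register_namespace("text", TEXT)


def write_minimal_odt(paragraphs: list[str]) -> bytes:
    """Build a stripped-down ODF text package in memory (no manifest or styles)."""
    doc = ET.Element(f"{{{OFFICE}}}document-content", {f"{{{OFFICE}}}version": "1.0"})
    body = ET.SubElement(doc, f"{{{OFFICE}}}body")
    text = ET.SubElement(body, f"{{{OFFICE}}}text")
    for para in paragraphs:
        ET.SubElement(text, f"{{{TEXT}}}p").text = para
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The package format expects "mimetype" first, stored uncompressed.
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr(
            "content.xml",
            ET.tostring(doc, encoding="utf-8", xml_declaration=True),
            compress_type=zipfile.ZIP_DEFLATED,
        )
    return buf.getvalue()


def read_paragraphs(package: bytes) -> list[str]:
    """Return the text of every <text:p> element in content.xml."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        if zf.read("mimetype").decode("ascii") != MIMETYPE:
            raise ValueError("not an OpenDocument text package")
        root = ET.fromstring(zf.read("content.xml"))
    return [p.text or "" for p in root.iter(f"{{{TEXT}}}p")]


package = write_minimal_odt(["Hello", "World"])
print(read_paragraphs(package))
```

Nothing here needs a licence, an NDA or a vendor's library: a ZIP reader and an XML parser, both ubiquitous, are enough to get at the text.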

Already, the OpenDocument format has been widely implemented and accepted as the standard format for storing and sharing documents by more than a dozen governments, and has been recommended for adoption by a number of American states. The problem for Microsoft is that the adoption of an open format as a standard threatens Microsoft's grip on the office market. Hence Microsoft has pushed for the adoption of a counter-standard, confusingly called Office Open XML (OOXML).

Earlier this year, Microsoft asked for fast-track adoption by ISO of the 6,000-page OOXML specification. The objections to this procedure can be summarised in a statement from FFII, the Foundation for a Free Information Infrastructure: "OpenXML relies on undisclosed patents, and undisclosed or incomplete licensing terms that make any independent reimplementation impossible or heavily risky. It obliges implementers to reverse-engineer the behaviour of old closed Microsoft applications and formats. It uses non-standard formats for languages and dates, and specifies known bugs, such as treating 1900 as a leap year." So far, fourteen national standards bodies have voiced direct objections to the fast-track adoption of what has been described as a "single vendor standard", and others have voiced concerns.
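The 1900 leap-year bug is easy to demonstrate. Under the Gregorian rules 1900 is not a leap year, because it is divisible by 100 but not by 400, yet the spreadsheet "1900 date system" that Excel inherited from Lotus 1-2-3 counts a 29 February 1900 as day 60. Any independent implementation reading these day numbers has to reproduce the error to arrive at the right dates. The Python sketch below, with a hypothetical function name, converts day numbers in that system to real dates and rejects the fictitious day.

```python
import calendar
from datetime import date, timedelta


def excel_1900_serial_to_date(serial: int) -> date:
    """Convert a day number in the spreadsheet 1900 date system to a real date.

    Day 1 is 1 January 1900. Day 60 is the non-existent 29 February 1900,
    so every day number from 61 onward is one larger than a true day count.
    """
    if serial < 1:
        raise ValueError("day numbers in the 1900 system start at 1")
    if serial == 60:
        raise ValueError("day 60 is the fictitious 29 February 1900")
    if serial < 60:
        return date(1899, 12, 31) + timedelta(days=serial)
    # Skip the phantom leap day for everything after it.
    return date(1899, 12, 31) + timedelta(days=serial - 1)


# Gregorian rule: divisible by 100 but not by 400, so not a leap year.
assert not calendar.isleap(1900)
print(excel_1900_serial_to_date(59), excel_1900_serial_to_date(61))
```

For example, `excel_1900_serial_to_date(61)` gives 1 March 1900, even though day 61 is only the 60th real day of the year. A standard that specifies this behaviour is asking every implementer to copy a bug.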

Most office suites will implement OpenDocument as their default format, enabling free and easy exchange of documents. Microsoft Office, which currently dominates the market, can be made to read and write OpenDocument files with a plugin that may be downloaded from Microsoft's website, but this only partially satisfies the demands of governments and industry for the OpenDocument format to be adopted as an open and ubiquitous format for the storage, retrieval and sharing of documents.

An open standard that enables users to share documents on any operating system on any device, unencumbered by patents and trade secrets, is a necessity for the office of the future. OOXML requires "bug for bug compatibility" with all the past implementations of Microsoft Office and does not fulfil the primary objectives of transparency and portability.

In a polyglot world, where people exchange information in many different languages and dialects, it is important that there are common reference points that make interaction possible. Standards give us the means to talk to one another in a heterogeneous environment, whatever applications, operating systems or computer languages we use. "If I can't talk the language of your proprietary format, I can't hear what you say", and conversation becomes impossible.