
Norm Walsh on XML and JSON

By Noah | November 17, 2010

There’s been a lot of fuss lately about the widespread adoption of JSON for Web APIs, and a sense in some quarters that this represents a failure for XML. Norm Walsh has a new post summarizing the pros and cons of JSON vs. XML, and as usual, Norm has it exactly right:

(somewhat rearranging Norm’s text):

In short, if all you need are bundles of atomic values and especially if you expect to exchange data with JavaScript, JSON is the obvious choice. I don’t lose any sleep over that. […] XML wasn’t designed to solve the problem of transmitting structured bundles of atomic values. XML was designed to solve the problem of unstructured data. In a word or two: mixed content. […] I’ve seen attempts to represent mixed content in JSON and simple they aren’t.

XML deals remarkably well with the full richness of unstructured data. I’m not worried about the future of XML at all even if its death is gleefully celebrated by a cadre of web API designers. […] I look forward to seeing what the JSON folks do when they are asked to develop richer APIs. When they want to exchange less well strucured [sic] data, will they shoehorn it into JSON?

That is indeed the tradeoff. If you want to send along a list of job applicants and their recent salaries, JSON does fine; if you want to send their resumes, well, JSON isn’t quite as helpful. A surprising amount of the world’s important information is in just such semi-structured documents. Think insurance policies, shop manuals, and even Web pages themselves. XML is designed to provide a standard means for encoding and interchanging such information, with good enough pure data facilities that if you then want a unified framework to also handle the applicant list, you can. When you prefer a simpler, more JavaScript-compatible means of exchanging simple data, by all means use JSON.
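
To make the "pure data" half of that tradeoff concrete, here is a minimal Python sketch using only the standard library. The applicant names, salaries, and element names are invented for illustration and do not come from Norm's post. It reads the same applicant list once from JSON and once from XML.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical applicant data: names and salaries are made up for this sketch.
applicants_json = """
[
  {"name": "A. Example", "salary": 85000},
  {"name": "B. Sample", "salary": 92000}
]
"""

# JSON maps directly onto lists, dicts, strings and numbers.
applicants = json.loads(applicants_json)
json_salaries = [a["salary"] for a in applicants]

# The same bundle of atomic values in XML (element names are assumptions).
applicants_xml = """
<applicants>
  <applicant><name>A. Example</name><salary>85000</salary></applicant>
  <applicant><name>B. Sample</name><salary>92000</salary></applicant>
</applicants>
""".strip()

root = ET.fromstring(applicants_xml)
# XML element content is text, so numeric values need an explicit conversion.
xml_salaries = [int(a.findtext("salary")) for a in root.findall("applicant")]

print(json_salaries)
print(xml_salaries)
```

Both forms carry the list. The JSON version needs no type conversion and no choice of element names, which is the sense in which it is the simpler fit for this kind of data.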

Topics: Web, Internet, Computing

8 Responses to “Norm Walsh on XML and JSON”

  1. Liam Quin Says:
    December 1st, 2010 at 9:51 PM

    We are starting to see people building on JSON – there’s a (very simple) JSON schema draft, for example. Before long there will be JSON Web Services. And then JSON Query. And JSON:Lang (and JSON namespaces). And then JSON Transformations. And then JSON will be almost as complex as XML. Which will still be there….

  2. Noah Says:
    December 1st, 2010 at 10:31 PM

    Liam Quin wrote:

    > And then JSON will be almost as complex as XML.
    > Which will still be there….

    JSON most likely still won’t deal with mixed content, e.g.

    "This is a <emph>really</emph> interesting example"

    So, in that way it will still be simpler, easier to process for the pure data scenarios that it does handle, and relatively poor at dealing with documents, semi-structured data, etc.

  3. Devdas Bhagat Says:
    March 8th, 2011 at 2:33 AM

    But do we care about resumes as structured data or are we happy with sending binary blobs over the wire?

  4. Noah Says:
    March 8th, 2011 at 4:44 PM

    Devdas Bhagat wrote:

    > But do we care about resumes as
    > structured data or are we happy
    > with sending binary blobs over
    > the wire?

    Well, first of all, let’s not emphasize binary, which suggests things like .doc. That can be useful, but if we’re comparing JSON and XML, both are text.

    I suspect you’re asking: “do we care about resumes as structured data or as undifferentiated text”? Well, that obviously depends on your application, but for many, many purposes when you’re dealing with resumes, shop manuals, insurance policies or legal documents you very much want to deal with the structure. Certainly, the structure is important to things like document management systems. It’s also very important for templated document systems, in which something like an insurance policy is manipulated programmatically, with fields (name of policy holder) filled into the running text, paragraphs included conditionally (e.g. only if the policy is for more than a certain amount), etc. SGML and later XML became big deals in part because of the value of such marked-up text. Furthermore, there are lots of interesting things you can do with XML word processor and spreadsheet formats such as ODF and OOXML.

    On the other hand, there may well be cases where you’re dealing mainly with the sorts of field-oriented data that JSON is good at, with just an occasional need to carry around something like an HTML fragment as an opaque string (e.g. for an error message or even a form fragment.) If that meets your needs, fine, but do note that even that HTML is marked up text, no matter how opaque it is to the JSON. All that HTML out there should be a pretty big hint that marked up text is indeed very important.

  5. Devdas Bhagat Says:
    March 8th, 2011 at 4:59 PM

    In the context where we will be using JSON, would we care about the semantics of the markup? Or would we be happy with treating it as an opaque string?

    Markup is good, but is it useful in a data packet? Or can we parse the markup at another layer of the application stack and not worry about it in the transport layer at all?

    I would hate to see SGML and its derivatives go away, but I don’t see them as solving all problems either. The territory which JSON covers today is structured messages, where the focus is on the meta-data rather than the content of the message itself. Contrast JSON with SOAP and you win, contrast JSON with ODF for a document format and JSON will suck big time.

    XML had become the RDBMS of the message-passing world, and that’s being fixed by JSON. Neither of them is bad in context.

    My question pertained to using JSON in a message passing context, as opposed to trying to represent everything in the one true format.

    (I think binary blobs in the context of wireless drivers, or images, where the knowledge of what the bits mean is left to some other piece of code. Human readability or lack thereof isn’t particularly important).

  6. Noah Says:
    March 8th, 2011 at 5:08 PM

    Devdas: we agree to a significant degree, and I think if you read the original post you’ll see your points acknowledged. I did say: “If you want to send along a list of job applicants and their recent salaries, JSON does fine”.

    On the other hand, you imply that there’s no significant need to pass around things like resumes in messages, and I strongly disagree with that. It depends on the application.

    Now, what I do think we’re finding is that for a great deal of what people are doing with Ajax today, what Norm calls “atomic” values do just fine. JSON is both lower overhead and a more natural fit with JavaScript, so go ahead and use it. The community is, at least for the moment, voting for the more targeted, simpler, lower-overhead solution that’s convenient for the 80% or 90% case, and perhaps less natural or capable for some others. As Norm says: [for those cases] “…JSON is the obvious choice. I don’t lose any sleep over that.”

  7. W.S.Hager Says:
    July 23rd, 2013 at 8:38 AM

    I know this is quite old, but… How is XML unstructured or semi-structured? Just because an element tree has mixed content? It is implied here that a resume is a document with some atomic content you could reason about, and some parts that are “entirely free of structure” or “to-be-structured”, but this is a false dichotomy.

    The XML standard implies a lot more than you mention, for example how implementations like parsers or XPath follow certain rules. As there exists software that can extract structure and meaning from visually marked up text (e.g. from white-space, headings, etc.) XML can no longer contribute there.

    HTML is still viable (yet a different implementation altogether), but for how long? More and more people hate writing HTML or producing it with a WYSIWYG editor, and Markdown and such are more in favor.

    XML was created as a microwave oven that would eventually supplant all your pots and pans. Please turn your perspective around and say: okay, we’re software developers, what the *bleep* can we do with XML?

  8. Noah Says:
    July 23rd, 2013 at 10:54 AM

    Thank you for your comment. As far as I know, the common use of the term “semi-structured data” is as set out in the Wikipedia article on semi-structured data. As far as I can tell, my usage in this posting is entirely consistent with that. Quoting from the Wikipedia article:

    XML, other markup languages, email, and EDI are all forms of semi-structured data.

    Noah
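
The mixed-content sentence quoted in the second comment can be used to illustrate why this kind of text sits awkwardly in JSON. The following is a minimal Python sketch, not something from the original discussion. The wrapping `<p>` element, the array-of-strings-and-objects JSON encoding, and the `flatten` helper are all assumptions made for illustration; there is no standard JSON convention for mixed content.

```python
import json
import xml.etree.ElementTree as ET

# The sentence from the comment, wrapped in a <p> element (an assumption,
# added so the fragment has a single root and parses as XML).
sentence = ET.fromstring(
    "<p>This is a <emph>really</emph> interesting example</p>"
)
emph = sentence.find("emph")

# ElementTree keeps running text around child elements:
#   sentence.text is the text before <emph>,
#   emph.text is the emphasized word,
#   emph.tail is the text that follows </emph>.
plain = "".join(sentence.itertext())

# One hypothetical JSON encoding: an array mixing plain strings with
# single-key objects whose value is the element's own content.
as_json = json.dumps(
    ["This is a ", {"emph": ["really"]}, " interesting example"]
)

def flatten(node):
    """Recover the plain text from the hypothetical JSON encoding."""
    if isinstance(node, str):
        return node
    if isinstance(node, list):
        return "".join(flatten(child) for child in node)
    # Otherwise a dict standing in for an element: join its children.
    return "".join(flatten(children) for children in node.values())

print(plain)
print(flatten(json.loads(as_json)))
```

The JSON form works, but every consumer has to agree on the ad hoc convention and carry a helper like `flatten`, whereas the XML form expresses the same thing in the markup itself.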
