By Noah | December 1, 2011
Many people have commented on the pros and cons of the New York Times paywall. Most of these comments debate the effectiveness of the paywall in meeting the Times’ financial goals, discuss ways in which users will circumvent the paywall, etc. Here I’d like to explore a different issue: it seems to me that the paywall, as currently implemented, violates the specifications for the Web’s HTTP protocol. Interestingly, my concern is not with the part of the system that charges readers, it’s with the part that tries to count the 20 free pages allowed per month.
One of the important features of the HTTP specification is that GET, the operation that is used for almost all attempts to retrieve a Web page, has some very carefully crafted semantics. In particular, GET is inappropriate for any request that, directly or as a side effect, updates the state of a server. What’s an update? Well, taking money out of your bank account, confirming a plane reservation, or, in my opinion anyway, using up one of your 20 free New York Times page accesses. The way the HTTP specification puts this is:
In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered “safe”. This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.
Exactly. I really want to be warned before I access a page that’s coming out of my monthly quota, and that doesn’t happen today. Furthermore, it’s perfectly legitimate for software to follow links on my behalf without my even asking. Consider an e-mail reader that offers to download local copies of every page linked from the e-mails I have received, perhaps so that I can read my correspondence while I’m traveling. Such an agent is unlikely to be aware that these accesses can quickly deplete my NYT allowance, possibly on pages I never intended to read.
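To make the e-mail reader scenario concrete, here is a minimal sketch in Python of how such an agent might decide what to fetch ahead of time. It is an illustration, not a description of any real mail client: the names LinkCollector and plan_prefetch are hypothetical, as are the URLs in the usage note. The sketch queues every ordinary link, because the specification tells it that GET is safe, and holds back only forms that declare an unsafe method such as POST.

```python
from html.parser import HTMLParser

# Methods the HTTP specification asks servers to treat as "safe".
SAFE_METHODS = {"GET", "HEAD"}


class LinkCollector(HTMLParser):
    """Hypothetical helper: sorts targets into 'prefetch' and 'ask first'."""

    def __init__(self):
        super().__init__()
        self.to_prefetch = []  # plain links, retrieved with GET
        self.skipped = []      # form actions that use an unsafe method

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # An ordinary link is followed with GET, so the agent
            # assumes that fetching it has no side effects.
            self.to_prefetch.append(attrs["href"])
        elif tag == "form":
            # A form without a method attribute defaults to GET.
            method = (attrs.get("method") or "GET").upper()
            if method not in SAFE_METHODS and attrs.get("action"):
                self.skipped.append(attrs["action"])


def plan_prefetch(html):
    """Return (links to fetch unprompted, actions needing user consent)."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.to_prefetch, collector.skipped
```

Given a message body such as `<a href="http://short.example/abc">read this</a>`, plan_prefetch places the shortened link in the first list. Nothing in the markup tells the agent that following it may cost the reader one of the 20 free accesses.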
Does this actually bother anyone in practice? Yes. In fact, I think this is the technical explanation for one of the most frustrating aspects of the paywall. There are all sorts of situations in which one winds up unintentionally clicking on a link to an NYT article, only to discover after the fact that yet another “free” access has been counted. This happens when links are sent using URL shorteners, when the link text or image does not show the URL, or when the user neglects to read the target URL carefully before following a link.
So, what should the NYT do if they want to continue to offer free articles, and also follow Web architecture? One acceptable answer would be: the response to a GET request for an article that counts against the “free” limit should redirect to a simple Web form that says:
“You are about to use one of your 20 free accesses to the New York Times for this month. If you wish to go ahead, click the button below”.
The button would, of course, do an HTTP POST, to properly account for the access before returning the article.
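The flow just described can be sketched as a small request handler in Python. This is a simplified model under stated assumptions, not the Times’ actual system: the Meter class, the handle function, the /article/ and /confirm/ paths, the choice of a 302 redirect, and the 403 response once the allowance is used up are all hypothetical. The point it illustrates is that both GET requests leave the counter untouched, and only the POST changes server state.

```python
from dataclasses import dataclass, field

FREE_LIMIT = 20  # free articles per month, as described in the post


@dataclass
class Meter:
    """Hypothetical in-memory count of charged accesses per reader."""
    used: dict = field(default_factory=dict)

    def remaining(self, reader):
        return FREE_LIMIT - self.used.get(reader, 0)

    def charge(self, reader):
        self.used[reader] = self.used.get(reader, 0) + 1


def handle(meter, method, path, reader):
    """Return (status, headers, body) for one request."""
    if method == "GET" and path.startswith("/article/"):
        article_id = path[len("/article/"):]
        # Safe: nothing is counted, the reader is sent to the form.
        return 302, {"Location": "/confirm/" + article_id}, ""
    if path.startswith("/confirm/"):
        article_id = path[len("/confirm/"):]
        if method == "GET":
            # Still safe: only the warning form is returned.
            form = (
                '<form method="post" action="/confirm/%s">'
                "You are about to use one of your %d free accesses. "
                "<button>Go ahead</button></form>" % (article_id, FREE_LIMIT)
            )
            return 200, {"Content-Type": "text/html"}, form
        if method == "POST":
            if meter.remaining(reader) <= 0:
                # Assumption: refuse once the allowance is exhausted.
                return 403, {}, "Free limit reached"
            # The only place where server state is updated.
            meter.charge(reader)
            return 200, {}, "Full text of article " + article_id
    return 404, {}, "Not found"
```

With this arrangement a crawler or prefetching agent that issues only GET requests can follow every link without consuming anyone’s allowance, while a reader who presses the button has been told what the action costs.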
As it stands, the paywall misuses HTTP, and the consequences are indeed frustrating to users. Furthermore, if too many sites start misusing GET, then it will become more difficult for us all to explore the Web without worrying about the consequences for each link we follow, and it may also be more difficult for engines like the Google crawler to retrieve Web content.
By the way, the W3C Technical Architecture Group, which I chair, has written about safe use of HTTP in its finding: URIs, Addressability, and the use of HTTP GET and POST.