Friday, May 25, 2012

Real trends in word and sentence length

A couple of days ago, The Telegraph quoted an actor and a television producer emitting typically brainless "Kids Today" plaints about how modern modes of communication, especially Twitter, are degrading the English language, so that "the sentence with more than one clause is a problem for us", and "words are getting shortened". I spent a few minutes fact-checking this foolishness, or at least the word-length bit of it — but some readers may have misinterpreted my post as arguing against the view that there are any on-going changes in English prose style.

So I wrote a script to harvest the inaugural addresses and state of the union addresses from the site of the American Presidency Project at UCSB, and some other scripts to (I hope) extract the texts of the speeches from their HTML wrappings, and to count word and sentence lengths. Why use these sources? Well, different kinds of writing have their own norms, and so it wouldn't be good evidence of an overall historical trend to show (for example) that 20th-century sports reporting is stylistically different from 19th-century sermons, or that 21st-century blogging is different from 18th-century pamphleteering. U.S. Presidential addresses are one accessible example of a body of texts, spanning more than 200 years, which ought to be fairly consistent in genre and register.
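For readers who want to try this at home, the extraction and counting steps might be sketched roughly as follows. This is not the script used for the post: the function names, the tokenization rule (runs of letters, optionally joined by apostrophes or hyphens), and the sentence rule (split on whitespace after ".", "!" or "?") are all assumptions made for illustration. In particular, the naive sentence rule will wrongly split after abbreviations such as "U.S.", so a real run would need something more careful.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Strip markup and collapse runs of whitespace (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


# Assumption: a "word" is a run of letters, with internal apostrophes or
# hyphens allowed, so "fellow-citizens" counts as one word.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")
# Assumption: a sentence ends at ".", "!" or "?" followed by whitespace.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Return the pieces of text that contain at least one word."""
    return [s for s in SENTENCE_END_RE.split(text) if WORD_RE.search(s)]


def mean_word_length(text):
    """Mean number of letters per word (apostrophes and hyphens not counted)."""
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    letters = sum(sum(ch.isalpha() for ch in w) for w in words)
    return letters / len(words)


def mean_sentence_length(text):
    """Mean number of words per sentence."""
    counts = [len(WORD_RE.findall(s)) for s in split_sentences(text)]
    return sum(counts) / len(counts) if counts else 0.0
```

Running `html_to_text` on each downloaded page and then the two `mean_...` functions on the result would give one pair of numbers per address, which is the kind of per-address measurement discussed here.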

The results suggest that mean word lengths have decreased slightly in these addresses over the past century — by 5% or so — while mean sentence lengths have been falling since the founding of the republic, and have undergone a cumulative drop of perhaps 50%.

(In the plots above, the red lines track the address-by-address measurements as my scripts calculated them, while the blue lines are smoothed approximations produced by locally-weighted scatterplot smoothing in R.)
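To give a sense of what locally-weighted scatterplot smoothing does, here is a stripped-down sketch in Python. It is an illustration under stated assumptions, not a reproduction of what R computes: the name `lowess_sketch` is made up, the smoother is evaluated only at the observed x values, and the robustness iterations that R's `lowess` performs by default are left out. The default span of 2/3 matches R's `lowess` default, but the post does not say which R function or settings were used.

```python
def tricube(u):
    """Tricube kernel: weight 1 at distance 0, falling to 0 at distance 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_sketch(xs, ys, frac=2.0 / 3.0):
    """Single-pass locally weighted linear regression (no robustness step).

    For each x0, fit a weighted straight line to the points near x0 and
    take the fitted value at x0. `frac` is the share of points that
    define the local neighbourhood.
    """
    n = len(xs)
    if n == 0:
        return []
    k = max(1, min(n, int(round(frac * n))))
    fitted = []
    for x0 in xs:
        dists = [abs(x - x0) for x in xs]
        # Bandwidth: distance from x0 to its k-th nearest point.
        h = sorted(dists)[k - 1]
        weights = [tricube(d / h) if h > 0 else float(d == 0) for d in dists]
        sw = sum(weights)  # always > 0: the point at x0 itself has weight 1
        mx = sum(w * x for w, x in zip(weights, xs)) / sw
        my = sum(w * y for w, y in zip(weights, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - mx) * (y - my)
                  for w, x, y in zip(weights, xs, ys))
        # Fall back to the weighted mean when the local fit is degenerate.
        slope = sxy / sxx if sxx > 1e-12 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

Here `xs` would be the years of the addresses and `ys` the per-address mean word or sentence lengths. Smaller values of `frac` follow the address-by-address wiggles more closely; larger ones give a smoother long-term trend.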

There are lots of obvious questions, if you care about things like this — for example, how much of the fall in mean sentence length is due to using less clausal embedding, and how much is due to splicing fewer sentences together paratactically, e.g. with semi-colons?

But whatever is going on, we can't blame (or praise) Twitter for it, since Twitter was founded in 2006, and thus could possibly have affected only the last datapoint in the Inaugural graphs, and the last five datapoints in the SOU graphs.

For a more anecdotal picture of the trend, here is the first paragraph (five sentences) of George Washington's 1789 Inaugural Address:

Among the vicissitudes incident to life no event could have filled me with greater anxieties than that of which the notification was transmitted by your order, and received on the 14th day of the present month. On the one hand, I was summoned by my country, whose voice I can never hear but with veneration and love, from a retreat which I had chosen with the fondest predilection, and, in my flattering hopes, with an immutable decision, as the asylum of my declining years — a retreat which was rendered every day more necessary as well as more dear to me by the addition of habit to inclination, and of frequent interruptions in my health to the gradual waste committed on it by time. On the other hand, the magnitude and difficulty of the trust to which the voice of my country called me, being sufficient to awaken in the wisest and most experienced of her citizens a distrustful scrutiny into his qualifications, could not but overwhelm with despondence one who (inheriting inferior endowments from nature and unpracticed in the duties of civil administration) ought to be peculiarly conscious of his own deficiencies. In this conflict of emotions all I dare aver is that it has been my faithful study to collect my duty from a just appreciation of every circumstance by which it might be affected. All I dare hope is that if, in executing this task, I have been too much swayed by a grateful remembrance of former instances, or by an affectionate sensibility to this transcendent proof of the confidence of my fellow-citizens, and have thence too little consulted my incapacity as well as disinclination for the weighty and untried cares before me, my error will be palliated by the motives which mislead me, and its consequences be judged by my country with some share of the partiality in which they originated.

And the first five sentences of Barack Obama's 2009 Inaugural Address:

My fellow citizens, I stand here today humbled by the task before us, grateful for the trust you have bestowed, mindful of the sacrifices borne by our ancestors. I thank President Bush for his service to our Nation, as well as the generosity and cooperation he has shown throughout this transition.

Forty-four Americans have now taken the Presidential oath. The words have been spoken during rising tides of prosperity and the still waters of peace. Yet every so often, the oath is taken amidst gathering clouds and raging storms.

In between, the first five sentences of Lincoln's 1861 Inaugural:

In compliance with a custom as old as the Government itself, I appear before you to address you briefly and to take in your presence the oath prescribed by the Constitution of the United States to be taken by the President "before he enters on the execution of this office."

I do not consider it necessary at present for me to discuss those matters of administration about which there is no special anxiety or excitement.

Apprehension seems to exist among the people of the Southern States that by the accession of a Republican Administration their property and their peace and personal security are to be endangered. There has never been any reasonable cause for such apprehension. Indeed, the most ample evidence to the contrary has all the while existed and been open to their inspection.

