File /Humanist.vol22.txt, message 148

Date:         Sun, 27 Jul 2008 08:10:03 +0100
From: Humanist Discussion Group <willard.mccarty-AT-MCCARTY.ORG.UK>
Subject: 22.146 toy or tool
To: humanist-AT-Princeton.EDU

               Humanist Discussion Group, Vol. 22, No. 146.
       Centre for Computing in the Humanities, King's College London
                     Submit to:

   [1]   From:    Humanist Discussion Group                          (147)
         Subject: Re: 22.142 toy or tool?

   [2]   From:    Humanist Discussion Group                          (154)
         Subject: Re: 22.142 toy or tool?

         Date: Sun, 27 Jul 2008 07:52:51 +0100
         From: Humanist Discussion Group <>
         Subject: Re: 22.142 toy or tool?
         In-Reply-To: <>

Homo ludens is always a good concept and book to refer to, and the
'toy/tool' cycle is an important part of discovery. Boys and their
toys. Fooling around with them.

As for Wordle, it is very cool, but there are limits to the cognitive
pay-off. I created a representation of Othello that used a lemmatized
version of the text (  I
was struck by the fact that the word 'Cassio' was much bigger than the
other names. Checking the data revealed that Cassio is in fact the
most often named character in the play. That's interesting. But the
size of other words is misleading. The play is not about 'thou',
'will', or 'shall', and verbs like 'come' or 'make' are unremarkable
in this play, while 'think' is not.

The problem is that the size of a word is a function of its raw count
in the document, with no reference to whether that count is high or
low relative to other texts. This works well enough with certain kinds
of documents, e.g. State of the Union addresses, where readers bring a
tacit framework of comparative reference to the words in front of them.

If you wanted to put the quite brilliant design and visualization work
of this application to serious scholarly use, playful or not, you
would really need more sophisticated inputs. Some years ago Paul
Rayson drew my attention to Dunning's log likelihood ratio as a very
effective tool for comparing texts and identifying words that are
disproportionately common or rare in text A compared with some text
B.  WordHoard, an application developed by Northwestern's Academic
Technologies, makes effective use of this statistic, which has the
great virtue of being easy to interpret.
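
For readers who want to see the statistic concretely, here is a minimal
sketch in Python of the two-term form of the log likelihood ratio that
is commonly used in corpus linguistics. The function names and argument
names are illustrative assumptions; this is not WordHoard's code.

```python
import math

def log_likelihood(a, b, c, d):
    """Two-term log likelihood ratio for one word.

    a: count of the word in text A     c: total words in text A
    b: count of the word in text B     d: total words in text B
    """
    total = c + d
    # Expected counts if the word were equally frequent in both texts.
    e1 = c * (a + b) / total
    e2 = d * (a + b) / total
    ll = 0.0
    # A zero count contributes nothing (the limit of x*log(x) is 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll

def is_overused(a, b, c, d):
    """True if the word is relatively more frequent in text A than in B."""
    return a / c > b / d
```

The score only says how surprising the difference is, not its
direction, so a comparison of relative frequencies (as in is_overused)
is needed to tell "disproportionately common" from "disproportionately
rare".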

But log likelihood ratios, and other statistics, are very tedious
things to read. If one could use a splendid visualization tool like
Wordle to foreground lexical phenomena against more robust and
variable backgrounds, that would be terrific.

         Date: Sun, 27 Jul 2008 07:54:08 +0100
         From: Humanist Discussion Group <>
         Subject: Re: 22.142 toy or tool?

Since sending the message above, my colleague Phil Burns in Academic
Technologies succeeded in feeding Wordle with a lemma count for Othello
that is based on Dunning's log likelihood ratio and basically compares
the frequencies in the play with the frequencies in a reference corpus.

He did two versions of this, one with names and the other without names.
You can see them at 


To my mind, these are very striking visualizations and show that Wordle
is a cool toy with a lot of tool potential.

