File /Humanist.vol22.txt, message 277

Date:         Wed, 22 Oct 2008 07:19:10 +0100
From: Humanist Discussion Group <willard.mccarty-AT-MCCARTY.ORG.UK>
Subject: 22.288 Coh-Metrix? LIWC? or, text-analysis in the news!
To: humanist-AT-Princeton.EDU

               Humanist Discussion Group, Vol. 22, No. 288.
       Centre for Computing in the Humanities, King's College London
                     Submit to:

         Date: Wed, 22 Oct 2008 07:14:23 +0100
         From: Willard McCarty <>
         Subject: Coh-Metrix? LIWC? or, text-analysis in the news

Two text-analysis tools from other parts of the academy were brought to
my attention just this morning by a student.

1. Coh-Metrix

Has anyone here experimented with this tool? It is described as follows:

> Coh-Metrix is a computational tool that produces indices of the
> linguistic and discourse representations of a text. These values can
> be used in many different ways to investigate the cohesion of the
> explicit text and the coherence of the mental representation of the
> text. Our definition of cohesion consists of characteristics of the
> explicit text that play some role in helping the reader mentally
> connect ideas in the text (Graesser, McNamara, & Louwerse, 2003). The
> definition of coherence is the subject of much debate. Theoretically,
> the coherence of a text is defined by the interaction between
> linguistic representations and knowledge representations. When we put
> the spotlight on the text, however, coherence can be defined as
> characteristics of the text (i.e., aspects of cohesion) that are
> likely to contribute to the coherence of the mental representation.
> Coh-Metrix provides indices of such cohesion characteristics.

The tool has recently been used to analyse (surprise, surprise) the
language of the candidates in the US Presidential election. It would be
particularly interesting to see it tried on more demanding text or with
more demanding questions.

2. Linguistic Inquiry and Word Count (LIWC)

LIWC seems at first glance to be methodologically much simpler. As far
as I can tell from a quick reading, it computes scores based on
occurrences of target words pre-defined to belong to different affective
categories, plus scores based on counts of sentence length and the like.
It depends centrally on a dictionary of almost 4,500 words and word
stems:

> The LIWC2007 Dictionary is the heart of the text analysis strategy.
> The default LIWC2007 Dictionary is composed of almost 4,500 words and
> word stems. Each word or word stem defines one or more word
> categories or subdictionaries. For example, the word cried is part of
> five word categories: sadness, negative emotion, overall affect,
> verb, and past tense verb. Hence, if it is found in the target text,
> each of these five subdictionary scale scores will be incremented. As
> in this example, many of the LIWC2007 categories are arranged
> hierarchically. All anger words, by definition, will be categorized
> as negative emotion and overall emotion words. Note too that word
> stems can be captured by the LIWC2007 system. For example, the
> LIWC2007 Dictionary includes the stem hungr* which allows for any
> target word that matches the first five letters to be counted as an
> ingestion word (including hungry, hungrier, hungriest). The asterisk,
> then, denotes the acceptance of all letters, hyphens, or numbers
> following its appearance.
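
To make the quoted description concrete, here is a minimal sketch in
Python of dictionary-based category counting with trailing-asterisk stem
matching. It is an illustration built only from the description above,
not LIWC's actual implementation. The names TOY_DICTIONARY,
categories_for and score_text are hypothetical, the two dictionary
entries are just the examples mentioned in the quotation, and the
tokenising rule is an assumption.

```python
import re
from collections import Counter

# Hypothetical miniature dictionary, NOT the real LIWC2007 dictionary.
# It holds only the two examples from the quoted description.
TOY_DICTIONARY = {
    "cried": ["sadness", "negative emotion", "overall affect",
              "verb", "past tense verb"],
    "hungr*": ["ingestion"],  # trailing asterisk marks a word stem
}

def categories_for(token, dictionary):
    """Return every category whose entry matches the token."""
    cats = []
    for entry, entry_cats in dictionary.items():
        if entry.endswith("*"):
            # Stem entry: any token beginning with the stem counts.
            if token.startswith(entry[:-1]):
                cats.extend(entry_cats)
        elif token == entry:
            cats.extend(entry_cats)
    return cats

def score_text(text, dictionary):
    """Count category hits; return the counts and the token total."""
    # Assumed tokenisation: runs of letters, digits, apostrophes, hyphens.
    tokens = re.findall(r"[a-z0-9'-]+", text.lower())
    counts = Counter()
    for token in tokens:
        for cat in categories_for(token, dictionary):
            counts[cat] += 1
    return counts, len(tokens)

counts, total = score_text(
    "She cried because she was hungry, hungrier than ever.",
    TOY_DICTIONARY,
)
for category, hits in sorted(counts.items()):
    print(f"{category}: {hits} of {total} words")
```

In this sketch a single occurrence of "cried" increments all five of its
categories, while "hungry" and "hungrier" each increment the ingestion
category through the stem. Reporting hits against the total word count is
one plausible way to turn raw counts into scale scores; whether LIWC
normalises in this way is not stated in the passage quoted.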

Not being up-to-date with research in this area (psycholinguistics?), I
don't know how this tool compares with the affective research via
text-analysis that has been going on for decades. Perhaps someone here
can say. How reliable is such research?


