File /Humanist.vol22.txt, message 118

Date: Sun, 13 Jul 2008 07:27:01 +0100
From: "Humanist Discussion Group (by way of Willard McCarty)" <willard-AT-LISTS.VILLAGE.VIRGINIA.EDU>
Subject: 22.116 understanding language
To: <humanist-AT-Princeton.EDU>

               Humanist Discussion Group, Vol. 22, No. 116.
       Centre for Computing in the Humanities, King's College London
                     Submit to:

   [1]   From:    Ms Mary Dee Harris <>             (22)
         Subject: Re: 22.114 understanding language (in the right way)?

   [2]   From:                                (44)
         Subject: 22.114 understanding language (in the right way)?

         Date: Sun, 13 Jul 2008 07:21:14 +0100
         From: Ms Mary Dee Harris <>
         Subject: Re: 22.114 understanding language (in the right way)?

I initially missed the date of the quotation when I read the post,
but when I saw Willard's comment asking about accomplishments "in the
last 46 years", I wasn't surprised.  While I believe the statement is
largely still true, that we don't understand the underlying
mechanisms of language, there is considerably more understanding of
the usage of language, due to statistical studies in the last decade.

I recently attended the ACL conference in Columbus, Ohio, and found
that most papers reported on the use of statistical analysis of
corpora, in one way or another.  With the availability of large-scale
corpora in a number of different languages, more and more work is
being presented to show how language is used.  While I was not able
to hear all the papers given the multiple simultaneous tracks, it was
apparent to me that computational linguists still have a ways to go
in finding the underlying mechanisms and understanding language as a
self-organizing system.

So I would say that we have taken Hillel's first step: "Any work in
this field must start from an analysis and understanding of language
use" but we haven't moved very far past that.

Mary Dee Harris
Chief Language Officer
Catalis, Inc.
Austin, TX

         Date: Sun, 13 Jul 2008 07:24:33 +0100
         Subject: 22.114 understanding language (in the right way)?

Ah... Pivotal moments in the history of computational linguistics. The thing to
remember is that the entire field of computational linguistics came into
existence as a result of this conclusion. The field didn't exist until the
efforts to use computers to do translation failed so dramatically back in the
1960s. In the USA the result was the now-infamous ALPAC report, which
effectively said that machine translation was an area of research--not
implementation. It cut the funding by non-research government agencies for
building machine translation systems and started the research field of
computational linguistics.

Yes, we're closer to modeling language as a self-organizing system...which is
pretty much what using neural networks to model language phenomena has done,
but neural networks aren't responsible for most of the advances in
computational linguistics over these last 46 years. Those came from hard
reasoning about language and efforts to construct and test programs using those
reasoning principles--and from the collection of data about words and language
use--and from exponential improvements in computer hardware.

The advances have been significant. Grammar was given a formal basis by Chomsky,
which broke its exclusive ownership by linguists and offered mathematicians and
those new people who worked with computers the ability to experiment with it.
The new fields of speech synthesis and speech recognition (the latter now an
everyday experience over telephones and through interactions with electronic
gadgetry in cars and computers) came into existence and became commercial
successes. Machine
translation led to the creation of terminology banks that substituted data for
expertise in performing translations. Parsing algorithms pushed the development
of computational mechanisms for using grammar and lexicon to fuel the analysis
(parsing) and reconstruction (generation) of language. Progress was continually
pushed onward by the relentless pressure from computer hardware developments
that made each order-of-magnitude increase in computer speed seem like it ought
to solve the problems of the last generation. Text corpora were created for
millions, billions, and now trillions of words. So much text was captured that
statistical techniques for extracting lexicons and understanding language
phenomena became possible, if not essential, as hand-analysis of all the data
became impractical.

We still don't have the theoretical understanding of language expected to be
possible back in the 1960s, but it is hard to be that critical of the
accomplishments. Not knowing how hard the problem was can only make those early
criticisms about the lack of progress sound like complaints born of hubris. In
the end, I tend to think tinkering has been pretty successful and that
theoretical developments come as much from seeing the imperfect constructs that
others have cobbled together as they do from theoretical preconception.
Sometimes you just have to throw a rock in the water to see what happens rather
than think through what theoretically might happen.

