File /Humanist.vol22.txt, message 220

Date:         Sun, 28 Sep 2008 08:16:54 +0100
From: Humanist Discussion Group <willard.mccarty-AT-MCCARTY.ORG.UK>
Subject: 22.228 online database
To: humanist-AT-Princeton.EDU

               Humanist Discussion Group, Vol. 22, No. 228.
       Centre for Computing in the Humanities, King's College London
                     Submit to:

         Date: Sun, 28 Sep 2008 08:10:47 +0100
         From: Humanist Discussion Group <>
         Subject: RE: "online database" for extinct language <Liza M.

Hello all, this is my first post to the list.

Ms Noetzel requested advice regarding an online database: "[The] online
database I envision is a mixture of Google Book Search and the Corpus del
Español. That is, I'd like for the site to offer a visual facsimile (with
the option of viewing the text in an expanded, plain-text version) as well
as to have word search (exact word or wildcard) options. ... compatible
with UNIX".

First, UNIX is the grandmother of all operating systems. The technical
staff who operate it (install, configure and maintain it, and possibly
support the applications running on it) might want to know what you are
doing and how much support you are going to demand from them. They will
likely assume the system already has a viable database (probably MySQL or
PostgreSQL). To them the database is the engine that does all the work;
to you it is the data and the interaction you have with it. Many natural
language processing tools, as well as formal concept analysis tools and
the like, have been developed on the Unix platform.
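
To illustrate the split between the engine and the data, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL or PostgreSQL (an assumption: a real site would use whichever server the technical staff provide). The table layout, file names and sample text are hypothetical. The engine stores and searches; the rows pairing each facsimile image with its transcription are your data.

```python
import sqlite3

# SQLite ships with Python and stands in here for a server engine such as
# MySQL or PostgreSQL. The table layout is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages ("
    "id INTEGER PRIMARY KEY, "
    "image_file TEXT NOT NULL, "     # facsimile image of the page
    "transcription TEXT NOT NULL)"   # plain-text version of the same page
)

# Hypothetical sample rows pairing each facsimile with its transcription.
conn.executemany(
    "INSERT INTO pages (image_file, transcription) VALUES (?, ?)",
    [
        ("page001.png", "In the beginning was the word"),
        ("page002.png", "and the word was with the scribe"),
        ("page003.png", "All things were copied by hand"),
    ],
)

def find_pages(fragment):
    """Return image files whose transcription contains `fragment`.

    LIKE matching is case-insensitive for ASCII letters in SQLite, and
    '%' (any run) and '_' (one character) inside `fragment` act as wildcards.
    """
    rows = conn.execute(
        "SELECT image_file FROM pages WHERE transcription LIKE ? ORDER BY id",
        ("%" + fragment + "%",),
    )
    return [row[0] for row in rows]

print(find_pages("word"))   # ['page001.png', 'page002.png']
```

Note that LIKE matches substrings, so a search for "word" would also find "words" or "sword"; whole-word matching needs either tokenized storage or a full-text search feature of the chosen engine.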

You might want to start with the "Survey of the State of the Art in
Human Language Technology", which you can read online or download as a PDF from

and then peruse Language Technology World (
"... an ontology-based virtual information center on the wide spectrum
of technologies for dealing with human languages. It is a free service
provided to the R&D community, potential users of language
technologies, students and other interested parties by the German
Research Center for Artificial Intelligence (DFKI)."

It also points to the Natural Language Software Registry, which lists
technologies in the following sections (number of entries in parentheses):

Annotation Tools (35)
Evaluation Tools (9)
Language Resources (74)
Multimedia (14)
Multimodality (20)
NLP Development Aid (88)
Spoken Language (47)
Written Language (227)

The tools are not necessarily specific to ethnology, but many would be
useful in that context. The "database" that any particular technology
uses could simply be a collection of files on your storage media, or a
content management system (CMS) with a heavy-duty server behind it
that handles the data through Structured Query Language (SQL).
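
For the "collection of files" case, a word index needs no server at all. The following minimal Python sketch builds an in-memory index over hypothetical plain-text page files and supports the exact-word and wildcard search Ms Noetzel asked for, using the standard library's `fnmatch` (`*` for any run of characters, `?` for exactly one). The file names, sample text and function names are illustrative assumptions.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical plain-text transcriptions, keyed by file name. In practice they
# might be read from disk, e.g. {p.name: p.read_text() for p in Path("texts").glob("*.txt")}.
PAGES = {
    "page001.txt": "The quick brown fox jumps over the lazy dog.",
    "page002.txt": "The fox sleeps; the dog watches.",
}

def build_index(pages):
    """Map each lowercased word to a sorted list of the files containing it."""
    index = {}
    for name, text in pages.items():
        for word in set(re.findall(r"\w+", text.lower())):
            index.setdefault(word, set()).add(name)
    return {word: sorted(names) for word, names in index.items()}

def search(index, query):
    """Exact word search, or wildcard search with '*' (any run) and '?' (one char)."""
    pattern = query.lower()
    hits = set()
    for word, names in index.items():
        if fnmatchcase(word, pattern):   # a query without wildcards matches exactly
            hits.update(names)
    return sorted(hits)

INDEX = build_index(PAGES)
print(search(INDEX, "d?g"))   # ['page001.txt', 'page002.txt']
```

Scanning every indexed word for each wildcard query is adequate for a small corpus; a large one would probably call for a dedicated full-text search engine.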

You might also look at "NLTK, the Natural Language Toolkit, is a
suite of open source Python modules, data and documentation for
research and development in natural language processing. NLTK contains
code supporting dozens of NLP tasks, along with 40 popular corpora and
extensive documentation including a 375-page online book.
Distributions for Windows, Mac OS X and Linux are available" from
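
One task such toolkits cover is the keyword-in-context (KWIC) display that many corpus search sites offer; NLTK provides it through the `concordance` method of `nltk.text.Text`. As a dependency-free sketch of the same idea (not NLTK's implementation), assuming naive regex tokenization and a hypothetical function name `kwic`:

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines: up to `width` words either side of each hit."""
    words = re.findall(r"\w+", text)          # naive tokenization: runs of word characters
    target = keyword.lower()
    lines = []
    for i, word in enumerate(words):
        if word.lower() == target:            # case-insensitive exact match
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines

for line in kwic("the cat sat on the mat and the cat slept", "cat", width=2):
    print(line)
```

Because Python's `\w` matches Unicode letters, accented words are kept whole, although punctuation is discarded.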

Neil Kelly <>
Aesch, 4147 Switzerland

home:	+41 (0)61 681 17 77
mobile:	+41 (0)79 227 40 78
