File /Humanist.vol22.txt, message 715


From: Humanist Discussion Group <willard.mccarty-AT-mccarty.org.uk>
To: humanist-AT-lists.digitalhumanities.org
Date: Sun,  3 May 2009 06:20:08 +0000 (GMT)
Subject: [Humanist]  22.731 statistics for humanists


                 Humanist Discussion Group, Vol. 22, No. 731.
         Centre for Computing in the Humanities, King's College London
                       www.digitalhumanities.org/humanist
                Submit to: humanist-AT-lists.digitalhumanities.org



        Date: Sat, 2 May 2009 06:44:56 -0700
        From: Nathaniel Bobbitt <flautabaja-AT-hotmail.com>
        Subject: RE: [Humanist] 22.728 controlled vocabularies? statistics for humanists?
        In-Reply-To: <20090502060456.62A643F6D-AT-woodward.joyent.us>


The issue of statistics and text runs across several areas: structure (systemic), multi-dimensionality, combinatorial expressions, and patterns/variation.
Corpus linguistics looks at how language is used and at patterns of use. A corpus typically contains millions of words and covers multiple registers: academic, conversational, dialect, newspaper, internet, radio, etc. Corpus linguists have their own statistical practices.
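
As a rough illustration of the kind of statistic involved, here is a minimal Python sketch that compares how often a word occurs in two registers, normalized per 1,000 words so that samples of different sizes can be compared. The function names, the tokenizer, and the two tiny sample texts are assumptions made up for illustration; a real study would use a corpus of millions of words and a proper tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rate_per_thousand(text, word):
    """Return occurrences of `word` per 1,000 tokens of `text`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 1000 * Counter(tokens)[word] / len(tokens)


# Hypothetical miniature "registers", invented for this example only.
samples = {
    "academic": "The results indicate that the model is robust. "
                "The analysis of the data supports this.",
    "conversational": "Well, you know, I think it's fine. "
                      "You said you'd call, didn't you?",
}

for register, text in samples.items():
    print(register, round(rate_per_thousand(text, "the"), 1))
```

Normalizing by text length is what makes counts from registers of different sizes comparable.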

Pioneers (linguists) in these pursuits include:
Douglas Biber, Patrick Hanks, James Pustejovsky, and Christian Matthiessen.

Currently, I am developing a new way to encode, decode, and recode the following through features based on packing, fill-in, and an optical system based on two states: 1) features 2) transition (presence, absence/evacuation, and effacement). This work thinks of the mobility of patterns in terms of the movement of checker pieces. The analogy between the movement of checkers and text (language use) grows out of Halliday and Hasan's Cohesion in English.

For methodologies on statistics and text analysis, see:

Corpus Linguistics

Introductory Materials:

Biber, D.  1988.  Variation across speech and writing.  Cambridge: Cambridge University Press.

Biber, D., S. Conrad, and R. Reppen.  1998.  Corpus linguistics:  Investigating language structure and use.  Cambridge: Cambridge University Press.

Conrad, S., and D. Biber (eds.).  2001.  Variation in English: Multi-Dimensional studies.  London:  Longman.

Biber: see papers related to Multi-Dimensional Analysis at http://jan.ucc.nau.edu/~biber/journal.htm

Biber, D.  2004.  Conversation text types:  A multi-dimensional analysis.  In Gérald Purnelle, Cédrick Fairon, and Anne Dister (eds.), Le poids des mots:  Proceedings of the 7th International Conference on the Statistical Analysis of Textual Data, 15-34.  Louvain:  Presses universitaires de Louvain.
 
Biber, D.  2003.  Variation among university spoken and written registers:  A new multi-dimensional analysis.  In Charles Meyer and Pepi Leistyna (eds.), Corpus analysis: Language structure and language use, 47-70.  Amsterdam: Rodopi.

Generative/Combinatorial Methods

Patrick Hanks: http://nlp.fi.muni.cz/projekty/cpa/

Hanks, Patrick, and James Pustejovsky. 2005. "A Pattern Dictionary for Natural Language Processing". Revue française de linguistique appliquée, 10:2.

Hanks, Patrick. 2008. "Mapping meaning onto use: a Pattern Dictionary of English Verbs". AACL 2008, Utah. (slides)

Pustejovsky, James. 1995. The Generative Lexicon. MIT Press.

Matthiessen, Christian M.I.M. 1995. Lexicogrammatical Cartography: English Systems. xviii + 978 pp. Tokyo, Taipei & Dallas: International Language Sciences Publishers.

One obvious application of corpus linguistics is World English in poetics: http://www.world-english.org/listening.htm

http://www.cs.brandeis.edu/~jamesp/classes/prague/index.html
http://www.ling.mq.edu.au/about/staff/matthiessen_christian/publications.html

Lexicogrammatical Cartography:  English Systems

http://scholar.google.com/scholar?hl=en&client=safari&rls=en&q=lexico%20grammatical%20cartography%20&um=1&ie=UTF-8&sa=N&tab=ws

Simple software you can explore:

http://www.collins.co.uk/Corpus/CorpusSearch.aspx

Note that it draws on American/British English, conversational English, and radio transcripts. Type a word or a phrase and you will see samples from the corpus that show common uses.
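
The display such a search returns is usually called a keyword-in-context (KWIC) concordance: each hit is shown with a few words on either side. The following Python sketch shows the idea under simple assumptions. The `kwic` function, its `width` parameter, and the sample sentence are invented for illustration and say nothing about how the Collins site is actually implemented.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, hit, right context) for each match of `keyword`.

    Matching is case-insensitive and works on whole word tokens;
    `width` is the number of tokens kept on each side of the hit.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Hypothetical sample text, standing in for a real corpus.
sample = "The cat sat on the mat while the dog slept by the door"

for left, hit, right in kwic(sample, "the", width=2):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>15}  [{hit}]  {right}")
```

Reading down the aligned keyword column is how recurring patterns of use become visible.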

Surf and explore an actual corpus:

http://www.americancorpus.org/

There is a link to a five-minute tour; look for the following at the bottom of the text in the right-hand frame: "Please feel free to take a five minute guided tour, which will show the major features of the corpus.  A simple click for each query will automatically fill in the form for you, search through the 385 million words of text, and then display the results."

The following will show you how to use corpus linguistics to develop teaching materials. Here is a general (non-technical) introductory book: http://www.amazon.com/Corpus-Classroom-Language-Teaching-Linguistics/dp/0521616867/ref=pd_sim_b_4

Nat Bobbitt
Portland, OR


_______________________________________________
List posts to: humanist-AT-lists.digitalhumanities.org
List info and archives at: http://digitalhumanities.org/humanist
Listmember interface at: http://digitalhumanities.org/humanist/Restricted/listmember_interface.php
Subscribe at: http://www.digitalhumanities.org/humanist/membership_form.php


   
