File /Humanist.vol22.txt, message 525


From: Humanist Discussion Group <willard.mccarty-AT-mccarty.org.uk>
To: humanist-AT-lists.digitalhumanities.org
Date: Tue, 17 Feb 2009 06:44:47 +0000 (GMT)
Subject: [Humanist] 22.540 a control corpus of 19th century American


                 Humanist Discussion Group, Vol. 22, No. 540.
         Centre for Computing in the Humanities, King's College London
                       www.digitalhumanities.org/humanist
                Submit to: humanist-AT-lists.digitalhumanities.org



        From: Humanist Discussion Group <willard.mccarty-AT-mccarty.org.uk>


                 Humanist Discussion Group, Vol. 22, No. 539.
         Centre for Computing in the Humanities, King's College London
                       www.digitalhumanities.org/humanist
                Submit to: humanist-AT-lists.digitalhumanities.org

  [1]   From:    Martin Mueller <martinmueller-AT-northwestern.edu>           (61)
        Subject: Re: [Humanist] 22.535 a control corpus of 19th century
                American intellectual writing?

  [2]   From:    Mark Davies <Mark_Davies-AT-byu.edu>                         (23)
        Subject: RE: [Humanist] 22.535 a control corpus of 19th century
                American intellectual writing?


--[1]------------------------------------------------------------------------
        Date: Mon, 16 Feb 2009 07:44:35 -0600
        From: Martin Mueller <martinmueller-AT-northwestern.edu>
        Subject: Re: [Humanist] 22.535 a control corpus of 19th century American intellectual writing?
        In-Reply-To: <20090216120717.40E3D2DB4B-AT-woodward.joyent.us>

John,

This is not a direct answer to your query but some of it may be  
relevant anyhow.

In the Monk Project we have developed procedures for taking texts from  
various TEI archives and moving them into a common TEI P5 environment  
that makes them more interoperable and supports easy linguistic  
annotation.  We have used MorphAdorner, developed by Phil Burns at  
Northwestern University, to create texts that are lemmatized, have  
virtual orthographic standardization, and part-of-speech tagging.  
Brian Pytlik Zillig and Steve Ramsay at Nebraska have been responsible  
for the architecture and details of this process.

Some 300 texts from the Wright Archive of American fiction are  
available in a linguistically annotated format right now. There are another 700  
Wright novels that can be done that way. Another 2,000 didn't go  
through the last editorial review when Perry Willett did the project,  
but if I understand him correctly, a needed text can be brought up to  
snuff within a couple of hours.

We also have ~100 earlier American texts from the public archive of  
the University of Virginia Early American fiction project.

The very large and diverse archive of 'Documenting the American South'  
at the University of North Carolina has practiced sparse but  
consistent annotation over the years.  DocSouth texts would yield to  
linguistic annotation with little or no modification.

Texts in these TEI archives all have much better bibliographical  
information than what is typically found in Project Gutenberg texts.

MM
On Feb 16, 2009, at 6:07 AM, Humanist Discussion Group wrote:

>                Humanist Discussion Group, Vol. 22, No. 535.
>        Centre for Computing in the Humanities, King's College London
>                      www.digitalhumanities.org/humanist
>               Submit to: humanist-AT-lists.digitalhumanities.org
>
>
>
>       Date: Mon, 16 Feb 2009 10:57:55 +0000
>       From: "Bradley, John" <john.bradley-AT-kcl.ac.uk>
>       In-Reply-To: <20090216064439.639862E171-AT-woodward.joyent.us>
>
> I am supervising a student at CCH/KCL who is working with the  
> writings of several American scholars from the 19th century.  At  
> this point his work would benefit from having a control corpus of  
> 19th century American intellectual writing that he could use for  
> various kinds of statistical comparison.  He would welcome both  
> literary-oriented and non-literary (scientific?) texts, and even  
> material that although written for an intellectual audience appeared  
> in the non-scholarly press.
>
> He is checking out what is available in Project Gutenberg already.  
> Is there a member of Humanist who could suggest other possible  
> digital textual sources?
>
> Many thanks for your suggestions.
>
> ... john bradley
>


--[2]------------------------------------------------------------------------
        Date: Mon, 16 Feb 2009 10:49:53 -0700
        From: Mark Davies <Mark_Davies-AT-byu.edu>
        Subject: RE: [Humanist] 22.535 a control corpus of 19th century American intellectual writing?
        In-Reply-To: <20090216120717.40E3D2DB4B-AT-woodward.joyent.us>


You might try the various "Making of America" collections:

http://moa.umdl.umich.edu/
http://cdl.library.cornell.edu/moa/

as well as some of the other collections from:

http://memory.loc.gov/ammem/index.html

Also, I know it's later than the 1800s, but you might look at the 100 million word TIME Corpus (1920s-present): http://corpus.byu.edu/time/ .

Finally, I'm working on a 300 million word corpus of historical American English (early 1800s-present time), which will be balanced between fiction, non-fiction, newspapers, and popular magazines. It will complement the nearly 400 million word Corpus of Contemporary American English:

http://www.americancorpus.org

But this historical corpus is dependent on funding, and isn't available yet.

Best,

Mark Davies

===========================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
===========================================




_______________________________________________
List posts to: humanist-AT-lists.digitalhumanities.org
List info and archives at: http://digitalhumanities.org/humanist
Listmember interface at: http://digitalhumanities.org/humanist/Restricted/listmember_interface.php
Subscribe at: http://www.digitalhumanities.org/humanist/membership_form.php




   
