File /Humanist.vol22.txt, message 642


From: Humanist Discussion Group <willard.mccarty-AT-mccarty.org.uk>
To: humanist-AT-lists.digitalhumanities.org
Date: Sun, 29 Mar 2009 07:45:21 +0000 (GMT)
Subject: [Humanist]  22.655 a billion e-books



                 Humanist Discussion Group, Vol. 22, No. 655.
         Centre for Computing in the Humanities, King's College London
                       www.digitalhumanities.org/humanist
                Submit to: humanist-AT-lists.digitalhumanities.org

  [1]   From:    "dennis c.l." <cyberdennis-AT-gmail.com>                     (20)
        Subject: Re: [Humanist] 22.654 a BILLION e-books???

  [2]   From:    "Jan Rybicki" <jkrybicki-AT-gmail.com>                       (36)
        Subject: RE: [Humanist]  22.654 a BILLION e-books???

  [3]   From:    "Michael S. Hart" <hart-AT-pglaf.org>                        (24)
        Subject: Re: [Humanist]  22.654 a BILLION e-books???

  [4]   From:    "Mandell, Laura C. Dr." <mandellc-AT-muohio.edu>              (9)
        Subject: Re: [Humanist] 22.652 what good are a billion ebooks? Resp.,
                No. 654

  [5]   From:    "Gerry Coulter" <gcoulter-AT-ubishops.ca>                   (117)
        Subject: A Billion books

  [6]   From:    "Michael S. Hart" <hart-AT-pglaf.org>                       (197)
        Subject: Re: [Humanist]  22.654 a BILLION e-books???


--[1]------------------------------------------------------------------------
        Date: Sat, 28 Mar 2009 05:13:28 -0300
        From: "dennis c.l." <cyberdennis-AT-gmail.com>
        Subject: Re: [Humanist] 22.654 a BILLION e-books???
        In-Reply-To: <20090328074243.2BD7430ADD-AT-woodward.joyent.us>


Jim R said in 22.654:

  "What would you learn better from: a deep understanding of a single very
  good book or a surface understanding of 100 lesser books?"

to which I would answer, making them my own, with St Thomas Aquinas's
"*hominem unius libri timeo*" ("I fear the man of one book") and Alexander
Pope's "A little learning is a dangerous thing."
dennis cintra leite


--[2]------------------------------------------------------------------------
        Date: Sat, 28 Mar 2009 10:54:43 +0100
        From: "Jan Rybicki" <jkrybicki-AT-gmail.com>
        Subject: RE: [Humanist]  22.654 a BILLION e-books???
        In-Reply-To: <20090328074243.2BD7430ADD-AT-woodward.joyent.us>

The billion e-books, even if we limit them to literature, whatever that is,
do nothing to solve our main dilemma, so nicely calculated by Etiemble
("Should we redefine the concept of Weltliteratur?") in the pre-e-book years
(1966):

Take a life of fifty years without a single day of illness or repose - a
total of 18262 days. Meticulously subtracting the hours of sleep, of meals,
of everyday duties and leisure, of professional work, try to estimate how
much time is left for you to read literature's masterpieces, only to achieve
little more than a glimpse of what literature is. I shall be generous - I
shall give you the privilege of reading a beautiful book a day - day after
day - from among those available to you in your mother tongue, in languages
you can read, and in translation. Now, you know that you will never be able
to get through The Magic Mountain or The Arabian Nights in a single day; at
the same time, I am aware that, given some luck and zeal, you just might be
able to cherish Hojoki, Romancero gitano, Menexenus and Benjamin Constant's
De l'esprit de conquête all in one day, so that you can save the couple of
days you might need for Sholokhov's And Quiet Flows the Don. When compared to
the overall number of all very beautiful books, what are those 18262 titles?
A pittance. "Culture is striving towards the unattainable."
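Etiemble's figures can be checked with a little arithmetic. The sketch below is an illustration added to this archive copy, not part of the original post: it assumes an average year of 365.25 days (which reproduces his 18262) and borrows the estimate of 65-75 million printed books that Matthew Kirschenbaum gives later in this digest. The function names are hypothetical.

```python
# Illustrative check of Etiemble's "one beautiful book a day" arithmetic.
# Assumption: an average year of 365.25 days, leap years included.
DAYS_PER_YEAR = 365.25


def lifetime_books(years: int, books_per_day: float = 1.0) -> int:
    """Whole number of books read at a fixed daily rate over `years` years."""
    return int(years * DAYS_PER_YEAR * books_per_day)


def share_of_corpus(books_read: int, corpus_size: int) -> float:
    """Fraction (0 to 1) of a corpus covered by `books_read` titles."""
    return books_read / corpus_size


if __name__ == "__main__":
    read = lifetime_books(50)
    print(read)  # 18262, Etiemble's figure
    # Assumption: 65-75 million printed books in total (Kirschenbaum's estimate).
    for corpus in (65_000_000, 75_000_000):
        print(f"{share_of_corpus(read, corpus):.4%} of {corpus:,} books")
```

At the same rate, even the 70 years of reading Michael Hart mentions below comes to `lifetime_books(70)`, or 25567 titles, still well under a tenth of a percent of either total.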

Joris's "bots" might help, of course, but just a little; even they would
produce results in the billions of items. Even if the long-heralded demise of
the novel finally happened, the existing output is much too much to
assimilate in a single lifetime (I'm not sure I got my conditional right).
Not even the canon, whatever it is, if it exists, is assimilable; not even
its travesty in the omnipresent *notes. The only real use for the billion
e-books is for the final hour, when the last person on this Earth rushes,
server in hand, to the last spaceship leaving for a terraformed Venus
minutes before the comet hits our planet, because of course the billion
non-e-books are too much for any spaceship fleet - I have no idea how much
they would weigh on paper, but Roberto Busa's calculations for his Aquinas
project give a rough and terrifying estimate.

But of course we text-analysis people are happy with even a mere million
e-books. It is more than we need for a full lifetime of Delta-ing and
PCA-ing and even JGAAP-ing. And the fact that it is usually other people's
grants, not our own, that go into this is a strangely comforting thought.

Jan Rybicki



--[3]------------------------------------------------------------------------
        Date: Sat, 28 Mar 2009 03:17:29 -0700 (PDT)
        From: "Michael S. Hart" <hart-AT-pglaf.org>
        Subject: Re: [Humanist]  22.654 a BILLION e-books???
        In-Reply-To: <20090328074243.2BD7430ADD-AT-woodward.joyent.us>


"What would you learn better from: a deep understanding of a single 
very good book or a surface understanding of 100 lesser books?"
James Rovira

Speaking as an interdisciplinary generalist, I think it is obvious
that what you might call a superficial understanding of "100 lesser
books" would lead to a greater world understanding than your "deep
understanding of a single very good book."

I am far more suspicious of a point of view generated from a single
source, no matter how deep, than of a point of view generated by an
exposure to the "surface understanding of 100 lesser" sources.

For the world thinker, it is the interplay between ideas that makes
the whole academic process worthwhile, not just one single book, or
one single point of view, or one single anything.

From this perspective it is obvious that gaining just the highlight
ideas from "100 lesser books" would generate more thought processing
between them than the "deep understanding of a single...good book."

It takes an interdisciplinary approach to invent new ideas/ideals--
to get outside one subject's boundaries to combine thoughts with an
alternative subject--this is where human growth originates.

Thanks for your comments!!!

Michael S. Hart
Founder
Project Gutenberg
Inventor of eBooks



--[4]------------------------------------------------------------------------
        Date: Sat, 28 Mar 2009 11:17:22 -0400
        From: "Mandell, Laura C. Dr." <mandellc-AT-muohio.edu>
        Subject: Re: [Humanist] 22.652 what good are a billion ebooks? Resp., No. 654
        In-Reply-To: <20090328074243.2BD7430ADD-AT-woodward.joyent.us>


Dear Humanist:

Matt usefully constrains the number of books for us, but there is another consideration.  Texts published before the advent of modern typography cannot easily be digitized EXCEPT as image files with dirty OCR running behind them.  Books published before 1820 in Google Books, run through the Tesseract OCR engine, come back with a 70% rate of correctness.  That may not mean much unless I give you an example: Google Books doesn't "think" that the word "curiosity" is used even once in all 1500 pages of the 1784 edition of Samuel Richardson's novel Clarissa (I write this while staring at one page of that text containing three instances).  And so eighteenth-century digital archives have to compensate for this problem.  The Old Bailey Online project, for example, collects images of texts and then double-keys all the texts it collects up to 1834.

No one can read 75 million objects, but we CAN, as Greg Crane has famously said, run those texts through data-mining tools that do an amazing amount of "thinking" work.  If all the texts are clean plain text, we can be in charge of how our machines read them: not only aware of the algorithms selecting for us the texts to which we wish to pay close attention, but also able to change and modify those algorithms.

Sure, digitizing 75 million objects may not be that difficult, but digitizing even 400,000 of them, if that means double-keying them as well, is a huge task.  The Text Creation Partnership has managed to key 2,000 of the 400,000 eighteenth-century texts for which we have only page images, and they are now out of money.

In my view, any texts for which there are only page images are effectively lost, at sea in a data deluge through which no human mind can sort without the help of human-made machines.  Michael Hart says that if we have access to ebooks, "errors in quotations [are] reduced to infinitesimal levels."  But if you are looking at a PDF file of page images that cannot be copied, then you have to toggle from screen to screen to type your quotation as you look at it.  Only if that image is accompanied by a text file, whether produced by OCR or keyed, can one actually cut and paste from it into one's own work.  That said, even with page images one can reduce quoting errors, at least in the minds of one's readers, by publishing an article online with a direct link to the quoted page image, whether keyed or not.

But the errors made when texts are digital objects, accessible through some kind of database, go deeper than that.  I am currently attending an 18th-century literature conference, and teachers are abuzz with the problem that students find pages of eighteenth-century texts through keyword search and then quote excerpts from them without reading the texts - this gets at James Rovira's point.

Finally, I just want to add that "book" is not a useful term for thinking about the textual objects that literary critics and historians wish to digitize, and given that fact, there are indeed billions and billions, like Carl Sagan's stars.

Laura Mandell



--[5]------------------------------------------------------------------------
        Date: Sat, 28 Mar 2009 15:00:52 -0400
        From: "Gerry Coulter" <gcoulter-AT-ubishops.ca>
        Subject: A Billion books
        In-Reply-To: <20090328074243.2BD7430ADD-AT-woodward.joyent.us>

Too much is too much.

Do the bots get co-authorship?


--[6]------------------------------------------------------------------------
        Date: Sat, 28 Mar 2009 12:43:15 -0700 (PDT)
        From: "Michael S. Hart" <hart-AT-pglaf.org>
        Subject: Re: [Humanist]  22.654 a BILLION e-books???
        In-Reply-To: <20090328074243.2BD7430ADD-AT-woodward.joyent.us>


Brief replies to the following below:

>  [1]   From:    Joris van Zundert <joris.van.zundert-AT-gmail.com>          (186)
>        >
>  [2]   From:    Matthew Kirschenbaum <mkirschenbaum-AT-gmail.com>           (177)
>        Subject: Re: [Humanist] 22.652 what good are a billion ebooks?
>
>  [3]   From:    James Rovira <jamesrovira-AT-gmail.com>                      (34)
>        Subject: Re: [Humanist] 22.652 what good are a billion ebooks?
>
>
> --[1]------------------------------------------------------------------------
>        Date: Fri, 27 Mar 2009 09:22:28 +0100
>        From: Joris van Zundert <joris.van.zundert-AT-gmail.com>
>        Subject: Re: [Humanist] 22.652 what good are a billion ebooks?
>        In-Reply-To: <20090327060130.15EE02FC9A-AT-woodward.joyent.us>
>
>
> Thanks for a splendid vision. For what it's worth: I like it, even believe
> it up to a point. I'm curious to see whether it will earn you howls and
> outcries. Might also be a kind of eerie deafening silence...

Sometimes the deafening silence is the silence of censorship....

> Can I add that a billion electronic books will get us to the point where
> it becomes feasible to put out semi-intelligent bots? Agents operating on
> the basis of a formalized learning algorithm. At first these will be very
> stupid indeed (just Google-like, morphology-based spiders). But we will evolve them
> over time into rather powerful means for scanning, preselecting, relating
> and weighting the information 'out there' that's relevant to the problems
> we're studying.

Obviously, the more data there is to use, the more likely we are to get a
good solid collection of "hits" atop the list when searching it.

> This, I think, will add to your 10% base interest. But only
> if we have the data, vast amounts of it.

The "only" word there bothers me.

If we already have 10 million eBooks online, or more, we have
more than enough to increase the kind of reading that invests
in the readers' futures by 10% or much more.

I would find your statement more accurate if you used words a
little less constrictive/restrictive than "only."

To me, and maybe this is NOT what you meant [apologies if so], a
word such as "only" translates into "necessary," and while I'm
sure there will be 10 million MORE eBooks by 2020, all things
being equal, I don't see them as "necessary" to improvements,
at least on the order of 10%.

>
> Kind regards,
> Joris van Zundert

Many thanks for your comments,

Michael

> --[2]
>        Date: Fri, 27 Mar 2009 07:37:12 -0400
>        From: Matthew Kirschenbaum <mkirschenbaum-AT-gmail.com>
>        Subject: Re: [Humanist] 22.652 what good are a billion ebooks?
>        In-Reply-To: <20090327060130.15EE02FC9A-AT-woodward.joyent.us>
>
> The estimates I've seen put the total number of books published and
> printed in the history of the world at around 65-75 million. That's a
> lot of books, but also a lot of books shy of a billion.
>
> I mention it because 65-75 million objects to digitize just isn't all
> that many. It seems eminently within reach of current patterns and
> trends (leaving aside other matters, such as IP and access). Matt

I sent off a detailed reply to this to the list, which has not 
appeared yet in my mailbox.  If it doesn't in a couple days, I
will send directly to the "interested" parties.

And yes, I agree that digitizing books will move from a paltry
thousands per day to tens of thousands, perhaps before 2020.

However, the next step will be translation, so that whole worlds
of people speaking/reading other languages can participate.

Thanks!!!

Michael

///

My apologies, this reply is not as brief as the previous ones,
as the topics were too important.

mh

> --[3]
>        Date: Fri, 27 Mar 2009 09:47:28 -0400
>        From: James Rovira <jamesrovira-AT-gmail.com>
>        Subject: Re: [Humanist] 22.652 what good are a billion ebooks?
>        In-Reply-To: <20090327060130.15EE02FC9A-AT-woodward.joyent.us>
>
> This is beyond silly, Michael.

Of course I am silly, that's how I managed to start up the Project
Gutenberg effort while still a freshman in college, and that's how
I managed to graduate a 4 year program in 2 years with all A's.

If you aren't willing to reach for a life outside the box you have
very little chance of accomplishments outside the box.

Only a person willing to TAKE a million to one shot ever wins it.

> We don't learn from books for 70 years because we don't read for 
> seventy years; meaningful reading perhaps doesn't begin until our 
> teenage years and could conceivably end after retirement.

Please, speak for yourself.

I won my first reading contest when I was in early grade school
by reading 27 books in the first week or two of summer vacation.

I owe a lot to that library I grew up near.

And, by the way, I still remember what those books were about.

As for retirement, well, I don't see retirement in my future, sorry.

In fact, given the current trends, I see lots of reading for a
whole generation of Baby Boomers, at least those who want to.

I do realize that some would prefer to play golf.

So, if I live to the statistical average lifespan they talk about,
I will get in just about exactly 70 years of reading, and I'm just
perfectly happy to take quizzes on what I read back then.

Not that I have re-read them all.

> Furthermore, the accumulation of knowledge for any one individual 
> is not necessarily an accumulation of knowledge for the human 
> race, unless that individual publishes.

So, you are saying that our learning "publishes or perishes" along
with our corporeal bodies.

I prefer to think of our lives as having a more direct effect.

I think of the fact that I am somewhat well read as having
improved my relationships with the world at large, or with
just the individuals I come into more direct contact with, with
no reliance on my career for those direct effects.

I would like to think that a world in which everyone were just
10% more well read would be at least 10% better in many ways.

The odds of finding a cure for AIDS should go up 10%, along
with those of any other such invention or discovery we might await.

There are all sorts of indirect advantages, as well, such as
keeping more of our population away from drugs, crime and jail,
which are VERY expensive to our civilization.

> But when considering what publication entails, perhaps the most 
> important thing about knowledge is not its accumulation or 
> quantification but how we -use- it.

I couldn't agree with you more!

As Mark Twain said:

"The person who does not read good books has no
advantage over the person who can't read them."

Mark Twain

> I could add to this: how much redundancy is there in a billion 
> books?  How many of those billions books are fluff or idiotic -- 
> popular best sellers, editorials, newspaper articles, comic books?

Sorry, but you are not going to sucker me into an argument about
your own personal taste in comic books, newspaper articles, etc.

However, I must add that I agreed with you completely, up to a
point in my life when I had so much time on my hands that I did
read my very first best seller, and was totally surprised!!!

However, the next one, even more famous, by a rather out-of-the-way
author who won a Nobel Prize, didn't do it for me at all.

I learned, as with every other exploration, that you win some on
the one hand and lose some on the other hand.

I will take a look at nearly any author/title someone recommends
to me, but that's no guarantee I will read a whole tome.

However, if you invite me over, and leave me free to do so, your
experience will be that I will look over your entire library.

> These are all valid as objects of cultural study but do not 
> necessarily contribute to the growth of human knowledge until 
> they've been -written about-.

Sorry, I just can't agree with you there!!!

LIFE IS MUCH MORE THAN BEING "WRITTEN ABOUT."

For example, there are many in the media or academia who refuse
to acknowledge my existence, much less my contribution, but that
doesn't change the fact that I do exist or that my contribution
is literally changing the face of the Earth as we speak.

*blush*

> Finally, the point is moot as a response to what I actually wrote
> because I acknowledge advantages to speed of access but questioned
> what phenomenological changes, specifically, are involved in the
> transition from print to electronic media.

Hardly moot, just perhaps not exactly where YOU wanted it to lead.

See, for example, the recent Wall Street Journal article about
Facebook versus the Fortune 500.

> This leads me to consider another question: what good are a 
> billion books to those who don't fully understand what they read?

Again my apologies, but not everyone "fully understands what they
read" in exactly the same manner as you do; otherwise they would
all get 100% on every exam you might ever give.

Please give room for other interpretations, other pathways, other
kinds of people. . .even such as myself.

> What would you learn better from: a deep understanding of a single 
> very good book or a surface understanding of 100 lesser books?

Speaking as an interdisciplinary generalist, I think it is obvious
that what you might call a superficial understanding of "100 lesser
books" would lead to a greater world understanding than your "deep
understanding of a single very good book."

I am far more suspicious of a point of view generated from a single
source, no matter how deep, than of a point of view generated by an
exposure to the "surface understanding of 100 lesser" sources.

For the world thinker, it is the interplay between ideas that makes
the whole academic process worthwhile, not just one single book, or
one single point of view, or one single anything.

From this perspective it is obvious that gaining just the highlight
ideas from "100 lesser books" would generate more thought processing
between them than the "deep understanding of a single...good book."

It takes an interdisciplinary approach to invent new ideas/ideals--
to get outside one subject's boundaries to combine thoughts with an
alternative subject--this is where human growth originates.

However, I agree that it is, on one scale, much more fulfilling to do
a deep reading of a single very good book: much easier to digest,
much easier to defend your position, much easier to be a
noted expert on "a single very good book."

However, I prefer to expand my horizons rather than stay at home to
be a big fish in a small pond.

Thanks for such an exciting conversation!!!

Michael S. Hart
Founder
Project Gutenberg
Inventor of ebooks



_______________________________________________
List posts to: humanist-AT-lists.digitalhumanities.org
List info and archives at: http://digitalhumanities.org/humanist
Listmember interface at: http://digitalhumanities.org/humanist/Restricted/listmember_interface.php
Subscribe at: http://www.digitalhumanities.org/humanist/membership_form.php




   
