Is Google's book count nonsense …?

The VÖBBLOG recently posted Google's announcement that it had provisionally estimated the number of books in the world at 129,864,880.

"After we exclude serials, we can finally count all the books in the world," wrote Google's Leonid Taycher in a GBS blog post. "There are 129,864,880 of them. At least until Sunday."

Well, it was already clear that such figures carry little weight. Ars Technica now calls the way Google Book Search (GBS) arrived at this number plain nonsense, arguing that the whole thing rests on an "ongoing GBS metadata farce":

But the problem with Google's count, as is clear from the GBS count post itself, is that GBS's metadata collection is riddled with errors of every sort. Or, as linguist and GBS critic Geoff Nunberg put it last year in a blog post, Google's metadata is "a train wreck: a mish-mash wrapped in a muddle wrapped in a mess."

Who is to blame? GBS points to the libraries, since the metadata originally comes from them. [In a comment on http://www.univie.ac.at/voeb/blog/?p=6328 about the GBS deal with the ÖNB, I already criticized the quality of the metadata for the ÖNB's older holdings …]

Contrast this with the view of Eric Hellman, a blogger who covers digital library issues. Hellman agrees with Google that most library metadata collections are in sorry shape to begin with, and he suggests that Google might actually improve the situation if the company can become a one-stop shop for the world’s book metadata. …

Whoever’s to blame for the sorry state of GBS’s metadata, no one disputes that the problems are many and endemic. Indeed, much of the Google blog post on the book count is taken up with exactly this issue—i.e., how to deal with the flood of bad, library-generated metadata infesting its records collection. Google’s counting algorithm is an attempt to make the best of an awful situation, but Taycher’s description of it doesn’t inspire confidence in the final output, especially given where GBS’s metadata problems seem to be clustered.

In the end, most of the „metadata problems“ that Google’s engineers are trying to solve are very, very old. Distinguishing between different editions of a work, dealing with mistitled and misattributed works, and sorting out dates of publication—these are all tasks that have historically been carried out by human historians, codicologists, paleographers, library scientists, museum curators, textual critics, and learned lovers of books and scrolls since the dawn of writing. In trying to count the world’s books by identifying which copies of books (or records of books, or copies of records of books, or records of copies of books) signify the „same“ printed and bound volume, Google has found itself on the horns of a very ancient dilemma.
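The deduplication task described above can be illustrated with a toy clustering step: grouping metadata records that plausibly describe the "same" volume by a normalized title/author/year key. The field names and normalization rules here are illustrative assumptions for the sake of the example, not Google's actual algorithm — which is precisely why variant attributions (like an abbreviated author name) slip through and inflate the count.

```python
import re
from collections import defaultdict

def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", cleaned).strip()

def cluster_records(records):
    """Group records whose normalized (title, author, year) keys match."""
    clusters = defaultdict(list)
    for rec in records:
        key = (normalize(rec["title"]), normalize(rec["author"]), rec["year"])
        clusters[key].append(rec)
    return clusters

# Hypothetical records: the first two should merge; the third, with an
# abbreviated author name, stays a separate "book" — the kind of near-miss
# that makes naive counting unreliable.
records = [
    {"title": "Moby-Dick", "author": "Herman Melville", "year": 1851},
    {"title": "Moby Dick!", "author": "herman melville", "year": 1851},
    {"title": "Moby-Dick", "author": "H. Melville", "year": 1851},
]
clusters = cluster_records(records)
print(len(clusters))  # 3 records collapse to 2 "books"
```

Even this trivial sketch shows why the problem is hard: every normalization rule merges some true duplicates and splits others, and the final count depends entirely on those choices.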

Google may not (or, rather, certainly will not) be able to solve this problem to the satisfaction of scholars who have spent their lives wrestling with these very issues in one corner or another of the humanities. But that's fine, because no one outside of Google really expects them to. The best the search giant can do is acknowledge and embrace the fact that it's now the newest, most junior member of an ancient and august guild of humanists, and let its new colleagues participate in the process of fixing and maintaining its metadata archive. After all, why should Google's engineers be attempting to do art history? Why not just focus on giving new tools to actual historians, and let them do their thing? The results of a more open, inclusive metadata curation process might never reveal how many books there really are in the world, but they would do a vastly better job of enabling scholars to work with the library that Google is building.
