The Wikipedia ecosystem expands (or: The Good, the Bad & the Ugly)

I have been busy these last few days working with the export functionality of Wikipedia (some new Greyscale publications are coming shortly…).

A quick overview: at the end of 2007, the Wikimedia Foundation announced its collaboration with a German start-up, PediaPress, with the aim of developing an export function that would allow “high quality print and word processor copies” to be exported from Wikipedia.

This was progressively rolled out on the Wikibooks website, then on the German Wikipedia, and finally, during 2009, it was activated on the English WP site. What is most exciting is that in addition to PDF, the software can export to the OpenDocument format, which allows further editing and reformatting workflows.

The software that allows this is available as an extension for MediaWiki, which means that the export function works not only for Wikipedia, but also for independent project wikis! At the moment there are still some quirks (see a list of bugs I submitted), but it’s extremely promising.

Coupled with this software is a print-on-demand service offered by the German company, which allows WP users to order books compiled from several articles – a business model made possible by WP’s open licensing model. That licensing, on the other hand, offers infinite ways of repackaging the content in different formats – no doubt we will see “wikipedia spam” permeating all kinds of online and offline media. As we can see, it’s already starting:

cover of the Rembrandt audio-book

One perfect example is the Cologne-based publisher Navarra Verlagsgesellschaft, which is producing audiobook CDs based on Wikipedia articles – mostly biographies, such as Albert Einstein, the Dalai Lama, Rembrandt, but also mixed topics, such as Vampires, Pirates, The Titanic… The CDs are listed on Amazon.de, which gives us the chance to listen to sound snippets and appreciate the sleep-inducing monotony of the voice actors. Of course, such a product needs adequate promotion, and of course, we find an Amazon user account that is giving out 5-star reviews, all of them for products from the Navarra back-catalogue – if you read German, have a look, as the phrasing of the reviews is hilarious.

But still, those CDs remain “hand-crafted” products, as human speakers were necessary to make the recordings – a tedious and time-consuming process, which explains why no more than 10 items have been produced so far.

If this business model doesn’t seem viable, have a look at Alphascript Publishing, a Mauritius-based company that is taking the game to a whole new level. Their website claims an annual output of “10,000 new titles” and describes Alphascript as “one of the leading publishing houses of academic research”. Those numbers are not a joke, since an Amazon book search as of today gives no fewer than 17,273 results.

Some background: Alphascript Publishing is a trademark of VDM Publishing House (Mauritius), which is an outgrowth of VDM Verlag Dr. Müller AG, based in Saarbrücken and mainly known for producing print-on-demand copies of university theses. The website of VDM Publishing mentions that beyond the office opened in 2008 on Mauritius, the company has staff in Georgia and Hungary. Regarding the Mauritius base, they state: “VDM has over 30 staff working in the Beau Bassin office producing more than 1000 new titles per month.”

Let’s now have a look at those titles:

some products by Alphascript

As we can see, those books, selling for high prices in the $50-120 range, contain seemingly random collections of Wikipedia articles, bearing a random photograph on the cover and the names of three “editors”. One might wonder whether a human agent participated in the selection of the covers, or whether it was a fully automated software algorithm parsing a stock-photo database.

The most incredible example is certainly the book soberly titled “History of Georgia (country): Colchis, Mongol invasions of Georgia and Armenia, Timurs invasions of Georgia, Georgia within the Russian Empire, Democratic … of Georgia, Caucasian Iberia, Tao-Klarjeti”, which shows on the cover … a photo of Atlanta, in the US state of Georgia.

I’m quoting here a gorgeous comment (from Amazon user M “CultOfStrawberry”):

Now, when a book has the wrong image for the cover, you know this is a big indication that something is wrong. It appears that no thought has gone in this book. All articles are copied off Wikipedia and not cleaned up/edited at all (plus, sometimes people vandalize Wikipedia entries in subtle ways so only someone looking closely/is knowledgeable of the subject would find mistakes) As such, this is not a good source for research and/or information. At this date, there are over 17,000 of these “books” published by the same three clowns (or ‘editors’). Actually, 4 if you count VDM Verlag.

Go to Wikipedia to get the same exact material in this book for FREE, plus you can see pictures of whatever Georgia you want – the state or the country. Too bad the person who selected the picture didn’t know the difference. Har har.

Surprisingly, some (very) human intervention is still occurring, as an Amazon user named “VDM Verlag Dr.Müller” has given out 5-star ratings to the Alphascript books. However, there are only 77 of these ratings, merely a drop in the ocean.

This gives an interesting spectrum: on the one hand, with PediaPress we have the perfect example of a company that understands the benefits of Open Source and serves both its own profitability and the WP community. Navarra is acting in a grey zone: their marketing efforts and the aesthetic quality of the packaging are highly dubious, but their methodology is acceptable, as Wikipedia itself is clearly credited and the pricing of the products is not outrageous. Where Alphascript/VDM crosses the line from grey zone to scam operation is:

1) the pricing, which ranges from 40 to 120 USD per item.
2) the actual Wikipedia authors are not properly credited, the books being listed under the authorship of “John McBrewster, Frederic P. Miller, and Agnes F. Vandome” – names that reportedly do not appear in the edit history of the respective articles. Of course, this is one of the issues when producing print copies of Wikipedia articles: how to include the hundreds of authors an article may have, other than in small print at the end? But putting some arbitrary names on the cover is certainly the wrong solution.
3) the absence of any editorial review, which we can deduce from the wrongly used cover pictures, if not from the sheer number of books involved.
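
For anyone who wants to check the authorship claim in point 2 themselves, here is a minimal sketch of how it could be done with the MediaWiki API's `prop=contributors` query. The function names and the sample data are my own illustration, not anything published by Wikipedia or Alphascript. It also assumes that an editor would be registered under a username resembling the printed name, which is far from guaranteed, and it ignores anonymous edits and the API's result paging.

```python
from urllib.parse import urlencode

# English Wikipedia's MediaWiki API endpoint
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def contributors_url(title, limit="max"):
    """Build an API URL listing the logged-in contributors of one article."""
    params = {
        "action": "query",
        "prop": "contributors",
        "titles": title,
        "pclimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def missing_contributors(response, names):
    """Return the names that are absent from the contributors in a parsed API response.

    `response` is the JSON reply already decoded into a dict.
    The comparison is case-insensitive and only sees one page of results.
    """
    found = set()
    for page in response.get("query", {}).get("pages", {}).values():
        for user in page.get("contributors", []):
            found.add(user["name"].lower())
    return [name for name in names if name.lower() not in found]


# Hypothetical, hand-written reply (the username is made up for illustration).
sample_reply = {
    "query": {
        "pages": {
            "1": {
                "pageid": 1,
                "title": "Colchis",
                "contributors": [{"userid": 1, "name": "ExampleUserA"}],
            }
        }
    }
}

editors = ["John McBrewster", "Frederic P. Miller", "Agnes F. Vandome"]
print(contributors_url("Colchis"))
print(missing_contributors(sample_reply, editors))
```

Fetching the URL and decoding the JSON is left out on purpose, so that the sketch runs without network access. A real check would also have to follow the `continue` values the API returns for articles with many contributors.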

I feel that there are some more surprises coming…

Update (22.03.10): some other blog posts on the same subject by Chris Rand and doink.ch
