Life is a Mystery

29 April 2009

FFR: WolframAlpha

What if a free web site could answer queries like “france fish production” (number of metric tons produced, pounds per second, comparison to NYC trash rate) or “weather princeton, day when kurt godel died”? Would that change your world just a little, or would it be “a stunt that could still end in disaster”? Wolfram Research is carrying out a task that would have been much easier with the semantic web, and doing it in the semantic web’s absence. The free WolframAlpha web site is expected next month, so keep an eye out for it. Meanwhile, Joho the Blog provides a liveblog with a few more examples of what WolframAlpha should be able to do, and Stephen Wolfram explains what’s up in a presentation at Harvard.

23 April 2009

OCLC makes a move

OCLC is laying down some big bets on the direction of library automation, and it appears to me that these bets may pay off. Library systems (those “integrated library systems” we buy from vendors like Ex Libris) have long been both too expensive for libraries and too complicated for the vendors to support. OCLC is now entering the market with a “cloud” service for libraries. Their bet is that libraries will accept a bit less uniqueness in exchange for a whole lot more interconnectedness:

“Visits to libraries, focus groups, and over a decade of engagement in the library automation world have convinced me that libraries require less complexity in their management systems,” said Andrew Pace, OCLC Executive Director for Networked Library Services. “To truly deliver network-level services—a platform-as-a-service solution—and not simply Internet-hosted solutions of current library services, new system architectures and workflows must be built that are engineered to support Web-scale transaction rates and Web-scale collaboration.”

I think this could work for OCLC. I think libraries are finding the old model unsustainable and are open to a new approach. But it will be a true shame if OCLC does not build clear APIs to these “web scale” services so that libraries can extend them and reach into them from their own services. Putting services into the cloud can work, as long as the data you build there are accessible in all sorts of ways. Take the Flickr API as an example.

The troubling aspect of this is that OCLC has been much too ready to hold back other players on the data front, insisting that institutions cannot reuse and share the data they have created to further their own interests and those of collaborators around the world. Will OCLC be just as closed on the services front? Will this new initiative help it open up on the data front? It is too early to tell, but it is well worth recalling some early warnings.

This new direction will take years to play out, but I wish OCLC well in the effort. It represents a significant shift in the library automation marketplace.


13 April 2009

Citing sources

The MLA seems to have stirred the pot with its 7th edition of the MLA Handbook for Writers of Research Papers.

In the past, this handbook recommended including URLs of Web sources in works-cited-list entries. Inclusion of URLs has proved to have limited value, however, for they often change, can be specific to a subscriber or a session of use, and can be so long and complex that typing them into a browser is cumbersome and prone to transcription errors. Readers are now more likely to find resources on the Web by searching for titles and authors’ names than by typing URLs. You should include a URL as supplementary information only when the reader probably cannot locate the source without it or when your instructor requires it.

I agree with Maurice Crouse’s assessment of this:

It appears to me that the 7th edition of the MLA Handbook for Writers of Research Papers in § 5.6.1 comes very close to saying, “It’s out there somewhere; I found it; you probably can, too.” … Many of [their] points are well taken. But I would urge that you always give the URLs that you used to reach the cited material. Why not give your reader all the help you can? Why make him or her do a search for a source for every item in your paper? If the URL fails, then he or she can always resort to the searching that MLA recommends.

All in all, I am very impressed with Crouse’s recommendations in Citing Electronic Information in History Papers. If you are looking for some sensible advice, you might want to start there.


10 February 2009

LT takes on authority control

I love to see how LibraryThing (LT) approaches the task of cataloging. LT invites everyone to catalog, which rules out the use of priestly tools like AACR2 (3!) or the Library of Congress Subject Headings. One nifty feature of the professional library catalog has been authority control, which among other things provides the ability to distinguish one person from another, even if their names are the same. Last week LT started to offer an alternative for sorting out names.

LT calls this “distinct authors,” and the concept rests on the fact that each author will have a universe of books they have written, likely quite distinct from the universe of books written by someone else with the same name. This clustering of books can be used to disambiguate the authors themselves. That could work! It is fuzzier than authority control, but that may matter little in our fuzzy, tech-enabled world. It also deals with only one of the problems in the authority control domain, but LT already has solutions to some others, and I am convinced that over the long term the LT approach will prove more sustainable. We’ll see.
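To make the clustering idea concrete, here is a minimal Python sketch. It is not LibraryThing’s actual method, which isn’t described here; it assumes each book credited to a shared name comes with a set of descriptive tags, treats two books as probably having the same author when their tags overlap enough (Jaccard similarity at or above a hypothetical threshold of 0.2), and reads each connected group of books as one distinct author. The name, titles, and tags are invented for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Fraction of tags two books share: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def distinct_authors(books: dict, threshold: float = 0.2) -> list:
    """Split books credited to one name into clusters, one per presumed person.

    Books are linked when their tag sets are similar enough; each cluster is
    a connected component of that similarity graph, found with union-find.
    The threshold is a hypothetical tuning knob, not a LibraryThing setting.
    """
    parent = {title: title for title in books}

    def find(title):
        # Follow parent pointers up to the cluster's representative.
        while parent[title] != title:
            parent[title] = parent[parent[title]]  # path halving
            title = parent[title]
        return title

    for a, b in combinations(books, 2):
        if jaccard(books[a], books[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for title in books:
        clusters.setdefault(find(title), set()).add(title)
    return list(clusters.values())


# Hypothetical books all credited to the same name, "John Smith".
books = {
    "Cottage Gardens": {"gardening", "plants", "landscape"},
    "Roses in Winter": {"gardening", "roses", "plants"},
    "Kernel Hacking": {"linux", "programming", "operating systems"},
    "Systems Programming": {"programming", "c", "operating systems"},
}

for cluster in distinct_authors(books):
    print(sorted(cluster))
```

Here the gardening books and the programming books fall into two separate clusters, i.e. two “John Smiths.” The threshold is where the fuzziness lives: raise it and names split more aggressively, lower it and they merge, and a writer who changes subjects entirely would be split in two.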

10 February 2009

Herbert Van de Sompel

I just want to take a moment to acknowledge one of the giants in the field of library science today: Herbert. John MacColl posted a wonderful summary of Herbert’s retrospective talk at the 9th International Bielefeld Conference last week.

Herbert is fearless, jumping into problems with abandon, always certain that he and his teams can make a contribution. Sometimes they succeed, sometimes not so much. But the failures are often as interesting as the successes, full of discoveries and insights.

His conclusion last week, after looking back at his work of the past decade: we do what we do in order to optimize the time of researchers.

That deserves a good ponder. Do new systems optimize the time of researchers? How should we balance leveraging tools already out there in the infoecosystem against developing specialized tools to facilitate research? Does this statement miss the need to facilitate collaboration as well as research? I love it when Herbert makes me think!

31 October 2008

Googling PDFs

Google now lets you peek inside image-only PDFs, running optical character recognition on them so that any text they happen to contain becomes searchable. This opens up a whole new class of documents to searching. For example, try this search: steady success in a volatile world. Check out the “View as HTML” link, then download the PDF and try to select the text: you can’t, because the file itself contains only an image.

Now, imagine Google starts doing this to all image files. License plates? Business cards? Name tags? Sites like EverNote already offer this functionality. How long before it is part of search as well?

28 October 2008

Book search business model

Today we begin to see the business model behind Google Book Search. Google announced a settlement in the lawsuit brought against Google Book Search by the Authors Guild, the Association of American Publishers, and some individual authors. Amazingly enough, the settlement not only leaves Google Book Search intact, but to my eye it seems to expand its offerings substantially. It almost appears that Google used the suit as an educational opportunity and convinced authors and publishers that the service it could offer would be a win/win for all. Of course, Google also agreed to pay $125M, partly to compensate rights holders for the scans it made without permission (some of that money goes toward setting up a Book Rights Registry, which will try to determine who owns the copyright to out-of-print books so that the rights holders can be paid for any sales).


If this works, the “snippets” will disappear from the out-of-print results; instead we will see full-page results. Furthermore, for a (yet to be determined) price, we will be able to license access to the full books and put them on our Google “bookshelf.” That price is a key to the business model and the agreement, I’m sure. Suddenly authors and publishers have a way to “monetize” the “long tail” of the out-of-print catalog. That’s pretty revolutionary.

Now the urgency of Google’s effort to scan every work in some major libraries begins to make sense. With the competing Microsoft-led effort already on the skids, it looks like Google will have some time to polish this model before the competition gets tough.

Of course, this agreement still has to be ratified by the court, so it may not be the shape of what is to come. Keep an eye on this space.

UPDATE: Harvard University Libraries opt out of the deal for many interesting reasons.

13 October 2008

HathiTrust

Ever since agreeing to participate in Google’s book scanning project, John Wilkin and the University of Michigan have been looking for a way to provide access to the vast digital resources they get from the project. Today they announced the HathiTrust, which includes partners from across the country (including Minnesota). This is a great site and a great start on a tough problem: can we, together, maintain free and open access to materials from our library collections? Ars Technica has a nice story on it too.

9 October 2008

Persistent legislation

The Library of Congress has adopted handles to provide persistent links to federal legislation. As FGI (Free Government Information) points out, though,

Well, it is certainly nice to be able to link to legislation with a persistent link! But it would be much better if one could click to create a link rather than following a 600 word description of how to link on another page.

That’s in the nature of handles and other schemes that provide persistent links via redirects. They are a step in the right direction, anyway. (Hat tip to Slashdot.)
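For anyone wondering what “persistent link via redirect” means mechanically, here is a minimal Python sketch. It is not the Handle System’s protocol or the Library of Congress’s setup; it only assumes a hypothetical registry mapping stable identifiers to wherever a document currently lives, plus a resolver that answers a request for an identifier with an HTTP redirect. The identifier, URL, and port are invented.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple

# Hypothetical registry: stable identifier -> current location.
# When a document moves, only this table changes; published links keep working.
REGISTRY = {
    "example/hr-1234": "https://legislation.example.org/current/hr-1234.html",
}


def resolve(identifier: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header value) for a stable identifier."""
    target = REGISTRY.get(identifier)
    if target is None:
        return 404, None
    return 302, target  # temporary redirect, since the target may move again


class ResolverHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # "/example/hr-1234" -> "example/hr-1234"
        status, location = resolve(self.path.lstrip("/"))
        self.send_response(status)
        if location is not None:
            self.send_header("Location", location)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the resolver locally, e.g. http://localhost:8000/example/hr-1234."""
    HTTPServer(("localhost", port), ResolverHandler).serve_forever()
```

Calling serve() and visiting the example address would bounce the browser to the current URL. The sketch also shows the weakness FGI complains about: nothing here helps a reader discover the identifier, so building the link is still a manual chore.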

27 September 2008

EndNote (Thomson Reuters) sues Zotero

Thomson Reuters owns EndNote and has decided to sue the developers of Zotero (George Mason University), claiming they violated the EndNote EULA (end user license agreement) by reverse engineering the EndNote style file format (.ens). This is fascinating on so many levels. (1) Does Thomson really think the way to build a customer base for reference manager software is to sue the makers of an academically produced, open source Firefox plugin? (2) The case seems awfully weak, given that the Zotero team has shipped nothing at all derived from .ens files. (3) Just how enforceable will EULAs (those contract terms attached to ripping open a software box or clicking “I read it” in a computer program) turn out to be?

I hope Thomson rots for this kind of behavior. Between this and its suits to prevent others from citing case law by the page numbers it adds to published court decisions, I have pretty much decided Thomson is on the wrong side of the IP issues I care about.

Eric Celeste / Saint Paul, Minnesota / 651.323.2009