17 July 2004

Spotlight on Selectivity

I participated in an advisory board meeting of the Documenting Internet2 project this week. As we considered appraisal strategies for collecting electronic documents and records from I2, I wondered whether appraisal would shift into a retrospective task when dealing with the electronic record of organizations. Will it be easier to collect “everything” (or whatever can be easily acquired, anyway) and then become selective later by mining that trove for the important bits? Estimates at this meeting suggested that at least 95% of “everything” is not valuable to researchers, and appraisal has been the traditional tool to ferret out the golden 5% (or even 1% in many cases). In the electronic realm, though, could it be a wiser use of human capital to collect the 100% and then mine out the 5% as needed?

One dash of cold water on this approach has been the dearth of data mining tools. However, the rise of litigation support software may be one place to hunt for useful models. The U is also home to a strong data mining research group in the DTC. Perhaps we could work with them to develop research tools for future archives?

Finally, we are beginning to see this approach emerge on the personal computer desktop. Last week Steve Jobs announced that the next generation of Mac OS X (10.4, or Tiger) will incorporate a technology Apple calls Spotlight. Spotlight will be a very fast search engine for the Mac OS. I wonder: as search gets fast and easy enough, will it replace organization? We all know how difficult it is to create a good filing system and stick to it, even on a computer. As search improves, will we just give up on organization and instead rely on searching to pull together the documents we need as we need them?

