Serendipitous Lessons in Data Management

In a previous blog post I mentioned a document I’ve been spending a lot of time with: a technical manual for collecting and processing plankton samples, sent to me by our Russian colleagues. I was translating the manual in search of crucial information that Derek needed for some particular analyses. Unfortunately, my Russian skills are not so advanced that I can easily skim a text and get the gist of it, so I needed to translate the entire thing. Along the way I realized that I was engaged in a task with much greater meaning than what I initially set out to do.

Part of my job as the information manager for this project is to ensure that anyone who might be interested in our data can access and understand them. This means keeping the data well organized, storing data in non-proprietary formats, and keeping exhaustive metadata (among other things). Effective data management is critical in any large collaborative project, and particularly in one that has such a long history.

So, I started thinking of my work on this sampling protocol as an aspect of data management. The document was a wealth of information. I had received it as a zipped folder of low-resolution JPEG scans of the original, and in the process of translating the whole thing I had first transcribed the Russian version. So in the end, in addition to creating a version of the text that non-Russian speakers can read, I also produced two documents that are much more convenient to work with than the original images (they have searchable text, for example). Our Russian colleagues will have both an English-language and a Russian-language version that they can use internally and share with new colleagues in the future. I have really enjoyed knowing that this work, so critical to us right now on the Dimensions project, will also benefit our Russian colleagues for years to come.

