Wednesday, January 13, 2010

Data organization and record keeping, Concisely

I've now written several posts on data organization and record keeping, and I realized I still haven't managed to say what I wanted to say. What I really wanted to say is:

Wow, data organization and record keeping is really important for scientists. You'd think with something so important, we would all do it really well, but actually, I think most of us, including me, could do it a lot better. I think part of the reason we don't do it well is a lack of training that applies to the large-scale, unstructured nature of post-undergraduate projects.

Perhaps the lack of training actually reflects a lack of a consensus on what is good data organization and record keeping. Should we write daily records in our lab notebooks? Or is that overkill? Should data be stored by date or by project? Should we keep all notes, all protocols and results on the path to the final result, all stages of analysis? Or should we trash that stuff as we reach more advanced levels of the project? Should we print copies of the shittiest gels on earth? Or is it OK to leave those out if we think we know why they went wrong? Should we print out programming code that is hundreds of lines long? Is data organization and record keeping too field- or project-specific to reach a consensus on how to do it well?

So, that's a summary of what I wanted to say, and some of the previous posts expound on the stuff in the 2nd paragraph. There still may be more to come on this topic, but that's it for now, anyway.

1 comment:

  1. I'm planning to go back and read the preceding posts in more detail this weekend. But a thought on this point:

    Perhaps the lack of training actually reflects a lack of a consensus on what is good data organization and record keeping.

    I don't think the issue is a lack of consensus. I think the real issue (in academia) is a lack of established practices and consequences. The most fastidiously maintained records I've seen among colleagues and predecessors belong to those who spent a few to several years in industry before working at a research university. If you're working in industry, there are established guidelines for good laboratory practice (GLP). GLP, especially data management, is do-or-die for companies because of project and personnel turnover and legal requirements for patent filing. I don't know whether the lack of GLP guidelines, and of repercussions for not following them, in academic labs is due to "out of sight, out of mind" or the "I have better things to do with my time" mindset.