Wednesday, August 4, 2010

Notes, organization, and reproducibility

How do you keep notes and records on your experimental methods, data, analysis, and journal articles? What do you keep on file, and where, and how do you organize it? As scientists, we all face the record-keeping problem, and I'm interested in how others deal with it. Over the course of grad school, I've developed my own system in fits and starts, and I'll describe what's working for me.

What to keep, what to keep? There's essentially one reason for keeping things on file: because you might need them later! I know, obvious. But I'd like to break it down a little further, into four reasons you might need something later: 1) it's a result, 2) it's information needed to reproduce a result, 3) it's information that may help with interpretation of a result, or 4) it's information that may lead to a future result. So, as for what I keep on file, I try to keep everything that falls into one of those four categories. That's a lot of stuff. How the heck do you organize this stuff so you can find it later when you need to?

How to organize? In the beginning of a project, starting a system of organization is difficult because you don't have a feeling yet for the number of different sub-projects, the scale of those projects, and all of the variables that will change along the way. But, there are categories that will be important when it comes time for publishing and ensuring that you have all the information needed to reproduce your work. For me, those categories are experimental methods, data, analysis, and references.

Where to keep it? I have 3 places. I have lab notebooks - the daily record of what I've done. I have a physical filing cabinet - keeper of everything from written notes and hand-drawings to printouts of journal articles, hardware specifications, and data. And I have a computer (and backup!) - keeper of essentially everything that's not handwritten and a few things that were scanned in. The computer can really get you in trouble with its infinite space and the myriad bad ways of organizing and naming files. But what I've outlined below works for me, and perhaps could work for others.

References/Journal Articles
I like to keep lots of pdfs of journal articles, and print lots, but not all. I generally make file folders (both physical and on the computer) that represent some general topic and file lots of papers in each folder. For computer files, I like to name the files "(first author's last name)_(Journal)_(pub year)_(2-3 key words)". This system works...okay. I can easily find the papers again when I need to, and it's nice to have hard copies with notes and soft copies for easy access when I'm away from my filing cabinet. However, now that I'm writing, I've started using EndNote, which may cause me to completely overhaul my system. We'll see.
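As a rough sketch, the naming pattern above could be generated by a small Python helper. The function name, the cleanup rule, and the hyphen between key words are assumptions made for illustration, and the example reference is a made-up placeholder, not a real article.

```python
import re

def reference_filename(first_author, journal, year, keywords, ext="pdf"):
    """Build '(author)_(journal)_(year)_(key words).ext' from its parts."""
    def clean(text):
        # Assumption: keep only letters and digits so the name is safe on any file system.
        return re.sub(r"[^A-Za-z0-9]+", "", text)

    # Use at most three key words, joined by hyphens (an assumed convention).
    keyword_part = "-".join(clean(word) for word in keywords[:3])
    parts = [clean(first_author), clean(journal), str(year), keyword_part]
    return "_".join(parts) + "." + ext

# Placeholder reference, for illustration only.
example_name = reference_filename("Smith", "J. Example Res.", 2009, ["optical", "trap"])
print(example_name)  # Smith_JExampleRes_2009_optical-trap.pdf
```

Because every name starts with the author and year, sorting a folder alphabetically also groups papers by author.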

Experimental methods
Experimental methods has been one of the most troublesome categories for me to organize. Methods can be a very diverse category, particularly in biophysics where methods may range from biochemistry to instrumentation to automation. Developing the methods may be a major component of a research project, or even if it's not, methods may change over time. Figuring out how to deal with diverse, evolving methods has been challenging.

Eventually, I determined two keys to organizing my notes and files on experimental methods: subcategorizing, and organizing by date. Over time, I found I generally had three categories of methods: sample preparation, instrumentation, and data acquisition.

For sample preparation, I found it best to type up a protocol, date it, and paste it in my lab notebook on that date. Then I could refer back to that protocol by date and lab notebook page number each time I performed the protocol. As I modified the protocol, I would still refer back to the same protocol, but note the modifications. Eventually, enough modifications would be added that I'd type a new copy, date it, and paste it in my notebook. Eventually, the protocols were fairly settled and each day I prepared a sample, I could just refer back to that date and page. And it worked well for some items I made and stored. Typically we write the date on anything bio-y that we store, and when I used it, or ran out and needed to make more, I could just refer back to that date in my lab notebook. Convenient. Plus, if a labmate wanted to do something similar, they could also look it up and not need me to "remember" exactly how I'd done or made something.

For instrumentation, I had several subcategories that required various forms of organization. I had designs, part specifications, optimization and characterization techniques, calibration techniques, and automation software. I wasn't the best about organizing these things as I went, but hindsight being 20/20, I've learned what I should have done.

1) Keep an accordion of files and a notebook dedicated entirely to your instrument.
2) Design involves a lot of notes, hand-drawings, calculations, references to articles. Date these things and keep them in a folder.
3) Part specifications become impossible to find later. Keep them on file, hard and soft copies if at all possible. (Also, determining what parts are in an instrument is really hard after the fact. Keep records of what you put in your instrument!)
4) Optimization and characterization techniques evolve, and the instrument evolves with them. Write down and date what you do in a notebook dedicated to the instrument (or at least make notes in the instrument notebook referring to where it is written down).
5) Calibrations are very important. Keep good notes on them, by date, in that instrument notebook.
6) Automation software is still a toughie for me. Ideally, it would change rarely, you would carefully track changes by date, and you would keep old copies. Our automation software is all custom written, in a very large suite of software that all works together. I tried to keep notes on major changes, and I tried to keep old copies, but ultimately, I didn't do it that well.

With hindsight, I eventually found the above guidelines to be a good means of organizing instrumentation techniques.
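For item 6, one possible way to keep dated old copies with a note on what changed is sketched below in Python. The folder layout, the `CHANGES.txt` file name, and the `snapshot` function are all hypothetical; this is not how the lab's software suite was actually handled.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path

def snapshot(source_dir, archive_root, note, day=None):
    """Copy source_dir to archive_root/<date>_<name> and record a change note."""
    day = day or date.today()
    target = Path(archive_root) / f"{day.isoformat()}_{Path(source_dir).name}"
    # copytree raises an error if this date's snapshot already exists,
    # so an earlier copy is never silently overwritten.
    shutil.copytree(source_dir, target)
    (target / "CHANGES.txt").write_text(f"{day.isoformat()}: {note}\n")
    return target

# Demonstration in a temporary folder with a placeholder source file.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "automation"
    source.mkdir()
    (source / "acquire.txt").write_text("placeholder for a source file")
    saved = snapshot(source, Path(tmp) / "archive", "describe the change here", date(2010, 8, 4))
    saved_name = saved.name
    saved_files = sorted(p.name for p in saved.iterdir())

print(saved_name)  # 2010-08-04_automation
```

The archive folder should live outside the folder being copied, otherwise each snapshot would contain the earlier ones.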

Data acquisition was generally a combination of the above sample preparation and instrumentation techniques. Generally I prepared a sample and used the instrument in its present state, with the current calibrations, to acquire data. And I used some of the automation software I had written to acquire that data. I usually made notes to this effect in my daily record in my lab notebook, with any modifications or variables noted. Those variables are one of the places where things can get tricky, as you often don't even know what the important variables are until you start changing them. But nevertheless, the lab notebook was generally where I noted the details of data acquisition.

Data
Data was actually the simplest thing for me to organize. Our lab had already established a system of organizing computer files by date and time stamp, with a descriptive base file name and different file extensions representing different types of data. I followed that system and found it works pretty well, as long as you follow it. Keeping notes in my lab notebook about the individual files was also key to knowing which files were what. Also helpful: a description somewhere of what's in the different files, and how to load the files for analysis.
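To make the idea concrete, here is a minimal Python sketch that reads names built from a date, a time stamp, a descriptive base name, and an extension, and groups them by acquisition date. The exact pattern `YYYYMMDD_HHMMSS_base.ext` and the `.dat` and `.cal` extensions are assumptions for illustration; the lab's real convention is not spelled out here.

```python
from collections import defaultdict
from datetime import datetime
from pathlib import PurePath

def parse_data_name(filename):
    """Split 'YYYYMMDD_HHMMSS_base.ext' into (timestamp, base name, extension)."""
    path = PurePath(filename)
    # Split only twice so the base name may itself contain underscores.
    date_part, time_part, base = path.stem.split("_", 2)
    stamp = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return stamp, base, path.suffix.lstrip(".")

def group_by_date(filenames):
    """Map each acquisition date (ISO format) to the files recorded on it."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        stamp, _, _ = parse_data_name(name)
        groups[stamp.date().isoformat()].append(name)
    return dict(groups)

# Hypothetical file names, for illustration only.
names = [
    "20100805_091500_beads.dat",
    "20100804_153012_beads.dat",
    "20100804_153012_beads.cal",
]
groups = group_by_date(names)
print(groups["2010-08-04"])  # ['20100804_153012_beads.cal', '20100804_153012_beads.dat']
```

Putting the date first in year-month-day order means a plain alphabetical sort is also a chronological sort.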

Analysis
Analysis was another doozy of an organizational challenge. My raw data files were all nicely organized by date and time stamp, but for analysis, it took a while to decide how to organize. Did I organize by date, like the data? But then, what about when I wanted to analyze data from several dates together? Did I organize by type of data? By date I performed the analysis? By type of analysis? And how did I organize the analysis programming itself? And what, if anything, did I write about it in my notebook? Or print anything out? Analysis organization definitely presented its own set of challenges.

Eventually I decided to organize the analysis by date of the data and purpose of the analysis. If I analyzed several days of data together, I just titled it by the multiple dates and grouped it in the applicable month or year. I generally didn't write notes in my lab notebook about analysis, though I wish I had. I often printed plots, and put them in a 3-ring binder organized by data acquisition date and type of data. Eventually I discovered that notes about the analysis itself were very important (see below).

Even more important than the top-level organization of the analysis files was putting notes and organization into the analysis files themselves. My analysis software allows data folders, so I could arrange data within the file by date, timestamp, and type. I could also make notes within the analysis files. I found that keeping notes in the form of "date of analysis, goal of analysis, methods of analysis, and results of analysis" was very useful for making the analysis useful to myself later.
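The four-part note format could be kept consistent with a small template. The following Python sketch is only one way to do it; the class name, field names, and the example entry are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class AnalysisNote:
    """One note with the four parts: date, goal, methods, results."""
    analysis_date: date
    goal: str
    methods: str
    results: str

    def as_text(self):
        # Render the note as plain text that can be pasted into an analysis file.
        return "\n".join([
            f"Date of analysis: {self.analysis_date.isoformat()}",
            f"Goal: {self.goal}",
            f"Methods: {self.methods}",
            f"Results: {self.results}",
        ])

# Placeholder entry, for illustration only.
note = AnalysisNote(
    analysis_date=date(2010, 8, 4),
    goal="describe what question the analysis answers",
    methods="describe the steps applied to the data",
    results="describe what was found",
)
print(note.as_text())
```

Requiring all four fields up front makes it harder to save an analysis without saying what it was for.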

A couple more key points
Following the general rules above, I established a pretty decent filing system for myself. I have a lot of files, hard copy in my filing cabinet, soft copy on my computer, and lots of lab notebooks. In keeping this stuff useful, I have a few more key points:
1) Back up your computer files! In at least 2 places, 3 places ideally.
2) Organize your hard copy files. Filing cabinet, tabbed files.
3) Index your lab notebooks. Our lab notebooks had pages at the front for indexing. I liked to list a major category (sample prep, instrumentation, data, analysis, notes), then the specific task of the day.
Also, I found that the more information that was available on computer, the better. Lugging around lab notebooks and file folders sucks. I like to have soft copies whenever possible. Short of scanning in your lab notebooks (or typing all the entries), I found it useful to make an Excel spreadsheet, organized by date, that listed what I did on each day that I acquired data. On this spreadsheet, I also added in all the important changes made in the instrumentation, sample prep, and data acquisition. I color-coded it in subcategories. It's made finding data sooooo much easier.
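The same kind of date-ordered log could also be kept as a plain CSV file that Excel can open. This Python sketch assumes four hypothetical columns (`date`, `category`, `entry`, `notebook_page`), with the category column standing in for the color-coding; the rows are placeholders, not real lab records.

```python
import csv
import io

FIELDS = ["date", "category", "entry", "notebook_page"]

def write_log(rows, handle):
    """Write log rows to an open file, sorted by ISO date (YYYY-MM-DD)."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["date"]):
        writer.writerow(row)

def entries_on(rows, day):
    """Return every log row recorded on the given ISO date."""
    return [row for row in rows if row["date"] == day]

# Placeholder rows, for illustration only.
rows = [
    {"date": "2010-08-04", "category": "data", "entry": "acquired data set", "notebook_page": "12"},
    {"date": "2010-08-02", "category": "instrumentation", "entry": "recalibrated", "notebook_page": "10"},
    {"date": "2010-08-04", "category": "sample prep", "entry": "new sample batch", "notebook_page": "11"},
]

buffer = io.StringIO()  # stands in for a real file opened with open(..., newline="")
write_log(rows, buffer)
log_lines = buffer.getvalue().splitlines()
print(log_lines[0])  # date,category,entry,notebook_page
```

Writing dates as year-month-day keeps the sort chronological, and the notebook page column points back to the full handwritten entry.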

Some final notes on file organization for publication
As I write up my work, I've found it very helpful to make a folder for each paper, and within that folder, to have subfolders for each figure or significant result. Each figure or significant result has its own 1) analysis file, 2) final figure file, 3) caption file, and 4) data source file. The analysis file has all the relevant data, from raw to fully analyzed. The source file records where the data came from, how and when it was acquired, how it was analyzed, and any important quantifications. The final figure file is the .jpg, and the caption file holds a caption that would be appropriate for the figure. With this information, I have everything necessary to include the figure or significant result in the paper.
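A minimal Python sketch of that layout follows: one folder per paper, one subfolder per figure, and a check for which of the four files is still missing. The file stems `analysis`, `figure`, `caption`, and `source`, and the function names, are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

# Assumed file stems for the four items kept with each figure.
REQUIRED_STEMS = ("analysis", "figure", "caption", "source")

def make_paper_tree(root, paper, figures):
    """Create root/paper/<figure>/ for each figure name and return the paper folder."""
    paper_dir = Path(root) / paper
    for name in figures:
        (paper_dir / name).mkdir(parents=True, exist_ok=True)
    return paper_dir

def missing_items(figure_dir):
    """List which of the four expected files are not yet in a figure folder."""
    present = {p.stem for p in Path(figure_dir).iterdir() if p.is_file()}
    return [stem for stem in REQUIRED_STEMS if stem not in present]

# Demonstration in a temporary folder with placeholder names.
with tempfile.TemporaryDirectory() as tmp:
    paper_dir = make_paper_tree(tmp, "paper1", ["fig1", "fig2"])
    (paper_dir / "fig1" / "caption.txt").write_text("placeholder caption")
    fig1_missing = missing_items(paper_dir / "fig1")
    fig2_missing = missing_items(paper_dir / "fig2")

print(fig1_missing)  # ['analysis', 'figure', 'source']
```

Matching on the file stem rather than the full name lets the analysis file keep whatever extension the analysis software uses.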

And that's it. That's my system. What's your system?
