Tuesday, January 26, 2010

Start-up package and salary negotiations - what the seminars have taught me

(Comment by me in response to post at FSP)

I've seen several seminars about negotiating start-up packages, and they have all advised candidates to do the money negotiations once the offer is made.

I've also talked to several post docs on the job market, and know that interviewers usually want some start-up estimate from the candidate before making an offer, which makes sense. If the university can't offer the kind of money necessary to do the research, the university and the candidate are not a good match.

The seminars I've seen also advise the candidate to *definitely* negotiate salary. I always thought it would be much easier (and much more admired) to negotiate for all the start-up package stuff, other than the salary. All the other stuff is all about getting great science done. The salary seems just selfish. But, it was explained to me this way: 1) the university *expects* you to negotiate the salary, so the offer is lower than they are prepared to offer, and they're surprised when you don't negotiate, and 2) negotiating to a salary that is fair and comparable makes the candidate/employee feel valued, loyal to the organization, and allows the candidate/employee to not waste time/energy worrying about salary.

Salary negotiations - a link

This salary negotiation tactic sounds like it might actually work. Anybody ever tried it?

(From an anonymous comment on a post over at FSP.)

Monday, January 25, 2010

Physics Subject GRE, Part 2

So, what do the Physics Subject GRE scores mean? I would love to see a good study evaluating correlations with PhD completion and assorted other measures of academic success. If anyone knows of one, please leave a comment. In the meantime, I wanted to brainstorm some scenarios that would result in high or low scores.

Qualities required to score highly on the Physics Subject GRE
1. The tester has encountered all or almost all of the material on the exam.
2. The tester has excellent recall of equations.
(I'd say successfully completing the test had less to do with recalling basic concepts and basic equations, and more to do with remembering specific equations that exactly equate the variables given.)
3. The tester is pretty darn quick.
(The test is 100 questions in 170 minutes.)
4. The tester can solve word problems.
5. The tester can afford to take the test.
($140-160, plus fees for score distribution)
6. The tester isn't suffering some disabling condition while taking the test.

Scenarios that yield a lower score on the Physics Subject GRE
Basically, the opposite of any of the items listed that would yield a high score.

So, let's evaluate some of the list items, considering how the items might predict future grad school success, and how the items might be artificially skewed. Obviously, items 1-4 should be highly correlated with grad school success. I certainly would predict a grad student who has encountered all relevant material, has excellent recall, is quick, and can problem solve would have 4 qualities that would help with successful PhD completion. Are these 4 qualities enough by themselves to predict success? Absolutely not. But, they all help.

Conversely, would missing one or more of these qualities (1-4) ensure failure? I'd say not having encountered the relevant material (1) or not being able to problem solve (4) would put a person at significant disadvantage. To me, the excellent equation recall (2) and the quickness (3) required for this exam are disproportionate to how much those qualities affect PhD course work and research. Basic concepts matter so much more (and knowing how/where to look up specifics), and the ability to think through a long, multi-step problem matters so much more. Being able to do each step quickly and accurately without having to look anything up certainly would help, though. And maybe one would argue that excellent recall without understanding basic concepts would result in bad test scores anyway, so maybe the test does test depth of understanding more than I'm giving credit.

One might think items 5 and 6 don't matter, and maybe they don't matter much. But the test is expensive, even for a college student in the US. The expense is a major deterrent to taking it more than once, for sure, as are scheduling issues. Since the test is covering what's being covered in classes, people try to take it as late as possible in hopes that they will have actually covered the relevant material in class by then. The expense and the scheduling issues mean many people have one shot, so I'm sure a number of people take it even if they're sick as a dog or their Grandpa just died or they have a concussion or whatever. I'm sure this number is statistically insignificant, but I think grad committees would do well to remember this possibility and take a score with a grain of salt if it looks like an outlier compared to other application components.

Finally, what could artificially skew items 1-4 for a person or a segment of the population? Item 1 is the easiest for me to rip on. A good portion of the material covered on the exam is not covered until the senior year at many universities in the US. The exam must be taken near the beginning of the senior year for the applicant that wants to go straight from undergrad to grad, which is the vast majority of students. Meaning students in the US are getting killed on one of the most important requirements for getting a good score on the exam; we've never even encountered a lot of the material we're being tested on. (Instead we've been taking the required English and history and psychology credits and so forth, but I digress...) But these same students would likely score much higher by the time they actually enter grad school, which is when grad schools actually want them to have covered the material. Luckily, many graduate committees recognize this issue and evaluate domestic and international students separately for GRE scoring.

Another example of a subset of the population that gets killed by a skew in item 1: students from institutes with a small physics department. Why? I'm sure there are multiple reasons a small department would have some disadvantages, but in several cases I know, a big disadvantage comes from the frequency of basic class offerings. In a good, but small department, students may take all the relevant courses, but they may have to take the courses very much out of the usual order. For example, the student may have to take their first real Mechanics and E&M (beyond intro level) as a senior. Meaning the student would take the exam without having any mechanics or E&M other than intro level. Mechanics and E&M together make up 38% of the exam. That student is screwed.

Ok, I'm really almost done here. I have a couple other skews to discuss.

One skew people talk about a lot is the test taking traditions in various countries. In the US, most students take tests in their physics classes that take 1-3 hours to solve a few (2-6?) problems. The physics GRE is *100* problems in less than 3 hours. This shift in test taking style is enormous, and I can't imagine that it wouldn't artificially affect scores. I've been told that in Asia and India, the test-taking culture is more similar to the GRE, and that continuity helps when taking the GRE.
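
To put rough numbers on that shift, here is a minimal Python sketch of the time available per problem. It uses the 170 minutes and 100 questions quoted earlier for the GRE, and the 1-3 hour, 2-6 problem range for class exams. Pairing the shortest exam with the most problems (and the longest with the fewest) is an assumption made only to bracket the range, and `minutes_per_problem` is just an illustrative helper name.

```python
def minutes_per_problem(total_minutes, n_problems):
    """Average time available per problem, in minutes."""
    return total_minutes / n_problems


# Physics GRE: 100 questions in 170 minutes (figures quoted in the post).
gre_pace = minutes_per_problem(170, 100)

# Typical US class exam: 1-3 hours for 2-6 problems.
# Assumed extremes: 1 hour with 6 problems, 3 hours with 2 problems.
class_fastest = minutes_per_problem(60, 6)
class_slowest = minutes_per_problem(180, 2)

print(f"GRE: {gre_pace:.1f} min per problem")
print(f"Class exam: {class_fastest:.0f} to {class_slowest:.0f} min per problem")
```

Even at the fast end of that assumed range, a class exam allows several times more minutes per problem than the GRE does.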

The final skew I'd like to discuss: studying. Normally, I'd say studying is not an artificial skew, and that a person that studies is a person willing to work hard is a person who is likely to succeed. But in the case of the GRE, I feel like the studying correlation gets messed up. Why? From what I gather, studying for the GRE breaks into 3-4 categories: 1) studying 1 week or less, 2) studying a few months in the evenings while working or taking classes, generally on your own, 3) taking test prep classes, 4) taking months off of work/school to study, very systematically, probably with lots of commercial test aids.

For students in category 1, maybe they don't study and score badly because they are overconfident, or lazy, or don't care, or think it doesn't matter much. If they are any of the first 3, it's a good indication they won't perform well in grad school. If it's the 4th reason, possibly having been told this information by professors they trust, then damn, that sucks for them.

For students in category 2 that still score poorly, maybe they don't have the money or resources to take a test prep class, or one is not available in their area. Or they suck at studying or are lacking in a key quality. First few reasons, it's hard to say it would correlate with poor grad school performance. Last 2 reasons, they'd probably suck at grad school.

For category 3 students, I think they learn a lot of good tricks. But they have to be able to afford those classes, and those classes have to be available in their area.

For category 4 students, I've mostly heard that this type of studying is something some international students do because they know/think they have no chance without practically perfect GREs. So I certainly think high scores achieved this way indicate a large amount of drive, which is a huge must in a PhD program. I don't know how much the scores still depend on innate ability at this level of studying, so I don't know how much correlation is left with other abilities. If anyone else knows, pipe up.

So, in conclusion? Excellent scores on the physics GRE's indicate some very desirable qualities for grad school success. The testable qualities, by themselves, don't ensure grad school success, but would certainly help. Some of the testable qualities can be skewed for individuals or segments of the population, so be careful when interpreting results, and know that a poor score definitely does not perfectly correlate with poor performance in grad school.

Physics Subject GRE

The physics GRE...what a can of worms. At some point I found a number of references supporting the view that the physics GRE is, at best, weakly positively correlated with graduate school success, but I'm not feeling motivated to find them again. Anecdotally, I know a number of people who scored well (>800, 73rd percentile) on the physics GRE and successfully completed a PhD. I also know a number of people who scored extremely dismally on the physics GRE (<650, below the 40th percentile) and also successfully completed a PhD.

Were there significant correlations in the quality of the PhD? Let's say the number of people whose scores I know is 8. 2 very high, 1 medium, 5 dismal. (Note: These people are PhD students at top 25 universities. People with very low physics GRE scores are still accepted to some very top tier programs.) Of the 3 people who completed their PhD's fastest (in our class of 35), 1 was a high scorer, 2 were dismal scorers. One of the low scorers had very high impact publications and was highly encouraged to pursue the academic route. The other low scorer is also pursuing an academic route, currently in the post doc phase. The high scorer attempted an academic post doc and quit within 6 months to go into industry. The other high scorer, the medium scorer and 2 of the other low scorers completed their PhD's. The medium scorer went into industry. The high and 2 low scorers went the academic route. I'll break it down into more of a table form.

High scorer 1: Fast, medium impact PhD. Academic postdoc, quit for industry in 6 mo.
High scorer 2: Longish PhD. Academic postdoc
Medium scorer: Longish PhD. Industry
Low scorer 1: Fast, high impact PhD. Encouraged to go to academics, went to industry
Low scorer 2: Fast PhD. Academic post doc
Low scorer 3: Medium length PhD. Academic post doc
Low scorer 4: Longish, medium impact PhD. Academic post doc at Harvard
Low scorer 5: Still finishing PhD

(Note, I don't have information about the "impact" of most of these PhDs.)

Conclusions from that data? Umm, too low N. But certainly not a perfect correlation between high scores and high impact, quick PhD's followed by high impact academic careers. And the most encouraging thing from the data is to see that excellent institutions recognize the lack of correlation enough to still accept people with really terrible scores.

I've also read some statistics about the physics GRE, but again, I apologize for not looking up the references again. Women score an average of *100 points* lower than men. And international students make up an extremely disproportionate section of the top 50% (disproportionate to numbers taking the exam). Put another way, US students make up a disproportionate section of the bottom 50%.

So, what does the physics GRE indicate? Stay tuned...

GRE's

Lots of people seem to be weighing in on GRE's lately, including two of my favorite bloggers, FSP and Janus Professor. I'll add my perspective to those of the bloggers and commenters out there.

The qualifications that make my opinion worthwhile: I took the general and physics subject GRE's in 2001; I'm a PhD candidate with a reasonable amount of information about what it takes to complete a PhD; I served on a committee to admit PhD students to our university a few years ago.

Links where I will pull some information about scores and percentiles:
http://www.ets.org/Media/Tests/GRE/pdf/994994.pdf
http://www.ets.org/Media/Tests/GRE/pdf/gre_0910_guide.pdf


Quantitative Section
The quantitative section is all about doing 9th grade level math relatively quickly. Many (most?) of my physics PhD classmates received a perfect score of 800 (94th percentile). I did not, because I ran out of time before the last ~2 problems. My <800 score really does indicate that I'm a little slower at basic math than most of my physics peers. Not that I don't know it just as well. In many cases, I find I know it better. But I find it really, really hard to force myself to do it faster. I'm addicted to the double-checking.

Anyway, in most circles (including on the graduate admission committee at my university) an almost but not quite perfect score doesn't raise any red flags, but I have occasionally heard people express that no one in a quantitative field should be allowed to continue without that perfect score. Those people are wrong. I don't really understand why the quantitative section is so focused on doing such low level math so quickly. For testing quantitative reasoning, it seems more useful to throw some more complex problems into the mix (not necessarily even requiring higher level math, just higher level reasoning). If anyone knows of a justification for sticking with the quick, low level math problems, I'd love to hear it.

Verbal Section
The verbal section tests verbal skills via analogies, antonyms, synonyms, and reading comprehension. The hardest thing about this section (from my experience and according to everyone I've talked with) is the vocabulary. The vocab is really high level and esoteric, and it included a lot of words I had never encountered. "Esoteric" is a GRE word, by the way. I studied vocab for this section for an evening, and it helped me a great deal on the exam. As a graduate student, I've since encountered a few of the vocab words from the GRE. I suppose that it is helpful to already know the meaning of these words when they are encountered, and it saves a little time to not have to look them up. I don't think it's wise to use most of these words in writing, because the vast majority of people will not know what the words mean, so the words will detract from the clarity of your writing. But I guess one could make that argument for all words above the high school level? Anyway, personally, I feel that using such high level vocabulary takes away from the point of the test. The point of the test is to give graduate admissions committees an idea about your verbal reasoning skills. Yes, vocabulary is a part of that skill, but vocabulary is the part that can be most enormously improved by a short-term cramming session. Do they actually want to test our verbal reasoning skills? Or some combination of verbal reasoning, plus a test of whether we were willing to cram vocab in our brains for a short time beforehand?

As for our physics graduate admissions committee, verbal weighed in at about 1/2 the weight of the quantitative score.

Analytical Section
I know the relevant section these days is the analytical writing section. From talking with people and from my days on the graduate selection committee, this section requires writing a couple essays. The analytical writing scores are...bizarre. I saw many applications with very good (even excellent) writing in the personal statements and extraordinarily dismal (<4) analytical writing scores. People who I know to be good writers (because I've read several things they've written) have told me they received low scores.

On the graduate admission committee, we ignored the analytical writing scores completely.

As an aside, I have to say I looooooooved the old analytical section, which had logic problems. I would do that shit for fun. And I scored in the highest possible percentile on it, just to toot the old horn a little. Did it indicate anything? Again, I think most of my fellow physics PhD peers did really well on it, but I feel too biased to really make a strong statement, and it doesn't matter anyway since it no longer exists.

I'll have to make a separate post about the physics subject GRE...

Apologies

A comment, in response to a post by FrauTech at Design.Build.Play.

It's lessons like the Gibbs rule as expressed by William the Coroner ("Never apologize, it is a sign of weakness") that make me fear that my job is making me a worse person. I'm not saying William is wrong. I'm actually saying there's a huge ring of truth to that statement. But I don't want to be the kind of person that thinks like that.
I've spent a fair amount of time getting discouraged and being afraid that the only way to get ahead is by becoming an overly aggressive, ethically questionable person. Luckily, I found a really great role model before I abandoned all hope. This role model is a very successful, full professor who is kind, honest, and ethical. If he doesn't know something, he doesn't make something up and say it with authority. He says he doesn't know. But he says it with confidence and authority, not apologetically or with sheepishness.

What I learned from watching this role model is not that I shouldn't apologize, but that I shouldn't feel bad and blame myself for so much stuff. And if I'm not to blame, then I don't need to apologize.

Thursday, January 21, 2010

I think I'm stuck in an infinite loop

It looks like this:

make it better <--------|
|------->------->----->---|

Flow

In my last entry, I talked about a book called Drive that's all about motivation. One of the interesting concepts in the book is "flow," when a person is focused, undistracted, performing a task, and enjoying it. The book encourages people to find what maximizes their individual flow. So, here's some musings on it for me.

Times when I'm feeling good flow
-Reading, watching presentations - taking in and processing information when the information is relevant and presented in bite-sized chunks
-Writing - outlining, filling in details, using it to organize my thoughts
-Discussing ideas and arguing it out in a safe, friendly forum
-Brainstorming ideas
-Searching for information on a well-defined topic for a relevant purpose
-Designing a project, experiment, apparatus
-Performing a short set of routine tasks with a clear benefit/purpose
e.g. the 2-3 hour protocol for preparing my experimental samples
-Troubleshooting, problem-solving with purpose, freedom, for a limited time, and with good feedback on what is or isn't working
-Developing protocols - with purpose, for credit (even if it's exchanged credit with a labmate)
-Teaching - teaching someone interested and invested who does their part
-Taking data - when the data rolling in is interpretable
-Analyzing data - cranking out statistically relevant qualitative or quantitative conclusions
-Interpreting data - drawing conclusions from the data and analysis and incorporating the literature
-Running, walking, hiking, physical activity, especially outdoors
-Performing any task with a clear and definite purpose that allows me to use my mind, body, and talents to perform and complete said task, but without an unreasonable number of barriers

Flow killers
-Reading, watching presentations, when the information is irrelevant, overwhelmingly new, or overwhelmingly difficult to understand and evaluate
-Writing - when I lose sight of the goal
-Feeling ill, hungry, in pain, very tired
-Discussing anything in an aggressive, judging, antagonistic forum
-Brainstorming ideas and being quickly judged on them
-Searching for information on an ill-defined topic for a purpose not relevant to my own goals, for which I will receive no credit
-Searching for information and finding nothing or too much
-Searching for information and running into roadblocks like inaccessible journals or books, or possibly semi-relevant information that requires going to the library to access
-Designing a project, experiment, apparatus knowing that it's unlikely that I will be allowed to follow through and will therefore receive no credit for it, or designing it even though I think it's unlikely to work or yield publishable results but have been forced to design it by someone else
-Performing a long set of highly repetitive, tedious tasks, especially without a clear benefit
e.g. preparing samples for someone else who won't give me any credit
-Troubleshooting, problem-solving with no purpose, or under strict instruction, or for a long, long time when it's only a small, tiny part of the bigger purpose and/or with poor feedback on what is or isn't working
-Developing protocols - without purpose, or without credit, without well-defined goals, without good measures of what works or doesn't, involving long, repetitive processes with no idea what works or not and with little chance that a full day of work will yield interpretable results
-Teaching someone who doesn't care, pay attention, or do their part (e.g. repeatedly asks about stuff you already told them or that they should know how to look up)
-Taking data - when data is uninterpretable
-Analyzing data - if it's too repetitive, if it begins to seem unlikely that results will be interpretable in a qualitative or quantitative, statistically relevant sense
-Interpreting data - when I'm afraid it says nothing that is interesting to anyone or that ties into the literature
-Performing any task with an unclear purpose, or that is such a tiny part of a goal that it's not clear that a better/easier/faster way of meeting that goal may exist
-Encountering many time and energy consuming obstacles such that it begins to look like time and energy would be better spent pursuing a different, more accessible goal
-Performing tasks that would be somewhat helpful, but the tradeoff of effort to helpfulness starts to look unbalanced
-Trying to complete projects that would be easy if supported by others but that become hard without that support
-Trying to complete an unsupported project and reaching the point where support is needed but being afraid to ask for it because the non-supporter is not supportive and is scary
-Feeling like any goal is hopeless

Drive

I'm reading a pretty interesting book about motivation right now called Drive by Daniel H. Pink. So far, I've really only skimmed through the highlights, but essentially, the book explains how and why using the classic "stick and carrot" method of penalties and rewards doesn't work well as motivation for most modern endeavors. The stick and the carrot might work pretty well for strictly physical tasks completed while being constantly monitored, but for any work that requires mental investment or work that needs to be completed without being watched the whole time, the stick and carrot method actually becomes very demotivational. This "bad motivation" idea really rings true to me, and Pink even backs it up with citations of scientific studies.

Some of the studies include experiments on children, experiments in workplaces, and experiments in societies. With children, researchers tried giving a group of children rewards for drawing. Pretty quickly, the children lost interest in drawing for fun, and would only half-ass draw for rewards. In workplaces, a daycare tried instituting a fine for parents who pick up their children late. After instituting the fine, parents were actually *more* likely to be late; the reasons were hypothesized to be that the parents no longer felt the social pressure to be on time and now thought of it as a business transaction where they paid more money for their children to be looked-after for longer. In society, Great Britain stopped paying people to donate blood, and actually found more people donated blood. These studies all showed instances where a penalty and reward system actually demotivated the desired behavior.

The book goes on to describe some of the bad effects of the stick and carrot motivational method. It describes how the reward and penalty method encourages people to only perform a task for their reward. They stop performing for their own enjoyment, for their own betterment (other than the reward), for the betterment of society or the organization. People feel like they are one entity and society and the organization and the provider of the reward are all the "other." People don't think about goals as "our goals" but as the reward-giver's goals. People begin to lie, cheat, and steal (or at least cut corners) in order to maximize their reward and minimize their penalties. People no longer enjoy their work, and never perform above and beyond unless there is a clear reward for doing so. Rewards and penalties really don't seem to motivate beyond a minimal point.

Even better than describing what's bad about the penalty and reward motivational method, the book also attempts to describe better motivational methods. I'm still reading this part, but so far the big concept seems to be that people are naturally driven to perform and complete tasks when the tasks have a clear purpose that a person cares about. People perform best when given more control to optimize the 4 T's: Time, Task, Technique, and Team. Another big concept in the good motivation section is "flow." Flow appears to be the state of being when people are focused on a task, undistracted, and therefore maximizing their performance. Oh, and also when in flow, a person is *enjoying* their focused task-performance. Flow is something that is maximized by optimizing the 4 T's above. For the best motivation, a person or an organization needs to capitalize on the natural drive of the individual.

So, in summary, stick and carrot--mostly bad. Capitalizing on the individual's drive to complete purposeful tasks--good. Finding flow: focused, undistracted, and enjoyable optimal performance--essential.

This book should be required reading for PI's. Or, at least for my PI.

Wednesday, January 13, 2010

Data organization and record keeping, Concisely

I've now written several posts on data organization and record keeping, and I realized I still haven't managed to say what I wanted to say. What I really wanted to say is:

Wow, data organization and record keeping is really important for scientists. You'd think with something so important, we would all do it really well, but actually, I think most of us, including me, could do it a lot better. I think part of the reason we don't do it well is a lack of training that applies to the large scale, unstructured level of post-undergraduate projects.

Perhaps the lack of training actually reflects a lack of a consensus on what is good data organization and record keeping. Should we write daily records in our lab notebooks? Or is that overkill? Should data be stored by date or by project? Should we keep all notes, all protocols and results on the path to the final result, all stages of analysis? Or should we trash that stuff as we reach more advanced levels of the project? Should we print copies of the shittiest gels on earth? Or is it OK to leave that out if we think we know why it went wrong? Should we print out programming code that is hundreds of lines long? Is data organization and record keeping too field or project specific to reach a consensus on how to do it well?

So, that's a summary of what I wanted to say, and some of the previous posts expound on the stuff in the 2nd paragraph. There still may be more to come on this topic, but that's it for now, anyway.

Data organization and record keeping, Part 3

In the previous post, I stated that one of the problems with data organization and record keeping training is that the training typically occurs on the scale of undergrad projects. Then we encounter a major discontinuity in the scale of projects between undergrad and grad work, but typically don't have extra training at the graduate level.

So what do I mean by the scale of projects? Well, in undergrad lab classes, I generally had a lab (= set of experiments) to complete once per week for the duration of the semester. Each lab was designed to take a few hours. Most of the labs were minimally related to each other, if related at all. And of course, like most pre-graduate work, labs were usually handed to us as a package deal: here's a lab related to this discrete set of information, here's the equipment and materials, here's the procedure, and here's some results and conclusions you should be able to draw. Writing that up was pretty easy-peasy.

Of course, I did encounter projects of larger scale. Notably, science fair projects from elementary, middle, and high school were a more relevant scale than my college-level labs. And I did undergraduate research. It was undergrad research where I learned that even good scientists sometimes discover parameters that are affecting their results after-the-fact. But still, none of these experiences really prepared me for the scale of my graduate level project.

Now let's think about the scale of graduate level work. Graduate level work is almost the opposite of the pre-packaged labs we're given in undergrad. We're given an unsolved problem to solve. It could take anywhere from a few hours to a few years. It may require knowledge from 1 discipline or 10. We likely design, build, and troubleshoot the methods and/or equipment. In fact, our problem may not even be a solvable problem with the current levels of knowledge and technology, and we may have to or want to change the direction of the work as we go. Now writing that up? Not so easy-peasy.

Just think about the numbers for graduate-level work. If a Ph.D. student might take an average of 6 years to graduate, and does relevant work 50 weeks per year, 5 days per week, that's 6 years * 250 days/year = 1500 working days. A daily record is easily more than 1 page per day, so that's more than 1500 pages of daily records. My lab notebooks are about 150 pages each, so I should have more than 1500 pages of records, more than 10 lab notebooks.
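
The same back-of-the-envelope estimate as a small Python sketch, so the inputs are easy to change. The defaults are the figures from the paragraph above; the function name and the one-page-per-day lower bound as a parameter are illustrative choices, not a claim about how anyone actually keeps records.

```python
import math


def notebook_estimate(years=6, weeks_per_year=50, days_per_week=5,
                      pages_per_day=1, pages_per_notebook=150):
    """Return (working_days, total_pages, notebooks) for a daily-record habit.

    pages_per_day=1 is a lower bound; the post argues a daily record is
    easily more than one page.
    """
    working_days = years * weeks_per_year * days_per_week
    total_pages = working_days * pages_per_day
    notebooks = math.ceil(total_pages / pages_per_notebook)
    return working_days, total_pages, notebooks


days, pages, notebooks = notebook_estimate()
print(days, pages, notebooks)
```

With the default inputs this gives 1500 working days, at least 1500 pages, and at least 10 notebooks. Doubling `pages_per_day` to 2 doubles the notebook count to 20.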

So fine, for a graduate career, with a daily record we end up with more than 1500 pages of notes. That's not really a big deal. It's an indexing challenge for sure, and one that should be discussed. But beyond the indexing challenge, what I've seen is that most people don't keep daily records and don't end up with nearly this amount of notes. Why not?

And when it comes to checking up on trainees' data organization and record-keeping skills, generally by the time a PI would notice a problem, a lot of time would already be wasted. Most PI's at best have weekly discussions with trainees, so if a trainee seems on top of things week to week, everything seems fine. But what happens when it's time to write up a paper or make a talk that includes some older results? Or what happens when someone is supposed to return to some old results? This is the time when poor organization and record-keeping come to light. But then it's too late. The damage is done and the time will have to be taken to sort through poorly organized notes, or redo large portions of poorly documented work.

As a sort-of aside, I have heard stories about PIs checking up on people's lab notebooks, usually in response to the PI's doubts that the trainee has been doing anything for a long period of time. This method of checking hardly seems fair, because typically the PI hasn't told the trainees that they should keep up with their daily progress in their lab notebook. While it may seem obvious to keep a daily record, many trainees tend to "save" their lab notebook for more final experiments, notes and results. The day-to-day drivel of notes and troubleshooting and uninterpretable results is often not written up at all, or, as biochem belle mentions, jotted down on paper towels and gloves and post-it notes, which often get trashed somewhere along the way.

So, some problems with data organization and record keeping are established. Training is inadequate to really cover the change in scale between undergrad and grad level projects, and no one really checks up on how trainees are doing until it's too late and mucho amounts of time are wasted.

Data organization and record keeping, Part 2

OK, so the previous post firmly established the importance of good data organization and record keeping. But why discuss it further? If everyone did it well, and/or if it were obvious how to do it well, then this would be the end of the discussion. But it's not obvious how to do it well, and not everyone does it well. And more personally, I think I could learn to do it better.

Since data organization and record keeping are clearly such an integral part of doing good science, one would expect an emphasis on training scientists to do it well, and ensuring that trainees are doing it well. In my experience, most scientists do get some training on the subject, though usually not at the post-graduate level. And as for anyone checking up on my record keeping, that has never explicitly occurred, though perhaps it would if I obviously struggled with finding older results when they come up in discussions.

As for typical training on data organization and record keeping, I think most of it occurs before the graduate level. My first encounter with scientific record-keeping training came in elementary school, the first time I did a science fair project. That's when I learned about the "scientific process," which is essentially the stuff that's supposed to go in a lab notebook entry: the goal or hypothesis, the methods, the results acquired, and the conclusions and future directions. From there, it's been more about practicing how to do that efficiently, accurately, and appropriately for the experiment at hand.

Here's where I think the problem occurs: there is a major discontinuity between the scale of the pre-graduate projects where data organization and record keeping training happens and the scale of graduate level projects. It's like jumping from a big-wheel to a fancy, multi-geared road bike, with no tricycle and no training-wheels sessions in-between. Sure, you can do it, but there are bound to be a lot more scraped knees in the process. (See below for illustrative images.)

Anyone can ride on a Big Wheel


On a big-kid bike, undesired results may occur with inadequate training



What does it mean to have a discontinuity in scale between training and graduate level work, and why is it a problem? Stay tuned for more posts.

Data organization and record keeping

I've been thinking about writing a post about data organization for a while now, and I'm finally getting around to it, in part due to an excellent post on the topic by biochem belle at "There and (hopefully) back again..."

Biochem belle does a fantastic job listing what essential information should go into your lab notebook, including references to raw data files. She discusses backing up data files, and she also muses on the PI's role in ensuring good record keeping and data organization. It's an informative post that provokes some thought on the topic.

Other than Biochem belle's post, I've been thinking about data organization lately for two reasons. One obvious reason is that, as working scientists, we deal with it on a daily basis. Every day, we have to decide what to write down or type up, what to save, print, or delete, and how to index all of it to be easily found and accessed later. And every day, we likely deal with trying to find and/or reference something we have previously done, or something someone else has done. Having a working system for data organization and record keeping is essential for day-to-day scientific work.

The second reason I've been thinking about data organization is that I recently got ahold of a copy of a form my PI uses to evaluate post docs. The form is from a national research fellowship award application. It has a list of 11 proficiencies and requests that the PI rank the post doc on each proficiency. Data organization is 1 of these 11 proficiencies. Apparently, PIs and fellowship award committees recognize the importance of data organization.

So data organization and record keeping are obviously important: important for doing the day-to-day scientific work, and recognized as important by PIs and fellowship award committees. Stay tuned for posts to follow that discuss some problems with data organization and record keeping and some potential solutions.