I have recently returned from a very engaging visit with my collaborator in San Diego. Through my PhD we’ve had a good run at investigating EAE (a mouse proxy for multiple sclerosis, which he investigates in the lab) through modelling and simulation. Alas, my PhD is nearing its three-year deadline, and so we are looking into alternative funding opportunities to continue the work. We talked a lot about calibration of the simulation, and as such I’m currently performing a literature survey of modelling- and simulation-based biological research. Simulation offers a great deal of flexibility to the researcher: in computer code it is very easy to turn on and off molecule expressions that might be quite difficult to engineer into, say, a mouse. But this raises an interesting question: with all this power to represent whatever might take one’s fancy, how can you be sure that your simulation is in any way representative of the real world? How do you demonstrate that it is valid? That’s a tough question to answer (…with scientific rigour and evidence – “I’m pretty sure, I built it after all” is not going to stand up to scientific inquisition!).
In my review, I came across an article published in Science [1]. Looking at how models and simulations can aid in predicting the progression of influenza epidemics, and in preparing for such an event, the author recognizes this question of model validity. A ‘validation by prediction’ approach is advocated: generate predictions with your model, and then test these in the real domain. If the prediction matches what’s observed in the lab, then you can have (I feel, a little) more confidence that your model is faithful.
I’m not entirely sold on this approach. For one thing, prediction is the end purpose of many simulations, so it should not simultaneously be the research output and the validation of the model that generated it. “Here’s my prediction, but we need you to run some tests before we can be sure that the prediction was correct”. Well, since you’re running the experiment anyway, why bother with the simulation? A slightly aggressive view, I know. I agree that continually validating a model against the real world, as more data becomes available, is sensible, but I feel that this should not be the pinnacle of validation efforts. I would have a hard time presenting some predictions to my collaborator and saying “You need to kill some mice now, so that we can be sure that this simulation is correct”. He might respond “Well, mice aren’t cheap and ethically we don’t like to kill more than absolutely necessary. Why didn’t you calibrate the simulation as you built it, against the data we already have? Couldn’t you have been particularly rigorous in developing it, such that we can be more confident that these results are correct now?”
For me, that’s the key. Calibrate as you go; there’s a lot of data out there already. If your simulation can replicate the data that arises from different circumstances in the real world, then you can be reasonably confident that it’s correct. If you buy that, then you might expect papers reporting modelling-based research in biology to contain details of how their models were calibrated. You might expect them to publish the details of a sensitivity analysis, showing that the authors understand their models’ fragility in the face of potentially uncertain parameter values, before presenting their results. As I am now finding out, there are extremely few papers that even mention the words “calibration” or “sensitivity analysis” before proceeding to present results assumed to be applicable to the real world. Perhaps they are, but as a scientist I’d like to see some evidence for it, rather than taking people’s word. Science is driven, after all, by (healthy!) skepticism, not blind faith.
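To make “calibration” and “sensitivity analysis” a little more concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption made for illustration: the observations are made-up numbers (not data from my collaborator’s lab), `simulate` is a toy logistic curve standing in for a real simulation, and the parameter names and ranges are hypothetical. Calibration here is a brute-force grid search for the parameters that best reproduce the existing data, and the sensitivity analysis is the simplest one-at-a-time kind: nudge each parameter by 10% and see how far the output moves.

```python
import math

# Hypothetical observations: (day, mean disease score).
# Made-up numbers for illustration only, not real experimental data.
OBSERVED = [(0, 0.0), (5, 0.4), (10, 1.9), (15, 3.4), (20, 3.9)]

PARAM_NAMES = ("peak", "rate", "midpoint")


def simulate(day, peak, rate, midpoint):
    """Toy stand-in for a simulation: a logistic rise towards a peak score."""
    return peak / (1.0 + math.exp(-rate * (day - midpoint)))


def error(params):
    """Sum of squared differences between simulated and observed scores."""
    return sum((simulate(day, *params) - score) ** 2 for day, score in OBSERVED)


def calibrate():
    """Grid search over a small, hypothetical parameter range."""
    grid = [
        (peak / 10, rate / 100, float(midpoint))
        for peak in range(30, 51, 2)       # peak score 3.0 to 5.0
        for rate in range(10, 101, 5)      # rate 0.10 to 1.00 per day
        for midpoint in range(5, 16)       # midpoint on day 5 to 15
    ]
    return min(grid, key=error)


def sensitivity(params, day=15, delta=0.10):
    """One-at-a-time analysis: relative output change per +10% parameter change."""
    baseline = simulate(day, *params)
    changes = {}
    for index, name in enumerate(PARAM_NAMES):
        perturbed = list(params)
        perturbed[index] *= 1 + delta
        changes[name] = (simulate(day, *perturbed) - baseline) / baseline
    return changes


best = calibrate()
print("calibrated parameters:", dict(zip(PARAM_NAMES, best)))
print("residual error:", round(error(best), 4))
for name, change in sensitivity(best).items():
    print(f"{name}: {change:+.1%} change in day-15 score")
```

A real simulation would call for something better than a grid search, and for a sensitivity analysis that varies parameters together rather than one at a time, but even this much, reported alongside the results, would tell a reader which parameters the conclusions hinge on.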
I think the world of simulation-based science has some catching up to do… but have faith, progress is in the pipeline!
[1] Derek J. Smith. Predictability and Preparedness in Influenza Control. Science, 312. Pages 392-394. 2006.

You might find the concept of “Pattern-oriented modelling”, introduced by Grimm et al. in 1996, interesting and relevant.
In essence, it proposes that you start with a loose model that outputs results according to a known, community-accepted pattern. Then, in incremental steps, you add complexity to it while making sure the initial pattern is still expressed.