Loughborough University
Leicestershire, UK
LE11 3TU
+44 (0)1509 222222

IT Services : High Performance Computing

Experimental Control


Overview

For example, you could create a git repository that binds together your data and other simulation details. After running a simulation or set of simulations, you can use git to commit the changes, capturing the changes in data, job scripts and so on. Each commit creates an ID in git, which you can record against the simulation you ran and its result set, and later use to bring back the exact set of programs, scripts, data and results on demand.
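As a minimal sketch of this idea, the function below stages some files, commits them, and appends the resulting commit ID to a plain-text log. The function name commit_run, the RUN_LOG variable and the default log name runs.log are hypothetical; it assumes it is run inside an existing git repository with a configured user name and email.

```shell
# Stage the named paths, commit them, and append the commit ID to a log.
# commit_run, RUN_LOG and runs.log are hypothetical names for illustration.
# usage: commit_run "message" path...
commit_run() {
    msg=$1
    shift
    git add -- "$@" || return 1
    # Fails (and returns 1) if there is nothing new to commit
    git commit --quiet -m "$msg" || return 1
    run_id=$(git rev-parse HEAD) || return 1
    # One line per run: UTC timestamp, commit ID, message
    printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$run_id" "$msg" \
        >> "${RUN_LOG:-runs.log}"
    # Print the ID so the caller can record it elsewhere too
    printf '%s\n' "$run_id"
}
```

A job script might call commit_run "Run with new mesh" job.sh results/ after the simulation finishes. Given a recorded ID, git checkout followed by that ID restores the files as they were at that commit.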

If the source code for your own program is in a different repository, you can still extract the git tag or commit ID of the version you are using and record it in a file.
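One possible way to do this is sketched below. The function name record_source_version and the field names are hypothetical; the git commands themselves (rev-parse and describe) are standard.

```shell
# Write the version of a separate source checkout to a file.
# record_source_version and the field names are hypothetical.
# usage: record_source_version /path/to/source version.txt
record_source_version() {
    src_dir=$1
    out_file=$2
    {
        printf 'SourceDir: %s\n' "$src_dir"
        # Full commit ID of the checked-out source
        printf 'SourceCommit: %s\n' "$(git -C "$src_dir" rev-parse HEAD)"
        # Nearest tag if there is one, otherwise an abbreviated commit ID;
        # "-dirty" is appended when tracked files have uncommitted changes
        printf 'SourceDescribe: %s\n' \
            "$(git -C "$src_dir" describe --tags --always --dirty)"
    } > "$out_file"
}
```

The resulting file can then be committed alongside the data and results, so the source version travels with the result set.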

A job script that brings everything together might have the following steps:

extract source code from source code repository for a certain version
record which version was used
automatically build the software
record what the input and output files are for this run
run the simulation
commit the changes to scripts, data, results with a message related to the job run
      

The above listing will be expanded in due course.
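The steps listed above could be sketched as a single shell function, as below. This is an illustration under stated assumptions rather than a script supplied by the service: SRC_REPO, SRC_VERSION, SRC_DIR, BUILD_CMD, RUN_CMD, INPUT, OUTPUT and job-record.txt are all hypothetical names, the program is assumed to read its input on standard input, and the function is assumed to run inside the git repository that holds your data.

```shell
# Hypothetical settings, to be defined before calling run_job:
#   SRC_REPO     location of the source code repository
#   SRC_VERSION  tag or commit ID to build
#   BUILD_CMD    build command (defaults to "make")
#   RUN_CMD      command that runs the simulation
#   INPUT/OUTPUT input and output file names for this run
run_job() {
    src_dir=${SRC_DIR:-build-src}

    # 1. extract source code for a certain version
    rm -rf "$src_dir"
    git clone --quiet "$SRC_REPO" "$src_dir" || return 1
    git -C "$src_dir" checkout --quiet "$SRC_VERSION" || return 1

    # 2. record which version was used
    src_id=$(git -C "$src_dir" rev-parse HEAD) || return 1

    # 3. automatically build the software
    ( cd "$src_dir" && sh -c "${BUILD_CMD:-make}" ) || return 1

    # 4. record what the input and output files are for this run
    printf 'source=%s input=%s output=%s\n' "$src_id" "$INPUT" "$OUTPUT" \
        >> job-record.txt

    # 5. run the simulation
    sh -c "$RUN_CMD" < "$INPUT" > "$OUTPUT" || return 1

    # 6. commit the record, data and results with a message for this run
    git add -- job-record.txt "$INPUT" "$OUTPUT" || return 1
    git commit --quiet -m "Job run: source $src_id, input $INPUT"
}
```

Only the record, input and output files are added in step 6, so the temporary source checkout is not committed to the data repository.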

A particular advantage of this way of working is that you control which changes are important by deciding what is added to the repository. You can then synchronise critical data from hydra to a location where it will be backed up: running the git pull command on your managed desktop pulls back just those changed elements you have deemed critical. However, using git for synchronisation does require you to think carefully about how you change elements at each end, to avoid pushing or pulling a change that overwrites things you wish you had kept.
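One cautious way to pull on the desktop side is sketched below. The function name safe_pull is hypothetical; it assumes the desktop copy was cloned from the repository on hydra, so that git pull already knows where to fetch from. It refuses to pull while tracked files have uncommitted changes, and accepts only a fast-forward, so that local work is not merged over without you noticing.

```shell
# Pull only when it cannot disturb local work.
# safe_pull is a hypothetical helper name.
safe_pull() {
    # Any output here means tracked files have uncommitted changes
    if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
        echo "Uncommitted local changes; commit or stash them first." >&2
        return 1
    fi
    # --ff-only makes git stop instead of merging if the two ends diverged
    git pull --ff-only --quiet
}
```

If safe_pull fails because the histories have diverged, that is the point at which to inspect both ends by hand before deciding which changes to keep.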

Record Keeping

It is worth ensuring that every time you submit a job you log what you ran and when. You can automate this by having stubs in your job script to record this information. This is useful if you discover a bug in what you were doing, as you can go back to all the jobs affected and rerun them. The record can be as complex as a database or XML, or just a simple plain-text file. In the future, scripts to assist with this may be provided in the context of Research Data Management.

Information worth keeping includes: when the job was run, what program and arguments the script ran, what command line options were passed when submitting the job, what versions of software were used, what data was used and what data was created. Being able to determine this easily also helps when Research Data Management asks you to deposit information related to a paper.
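A stub of this kind might look like the sketch below, which appends one plain-text record per job. The function name write_job_record, the field names, and the JOB_RECORD and SUBMIT_OPTIONS variables are hypothetical; in particular, SUBMIT_OPTIONS is assumed to be set by you, since how submission options are exposed depends on the scheduler.

```shell
# Append one record per job to a plain-text file.
# write_job_record, JOB_RECORD, SUBMIT_OPTIONS and the field names
# are hypothetical.
# usage: write_job_record input_file output_file program [arguments...]
write_job_record() {
    rec_input=$1
    rec_output=$2
    shift 2
    {
        printf 'Date: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
        printf 'Host: %s\n' "$(uname -n)"
        # Program and arguments exactly as passed to this function
        printf 'Command: %s\n' "$*"
        printf 'SubmitOptions: %s\n' "${SUBMIT_OPTIONS:-unknown}"
        printf 'InputFile: %s\n' "$rec_input"
        printf 'OutputFile: %s\n' "$rec_output"
        # Separator between records
        printf '%s\n' '--'
    } >> "${JOB_RECORD:-job-record.txt}"
}
```

Calling write_job_record in.dat out.dat ./sim --steps 10 just before running the program keeps the record and the command in step with each other.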

Since the file /path/to/my/data.dat can be replaced with another file of the same name, it is worth both giving your data descriptive names and recording some sort of hash of the data. A full checksum of the data may take a long time, but recording something like the size of the file takes much less time and may be sufficient.
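Both options can be sketched with standard tools, as below. The function names file_size and file_cksum are hypothetical; wc and cksum are standard POSIX utilities.

```shell
# Cheap fingerprint: size of the file in bytes.
# (tr strips the padding some versions of wc add.)
file_size() {
    wc -c < "$1" | tr -d ' '
}

# More thorough fingerprint: CRC checksum of the whole file.
# This reads every byte, so it is slow for very large files.
file_cksum() {
    cksum < "$1" | awk '{print $1}'
}
```

For example, echo "DataSize: $(file_size /path/to/my/data.dat)" could be added to the job record. Note that the size only detects replacements that change the length of the file.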

You should keep a safe copy of this record.

An example script might be: Basic Serial Script.

Your requirements may differ from the above (e.g. you have many input files, parameters, and so on).

If you are using a program from a package (e.g. Comsol) then you can use the additional, initial fragments:

" ProgramPath: " "$(which comsol)"                                 \
" ProgramHash: " "$(cksum "$(which comsol)" | awk '{print $1}')"   \
      

or alternatively print out a version string from Comsol.