Basic Task Creation
Initializing the repository
Make a new directory and initialize the workflow repository there:
> mkdir htw-workflow-bs
> cd htw-workflow-bs
> htw-util init
Created repository at /home/askhl/htw-workflow-bs
This creates a directory called tree and a file called tasks.py. To get more information about the structure of the repository, use htw-util info:
askhl@computer:~/htw-workflow-bs$ htw-util info
Root: /home/user/htw-workflow-bs
Tree: /home/user/htw-workflow-bs/tree
Registry: /home/user/htw-workflow-bs/registry.dat with 0 entries
Tasks: /home/user/htw-workflow-bs/tasks.py
We will create a workflow that relaxes the unit cell and geometry of a system to obtain a relaxed ground state, and subsequently performs a band structure calculation on it.
Create a workflow file called workflow.py:
from ase.build import bulk
def workflow(rn):
    atoms = bulk('Si')
    calculator = {'kpts': {'density': 1.0}, 'mode': 'pw', 'txt': 'gpaw.txt'}
    rn.task('relax', atoms=atoms, calculator=calculator)
This defines a task called relax, with atoms and calculator given as parameters. The name relax will be looked up among the tasks defined in the file tasks.py. However, the string defining the name of the task also accepts import paths, for example:
rn.task('asr.c2db.relax', atoms=atoms, calculator=calculator)
Without a module path, tasks are searched for in the tasks.py file reported by the htw-util info command. This allows users to define their own tasks without them being part of the asr source code. [We envision that this will be possible via a project-wide lib package in the future.]
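The lookup rule described above can be sketched as follows. This is only an illustration of the idea, not the code htw-util uses: the function name resolve_task is hypothetical, and an in-memory module stands in for tasks.py so that the example is self-contained.

```python
import importlib
import types


def resolve_task(name, default_module):
    """Return the callable named by ``name`` (hypothetical helper).

    A dotted name is treated as an import path; a bare name is looked
    up in ``default_module``, which stands in for tasks.py.
    """
    if '.' in name:
        module_name, _, attribute = name.rpartition('.')
        module = importlib.import_module(module_name)
        return getattr(module, attribute)
    return getattr(default_module, name)


# Stand-in for tasks.py, built in memory for the sake of the example.
tasks = types.ModuleType('tasks')
tasks.relax = lambda atoms, calculator: atoms

print(resolve_task('relax', tasks) is tasks.relax)   # bare name
print(resolve_task('json.dumps', tasks).__name__)    # import path
```

A bare name such as 'relax' never triggers an import, so user-defined tasks cannot accidentally shadow an installed package in this sketch.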
The next step is to write an actual task. Edit the file tasks.py to have the following content, which relaxes the cell and geometry in a GPAW calculation:
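from gpaw import GPAW
from ase.optimize import BFGS
from ase.constraints import ExpCellFilter
def relax(atoms, calculator):
    # Build the calculator from the keyword dictionary given in the workflow.
    calc = GPAW(**calculator)
    atoms.calc = calc
    # ExpCellFilter lets the optimizer relax positions and unit cell together.
    with BFGS(ExpCellFilter(atoms), logfile='opt.log',
              trajectory='opt.traj') as opt:
        opt.run(fmax=0.01)
    # Return a copy, which does not carry the calculator.
    return atoms.copy()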
Although this shows a typical relaxation recipe written with GPAW, it can in principle be written with any code.
Note that a copy of the atoms must be returned, since atoms with a calculator attached cannot be returned (the calculator cannot be serialized).
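The reason for returning a copy can be illustrated with stand-in classes. The sketch below does not use ASE or GPAW: FakeAtoms and FakeCalculator are hypothetical, and JSON is used only as an example of a serializer, which is an assumption about the storage format.

```python
import json


class FakeCalculator:
    """Hypothetical stand-in for a calculator object."""


class FakeAtoms:
    """Hypothetical stand-in for an atoms object."""

    def __init__(self, symbols):
        self.symbols = list(symbols)
        self.calc = None

    def copy(self):
        # The copy carries the structural data but no calculator.
        return FakeAtoms(self.symbols)


def can_serialize(obj):
    """Return True if the attributes of ``obj`` can be written as JSON."""
    try:
        json.dumps(vars(obj))
    except TypeError:
        return False
    return True


atoms = FakeAtoms(['Si', 'Si'])
atoms.calc = FakeCalculator()

print(can_serialize(atoms))         # False: the calculator is in the way
print(can_serialize(atoms.copy()))  # True: the copy has no calculator
```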
To get information about the various commands, each subcommand of htw-util has a --help option. To get a list of commands, write htw-util --help. For example, to get information about the ls command, write:
user@computer:~/htw-workflow-bs$ htw-util ls --help
Usage: htw-util ls [OPTIONS] [TREE]...
  List tasks under directory TREEs.
  Find tasks inside specified TREEs and collect their dependencies whether inside TREE or not. Then perform the specified actions on those tasks and their dependencies.
Options:
  --parents  list also ancestors of selected tasks outside selection; output will be in topological order.
  --help     Show this message and exit.
Now it is time to submit a workflow. In order to do that, we write:
askhl@erlkoenig:~/htw-workflow-bs$ htw-util workflow workflow.py
Add: relax ready 7fabeab7 tree/relax-tf7xlviq
askhl@erlkoenig:~/htw-workflow-bs$ htw-util ls
relax ready 7fabeab7 tree/relax-tf7xlviq
Note that the task is shown as ready, which signifies that it is ready to be run, i.e. it has not yet been submitted.
The commands have created the following file structure. A useful Linux command for viewing the structure of the calculations in the physical filesystem is tree:
askhl@erlkoenig:~/htw-workflow-bs$ tree tree
tree
└── relax-tf7xlviq
    └── input.json

1 directory, 1 file
To run the workflow, i.e. relax the geometry of the system, it is suggested to first do a dry run, which shows what would be submitted. To that end, there is the -z option. Run:
user@computer:~/htw-workflow-bs$ htw-util run tree/relax-tf7xlviq/ -z
would run <Future(relax, 7fabeab7, /home/askhl/htw-workflow-bs/tree/relax-tf7xlviq)>
To actually submit the task, execute the following command:
user@computer:~/htw-workflow-bs$ htw-util run tree/relax-tf7xlviq/
We can now observe that the calculation is running (XXX):
user@computer:~/htw-workflow-bs$ htw-util ls
relax xxx 7fabeab7 tree/relax-tf7xlviq
After the task has finished, we can observe that it is set to the state done:
user@computer:~/htw-workflow-bs$ htw-util ls
relax done 7fabeab7 tree/relax-tf7xlviq
One may now see that the task folder has been extended with a group of files related to the execution of the task. Some of these are internal to the task, in this case the GPAW output file, the optimizer log and the optimization trajectory, and are thus not directly related to htw-util. The files related to htw-util are input.json and out.json, which hold the formal input and return value of the task that was executed:
user@computer:~/htw-workflow-bs$ tree tree/
tree/
└── relax-tf7xlviq
    ├── gpaw.txt
    ├── input.json
    ├── opt.log
    ├── opt.traj
    └── out.json

1 directory, 5 files
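The role of the two JSON files can be pictured with a small sketch: the keyword arguments are read from input.json, the task function is called, and its return value is written to out.json. The helper run_in_directory and the plain JSON layout are assumptions made for illustration; the actual file contents written by htw-util may differ.

```python
import json
import tempfile
from pathlib import Path


def run_in_directory(func, directory):
    """Call ``func`` with the keyword arguments stored in input.json
    and store its return value in out.json (hypothetical layout)."""
    directory = Path(directory)
    kwargs = json.loads((directory / 'input.json').read_text())
    result = func(**kwargs)
    (directory / 'out.json').write_text(json.dumps(result))
    return result


def add(a, b):
    """Toy task standing in for a real calculation."""
    return a + b


with tempfile.TemporaryDirectory() as tmp:
    taskdir = Path(tmp) / 'add-example'
    taskdir.mkdir()
    (taskdir / 'input.json').write_text(json.dumps({'a': 1, 'b': 2}))
    run_in_directory(add, taskdir)
    files = sorted(path.name for path in taskdir.iterdir())
    stored = json.loads((taskdir / 'out.json').read_text())

print(files)   # ['input.json', 'out.json']
print(stored)  # 3
```

Any other files a task writes, such as logs or trajectories, simply end up next to these two files in the same directory.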
The next step is to do a ground state calculation based on the relaxed geometry. To do this, assign the object returned by rn.task('relax', ...) to a variable named relax and add the following line to workflow.py:
    gs = rn.task('gs', atoms=relax.output, calculator=calculator)
This tells the runner to define a task gs that receives the output of the relax task under the name atoms.
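We also need to define the corresponding task in tasks.py:
from pathlib import Path  # place this with the other imports in tasks.py
def gs(atoms, calculator):
    calc = GPAW(**calculator)
    atoms.calc = calc
    # Run the ground state calculation.
    atoms.get_potential_energy()
    # Save the calculator state and return the path to the file.
    gpw = Path('gs.gpw')
    calc.write(gpw)
    return gpw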
The task returns a Path object, which refers to a file inside the directory where the task runs. When other tasks run, they see such paths relative to the directory of the task that returned them. This makes it possible to pass paths from one task to another.
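One way to picture this path handling is to join the returned relative path onto the directory of the task that produced it. The helper resolve_task_path below is hypothetical and only sketches the idea; how htw-util performs the translation internally is not shown here.

```python
from pathlib import PurePosixPath


def resolve_task_path(task_directory, returned_path):
    """Interpret a path returned by a task relative to that task's
    directory (hypothetical helper)."""
    return PurePosixPath(task_directory) / returned_path


# The gs task returns Path('gs.gpw'); a later task would then see it as:
gpw = resolve_task_path('tree/gs-gebijtk9', PurePosixPath('gs.gpw'))
print(gpw)  # tree/gs-gebijtk9/gs.gpw
```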
If we run the workflow again, we will get the following tree:
askhl@erlkoenig:~/htw-workflow-bs$ tree tree
tree
├── gs-gebijtk9
│   └── input.json
└── relax-tf7xlviq
    ├── gpaw.txt
    ├── input.json
    ├── opt.log
    ├── opt.traj
    └── out.json

2 directories, 6 files
Finally, let us add a band structure task to tasks.py and run it:
def bs(gpw):
    gscalc = GPAW(gpw)
    atoms = gscalc.get_atoms()
    path = atoms.cell.bandpath(density=5)
    calc = gscalc.fixed_density(kpts=path.kpts, symmetry='off',
                                txt='gpaw.txt')
    atoms.calc = calc
    bs = calc.band_structure()
    bs.write('bs.json')
    return bs
This requires the following line in the workflow:
    rn.task('bs', gpw=gs.output)
where gs is the object returned by rn.task('gs', …).
Now execute the workflow and run the tasks. The band structure can be found in bs.json afterwards.
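To see how the three tasks depend on each other, the following sketch records the rn.task calls with a minimal stand-in runner and sorts the tasks so that each one comes after the tasks it depends on. RecordingRunner, TaskHandle and OutputRef are hypothetical classes written for this example and are not part of htw-util; the string 'Si' stands in for the atoms object so that the example needs only the standard library.

```python
from graphlib import TopologicalSorter


class OutputRef:
    """Placeholder for the not-yet-computed return value of a task."""

    def __init__(self, task_name):
        self.task_name = task_name


class TaskHandle:
    """Object returned by RecordingRunner.task (hypothetical)."""

    def __init__(self, name, kwargs):
        self.name = name
        self.kwargs = kwargs

    @property
    def output(self):
        return OutputRef(self.name)


class RecordingRunner:
    """Stand-in runner that only records which tasks were defined."""

    def __init__(self):
        self.tasks = {}

    def task(self, name, **kwargs):
        handle = TaskHandle(name, kwargs)
        self.tasks[name] = handle
        return handle

    def dependencies(self):
        # Map each task name to the names of the tasks whose output it uses.
        return {
            name: {value.task_name for value in handle.kwargs.values()
                   if isinstance(value, OutputRef)}
            for name, handle in self.tasks.items()
        }


def workflow(rn):
    calculator = {'kpts': {'density': 1.0}, 'mode': 'pw', 'txt': 'gpaw.txt'}
    relax = rn.task('relax', atoms='Si', calculator=calculator)
    gs = rn.task('gs', atoms=relax.output, calculator=calculator)
    rn.task('bs', gpw=gs.output)


rn = RecordingRunner()
workflow(rn)
order = list(TopologicalSorter(rn.dependencies()).static_order())
print(order)  # ['relax', 'gs', 'bs']
```

The sorted order is the order in which the tasks can be run: relax first, then gs, then bs.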