Practical Tutorial: Building a Database of Solids

In this tutorial, you are a developer of ASR and wish to write production-scale recipes that run over a database of materials.

Creating the input structures

Create a file called makedb.py with the following contents:

from ase.db import connect
from ase.data import chemical_symbols
from ase.build import bulk


def materials():
    # Atomic numbers run from 1 (H) to 118 (Og); index 0 is a dummy symbol.
    for z in range(1, 119):
        sym = chemical_symbols[z]
        try:
            atoms = bulk(sym)
        except Exception:
            # ASE has no reference bulk structure for this element; skip it.
            continue

        yield sym, atoms


# Write each structure to the database, tagged with its chemical symbol.
with connect('materials.db') as con:
    for sym, atoms in materials():
        con.write(atoms, symbol=sym)

The first step is to create an input database:

python makedb.py
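
To check that the database was written as expected, you can count and inspect its rows. Here is a minimal sketch using the ase.db Python API:

from ase.db import connect

con = connect('materials.db')
print(con.count(), 'structures')
for row in con.select(limit=5):
    print(row.symbol, row.formula)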

Building the repository

The second step is to set up a repository from the input database, i.e., to import these structures into the workflow. The first command initializes a repository; after that, we use an ASR tool called totree:

htw-util init
asr database totree materials.db --run

The totree command organizes the database rows into a directory tree and saves structure files. Each structure becomes a task, and we will next want to apply the previous workflow to each of those tasks. Thus, whereas previously the workflow did not take an input, our workflow will now take a structure task as an input.

Running htw-util ls will show that a large number of tasks named structure now exist. They are not so much tasks as pieces of data.
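
Since each structure task is just a stored atomic structure, you can inspect it directly with ASE. The exact file name and layout that totree produces may differ; the path below is a hypothetical example:

from ase.io import read

# Hypothetical path; check your tree for the file name that
# totree actually produced for this material.
atoms = read('tree/A/Au/structure/structure.json')
print(atoms)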

Setting up tasks

We use the same tasks as in the previous example:

from pathlib import Path
from gpaw import GPAW
from ase.optimize import BFGS
from ase.constraints import ExpCellFilter


def relax(atoms, calculator):
    # Relax atomic positions and unit cell together.
    calc = GPAW(**calculator)
    atoms.calc = calc
    with BFGS(ExpCellFilter(atoms), logfile='opt.log',
              trajectory='opt.traj') as opt:
        opt.run(fmax=0.01)
    return atoms.copy()


def gs(atoms, calculator):
    # Ground-state calculation; save the converged state to a .gpw file.
    calc = GPAW(**calculator)
    atoms.calc = calc
    atoms.get_potential_energy()
    gpw = Path('gs.gpw')
    calc.write(gpw)
    return gpw


def bs(gpw):
    # Band structure, calculated non-self-consistently from the
    # ground-state density along the standard band path of the cell.
    gscalc = GPAW(gpw)
    atoms = gscalc.get_atoms()
    path = atoms.cell.bandpath(density=5)
    calc = gscalc.fixed_density(kpts=path.kpts, symmetry='off',
                                txt='gpaw.txt')
    atoms.calc = calc
    bs = calc.band_structure()
    bs.write('bs.json')
    return bs
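
These functions can also be exercised directly, outside the workflow, which is useful for debugging. A minimal sketch (this runs real DFT calculations and assumes GPAW is installed; the coarse k-point density only serves to keep the test fast):

from ase.build import bulk

calculator = {'kpts': {'density': 1.0}, 'mode': 'pw', 'txt': 'gpaw.txt'}
atoms = relax(bulk('Al'), calculator)
gpw = gs(atoms, calculator)
bandstructure = bs(gpw)
print(bandstructure)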

Creating a workflow

We adapt the previous workflow, except that we make it depend on a parameter with the same name as our “root” node, i.e., structure.

When the workflow runs, the variable structure will be a future corresponding to the output of the structure task. The workflow will look like this:

def workflow(rn, structure):
    atoms = structure
    calculator = {'kpts': {'density': 1.0}, 'mode': 'pw', 'txt': 'gpaw.txt'}

    relax = rn.task('relax', atoms=atoms, calculator=calculator)
    gs = rn.task('gs', atoms=relax.output, calculator=calculator)
    rn.task('bs', gpw=gs.output)

Since the workflow expects an input, we specify one or more folders in the tree when running it:

htw-util workflow workflow.py tree/A/Au

This applies the workflow to all tasks matching the name structure under the specified path or paths.

The generated tasks can now be viewed and submitted as normal. To run the workflow on all materials, pass the whole tree (e.g. tree/) to the workflow command.
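
For example, to generate tasks for every material at once:

htw-util workflow workflow.py tree/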

Submit the workflow

Set up myqueue and submit the tasks with htw-util submit, or use htw-util to run them on the local machine.
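
For example, once myqueue is configured for your compute environment:

htw-util submit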

Something to think about

  • If a workflow script is updated but the workflow is not run again, the tasks won't be updated, and this can lead to errors.

  • If a workflow is rerun many times with different parameters, for example while developing the workflow, the tree directory gains one folder per iteration. For production use this is not an issue, since the workflows are then fixed. During development, however, the tree directory can become cluttered with cached, outdated results.