pyiron_contrib.atomistics.atomistics.job.trainingcontainer module

Store structures together with energies and forces for potential fitting applications.

Basic usage:

>>> pr = Project("training")
>>> container = pr.create.job.TrainingContainer("small_structures")

Let’s make a structure and invent some forces, one row of three components per atom:

>>> import numpy
>>> structure = pr.create.structure.ase_bulk("Fe")
>>> forces = numpy.array([[-1, 1, -1]])
>>> container.add_structure(structure, energy=-1.234, forces=forces, identifier="Fe_bcc")

If you have a lot of precomputed structures, you may also add them in bulk from a pandas DataFrame:

>>> import pandas
>>> df = pandas.DataFrame({"name": ["Fe_bcc"], "atoms": [structure], "energy": [-1.234], "forces": [forces]})
>>> container.include_dataset(df)

You can retrieve the full database with to_pandas():

>>> container.to_pandas()
name    atoms   energy  forces  number_of_atoms
Fe_bcc  ...

class pyiron_contrib.atomistics.atomistics.job.trainingcontainer.TrainingContainer(project, job_name)

Bases: GenericJob, HasStructure

Stores ASE structures with energies and forces.

add_structure(structure, energy, forces=None, stress=None, identifier=None, **arrays)

Add a new structure to the structure list and store its energy and forces with it.

For consistency with the rest of pyiron, energy should be given in eV and forces in eV/Å; no unit conversion is performed.

Parameters
  • structure (Atoms) – structure to add

  • energy (float) – energy of the whole structure

  • forces (Nx3 array of float, optional) – per-atom forces, where N is the number of atoms in the structure

  • stress (length-6 array of float, optional) – per-structure stress in Voigt notation

  • identifier (str, optional) – name describing the structure

  • **arrays – additional arrays to store for the structure
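The stress entry is documented only as six values in Voigt notation, without naming the component order. A minimal sketch for turning a symmetric 3x3 stress tensor into such a vector, assuming the common ordering (xx, yy, zz, yz, xz, xy); to_voigt is a hypothetical helper rather than part of pyiron, so check the ordering your fitting code expects.

```python
import numpy as np


def to_voigt(stress_tensor):
    """Convert a symmetric 3x3 stress tensor into a 6-component Voigt vector.

    Assumed ordering: (xx, yy, zz, yz, xz, xy). Hypothetical helper, not pyiron API.
    """
    s = np.asarray(stress_tensor, dtype=float)
    if s.shape != (3, 3):
        raise ValueError(f"expected a 3x3 tensor, got shape {s.shape}")
    return np.array([s[0, 0], s[1, 1], s[2, 2], s[1, 2], s[0, 2], s[0, 1]])


# Possible use (not executed here):
# container.add_structure(structure, energy=-1.234, forces=forces,
#                         stress=to_voigt(stress_3x3), identifier="Fe_bcc")
```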

collect_output()

Collect the output files of the external executable and store the information in the HDF5 file. This method has to be implemented in the individual hamiltonians.

from_hdf(hdf=None, group_name=None)

Restore the GenericJob from an HDF5 file.

Parameters
  • hdf (ProjectHDFio) – HDF5 group object - optional

  • group_name (str) – HDF5 subgroup name - optional

get_elements()

Return a list of chemical elements in the training set.

Returns

list of unique elements in the training set as strings of their standard abbreviations

Return type

list
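get_elements() collects the distinct chemical symbols over the whole training set. A sketch of the same idea on plain lists of symbols; for ASE-style structures such a list would come from get_chemical_symbols(). Returning the symbols in order of first appearance is an assumption, since the docstring does not specify an order, and unique_elements is a hypothetical name.

```python
from typing import Iterable, List


def unique_elements(symbol_lists: Iterable[Iterable[str]]) -> List[str]:
    """Return the distinct chemical symbols across structures, in order of first appearance.

    Hypothetical helper; each inner iterable holds the symbols of one structure.
    """
    seen = {}  # dicts preserve insertion order, so this keeps first appearances
    for symbols in symbol_lists:
        for symbol in symbols:
            seen.setdefault(symbol, None)
    return list(seen)
```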

get_neighbors(num_neighbors=None)

Calculate and add neighbor information for each structure.

If input.save_neighbors is True, the data is automatically added to the internal storage and saved together with the normal structure data.

Parameters

num_neighbors (int, optional) – number of neighbors to collect; if not given, use the value from input

Returns

neighbor information

Return type

NeighborsTrajectory

include_dataset(dataset)

Add a pandas DataFrame to the saved structures.

The DataFrame should have the following columns:
  • name (str): human-readable name of the structure

  • atoms (ase.Atoms): the atomic structure

  • energy (float): energy of the whole structure

  • forces (Nx3 array of float): per-atom forces, where N is the number of atoms in the structure

  • stress (length-6 array of float): per-structure stress in Voigt notation
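Before handing a DataFrame to include_dataset() it can help to check it against the column layout listed above. A minimal sketch of such a pre-flight check, assuming stress is optional and that len(atoms) gives the number of atoms (true for ase.Atoms); check_dataset is hypothetical, and include_dataset() itself may validate differently or not at all.

```python
import numpy as np
import pandas as pd

REQUIRED_COLUMNS = ("name", "atoms", "energy", "forces")


def check_dataset(df: pd.DataFrame) -> None:
    """Raise ValueError if df does not follow the documented column layout.

    Hypothetical helper: checks required columns, Nx3 forces and, if present, 6-component stress.
    """
    missing = [column for column in REQUIRED_COLUMNS if column not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for index, row in df.iterrows():
        n_atoms = len(row["atoms"])
        if np.shape(row["forces"]) != (n_atoms, 3):
            raise ValueError(f"row {index}: forces must have shape ({n_atoms}, 3)")
        if "stress" in df.columns and np.shape(row["stress"]) != (6,):
            raise ValueError(f"row {index}: stress must have 6 components")
```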

include_job(job, iteration_step=-1)

Add structure, energy and forces from job.

Parameters
  • job (AtomisticGenericJob) – job to take structure from

  • iteration_step (int, optional) – if job has multiple steps, this selects which to add

include_structure(structure, energy=None, name=None, **properties)

Add a new structure to the structure list and store its energy and forces with it.

For consistency with the rest of pyiron, energy should be given in eV and forces in eV/Å; no unit conversion is performed.

Parameters
  • structure (Atoms) – structure to add

  • energy (float, optional) – energy of the whole structure

  • forces (Nx3 array of float, optional) – per-atom forces, where N is the number of atoms in the structure

  • stress (length-6 array of float, optional) – per-structure stress in Voigt notation

  • name (str, optional) – name describing the structure

iter(*arrays, wrap_atoms=True)

Iterate over all structures in this object together with the requested arrays.

Parameters
  • wrap_atoms (bool) – True if the atoms are to be wrapped back into the unit cell; passed to get_structure()

  • *arrays (str) – names of the arrays to iterate over

Yields

pyiron_atomistics.atomistics.structure.atoms.Atoms, arrays – every structure attached to the object and the queried arrays
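To make the iteration contract concrete, here is a toy model that is not pyiron code: ToyStorage is hypothetical, keeps structures in a plain list, and assumes every structure supplies the same named arrays. It yields each structure with a tuple of the requested values, which is one reading of "Atoms, arrays" above; the exact yield shape of the real iter() is an assumption here.

```python
from typing import Any, Dict, Iterator, List, Tuple


class ToyStorage:
    """Hypothetical stand-in mimicking the documented iter() contract (not pyiron code)."""

    def __init__(self) -> None:
        self.structures: List[Any] = []
        # One list per array name, one entry per structure (assumes every
        # structure provides the same arrays, otherwise indices misalign).
        self.arrays: Dict[str, List[Any]] = {}

    def add(self, structure: Any, **arrays: Any) -> None:
        self.structures.append(structure)
        for key, value in arrays.items():
            self.arrays.setdefault(key, []).append(value)

    def iter(self, *names: str) -> Iterator[Tuple[Any, Tuple[Any, ...]]]:
        # Yield (structure, (value_of_name_1, value_of_name_2, ...)) per structure.
        for i, structure in enumerate(self.structures):
            yield structure, tuple(self.arrays[name][i] for name in names)
```

Unpacking then reads naturally, e.g. `for structure, (energy, forces) in storage.iter("energy", "forces"): ...`.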

property plot

plotting interface

Type

TrainingPlots

run_if_interactive()

For jobs whose executables are available as a Python library, the job can also be executed with a library call instead of calling an external executable. This is usually faster than a single-core Python job.

run_static()

run_static() is called by run() to execute the simulation.

sample(name: str, selector: Callable[[StructureStorage, int], bool], delete_existing_job: bool = False) → TrainingContainer

Create a new TrainingContainer with structures filtered by selector.

self must have status finished. selector is called with the underlying StructureStorage of this container and the index of a structure, and must return a boolean indicating whether to include that structure in the new container. The new container is saved and run.

Parameters
  • name (str) – name of the new TrainingContainer

  • selector (Callable[[StructureStorage, int], bool]) – callable that selects the structures to include

  • delete_existing_job (bool) – if a job with this name already exists, remove it first

Returns

new container with selected structures

Return type

TrainingContainer

Raises

ValueError – if a job with the given name already exists.
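A selector is any callable taking the StructureStorage and a structure index and returning a bool. A minimal sketch of two selector factories; they rely only on get_structure(index) and get_array(name, index), and assume the returned structure supports len() (true for pyiron and ASE Atoms). The factory names and the job name in the commented call are hypothetical.

```python
def make_max_atoms_selector(max_atoms):
    """Return a selector keeping structures with at most max_atoms atoms (hypothetical helper)."""

    def selector(storage, index):
        return len(storage.get_structure(index)) <= max_atoms

    return selector


def make_energy_window_selector(e_min, e_max):
    """Return a selector keeping structures whose total energy lies in [e_min, e_max] (hypothetical helper)."""

    def selector(storage, index):
        return e_min <= storage.get_array("energy", index) <= e_max

    return selector


# Possible use on a finished container (not executed here):
# small = container.sample("small_subset", make_max_atoms_selector(4))
```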

to_dict()

to_hdf(hdf=None, group_name=None)

Store the GenericJob in an HDF5 file.

Parameters
  • hdf (ProjectHDFio) – HDF5 group object - optional

  • group_name (str) – HDF5 subgroup name - optional

to_list(filter_function=None)

Return the data as lists of pyiron structures, energies, forces, and numbers of atoms.

Parameters

filter_function (function) – function applied to the dataset (a pandas DataFrame) to filter it

Returns

lists of structures, energies, forces, and numbers of atoms

Return type

tuple
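filter_function receives the dataset as a pandas DataFrame. A sketch of a filter factory, assuming the function is expected to return the filtered DataFrame with unchanged columns and that the table carries the number_of_atoms column listed under to_pandas(); max_atoms_filter is hypothetical.

```python
import pandas as pd


def max_atoms_filter(max_atoms):
    """Build a filter_function keeping rows with at most max_atoms atoms (hypothetical helper)."""

    def filter_function(df: pd.DataFrame) -> pd.DataFrame:
        # Boolean indexing returns a new DataFrame; the input is left untouched.
        return df[df["number_of_atoms"] <= max_atoms]

    return filter_function


# Possible use, unpacking per the Returns section above (not executed here):
# structures, energies, forces, n_atoms = container.to_list(filter_function=max_atoms_filter(8))
```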

to_pandas()

Export the list of structures to a pandas table for external fitting codes.

The table contains the following columns:
  • ‘name’: human-readable name of the structure

  • ‘ase_atoms’: the structure as an Atoms object

  • ‘energy’: the energy of the full structure

  • ‘forces’: the per-atom forces as a numpy.ndarray, shape Nx3

  • ‘stress’: the per-structure stress as a numpy.ndarray, shape 6

  • ‘number_of_atoms’: the number of atoms in the structure, N

Returns

collected structures

Return type

pandas.DataFrame
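A common next step after to_pandas() is normalizing energies per atom, for example before comparing structures of different size. A sketch that works on any table with the energy and number_of_atoms columns described above; add_energy_per_atom and the energy_per_atom column name are hypothetical.

```python
import pandas as pd


def add_energy_per_atom(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of a to_pandas()-style table with an 'energy_per_atom' column."""
    out = df.copy()  # avoid modifying the caller's table
    out["energy_per_atom"] = out["energy"] / out["number_of_atoms"]
    return out
```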

write_input()

Write the input files for the external executable. This method has to be implemented in the individual hamiltonians.

class pyiron_contrib.atomistics.atomistics.job.trainingcontainer.TrainingPlots(train)

Bases: object

Simple interface to plot various properties of the structures inside the given TrainingContainer.

cell(angle_in_degrees=True)

Plot histograms of cell parameters.

Atomic volume, density, cell vector lengths and cell vector angles are plotted in separate subplots, all on a log scale.

Parameters

angle_in_degrees (bool) – if True, angles are given in degrees, otherwise in radians

Returns

contains the plotted information in the columns:
  • a: length of first vector

  • b: length of second vector

  • c: length of third vector

  • alpha: angle between first and second vector

  • beta: angle between second and third vector

  • gamma: angle between third and first vector

  • V: volume of the cell

  • N: number of atoms in the cell

Return type

DataFrame
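The column list above pairs alpha with the first and second cell vectors, beta with the second and third, and gamma with the third and first. This differs from the usual crystallographic convention, in which alpha is the angle between b and c. A sketch computing lengths, angles and volume under the pairing listed above, with cell vectors as rows; cell_parameters is hypothetical and leaves out density and N.

```python
import numpy as np


def cell_parameters(cell, angle_in_degrees=True):
    """Lengths, angles and volume of a 3x3 cell whose rows are the cell vectors.

    Angle pairing follows the list above: alpha (1, 2), beta (2, 3), gamma (3, 1).
    Hypothetical helper, not the plotting code itself.
    """
    cell = np.asarray(cell, dtype=float)
    lengths = np.linalg.norm(cell, axis=1)
    angles = []
    for i in range(3):
        j = (i + 1) % 3
        cos_angle = np.dot(cell[i], cell[j]) / (lengths[i] * lengths[j])
        # Clip guards against round-off pushing the cosine slightly outside [-1, 1].
        angles.append(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    angles = np.array(angles)
    if angle_in_degrees:
        angles = np.degrees(angles)
    a, b, c = lengths
    alpha, beta, gamma = angles
    return {"a": a, "b": b, "c": c, "alpha": alpha, "beta": beta, "gamma": gamma,
            "V": abs(np.linalg.det(cell))}
```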

coordination(num_shells=4, log=True)

Plot histograms of the coordination in neighbor shells.

Computes one histogram of the number of neighbors in each neighbor shell up to num_shells and then plots them together.

Parameters
  • num_shells (int) – maximum shell to plot

  • log (bool) – plot histogram values on a log scale
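The coordination histogram needs, for every atom, the number of neighbors in each shell. A rough sketch for a single atom that groups its neighbor distances into shells by rounding; shell_counts and the rounding tolerance are assumptions, not pyiron's own shell detection.

```python
import numpy as np


def shell_counts(distances, num_shells=4, decimals=3):
    """Count neighbors per shell for one atom from its neighbor distances.

    Distances that agree after rounding to `decimals` places form one shell.
    Returns (shell radii, neighbor counts) for the innermost num_shells shells.
    """
    rounded = np.round(np.asarray(distances, dtype=float), decimals)
    # np.unique sorts ascending, so the innermost shell comes first.
    shell_radii, counts = np.unique(rounded, return_counts=True)
    return shell_radii[:num_shells], counts[:num_shells]
```

For an ideal bcc lattice this recovers the familiar 8, 6, 12 neighbors in the first three shells.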

energy_volume(crystal_systems=False)

Plot volume vs. energy.

Volume and energy are normalized per atom before plotting.

Parameters

crystal_systems (bool) – if True, plot and label structures of different crystal systems separately.

Returns

contains the per-atom energies and volumes in the columns ‘E’ and ‘V’; if crystal_systems is True, it also contains the space group and crystal system of each structure

Return type

DataFrame

forces(axis: Optional[int] = None)

Plot a histogram of all forces.

Parameters

axis (int, optional) – plot only forces along this axis; if not given, plot all forces
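The docstring does not say whether "all forces" means every Cartesian component or the force magnitudes. A sketch of the data preparation under the first reading: per-structure Nx3 arrays are stacked and either flattened or sliced along axis. collect_force_components is hypothetical, and its result could then be passed to numpy.histogram.

```python
import numpy as np


def collect_force_components(force_arrays, axis=None):
    """Stack per-structure Nx3 force arrays and return the values to histogram.

    axis=None returns every Cartesian component; axis=0, 1 or 2 returns only x, y or z.
    Hypothetical helper mirroring the documented `axis` option.
    """
    stacked = np.concatenate([np.asarray(f, dtype=float).reshape(-1, 3) for f in force_arrays])
    if axis is None:
        return stacked.ravel()
    return stacked[:, axis]
```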

shell_distances(num_shells=4)

Plot a violin plot of the neighbor distances in shells up to num_shells.

Parameters

num_shells (int) – maximum shell to plot

spacegroups(symprec=0.001)

Plot histograms of space groups and crystal systems.

Space groups and crystal systems are plotted in separate subplots.

Parameters

symprec (float) – precision of the symmetry search (passed to spglib)

Returns

contains the two columns “space_group” and “crystal_system” for each structure in train

Return type

DataFrame
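spglib can report an international space group number (1 to 230), and each crystal system covers a fixed, contiguous range of those numbers. A sketch of that mapping; crystal_system is a hypothetical name, and whether spacegroups() derives its crystal_system column exactly this way is an assumption.

```python
def crystal_system(space_group: int) -> str:
    """Map an international space group number (1-230) to its crystal system."""
    if not 1 <= space_group <= 230:
        raise ValueError("space group number must be between 1 and 230")
    # Upper bound of each crystal system's space group range, in increasing order.
    bounds = [
        (2, "triclinic"),
        (15, "monoclinic"),
        (74, "orthorhombic"),
        (142, "tetragonal"),
        (167, "trigonal"),
        (194, "hexagonal"),
        (230, "cubic"),
    ]
    for upper, name in bounds:
        if space_group <= upper:
            return name
    raise AssertionError("unreachable")
```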

class pyiron_contrib.atomistics.atomistics.job.trainingcontainer.TrainingStorage

Bases: StructureStorage

add_structure(structure: Atoms, energy, identifier=None, **arrays) → None

Add a new structure to the container.

Additional keyword arguments given specify additional arrays to store for the structure. If an array with the given keyword name does not exist yet, it will be added to the container.

>>> container = StructureStorage()
>>> container.add_structure(Atoms(...), identifier="A", energy=3.14)
>>> container.get_array("energy", 0)
3.14

If the first axis of the extra array matches the length of the given structure, it is added as a per-atom array, otherwise as a per-structure array.

>>> structure = Atoms(...)
>>> container.add_structure(structure, identifier="B", forces=len(structure) * [[0,0,0]])
>>> len(container.get_array("forces", 1)) == len(structure)
True

Reshaping the array so that its first axis has length 1 forces it to be stored as a per-structure array. That axis is then stripped.

>>> container.add_structure(Atoms(...), identifier="C", pressure=np.eye(3)[np.newaxis, :, :])
>>> container.get_array("pressure", 2).shape
(3, 3)

Parameters
  • structure (Atoms) – structure to add

  • energy (float) – energy of the whole structure

  • identifier (str, optional) – human-readable name for the structure; if None, the current structure index is used as a string

  • **arrays – additional arrays to store for the structure
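The rule above for extra keyword arrays can be restated as a small decision function. A sketch that only mirrors the documented behavior, with classify_array as a hypothetical name; for a one-atom structure a first axis of length 1 satisfies both conditions, and this sketch resolves that case as per-atom, which the docstring does not settle.

```python
import numpy as np


def classify_array(value, n_atoms):
    """Decide how an extra array passed to add_structure() would be stored.

    Hypothetical restatement of the documented rule:
      first axis == n_atoms -> per-atom array, stored as given
      first axis == 1       -> per-structure array, leading axis stripped
      otherwise (or scalar) -> per-structure array, stored as given
    Returns a (kind, stored_value) tuple.
    """
    arr = np.asarray(value)
    if arr.ndim > 0 and arr.shape[0] == n_atoms:
        return "per_atom", arr
    if arr.ndim > 0 and arr.shape[0] == 1:
        return "per_structure", arr[0]
    return "per_structure", arr
```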

include_dataset(dataset)

Add a pandas DataFrame to the saved structures.

The DataFrame should have the following columns:
  • name (str): human-readable name of the structure

  • atoms (ase.Atoms): the atomic structure

  • energy (float): energy of the whole structure

  • forces (Nx3 array of float): per-atom forces, where N is the number of atoms in the structure

  • charges (N array of float): per-atom charges

  • stress (length-6 array of float): per-structure stress in Voigt notation

include_job(job, iteration_step=-1)

Add structure, energy and forces from job.

Parameters
  • job (AtomisticGenericJob) – job to take structure from

  • iteration_step (int, optional) – if job has multiple steps, this selects which to add

include_structure(structure, energy, name=None, **properties)

Add a new structure to the structure list and store its energy and forces with it.

For consistency with the rest of pyiron, energy should be given in eV and forces in eV/Å; no unit conversion is performed.

Parameters
  • structure (Atoms) – structure to add

  • energy (float) – energy of the whole structure

  • forces (Nx3 array of float, optional) – per-atom forces, where N is the number of atoms in the structure

  • stress (length-6 array of float, optional) – per-structure stress in Voigt notation

  • name (str, optional) – name describing the structure

iter(*arrays, wrap_atoms=True)

Iterate over all structures in this object together with the requested arrays.

Parameters
  • wrap_atoms (bool) – True if the atoms are to be wrapped back into the unit cell; passed to get_structure()

  • *arrays (str) – names of the arrays to iterate over

Yields

pyiron_atomistics.atomistics.structure.atoms.Atoms, arrays – every structure attached to the object and the queried arrays

property plot

plotting interface

Type

TrainingPlots

to_dict() → Dict[str, Any]

Return a dictionary of all structures and training properties.

to_list(filter_function=None)

Return the data as lists of pyiron structures, energies, forces, and numbers of atoms.

Parameters

filter_function (function) – function applied to the dataset (a pandas DataFrame) to filter it

Returns

lists of structures, energies, forces, and numbers of atoms

Return type

tuple

to_pandas()

Export the list of structures to a pandas table for external fitting codes.

The table contains the following columns:
  • ‘name’: human-readable name of the structure

  • ‘ase_atoms’: the structure as an Atoms object

  • ‘energy’: the energy of the full structure

  • ‘forces’: the per-atom forces as a numpy.ndarray, shape Nx3

  • ‘stress’: the per-structure stress as a numpy.ndarray, shape 6

  • ‘number_of_atoms’: the number of atoms in the structure, N

Returns

collected structures

Return type

pandas.DataFrame