Writing a Translator for Electronic Structure Code

This page helps developers write a custom Python script that translates the raw I/O of a particular electronic structure (ES) code into the UnifiedRepresentation (UR) XML format. It also presents the methods and best practices for integrating that translator with the ESTEST framework.

Required Python Libraries

First check that the required Python libraries listed below are available; they are found in the ESTEST distribution under the lib directory.

periodicTable.py
UnifiedRepresentation.py
UrOptionParser.py
UrUtil.py

The ur4XxxMain.py Script

The first Python script you will write is the "main" executable script, which is used for command-line (non-web) testing. By convention, name this script ur4XxxMain.py, where Xxx is the name of the electronic structure code. The main scripts follow much the same template across all codes. A sample is given below.

import sys
sys.path.insert(0,"../../.site-packages") #path to ESTEST .site-packages directory

from optparse import OptionParser

from UnifiedRepresentation import *
from UrOptionParser import *

from ur4Xxx import *

def main():
        urOptionParser = UrOptionParser() # define the command line parser

        (options,inputs) = urOptionParser.parse_args()
        # options is a dictionary of command line flags
        # inputs is an array of I/O file names

        # check that the I/O given on the command line is consistent:
        # this template expects one input file and one output file
        if len(inputs) < 2:
                sys.stderr.write("Not enough I/O!\n")
                sys.exit(1)

        # create a new UR class instance and initialize it with options and inputs 
        ur=UnifiedRepresentation(options,inputs)

        # map the I/O file names in the inputs array to python FILE variables;
        # here we assume len(inputs)==2, inputs[0]==STDIN & inputs[1]==STDOUT.
        # We assume I/O pairs, one input and one output. The developer is
        # responsible for additional files
        stdin=open(inputs[0],'r')
        stdout=open(inputs[1],'r')

        # pass the ur instance to the ur4Xxx function that will fill in the UR XML
        # information based on the I/O files found in inputs
        ur4Xxx(ur,stdin,stdout)

        # write the unified representation to file or stdout depending on the options
        ur.writeUnifiedRepresentation()

        stdin.close()
        stdout.close()

        return

if __name__=="__main__":
       main()
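
The consistency check in the template above only counts the file names. A slightly stricter check could also confirm that each file exists and is readable before it is opened. The following is a minimal sketch under that assumption; the helper names check_io_files and require_io_files are hypothetical and are not part of ESTEST.

```python
import os
import sys


def check_io_files(paths, expected=2):
    """Return a list of problems found with the command-line I/O files.

    `paths` plays the role of the `inputs` list in main(); an empty
    return value means the files look usable.
    """
    problems = []
    if len(paths) < expected:
        problems.append("expected %d files, got %d" % (expected, len(paths)))
    for path in paths:
        if not os.path.isfile(path):
            problems.append("not a regular file: %s" % path)
        elif not os.access(path, os.R_OK):
            problems.append("not readable: %s" % path)
    return problems


def require_io_files(paths, expected=2):
    """Write any problems to stderr and exit with status 1, as main() does."""
    problems = check_io_files(paths, expected)
    for problem in problems:
        sys.stderr.write(problem + "\n")
    if problems:
        sys.exit(1)
```

In main(), a call such as require_io_files(inputs) could stand in for the length check before the files are opened.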

The ur4Xxx.py Script

The second script to develop is ur4Xxx.py, where by convention Xxx is the name of the ES code. This script defines the ur4Xxx function, which translates the raw I/O data into the UR XML format.

This script is entirely specific to the ES code Xxx and proceeds through data-centric sections that can only be outlined in general terms here. Only the calls that set data in the ur class instance are shown; it is assumed that preceding code has already read the data from the raw I/O format.

import sys
sys.path.insert(0,"../../.site-packages")

from optparse import OptionParser

from UrOptionParser import *
from UnifiedRepresentation import *

import UrUtil

def ur4Xxx(ur,stdin,stdout):
        # set the stdin & stdout UR XML elements and rewind the FILE variables
        ur.setStdin(stdin.read())
        stdin.seek(0)
        ur.setStdout(stdout.read())
        stdout.seek(0)


        # Set the simulation start-time & end-time
        ... # some pre-parsing of raw I/O
        ur.setStartTime(start_time)
        ur.setEndTime(end_time)

        # Set program & version
        ... # some pre-parsing of raw I/O
        ur.setProgram(program,version,"xxx") 
        # the string "xxx" is the code name that identifies the code
        # to the CGI upload script(s). It is sometimes, but not always,
        # the same as the program variable/name; e.g. program=="PWSCF" and "xxx"=="espresso"

        # set ecut if available, normalize to units of rydbergs
        ... # some pre-parsing of raw I/O
        ur.setEcut(ecut)

        # set the unit cell as Cartesian lattice vectors in atomic units
        ... # some pre-parsing of raw I/O
        ur.setUnitCell(unit_cell_a,unit_cell_b,unit_cell_c)
        # unit_cell_[abc] are the three lattice vectors as python strings, e.g. "5.1 5.1 5.1"

        # set information about atomic species used in the simulation
        ... # some pre-parsing of raw I/O
        for spe in species:
                #href - the reference to the species pseudopotential, if available
                #mass - species mass given in standard atomic weight; carbon=12
                #symbol - the chemical element symbol
                ur.addSpecies(href,mass,symbol)

        # set atomic basis positions
        ... # some pre-parsing of raw I/O
        for atom in atomset:
                #name - atom name
                #species - atom species name
                ur_atom = ur.addAtom(name,species)

                #x - atom X-Cartesian coordinate
                #y - atom Y-Cartesian coordinate
                #z - atom Z-Cartesian coordinate

                ur.setAtomPosition(ur_atom,x,y,z)

        # set energy components, normalize to units of Hartree
        ... # some pre-parsing of raw I/O
        for energy in energies:
                #ec_name - the energy component name, e.g. total, exchange, hartree
                #ec_value - the energy value in units of hartree
                ur.setEnergyContribution(ec_name,ec_value)

        # set the stress tensor, normalize to units of GPa
        ... # some pre-parsing of raw I/O
        ur.setStressTensor(stress_tensor)
        # stress tensor is an array of 9 numerical stress components in the order: xx,xy,xz,xy,yy,yz,xz,yz,zz

        # set eigenvalues, normalize to units of eV and convention of absolute reciprocal space coordinates
        ... # some pre-parsing of raw I/O
        for eig in eigenvalues:
                # spin - string for spin (not fully supported yet)
                # kpoint - string for k-point coordinate in absolute reciprocal space
                # values - an array of eigenvalues in units of eV
                ur.setEnergyEigenvalues(spin,kpoint,values)

        # More code to set UR XML data...

        return
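
The comments in the template ask for values to be normalized before they are set: ecut in rydbergs, energies in hartree, eigenvalues in eV, lattice vectors in atomic units as strings, and the stress tensor as nine components. The helpers below are a minimal sketch of that normalization step. They are hypothetical and not part of UrUtil or any other ESTEST module, the conversion constants are CODATA 2018 values, and the fixed eight-decimal string format is an assumption rather than a documented UR requirement.

```python
# Energy units expressed as "how many of this unit make one hartree".
_PER_HARTREE = {
    "hartree": 1.0,
    "rydberg": 2.0,             # exact by definition
    "ev": 27.211386245988,      # CODATA 2018
}

BOHR_IN_ANGSTROM = 0.529177210903  # CODATA 2018


def convert_energy(value, from_unit, to_unit):
    """Convert an energy between hartree, rydberg and eV."""
    hartree = value / _PER_HARTREE[from_unit.lower()]
    return hartree * _PER_HARTREE[to_unit.lower()]


def lattice_vector_string(vector, unit="bohr"):
    """Format one lattice vector as a space-separated string in bohr."""
    if unit == "bohr":
        scale = 1.0
    elif unit == "angstrom":
        scale = 1.0 / BOHR_IN_ANGSTROM
    else:
        raise ValueError("unknown length unit: %s" % unit)
    return " ".join("%.8f" % (component * scale) for component in vector)


def flatten_stress(tensor):
    """Flatten a 3x3 stress tensor row by row into a list of 9 numbers."""
    return [tensor[i][j] for i in range(3) for j in range(3)]
```

For example, a cutoff parsed as 15 hartree could be passed as ur.setEcut(convert_energy(15.0, "hartree", "rydberg")). For a symmetric tensor, the row-by-row order produced by flatten_stress matches the component order listed in the template comment.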

The ur4XxxUpload.py Script

This script hooks into the CGI upload script used by the ESTEST server. It is somewhat specialized to the simulation code in how it maps I/O files, but it can generally follow a standard template. The getCodeNames function is a hook for the web upload form in ESTEST and should return a list of pairs, each giving the code name "xxx" and the program name "Xxx" for the translator's ES code.

import os
import sys
sys.path.insert(0,"../../.site-packages")

import StringIO

from UnifiedRepresentation import *
from UrOptionParser import *

from ur4Xxx import *

def getCodeNames():
        return [("xxx","Xxx")]

def visit(arg,dirname,names):
        inputs=[]
        outputs=[]

        for n in names:
                abspath = dirname+'/'+n

                if os.path.isfile(abspath):
                        ext = n.split('.')[-1]

                        # The input file is matched by extension
                        if ext=="IN_EXT":
                                iname = dirname+'/'+n
                                # The output file is generated by prefix with assumed extension
                                oname = '.'.join(iname.split('.')[0:-1])+'.'+"OUT_EXT"

                                if os.path.isfile(oname):
                                        inputs.append(iname)
                                        outputs.append(oname)

        arg[0].extend(inputs)
        arg[1].extend(outputs)

        return

# this function is called by the ESTEST CGI upload script
def getUploads(tarpath,urOptions):
        uploads={}
        inputs=[]
        outputs=[]
        os.path.walk(tarpath,visit,(inputs,outputs))

        # check for consistency of I/O files somehow... usually number of inputs equals number of outputs
        assert len(inputs)==len(outputs)

        # The following code assumes pairs of I/O--one input and one output--without additional files
        # it is up to the developer to handle any additional files, for example an eigenvalues file, correctly
        # and matching the function calls in ur4Xxx

        for i in range(0,len(inputs)):
                stdin=open(inputs[i],'r')
                stdout=open(outputs[i],'r')

                ur=UnifiedRepresentation(urOptions,[inputs[i],outputs[i]])
                strio=StringIO.StringIO()
                ur4Xxx(ur,stdin,stdout)
                ur.writeUnifiedRepresentation(strio)

                uploads[ur.getUuid()]=strio

                stdin.close()
                stdout.close()

        return uploads

# This function is called by the ESTEST re-translation main script and by other diagnostic and debugging tools.
# The iofiles variable is a dictionary of StringIO stdin/stdout file contents. The data structure
# is initially iofiles = {"stdin":[],"stdout":[]}, where the arrays are filled with pairs of type
# (StringIO,u""): the first pair element is the I/O data and the second is the original file name.

def getUpload(iofiles,urOptions):
        assert ( len(iofiles["stdin"])+len(iofiles["stdout"]) )==2

        stdin=iofiles["stdin"][0][0]
        stdout=iofiles["stdout"][0][0]

        ur=UnifiedRepresentation(urOptions,['',''])
        ur4Xxx(ur,stdin,stdout)
        strio=StringIO.StringIO()
        ur.writeUnifiedRepresentation(strio)

        return ur
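
The template above relies on os.path.walk with a visit callback, which exists only in Python 2; it was removed in Python 3. The same input/output pairing can be written with os.walk, which is available in both. The following is a minimal sketch of that alternative, not the code ESTEST ships; the function name find_io_pairs is hypothetical, and IN_EXT and OUT_EXT remain placeholders for the real extensions of the ES code.

```python
import os


def find_io_pairs(root, in_ext="IN_EXT", out_ext="OUT_EXT"):
    """Walk `root` and return (inputs, outputs) lists of matched file paths.

    An input file is matched by its extension; the output file is expected
    next to it, with the same prefix and the extension `out_ext`. Inputs
    without a matching output are skipped.
    """
    inputs = []
    outputs = []
    for dirname, _subdirs, filenames in os.walk(root):
        for name in sorted(filenames):
            prefix, ext = os.path.splitext(name)
            if ext != "." + in_ext:
                continue
            iname = os.path.join(dirname, name)
            oname = os.path.join(dirname, prefix + "." + out_ext)
            if os.path.isfile(oname):
                inputs.append(iname)
                outputs.append(oname)
    return inputs, outputs
```

Inside getUploads, a call such as inputs, outputs = find_io_pairs(tarpath, "IN_EXT", "OUT_EXT") could then take the place of the os.path.walk call and the visit function.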

Installing Translators in ESTEST

Since ESTEST v1.5.1, installing a new translator in the ESTEST source directory is a simple matter of copying the necessary source files to the translators subdirectory. The three scripts described above should be stored in a directory structure "xxx/scripts", for example "xxx/scripts/ur4Xxx.py". Then run the setup.sh script in the translators directory, which creates two links in translators/.site-packages, one to ur4Xxx.py and one to ur4XxxUpload.py.
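
Before running setup.sh, it can be useful to confirm that the three scripts are where this layout expects them. The following is a minimal sketch of such a check; the helper names are hypothetical, and the relative path "translators" is an assumption about where the check is run from.

```python
import os


def translator_layout(code_name, program_name, translators_dir="translators"):
    """Return the three script paths expected for one translator."""
    scripts = os.path.join(translators_dir, code_name, "scripts")
    templates = ["ur4%sMain.py", "ur4%s.py", "ur4%sUpload.py"]
    return [os.path.join(scripts, t % program_name) for t in templates]


def missing_translator_files(code_name, program_name, translators_dir="translators"):
    """Return the expected script paths that do not exist on disk."""
    expected = translator_layout(code_name, program_name, translators_dir)
    return [path for path in expected if not os.path.isfile(path)]
```

For the placeholder names used on this page, translator_layout("xxx", "Xxx") lists ur4XxxMain.py, ur4Xxx.py and ur4XxxUpload.py under translators/xxx/scripts.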

Last modified on 07/18/12 10:58:04