Job Submission System - S P E C I F I C A T I O N

James Werner

Monte Carlo Submission

Data can come from three different sources: the event generator, the detector simulation, and the experiment. The event generator is quality-assurance software for evaluating the Monte Carlo generator. The Monte Carlo generator uses tcl files to set parameters for the Geant4 software. The results of these parameters must be checked with the event generator to see whether the intent was achieved. The event generator produces listing outputs with all the physical data for each event, allowing the model designer to evaluate the model's accuracy. The detector simulation introduces the experiment's transfer function into the event generator: it generates the output as if the event were happening in the real experimental setup. The experiment comprises the electron/positron accelerator, the Babar detector and data processing. We describe only the information needed to use the data, its relation to the detector, and how to select events. Cuts are specific to each analysis and are not covered here because there is no predefined recipe. All these data are stored in the Book Keeping, which manages access to a specific dataset.

Event Generator.

This procedure allows you to run Monte Carlo generators on your computer or on the grid.

Only Event Generator, without detector simulation:

  1. Install the new analysis release.
  2. Running Monte Carlo simulation.
  3. Grid submission, status, and recovering results.
  4. Running example using grid.
  5. Example results using grid.

The detector simulation.

  1. Install the new analysis release.
  2. Running at SLAC.
  3. Installing Moose.
  4. Running Moose at Manchester.
  5. Grid submission, status, and recovering results.
  6. Running example using grid.
  7. Example results using grid.

EasyGrid modules for Monte Carlo Simulation.

MC software is submitted with the easymoncar script. The script requires the name of the dataset to be generated, the number of tasks to run in parallel, and the initial sequence number.

./easymoncar Tau11 200 0

This command will submit 200 jobs and generate the datasets Tau11-0, Tau11-1, ... Tau11-199. To verify the status of the jobs, use

./easygrid Tau11

as usual. easymoncar is an add-on that integrates additional services into easygrid. The script can be generalised to other services as well, keeping the easygrid core.
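
The naming rule described above can be sketched as a small shell function. This is only an illustration: dataset_names is a hypothetical helper, not part of easygrid or easymoncar, and it assumes the three arguments have the same meaning as on the easymoncar command line (dataset name, number of jobs, initial sequence number).

```shell
#!/bin/bash
# Hypothetical helper (not part of easygrid): list the dataset names that
# "./easymoncar NAME COUNT START" is expected to create.
dataset_names() {
    local name=$1 count=$2 start=$3 i
    for ((i = start; i < start + count; i++)); do
        echo "${name}-${i}"
    done
}

# Example: the names expected from "./easymoncar Tau11 3 0"
dataset_names Tau11 3 0
```

With the arguments Tau11 10 130 the same rule gives Tau11-130 up to Tau11-139, since the count includes the initial sequence number.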

A complete session could be:

bash-2.05a$ ./easymoncar Tau11 100 0
...
Sub Tau11-0

Selected Virtual Organisation name (from --vo option): babar
Connecting to host lcgrb01.gridpp.rl.ac.uk, port 7772
Logging to host lcgrb01.gridpp.rl.ac.uk, port 9002


*********************************************************************************************
                               JOB SUBMIT OUTCOME
 The job has been successfully submitted to the Network Server.
 Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:

 - https://lxn1188.cern.ch:9000/ZP86xR7TLdzM502I-BZjNQ


*********************************************************************************************


               -  *  -  *  -  *  -  *  -  *  -  *  -  *  -  *  -
...
bash-2.05a$ ./easygrid Tau11
...

bash-2.05a$ vi pi0rojectMC.tcl
bash-2.05a$ cat pi0rojectMC.tcl
set ConfigPatch MC
set levelOfDetail cache
set BetaMiniTuple hbook
set histFileName pi0rojectMC.hbk
set NEvent 1000000000
#source SimulaData.tcl
#source Tau11-Run3-OnPeak-R14-6.tcl

lappend inputList /store/SPskims/pi0-00
lappend inputList /store/SPskims/pi0-01
lappend inputList /store/SPskims/pi0-02
lappend inputList /store/SPskims/pi0-03

mod talk pi0roject
show
exit

sourceFoundFile BetaMiniUser/pi0roject.tcl

Move files from /nfs/work/newstore/work/users/babar003 to /nfs/work/store/SPskims/ and rename the root files (if necessary). Go to your release directory and run the analysis.

bash-2.05a$ ls /nfs/work/store/SPskims/
R12             pi0-00.02E.root  pi0-02.01.root   pi0-03.02E.root
R14             pi0-01.01.root   pi0-02.02E.root  simula.01.root
pi0-00.01.root  pi0-01.02E.root  pi0-03.01.root   simula.02E.root
bash-2.05a$ . ./fullboot.sh
Setting OO_FD_BOOT to /nfs/babar02/Production/objy/jamwer/26680/BaBar.BOOT
bash-2.05a$ BetaMiniApp pi0rojectMC.tcl
Current value of item(s) in the "pi0roject" module:
Value of verbose for module pi0roject: f
Value of production for module pi0roject: f
Value of enableFrames for module pi0roject: f
...
73, TrackP1Pi0, 0, 0.616157, 0.937023, 1.90227, 2.2156, 4.17738, 5.09686, 5.09877
73, TrackP1Pi0, 1, 1.90211, -2.33774, -0.419866, -0.741769, -0.275289, 0.895708, 0.906517
73, MuonP1Pi0, -1, 1.90227, 2.2156, 4.17738, 5.09686, 5.09796
73, CPiP1Pi0, 1.90211, -2.33774, -0.419866, -0.741769, -0.275289, 0.895708, 0.906517, 139.57018
73, GamaP, 0, 0, 1.43931, -2.40619, -0.834871, -0.755304, 0.148894, 1.13563, 1.13563
73, GamaP, 1, 0, 1.46493, -2.27753, -0.570648, -0.668317, 0.0933871, 0.883746, 0.883746
73, 1Pi0P, 0, 0, 1, -1.40552, -1.42362, 0.242281, 2.01516, 2.01938, 0.13044
73, W1Pi0P, -1.82538, -2.16539, -0.0330079, 2.83232, 2.9259, 0.734054
215, TrackN2Pi0, 0, 1.28641, 0.712965, 0.572986, 0.770592, 0.278942, 0.999966, 1.00966
215, TrackN2Pi0, 1, 1.65835, -2.27532, -1.08627, -0.963566, -0.130137, 1.45787, 1.46453
215, MuonN2Pi0, 1, 0.572986, 0.770592, 0.278942, 0.999966, 1.00553
215, CPiN2Pi0, 1.65835, -2.27532, -1.08627, -0.963566, -0.130137, 1.45787, 1.46453, 139.57018
215, GamaN, 0, 0, 1.7916, -2.7053, -0.655254, -0.305517, -0.162284, 0.740969, 0.740969
215, GamaN, 1, -1, 0.977747, -2.55396, -0.393713, -0.262265, 0.318839, 0.570483, 0.570483
215, GamaN, 2, 1, 1.19042, -2.66281, -0.413008, -0.214376, 0.186063, 0.501151, 0.501151
215, GamaN, 3, -1, 1.78888, -2.3772, -0.543543, -0.521172, -0.166876, 0.771302, 0.771302
215, GamaN, 4, 0, 1.59573, -2.00761, -0.0211746, -0.0453523, -0.00124807, 0.0500675, 0.0500675
215, GamaN, 5, 1, 1.66995, -3.01745, -0.0943428, -0.0117722, -0.00945808, 0.0955437, 0.0955437
215, 2Pi0N, 0, 0, 4, -0.676428, -0.350869, -0.163532, 0.779364, 0.791036, 0.13539
215, 2Pi0N, 1, 2, 5, -0.507351, -0.226148, 0.176605, 0.582869, 0.596694, 0.1277
215, W2Pi0N, -2.27005, -1.54058, -0.117065, 2.74595, 2.85226, 0.771481
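
A quick way to inspect such a listing is to count how many records of each kind it contains. The sketch below is an assumption-laden illustration: it supposes one record per line, with the event number in the first comma-separated field and a record label (TrackP1Pi0, GamaN, ...) in the second, which is a reading of the output above rather than a documented format. count_records is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: count listing records per label.
# Assumes one record per line, fields separated by ", ", the event number
# in field 1 and the record label in field 2.
count_records() {
    awk -F', *' 'NF >= 2 { n[$2]++ } END { for (k in n) print k, n[k] }' | sort
}

# Example with three records copied from the listing above
count_records <<'EOF'
215, GamaN, 0, 0, 1.7916, -2.7053, -0.655254, -0.305517, -0.162284, 0.740969, 0.740969
215, GamaN, 1, -1, 0.977747, -2.55396, -0.393713, -0.262265, 0.318839, 0.570483, 0.570483
215, W2Pi0N, -2.27005, -1.54058, -0.117065, 2.74595, 2.85226, 0.771481
EOF
```

Lines without a comma, such as the module settings printed at start-up, are ignored by the NF >= 2 condition.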

easymoncar


    #!/bin/bash
    #      GRID submission script  -  MONTE CARLO JOBS
    #            Author: Dr James Cunha Werner
    #            www.geocities.com/jamwer2002
    #               University of Manchester
    #
    echo "######################################################################"
    echo "#       e a s y g r i d - Job Submission system for Monte Carlo      #"
    echo "#                                                                    #"
    echo "#  If you need any support: James Cunha Werner  jamwer@hep.man.ac.uk #"
    echo "#                           Room 7-11  Phone 0161 275 4150           #"
    echo "#                           www.geocities.com/jamwer2002             #"
    echo "#  Documentation:   http://www.hep.man.ac.uk/u/jamwer/               #"
    echo "#                   University of Manchester                         #"
    echo "######################################################################"
    echo "Welcome, $USER !"
    echo
    # Three arguments are required: dataset name, number of jobs, initial sequence number
    if [ $# -ne 3 ]
      then
         echo "Dear $USER,"
         echo "You should provide the dataset name you want to run analysis, number of copies and initial sequence number. For example,"
         echo
         echo "       ./easymoncar Tau11 10 130    "
         echo
         echo "will submit 10 jobs from Tau11-130 to Tau11-139"
         echo "To obtain a complete list of datasets, type:"
         echo
         echo "       BbkDatasetTcl --dbsite=local > saida.txt "
         echo "       vi saida.txt "
         echo
         echo "You can find more information here: http://www.hep.man.ac.uk/u/jamwer/bbdata.html"
         echo "                     User manual at http://www.hep.man.ac.uk/u/jamwer/userman.html"
         exit 1
    fi
    # Stop early if the LCG user interface commands are not available
    if ! edg-job-submit --version
      then
         echo
         echo "LCG UI not installed on your computer. Contact grid project manager."
         echo
         exit 1
    fi
    # Create a grid proxy valid for 240 hours
    grid-proxy-init -valid 240:00
    # Start the history file for this dataset
    echo "######################################################################" >  "$1.histo"
    echo "#       e a s y m o n c a r  - Job Submission system for Babar Monte Carlo   #" >> "$1.histo"
    echo "#                                                                    #" >> "$1.histo"
    echo "#  If you need any support: James Cunha Werner  jamwer@hep.man.ac.uk #" >> "$1.histo"
    echo "#                   www.geocities.com/jamwer2002                     #" >> "$1.histo"
    echo "#  Documentation:   http://www.hep.man.ac.uk/u/jamwer/               #" >> "$1.histo"
    echo "#                   University of Manchester                         #" >> "$1.histo"
    echo "######################################################################" >> "$1.histo"
    date >> "$1.histo"
    grid-proxy-info  >> "$1.histo"
    # Unique name for this submission's copy of the binary
    export BINNAME="${USER}_${HOSTNAME}_$1_$(date +%H%M%S%d%b%y)_MooseApp"
    # Using SE/RLS to store application binary
    #echo Binary file: $BINNAME
    #edg-rm --vo babar cr file://`which MooseApp` -l lfn:$BINNAME  -d grid2.fe.infn.it > $BINNAME.setok 2>trab
    #echo Binary token: `cat $BINNAME.setok`
    #echo Binary Token: `cat $BINNAME.setok` >> $1.histo
    #echo
    #cat trab
    #echo >> $1.histo
    #cat trab >> $1.histo
    #rm trab
    # Using NFS to store application binary
    # If you are NOT in the release workdir
    #   cp <release directory path>/workdir/bin/Linux24RH72_i386_gcc2953/MooseApp /nfs/babar01/$BINNAME
    # if you are in the release workdir or in the same path
    cp ../workdir/bin/Linux24RH72_i386_gcc2953/MooseApp "/nfs/babar01/$BINNAME"
    echo "Binary File: /nfs/babar01/$BINNAME" >> "$1.histo"
    # gerasd writes the per-job tcl, sh and jdl files and prints the submission commands
    ./gerasd "$1" "$BINNAME" "$2" "$3" > gridsub
    chmod 700 gridsub
    ./gridsub
    cat gridtokens >> "$1.histo"
    # Build a script that saves each job identifier in <dataset>.tok
    rm -f gridtok
    awk ' /Sub/ {FileName=$2} /https/ {HandleName=$2; print "echo " HandleName "> " FileName".tok " }' "$1.histo" >> gridtok
    chmod 700 gridtok
    ./gridtok
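
The last step of the script above pairs each "Sub" line with the job identifier that follows it. A minimal sketch of that pairing is given here as a function that prints the pairs instead of writing .tok files. extract_tokens is a hypothetical name, and the stricter matching (first field exactly "Sub", second field starting with "https:") is a choice made for this sketch, not the script's own pattern.

```shell
#!/bin/bash
# Sketch of the token-extraction step. Reads a submission log on stdin and
# prints "DATASET JOBID" pairs, one per line.
# Assumes each job logs a line "Sub NAME" followed later by a line whose
# second field is the https:// job identifier, as in the session shown earlier.
extract_tokens() {
    awk '$1 == "Sub" { name = $2 }
         $2 ~ /^https:/ && name != "" { print name, $2; name = "" }'
}

# Example using lines from the session shown earlier
extract_tokens <<'EOF'
Sub Tau11-0
 The job has been successfully submitted to the Network Server.
 - https://lxn1188.cern.ch:9000/ZP86xR7TLdzM502I-BZjNQ
EOF
```

Matching on the whole first field avoids picking up banner lines that merely contain the letters "Sub", such as "Job Submission system".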


Gerasd.c


    /*
            Generation of the JDL files and of the commands for the GRID
               Author: Dr James Cunha Werner
             www.geocities.com/jamwer2002
              University of Manchester
    */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <time.h>

    int main (int argc, char *argv[])
    {
    FILE *arqjdl,*arqtcl,*arqsh;
    char nomearq[300],nomedata[300],nomesh[300],nomebase[300],binname[300];
    int i,numcop,numseq;
    time_t t1;

    /* Arguments: dataset name, binary name, number of copies, initial sequence number */
    if (argc != 5) {
      fprintf(stderr,"Usage: %s dataset binname numcopies initseq\n",argv[0]);
      return 1;
    }

    (void) time(&t1);
    srand((int)t1 % 10000);
    /* Bounded copies; the dataset name leaves room for the "-<n>.tcl" style suffixes */
    snprintf(nomebase,sizeof(nomebase)-32,"%s",argv[1]);
    snprintf(binname,sizeof(binname),"%s",argv[2]);
    numcop=atoi(argv[3]);
    numseq=atoi(argv[4]);

    for(i=numseq;i<numseq+numcop;i++) {
      /* tcl file with the Moose parameters for job i */
      sprintf(nomedata,"%s-%d.tcl",nomebase,i);
      arqtcl=fopen(nomedata,"w");
      if (arqtcl==NULL) { perror(nomedata); return 1; }

      fprintf(arqtcl,"set ProdTclOnly  true\n");
      fprintf(arqtcl,"set RUNNUM %d \n",9876+i*100);
      fprintf(arqtcl,"set CONDALIAS Jan2003\n");
      fprintf(arqtcl,"set NEVENT  20000 \n");
      fprintf(arqtcl,"set UDECAY PARENT/ProdDecayFiles/tau_generic_kk2f.tcl\n");
      fprintf(arqtcl,"set MooseHBookFile myMoose-%d.hbook\n",i);
      fprintf(arqtcl,"set MooseOutputCollection  /work/users/$env(USER)/myMoose-coll%d\n",i);
      fprintf(arqtcl,"mod talk KanEventOutput\n");
      fprintf(arqtcl,"  allowDirectoryCreation set true\n");
      fprintf(arqtcl,"exit\n");
      fprintf(arqtcl,"module talk StdHepPrint\n");
      fprintf(arqtcl,"        stdHepNumber set 10\n");
      fprintf(arqtcl,"exit\n");
      fprintf(arqtcl,"module talk RandomControl\n");
      fprintf(arqtcl,"  maxEventsPerRun  set  100000\n");
      fprintf(arqtcl,"  exit\n");
      fprintf(arqtcl,"sourceFoundFile  MooseProduction.tcl\n");
      fclose(arqtcl);

      /* shell script executed on the worker node for job i */
      sprintf(nomesh,"%s-%d.sh",nomebase,i);
      arqsh=fopen(nomesh,"w");
      if (arqsh==NULL) { perror(nomesh); return 1; }
      fprintf(arqsh,"#!/bin/bash\n");
      fprintf(arqsh,"echo Host computer: `/bin/hostname`\n");
      fprintf(arqsh,"echo Start time: `/bin/date`\n");
      fprintf(arqsh,"echo \n");
      fprintf(arqsh,"local=`pwd`\n");
      fprintf(arqsh,"echo \"Babar initialisation script \"  \n");
    //  fprintf(arqsh,". $VO_BABAR_SW_DIR/babar-grid-setup-env.sh \n");
    //  fprintf(arqsh,". /afs/hep.man.ac.uk/g/bfactory/etc/hepix/bashrc \n");
      fprintf(arqsh,". /afs/hep.man.ac.uk/g/bfactory/etc/hepix/bashrc_RH7 \n");
      fprintf(arqsh,"echo \n");
      fprintf(arqsh,"echo \" Environment Variables \" \n");
      fprintf(arqsh,"printenv \n");
      fprintf(arqsh,"echo \n");
      fprintf(arqsh,"echo \" Contents of $VO_BABAR_SW_DIR \" \n");
      fprintf(arqsh,"ls -l $VO_BABAR_SW_DIR \n");
      fprintf(arqsh,"echo \n");
      fprintf(arqsh,"echo \" Script for babar configuration \" \n");
      fprintf(arqsh,"cat $VO_BABAR_SW_DIR/babar-grid-setup-env.sh \n");
      fprintf(arqsh,"echo \n");
      fprintf(arqsh,"echo \" Listing work areas: /nfs/work \" \n");
      fprintf(arqsh,"ls /nfs/work \n");
      fprintf(arqsh,"echo \n");
      fprintf(arqsh,"echo -----------------------------------------------\n");
      fprintf(arqsh,"echo \n");
      fprintf(arqsh,"cd $BFDIST/releases/14.5.2 \n");
      fprintf(arqsh,"srtpath 14.5.2 Linux24RH72_i386_gcc2953 \n");
      fprintf(arqsh,"cd $local \n");
      fprintf(arqsh,"echo Available files: $local \n");
      fprintf(arqsh,"ls \n");
      fprintf(arqsh,"echo \n");
      fprintf(arqsh,". ./fullboot.sh \n");
      fprintf(arqsh,"ln -s $BFDIST/releases/14.5.2 PARENT\n");
      fprintf(arqsh,"KanUserAdmin createuser\n");
      fprintf(arqsh,"/nfs/babar01/%s %s \n",binname,nomedata);
      fprintf(arqsh,"echo \n");
      fprintf(arqsh,"echo ----------------------------------------------\n");
      fprintf(arqsh,"echo \n");
      fprintf(arqsh,"echo End time: `/bin/date`\n");
      fclose(arqsh);

      /* JDL file describing job i to the grid */
      sprintf(nomearq,"%s-%d.jdl",nomebase,i);
      arqjdl=fopen(nomearq,"w");
      if (arqjdl==NULL) { perror(nomearq); return 1; }
      fprintf(arqjdl,"Executable=\"%s\";\n",nomesh);
      fprintf(arqjdl,"InputSandbox={\"%s\",\"%s\",\"fullboot.sh\",\"MooseProduction.tcl\"};\n",
        nomesh,nomedata);
      fprintf(arqjdl,"StdOutput=\"std.out\";\n");
      fprintf(arqjdl,"StdError=\"std.err\";\n");
      fprintf(arqjdl,"OutputSandbox={\"std.out\",\"std.err\",\"myMoose-%d.hbook\"};\n",i);
    //  fprintf(arqjdl,"Requirements = other.GlueCEUniqueID == \"bfa.tier2.hep.man.ac.uk:2119/jobmanager-lcgpbs-infinite\" ;\n");
      fprintf(arqjdl,"Requirements = other.GlueCEUniqueID == \"bohr0001.tier2.hep.man.ac.uk:2119/jobmanager-lcgpbs-babar\" ;\n");
      fclose(arqjdl);

      /* submission commands, written to stdout (redirected to gridsub by easymoncar) */
      printf("echo > trab\n");
      printf("echo \"Sub %s-%d\" >> trab \n",nomebase,i);
      printf("edg-job-submit --vo babar %s  >> trab\n",nomearq);
    //  printf("edg-job-submit --config-vo babar_vo.cfg --config babar_ui.cfg %s  >> trab\n",nomearq);
      printf("sleep 30 \n");
      printf("echo >> trab\n");
      printf("echo \"               -  *  -  *  -  *  -  *  -  *  -  *  -  *  -  *  - \" >>trab\n");
      printf("echo >> trab\n");
      printf("cat trab \n");
      printf("cat trab >> gridtokens\n");

    }
    return 0;
    }
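
To see what one generated job description looks like without compiling the program, the JDL part of the listing above can be mirrored in a few lines of shell. This is only a sketch: write_jdl and run_number are hypothetical names, and the computing element is the one hard-coded in the listing, which would presumably differ at other sites.

```shell
#!/bin/bash
# Sketch: print the JDL text for one job, mirroring the fprintf calls that
# write the .jdl file in the listing above.
write_jdl() {
    local base=$1 i=$2
    cat <<EOF
Executable="${base}-${i}.sh";
InputSandbox={"${base}-${i}.sh","${base}-${i}.tcl","fullboot.sh","MooseProduction.tcl"};
StdOutput="std.out";
StdError="std.err";
OutputSandbox={"std.out","std.err","myMoose-${i}.hbook"};
Requirements = other.GlueCEUniqueID == "bohr0001.tier2.hep.man.ac.uk:2119/jobmanager-lcgpbs-babar" ;
EOF
}

# Run number written to the matching tcl file: 9876 + 100 * i
run_number() {
    echo $((9876 + $1 * 100))
}

# Example: job 3 of dataset Tau11
write_jdl Tau11 3
run_number 3
```

Because the run number advances by 100 per sequence number, each job gets its own RUNNUM value.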


Errors using grid:

1. Problems in the NFS system (19 times out of 100), although many other jobs were writing to the same area without problems.

KanEventOutput::KanFileReg.cc(283):Could not open /nfs/work/newstore//work/users/babar003/myMoose-coll97.01.root!! Please check that this file does not already exist, and that you have write access to that location.
./Tau11-97.sh: line 32: 21103 Aborted                 /nfs/babar01/jamwer_bfb.tier2.hep.man.ac.uk_Tau11_11125908Jun05_MooseApp Tau11-97.tcl

2. Aborts (37 times/100): 

 Event: Running
- host                    =    lcgrb01.gridpp.rl.ac.uk
- node                    =    bohr0001.tier2.hep.man.ac.uk
- source                  =    LogMonitor
- src_instance            =    unique
- timestamp               =    Wed Jun  8 13:11:47 2005
- user                    =    /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
        ---
 Event: Done
- exit_code               =    1
- host                    =    lcgrb01.gridpp.rl.ac.uk
- reason                  =    Cannot read JobWrapper output, both from Condor and from Maradona.
- source                  =    LogMonitor
- src_instance            =    unique
- status_code             =    FAILED
- timestamp               =    Wed Jun  8 13:13:13 2005
- user                    =    /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner

3. Submission errors (23 times out of 100):

**** Error: API_NATIVE_ERROR ****
Error while calling the "Status:getStatus" native api
Unable to retrieve the status for: https://lxn1188.cern.ch:9000/f0HAF8kGC4-k2Il5M46YrA
edg_wll_JobStatus: Connection refused: edg_wll_ssl_connect(): server closed the connection, probably due to overload
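
When many jobs fail, it helps to sort their logs into the three categories above automatically. The following is a minimal sketch, assuming the message fragments quoted in this section are enough to tell the categories apart. classify_failure and the category labels are hypothetical, not part of easygrid.

```shell
#!/bin/bash
# Hypothetical helper: classify a job log (read from stdin) into one of the
# three failure types listed above, using message fragments quoted in this
# section. Anything else is reported as "unknown".
classify_failure() {
    local log
    log=$(cat)
    case $log in
        *"Could not open"*)                 echo nfs ;;
        *"Cannot read JobWrapper output"*)  echo abort ;;
        *"API_NATIVE_ERROR"*)               echo submission ;;
        *)                                  echo unknown ;;
    esac
}

# Example with the reason line from the abort case
echo "- reason = Cannot read JobWrapper output, both from Condor and from Maradona." | classify_failure
```

A log containing more than one fragment is assigned to the first matching category, so the order of the patterns matters.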

Copyright 2004 Manchester University
Feedback to: jamwer@hep.man.ac.uk