Job Submission System Prototype - Version 2

James Werner

Running Root sources with Grid

Quick start

To run a standard ROOT macro, follow these steps:

  1. Log in to your AFS account on linux8 (outside the firewall):

    ssh -l you linux8.hep.man.ac.uk

  2. mkdir grid
  3. cd grid
  4. . /afs/hep.man.ac.uk/g/bfactory/etc/hepix/bashrc
    Warning: the command is "dot, space, /afs...". The leading dot sources the definitions from the BaBar bashrc file into your current shell.
  5. Create your ROOT macro (for example with "vi Source_root.C"). The first line of your code should be:

    void Source_root(TString path,TString dataset,TString initset,TString numgrp,TString par1, ... ,TString parN){

    where par1, ... , parN are parameters for your code. If your code takes no parameters, use only the first four (mandatory) arguments:

    void Source_root(TString path,TString dataset,TString initset,TString numgrp){

  6. Suppose your data is stored in the directory /nfs/work/store, in the files Dataset-1.root, Dataset-2.root, ... Dataset-159.root.
  7. The first time you submit:

    easyroot Dataset /nfs/work/store Source_root inic fim group par1 par2 ... parN

    where Dataset is the dataset name, /nfs/work/store is the data directory, Source_root is your macro name without extension, inic is the number of the first data file (for example 1), fim is the number of the last data file (for example 159), and group is the number of files to process together in each job (for example, 3 joins files 1, 2 and 3 in job 1, files 4, 5 and 6 in job 4, etc.). par1 par2 ... parN are optional parameters for your code.

  8. After submitting, retrieve results and listings with:

    easygrid Dataset

    if you have no parameters, or

    easygrid Dataset-par1-par2-...-parN

    if you have parameters in your code.

    Warning: you will receive results only from jobs that are done. Results from jobs in the scheduled or running state are not yet available, so run "easygrid Dataset" again until all jobs have returned results.
    To check whether jobs are still pending, type "ls *.tok". If no files are listed, all tokens have been deleted: everything was recovered and nothing is pending.

  9. Results will be available in the directories Dataset-par1-par2-...-parN-seq if you submitted with parameters, or Dataset-seq if you did not. seq is the number of the first data file in each job.
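
The grouping and directory naming described in steps 7 and 9 can be sketched in a few lines of shell. This is only an illustration of the rule, not the code easyroot runs: the function names job_plan and result_dir are hypothetical, and truncating the last job at fim when the files do not divide evenly is an assumption.

```shell
# Sketch only: mimics the grouping rule described above; it does not call easyroot.
# job_plan INIC FIM GROUP prints one line per job: "job SEQ: files A..B",
# where SEQ is the number of the first data file in the job.
job_plan() {
    first=$1
    last=$2
    group=$3
    [ "$group" -ge 1 ] || return 1
    while [ "$first" -le "$last" ]; do
        end=$((first + group - 1))
        # Assumption: the final job stops at FIM if the files do not divide evenly.
        if [ "$end" -gt "$last" ]; then
            end=$last
        fi
        echo "job $first: files $first..$end"
        first=$((end + 1))
    done
}

# result_dir DATASET SEQ [PAR...] prints the directory name
# DATASET-par1-...-parN-SEQ (or DATASET-SEQ without parameters).
result_dir() {
    name=$1
    seq=$2
    shift 2
    for par in "$@"; do
        name="$name-$par"
    done
    echo "$name-$seq"
}

job_plan 1 7 3
result_dir Dataset 4 par1 par2
```

With inic=1, fim=7 and group=3 this lists jobs 1, 4 and 7, and the second call prints Dataset-par1-par2-4.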

If you need to know more details about...


This is an example of how to run ROOT macros on the grid. The process involves the following files:
  1. The necessary files are available at $BFROOT/bin.man. You can also download the following files and save them together with your ROOT source code:
    1. easyroot. If necessary, change its permissions by typing "chmod 744 easyroot".
    2. gerar.c. You will need to edit it (for example with "vi gerar.c") to set the production grid computing element, as described below.
    3. easygrid. If necessary, change its permissions by typing "chmod 744 easygrid".
  2. Grid submission script: easyroot dataset_name path source init_number final_number group par1 ... parN

    where:
    dataset_name is the name of the ntuple files without sequence number or extension (e.g. SumEltag)
    path is the directory containing the data files (e.g. /nfs/babardisk1/rootfile )
    source is your ROOT source code without extension (e.g. James )
    init_number is the sequence number of the first data file (e.g. 23 means the first data file will be SumEltag-23.root)
    final_number is the number of the last file to be run (e.g. 30 means SumEltag-30.root will be run, and SumEltag-31.root will not)
    group is the number of files to be joined in each job (e.g. 3 means SumEltag-23, 24 and 25 will be processed together in one job)
    par1 ... parN are optional parameters.

    Use easyroot only once, to submit. Afterwards, use easygrid to check the status and recover results.
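
Since easyroot is run only once, it can help to confirm beforehand that every data file in the requested range exists. The sketch below is not part of easyroot: missing_inputs is a hypothetical helper built only on the path/dataset_name-N.root naming described above, and it checks the files as seen from the machine where you run it, which may mount the data under a different path than the worker nodes do.

```shell
# missing_inputs PATH DATASET INIT FINAL   (hypothetical helper, not part of easyroot)
# Prints every expected data file PATH/DATASET-N.root (N = INIT..FINAL) that is
# not readable from this machine, and returns non-zero if any is missing.
missing_inputs() {
    dir=$1
    dataset=$2
    n=$3
    final=$4
    status=0
    while [ "$n" -le "$final" ]; do
        file="$dir/$dataset-$n.root"
        if [ ! -r "$file" ]; then
            echo "missing: $file"
            status=1
        fi
        n=$((n + 1))
    done
    return $status
}
```

A possible use is "missing_inputs /nfs/work/store SumEltag 1 3 && ./easyroot SumEltag /nfs/work/store James 1 3 1", so that submission only starts when all three files are found.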

  3. gerar.c: contains everything necessary to run your application. You only need to change the Requirements clause:

    fprintf(arqjdl,"Requirements = other.GlueCEUniqueID == \"bf32.hep.man.ac.uk:2119/jobmanager-lcgpbs-babar\"; \n");

    to the computing element (CE) that manages worker nodes with access to your data files. When you write "./easyroot SumEltag /nfs/work/store James 1 3 1", the worker node must be able to access /nfs/work/store to read the data and run your analysis program (in this case, James).
    These are data that users have stored somewhere (their own AFS area, or babar/disk/...). There is no bookkeeping, and there are no tags, to identify them and tell easyroot where the data is. This is not a problem, because users know where they wrote their output.

    For example, if you store data in data1.ral.ac.uk:/work/james, and CE1.ral.ac.uk/jobmanager-queue manages 100 WNs with access to data1.ral.ac.uk, mounted as /nfs/work1, you have to type:
    ./easyroot SumEltag /nfs/work1 James 1 3 1
    and change the line:
    fprintf(arqjdl,"Requirements = other.GlueCEUniqueID == \"bf32.hep.man.ac.uk:2119/jobmanager-lcgpbs-babar\"; \n");
    to
    fprintf(arqjdl,"Requirements = other.GlueCEUniqueID == \"CE1.ral.ac.uk/jobmanager-queue\"; \n");

  4. gerar: the executable built from gerar.c with:
    gcc gerar.c -o gerar

    after you have changed the computing element name.
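
If you prefer not to edit gerar.c by hand, the change of computing element can be sketched as a literal text replacement. The helper name set_ce is hypothetical, and the sketch assumes the old CE string appears in gerar.c exactly as shown in item 3. It writes the modified source to standard output, so the original file is left untouched.

```shell
# set_ce OLD_CE NEW_CE FILE   (hypothetical helper)
# Prints FILE with every literal occurrence of OLD_CE replaced by NEW_CE.
# awk's index()/substr() are used so that dots and slashes match literally.
# Assumption: neither string contains a backslash (awk -v would interpret it).
set_ce() {
    [ -n "$1" ] || return 1
    awk -v old="$1" -v new="$2" '
    {
        out = ""
        while ((i = index($0, old)) > 0) {
            out = out substr($0, 1, i - 1) new
            $0 = substr($0, i + length(old))
        }
        print out $0
    }' "$3"
}
```

For example, 'set_ce "bf32.hep.man.ac.uk:2119/jobmanager-lcgpbs-babar" "CE1.ral.ac.uk/jobmanager-queue" gerar.c > gerar_new.c' produces a copy that you can inspect, rename to gerar.c, and compile with gcc as above.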

  5. Your_code.C: this is the ROOT source code for your application. For example, if the code is James.C:

    
    [jamwer@bf39 grid_root]$ cat James.C
    
    void James(TString path,TString dataset,TString initset,TString numgrp){
      TFile* file = new TFile(path+"/"+dataset+"-"+initset+".root");
      TH1F* histo =(TH1F*)file->Get("h_SigTauInvmass");
      TFile* output = new TFile(dataset+"-"+initset+".root","recreate");
    
    
      TH1F* new_histo  =   (TH1F*)histo->Clone("new_histo");     // <<<=== put your code here!
    
      output->Write();
      output->Close();
    
    }

    Your code needs the same interface, with at least 4 parameters:

    path is the path of your dataset (see path above)
    dataset is the name of your dataset (see dataset_name above)
    initset is the sequence number of the first file (see init_number above)
    numgrp is the number of files to be joined in this job (see group above).
    If you need to pass parameters to your code, add them after numgrp, each as a TString.

    The following code, by Roger, histograms the Monte Carlo thrust. Write a ROOT source file source.C along these lines:

    
    void source(TString path,TString ds, TString init, TString numgrp){
      // Create histograms
      TH1F* hthr = new TH1F("hthr","Thrust distribution",100,0.8,1);
      // Loop over the numgrp files of this job: init, init+1, ..., init+numgrp-1
      int initial=init.Atoi();
      int nmax=initial+numgrp.Atoi()-1;
      for(int nfile=initial;nfile<=nmax;nfile++){
        TFile f(Form("%s/%s-%d.root",path.Data(),ds.Data(),nfile));
        if (f.IsZombie()) {cout<<" file error \n"; exit(-1);}
        // access ntuple
        TTree* h1013=(TTree*)f.Get("h1013");
        int n=h1013->GetEntries();
        Float_t thr;
        h1013->SetBranchAddress("thrustMag",&thr);
        // loop over entries in ntuple
        for (Int_t i=0; i<n; i++) {
          h1013->GetEntry(i);
          hthr->Fill(thr);
        }// end of entries loop
      }// end of files loop
      // Write the output file
      TFile *hfile = new TFile(ds+"-"+init+".root","recreate");
      hthr->Write();
      hfile->Close();
    }
    
    
    

    Do not change the naming structure of the input/output files, or the grid scripts will not find the data or recover your results. Your code must also process together all data files of the job: dataset-init, dataset-(init+1), dataset-(init+2), ... dataset-(init+numgrp-1).

  6. easygrid is the script that reports your job status and recovers results:

    ./easygrid dataset_name

    where dataset_name is the name of the ntuple files without extension. Run easygrid in the same directory from which you submitted your ROOT code.

    The final results will be stored in your directory, in the folders dataset_name-NN.
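
Jobs that have not yet been recovered leave token files such as SumEltag-1.tok in the submission directory, so the number of pending jobs can be counted before deciding whether to run easygrid again. The sketch below is an illustration only: pending_tokens is a hypothetical helper, and it assumes every token file is named dataset_name-*.tok, as in the session shown in this document.

```shell
# pending_tokens DATASET [DIR]   (hypothetical helper)
# Prints the number of DATASET-*.tok token files in DIR (default: current directory).
pending_tokens() {
    dataset=$1
    dir=${2:-.}
    count=0
    for tok in "$dir/$dataset"-*.tok; do
        # If nothing matches, the unexpanded pattern is returned, so test existence.
        if [ -e "$tok" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

A polling loop could then call ./easygrid only while "pending_tokens SumEltag" prints a number greater than 0. Stopping at 0 matters, because easygrid warns that once everything has been recovered, the next run submits everything again.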

    A simple session submitting ROOT code to the grid:

    This is the example Mitch gave me to test a ROOT submission system (thanks!). First, submit using easyroot.

    
    [jamwer@bf39 grid_root]$ ./easyroot SumEltag /nfs/work/store James 1 3 1
    
    ######################################################################
    #       e a s y r o o t - Job Submission system for root             #
    #                                                                    #
    #  If you need any support: James Cunha Werner  jamwer@hep.man.ac.uk #
    #                           Room 7-11  Phone 0161 275 4150           #
    #                           www.geocities.com/jamwer2002             #
    #  Documentation:   http://www.hep.man.ac.uk/u/jamwer/               #
    #                   University of Manchester                         #
    ######################################################################
    Welcome, jamwer !
    
    
    Job Submission User Interface version  lcg2.1.69
    Searching previous handlers.
    
    
            W A R N I N G ! ! ! Handlers not found.
    
    Type yes if you want to run everything again - some previous results may be lost!:
    yes
    
    Handlers not found. Submiting to GRID . Wait end of process...
    Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
    Enter GRID pass phrase for this identity:
    Creating proxy .............................................. Done
    Your proxy is valid until: Thu Mar 23 15:19:19 2006
    
    Sub SumEltag-1 Tue Dec 13 15:19:19 GMT 2005
    
    Selected Virtual Organisation name (from --vo option): babar
    Connecting to host bf31.hep.man.ac.uk, port 7772
    Logging to host bf31.hep.man.ac.uk, port 9002
    
    
    *********************************************************************************************
                                   JOB SUBMIT OUTCOME
     The job has been successfully submitted to the Network Server.
     Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:
    
     - https://bf31.hep.man.ac.uk:9000/MC_zpyJrw6e_u14kq4okrA
    
    
    *********************************************************************************************
    
    
                   -  *  -  *  -  *  -  *  -  *  -  *  -  *  -  *  -
    
    
    Sub SumEltag-2 Tue Dec 13 15:19:40 GMT 2005
    
    Selected Virtual Organisation name (from --vo option): babar
    Connecting to host bf31.hep.man.ac.uk, port 7772
    Logging to host bf31.hep.man.ac.uk, port 9002
    
    
    *********************************************************************************************
                                   JOB SUBMIT OUTCOME
     The job has been successfully submitted to the Network Server.
     Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:
    
     - https://bf31.hep.man.ac.uk:9000/PFFKDvFR4Tj2Jf9x9PBq4g
    
    
    *********************************************************************************************
    
    
                   -  *  -  *  -  *  -  *  -  *  -  *  -  *  -  *  -
    
    
    Sub SumEltag-3 Tue Dec 13 15:20:03 GMT 2005
    
    Selected Virtual Organisation name (from --vo option): babar
    Connecting to host bf31.hep.man.ac.uk, port 7772
    Logging to host bf31.hep.man.ac.uk, port 9002
    
    
    *********************************************************************************************
                                   JOB SUBMIT OUTCOME
     The job has been successfully submitted to the Network Server.
     Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:
    
     - https://bf31.hep.man.ac.uk:9000/1bLmgCcTdIaW88rNI6pgyA
    
    
    *********************************************************************************************
    
    
                   -  *  -  *  -  *  -  *  -  *  -  *  -  *  -  *  -
    
    You can see the results at any time, running easygrid with the same dataset_name.
    
    [jamwer@bf39 grid_root]$ ./easygrid SumEltag
    
    ######################################################################
    #       e a s y g r i d - Job Submission system for Babar analysis   #
    #                                                                    #
    #  If you need any support: James Cunha Werner  jamwer@hep.man.ac.uk #
    #                           Room 7-11  Phone 0161 275 4150           #
    #                           www.geocities.com/jamwer2002             #
    #  Documentation:   http://www.hep.man.ac.uk/u/jamwer/               #
    #                   University of Manchester                         #
    ######################################################################
    Welcome, jamwer !
    
    
    Job Submission User Interface version  lcg2.1.69
    Searching pre selected skimdata.
    Searching previous handlers.
    Checking if jobs finished.
    
    ### SubFile SumEltag-1.tok  Handle -> https://bf31.hep.man.ac.uk:9000/MC_zpyJrw6e_u14kq4okrA
        Current Status: Scheduled
    
    ### SubFile SumEltag-2.tok  Handle -> https://bf31.hep.man.ac.uk:9000/PFFKDvFR4Tj2Jf9x9PBq4g
        Current Status: Scheduled
    
    ### SubFile SumEltag-3.tok  Handle -> https://bf31.hep.man.ac.uk:9000/1bLmgCcTdIaW88rNI6pgyA
        Current Status: Scheduled
    0 jobs aborted ! Try again using easyresub DATASET FILE_NUMBER
    
    [jamwer@bf39 grid_root]$ ./easygrid SumEltag
    
    ######################################################################
    #       e a s y g r i d - Job Submission system for Babar analysis   #
    #                                                                    #
    #  If you need any support: James Cunha Werner  jamwer@hep.man.ac.uk #
    #                           Room 7-11  Phone 0161 275 4150           #
    #                           www.geocities.com/jamwer2002             #
    #  Documentation:   http://www.hep.man.ac.uk/u/jamwer/               #
    #                   University of Manchester                         #
    ######################################################################
    Welcome, jamwer !
    
    
    Job Submission User Interface version  lcg2.1.69
    Searching pre selected skimdata.
    Searching previous handlers.
    Checking if jobs finished.
    
    ### SubFile SumEltag-1.tok  Handle -> https://bf31.hep.man.ac.uk:9000/MC_zpyJrw6e_u14kq4okrA
        Current Status: Running
    
    ### SubFile SumEltag-2.tok  Handle -> https://bf31.hep.man.ac.uk:9000/PFFKDvFR4Tj2Jf9x9PBq4g
        Current Status: Running
    
    ### SubFile SumEltag-3.tok  Handle -> https://bf31.hep.man.ac.uk:9000/1bLmgCcTdIaW88rNI6pgyA
        Current Status: Running
    0 jobs aborted ! Try again using easyresub DATASET FILE_NUMBER
    
    [jamwer@bf39 grid_root]$ ./easygrid SumEltag
    
    ######################################################################
    #       e a s y g r i d - Job Submission system for Babar analysis   #
    #                                                                    #
    #  If you need any support: James Cunha Werner  jamwer@hep.man.ac.uk #
    #                           Room 7-11  Phone 0161 275 4150           #
    #                           www.geocities.com/jamwer2002             #
    #  Documentation:   http://www.hep.man.ac.uk/u/jamwer/               #
    #                   University of Manchester                         #
    ######################################################################
    Welcome, jamwer !
    
    
    Job Submission User Interface version  lcg2.1.69
    Searching pre selected skimdata.
    Searching previous handlers.
    Checking if jobs finished.
    
    ### SubFile SumEltag-1.tok  Handle -> https://bf31.hep.man.ac.uk:9000/MC_zpyJrw6e_u14kq4okrA
        Current Status: Done
    
    Retrieving files from host: bf31.hep.man.ac.uk ( for https://bf31.hep.man.ac.uk:9000/MC_zpyJrw6e_u14kq4okrA )
    
    *********************************************************************************
                            JOB GET OUTPUT OUTCOME
    
     Output sandbox files for the job:
     - https://bf31.hep.man.ac.uk:9000/MC_zpyJrw6e_u14kq4okrA
     have been successfully retrieved and stored in the directory:
     /home/jamwer/grid_root/jamwer_MC_zpyJrw6e_u14kq4okrA
    
    *********************************************************************************
    
        Exit code: 0
    
    ### SubFile SumEltag-2.tok  Handle -> https://bf31.hep.man.ac.uk:9000/PFFKDvFR4Tj2Jf9x9PBq4g
        Current Status: Done
    
    Retrieving files from host: bf31.hep.man.ac.uk ( for https://bf31.hep.man.ac.uk:9000/PFFKDvFR4Tj2Jf9x9PBq4g )
    
    *********************************************************************************
                            JOB GET OUTPUT OUTCOME
    
     Output sandbox files for the job:
     - https://bf31.hep.man.ac.uk:9000/PFFKDvFR4Tj2Jf9x9PBq4g
     have been successfully retrieved and stored in the directory:
     /home/jamwer/grid_root/jamwer_PFFKDvFR4Tj2Jf9x9PBq4g
    
    *********************************************************************************
    
        Exit code: 0
    
    ### SubFile SumEltag-3.tok  Handle -> https://bf31.hep.man.ac.uk:9000/1bLmgCcTdIaW88rNI6pgyA
        Current Status: Done
    
    Retrieving files from host: bf31.hep.man.ac.uk ( for https://bf31.hep.man.ac.uk:9000/1bLmgCcTdIaW88rNI6pgyA )
    
    *********************************************************************************
                            JOB GET OUTPUT OUTCOME
    
     Output sandbox files for the job:
     - https://bf31.hep.man.ac.uk:9000/1bLmgCcTdIaW88rNI6pgyA
     have been successfully retrieved and stored in the directory:
     /home/jamwer/grid_root/jamwer_1bLmgCcTdIaW88rNI6pgyA
    
    *********************************************************************************
    
        Exit code: 0
    0 jobs aborted ! Try again using easyresub DATASET FILE_NUMBER
    All jobs done. Available results recovered in your folder.
    
    WARNING: Next time easygrid will submit everything AGAIN!!!
    
    When all jobs have finished, you will have several folders in your directory with results and listings:
    
    [jamwer@bf39 grid_root]$ ls SumEltag-?
    
    SumEltag-1:
    std.err  std.out  SumEltag-1.root
    
    SumEltag-2:
    std.err  std.out  SumEltag-2.root
    
    SumEltag-3:
    std.err  std.out  SumEltag-3.root
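
The listing above suggests a simple completeness check: each job should leave a folder dataset_name-N containing dataset_name-N.root. The sketch below is an assumption-laden illustration for submissions without extra parameters; missing_results is a hypothetical helper, and N is taken to advance by the group size, as in the submission step.

```shell
# missing_results DATASET INIT FINAL GROUP [DIR]   (hypothetical helper)
# For each job (first file number N = INIT, INIT+GROUP, ...), checks that
# DIR/DATASET-N/DATASET-N.root exists and is not empty; prints those that fail.
missing_results() {
    dataset=$1
    n=$2
    final=$3
    group=$4
    dir=${5:-.}
    [ "$group" -ge 1 ] || return 2
    status=0
    while [ "$n" -le "$final" ]; do
        result="$dir/$dataset-$n/$dataset-$n.root"
        if [ ! -s "$result" ]; then
            echo "no output yet: $result"
            status=1
        fi
        n=$((n + group))
    done
    return $status
}
```

For the session above, "missing_results SumEltag 1 3 1" would print nothing and return 0 once all three folders are in place.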
    
    
    
    
    
    Copyright 2004 Manchester University
    Feedback to: jamwer@hep.man.ac.uk