Getting Started Using CM2 at Manchester

James Werner

Find out what data is available locally

References: Bookkeeping User Tools; Bookkeeping at Tier Cs

Creation of the .bbk directory - you only need to do this once

The BaBar bookkeeping system restricts access to authorized users. To meet its security requirements, the first time you use it you need to connect to SLAC (kslac) and run one Bbk command, which creates the .bbk directory in your home area with all the necessary permissions:

 kslac
 BbkDatasetTcl --dbsite=man
If this step is missing, Bbk commands fail with an error like:
 bfactory@bfb>BbkFiles --dbsite=man --dataset=SP-1005-Tau11-R14 --remote='*'
ACCESS ERROR: Can not reach repository.
ERROR: Sorry improper site for connection.
There is no connection information for site man.

If the error comes back later, repeat the step. If nothing else helps, copy a working .bbk directory:

 cp -r /afs/hep.man.ac.uk/u/jamwer/.bbk ~

Ignore any "permission denied" messages; enough of the directory should still be copied for the tools to work.
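If you script the steps below, a quick check that the one-time setup has been done can save you from a confusing ACCESS ERROR. This is only a sketch: it assumes, as the cp command above implies, that the directory lives at ~/.bbk, the function name bbk_ready is made up, and it tests only that the directory exists, not that its contents are valid.

```shell
# Hypothetical helper: succeed only if the ~/.bbk directory created by the
# one-time setup step exists (the files inside it are not checked).
bbk_ready() {
  if [ -d "$HOME/.bbk" ]; then
    return 0
  fi
  echo "~/.bbk not found: run kslac and 'BbkDatasetTcl --dbsite=man' first" >&2
  return 1
}
```

For example, bbk_ready && BbkFiles --dbsite=man runs the query only when the directory is present.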

Generation of skimdata (Tcl pointer files)

Basic eventstore concepts and terms

The user interface to the eventstore (for an analysis job, say) is an event "collection". Each collection represents an ordered series of N events, and a user can choose to read the events starting from the first one in the sequence or from any given offset into it.

An important difference between the CM2 Kanga implementation and the original Kanga implementation in BaBar is that each collection may map to _one or more_ files. Do not assume a one-to-one correspondence between collections and files.
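To make the collection/file distinction concrete, the following toy model (an illustration only, not the CM2 Kanga code) treats a collection as an ordered list of files and resolves a 0-based event offset into the file that holds that event. The input format, one "FILENAME NEVENTS" pair per line, and the function name locate_event are hypothetical.

```shell
# Toy model: a collection is an ordered list of files, each holding some
# number of events. Given a 0-based offset into the whole collection, print
# the file containing that event and the offset within that file.
# Input on stdin: one "FILENAME NEVENTS" pair per line (hypothetical format).
locate_event() {
  awk -v off="$1" '
    NF < 2 { next }                                      # skip blank/malformed lines
    off + 0 < $2 + 0 { print $1, off; found = 1; exit }  # event is in this file
    { off -= $2 }                                        # skip past this file
    END { if (!found) exit 1 }                           # offset beyond the last event
  '
}
```

For instance, printf 'file-a.root 100\nfile-b.root 50\n' | locate_event 120 prints file-b.root 20, because the first 100 events live in file-a.root.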

The data itself is written to ROOT trees within the files. We often talk about data "components":

  1. hdr - event header
  2. usr - user data
  3. tag - tag information
  4. cnd - candidate information
  5. aod - "analysis object data"
  6. tru - MC truth data (only in MC data)
  7. esd - "event summary data"
  8. sim - "sim" data from BgsApp or MooseApp like GHits/GVertices (only in MC data)
  9. raw - subset of raw data from xtc persisted in the Kanga eventstore

In practice, these map to different trees, i.e. there is a "hdr" tree, an "aod" tree, etc. within the files your job opens when you read a particular collection.

You will also hear people talk about the "micro" and the "mini". Reading one of these simply means reading a particular set of these "components":

micro = hdr + usr + tag + cnd + aod (+ tru)
mini = micro + esd

Note that the "mini" is just a superset of the "micro".
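The two definitions above amount to two component lists. The sketch below simply writes them down (the variable and function names are made up for illustration), adding tru for MC data, so that the superset relation is easy to see.

```shell
# Component lists as defined in the text; "tru" is present only in MC data.
MICRO_COMPONENTS="hdr usr tag cnd aod"
MINI_COMPONENTS="$MICRO_COMPONENTS esd"

# Hypothetical helper: components KIND [mc] -> space-separated component list
components() {
  case $1 in
    micro) list=$MICRO_COMPONENTS ;;
    mini)  list=$MINI_COMPONENTS ;;
    *)     echo "unknown kind: $1" >&2; return 1 ;;
  esac
  if [ "${2-}" = mc ]; then
    list="$list tru"
  fi
  echo "$list"
}
```

For example, components mini mc prints hdr usr tag cnd aod esd tru.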

Basic structure of the CM2 Kanga eventstore from a user point of view

All of the collections from production (PR, SP, Skimming) will have names beginning with /store. The /store portion of the eventstore is meant primarily for production data to be shared between sites.
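Since production collections are recognizable by their prefix, a script can tell them apart from other collections with a simple pattern match. In this sketch the function name is hypothetical, and it assumes the /store prefix is always followed by a "/".

```shell
# Hypothetical helper: succeed if a collection name lies in the shared
# production area, i.e. begins with /store/ (assumed layout).
is_production_collection() {
  case $1 in
    /store/*) return 0 ;;
    *)        return 1 ;;
  esac
}
```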

There are three things one needs to discuss to understand the eventstore structure in more detail:

  1. collections - these are "logical" names that users use to configure their jobs. These are site-independent so (assuming the site has imported the data) the same collection name should work at any site.
  2. logical file names (LFN) - these are site-independent names given to all files in the eventstore. Any references within the event data itself _must_ use LFN's so that they remain valid when the files are moved from site to site.
  3. physical file names (PFN) - these are file names that will vary from site to site. In practice they are usually derived from the LFN's by adding a prefix that encapsulates how the data is accessed at that site.
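The LFN-to-PFN step described in item 3 can be sketched as plain string concatenation. The prefix shown is a placeholder, not the real Manchester value, and both the function name lfn_to_pfn and the check that an LFN starts with "/" are assumptions made for the illustration.

```shell
# Toy illustration of "PFN = site prefix + LFN". The real prefix is
# site-specific; this default is a hypothetical placeholder.
SITE_PREFIX=${SITE_PREFIX:-/hypothetical/site/prefix}

lfn_to_pfn() {
  case $1 in
    /*) printf '%s%s\n' "$SITE_PREFIX" "$1" ;;   # prepend the site prefix
    *)  echo "expected an absolute LFN, got: $1" >&2; return 1 ;;
  esac
}
```

With SITE_PREFIX=/data/kanga (again hypothetical), lfn_to_pfn /store/example.root prints /data/kanga/store/example.root.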

The first step is to find out which datasets are available. You can query the bookkeeping system for the skim releases and data streams available at Manchester with:
 BbkFiles --dbsite=man

                                      On disk
                                    (status=0)
Skim Release Stream Components Files    Events GBytes
============ ====== ========== ===== ========= ======
14.3.2h      Tau11  HBCA         103 346661982  149.6
14.3.2h      Tau1N  HBCA          51  38444016   60.0
14.3.2h      Tau33  HBCA          56  42577579   66.6
14.4.0d      Tau11  HBCAT          8   1263308    3.6
14.4.0d      Tau1N  HBCAT          9   2409156    6.8
14.4.0d      Tau33  HBCAT         12   6369117   12.2
14.4.0e      Tau11  HBCAT          6    446446    1.8
14.4.0e      Tau1N  HBCAT          6    731289    3.6
14.4.0e      Tau33  HBCAT          8   2285602    6.7
14.4.2b      Tau11  HBCAT        175   2397099   11.2
14.4.2b      Tau1N  HBCAT        176   7825894   40.0
14.4.2b      Tau33  HBCAT        181  20186611   92.9
============ ====== ========== ===== ========= ======
Totals                           791 471598099  455.1
database: 791 (488606952049 bytes, 455.1 GB)
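The Totals and database lines are easy to cross-check. The sketch below, whose function names are made up, adds up the Files and Events columns of the data rows (those starting with a release number such as 14.3.2h) and converts a byte count to GBytes. The figures in the listing agree with 1 GB = 2^30 bytes (488606952049 bytes giving 455.1), which is the assumption built into bytes_to_gb.

```shell
# Recompute the Files and Events totals of a BbkFiles summary table.
# Data rows start with a release such as 14.3.2h; column 4 is Files and
# column 5 is Events. Header, separator and Totals lines are ignored.
sum_bbk_table() {
  awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+[a-z]*$/ { files += $4; events += $5 }
       END { printf "%.0f %.0f\n", files, events }'
}

# Convert bytes to GBytes as they appear in the listing (1 GB = 2^30 bytes).
bytes_to_gb() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1073741824 }'
}
```

Fed the listing above (for example BbkFiles --dbsite=man | sum_bbk_table, assuming the table goes to standard output), sum_bbk_table should print 791 471598099, matching the Totals row. Adding the rounded GBytes column instead gives 455.0 rather than 455.1, so the size total is better recomputed from the byte count.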

 BbkDatasetTcl --dbsite=man | more

BbkDatasetTcl: 14527 datasets found:-

A0-Run1-OffPeak-R14
A0-Run1-OnPeak-R14
A0-Run2-OffPeak-R14
A0-Run2-OnPeak-R14
A0-Run3-OffPeak-R14
A0-Run3-OnPeak-R14
A0-Run4-OffPeak-R14
A0-Run4-OnPeak-R14
... 
The list is long, so it is more practical to send it to a file and search it later with a text editor:
 BbkDatasetTcl --dbsite=man > saida.txt
 vi saida.txt


The first command (BbkFiles) reports, for each skim release and stream, the number of files and events and the disk space (GBytes) they occupy.
The second lists the datasets known to the system. Once you know a string contained in the names of the datasets you want, search the saved listing with grep (or grep -i for a case-insensitive search).
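Searching for several terms at once is easier with a small wrapper around grep -i. The function below is a hypothetical helper: it keeps only the lines of the saved listing (saida.txt above) that contain every given term, ignoring case.

```shell
# Hypothetical helper: find_datasets FILE TERM... prints the lines of FILE
# containing every TERM (case-insensitive); returns 1 if nothing matches.
find_datasets() {
  file=$1
  shift
  matches=$(cat "$file") || return 1
  for term in "$@"; do
    # narrow the current matches down to those also containing this term
    matches=$(printf '%s\n' "$matches" | grep -i -e "$term")
  done
  [ -n "$matches" ] && printf '%s\n' "$matches"
}
```

For instance, find_datasets saida.txt tau1n run3 onpeak would pick out Tau1N-Run3-OnPeak-R14.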
With a name pattern in hand, you can ask BbkFiles about more specific datasets; wildcards are accepted, for example:
 BbkFiles --dbsite=man --dataset=Tau11-Run3-* --remote='*'
                                      New file
                                     (status=2)
 Skim Release Stream Components Files    Events GBytes
 ============ ====== ========== ===== ========= ======
 14.3.2h      Tau11  HBCA         103 346661982  149.6
 ============ ====== ========== ===== ========= ======
 Totals                           103 346661982  149.6
 database: 103 (160680928250 bytes, 149.6 GB)

 BbkFiles --dbsite=man --dataset=Tau1N-Run3-* --remote='*'
                                  To Import (normal)
                                     (status=lC)
 Skim Release Stream Components Files   Events GBytes
 ============ ====== ========== ===== ======== ======
 14.3.2h      Tau1N  HBCA          51 38444016   60.0
 ============ ====== ========== ===== ======== ======
 Totals                            51 38444016   60.0
 database: 51 (64380858851 bytes, 60.0 GB)
 BbkFiles --dbsite=man --dataset=Tau33-Run3-* --remote='*'
                                   To Import (normal)
                                     (status=lC)
 Skim Release Stream Components Files   Events GBytes
 ============ ====== ========== ===== ======== ======
 14.3.2h      Tau33  HBCA          56 42577579   66.6
 ============ ====== ========== ===== ======== ======
 Totals                            56 42577579   66.6
 database: 56 (71523569016 bytes, 66.6 GB)
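The three queries above differ only in the stream name, so they can be generated in a loop. In this sketch the function name and the BBK_RUN switch are hypothetical; by default it only prints the commands, and it quotes the dataset pattern so that the shell does not try to expand the * against files in the current directory.

```shell
# Hypothetical helper: query_streams PERIOD STREAM... builds one BbkFiles
# query per stream. Prints the commands unless BBK_RUN=yes is set.
query_streams() {
  period=$1
  shift
  for stream in "$@"; do
    if [ "${BBK_RUN:-no}" = yes ]; then
      BbkFiles --dbsite=man --dataset="$stream-$period-*" --remote='*'
    else
      printf '%s\n' "BbkFiles --dbsite=man --dataset=$stream-$period-* --remote='*'"
    fi
  done
}
```

query_streams Run3 Tau11 Tau1N Tau33 prints the three BbkFiles commands shown above; export BBK_RUN=yes before calling it to run them.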
After selecting the dataset, the next step is to generate the Tcl files with pointers to the data in the database:
 BbkDatasetTcl -t 10000 -ds "Tau1N-Run3-OnPeak-R14" -b TauDataset --site man
 BbkDatasetTcl: wrote TauDataset-1.tcl (834241 events)
 BbkDatasetTcl: wrote TauDataset-2.tcl (829013 events)
 BbkDatasetTcl: wrote TauDataset-3.tcl (837648 events)
 BbkDatasetTcl: wrote TauDataset-4.tcl (850764 events)
 BbkDatasetTcl: wrote TauDataset-5.tcl (789362 events)
 BbkDatasetTcl: wrote TauDataset-6.tcl (793423 events)
 BbkDatasetTcl: wrote TauDataset-7.tcl (869339 events)
 BbkDatasetTcl: wrote TauDataset-8.tcl (865590 events)
 BbkDatasetTcl: wrote TauDataset-9.tcl (838361 events)
 BbkDatasetTcl: wrote TauDataset-10.tcl (854116 events)
 BbkDatasetTcl: wrote TauDataset-11.tcl (829229 events)
 BbkDatasetTcl: wrote TauDataset-12.tcl (859525 events)
 BbkDatasetTcl: wrote TauDataset-13.tcl (858229 events)
 BbkDatasetTcl: wrote TauDataset-14.tcl (865216 events)
 BbkDatasetTcl: wrote TauDataset-15.tcl (854086 events)
 BbkDatasetTcl: wrote TauDataset-16.tcl (830521 events)
 BbkDatasetTcl: wrote TauDataset-17.tcl (847940 events)
 BbkDatasetTcl: wrote TauDataset-18.tcl (861614 events)
 BbkDatasetTcl: wrote TauDataset-19.tcl (867448 events)
 BbkDatasetTcl: wrote TauDataset-20.tcl (353112 events)
 BbkDatasetTcl: wrote TauDataset-21.tcl (459827 events)
 BbkDatasetTcl: wrote TauDataset-22.tcl (131603 events)
 BbkDatasetTcl: wrote TauDataset-23.tcl (83369 events) 
 BbkDatasetTcl: wrote TauDataset-24.tcl (857981 events)
 BbkDatasetTcl: wrote TauDataset-25.tcl (550437 events)
 BbkDatasetTcl: wrote TauDataset-26.tcl (278269 events)
 Selected 26 collections, 18750263/444539210 events, ~272.6/pb
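Here the run wrote 26 Tcl files for 26 selected collections, and the per-file event counts add up to the 18750263 selected events. A sketch that performs this check on a saved copy of the output (the file name tcl.log and the function name are hypothetical) could look like this:

```shell
# Summarize BbkDatasetTcl output: count the "wrote" lines, add up their event
# counts and take the selected-event figure from the "Selected ..." line.
# Prints: FILES EVENTS_WRITTEN EVENTS_SELECTED
summarize_tcl_log() {
  awk '
    $2 == "wrote"    { files++; events += substr($4, 2) }    # "(834241" -> 834241
    $1 == "Selected" { split($4, n, "/"); selected = n[1] }  # "18750263/444539210"
    END { printf "%d %.0f %s\n", files, events, selected }
  '
}
```

summarize_tcl_log < tcl.log should print 26 18750263 18750263 for the run above; a mismatch between the last two numbers would point to a problem such as an incomplete copy of the log.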

Copyright 2004 Manchester University
Feedback to: jamwer@hep.man.ac.uk