The Grid plays an important role in high-energy physics (HEP). It lets users test different theories against enormous datasets (terabytes of information) while making efficient use of their time, which is limited by fixed-term research contracts.
Other important aspects are sharing costs and balancing the available resources. Researchers do not submit jobs every day.
Instead, there is a cycle of submission, analysis, discussion, re-evaluation of assumptions, and resubmission.
The Grid allows resources around the world to be shared in a way that is transparent to the user.
This document presents a job submission system for the BaBar project, through which members of the collaboration can locate datasets and resources and submit analysis jobs to various sites. Initially these would be the Manchester production farm (80 CPUs) and testbed (10 CPUs) for Run 1 and 2, then the GridPP Tier 1 and Tier 2 centres (Run 3 and 4), and in due course grid centres internationally, to the extent that their information has been published and standards are followed.
Web pages were used instead of the usual reports to allow wide discussion within the grid, HEP, and metadata communities. A multidisciplinary project would not succeed by submitting conventional reports.
This prototype (and the future production system) is designed to be the most suitable way to use the grid in the BaBar project, given its peculiarities and constraints. Users will be able to configure the software as they discover more information about the sites and their configuration. Further analysis would allow it to be used in other experiments as well.
A description of what job submission is, and of the impact of BaBar management and politics on the grid environment, is given in Requirements ... under these circumstances.
To overcome these difficulties, the job submission system was divided into two layers:
The development methodology was Rapid Application Prototyping (RAP), which provides an efficient link between developers and users and allows the software to be adapted earlier to users' requirements. From the beginning, users had a submission system available to test the grid, the experiment software, and job submission. Equipment could be in use and productive from day 0.
Job submission provides distributed resources (configured in Components and infrastructure) to meet users' requirements. Understanding the relationship between the parts is crucial to configuring the software properly. The structure was divided into two components:
The software was designed using the Job Submission System state machine for an efficient and user-friendly result. A toolbox of utilities is also provided.
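As an illustration of what a state-machine design means for job submission, the sketch below models a job that may only move between explicitly allowed states. The state names (`CREATED`, `SUBMITTED`, `RUNNING`, `DONE`, `FAILED`), the transition table, and the `Job` class are hypothetical and chosen for this example only; they are not taken from the actual Job Submission System state machine.

```python
from enum import Enum, auto


class JobState(Enum):
    """Hypothetical job states, assumed for this sketch only."""
    CREATED = auto()
    SUBMITTED = auto()
    RUNNING = auto()
    DONE = auto()
    FAILED = auto()


# Allowed transitions (an assumption, not the real system's table).
# A failed job may be resubmitted, mirroring the submit/analyse/resubmit cycle.
TRANSITIONS = {
    JobState.CREATED: {JobState.SUBMITTED},
    JobState.SUBMITTED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.DONE, JobState.FAILED},
    JobState.FAILED: {JobState.SUBMITTED},
    JobState.DONE: set(),
}


class Job:
    """A job whose state can only change along an allowed transition."""

    def __init__(self, name):
        self.name = name
        self.state = JobState.CREATED
        self.history = [JobState.CREATED]

    def advance(self, new_state):
        # Reject any move that the transition table does not list.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name}: cannot go from {self.state.name} to {new_state.name}"
            )
        self.state = new_state
        self.history.append(new_state)


if __name__ == "__main__":
    job = Job("analysis-001")
    for step in (JobState.SUBMITTED, JobState.FAILED,
                 JobState.SUBMITTED, JobState.RUNNING, JobState.DONE):
        job.advance(step)
    print([s.name for s in job.history])
```

Keeping the transitions in a single table makes illegal moves (for example, going from `DONE` back to `RUNNING`) fail loudly instead of leaving a job in an inconsistent state.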
The grid at BaBar has been very unreliable and poorly managed. To provide users with a more reliable solution, an assessment and risk analysis was developed; it contains a full listing of codes and descriptions for crash recovery and analysis.
The integration requirements for a heterogeneous architecture distributed around the world are described in Application and Installation Standards conformity. There are standards that need to be followed to allow transparent execution on any computer: initialisation scripts must be in designated places and be accessible in the same way, and so on. Applications will not need to know where they are running or what the site configuration is.
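The idea of initialisation scripts living in designated places can be sketched as follows. Everything concrete here is an assumption made for illustration: the `SITE_ROOT` environment variable, the `etc/init` subdirectory, and the `.sh` naming are hypothetical and are not the standards defined in Application and Installation Standards conformity.

```python
import os
from pathlib import Path

# Hypothetical convention, assumed for this sketch only: each site exports
# one variable pointing at its installation root, and every initialisation
# script lives in the same relative location under that root.
SITE_ROOT_ENV = "SITE_ROOT"
INIT_SUBDIR = Path("etc") / "init"


def init_script_path(app, env=None):
    """Return where the initialisation script for `app` should be found.

    The caller never names the site; only the environment differs
    from one site to another.
    """
    env = os.environ if env is None else env
    root = env.get(SITE_ROOT_ENV)
    if root is None:
        raise RuntimeError(f"{SITE_ROOT_ENV} is not set at this site")
    return Path(root) / INIT_SUBDIR / f"{app}.sh"


if __name__ == "__main__":
    # Two imaginary sites with different roots but the same layout.
    for root in ("/opt/siteA", "/grid/siteB"):
        print(init_script_path("analysis", {SITE_ROOT_ENV: root}))
```

Because the application asks for a script by name only, the same job can run unchanged at any site that follows the agreed layout.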
Most problems in the past occurred due to the lack of Upgrade policies. Performing an installation in a mirror environment before introducing the changes into production allows preliminary tests and benchmarks to be run, reducing crashes and errors. This helps ensure reliability, but problems will always happen. I propose an effective Production support scheme to provide help to users.
Feedback to: jamwer@hep.man.ac.uk