From: "Chip Watson"
To: "Ruth Pordes"
Cc: "Reagan Moore" ; ; "Ian Bird"
Subject: [Ppdg-steering] Re: JLAB-SRB Project Activity Review
Date: Thursday, April 18, 2002 10:54 AM

Ruth,

I'll provide short answers here, then documentation ahead of the May 6 review, then longer answers in discussions as people like.

Chip

Ruth Pordes wrote:

> a) What s/w does the JLAB application use today that is contributed by PPDG
> middleware or any other PPDG group than JLAB itself?

The functionality we are currently deploying is Replica Catalog, Storage Resource Manager (SRM), file browsing, file transfers (2- and 3-party), and interaction with our silo. The SRM follows the PPDG spec document (but is a superset of it). File transfer is multi-protocol as defined in the SRM document (we have integrated ftp, http and jarss so far, and are now starting to integrate gridFTP). The replica catalog does not use the Globus product because that product did not offer the functionality we needed. All components except file transfer speak SOAP.

> What is the schedule for using SRB middleware in the JLAB application?

This is a misunderstanding. We intended to develop interoperability with SRB through a common SOAP spec. That opens the door for future JLab collaborators (e.g. universities or national labs that have chosen SRB as their data management package) to deploy or re-use SRB. JLab has its own data management software, which does silo management and integrates with batch system work, and it does not seem productive to replace it. (No other product provides the functionality we use for managing a StorageTek silo with multiple disk cache systems and multiple data movers.)

> What is the schedule for using Globus Java GSSAPI implementation in the JLAB
> application and what are the needs from Globus in order to do this?

When we began, Globus did not provide a pure Java solution, so we quickly implemented one. It is on our to-do list to look at this again, but we have not yet decided on a schedule.
Obviously we must integrate with GSI to do gsiftp, so we'll be looking at that soon.

> What are the plans for using other PPDG middleware in the JLAB application?

I'm interested in batch system components to build a meta-facility.

> b) What is the status of bringing the common storage and file
> replication interface specification to be used by other experiments?

It is slow going, but we are making progress. I hope by the May 6 review to be able to say that SRB and JLab have adopted some common WSDL parts, but we will be doing this incrementally. I have suggested to Arie that others within the SRM area would probably want to get involved in the WSDL definition work we are now doing with SRB, so that the resulting definitions are used PPDG-wide (and perhaps in the EU DataGrid as well?).

> What is the roadmap and what do you need from the Steering Committee
> in order to do this?

This is evolving, but I think this is where we are for the near term:

1. common WSDL for replication requests to homogeneous storage systems (gets the semantics, error status, and visible authentication issues resolved)
2. replication between heterogeneous SRB+JLab systems, using first one system and then the other as the replica catalog, with the opposite system serving as just a storage system (using gridFTP for the actual transfer; gets more of the semantics resolved)

At this point I think it would be necessary to involve a larger group so that the investment by SRB and JLab has more value.

From Reagan:

Chip has produced a specification that builds upon the SRM asynchronous storage system interface. We are discussing the development of web service interfaces that integrate access across the JLab and SRB environments. The discussions are oriented towards creating similar semantic definitions for the parameters that will be used in the replication service. Please note that at the last Global Grid Forum meeting, both Chip Watson and Arcot Rajasekar demonstrated web service interfaces for replication.
The current discussions target:

- synchronous creation of a replica
- file naming based upon global file names
- access control specification
- authentication specification
- protocol specification

What has become apparent is that the two systems have made different assumptions about where and when the distributed state information is established. In particular, we are coming to agreement on:

- which state information is provided prior to the service invocation
- which state information is provided as part of the service invocation parameters
- which state information is determined during the service invocation by calls to subsidiary services.

This decision is the heart of the data management architecture. We expect the Globus toolkit to provide its own breakdown of state information across these management approaches. Once we have the Globus specification, we will iterate on our specification.

> c) As you know PPDG is an integration and end-to-end application and
> common middleware extension and production deployment project. What is the
> profile of effort on this compared to development of the core JLAB portal
> framework to date? Do you see this changing in Year 2 of PPDG?

JLab's PPDG effort to date has been:

1. interacting with the broader community to help define common functionality
2. ... to define a common API to support interoperability
3. modifying in-house software to conform to this common API
4. prototyping the use of SOAP to test / demonstrate its feasibility for use in a data grid, and to uncover related implementation issues

Soon this will include:

5. integrating Globus components

Items 1-3 and 5 are the "integration and common" parts, and 4 is more development, but the prototyping still has value to the broader community.

Going forward I hope that there will be a greater amount of software we can integrate, in view of Globus' adoption of web services (see below).
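Reagan's three-way split of replication state (prior to the invocation, in the invocation parameters, or resolved during the invocation by subsidiary services) can be made concrete with a small sketch. Everything in it is hypothetical: the names `Session`, `ReplicaRequest`, `ReplicaCatalog`, `negotiate_protocol` and `plan_replication`, the field names, and the `gfn:` naming scheme do not come from the JLab or SRB WSDL. The placement of each piece of state is just one possible allocation, which is exactly what the two systems are still negotiating. It touches the items under discussion: a global file name, access control against an authenticated identity, and protocol selection from the caller's preference list against what the storage system supports.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical sketch: names and fields are illustrative, not from the JLab
# or SRB WSDL. It shows one possible placement of each piece of state.


@dataclass
class Session:
    """State established BEFORE the invocation, e.g. by a prior GSI handshake."""
    user_dn: str  # authenticated identity (X.509 subject)


@dataclass
class ReplicaRequest:
    """State passed AS PARAMETERS of the invocation."""
    global_file_name: str  # site-independent logical name
    target_site: str
    protocols: list[str] = field(default_factory=list)  # caller's preference order


class ReplicaCatalog:
    """Stand-in for a subsidiary service consulted DURING the invocation."""

    def __init__(self, entries: dict[str, list[str]]) -> None:
        self._entries = entries  # global file name -> physical URLs

    def lookup(self, gfn: str) -> list[str]:
        return list(self._entries.get(gfn, []))


def negotiate_protocol(requested: list[str], supported: set[str]) -> str:
    """Return the first protocol in the caller's list that the server supports."""
    for proto in requested:
        if proto in supported:
            return proto
    raise ValueError(f"no common transfer protocol among {requested}")


def plan_replication(
    session: Session,
    request: ReplicaRequest,
    catalog: ReplicaCatalog,
    acl: dict[str, set[str]],
    supported: set[str],
) -> dict[str, str]:
    """Synchronously plan one replica creation; no data is moved here."""
    # Access control: here the ACL is state the storage system already holds.
    allowed = acl.get(request.global_file_name, set())
    if session.user_dn not in allowed:
        raise PermissionError(f"{session.user_dn} may not replicate {request.global_file_name}")
    # Subsidiary service: resolve the global name to a physical copy.
    sources = catalog.lookup(request.global_file_name)
    if not sources:
        raise FileNotFoundError(request.global_file_name)
    # Protocol specification: agree on a transport before any transfer starts.
    protocol = negotiate_protocol(request.protocols, supported)
    return {"source": sources[0], "target_site": request.target_site, "protocol": protocol}
```

With `protocols=["gsiftp", "ftp", "http"]` and a server that so far supports only `{"ftp", "http"}`, `negotiate_protocol` returns `"ftp"`; once gridFTP is integrated on the server side, the same request would select `"gsiftp"` with no change on the client.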
> d) As you move to extend the job management portal and job submission
> and control what requirements do you have from the Grid middleware
> providers, in particular the Condor Project, on PPDG for working with them
> and functionality from code you would use?

1. define meta-data for job submission (in progress within PPDG and GGF)
2. define WSDL interfaces for job submission, migration, and control (I presume this will follow, since Globus is moving to web services)
3. implement such web services (job submission, scheduling, dispatch to a compute element), where the compute element is running PBS and data files (in and out) are accessed using the common SRM web services defined in the first part of this project

From Reagan:

SDSC has talked to Miron about the Virtual Data Language (VDL). We illustrated the ability to automate the extraction of metadata from VDL files and register the information into a derived data product catalog. With the next release of the VDL specification, we will update the metadata extraction tool.

> e) Your plan is that the Lattice Hadron Physics Collaboration through
> the SciDAC A National Computational Infrastructure for Lattice Gauge Theory
> will use the portal you have developed in this Project Activity first in
> prototype and then in production mode. What does the collaboratory proposal
> say about the portal work that JLAB and SRB are working on? Is JLAB the only
> institute working on this software from the SciDAC project? Are there other
> implementations being developed by that community that perhaps PPDG should
> be aware of and be prepared to work with? It is understood that the majority
> of development in this SciDAC project is application work.

We have been involved in the GGF Grid Computing Environments portal workshops and discussions, although not in their recent testbeds. Perhaps these will produce solutions to items 1-3 above, in which case we could build our Lattice QCD-specific portal on those.
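Item 3 in the answer to d) describes a job-submission web service that dispatches work to a compute element running PBS, with input and output files moved through the common SRM web services. A minimal sketch of that dispatch step follows, assuming the service turns a job description into a PBS batch script. `JobDescription`, `render_pbs_script` and the stage-in/stage-out command `srm_stage` are hypothetical placeholders, not an existing JLab tool; only the `#PBS -N` and `#PBS -l walltime=` directives and the `$PBS_O_WORKDIR` variable are standard PBS.

```python
import shlex
from dataclasses import dataclass, field
from typing import List

# Hypothetical: "srm_stage" stands in for whatever client the common SRM
# web services would provide; it is not a real command.
STAGE_CMD = "srm_stage"


@dataclass
class JobDescription:
    """Job meta-data a submission service might accept (illustrative fields)."""
    name: str
    command: str  # command line run on the compute element
    walltime: str  # PBS walltime, "HH:MM:SS"
    inputs: List[str] = field(default_factory=list)  # global file names to stage in
    outputs: List[str] = field(default_factory=list)  # global file names to stage out


def render_pbs_script(job: JobDescription, stage_cmd: str = STAGE_CMD) -> str:
    """Build a PBS script that stages inputs, runs the job, then stages outputs."""
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job.name}",
        f"#PBS -l walltime={job.walltime}",
        # Directives must precede the first command; stop if any step fails.
        "set -e",
        'cd "$PBS_O_WORKDIR"',
    ]
    # Stage in: copy each input into the job's working directory.
    for gfn in job.inputs:
        lines.append(f"{stage_cmd} {shlex.quote(gfn)} .")
    lines.append(job.command)
    # Stage out: the local file is assumed to carry the last path component.
    for gfn in job.outputs:
        local = gfn.rsplit("/", 1)[-1]
        lines.append(f"{stage_cmd} {shlex.quote(local)} {shlex.quote(gfn)}")
    return "\n".join(lines) + "\n"
```

Rendering a script rather than driving PBS directly keeps the web service thin: the resulting text could simply be handed to `qsub` on the compute element, and swapping the batch system would only mean swapping the renderer.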
> f) Could you please comment on the status of the project with respect
> to the items in the proposal which I include below.
>
> From the PPDG proposal, the JLAB section:
>
> One component of this infrastructure upgrade will be a Jefferson Lab Data
> Grid that combines in-house silo and disk cache management software with
> components from PPDG. The enhanced capabilities will be integrated into a
> next-generation analysis framework for the CLAS collaboration (CEBAF Large
> Acceptance Spectrometer, Hall B) and used for the future Hall D program.

Integrates common WSDL specs, GSI, and gridFTP.

> In collaboration with the Lattice Hadron Physics Collaboration, including
> MIT, Jefferson Lab is prototyping the use of web technologies to build a
> simulation and data analysis meta-facility that will give access to
> distributed batch systems and data management resources. Already the Lattice
> Portal[i] provides the ability to submit and control batch jobs and retrieve
> data files from the Jefferson Lab silo. This software will be extended to
> multiple sites, with file transfers between the sites handled by components
> from PPDG.
See above comments on batch components.

> Grid Services at Jefferson Lab
> Near-term major milestones in this project include:
> · Sept 2001 Replicated data services (raw and
> reconstructed data) between Jefferson Lab, MIT, and ODU (CS-4, CS-5, CS-6)

We missed this one by 6 months.

> · Feb 2002 Automated policy based replication (push) of
> raw data (a subset) and reconstructed data to several universities involved
> in running experiments (CS-2, CS-3, CS-4, CS-5)

In May we are deploying automated push of theory (simulation) data from MIT to JLab using the web-services-based components: 3 months late.

> To achieve these milestones, Jefferson Lab will in the first year of this
> proposal:
> · work with developers within PPDG involved in standardizing
> interactions between client applications and the disk resource manager
> (protocols, application programming interfaces, etc.), as a first step
> towards integrating this capability into the CLAS framework (CS-4, CS-5,
> CS-6)

Worked with Arie to define SRM functionality; working with SRB to define the WSDL interface.

> · deploy and integrate the Globus developed GridFTP component,
> integrating the server piece with the existing disk management software to
> support both file retrieval and authenticated uploading of files into the
> disk cache (CS-6)

From Reagan:

One of the important integration activities being pursued is the integration of the GridFTP transport servers into the SRB environment. We are in discussion with Carl Kesselman on the set of functions that can be supported through GridFTP. Please note that the SRB works through an RPC-like mechanism, in which functions and associated parameters are sent to a remote server for execution. The remote server in turn maps the functions to the local file system. The SRB system executes 16 different Unix I/O commands against the remote storage system. The challenge is identifying which of these commands can be executed by GridFTP.
There are another 80 functions that the SRB executes to support latency management, metadata manipulation, database access, etc. These would continue to be provided through an SRB server. The integration of GridFTP access with the SRB is important to the JLab project, because it would provide a common transport protocol that both systems could use. We could then look at the direct transport of data from a JLab-managed data system into an SRB-managed system. This will raise additional metadata integration concerns. We may need to define an explicit import/registration web service to manage the conversion between different semantics.

In May.

> · begin a study of selecting datasets for analysis based upon data
> characteristics rather than filenames

Not in the first year.

> Additional tasks to begin as prototyping work in the first year and move
> into full development in the second and third years include:
> · managing the flow of datasets to and from off-site batch jobs
> (CS-1, CS-2)
> · migrating jobs and/or data between sites (load balancing), taking
> into account load, network bandwidth, etc. (CS-1, CS-2)
> · monitoring the state / health / load of the integrated system
> (silo, disk, compute, network) with interactive web interfaces (CS-3)

Prototyped the above for PBS as web pages, not yet as web services.

> · generating trend presentations (for capacity planning) on the web
> (CS-3)

Done a first version, which also includes usage by a single user (detected via X.509 certificate, with no explicit user logon).

> · supporting easy integration of additional university sites (a
> deployable package, including documentation, etc.) (CS-4, CS-5, CS-6)

Starting prototype deployments at MIT and FSU.

Regards,
Chip

With additional comments as noted from Reagan