PPDG Collaboration Meeting, 16 Dec. 2003
Dataset Catalog Service (DCS)
http://www.ppdg.net/archives/talks/2003/ppt00077.ppt

The JAS and Tech-X folks, with input from many parties, have come up with a Dataset Catalog Interface and a reference implementation, in the hope that it will be useful to a wider audience. The interface is defined in WSDL so that language-specific bindings can be generated.

The XPath query language is good at supporting hierarchical queries, but it does not support wildcards or regular expressions for specifying a value pattern. XPath 2 should support this.

The interface does not define the query language, but clients must still know the query language in order to use it. Introspection (getQueryLanguage()) is one way to determine the query language used by a specific DCS implementation.

The DCS might be useful as part of a larger metadata service that can associate metadata with more than just datasets, such as a lookup service. In that case the Dataset class is not useful. A new extension or implementation of the service that does not contain the Dataset class could serve as a Metadata Catalog Service. Used in contexts beyond datasets, the DCS should still be useful, though there may be additional requirements from, say, a Job Catalog Service.

Questions were raised about the need for Dataset.getSupplierGSH(). The DCS is for metadata, so why not pass the opaque dataset id to a separate dataset service to get the data? The method was included so that the client does not need any knowledge of the dataset format or provider.

The current interface is read-only: it has no methods for adding datasets or metadata to the catalog. An extension of the interface would be needed to create a mutable DCS.

The interface does not prevent datasets from containing folders, or folders from containing metadata. The reference implementation, however, does impose these restrictions.
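
The query-language points above (hierarchical XPath queries, no value patterns, and introspection via getQueryLanguage()) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the ReadOnlyCatalog class, its method names, the XML layout, and the "stream" metadata key are hypothetical and are not the actual DCS WSDL interface or the reference implementation. The sketch uses Python's standard xml.etree.ElementTree, which implements only a subset of XPath 1.0.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical catalog contents: folders containing datasets, with
# metadata stored as attributes. Not the real DCS data model.
CATALOG_XML = """
<catalog>
  <folder name="run2003">
    <dataset id="ds-001" stream="muon"/>
    <dataset id="ds-002" stream="electron"/>
  </folder>
  <folder name="run2004">
    <dataset id="ds-101" stream="muon_calib"/>
  </folder>
</catalog>
"""


class ReadOnlyCatalog:
    """Illustrative stand-in for a read-only catalog (assumed API)."""

    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)

    def get_query_language(self):
        # Introspection: tells the client which query syntax to send.
        return "xpath"

    def query(self, expression):
        # Returns the metadata of each matching dataset as a dict.
        # There are deliberately no methods for adding datasets.
        return [dict(d.attrib) for d in self._root.findall(expression)]


def find_by_pattern(catalog, key, pattern):
    """Client-side workaround for the missing value patterns:
    fetch every dataset, then filter the metadata with a regex."""
    if catalog.get_query_language() != "xpath":
        raise ValueError("this client only knows how to write XPath queries")
    regex = re.compile(pattern)
    return [d["id"] for d in catalog.query(".//dataset")
            if regex.search(d.get(key, ""))]


if __name__ == "__main__":
    catalog = ReadOnlyCatalog(CATALOG_XML)
    # Hierarchical query with an exact value match.
    print(catalog.query(".//folder[@name='run2003']/dataset[@stream='muon']"))
    # Pattern match, done by the client rather than by the query language.
    print(find_by_pattern(catalog, "stream", r"^muon"))
```

The exact-match query is answered entirely by the catalog, while the pattern match has to pull every dataset to the client first. That cost is one reason the scalability of the query language matters for catalogs with many small datasets.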

Dave Adams wants more metadata exposed as functions on the Dataset object.

There is no type checking or existence enforcement on metadata.

Scalability has not been tested with this interface. It would be nice to see how it performs for an experiment with very many small datasets. It would also be useful to know how well XQuery scales as a query language.

Can you use it, and why/why not?

JAS/Tech-X - They wrote it, they'll use it.
GAE/CAIGEE - Definitely useful. Conrad wrote a Clarens implementation during the conference and plans to move forward with it. The implementation should be complete and usable in January 2004. They will extend the interface to allow adding data to the catalog.
Atlas Data Analysis - Atlas is concerned about the query language, since they use SQL, not XML. Atlas is also concerned about being "forced" into a hierarchical structure when such a structure is not necessary. Dave Adams will flesh out the Atlas metadata requirements/interface further and then see how well it matches this interface.
STAR - STAR has no immediate need to implement a new interface for accessing dataset metadata. They commit to discussing its usefulness.
PHENIX - Will definitely look at it before implementing their own.
POOL - No representatives available for comment.
D0 - No representatives available for comment.
CMS - No representatives available for comment.
ARDA - ARDA will also commit to looking at the interface.