Notes from JDL discussion, 16 Dec 2003
Editorial note: A lot of the discussion was too detailed to easily include in these notes. Overall, the discussion was very fruitful and there was a clear desire by all to work together.
Job description languages
"Language" is loaded terminology and should be used carefully. The WSDL
is not the language; it takes a description written in the language
as a parameter. For example, "submit" submits a job written in JDL, and
the WSDL methods parse the jobs written in the language.
Should the JDL contain enough information to do fine-grained interactive
analysis, such as using a slider to change processing parameters and having
the job change and run immediately? Gabrielle thinks that the language
should be flexible or extensible enough for this.
Can different experiments share implementations of a JDL? Possibly, but
each one would have to examine its jobs in JDL to find common areas
of functionality. A service interface should be generic, but the JDL
may have experiment-specific extensions that aren't shared, such as the
dataset definition and specific analysis tasks. There was some discussion
about the need for generic APIs for JDL management (submission, splitting,
and so on). Some of this has already been discussed in CS11 and
at the Caltech workshop. Joe reviewed what was discussed there:
CS11 API definitions and
Caltech Workshop.
It was decided to start investigating the interface for this JDL service,
starting from David Adams' analysis service (http://www.ppdg.net/archives/talks/2003/ppt00078.ppt):
boolean has_application(Application)
??? install_application(Application)
boolean install_task(Application, Task)
JobID submit(Application, Task, Dataset, Config)
Job job(JobID)
boolean kill(JobID)
Data object types:
Application - an installed software package
Task - add-on scripts and configuration to an Application.
Dataset - A description of the dataset to use. This can be either
physical or logical (such as a query input to the
Dataset Catalog service)
Configuration -
Result -
Job -
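The interface above can be sketched in code to make the call sequence concrete. What follows is a minimal in-memory sketch, not an agreed design: the class name InMemoryJobService, every field on the data objects, the job states, and the format of the job identifier are assumptions made for illustration. The notes leave the return type of install_application open ("???"); returning a boolean here is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple
import itertools


@dataclass(frozen=True)
class Application:
    name: str          # hypothetical field
    version: str       # hypothetical field


@dataclass(frozen=True)
class Task:
    name: str          # hypothetical field
    script: str = ""   # hypothetical field: add-on script for the Application


@dataclass
class Dataset:
    query: str         # physical location or logical query (assumed to be a string)


@dataclass
class Config:
    parameters: Dict[str, str] = field(default_factory=dict)  # hypothetical field


@dataclass
class Job:
    job_id: str
    application: Application
    task: Task
    dataset: Dataset
    config: Config
    state: str = "submitted"   # state names are assumptions


class InMemoryJobService:
    """Toy stand-in for the JDL service; nothing is actually executed."""

    def __init__(self) -> None:
        self._apps: Set[Application] = set()
        self._tasks: Set[Tuple[Application, Task]] = set()
        self._jobs: Dict[str, Job] = {}
        self._ids = itertools.count(1)

    def has_application(self, app: Application) -> bool:
        return app in self._apps

    def install_application(self, app: Application) -> bool:
        # Return type is "???" in the notes; bool is an assumption.
        self._apps.add(app)
        return True

    def install_task(self, app: Application, task: Task) -> bool:
        # A task can only be added to an application that is installed.
        if app not in self._apps:
            return False
        self._tasks.add((app, task))
        return True

    def submit(self, app: Application, task: Task,
               dataset: Dataset, config: Config) -> str:
        if (app, task) not in self._tasks:
            raise ValueError("task is not installed for this application")
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = Job(job_id, app, task, dataset, config)
        return job_id

    def job(self, job_id: str) -> Job:
        return self._jobs[job_id]

    def kill(self, job_id: str) -> bool:
        # Returns False for unknown jobs and for jobs already killed.
        job = self._jobs.get(job_id)
        if job is None or job.state == "killed":
            return False
        job.state = "killed"
        return True
```

A client would typically call has_application, then install_application and install_task as needed, then submit, and keep the returned JobID for later job and kill calls.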
This only allows coarse job control. It does not allow more fine-grained
job control, such as changing processing parameters while a job is
running (without killing and restarting the job) or pausing a job. A
previously discussed Capabilities API would be used to determine whether
these behaviours are supported. The Capabilities API could be part of the
JDL service or attached to the Job/Task.
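One way to picture a capabilities check is a job handle that advertises what it supports and refuses anything else. This is a minimal sketch that assumes the capabilities are attached to the job rather than to the service. The names Capability, CapabilityError, and JobHandle, and the set of capabilities listed, are hypothetical; the Capabilities API discussed earlier is not reproduced in these notes.

```python
from enum import Enum, auto
from typing import Dict, FrozenSet


class Capability(Enum):
    # Hypothetical capability names, taken from the behaviours mentioned above.
    KILL = auto()
    PAUSE = auto()
    UPDATE_PARAMETERS = auto()


class CapabilityError(Exception):
    """Raised when a client asks for a behaviour the job does not support."""


class JobHandle:
    def __init__(self, job_id: str, capabilities: FrozenSet[Capability]) -> None:
        self.job_id = job_id
        self.state = "running"
        self.parameters: Dict[str, object] = {}
        self._capabilities = capabilities

    def capabilities(self) -> FrozenSet[Capability]:
        return self._capabilities

    def supports(self, capability: Capability) -> bool:
        return capability in self._capabilities

    def _require(self, capability: Capability) -> None:
        if capability not in self._capabilities:
            raise CapabilityError(f"{self.job_id} does not support {capability.name}")

    def pause(self) -> None:
        self._require(Capability.PAUSE)
        self.state = "paused"

    def update_parameters(self, **changes: object) -> None:
        # Change processing parameters without killing and restarting the job.
        self._require(Capability.UPDATE_PARAMETERS)
        self.parameters.update(changes)
```

A client such as an interactive slider would call supports(Capability.UPDATE_PARAMETERS) first and fall back to kill and resubmit when the answer is no.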
Provenance can be a problem because it includes not only the parameters
for processing, but also configuration files and scripts that aren't
carried along in the JDL. Full provenance must include these parameters
and scripts, which might change over time (they are in a user's home directory).
The JDL is therefore not able to store the provenance completely. This
information is supposed to be part of the task, but it could grow quite large,
especially if the tasks are repeated in the JDL. Perhaps a "delta" task
containing a "diff" between tasks can help with this.
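The "delta" task idea can be illustrated by storing only the differences between a task and a base task. The sketch below assumes a task can be flattened into a dictionary of named entries (parameters, file contents, script text); that representation and the function names task_delta and apply_delta are hypothetical, not something decided in the discussion.

```python
from typing import Any, Dict


def task_delta(base: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Return the entries to set and the keys to unset to turn base into new."""
    changed = {k: v for k, v in new.items() if k not in base or base[k] != v}
    removed = sorted(k for k in base if k not in new)
    return {"set": changed, "unset": removed}


def apply_delta(base: Dict[str, Any], delta: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the full task from a base task and a delta; base is not modified."""
    result = dict(base)
    result.update(delta["set"])
    for key in delta["unset"]:
        result.pop(key, None)
    return result


# Example: only the changed cut and the new mode are stored, not the whole task.
base_task = {"script": "analyze.sh", "cut": 5, "config": "default.cfg"}
next_task = {"script": "analyze.sh", "cut": 7, "mode": "fast"}
delta = task_delta(base_task, next_task)
```

Repeated tasks then cost only the size of their delta, at the price of needing the base task to reconstruct the full provenance.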
Tasks should be generic enough that a scheduler or other service can handle
them without special knowledge, but using typed tasks also has value.
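The trade-off between generic and typed tasks might look like the following sketch, in which a scheduler relies only on generic fields while a typed task adds checks of its own. The classes Task and HistogramTask, their fields, and the schedule function are all hypothetical examples.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    # Generic part: all a scheduler needs to know (assumed fields).
    name: str
    parameters: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> bool:
        return bool(self.name)


@dataclass
class HistogramTask(Task):
    # Typed part: extra structure that a type-aware service can check.
    bins: int = 100

    def validate(self) -> bool:
        return super().validate() and self.bins > 0


def schedule(tasks: List[Task]) -> List[str]:
    """Generic scheduler: uses only the Task interface, never the subtype."""
    return [task.name for task in tasks if task.validate()]
```

Here schedule accepts any mix of tasks, and the typed task still rejects a nonsensical bin count before it is queued.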
How do we proceed from here to arrive at a language and service specification?
David Alexander will pull the schema for the JDL out from the job description
language specification.