Overview of the Universal Provenance Infrastructure

The Universal Provenance Infrastructure is the provenance-based solution offered by Universal Provenance. It is a technology-independent, application-agnostic, vendor-independent solution that allows provenance to be captured for business data across an enterprise.

 

Universal Provenance Infrastructure Components

The Universal Provenance Infrastructure consists of five separate components illustrated in the figure below.

  1. The Universal Provenance Store (UPS) is a persistent, secure repository of provenance information. For interoperability across the enterprise, it is exposed as a Web Service. It is typically deployed on a secure, managed infrastructure machine that offers long-term archiving capabilities.
  2. So that remote clients can interact with the UPS, all communications with it are mediated by the Provenance Support Library (PSL). The PSL is a Java-based library that converts requests into SOAP-based messages for the UPS Web Service. To help create these messages, the PSL integrates a complete Java-based representation of the provenance data model. It also exposes the core interfaces offered by the UPS, namely recording and querying capabilities.
  3. On the left-hand side of the picture is the recording functionality, referred to as the Adaptor: a generic approach to intercepting the flow of data in business applications, together with mechanisms to record a description of that flow in the UPS by means of the PSL. The Adaptor component is in fact a family of components, each specialising in a specific technology used to build business applications. Currently, two adaptors are supported: one for Enterprise Java Beans (EJB) and one for plain old Java objects (POJO). Over time, new adaptors for other technologies will be developed. All adaptors rely on a common adaptation layer, which allows the configuration to be defined independently of the technology used.
  4. On the right-hand side of the picture is the querying/analysis functionality. The Provenance Analysis Engine (PAE) is the component capable of retrieving the provenance of data, analysing it, and reasoning about it, in order to support the novel functionality identified in the use cases. The PAE, while conceptually a single entity, is a distributed component, partly embedded in the UPS to access data efficiently, and partly embedded in the client application to deliver novel functionality.
  5. Finally, the remaining component is concerned with Provenance Visualisation. This is not a general-purpose visual tool, but a tool aimed at developers who program with the Universal Provenance Infrastructure and need to visualise the contents of the UPS.
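
The recording and querying capabilities described in items 1 and 2 can be pictured with a short Java sketch. This is only an illustration under stated assumptions: the names ProvenanceRecord, ProvenanceStore and InMemoryProvenanceStore are hypothetical and are not the actual PSL API, and an in-memory list stands in for the remote, SOAP-based UPS Web Service.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified entry of a provenance data model:
// which data item was produced, by which process, from which inputs.
final class ProvenanceRecord {
    final String dataId;
    final String producedBy;
    final List<String> derivedFrom;

    ProvenanceRecord(String dataId, String producedBy, List<String> derivedFrom) {
        this.dataId = dataId;
        this.producedBy = producedBy;
        // Defensive copy so that a stored record cannot be altered afterwards.
        this.derivedFrom = Collections.unmodifiableList(new ArrayList<>(derivedFrom));
    }
}

// Hypothetical view of the two core capabilities: recording and querying.
interface ProvenanceStore {
    void record(ProvenanceRecord r);

    List<ProvenanceRecord> query(String dataId);
}

// In-memory stand-in (assumption); a real client would reach the remote UPS
// through the PSL instead of keeping records locally.
final class InMemoryProvenanceStore implements ProvenanceStore {
    private final List<ProvenanceRecord> records = new ArrayList<>();

    @Override
    public synchronized void record(ProvenanceRecord r) {
        records.add(r);
    }

    @Override
    public synchronized List<ProvenanceRecord> query(String dataId) {
        // Return every record that describes the requested data item.
        List<ProvenanceRecord> result = new ArrayList<>();
        for (ProvenanceRecord r : records) {
            if (r.dataId.equals(dataId)) {
                result.add(r);
            }
        }
        return result;
    }
}
```

Keeping the store behind an interface mirrors the idea that clients depend only on the recording and querying operations, not on how or where the provenance is persisted.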

Integration of the Universal Provenance Infrastructure

The following picture depicts how the Universal Provenance Infrastructure can integrate with legacy applications in order to provide new functionality to end-users.





On the recording side, applications need to be provenance-enabled; that is, they need to be adapted so that they produce a faithful description of what they do and record that description in the Universal Provenance Store. To minimise the effort involved in adapting applications, the Adaptor component family (currently for EJBs and POJOs) intercepts the flow of information and records a description of it according to the provenance data model. For the supported technologies, it is a matter of redeploying the applications, putting the appropriate interceptors in place, and configuring the Adaptor to record the information the application needs. If the technology is not yet supported, more integration effort will be required, typically involving programming.
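
The interception idea can be sketched for a plain Java object using the standard java.lang.reflect.Proxy mechanism. This is a minimal sketch, not the actual POJO Adaptor: OrderService, SimpleOrderService and RecordingInterceptor are hypothetical names, and a simple list of strings stands in for descriptions recorded in the UPS.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;

// Hypothetical business interface of an application to be provenance-enabled.
interface OrderService {
    String placeOrder(String customer, String item);
}

// Unmodified business logic: it knows nothing about provenance.
final class SimpleOrderService implements OrderService {
    @Override
    public String placeOrder(String customer, String item) {
        return "order:" + customer + ":" + item;
    }
}

// Hypothetical interceptor: describes each call (method, inputs, output)
// and appends the description to a log that stands in for the UPS.
final class RecordingInterceptor implements InvocationHandler {
    private final Object target;
    private final List<String> log;

    RecordingInterceptor(Object target, List<String> log) {
        this.target = target;
        this.log = log;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Object result;
        try {
            result = method.invoke(target, args);
        } catch (InvocationTargetException e) {
            // Rethrow the exception raised by the business method itself.
            throw e.getCause();
        }
        log.add(method.getName() + Arrays.toString(args) + " -> " + result);
        return result;
    }

    // Wraps a target so that every call made through the interface is recorded.
    static <T> T wrap(T target, Class<T> iface, List<String> log) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                new RecordingInterceptor(target, log));
        return iface.cast(proxy);
    }
}
```

A caller would obtain the service through RecordingInterceptor.wrap(new SimpleOrderService(), OrderService.class, log) and use it exactly as before; the business class itself is left untouched, which is the point of adapting by interception rather than by rewriting.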

The Universal Provenance Store is designed to operate on most relational databases, since it relies on an Object-Relational Mapping (ORM) layer to access the database. Specifically, the UPS uses Hibernate, which works with a wide range of commercial and non-commercial relational databases.

On the querying/analysis side, the functionality provided by the Provenance Analysis Engine is exposed through a programmatic interface (API), which allows it to be integrated with third-party applications in order to provide enriched functionality to end-users.
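
As an illustration of the kind of analysis such an API might expose, the sketch below computes everything a data item ultimately depends on, given direct "derived from" links. It is an assumption-laden example rather than the actual PAE interface: the class LineageAnalysis, its method ancestorsOf, and the map-based representation of provenance are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical analysis helper working on direct "derived from" links:
// each key is a data item, each value lists the items it was derived from.
final class LineageAnalysis {
    private final Map<String, List<String>> derivedFrom;

    LineageAnalysis(Map<String, List<String>> derivedFrom) {
        this.derivedFrom = derivedFrom;
    }

    // Breadth-first traversal of the links; the set of already seen items
    // prevents infinite loops if the recorded data contains a cycle.
    Set<String> ancestorsOf(String dataId) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(dataId);
        while (!pending.isEmpty()) {
            String current = pending.remove();
            List<String> parents =
                    derivedFrom.getOrDefault(current, Collections.<String>emptyList());
            for (String parent : parents) {
                if (seen.add(parent)) {
                    pending.add(parent);
                }
            }
        }
        return seen;
    }
}
```

A third-party application could, for example, call ancestorsOf on a report identifier to show end-users which invoices and orders the report was ultimately built from.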