What is Provenance?

Definition

According to the Oxford English Dictionary, provenance is defined as (i) the fact of coming from some particular source or quarter; origin, derivation. (ii) the history or pedigree of a work of art, manuscript, rare book, etc.; concretely, a record of the ultimate derivation and passage of an item through its various owners.

The term is still rarely used in the context of Information Technology. To get a feel for what provenance can bring to businesses, we review its application in several domains before focusing on its use in Information Technology.

 

Provenance Matters

The concept of Provenance is used in many application domains, which we briefly discuss, before raising the problem of provenance of business data produced by IT systems.

Food Provenance

From wine to meat, from dairy products to whisky, from coffee to vegetables, the food industry is keen to demonstrate the origin of the ingredients we purchase and eat. Many reasons drive this interest in food provenance. The best crops from a well-defined region, combined with local water with a characteristic taste, a traditional distillation process, and long-term ageing in oak barrels that themselves have a remarkable provenance, are a stamp of quality recognised by single malt whisky lovers across the world. Likewise, vegetables produced locally with traditional techniques have a better carbon footprint because of their low food mileage. Ethical labels also guarantee that products such as coffee were produced under decent working conditions, while promoting local sustainability and fair terms of trade for farmers and workers in the developing world. Understanding the provenance of food, i.e. its origin and how it is produced, transported, and delivered to us, is turned into a competitive advantage by the food industry, since it allows the industry to demonstrate quality (in taste, in carbon footprint, or in ethics).

Furthermore, across the world, governments and associated regulatory authorities are interested in food safety. (Food safety refers to the conditions and practices that preserve the quality of food to prevent contamination and foodborne illnesses.) In this context, the term of choice is "traceability". Regulations, such as the EU Food law, require the traceability of food, feed, food-producing animals and any other substance intended to be, or expected to be, incorporated into a food or feed to be established at all stages of production, processing and distribution. Similar laws, such as the US Bioterrorism Act, deal with the security aspect, namely the deliberate contamination of food by terrorists. Whenever contaminated food is discovered, the ability to trace all its ingredients, suppliers, and manufacturers is critical, as illustrated by the food scandals that regularly appear on the front pages of newspapers. For instance, the Sudan 1 scandal originated from traces of this carcinogenic dye found in spicy food; it resulted in the withdrawal of many products from supermarket shelves across Europe.

Provenance in Design and Manufacturing

Manufacturers focus on compliance and traceability initiatives for a variety of reasons. Companies are increasingly focused on reducing manufacturing costs, particularly the cost associated with poor quality, by using end-to-end traceability of products and processes. Understanding past processes is critical to discovering bottlenecks, inefficiencies, and wastage, and to learning how to remedy them. Exact traceability is essential to manage product recalls efficiently and minimise their economic impact. As in the food industry, provenance of products is used to build customer trust.

Provenance in Art

In the domain of fine arts, there is a long tradition of caring about the provenance of artifacts. The usage of the term is so common and recognised that it is part of the dictionary definition. Concretely, in this context, the provenance of an art object includes the artist who created it, the person who may have commissioned the work, the materials used to produce it, its various owners and the circumstances in which they acquired the work (auction, inheritance, etc.), the locations where the work of art resided during its lifetime, etc. Being able to ascertain without doubt the artist who created a work of art inevitably raises its value; sometimes, it is simply the fame of previous owners that increases the artifact's value. Since the provenance of art objects is so important, the available evidence is typically produced before auctions in order to maximise the price obtained for these objects.

Many museums are conducting research on art works in their collection. An important part of that research is the effort to establish the provenance (here, chain of ownership) for a work, from the moment it leaves the artist's hands to the present. An important concern for curators has been to identify objects that had gaps in their chain of ownership for the Nazi era. Gaps in the provenance of a particular work may be attributable to different causes, from an owner's desire for anonymity to the unavailability of records of purchase and sale.  Thus, incomplete provenance information does not necessarily mean that a work has been tainted by the events of the Nazi era.

Provenance of Business Data Produced by IT Systems

The above brief survey shows that in many sectors, provenance is critical to the integrity, reputation, efficiency, reliability, safety, and security of products and businesses. Given that Information Technology is now the backbone of most businesses, it is appropriate to question the support for provenance, if any, that is offered by computer systems. How often have we wondered whether the latest data are included in a report? How often have we wished for an explanation of how results were derived? How often have we asked how a piece of data (e.g. a phone number or a budget figure) unintentionally ended up in a document, without being able to explain how this was possible? How often have we wanted to reproduce a computation because we did not understand its outcome? How often have we wanted to check that the actual processes executed by an IT system are the ones we intended?

The questions go on and on. Why? Simply because IT systems have been optimised over the years to produce results efficiently, but essentially without leaving any audit trail. Yes, we do have the odd logs, for instance listing web accesses and database transactions, but these logs are specific to a given software product; they are not related to data, and they do not explain processes and the data that result from them. These logs do not inter-operate, and it is therefore excessively difficult to match the log of one piece of software against the log produced by another. They are typically distributed across different locations, and there is no way of analysing them, let alone reading them, uniformly. Finally, they are typically aimed at systems programmers, enumerating low-level operations without explaining the actual business functions that are being performed.

 

Provenance Use Cases

In this section, we discuss various uses of provenance in the context of business IT systems. For each of these, we will demonstrate the benefits of provenance technology.

 

  1. Auditing (introspection of system, data and past processes)
  2. Quality Control (verifying that the best processes are executed to produce the best quality)
  3. Performance Checks (identifying bottlenecks, slow sections, efficient processes)
  4. Validating and Verifying Results (reproducing results, checking that result is correct)
  5. Process Oriented Compliance Checks (demonstrating that processes followed to produce a result are following established rules and best practices)
  6. License Checks (verifying the licenses of data and programs used in a computation)
  7. Forensic Analysis (analysis to understand the causes of breaches or disasters)
  8. Metrics Computation (deriving measures of trust, data quality, error propagation)
 

How does it work?

Introduction

In business, some users, reviewers, auditors, or even regulators may have to perform a variety of checks on  business data. For instance, they have to verify that results are up-to-date; they have to establish that the process that led to some result is compliant with specific regulations or methodologies; they have to demonstrate that specific procedures were used to produce business data; they have to prove that results are derived independently of services or databases with given license restrictions; and, they need to establish that data was captured at source by instruments that possess some precise technical characteristics.

This problem is particularly important since our IT landscape is evolving as illustrated by applications that are open, composed dynamically, and that discover results and services on the fly. Against this challenging background, it is crucial for users to be able to have confidence in the results produced by such applications.  Provenance is the technology that helps users trust the data produced by their applications.

Overview

IT systems are efficient at producing data, but they are poor at describing what they do. A key observation is that electronic data does not typically contain the necessary historical information that would help end-users, reviewers, or regulators make the necessary verifications. 

To address this concern, Universal Provenance technology introduces the idea of an observer that observes the processes and the flow of information inside IT systems and faithfully records them in a secure repository. With this extra information that describes what has actually occurred in the IT system, analysis and auditing capabilities are able to derive how business data was produced, and ultimately help obtain trust in business applications. As illustrated by the following figure, the observer and the analysis capabilities allow business data to be enriched with information describing how it was derived.

 Concretely, how does this work?

The Universal Provenance Infrastructure consists of several components. An adaptor, which can be regarded as a kind of big brother, observes the flow of information in IT systems and faithfully records it in a repository called the provenance store. This extra information, which we name process documentation, describes what actually occurred at execution time. The provenance store's role is to offer long-term, persistent, secure storage of process documentation. An application that has been fitted with such an adaptor and provenance store is said to be provenance-aware.
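To make the roles of the adaptor and the provenance store more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names `ProcessRecord`, `ProvenanceStore`, `adaptor`, and `apply_discount` are hypothetical and are not part of the Universal Provenance Infrastructure, and an in-memory list stands in for a persistent, secure repository.

```python
import functools
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class ProcessRecord:
    """One item of process documentation: what ran, on what, producing what."""
    process: str
    inputs: tuple
    output: Any
    timestamp: float


@dataclass
class ProvenanceStore:
    """Hypothetical in-memory stand-in for a persistent, secure store."""
    records: List[ProcessRecord] = field(default_factory=list)

    def record(self, rec: ProcessRecord) -> None:
        self.records.append(rec)


def adaptor(store: ProvenanceStore) -> Callable:
    """Wrap a business function so that every call is documented in `store`."""
    def decorate(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args):
            result = func(*args)
            # Observe the call after it completes and record what happened.
            store.record(ProcessRecord(func.__name__, args, result, time.time()))
            return result
        return wrapper
    return decorate


store = ProvenanceStore()


@adaptor(store)
def apply_discount(price: float, rate: float) -> float:
    """Example business function made provenance-aware by the adaptor."""
    return round(price * (1 - rate), 2)


total = apply_discount(200.0, 0.1)
```

After the call, `store.records` holds one record naming the process, its inputs, and its output, while the business function itself is unchanged. A decorator is just one possible way to observe calls; the source does not say how the actual adaptor is attached to an application.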

Once process documentation has been recorded, provenance  analysis and reasoning tools can query the provenance store, extract the provenance of business data, and analyse it to suit the user's needs.  Analysis can vary from simple extractions identifying all the parameters or source data that influenced a result, to sophisticated rule-based checks. Such checks can determine, for example, that source data are appropriately licensed, that computations are undertaken with the required precision, or that the process that was executed is following established practices.
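The two kinds of analysis mentioned above, a simple extraction of the source data behind a result and a rule-based license check, can be sketched in Python as follows. This is a hedged illustration only: the `derived_from` mapping, the `licenses` table, and the function names are assumptions invented for the example, not the query interface of any real provenance store.

```python
from typing import Dict, List, Set

# Hypothetical process documentation: each derived item maps to the items
# it was computed from. Items absent from the mapping are source data.
derived_from: Dict[str, List[str]] = {
    "report": ["summary", "budget_figure"],
    "summary": ["sales_db", "market_feed"],
}

# Hypothetical license metadata attached to source data.
licenses: Dict[str, str] = {
    "sales_db": "internal",
    "market_feed": "restricted",
    "budget_figure": "internal",
}


def sources_of(item: str) -> Set[str]:
    """Walk the derivation graph back to the source data behind `item`."""
    sources: Set[str] = set()
    seen: Set[str] = set()
    stack = [item]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = derived_from.get(current)
        if parents is None:
            # No recorded derivation: treat as source data.
            sources.add(current)
        else:
            stack.extend(parents)
    return sources


def license_violations(item: str, allowed: Set[str]) -> Set[str]:
    """Return the sources of `item` whose license is not in `allowed`."""
    return {s for s in sources_of(item) if licenses.get(s) not in allowed}
```

With this data, `sources_of("report")` yields the three source items, and `license_violations("report", {"internal"})` flags `market_feed` as the only source whose license is not permitted.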

A data exporting capability is available to provide feedback to business applications by means of asynchronous notifications or alarms; continuous monitoring or audit functionality can also be programmed. Using this feedback mechanism, existing applications can be given novel functionality, which can be exposed to users in order to give them the confidence they need in their business data.
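The feedback mechanism can be pictured as a store that checks each new record against a rule and notifies subscribers when the rule fails. The Python sketch below is an assumption-laden illustration: `MonitoredStore`, `uses_approved_process`, and the process names are hypothetical, and plain synchronous callbacks stand in for the asynchronous notifications described above.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Rule = Callable[[Record], bool]
Listener = Callable[[Record], None]


class MonitoredStore:
    """Hypothetical store that checks each new record against a rule."""

    def __init__(self, rule: Rule) -> None:
        self.rule = rule
        self.records: List[Record] = []
        self.listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        """Register a callback to be told about rule-breaking records."""
        self.listeners.append(listener)

    def record(self, rec: Record) -> None:
        """Store the record, then raise an alarm if it breaks the rule."""
        self.records.append(rec)
        if not self.rule(rec):
            for notify in self.listeners:
                notify(rec)


def uses_approved_process(rec: Record) -> bool:
    """Example rule: only audited processes may produce business data."""
    return rec["process"] in {"audited_pricing", "audited_reporting"}


alarms: List[Record] = []
store = MonitoredStore(uses_approved_process)
store.subscribe(alarms.append)

store.record({"process": "audited_pricing", "output": "price_list"})
store.record({"process": "ad_hoc_script", "output": "budget_figure"})
```

Both records are stored, but only the second one, produced by an unapproved process, ends up in `alarms`. In a real deployment the listener would more likely post to a message queue or alerting service than append to a list.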