The concept of provenance is used in many application domains, which we briefly discuss before raising the problem of the provenance of business data produced by IT systems.
Food Provenance
From wine to meat, from dairy products to whisky, from coffee to vegetables, the food industry is very keen to demonstrate the origin of the ingredients we purchase and eat. Many reasons drive this interest in food provenance. For single malt whisky lovers across the world, the best crops from a well-defined region, combined with local water of characteristic taste, a traditional distillation process, and long-term ageing in oak barrels that themselves have a remarkable provenance, are a recognised stamp of quality. Similarly, locally produced vegetables grown with traditional techniques have a smaller carbon footprint because of their low food mileage. Ethical labels also guarantee that products such as coffee were produced under decent working conditions, while promoting local sustainability and fair terms of trade for farmers and workers in the developing world. Understanding the provenance of food, i.e. its origin and how it is produced, transported and delivered to us, has become a competitive advantage for the food industry, since it allows producers to demonstrate quality (in taste, in carbon footprint, or in ethics).
Furthermore, across the world, governments and their regulatory authorities are concerned with food safety, that is, the conditions and practices that preserve the quality of food in order to prevent contamination and foodborne illnesses. In this context, the term of choice is "traceability". Regulations such as the EU Food Law require the traceability of food, feed, food-producing animals and any other substance intended or expected to be incorporated into a food or feed to be established at all stages of production, processing and distribution. Similar laws, such as the US Bioterrorism Act, address security, in particular the deliberate contamination of food by terrorists. Whenever contaminated food is discovered, the ability to trace all its ingredients, suppliers and manufacturers is critical, as illustrated by the food scandals that regularly make the front pages of newspapers. For instance, the Sudan I scandal arose when traces of this carcinogenic dye were found in spicy foods, and it resulted in the withdrawal of many products from supermarket shelves across Europe.
Provenance in Design and Manufacturing
Manufacturers focus on compliance and traceability initiatives for a variety of reasons. Companies increasingly rely on end-to-end traceability of products and processes to reduce manufacturing costs, particularly the costs associated with poor quality. Understanding past processes is critical to discovering bottlenecks, inefficiencies and waste, and to learning how to eliminate them. Exact traceability is essential to manage product recalls efficiently and to minimise their economic impact. As in the food industry, the provenance of products is also used to build customer trust.
Provenance in Art
In the domain of fine arts, there is a long tradition of caring about the provenance of art objects. The term is so well established in this context that dictionary definitions of "provenance" explicitly mention works of art. Concretely, the provenance of an art object includes the artist who created it, the person who may have commissioned the work, the materials used to produce it, its successive owners and the circumstances in which they acquired the work (auction, inheritance, etc.), the locations where the work resided during its lifetime, and so on. Establishing beyond doubt which artist created a work inevitably raises its value; sometimes, the mere fame of previous owners increases an object's value. Since the provenance of art objects matters so much, the available evidence is typically assembled before an auction in order to maximise the price obtained.
Many museums are conducting research on the artworks in their collections. An important part of that research is the effort to establish the provenance (here, the chain of ownership) of a work, from the moment it leaves the artist's hands to the present. A major concern for curators has been to identify objects whose chain of ownership has gaps during the Nazi era. Gaps in the provenance of a particular work may have different causes, from an owner's desire for anonymity to the unavailability of records of purchase and sale. Thus, incomplete provenance information does not necessarily mean that a work has been tainted by the events of the Nazi era.
Provenance of Business Data Produced by IT Systems
The brief survey above shows that, in many sectors, provenance is critical to the integrity, reputation, efficiency, reliability, safety and security of products and businesses. Given that information technology is now the backbone of most businesses, it is natural to ask what support for provenance, if any, computer systems offer. How often have we wondered whether the latest data were included in a report? How often have we wished for an explanation of how results were derived? How often have we asked how a piece of data (e.g. a phone number or a budget figure) ended up unintentionally in a document, without being able to explain how this happened? How often have we wanted to reproduce a computation because we did not understand its outcome? How often have we wanted to check that the processes actually executed by an IT system are the ones we intended?
The questions go on and on. Why? Simply because IT systems have been optimised over the years to produce results efficiently, but essentially without leaving any audit trail. Admittedly, we do have the odd log, for instance recording web accesses or database transactions, but such logs are specific to a given software product: they are not related to data, and they explain neither processes nor the data that result from them. These logs do not interoperate, so it is extremely difficult to match the log of one piece of software against the log produced by another. They are typically scattered across different locations, and there is no way to read them uniformly, let alone analyse them. Finally, they are aimed at systems programmers, enumerating low-level operations without explaining the business functions actually being performed.