Web Magazine for Information Professionals

The Video Active Consortium: Europe's Television History Online

Johan Oomen and Vassilis Tzouvaras provide an insight into the background and development of the Video Active Portal which offers access to television heritage material from leading archives across Europe.

Europe's audiovisual heritage contains both a record and a representation of the past, and as such it demonstrates the development of the 'audiovisual culture' we inhabit today. In this article we hope to offer an insight into the development of the Video Active Portal [1], which provides access to broadcast heritage material held by archives across Europe. We explain how Video Active needed to find solutions for managing intellectual property rights, for semantic and linguistic interoperability, and for the design of a meaningful user experience. We also describe the use of Semantic Web technology and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) as the main components of the Portal's back end.

The greatest promise of the Internet as a public knowledge repository is to create seamless access for anyone, anywhere, to all knowledge and cultural products ever produced by mankind. Mainly due to increased bandwidth availability, Web sites offering online video material have managed to mature over a short period to become extremely popular. Web sites like YouTube [2], MySpace [3], Revver [4] and many others show how the idea of making and manipulating images (once mostly the preserve of professionals) has been embraced as a way of broadcasting who we are to anyone prepared to watch. The most popular site to date, YouTube, was launched in early 2005 and serves over 100 million videos daily [5]. Furthermore, it is estimated that online video will be responsible for 30% of all Internet traffic by 2011 [6].

Looking at these numbers, it is evident that the potential for releasing material from audio-visual archives online is enormous. To date, however, only a few percent of the many millions of hours of material held in these archives can be found online [7]. Many existing online services are based on user-generated content, and where professional content is offered (e.g. Joost [8], MSN Video [9], Miro [10], Blinkx [11]), the focus tends to be on recent material.

Audio-visual archives need to overcome several obstacles before they can set up meaningful online services. These include: managing intellectual property rights; technological issues concerning digitisation and metadata standardisation; and issues related to the way the sources are presented to users. The last of these is even more challenging if the aim is to present material from several countries in a structured way, and that aim is the starting point of the Video Active Project.

The central challenge facing Video Active is to remove the barriers listed above in order to create multi-lingual access to Europe's television heritage. Video Active achieves this by selecting a balanced collection of television archive content, which reflects the cultural and historical similarities and differences of television from across the European Union, and by complementing this archive content with well-defined contextual metadata. The technical infrastructure facilitates the process of enrichment and asset management. Video Active offers an enormous resource for exploring both the representation of cultural and historical events within and across nations and the development of the medium itself at a cross-cultural level.

This article first introduces the main challenges involved and then provides some details of the technical infrastructure created in the first year of this three-year project.

The Project

Video Active is funded within the eContentplus Programme of the European Commission (Content Enrichment Action) and started in September 2006 for a duration of 36 months [12]. The first version of the portal will be fully operational by January 2008.

The Consortium and the Associate Members

The consortium consists of major European audio-visual archives, academic partners and ICT developers. The archives will supply their digital content; the universities are the link to end-users and play an important role in developing a strategy for selecting the content and in delivering the necessary contextual information. The ICT developers will be responsible for supplying the required technology.

Core archive partners:
British Broadcasting Corporation, UK
Danish Broadcasting Corporation, DK
Deutsche Welle, DE
Hungarian National Audiovisual Archive, HU
Istituto Luce, IT
Hellenic Audiovisual Archive, GR
Netherlands Institute for Sound and Vision, NL
Österreichischer Rundfunk, AT
Radio-Télévision Belge de la Communauté Française, BE
Swedish Institute for Sound and Image, SE
Televisio de Catalunya, ES

Associate partners:
Vlaamse Radio- en Televisieomroep, BE
Moving Image Communications Ltd, UK

Table 1: Core archive and associate partners

Eleven archives are represented in the core consortium. Taken together, their collections comprise over 4.5 million hours of audio and video material from 1890 to the present day.


Figure 1: Distribution of the consortium partners

As of June 2007 two new associate members have joined the project: the Vlaamse Radio- en Televisieomroep (VRT) from Belgium and the footage library Moving Image Communications Ltd from the UK. With these new partners, the Video Active portal will offer an even richer collection of television heritage. Video Active hopes to include more associate members soon; all organisations holding television content are welcome to join.

Amsterdam-based Noterik Multimedia specialises in online video solutions and is responsible for the development of the Video Active portal application. The second technical partner is the National Technical University of Athens, contributing expertise in knowledge representation as well as being responsible for metadata management. The media studies faculties of Utrecht University and Royal Holloway, University of London complement the consortium.

Users and Their Demands

The demand for access to television archive content online has been growing, and this demand has been driven by a number of distinct sectors: education, the general public and the heritage sector.

Digitisation of archive content transforms cultural heritage into flexible 'learning objects' that can be easily integrated into today's teaching and learning strategies. For the academic community the rich holdings of television archives are valuable teaching and research resources. Up to now access has been limited, since much of the archive content is stored on legacy formats, often with only a minimal set of descriptive metadata. Although this is changing, with many preservation and digitisation projects now underway in large audio-visual archives across Europe, the comparative dimensions of European television content remain as yet largely unexplored.

As noted in our introduction, public demand for archive content has risen with the growth and affordability of the Internet and media publishing tools. Cultural heritage is of interest to everyone, not just specialists and students. The 19th century saw a huge development in museums, libraries, galleries and related heritage institutions, all with public access. Many such institutions have very low charges (or are free) in order to render access truly public and democratic. Audio-visual collections are by comparison much less accessible and democratic. Broadcast archives are closed to the public, most 'public' film and video institutions charge by the hour for personal access, and many such institutions are not actually public. Instead, they require proof of research status before allowing access to their general collections.

The digital age has also had an impact on the work of professionals in the heritage domain, such as museum curators, organisers of exhibitions, journalists, documentalists, etc. They can conduct their activities and render services faster, better, more efficiently and sometimes at a lower cost. In short, a so-called e-culture is emerging. Additionally, in the digital age, the value of heritage institutions lies increasingly in their role as mediators between networks that produce culture and those which impart meaning. To an increasing degree, they will find themselves contributing their knowledge and content within a cultural arena where a host of highly diverse players are in action, including non-cultural sector institutions, in addition to the audience or users. This means that the added value which heritage organisations seek to provide will grow increasingly dependent on the extent to which they are able to make knowledge sharing, crossovers, and structural co-operation part of their 'core business'.

These user groups have specific expectations and profiles, and the Video Active Project has to understand and encompass them to ensure both user satisfaction and revisits. Surveys, face-to-face interviews and desk research have been conducted in the initial stages of the project. The resulting insight into user requirements became fundamental to defining the technical specifications and hence the technical architecture. Further requirements testing will take place on the first release of the portal; comprehensive evaluation with key users will provide the project with input as it develops the second release, planned for the second year of the project.

Sustainability

After the formal project duration within the EC framework comes to an end, a new organisation (the Video Active Association) will continue the management of the portal. The consortium has already started to think about business models to support these activities. Most notably, Video Active will offer high-resolution versions of the items found on the portal to interested professionals in the creative industry. Once the stakeholders have agreed on the price and preconditions for reuse of a particular clip, the portal supports secure file transfer using peer-to-peer technology. Furthermore, the portal serves as a 'shop window' for pre-produced DVDs and other products created by the archives.

Content Selection and Intellectual Property Rights

By definition, the selected content on the Video Active Portal is heterogeneous in many ways, language being one. A multi-lingual thesaurus allows multi-lingual access to the holdings. Ten languages will be supported in the first release of the Video Active Portal.

Other challenges for content selection relate to the variety of archive material, across both historical periods and genres, held by the content providers for the project. Moreover, supplementary content (images, television guides etc.) and the metadata produced by the content providers are not equally distributed amongst the partners. Left unaddressed, these imbalances would make comparative research by academics and exploration by the general public impossible.

In order to tackle these issues, Video Active has developed a content selection strategy that adopts a comparative perspective, seeking to explore and demonstrate both the cultural and historical similarities and the differences in television content across Europe through various themes [13]. The thematic approach allows for the development of a rich resource that explores the history of Europe using television archive content from across a number of genres and periods. So far 40 themes have been selected and, combined with the historical periods covered, they form a matrix for content selection. This comparative approach is also reflected in the data management and information architecture of the portal. Not only do the existing metadata in the archives need to be syntactically aligned, they must also be semantically enriched in order to facilitate understanding and analysis of the material selected. Several Video Active-specific fields were added to the Dublin Core element set [14], including Television Genre, European Dimension and National Relevance.
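To make the idea of a theme-by-period selection matrix concrete, here is a minimal sketch that counts selected items per (theme, decade) cell and lists the cells still empty. The theme names, years and archive assignments are hypothetical sample data, not the project's actual list of 40 themes, and bucketing by decade is an assumption about how periods are defined.

```python
from collections import Counter

# Hypothetical selected items: (theme, broadcast year, contributing archive).
# These themes and values are illustrative only.
items = [
    ("Elections", 1968, "BBC"),
    ("Elections", 1972, "ORF"),
    ("Space exploration", 1969, "DR"),
    ("Space exploration", 1969, "RTBF"),
    ("Fashion", 1985, "Istituto Luce"),
]

def decade(year: int) -> str:
    """Bucket a broadcast year into a decade label such as '1960s'."""
    return f"{year // 10 * 10}s"

def selection_matrix(items):
    """Count selected items per (theme, decade) cell of the matrix."""
    return Counter((theme, decade(year)) for theme, year, _ in items)

def empty_cells(matrix, themes, decades):
    """Return (theme, decade) cells that have no selected items yet."""
    return [(t, d) for t in themes for d in decades if matrix[(t, d)] == 0]

matrix = selection_matrix(items)
themes = sorted({theme for theme, _, _ in items})
decades = ["1960s", "1970s", "1980s"]
for cell in empty_cells(matrix, themes, decades):
    print("gap in selection:", cell)
```

Listing the empty cells is one simple way a selection team could see where the collection is still unbalanced across periods.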

Intellectual property rights (IPR) are the final major factor influencing the selection of programmes. In almost all cases, individual rights owners need to be contacted and agreements set up before material can be published online, and material cannot be made available until such agreements are in place with all relevant parties. The project does not have the financial means to pay for rights clearances, so not all content selected in the first instance will find its way onto the portal. Moreover, every country has different IPR regulations; in some instances, for example, it is not permitted to store video files on a server physically located abroad. As a consequence, the Video Active infrastructure had to support distributed content storage, with the central portal linking to a number of remote servers.
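The distributed storage requirement can be illustrated with a minimal sketch in which the central portal keeps only metadata and resolves each asset to a URL on its archive's own server. The server table, host names (under the reserved `.example` domain) and the `Asset` fields are hypothetical; the article does not describe how Video Active actually stores these links.

```python
from dataclasses import dataclass
from urllib.parse import quote, urljoin

# Hypothetical streaming hosts, one per archive, each assumed to sit in the
# archive's own country. Placeholders, not real Video Active endpoints.
STREAMING_SERVERS = {
    "ORF": "https://media.orf.example/va/",
    "TVC": "https://media.tvc.example/va/",
}

@dataclass
class Asset:
    identifier: str   # portal-wide identifier
    archive: str      # key into STREAMING_SERVERS
    filename: str     # file name on the archive's server

def stream_url(asset: Asset) -> str:
    """Resolve an asset to a URL on the archive's own server.

    The portal never copies the video file; it only builds a link to
    where the archive hosts it.
    """
    try:
        base = STREAMING_SERVERS[asset.archive]
    except KeyError:
        raise LookupError(f"no streaming server registered for {asset.archive}") from None
    return urljoin(base, quote(asset.filename))

print(stream_url(Asset("va:1", "ORF", "clip 001.flv")))
```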

Encoding of the selected material is performed by the archives. Ingest formats (notably MPEG-1 and MPEG-2) are transcoded into Flash and Windows Media streaming formats by what is termed the Transcoding Factory, an integral part of the Contribution Application which lies at the heart of Video Active's asset management process.

Video Active Architecture

The Video Active system comprises various modules, all using Web technologies. Together these components manage the whole workflow: annotation, uploading and transcoding of material, keyframe extraction, metadata storage and search. Figure 2 shows the architecture behind the portal.


Figure 2: The Video Active Architecture

Video Active provides multi-lingual annotation, search and retrieval of the digital assets using the ThesauriX technology [15]. ThesauriX is a Web-based multi-lingual thesaurus tool based on the IPTC standard [16]. The system also exploits Semantic Web technologies to support automation, intelligent query services (e.g. sophisticated queries) and semantic interoperability with other heterogeneous digital archives. In particular, a semantic layer has been added by representing the metadata in the Resource Description Framework (RDF) [17]. The expressive power of RDF supports light reasoning services (use of implicit knowledge through subsumption and equivalence relations), the merging and alignment of metadata from heterogeneous sources, and a sophisticated query facility based on the SPARQL RDF query language [18]. Additionally, XML and relational database technologies have been used to speed up processes where semantic information is not required. Finally, the Video Active metadata are public and ready to be harvested using OAI-PMH [19].
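The 'light reasoning' through subsumption mentioned above can be sketched as follows. A query for a broad class also returns items typed only with a narrower class, because the subclass chain is followed transitively. This is a toy in-memory illustration rather than Sesame's reasoner. The `va:` names are hypothetical, and only subsumption is covered, not equivalence.

```python
# Illustrative triples; the "va:" prefix and the class names are
# hypothetical, not taken from the Video Active schema or thesaurus.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

triples = {
    ("va:NewsBulletin", SUBCLASS, "va:News"),
    ("va:News", SUBCLASS, "va:Factual"),
    ("va:Documentary", SUBCLASS, "va:Factual"),
    ("va:item42", TYPE, "va:NewsBulletin"),
    ("va:item43", TYPE, "va:Documentary"),
    ("va:item44", TYPE, "va:Drama"),
}

def superclasses(cls, triples):
    """All classes that cls is (transitively) a subclass of, including itself."""
    result, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS and o not in result:
                result.add(o)
                frontier.append(o)
    return result

def instances_of(cls, triples):
    """Items typed with cls directly or with any subclass of it."""
    return {s for s, p, o in triples
            if p == TYPE and cls in superclasses(o, triples)}

print(sorted(instances_of("va:Factual", triples)))
```

Here `va:item42` is never stated to be `va:Factual`. That knowledge is implicit, and the subclass chain makes it explicit.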

In the Video Active system each archive can insert its metadata either manually, using the Web Annotation Tool, or semi-automatically, using a uniform XML schema common to all the archives. The Video Active metadata schema is based on the Dublin Core [14] metadata schema, with additional elements essential for capturing the cultural heritage aspects of the resources. The video metadata are produced automatically and are represented in a schema based on MPEG-7 [20]. In order to support semantic services, the metadata are transformed into RDF triples and stored in a semantic metadata repository.

The Annotation Process

The annotation process can be carried out either manually or semi-automatically. In the manual process, the archives use the Web Annotation Tool to insert the metadata. In the semi-automatic process, the archives use a common XML schema to export those metadata fields that map to Dublin Core elements. Elements that cannot be mapped to the Video Active schema, or that are missing from the legacy databases (e.g. thesaurus terms), are inserted manually (see Figure 3).
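The semi-automatic route can be pictured with a minimal sketch. Legacy columns with a known Dublin Core mapping are exported as XML. Unmapped columns are reported, and the Video Active-specific fields are left for manual annotation. The legacy column names, the mapping table and the `<record>` root element are hypothetical; the actual common XML schema is not reproduced in the article. Only the Dublin Core namespace URI is the real one.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from one archive's legacy columns to Dublin Core
# elements; each archive would define its own.
LEGACY_TO_DC = {
    "prog_title": "title",
    "tx_date": "date",
    "summary": "description",
    "channel": "publisher",
}

# Video Active-specific fields named in the article; assumed here to be
# absent from legacy databases and completed in the Web Annotation Tool.
MANUAL_FIELDS = ["TelevisionGenre", "EuropeanDimension", "NationalRelevance"]

def map_record(legacy):
    """Split a legacy record into DC values, unmapped columns and manual fields."""
    mapped = {LEGACY_TO_DC[k]: v for k, v in legacy.items() if k in LEGACY_TO_DC}
    unmapped = sorted(k for k in legacy if k not in LEGACY_TO_DC)
    return mapped, unmapped, list(MANUAL_FIELDS)

def to_xml(mapped):
    """Serialise the mapped values as dc:* children of a hypothetical <record>."""
    root = ET.Element("record")
    for element, value in mapped.items():
        ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")

mapped, unmapped, todo = map_record(
    {"prog_title": "Election night", "tx_date": "1968-11-05", "tape_no": "B123"}
)
print(to_xml(mapped))
print("not mapped:", unmapped, "| to annotate manually:", todo)
```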


Figure 3: RDF representation of the final Video Active schema

The Web Annotation Tool supports the entry and management of the metadata associated with the media and also handles the preparation of the actual content, i.e. format conversion (low/medium bit rate for streaming service, etc.).

It produces an XML file containing metadata based on Dublin Core, as well as content encoding and keyframe extraction information. The XML is then transformed into RDF triples (Figure 3) and stored in the semantic repository. Using an ontology language with formal semantics, such as RDF, supports rich representation and reasoning services that facilitate sophisticated queries, automation of processes and semantic interoperability. Semantic interoperability means that the meaning of exchanged information can be interpreted automatically and in the same way by different systems, i.e. the information can be processed in a machine-understandable manner. The first step towards a common understanding is a representation language that expresses the formal semantics of the information. Systems that understand these semantics (reasoning tools, ontology query engines etc.) can then process the information and provide Web services such as search and retrieval. Semantic Web technologies thus provide a formal framework for representing and processing different levels of semantics.
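The XML-to-RDF step can be sketched in a few lines. Each child element of a flat XML record becomes one triple whose predicate URI is the element's namespace plus its local name, and the result is serialised as N-Triples. Video Active uses Jena (Java) for this step; the Python below is not Jena's API, only a stdlib illustration of the same idea. The item base URI and the fallback schema namespace are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical URIs for items and for Video Active-specific terms;
# the Dublin Core namespace in SAMPLE below is the real one.
ITEM_BASE = "http://videoactive.example/item/"
VA_SCHEMA = "http://videoactive.example/schema#"

def predicate_uri(tag):
    """Map an ElementTree tag ('{ns}local' or plain 'local') to a predicate URI."""
    if tag.startswith("{"):
        ns, _, local = tag[1:].partition("}")
        return ns + local
    return VA_SCHEMA + tag

def xml_to_triples(xml_text):
    """Turn one flat XML record into (subject, predicate, literal) triples."""
    root = ET.fromstring(xml_text)
    subject = ITEM_BASE + root.get("id")
    return [(subject, predicate_uri(child.tag), (child.text or "").strip())
            for child in root]

def to_ntriples(triples):
    """Serialise triples as N-Triples, escaping backslashes, quotes and newlines."""
    lines = []
    for s, p, o in triples:
        literal = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'<{s}> <{p}> "{literal}" .')
    return "\n".join(lines)

SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/" id="item42">
  <dc:title>Election night 1968</dc:title>
  <dc:language>en</dc:language>
  <EuropeanDimension>yes</EuropeanDimension>
</record>"""

print(to_ntriples(xml_to_triples(SAMPLE)))
```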


Figure 4: Architecture of Sesame data insertion system

Storing and Querying

The semantic metadata store used in Video Active is Sesame [21], an open source Java framework for storing, querying and reasoning with RDF. It can be used as a database for RDF triples or as a Java library for applications that need to work with RDF internally, and it can store RDF triples in several storage systems (e.g. a Sesame local repository or a MySQL database). The procedure for inserting the assets into the RDF store (Sesame) is depicted in Figure 4.

To transform the XML documents into RDF triples, Video Active uses the Jena Semantic Web Framework [22], a Java API for building Semantic Web applications. Jena provides a programmatic environment for RDF, RDFS [23], OWL [24] and XML [25]; in this application it is mainly used to generate the RDF documents from the XML data representation.

The query service of the Video Active system is based on the SPARQL RDF query technology [26]. SPARQL is a W3C Candidate Recommendation towards a standard query language for the Semantic Web. It focuses on querying RDF triples and has been used successfully to query the Video Active metadata.
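At its core, a SPARQL SELECT evaluates a basic graph pattern, which is a set of triple patterns with shared variables that must all match. The sketch below implements that matching over an in-memory list. It is an illustration of the technique, not Sesame's query engine. The data and prefixed names are hypothetical, and treating any string that starts with "?" as a variable is a simplification.

```python
# Hypothetical triples, with prefixed names written as plain strings.
triples = [
    ("va:item42", "dc:title", "Election night 1968"),
    ("va:item42", "dc:language", "en"),
    ("va:item43", "dc:title", "Journal télévisé"),
    ("va:item43", "dc:language", "fr"),
]

def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                    # variable
            if extended.get(term, value) != value:
                return None                         # bound to something else
            extended[term] = value
        elif term != value:                         # constant that does not match
            return None
    return extended

def select(patterns, triples):
    """Evaluate a basic graph pattern: all patterns match, shared variables agree."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

# Roughly: SELECT ?item ?title WHERE { ?item dc:language "en" . ?item dc:title ?title }
query = [("?item", "dc:language", "en"), ("?item", "dc:title", "?title")]
print(select(query, triples))
```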

The end-user can perform simple Google-style searches, but the query service also allows browsing through the metadata using pre-defined filters, an approach best compared with the Apple iTunes interface.
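Filter-based browsing in the iTunes style is usually implemented as faceted search. The records are narrowed by the selected facet values, and the remaining values of each facet are counted to populate the browse columns. In this sketch the facet names (genre, country, decade) and the records are assumptions chosen for illustration. The article does not list the portal's actual filters.

```python
from collections import Counter

# Hypothetical records; the facet names are assumed, not the portal's own.
records = [
    {"id": "item42", "genre": "News", "country": "UK", "decade": "1960s"},
    {"id": "item43", "genre": "News", "country": "BE", "decade": "1970s"},
    {"id": "item44", "genre": "Drama", "country": "UK", "decade": "1970s"},
]

def apply_filters(records, filters):
    """Keep the records that match every selected facet value."""
    return [r for r in records
            if all(r.get(facet) == value for facet, value in filters.items())]

def facet_counts(records, facet):
    """Count the remaining records per value of one facet."""
    return Counter(r[facet] for r in records)

hits = apply_filters(records, {"decade": "1970s"})
print(facet_counts(hits, "genre"), facet_counts(hits, "country"))
```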

Metadata OAI Repository

All the metadata stored in Sesame are exposed to external systems and archives through an OAI-compliant repository. The OAI-PMH [19] defines a mechanism for harvesting records containing metadata from repositories. It gives data providers a simple technical option for making their metadata available to services, based on the open standards HTTP (Hypertext Transfer Protocol) and XML (Extensible Markup Language). The harvested metadata may be in any format agreed by a community (or by any discrete set of data and service providers), although unqualified Dublin Core is specified to provide a basic level of interoperability.
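From the harvester's side, OAI-PMH works like this. The first `ListRecords` request names a metadata format (`oai_dc` for unqualified Dublin Core). Each response page may end with a `resumptionToken`, which is then sent on its own to fetch the next page. The sketch builds those request URLs and parses one response page. The base URL and sample record are hypothetical; the verb, argument names and namespaces are those defined by OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"

def parse_page(xml_text):
    """Return ([(identifier, title), ...], next token or None) for one page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{OAI}record"):
        identifier = rec.findtext(f"{OAI}header/{OAI}identifier")
        title = rec.findtext(f".//{DC}title")
        records.append((identifier, title))
    token = root.findtext(f".//{OAI}resumptionToken")
    return records, (token or None)   # an empty token marks the last page

# Hypothetical one-record response page.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>
<record><header><identifier>oai:videoactive.example:item42</identifier></header>
<metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:title>Election night 1968</dc:title>
</oai_dc:dc></metadata></record>
<resumptionToken>page2</resumptionToken></ListRecords></OAI-PMH>"""

print(list_records_url("https://videoactive.example/oai"))
print(parse_page(SAMPLE))
```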

Conclusion: Towards a European Digital Library

The European Commission's i2010 Digital Libraries initiative advocates the need for integrated access to the digital items and collections held in Europe's cultural heritage institutions via a single online access point: The European Digital Library (EDL).

Practical steps towards this goal are currently being undertaken in many projects, large and small. The EC recently launched an initiative to co-ordinate these efforts, called EDLnet [27]. Video Active is an invited member of the 'European Digital Library Thematic Partner Network' within EDLnet. This network aims to bring on board key cultural heritage stakeholders from European countries to prepare the ground for an operational European Digital Library service in 2008.

As this article has indicated, simply digitising and uploading archive content does not release the full potential of audio-visual content. The added value of archives lies in their ability to place material in a context that is meaningful to different user groups, and to enrich the metadata so as to allow interactive exploration. For a pan-European service, the infrastructure must meet very specific requirements, dealing with semantic and linguistic interoperability, the handling of intellectual property rights and so on. As more archives join Video Active, a vital part of our heritage will become available online for everybody to study and enjoy.

References

  1. Video Active http://www.videoactive.eu/
  2. YouTube http://www.youtube.com/
  3. MySpace http://www.myspace.com/
  4. Revver http://one.revver.com/revver/
  5. YouTube serves up 100 million videos a day online, USA Today, 16 July 2006 http://www.usatoday.com/tech/news/2006-07-16-youtube-views_x.htm
  6. Video Drives Net Traffic: Cisco credits the popularity of video and peer-to-peer networking with boosting net traffic 21 percent in 5 years, PCWorld, 18 August 2007 http://www.pcworld.com/article/id,136069-page,1/article.html
  7. Annual Report on Preservation Issues for European Audiovisual Collections (2007) http://www.prestospace.org/project/deliverables/D22-8.pdf
  8. Joost http://www.joost.com/
  9. MSN Video http://video.msn.com/video.aspx/?mkt=en-gb
  10. Miro http://www.getmiro.com/
  11. Blinkx http://www.blinkx.com/
  12. eContentplus Programme http://ec.europa.eu/information_society/activities/econtentplus/index_en.htm
  13. Content selection strategy report http://videoactive.files.wordpress.com/2007/10/23_content_selection_strategy_report.pdf
  14. Dublin Core Metadata Initiative http://dublincore.org/
  15. Multi-lingual Thesaurus http://www.birth-of-tv.org/birth/thesaurix.do?term=4620
  16. International Press Telecommunications Council http://www.iptc.org/pages/index.php
  17. Resource Description Framework (RDF) http://www.w3.org/RDF/
  18. SPARQL Query Language for RDF http://www.w3.org/TR/rdf-sparql-query/
  19. Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm#Introduction
  20. MPEG-7 Standard, ISO/IEC 15938-1, Multimedia Content Description Interface http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm
  21. Broekstra, J., Kampman, A., van Harmelen, F. (2002). "Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema". 1st International Semantic Web Conference (ISWC2002).
  22. Jena - A Semantic Web Framework for Java http://jena.sourceforge.net/
  23. Dan Brickley, RDF Vocabulary Description Language 1.0: RDF Schema, http://www.w3.org/TR/rdf-schema/
  24. Bechhofer, S., van Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D. L., Patel-Schneider, P. F., Stein, L. A. (2004). OWL Web Ontology Language Reference. W3C Recommendation.
  25. Extensible Markup Language (XML) http://www.w3.org/XML/
  26. Andy Seaborne, HP Labs Bristol, RDQL - A Query Language for RDF, http://www.w3.org/Submission/2004/SUBM-RDQL-20040109/
  27. EDLnet http://digitallibrary.eu/edlnet/

Author Details

Johan Oomen
Policy Advisor
Netherlands Institute for Sound and Vision

Email: joomen@beeldengeluid.nl
Web site: http://www.beeldengeluid.nl

Vassilis Tzouvaras
Senior Researcher
National Technical University of Athens

Email: tzouvaras@image.ntua.gr
Web site: http://www.image.ntua.gr/~tzouvaras/
