Publishers get boost to improve visibility and delivery of online news
By IPTC Editor Jonathan Engel
London (England) - December 2011 - Two powerful new tools, the recently approved content-description standards rNews and RightsML, will help electronic publishers raise the profile of their online news and deliver content more effectively to their business customers.
The ratification of these standards reflects an ongoing strategy by the IPTC to engage with a wider group of publishers online, improve the visibility of multimedia news, and enable more efficient delivery based on usage rights.
"Publishing news on the web requires publishers to communicate not only to the end user but also to automated systems," said Michael Steidl, IPTC Managing Director. "These new standards allow publishers to embed into web pages descriptions of the news content that can be understood by the web crawlers in an unambiguous way."
Each standard, approved after months of industry consultation, addresses a different aspect of the descriptive mark-up of news content. In both cases, however, the result is that these descriptive metadata labels can be read by computers.
As a result, news content can be discovered and displayed more effectively by search engines, and distributed more easily to appropriate downstream clients.
rNews for metadata in web pages
The first standard, rNews 1.0, provides a blueprint for embedding machine-readable publishing metadata into the HTML format that governs the display of web pages. rNews makes it simple for machines to read structural metadata such as an article's headline directly from a web document, a task that would otherwise be difficult given the huge volume of web content.
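To make the idea concrete, here is a minimal Python sketch of a machine reading a headline straight from labelled mark-up. The page fragment is invented, and it assumes Microdata attributes with schema.org's NewsArticle type rather than quoting rNews 1.0 property names; the class name ItempropExtractor is hypothetical. Note that class="big" only says how the headline looks, while itemprop="headline" says what it is.

```python
from html.parser import HTMLParser

# Invented page fragment. Microdata plus schema.org's NewsArticle type is an
# assumption for illustration, not text taken from the rNews specification.
PAGE = """
<div itemscope itemtype="http://schema.org/NewsArticle">
  <h1 class="big" itemprop="headline">Publishers get boost to improve visibility</h1>
  <span itemprop="author">Jonathan Engel</span>
  <p>Body text that carries no explicit label.</p>
</div>
"""


class ItempropExtractor(HTMLParser):
    """Collects the text of every element that carries an itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.props = {}
        # itemprop name of each currently open element (None if unlabelled).
        # Assumes every opened tag is closed, so void tags like <img> are not handled.
        self._stack = []

    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get("itemprop"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Credit the text to the innermost open element that has an itemprop.
        for name in reversed(self._stack):
            if name:
                self.props[name] = (self.props.get(name, "") + " " + text).strip()
                break


parser = ItempropExtractor()
parser.feed(PAGE)
print(parser.props["headline"])  # Publishers get boost to improve visibility
```

The unlabelled paragraph is ignored entirely, which is the point: without an explicit label, a crawler would have to guess what each block of text means.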
"rNews is an effective way to embed news-specific metadata into HTML pages, making it easier to identify for web intermediaries such as search engines," said Stuart Myles, Deputy Director of Schema Standards at the Associated Press and head of the IPTC's working groups on the semantic web and rights expressions.
As for rights metadata, he noted that "Publishers need to express rights on the uses of content, often for third parties, and clients need to know permissions and restrictions when they select that content."
The rNews standard received a major endorsement in September from schema.org, a consortium of the search engine companies Google, Yahoo! and Microsoft. The backing of these key players in search gives publishers a strong incentive to adopt rNews, and holds the promise of better search results for those that do.
Employing rNews is also a useful step for publishers hoping to take advantage of the growing body of Linked Data in the evolving Semantic Web.
For example, rNews-formatted articles could easily be linked to databases of newsmakers or organisations that follow the web-standard Resource Description Framework (RDF), in which each entity is identified by a unique ID and described by properties and their values.
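The linking described above can be sketched in Python with plain (subject, property, value) triples. Everything here is illustrative: the article URI is invented, the DBpedia URIs simply follow that project's resource-naming pattern, and the short property names (headline, mentions, name) are hypothetical stand-ins for the full property URIs a real RDF vocabulary would use.

```python
# Illustrative identifiers only: the article URI is invented, and the DBpedia
# URIs follow DBpedia's resource-naming pattern for the sake of the example.
ARTICLE = "http://example.com/news/2011/12/rnews-rightsml"
AP = "http://dbpedia.org/resource/Associated_Press"
NYT = "http://dbpedia.org/resource/The_New_York_Times"

# Each triple is (subject, property, value). Subjects are unique IDs; values
# are either literal strings or the IDs of other entities. The short property
# names are hypothetical shorthand for full property URIs.
triples = [
    (ARTICLE, "headline", "Publishers get boost to improve visibility"),
    (ARTICLE, "mentions", AP),
    (ARTICLE, "mentions", NYT),
    (AP, "name", "The Associated Press"),
    (NYT, "name", "The New York Times"),
]


def values(subject, prop, graph):
    """Return every value of `prop` recorded for `subject` in the graph."""
    return [o for s, p, o in graph if s == subject and p == prop]


def linked_names(article, graph):
    """Follow the article's 'mentions' links to each linked entity's name."""
    return [name
            for entity in values(article, "mentions", graph)
            for name in values(entity, "name", graph)]


print(linked_names(ARTICLE, triples))
# ['The Associated Press', 'The New York Times']
```

Because the article refers to each organisation by its ID rather than by a free-text name, any other data set using the same IDs can be joined to it without guesswork.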
The development of rNews 1.0 was swift compared with most standards, taking little more than a year. The quick pace owed much to the enthusiasm of three IPTC members: Evan Sandhaus, Lead Architect for Semantic Platforms at The New York Times Company; Andreas Gebhard, a managing editor of Getty Images; and the AP's Stuart Myles.
Sandhaus developed the initial rNews proposal based on his experience with The New York Times' web site. When he presented the ideas to fellow IPTC members, they were quick to grasp their significance.
Myles, in his role as head of the IPTC's Semantic Web Working Group, and Gebhard, an IPTC board member, worked with Sandhaus to develop the guidelines into a version that would meet wide industry needs and be consistent with the IPTC's G2 family of B2B news standards.
"Web pages are written in a language that's great for specifying how things should look but really bad at saying what things mean," said Sandhaus. "This language makes it easy to say 'place this block of text above the article and make it really big.' But it is impossible to say that this same block of text is the headline," he added. "With rNews, however, publishers now have a simple industry-supported tool for easily expressing this type of structural metadata."
The new standard helps resolve several long-standing issues with web publishing. These problems arise because much of the structured metadata describing headlines, bylines, introductory paragraphs, people, organisations and images is lost when content is rendered as HTML and delivered to web browsers.
Without such structured data, social networks, search engines and news aggregation sites struggle to create effective links or alerts to relevant content.
What's more, automated classifiers that look only at the full text of articles can make overly simplistic assumptions that lead to inappropriate ad placement. Myles noted that an ad for a cruise liner once appeared alongside an article about a survivor of the Titanic disaster.
rNews mark-up can help avoid such embarrassing pairings by exposing the metadata that clarifies what an article is actually about.
The aim of rNews is to mark up the elements of a news web page by type, or class, and then to identify each class's attributes and their values. For example, a news item will have a specific headline, while an organisation will have attributes such as its business name and ticker symbol.
Several syntax standards for semantic mark-up have evolved for web use, and rNews was developed to support two of them. The first, RDFa (RDF in attributes), is used by web applications such as Facebook; the second, Microdata, is recommended by schema.org and therefore has the backing of the consortium's major search engines.
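A short Python sketch can show both ideas together: one item described by a class plus attribute values, rendered once in Microdata and once in RDFa (here the RDFa Lite subset, using vocab, typeof and property). The company name, the ticker, and the choice of schema.org's Corporation type and tickerSymbol property are assumptions for illustration, not rNews 1.0 definitions.

```python
from html import escape

# Hypothetical item: a class (type) plus its attributes and values.
# The schema.org type and property names are assumptions for illustration.
VOCAB = "http://schema.org/"
ITEM_TYPE = "Corporation"
ATTRIBUTES = {"name": "Example Media Group", "tickerSymbol": "EXMG"}


def to_microdata(item_type, attrs):
    """Render the item with Microdata attributes (itemscope/itemtype/itemprop)."""
    props = "".join(
        f'<span itemprop="{escape(k)}">{escape(v)}</span>' for k, v in attrs.items()
    )
    return f'<div itemscope itemtype="{VOCAB}{item_type}">{props}</div>'


def to_rdfa(item_type, attrs):
    """Render the same item with RDFa Lite attributes (vocab/typeof/property)."""
    props = "".join(
        f'<span property="{escape(k)}">{escape(v)}</span>' for k, v in attrs.items()
    )
    return f'<div vocab="{VOCAB}" typeof="{item_type}">{props}</div>'


print(to_microdata(ITEM_TYPE, ATTRIBUTES))
print(to_rdfa(ITEM_TYPE, ATTRIBUTES))
```

The class and its attribute values are identical in both outputs; only the HTML attribute names that carry them differ, which is why a single vocabulary can be expressed in either syntax.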
IPTC contributes to the Linked Data effort
One of the themes of the IPTC's recent meeting in Vienna was the emergence of reliable sources of Linked Data, including DBpedia, which provides machine-readable Wikipedia content, and Freebase, which contains structured data about people, places and things.
Following the successful development of rNews, the IPTC agreed that online publishing could get a further boost if major sources of Linked Data mapped their content to the IPTC's Media Topic News Codes. These codes describe common news subjects like Arts and entertainment or Disaster and accident.
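The benefit of mapping to a shared vocabulary can be sketched in Python: two sources label the same subjects differently, but once each maps its own labels to a common topic code, their content can be found together. The source labels, titles and the codes TOPIC_ARTS and TOPIC_DISASTER are hypothetical placeholders, not real Media Topic identifiers.

```python
from collections import defaultdict

# Hypothetical per-source mappings from local category labels to shared topic
# codes. The codes are placeholders, not real IPTC Media Topic identifiers.
MAPPINGS = {
    "source_a": {"Category:Film festivals": "TOPIC_ARTS",
                 "Category:Shipwrecks": "TOPIC_DISASTER"},
    "source_b": {"film.festival": "TOPIC_ARTS",
                 "event.disaster": "TOPIC_DISASTER"},
}

# (source, local label, title) records, invented for illustration.
records = [
    ("source_a", "Category:Film festivals", "Cannes coverage"),
    ("source_b", "film.festival", "Berlinale preview"),
    ("source_a", "Category:Shipwrecks", "Titanic anniversary"),
]


def group_by_topic(recs):
    """Group records from different sources under their shared topic code."""
    grouped = defaultdict(list)
    for source, local_label, title in recs:
        code = MAPPINGS.get(source, {}).get(local_label)
        if code is not None:  # unmapped labels cannot be found by topic
            grouped[code].append(title)
    return dict(grouped)


print(group_by_topic(records))
# {'TOPIC_ARTS': ['Cannes coverage', 'Berlinale preview'], 'TOPIC_DISASTER': ['Titanic anniversary']}
```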
"There are many ways for information providers to describe their Linked Data sets, so we need a set of common terms," said Myles. "The IPTC's Media Topic codes can provide that common language for developing the Semantic Web, allowing multiple sources of data to be discoverable through consistent mapping to this established standard."
RightsML for expressing rights on digital media
The second standard, RightsML 0.9, is a mark-up language for expressing permissions and restrictions on content. For some time, the business-to-business partners in news exchange, and clients publishing content online, have needed machine-readable formulations of these rights in order to select, distribute and publish appropriate content.
RightsML brings a further benefit to businesses involved in online publishing. They needed a framework for defining content rights, restrictions and duties, and, given complex permission options, reduced editorial staff and greater application integration, they also needed those terms to be machine-readable.
In this business-to-business environment, the basic requirement was to define the parties in the agreement, the actions permitted, the type of content and the conditions of use.
RightsML is based on an existing rights expression framework, the Open Digital Rights Language (ODRL).
The rights language includes a consistent vocabulary for actions that can be performed with news content, such as indexing it, aggregating it, translating it and sharing it. As a condition of use, for example, the rights expression could include the duty to obtain a content license.
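A rights expression of this kind can be sketched as a small data model in Python. It borrows ODRL concept names (assigner, assignee, target, permission, duty), but the field layout, the action strings, the party and content identifiers, and the rule that anything not expressly permitted is refused are all assumptions for illustration; this is not RightsML's actual serialisation.

```python
from dataclasses import dataclass, field


@dataclass
class Permission:
    action: str  # hypothetical action names, e.g. "index", "aggregate", "share"
    duties: list = field(default_factory=list)  # duties that must be fulfilled first


@dataclass
class Policy:
    assigner: str  # party granting the rights
    assignee: str  # party receiving the rights
    target: str    # the content covered by the policy
    permissions: list = field(default_factory=list)


def is_permitted(policy, assignee, target, action, fulfilled_duties=()):
    """True if `assignee` may perform `action` on `target` under `policy`."""
    if policy.assignee != assignee or policy.target != target:
        return False
    for perm in policy.permissions:
        if perm.action == action:
            # Allowed only once every attached duty has been fulfilled.
            return all(d in fulfilled_duties for d in perm.duties)
    return False  # assumption: anything not expressly permitted is refused


# Hypothetical identifiers for the parties and the content item.
policy = Policy(
    assigner="urn:example:agency",
    assignee="urn:example:client",
    target="urn:example:photo-123",
    permissions=[Permission("index"),
                 Permission("share", duties=["obtain-license"])],
)

print(is_permitted(policy, "urn:example:client", "urn:example:photo-123", "share"))
print(is_permitted(policy, "urn:example:client", "urn:example:photo-123", "share",
                   fulfilled_duties=["obtain-license"]))
```

Here sharing is refused until the licence duty is met, which mirrors the condition-of-use example in the text.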
The approval of RightsML version 0.9 signalled the formal transfer of management responsibility to the IPTC from an associated group of publishers developing the Automated Content Access Protocol (ACAP). Aside from the AP, the IPTC and the Newspaper Licensing Agency, which represents UK newspapers, organisations working on the new standard included Getty Images and the Wall Street Journal.