Relationships drive improvements in news content and delivery
LONDON (England) - September 2012 -- There was a time when newsroom expertise in relationships was confined to the agony aunt.
Now, relationships – in the wider sense of linkages among newsmakers, news subjects and events – are the foundation for more efficient and effective news delivery.
The change adds a richer level of context -- a “semantic” layer of relationships -- to the descriptive tags on news content.
Instead of capturing just the traditional hierarchy of news subject descriptions -- such as football being a type of sport -- the new approach also records associations among the people, organizations, locations and events mentioned in the article.
Thus the background descriptive tags for a football match report would identify specific players, teams, venues and results that could be compared easily with previous performances.
“Databases and hierarchies don’t really reflect the real world,” said John O’Donovan, information technology director at the Press Association (PA), Britain’s national news agency, and a leading proponent of the new approach. Now, he added, “Rather than organizing the stuff into groups inherently, you are treating them all as objects, and you are then putting those together based on concepts.”
O'Donovan and other news executives discussed these issues at a meeting of the International Press Telecommunications Council (IPTC), the industry body for news delivery technology, in London. Other news organisations and online information providers participating in the discussion included the BBC, Thomson Reuters, the Associated Press, Agence France-Presse, dpa, The New York Times, Pearson and Google.
The structure of these relationships (or ontologies, in the experts’ language) is the key to delivering meaningful information to appropriate users – especially in a web environment. Ontologies allow reporters and editors not only to record that two things are related, but to show how they are related.
This new method for news delivery uses two key building blocks of the developing “semantic web,” where the focus is on data rather than documents. The first is the Resource Description Framework (RDF), a standard model for describing data; the second is Linked Data, a set of practices for publishing and interconnecting RDF-described data on the web.
Under RDF, a uniquely identified “thing” like a political party has several properties, for example a leader. Each property in turn takes a specific value, in this case the leader’s name plus a globally unique identifier. Each such statement of thing, property and value is called a “triple,” and the same structure can be applied to a diverse range of newsworthy things, from works of art to sports teams, making it easier for information providers and their clients to identify relationships across information systems.
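To make the triple structure concrete, here is a minimal sketch in Python that stores RDF-style statements as plain (subject, property, value) tuples rather than using an RDF library. The example.org URIs, the property names and the leader's name are hypothetical placeholders, not identifiers from any real dataset.

```python
# Minimal sketch of RDF-style "triples" as plain Python tuples.
# All URIs and names below are hypothetical placeholders under example.org;
# a real system would use an RDF library and shared vocabularies.
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # the uniquely identified "thing"
    predicate: str  # the property, e.g. "leader"
    obj: str        # the value: another identifier or a literal

EX = "http://example.org/"

triples = [
    # A political party has a leader; the leader is itself an identified thing.
    Triple(EX + "party/p1", EX + "prop/leader", EX + "person/123"),
    Triple(EX + "person/123", EX + "prop/name", "Jane Example"),
    # The same structure describes a sports team and its home venue.
    Triple(EX + "team/t9", EX + "prop/homeVenue", EX + "venue/v4"),
    Triple(EX + "venue/v4", EX + "prop/name", "Example Park"),
]

def objects(subject: str, predicate: str) -> list[str]:
    """Return every value stated for the given (subject, predicate) pair."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]

def leader_name(party: str) -> list[str]:
    """Follow party -> leader -> name across two linked triples."""
    return [name
            for leader in objects(party, EX + "prop/leader")
            for name in objects(leader, EX + "prop/name")]

print(leader_name(EX + "party/p1"))  # ['Jane Example']
```

Because the leader is itself an identified thing, a lookup can follow one triple into the next. A production system would more likely keep such statements in an RDF store and query them with SPARQL.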
With the explosive growth of media content, “you really have to help your customers find things,” said O’Donovan, who delivered one of the main presentations at the IPTC conference.
He noted that traditional obstacles to successful information retrieval include the limited multimedia capabilities of content management systems and the disparate models used to describe different types of content.
Content “has historically been badly organised in the short-term,” with information providers focusing on immediate publication and not on future, wider uses, he said. “It makes your content more valuable if you can deal with those problems,” for example retrieving and linking content for a retrospective news package on the anniversary of a key event.
Otherwise, he added, “you’ve created something quite valuable (but soon) it’s valueless because you can’t find it.”
The new approach is a logical development in the use of descriptive metadata, whose aim is to define content once and then deliver it through multiple publishing channels, either in ready-made packages or as components that clients can assemble themselves.
Another speaker at the conference, Madi Solomon from Pearson, described how the publishing and education company has embraced relation-based metadata to transform the way it delivers information.
Solomon, who is the company’s Director for Content Standards and Global Content Management, developed an “asset-enrichment pipeline” using, among other tools, RDF and DBpedia, a Linked Data source that structures Wikipedia content. She said she essentially offers a “metadata laundry service”: each Pearson unit submits content metadata from its own repository and gets it back with related metadata added automatically.
The process can extract keywords from a content summary, then match these terms against DBpedia to generate additional, related keywords. For instance, the generic term “dinosaur” might be added to educational content that mentions a prehistoric bird, a close zoological relative.
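The two enrichment steps can be sketched roughly as follows. This is not Pearson's pipeline: keyword extraction here is a naive stopword filter, and a small hypothetical RELATED_TERMS table stands in for real DBpedia lookups so that the example runs offline.

```python
# Sketch of two-step keyword enrichment: extract keywords, then add related terms.
# RELATED_TERMS is a hypothetical stand-in for lookups against DBpedia.
import re

STOPWORDS = {"a", "an", "the", "of", "and", "is", "was", "in", "its", "to", "with"}

# Hypothetical lookup table: term -> related, often more general, terms.
RELATED_TERMS = {
    "archaeopteryx": ["dinosaur", "bird"],
    "fossil": ["palaeontology"],
}

def extract_keywords(summary: str) -> list[str]:
    """Naive extraction: lower-cased words that are not stopwords, in order, deduplicated."""
    words = re.findall(r"[a-z]+", summary.lower())
    return list(dict.fromkeys(w for w in words if w not in STOPWORDS))

def enrich(summary: str) -> list[str]:
    """Return the extracted keywords plus any related terms not already present."""
    keywords = extract_keywords(summary)
    enriched = list(keywords)
    for kw in keywords:
        for related in RELATED_TERMS.get(kw, []):
            if related not in enriched:
                enriched.append(related)
    return enriched

print(enrich("The fossil of Archaeopteryx, a prehistoric bird"))
# ['fossil', 'archaeopteryx', 'prehistoric', 'bird', 'palaeontology', 'dinosaur']
```

The summary never uses the word "dinosaur"; the term arrives only through the related-terms lookup, which is the effect the "laundry service" is described as providing.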
PA’s O’Donovan praised the BBC, his former employer, for taking the lead in sport with its relationship-rich coverage of the 2010 football World Cup. “The BBC during the World Cup completely changed the sport – not just the sport technical strategy, but the sport’s editorial strategy.”
The BBC representatives at the conference – Silver Oliver and Tom Grahame – said their work on the sports ontology focuses on key sports concepts, not on traditional, hierarchical web site navigation. In this “domain-driven design,” web pages devoted to teams or athletes are updated dynamically and automatically, not manually.
These efficiencies are realized by using relationship-based content tags delivered in flexible XML format, rather than via hard-coded HTML. Within these pages, once-separate areas for sports stories and statistics are unified, and users can also connect to externally maintained content formatted as Linked Data, for instance sites devoted to specific football clubs.
Oliver said the sports ontology was part of a wider BBC initiative to create strategic datasets for people, events and locations.
O’Donovan noted that in breaking news situations, relevant relationships often take time to be defined, and need to be added to the developing news archive. He explained that several years ago, in the early days of the swine flu epidemic, the initial PA slugline just called it the “farm pig” story.
The relationship-based approach allows more meaningful descriptions to evolve with the story. “One of the problems this solves is that as new concepts come up and these things become related to content you can add them in and start to fill out this ‘mind map’ of things that your story is related to,” he said.
Automated content classification systems are playing an increasing role in news delivery, participants noted, by recognizing news-making people, organizations and events that can be matched with attributes and concepts, and by separately identifying news concepts themselves.
“By using semantics one of the other things you do is start to create global identifiers,” O’Donovan said, such as for globally referenced football teams like Manchester United. These identifiers can be used by different content models, decreasing the reliance on static content types in fixed formats.
However, participants acknowledged that one of the remaining obstacles to defining news objects with their related attributes and concepts was the tricky issue of establishing object identifiers that were both unique and widely recognised.
For instance, there is still “no complete, satisfactory sports metadata solution” for giving entities accessible, unique IDs, said Paul Kelly, director of software development at XML Team Solutions and chair of the IPTC's Sports Content Working Party.
Johan Lindgren, of Swedish news agency Tidningarnas Telegrambyrå (TT), also a member of the working party, noted that two players in the same sport can share a name, making precise identification by name difficult.
Yet for entities identified consistently within a news organisation’s own content management system, or mapped to a recognized external source by Linked Data, a rich, multi-faceted description can emerge.
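A minimal sketch of the identification problem, assuming a newsroom keeps its own local player IDs and maps them to an external Linked Data source. The player records, IDs and URIs are all hypothetical. A lookup by name can return several candidates, while a lookup by identifier resolves to exactly one.

```python
# Sketch: names can collide, identifiers should not.
# All records, local IDs and external URIs below are hypothetical.
from collections import defaultdict
from typing import Optional

# The newsroom's own records, keyed by a local ID.
players = {
    "local:1001": {"name": "Sam Example", "team": "Team A"},
    "local:2002": {"name": "Sam Example", "team": "Team B"},
}

# Mapping from local IDs to identifiers in an external Linked Data source.
external_ids = {
    "local:1001": "http://example.org/player/aaa",
    "local:2002": "http://example.org/player/bbb",
}

# Index players by name; one name may map to several IDs.
name_index = defaultdict(list)
for pid, record in players.items():
    name_index[record["name"]].append(pid)

def find_by_name(name: str) -> list[str]:
    """Return every local ID whose player has this name (possibly several)."""
    return list(name_index.get(name, []))

def external_uri(pid: str) -> Optional[str]:
    """Resolve a local ID to its external identifier, if one is mapped."""
    return external_ids.get(pid)

candidates = find_by_name("Sam Example")
print(candidates)  # ['local:1001', 'local:2002']
print([players[pid]["team"] for pid in candidates])  # ['Team A', 'Team B']
print(external_uri("local:2002"))  # http://example.org/player/bbb
```

Tagging stories with the identifier rather than the name keeps the two players apart, and the external mapping lets other systems that use the same source recognise the same player.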
Once all news is organized in this fashion, PA’s O’Donovan said, news providers’ IT departments can focus on new opportunities to deliver content that clients can customize, rather than merely fixing bugs and maintaining legacy systems.
“When you start to map out those few things, it’s remarkable how those little atomic elements build up really quickly into very powerful and rich combinations of data.”