NewsML has proved to be stable in production environments: since its introduction it has been updated only twice, and the current version is 1.2, released in October 2003.
The requirements for NewsML were that it should:
- Support the representation of electronic news entities such as news-items, parts of news-items, collections of news-items, relationships between news-items and metadata associated with news-items.
News may be delivered as single items or in packages of several related items, and must carry the metadata needed for efficient production, delivery and use (including sorting and searching).
- Be usable throughout the news lifecycle.
While the main use will probably be for news interchange, the standard may also be applied to the creation, management and publication of news in networked systems, and to archive applications.
- Allow news-items to consist of arbitrary mixtures of media types, languages and encodings.
News packages can consist of different types of content (text, images, video, audio), all of which are treated equally. The same news item may also exist in a number of different forms, such as translations of text into different languages or the presentation of images in alternative formats.
- Be usable either as a replacement for all existing news formats and encodings, or as a means of transporting them.
The hope is that NewsML will gradually replace older news exchange formats such as the Information Interchange Model (IIM). However, where other formats perform different functions, such as the News Industry Text Format (NITF) with its formatting capabilities, it must be possible to include them as self-contained items within NewsML.
- Support a number of different physical constructions of the same data.
Depending on user demands and the delivery systems in use, there may be a need to supply the same news content in different ways. Some users may want all of a provider's output delivered directly, while others may prefer to receive notification of availability (with an indication of content) and then retrieve the item if they want to use it.
- Support the management and development of news-items over time.
News stories often develop gradually, so there is a need to update, add to, or replace earlier versions. Items in different media may not be available at the same time, so they may have to be brought together later.
- Be easily extensible and flexible.
Requirements are liable to change as the markets develop, and a fixed structure could rapidly become out of date. In addition, individual users may wish to add their own features and extensions.
- Allow for authentication and signature of metadata and news-item content.
The value of news content, and of its associated metadata, depends on its reliability.
- Not be unduly verbose.
Transmission systems vary in capacity throughout the news industry and the demands on them keep growing, so there are advantages in keeping the transmission overhead as small as possible (provided the other requirements are met). NewsML also needs to be suitable for use with both push and pull delivery systems.
- Use XML and other appropriate standards and recommendations.
Adopting XML makes it possible to build on a proven, fast-growing technology and will help to ensure acceptance by the wider information industry. Since XML is now well established, software tools and development expertise should be generally available.
The aim of NewsML is to represent and manage news throughout its lifecycle, and the standard has been designed to give considerable flexibility and to allow straightforward extension to suit individual user needs. Inevitably, this has resulted in a rather complex, layered structure that can appear difficult to understand. However, there is no need to use all the features (a relatively simple implementation for, say, text handling is possible), and the underlying logic is straightforward.
NewsML takes the form of an XML document made up of a series of components, or elements, that are used to structure and process the actual news content. These elements may have attributes that specify their properties, and they can carry content in the form of other elements (sub-elements), character data, or references to external content.
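To make the element/attribute/sub-element vocabulary concrete, the sketch below walks a small XML fragment with Python's standard `xml.etree.ElementTree` and prints each element's tag, attributes and character data. The fragment is an assumption made for illustration: its element names (NewsItem, NewsComponent, ContentItem, DataContent) and the `Href` attribute echo NewsML vocabulary, but it is not a complete or schema-valid NewsML 1.2 document, and the file name and text are invented.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: names echo NewsML vocabulary, but this is
# NOT a complete or schema-valid NewsML 1.2 document.
SAMPLE = """\
<NewsItem>
  <NewsComponent>
    <ContentItem Href="photo-0001.jpg"/>
    <ContentItem>
      <DataContent>Heavy rain closed the coastal road on Monday.</DataContent>
    </ContentItem>
  </NewsComponent>
</NewsItem>
"""


def describe(element, depth=0):
    """Return one line per element: indented tag, attributes, then any text."""
    parts = ["  " * depth + element.tag]
    if element.attrib:  # attributes describe properties of the element
        parts.append(str(element.attrib))
    text = (element.text or "").strip()
    if text:  # character data carried directly by the element
        parts.append(repr(text))
    lines = [" ".join(parts)]
    for child in element:  # sub-elements, in document order
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
for line in describe(root):
    print(line)
```

The first ContentItem carries no data of its own and only points at external content through `Href`, while the second carries its text inline inside a sub-element, which mirrors the two ways of carrying content mentioned above.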
Efficient use of metadata is a key feature of NewsML, and considerable effort has been put into the development of a core set of metadata. This work drew on the earlier IIM (Information Interchange Model) and NITF (News Industry Text Format) standards but extended them considerably, making use of some advanced XML features.
In general, NewsML keeps metadata as close as possible to the item it describes, and much of the metadata is optional.
At the lowest level that could contain news data - the "ContentItem" - attributes can be added to describe the physical character of the news representation.
At the next higher level - the "NewsComponent" - several types of metadata can be added:
- AdministrativeMetadata deals with information about the origin of the NewsItem and includes the file name. The Provider and Creator of the news object can be identified, along with the source of the information, while specific provision has been made for identification of syndicated items. A Property element allows for the addition of any other administrative metadata that may be required for specific applications.
- RightsMetadata deals with the copyright of the NewsComponent, including details of any usage rights that the copyright holder has granted to other parties. Where supplied, this information is in text form, along with optional links to machine-processable data.
- DescriptiveMetadata is used to describe the content of a NewsItem, with specific provision for Language, Genre (the nature of the NewsItem, such as Current, Analysis, Forecast, Interview or Retrospective), OfInterestTo (the target audience) and TopicOccurrence. Again, there is a Property element to allow inclusion of any other descriptive metadata needed for a specific application.
- NewsLines can be thought of as a human-readable (text) representation of some of the metadata: they are generally both machine-readable and human-readable, apply across different media types, have specific relevance to news, and are publishable. The NewsLines identified as likely to be widely used, and therefore defined as specific elements, are HeadLine, SubHeadLine, ByLine, DateLine, CreditLine, CopyrightLine, RightsLine, SeriesLine, SlugLine and KeywordLine. Use of these NewsLines is optional, and each NewsLine can only be included once in a NewsComponent.
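The following is a minimal sketch of how an application might read NewsLines and one item of descriptive metadata from a NewsComponent, and flag NewsLines that break the once-per-component rule as stated above. It assumes a simplified layout (a NewsLines container and a DescriptiveMetadata element as direct children of NewsComponent, with Language carrying a FormalName attribute); the fragment and the helper names `newslines`, `repeated_newslines` and `language` are hypothetical, and the exact cardinality rules should be checked against the published DTD or schema.

```python
import xml.etree.ElementTree as ET

# The NewsLine element names listed in the text above.
NEWSLINE_TAGS = {
    "HeadLine", "SubHeadLine", "ByLine", "DateLine", "CreditLine",
    "CopyrightLine", "RightsLine", "SeriesLine", "SlugLine", "KeywordLine",
}

# Hypothetical, simplified fragment; the second ByLine is deliberate.
SAMPLE = """\
<NewsComponent>
  <NewsLines>
    <HeadLine>Storm closes coastal road</HeadLine>
    <ByLine>By A. Reporter</ByLine>
    <ByLine>By B. Reporter</ByLine>
  </NewsLines>
  <DescriptiveMetadata>
    <Language FormalName="en"/>
  </DescriptiveMetadata>
</NewsComponent>
"""


def newslines(component):
    """Map each NewsLine tag found in a NewsComponent to its text values."""
    found = {}
    container = component.find("NewsLines")
    if container is None:  # NewsLines are optional
        return found
    for child in container:
        if child.tag in NEWSLINE_TAGS:
            found.setdefault(child.tag, []).append((child.text or "").strip())
    return found


def repeated_newslines(component):
    """Return, sorted, the NewsLine tags that occur more than once."""
    return sorted(tag for tag, values in newslines(component).items()
                  if len(values) > 1)


def language(component):
    """Return the FormalName of DescriptiveMetadata/Language, or None."""
    element = component.find("DescriptiveMetadata/Language")
    return None if element is None else element.get("FormalName")


component = ET.fromstring(SAMPLE)
print(newslines(component))
print(repeated_newslines(component))  # tags violating the once-only rule
print(language(component))
```

Because the metadata sits directly inside the NewsComponent it describes, each lookup is a short relative path from that component rather than a search of the whole document.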
Version 1.2 was released in October 2003. In 2008 a normative XML Schema was added.