Machine-readable versions of NewsCodes

IPTC NewsCodes can be downloaded in machine-readable versions from the IPTC CV server. Content is available in different formats to better integrate with the Semantic Web.

IPTC shares its Controlled Vocabularies (CV) using a server at http://cv.iptc.org/newscodes/

This section provides guidelines on using this server to retrieve full CVs or single concepts.

Key Features of the cv.iptc.org Server

  • It implements IPTC’s CV design: each CV and each concept in a CV has an http URL as its identifier. This allows one to retrieve the data of the CV or concept by accessing the corresponding URL.

  • It provides a catalog of all available CVs at http://cv.iptc.org/newscodes/

  • Each CV is delivered as a list of concepts pertaining to this CV and additional CV-specific details.

  • Any concept which is a member of an IPTC CV is delivered as a dataset.

  • The datasets of the CVs and concepts are delivered in five different formats:

    • HTML as human readable variant

    • NewsML-G2 Knowledge Items (XML)

    • RDF/XML

    • RDF/Turtle

    • JSON/JSON-LD

CV Server Quick Start Guide

Go to the catalog of available IPTC CVs at http://cv.iptc.org/newscodes/

All the names and definitions of CVs and concepts are displayed in the preferred language of your web browser if a translation into that language is available. If no translation exists, names and definitions are displayed in the default language, British English (language tag "en-GB").

Browse the available CVs and click on the Scheme URI of a CV to see all its member concepts.

To see a single concept, click on the Concept ID (URI) link displayed for each concept in this list.

To see a CV or a concept in another language:

  • to display all available languages: append ?lang=x-all to the web address in the browser

  • to display a specific language: append ?lang=<language tag> to the web address in the browser.

  • Languages currently available are: Brazilian Portuguese (pt-BR), British English (en-GB), French (fr), German (de), Norwegian (no), Simplified Chinese (zh-Hans), Spanish (es), Portuguese for Portugal (pt-PT), and Swedish (sv).

If you need the data in a machine-readable format, see the guidelines below.

Semantic Design of IPTC CVs

IPTC CVs implement the design and rules of IPTC’s QCodes and of W3C SKOS as IKOS, the IPTC Knowledge Organisation System:

  • Each CV has an http-URL as Globally Unique Identifier (GUID)

  • For each CV, a name and a definition are provided (at least) in British English.

  • Each concept has an http-URL as Globally Unique Identifier (GUID): the first part of it is inherited from the CV URL and the code of this concept is appended making a new URL (see QCodes in a Nutshell)

  • For each concept a name and a definition are provided (at least) in British English.

  • In addition, the dates on which the concept was created, modified or retired, and notes about the concept, are provided.

  • Hierarchical relationships of concepts inside a scheme are expressed by skos:broader or skos:narrower terms

  • The mapping of concepts of a CV to concepts in other CVs is expressed by skos:closeMatch, skos:exactMatch or skos:broadMatch

  • Facets of a concept are supported by IKOS relationships.
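The identifier rule above (CV URL plus concept code gives the concept URL) can be sketched as a small shell helper. The function name concept_uri is hypothetical and only illustrates the composition; the Media Topic code 20001128 is the one used in the examples of this section.

```shell
# Hypothetical helper: build a concept URL from a CV (scheme) URL and a concept code.
# $1 = CV URL, with or without a trailing slash
# $2 = concept code within that CV
concept_uri() {
  # Strip one trailing slash, if present, then join URL and code with exactly one slash.
  printf '%s/%s\n' "${1%/}" "$2"
}
```

Calling concept_uri "http://cv.iptc.org/newscodes/mediatopic/" "20001128" prints the URL of that Media Topic concept, which can then be requested like any other CV or concept URL.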

Catalog of Available CVs

A catalog (list) of all available IPTC CVs can be found at http://cv.iptc.org/newscodes/. Accessing this URL delivers a list of the CVs as an HTML page. No other formats are available.

Delivery of CVs or Concepts by URLs

To retrieve a CV, send an HTTP request to the URL assigned as its GUID. To retrieve a concept, send an HTTP request to the URL assigned as the GUID of that concept.

In both cases the response delivers the data in the requested format and language; see below.

How to Select Different Formats and Languages for Delivery

The data format and the language of the server’s HTTP response can be selected by modifying the HTTP request.

Using HTTP content negotiation

One option is the use of HTTP content negotiation.

To select the format, the HTTP request sends an Accept header with the IANA Media Type (also known as MIME type) that corresponds to the requested format. If the server is able to deliver this format, it returns status code 200 and the data in the requested format, and it sets the Content-Type header of the HTTP response to the MIME type of this format. If the format cannot be delivered, the IPTC CV server returns a 404 status code.

If no MIME type is set in the Accept header, HTML is delivered as default format.

These IANA Media (MIME) Types may be used:

  • for HTML data: text/html or application/xhtml+xml

  • for NewsML-G2 Knowledge Items: application/vnd.iptc.g2.knowledgeitem+xml

  • for RDF/XML data: application/rdf+xml

  • for RDF/Turtle data: text/turtle

  • for JSON data: application/json

The properties of CVs and concepts supported by this JSON are defined by IKOS. The JSON is designed to conform to JSON-LD and includes a linked @context; by ignoring or deleting the @context, the data can be used as native JSON (see the "Interpreting JSON as JSON-LD" section of the JSON-LD 1.1 Recommendation).

To select the language, the HTTP request sends an Accept-Language header with one or more accepted language tags as defined by IETF BCP 47, e.g. fr for French, es for Spanish or de for German.

The IPTC CV server uses only the first tag if multiple tags are in the header. If the natural language properties (name, definition, notes) of the CV or concept are available in this language, they are delivered in it; if not, these properties are delivered in British English, the default language.
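As a minimal sketch of content negotiation, the helpers below map a short format label to one of the media types listed above and pass it to wget together with a language tag. The function names cv_media_type and cv_fetch and the short labels are assumptions made for this illustration; only the media types and the header names come from this section.

```shell
# Hypothetical helper: map a short format label to the media type to request.
cv_media_type() {
  case $1 in
    html)   echo 'text/html' ;;
    g2ki)   echo 'application/vnd.iptc.g2.knowledgeitem+xml' ;;
    rdfxml) echo 'application/rdf+xml' ;;
    rdfttl) echo 'text/turtle' ;;
    json)   echo 'application/json' ;;
    *)      echo "unknown format: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: fetch a CV or concept URL via content negotiation.
# $1 = CV or concept URL, $2 = format label, $3 = language tag, $4 = output file
cv_fetch() {
  media_type=$(cv_media_type "$2") || return 1
  # Send a single language tag, since the server only evaluates the first one.
  wget -O "$4" \
    --header="Accept: $media_type" \
    --header="Accept-Language: $3" \
    "$1"
}
```

For instance, cv_fetch http://cv.iptc.org/newscodes/scene/ rdfttl de IPTCscene-de.ttl would request the Scene NewsCodes as RDF/Turtle with German names and definitions where available.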

Using URL parameters

Another option for selecting data format and language is the use of a URL parameter.

For the selection of the format a parameter format must be used with one of these values:

  • for HTML data: format=html

  • for NewsML-G2 Knowledge Items: format=g2ki

  • for RDF/XML data: format=rdfxml

  • for RDF/Turtle data: format=rdfttl

  • for JSON/JSON-LD data: format=json

For the selection of a language a parameter lang must be used, e.g.:

  • lang=fr: French, selected by its tag

  • lang=x-all: all available languages for this CV or concept are delivered. Be aware that this could create a high data volume.

Example 1: http://cv.iptc.org/newscodes/mediatopic/20001128/?format=json&lang=fr delivers the Media Topic "Weather Forecast" in French using the JSON format.

Example 2: http://cv.iptc.org/newscodes/scene/?format=g2ki&lang=de delivers the concepts of the Scene NewsCodes CV as NewsML-G2 Knowledge Items with the natural language properties in German.
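The two examples above follow one pattern, which can be sketched as a small helper that appends both parameters to a CV or concept URL. The function name cv_param_url is hypothetical; the parameter names and values are the ones listed in this section.

```shell
# Hypothetical helper: append the format and lang parameters to a CV or concept URL.
# $1 = CV or concept URL, $2 = format value (html, g2ki, rdfxml, rdfttl, json)
# $3 = language tag, or x-all for all available languages
cv_param_url() {
  printf '%s?format=%s&lang=%s\n' "$1" "$2" "$3"
}
```

When the resulting URL is passed to a command line tool, it should be quoted, for example wget -O scene-de.xml "$(cv_param_url http://cv.iptc.org/newscodes/scene/ g2ki de)", because ? and & have a special meaning in most shells.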

Conditions/limitations for using the IPTC CV server

IPTC provides access to all of its Controlled Vocabularies on the CV server under these conditions:

  • They are copyright protected and can be used under the conditions of the Creative Commons Attribution 4.0 license - see the full license agreement at http://creativecommons.org/licenses/by/4.0/

  • They can be used free of any royalty fee.

  • The IPTC CV server is not intended for production use. Clients that regularly send more than ten requests per hour may be blocked.
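One way to stay well below this request limit is to keep a local copy of each CV and refresh it only occasionally. The sketch below is an assumption about how a client might do this and is not an IPTC recommendation; the function name cv_cache_stale is hypothetical, and the -mmin test of find is a GNU/BSD extension rather than POSIX.

```shell
# Hypothetical helper: succeed (exit status 0) if the cached file is missing
# or was last modified more than the given number of minutes ago.
# $1 = path of the local copy, $2 = maximum age in minutes
cv_cache_stale() {
  # No local copy yet: a download is needed.
  if [ ! -f "$1" ]; then
    return 0
  fi
  # find prints the path only if the file is older than the limit.
  [ -n "$(find "$1" -mmin "+$2" 2>/dev/null)" ]
}

# Example use (refresh at most once per day; 1440 minutes):
#   if cv_cache_stale IPTCscene.json 1440; then
#     wget -O IPTCscene.json "http://cv.iptc.org/newscodes/scene/?format=json"
#   fi
```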

Tools for Retrieving CVs or Concepts (in different formats or languages)

Below are two of the many tools that may be used to retrieve IPTC NewsCodes in non-HTML formats:

wget

This widely used command line tool for retrieving web content can be tailored to request one of the formats above. The command line examples below retrieve the IPTC Scene NewsCodes in each machine-readable format and store them in files whose names start with IPTCscene and whose file name extensions correspond to the format. Each format is shown twice: once using content negotiation and once using a URL parameter.

For IPTC G2:

wget -O IPTCscene-g2.xml --header="Accept:application/vnd.iptc.g2.knowledgeitem+xml" http://cv.iptc.org/newscodes/scene/

or

wget -O IPTCscene-g2.xml "http://cv.iptc.org/newscodes/scene/?format=g2ki"

For RDF/XML:

wget -O IPTCscene.rdf --header="Accept:application/rdf+xml" http://cv.iptc.org/newscodes/scene/

or

wget -O IPTCscene.rdf "http://cv.iptc.org/newscodes/scene/?format=rdfxml"

For RDF/Turtle:

wget -O IPTCscene.ttl --header="Accept:text/turtle" http://cv.iptc.org/newscodes/scene/

or

wget -O IPTCscene.ttl "http://cv.iptc.org/newscodes/scene/?format=rdfttl"

For JSON:

wget -O IPTCscene.json --header="Accept:application/json" http://cv.iptc.org/newscodes/scene/

or

wget -O IPTCscene.json "http://cv.iptc.org/newscodes/scene/?format=json"

Modify Header add-ons of web browsers

Using browser add-ons, you can modify/replace the Accept header for the data format and the Accept-Language header for the language.

Additional Resources