All rights reserved
© 2014 IPTC
 

NewsCodes Retrieval in Different Formats

NewsCodes can be retrieved from the IPTC controlled vocabulary server (http://cv.iptc.org). The default format for the response is human readable HTML - but three additional machine readable formats are available:

  • G2 Knowledge Item - with one to many concepts
  • RDF/XML using SKOS - with one to many concepts
  • RDF/Turtle using SKOS - with one to many concepts

How the NewsCodes can be obtained in these formats is described below.

Selecting the vocabulary or concept

The source of the retrieved data is always the same: the set of concepts hosted on and made available by the IPTC CV server http://cv.iptc.org.
Each concept is managed as a member of one of the IPTC controlled vocabularies, which the G2-Standards also call 'schemes'.
The vocabularies are branded as NewsCodes to give an instant feeling for their purpose.

Everybody with access to the Web can retrieve either a full vocabulary/scheme or a single concept from the IPTC CV server.
Which vocabulary/scheme or which concept should be delivered by the server is selected by the URL:

  • it always starts with http://cv.iptc.org/newscodes/ as basic URL
  • appended to this basic URL is the name of a scheme and a slash, for example "genre/". A list of available schemes can be found on the View NewsCodes page.
  • if this URL is requested - like http://cv.iptc.org/newscodes/genre/ - the full controlled vocabulary/scheme is delivered in the server's HTTP response.
  • further, the code of a specific concept may be appended, like "Actuality" of Genre: http://cv.iptc.org/newscodes/genre/Actuality - in this case data about only this single concept is delivered in the server's HTTP response.
Note: if you use only a basic URL like http://cv.iptc.org or http://cv.iptc.org/newscodes/ you will receive a 404 error, as these URLs do not address a vocabulary/scheme or concept.
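
The URL rules above can be sketched in a few lines of Python. This is only an illustration: the function name newscodes_url and the constant BASE_URL are hypothetical and not part of any IPTC tool. The scheme name "genre" and the concept code "Actuality" are the examples used above.

```python
from typing import Optional
from urllib.parse import quote

# Basic URL as given in this document.
BASE_URL = "http://cv.iptc.org/newscodes/"


def newscodes_url(scheme: str, code: Optional[str] = None) -> str:
    """Build the URL of a full scheme, or of a single concept if a code is given.

    Hypothetical helper for illustration only.
    """
    scheme = scheme.strip("/")
    if not scheme:
        # The basic URL alone does not address a scheme or concept (404).
        raise ValueError("a scheme name is required")
    url = BASE_URL + quote(scheme) + "/"
    if code:
        url += quote(code)
    return url


print(newscodes_url("genre"))               # full vocabulary/scheme
print(newscodes_url("genre", "Actuality"))  # single concept
```

Rejecting an empty scheme name mirrors the note above: the basic URL on its own is not a valid request.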

Selecting the response format

The data format used in the server's HTTP response is selected by the web request.

The mechanism behind this is called HTTP content negotiation. The HTTP request sends an Accept header with a specific MIME type which corresponds to the requested format. If the server is able to deliver this format it returns 200 as status code and the data in the requested format, and it adds the MIME type of this format to the Content-Type header of the HTTP response. If the format cannot be delivered the IPTC CV server returns a 415 status code.
If no MIME type is set in the Accept header, HTML is delivered as the default format.
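
As a minimal sketch of this request/response exchange, the following Python code uses only the standard library. The helper names build_request and fetch are hypothetical. The status codes in the comments are the ones stated in this document and have not been verified against the live server.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def build_request(url: str, mime_type: Optional[str] = None) -> urllib.request.Request:
    """Create a GET request; the Accept header is set only if a MIME type is given."""
    headers = {"Accept": mime_type} if mime_type else {}
    return urllib.request.Request(url, headers=headers)


def fetch(url: str, mime_type: Optional[str] = None) -> Tuple[int, str, bytes]:
    """Return (status code, Content-Type header, body) of the response.

    According to this document the server answers 200 on success and
    415 if the requested format cannot be delivered.
    """
    request = build_request(url, mime_type)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (response.status,
                    response.headers.get("Content-Type", ""),
                    response.read())
    except urllib.error.HTTPError as error:
        # Non-2xx answers are raised as HTTPError; report them the same way.
        return error.code, error.headers.get("Content-Type", ""), error.read()


# Example call (needs network access):
# status, content_type, body = fetch("http://cv.iptc.org/newscodes/genre/", "text/turtle")
```

Comparing the returned Content-Type with the requested MIME type is a simple way to check that the negotiation had the intended result.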

A basic distinction between formats is the intended receiver:
- a human person, or
- a machine
For that reason the IPTC qualifies the formats as either "human readable" or "machine readable".
How to select one of the formats available from the IPTC CV server is explained below.

** Human Readable Format: XHTML

The required MIME Types in the Accept header of the HTTP request are:
text/html or application/xhtml+xml

The HTTP response is an HTML page with all concepts of a vocabulary/scheme or a single concept, depending on what was selected - see above.
This format is also the default format if no Accept header was sent or an empty Accept header was sent.

** Machine Readable: IPTC G2 Knowledge Item

The required MIME Type in the Accept header of the HTTP request is: application/vnd.iptc.g2.knowledgeitem+xml

The HTTP response is an IPTC G2 Knowledge Item with all concepts of a vocabulary/scheme or a single concept, depending on what was selected - see above.

** Machine Readable: RDF/XML with SKOS

The required MIME Type in the Accept header of the HTTP request is:
application/rdf+xml

The HTTP response is an RDF document in XML with all concepts of a vocabulary/scheme or a single concept, depending on what was selected - see above.
Relationships between concepts and other details are expressed using SKOS of the W3C.

** Machine Readable: RDF/Turtle with SKOS

The required MIME Type in the Accept header of the HTTP request is:
text/turtle

The HTTP response is an RDF document in Turtle with all concepts of a vocabulary/scheme or a single concept, depending on what was selected - see above.
Relationships between concepts and other details are expressed using SKOS of the W3C.

*** Which tools to use:

Find below two of the many tools which may be used to retrieve IPTC NewsCodes in non-HTML formats:

* wget

The well-known command line tool for retrieving web content can be tailored to request one of the formats above. The command line examples below retrieve the IPTC Scene NewsCodes in each of the machine readable formats and store them in a local file, for example IPTCscene-g2.xml for the IPTC G2 Knowledge Item.

* For IPTC G2:
wget -OIPTCscene-g2.xml --header="Accept:application/vnd.iptc.g2.knowledgeitem+xml"  http://cv.iptc.org/newscodes/scene/

* For RDF/XML:
wget -OIPTCscene-rdf.xml --header="Accept:application/rdf+xml" http://cv.iptc.org/newscodes/scene/

* For RDF/Turtle:
wget -OIPTCscene.txt --header="Accept:text/turtle" http://cv.iptc.org/newscodes/scene/
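
For readers who prefer a script over the command line, the three wget calls above could be reproduced roughly as follows in Python. This is a sketch under assumptions: the names DOWNLOADS, download and download_all are hypothetical, and the output file names simply mirror the wget examples.

```python
import urllib.request

SCENE_URL = "http://cv.iptc.org/newscodes/scene/"

# Output file name -> MIME type sent in the Accept header
# (MIME types as listed in this document).
DOWNLOADS = {
    "IPTCscene-g2.xml": "application/vnd.iptc.g2.knowledgeitem+xml",
    "IPTCscene-rdf.xml": "application/rdf+xml",
    "IPTCscene.txt": "text/turtle",
}


def download(url: str, mime_type: str, path: str) -> None:
    """Request url in the given format and write the response body to path."""
    request = urllib.request.Request(url, headers={"Accept": mime_type})
    with urllib.request.urlopen(request, timeout=30) as response:
        data = response.read()
    with open(path, "wb") as out:
        out.write(data)


def download_all() -> None:
    """Fetch the Scene NewsCodes once per machine readable format."""
    for path, mime_type in DOWNLOADS.items():
        download(SCENE_URL, mime_type, path)


# download_all()  # uncomment to run; needs network access
```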

 

*  Firefox with the "Modify Headers" plug-in

For the Firefox browser a plug-in for tweaking the HTTP request headers is available. It is named "Modify Headers": search for it, download it and install it.
Then enter the following into the Modify Headers user interface:
Action = 'modify'
Name = 'accept'
Value = one of the MIME types above
When you enable such an entry in the Modify Headers user interface and request a URL for a full scheme or a single concept, Firefox will retrieve it. How exactly Firefox reacts to the machine readable formats depends on its settings. In most cases it will ask you to open or to save the response. We recommend saving it and opening the saved file with an appropriate tool.

 

 
 

NEWSCODES IN A NUTSHELL

NewsCodes is ...

... the brand name for IPTC's controlled vocabularies/taxonomies.

Go and see the group of

Descriptive NewsCodes
Administrative NewsCodes
NewsML-G2 NewsCodes
NewsML 1 NewsCodes
Photo Metadata NewsCodes

A controlled vocabulary

... or taxonomy or scheme is a set of terms to express a facet of news content. Facets could be e.g. the subject, the genre, the urgency etc. A controlled vocabulary could be a flat list of terms or a hierarchical structure. In the context of the G2-Standards a vocabulary is called a 'scheme'.

Retrieve NewsCodes in many formats

IPTC NewsCodes can be downloaded from the IPTC CV server by vocabulary or by single concept. The download is available in different formats to better integrate with the Semantic Web.

Use Scheme Browser tool

For downloading and locally managing NewsCodes on your computer, the Scheme Browser tool is available for free.

Subscribe to NewsCodes feed

A feed in the Atom 1.0 format provides all changes to any of the IPTC NewsCodes.