SportsML 2.0: IPTC G2-Standards Compliance Guide

This version:
Final Release: July 3, 2008 (Draft 4)

Abstract

This document is a guide for users of SportsML Version 1 who wish to upgrade to SportsML 2.0, which complies with the IPTC G2-Standards, a framework for news metadata. The guide explains the G2 news format with reference to sports content and standard SportsML structures.


1. Overview of IPTC G2-Standards

The IPTC's G2 Standard provides a unified framework for packaging and exchanging news content. It specifies a standard model for news metadata regardless of the content or media type. G2 offers powerful taxonomy management, better interoperability among IPTC standards (SportsML, EventsML and NITF), and flexibility in the level of implementation. For more information, visit the IPTC's G2-Standards home page.

While SportsML 2.0 files can exist as standalone files outside of the G2 Framework, they can also take advantage of the more flexible and extensive metadata and packaging capabilities inherent in G2.

This document explains the G2 model as applied to SportsML. It compares "G1" SportsML to the same news content rendered into the G2 architecture, and is meant as a guide for those familiar with SportsML who are interested in converting to SportsML-G2.

2. Introducing SportsML-G2

SportsML users already familiar with SportsML 1.8 standalone content will note the following basic differences between a SportsML 2.0 / G2 file and a SportsML 1.8 file:

The basic content of SportsML, whether news or statistics, remains largely the same in "G1" and G2 (see the accompanying sample documents). What changes is the way metadata about the document and its contents is handled. All the metadata previously stored in SportsML elements such as "sports-metadata", "event-metadata" and "team-metadata" now "bubbles up" to the universal newsItem metadata properties. For example, SportsML 1.8 declares the sport, league and teams to which a document refers as follows:

<sports-content>
	<sports-metadata>
		<sports-content-codes>
			<sports-content-code code-type="sport" code-key="15007000" code-name="Baseball"/>
			<sports-content-code code-type="league" code-key="l.mlb.com" code-name="Major League Baseball"/>
			<sports-content-code code-type="team" code-key="l.mlb.com-t.19" code-name="Philadelphia Phillies"/>
			<sports-content-code code-type="team" code-key="l.mlb.com-t.26" code-name="Arizona Diamondbacks"/>
		</sports-content-codes>
	</sports-metadata>

This is transferred to the "contentMeta" portion of the G2 wrapper:

<newsItem>
  ...
  <contentMeta>
    <subject qcode="subj:15007000">
      <name xml:lang="en-US">Baseball</name>
      <broader qcode="subj:15000000"/>
    </subject>
    <subject type="spcpnat:league" qcode="league:l.mlb.com">
      <name xml:lang="en-US">Major League Baseball</name>
      <broader qcode="subj:15007000"/>
    </subject>
    <subject type="spcpnat:team" qcode="team:l.mlb.com-t.19">
      <name>Philadelphia Phillies</name>
    </subject>
    <subject type="spcpnat:team" qcode="team:l.mlb.com-t.26">
      <name>Arizona Diamondbacks</name>
    </subject>
  </contentMeta>
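The mapping shown above is mechanical enough to script. The Python sketch below uses only the standard library's xml.etree.ElementTree to turn <sports-content-code> elements into G2 <subject> elements. The CODE_TYPE_MAP table and the codes_to_subjects function are hypothetical. They are derived solely from the correspondence in this one example. The sketch also assumes un-namespaced markup and does not generate <broader> relations.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical mapping, derived from the example above:
# SportsML 1.8 code-type -> (G2 subject "type" attribute, QCode scheme alias).
CODE_TYPE_MAP = {
    "sport": (None, "subj"),  # IPTC subject code; the example sets no type
    "league": ("spcpnat:league", "league"),
    "team": ("spcpnat:team", "team"),
}


def codes_to_subjects(codes: ET.Element) -> list[ET.Element]:
    """Convert the <sports-content-code> children of <sports-content-codes>
    into G2 <subject> elements. Unmapped code types (such as "priority")
    are skipped, and no <broader> elements are generated."""
    subjects = []
    for code in codes.findall("sports-content-code"):
        mapping = CODE_TYPE_MAP.get(code.get("code-type"))
        if mapping is None:
            continue
        subject_type, alias = mapping
        subject = ET.Element("subject")
        if subject_type is not None:
            subject.set("type", subject_type)
        # Build the QCode as "<scheme alias>:<code value>".
        subject.set("qcode", f"{alias}:{code.get('code-key')}")
        name = code.get("code-name")
        if name:
            ET.SubElement(subject, "name").text = name
        subjects.append(subject)
    return subjects
```

Calling codes_to_subjects on the <sports-content-codes> element yields one <subject> for each sport, league and team code. ET.tostring(subject, encoding="unicode") then serializes each subject for inclusion in <contentMeta>.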

Other changes in SportsML-G2 occur in the inline SportsML itself. The most important of these is the conversion of all controlled codes and vocabularies into two-part "QCodes". Here is an example of a QCode used as a team's unique key:

<team-metadata team-key="team:l.nhl.com-t.14" alignment="away">

The string "team:l.nhl.com-t.14" is a scheme-code pair. The part to the left of the colon, the alias "team", identifies the scheme; the part to the right is an actual code value from the controlled list of team IDs. QCodes are discussed in detail in section 4. For now, just keep in mind that they offer a more powerful and flexible way to handle controlled vocabularies, and that they make it possible to reconcile codes from disparate vocabularies.
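A QCode splits at its colon into the scheme alias and the code value. The minimal Python sketch below splits at the first colon, on the assumption that the alias itself never contains one. The QCode, parse_qcode and make_qcode names are illustrative only and not part of any SportsML or G2 API.

```python
from typing import NamedTuple


class QCode(NamedTuple):
    alias: str  # scheme alias, e.g. "team"
    code: str   # code value within that scheme, e.g. "l.nhl.com-t.14"


def parse_qcode(qcode: str) -> QCode:
    """Split a QCode at its first colon into (scheme alias, code)."""
    alias, sep, code = qcode.partition(":")
    if not sep or not alias or not code:
        raise ValueError(f"not a QCode: {qcode!r}")
    return QCode(alias, code)


def make_qcode(alias: str, code: str) -> str:
    """Join a scheme alias and a code value into a QCode."""
    return f"{alias}:{code}"
```

For example, parse_qcode("team:l.nhl.com-t.14") returns the pair ("team", "l.nhl.com-t.14"). A bare key such as "l.nhl.com-t.14" is rejected because it has no scheme alias.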

The other important change is in the structure of names. This is discussed in section 5, IPTC G2-Standards Naming Format.

3. The IPTC G2-Standards <newsItem>

The newsItem element is the main G2 container for news content. It consists of four main parts: catalog references, item metadata, content metadata and the inline or referenced content. The following subsections introduce each of these four parts and show where SportsML 1 metadata items fit in G2.

3.1 catalogRef

The <catalogRef/> element appears among the first children of <newsItem> and provides pointers to externally stored vocabularies. If you are running a G2 processor, you can validate your dynamic controlled vocabularies against these XML catalogs of allowed values.

SportsML 2.0 / G2 Example:

<newsItem guid="tag:xmlteam.com,2008:xt.5932656-pitcher-preview" standardversion="2.0" standard="SportsML-G2">
  <catalogRef href="http://iptc.org/std-dev/NAR/1.1/specification/IPTC-NewsCodesCatalog_6.xml"/>
  <catalogRef href="http://sportsml.org/NAR/1.0/specification/IPTC-SportsCodesCatalog_1.xml"/>
  <catalogRef href="http://www.xmlteam.com/specification/xts-SportsCodesCatalog_1.xml"/>
	...

The example above references three catalogs. The first is the main IPTC news codes catalog. The second holds sports-specific codes defined by SportsML. The third is a proprietary catalog for metadata properties beyond the scope of the IPTC; any organization can add its own code catalog to suit its specific needs. Codes and vocabularies are discussed further in section 4, QCodes.

The item's unique ID goes into the "guid" attribute. The standard and its version are also declared on the newsItem element, in the "standard" and "standardversion" attributes.
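A consumer typically reads these header values before anything else. The Python sketch below collects the guid, standard, standardversion and catalogRef hrefs from a newsItem. The examples in this guide omit XML namespaces, while real G2 documents are namespaced. The sketch therefore compares local element names so that it works either way. The read_item_header and local_name helpers are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag, if present."""
    return tag.rsplit("}", 1)[-1]


def read_item_header(xml_text: str) -> dict:
    """Return the guid, standard, version and catalog hrefs of a newsItem."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "newsItem":
        raise ValueError("root element is not newsItem")
    # catalogRef elements are direct children of newsItem.
    catalogs = [child.get("href") for child in root
                if local_name(child.tag) == "catalogRef"]
    return {
        "guid": root.get("guid"),
        "standard": root.get("standard"),
        "standardversion": root.get("standardversion"),
        "catalogs": catalogs,
    }
```

For the example above, read_item_header returns the three catalog URLs in document order. It also returns the guid "tag:xmlteam.com,2008:xt.5932656-pitcher-preview".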

3.2 itemMeta

<itemMeta> stores information about the item itself (not its content), usually pertaining to the management of the news item: the provider, publication timestamp, document ID, publication status, media type, etc.

SportsML 2.0 / G2 Example:

    <itemMeta>
        <itemClass qcode="ninat:text"/>
        <provider qcode="web:www.xmlteam.com"/>
        <versionCreated>2007-05-28T11:17:00-04:00</versionCreated>
        <pubStatus qcode="stat:usable"/>
        <fileName>xt.5932656-preview.xml</fileName>
    </itemMeta>

SportsML 1.8 placed this information in the following elements:

Old SportsML 1.8 Example:

  <sports-metadata date-time="20070528T111700-0400" doc-id="xt.5932656-preview" language="en-US">
    <sports-title>Preview: Arizona Diamondbacks (29-23) at Philadelphia Phillies (26-24), 7:05 p.m.</sports-title>
    <sports-content-codes>
      <sports-content-code code-name="XML Team Solutions, Inc." code-key="xmlteam.com" code-type="distributor" />
      <sports-content-code code-type="priority" code-key="normal" />
    </sports-content-codes>
  </sports-metadata>
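The timestamp also changes format between the two examples. SportsML 1.8 writes "20070528T111700-0400", which is ISO 8601 basic format. The G2 versionCreated element holds "2007-05-28T11:17:00-04:00", which is ISO 8601 extended format. Below is a minimal Python sketch of that conversion. It assumes every value carries a UTC offset, as in these examples, and the function name is hypothetical.

```python
from datetime import datetime


def sportsml1_to_g2_datetime(value: str) -> str:
    """Convert a SportsML 1.8 compact timestamp such as 20070528T111700-0400
    into the extended form used by G2, e.g. 2007-05-28T11:17:00-04:00.
    Raises ValueError if the value does not match the expected pattern."""
    parsed = datetime.strptime(value, "%Y%m%dT%H%M%S%z")
    # isoformat() on a timezone-aware datetime renders the offset as +HH:MM.
    return parsed.isoformat()
```

Applied to the date-time attribute above, sportsml1_to_g2_datetime produces the versionCreated value shown in the G2 itemMeta example.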

3.3 contentMeta

<contentMeta> stores information that describes the content of the item. This includes information about the creation of the item (publisher, writer, etc.), what kind of content item it is (genre) and what the item is about (sports event, team, player, etc.).

In SportsML 1, this information resided in sports-metadata, event-metadata, team-metadata, player-metadata and other metadata elements.

The example below shows the kinds of elements that contentMeta can contain.

SportsML 2.0 / G2 example:

<contentMeta>
    <contentCreated>2007-05-28T11:17:00-04:00</contentCreated>
    <located qcode="city:Philadelphia">
        <broader qcode="reg:Pennsylvania"/>
        <broader qcode="cntry:USA"/>
    </located>
    <creator qcode="web:sportsnetwork.com">
        <name>The Sports Network</name>
    </creator>
    <altId type="idtype:tsn-id" id="sportsnetwork.com-5932656"/>
    <altId type="idtype:revision-id"
        id="l.mlb.com-2007-e.19358-pre-event-coverage-sportsnetwork.com"/>
    <genre type="gnre:fixture" qcode="fixture:pitcher-preview">
        <name xml:lang="en-US">Game Pitcher Preview</name>
        <broader qcode="doc-class:event-summary"/>
    </genre>
    <genre type="sml-genre:doc-class" qcode="doc-class:event-summary"/>
    <genre type="xts-genre:tsn-fixture" qcode="tsn-fixture:mlbpreviewxml"/>
    <language tag="en-US"/>
    <subject qcode="subj:15000000">
        <name xml:lang="en-US">sport</name>
    </subject>
    <subject qcode="subj:15007000">
        <name xml:lang="en-US">Baseball</name>
        <broader qcode="subj:15000000"/>
    </subject>
    <subject qcode="subj:15007001">
        <name xml:lang="en-US">Major League Baseball</name>
        <broader qcode="subj:15007000"/>
    </subject>
    <subject type="spcpnat:conf" qcode="conf:l.mlb.com-c.national">
        <name xml:lang="en-US">National</name>
        <broader qcode="league:l.mlb.com"/>
    </subject>
    <subject type="spcpnat:event" qcode="event:l.mlb.com-2007-e.19358"/>
    <subject type="spcpnat:team" qcode="team:l.mlb.com-t.19">
        <name>Philadelphia Phillies</name>
    </subject>
    <subject type="spcpnat:team" qcode="team:l.mlb.com-t.26">
        <name>Arizona Diamondbacks</name>
    </subject>
    <subject type="spcpnat:person" qcode="person:l.mlb.com-p.456">
        <name>Doug Davis</name>
        <sameAs qcode="fssID:45679"/>
    </subject>
    <subject type="spcpnat:person" qcode="person:l.mlb.com-p.123">
        <name>Freddy Garcia</name>
        <sameAs qcode="fssID:45680"/>
    </subject>
    <headline>Pitcher Preview: Arizona Diamondbacks (29-23) at Philadelphia Phillies (26-24), 7:05 p.m.</headline>
    <slugline separator="-">AAV!PREVIEW-ARI-PHI</slugline>
</contentMeta>
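Because teams, events, conferences and people all land in the same <subject> element, a consumer usually filters them by the "type" attribute. The Python sketch below groups subject QCodes by type, assuming un-namespaced markup as in the example. Subjects without a type, such as the IPTC subject codes, are grouped under None. The subjects_by_type function is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def subjects_by_type(content_meta: ET.Element) -> dict:
    """Group the qcode of every <subject> child of <contentMeta> by its
    "type" attribute. Subjects with no type attribute are collected under
    the key None."""
    groups = defaultdict(list)
    for subject in content_meta.findall("subject"):
        groups[subject.get("type")].append(subject.get("qcode"))
    return dict(groups)
```

Applied to the contentMeta above, subjects_by_type groups entries by the "type" attribute. The two team QCodes come out together under "spcpnat:team" and the two pitchers under "spcpnat:person".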

SportsML 1.8 example (the metadata properties that correspond to those in the G2 example above):

<sports-content>
  <sports-metadata date-time="20070528T111700-0400" doc-id="xt.5932656-preview" language="en-US" 
  revision-id="l.mlb.com-2007-e.19358-pre-event-coverage-sportsnetwork.com" fixture-key="pre-event-coverage" 
  document-class="event-summary" fixture-name="Game Preview">
    <sports-title>Preview: Arizona Diamondbacks (29-23) at Philadelphia Phillies (26-24), 7:05 p.m.</sports-title>
    <sports-content-codes>
      <sports-content-code code-name="The Sports Network" code-key="sportsnetwork.com" code-type="publisher" />
      <sports-content-code code-name="XML Team Solutions, Inc." code-key="xmlteam.com" code-type="distributor" />
      <sports-content-code code-type="sport" code-key="15007000" code-name="Baseball" />
      <sports-content-code code-type="league" code-key="l.mlb.com" code-name="Major League Baseball" />
      <sports-content-code code-type="priority" code-key="normal" />
      <sports-content-code code-type="conference" code-key="c.national" code-name="National" />
      <sports-content-code code-type="team" code-key="l.mlb.com-t.19" code-name="Philadelphia Phillies" />
      <sports-content-code code-type="team" code-key="l.mlb.com-t.26" code-name="Arizona Diamondbacks" />
    </sports-content-codes>
  </sports-metadata>

	...

<event-metadata date-coverage-type="event" event-key="l.mlb.com-2007-e.19358" event-status="pre-event" 
start-date-time="20070528T190500-0400">

	...

<team-metadata team-key="l.mlb.com-t.26" alignment="away">
    <name first="Arizona" last="Diamondbacks" />
</team-metadata>

	...

<player-metadata player-key="l.mlb.com-p.456">
    <name full="Doug Davis" />
</player-metadata>

3.4 contentSet

The contentSet element holds the item's content, either inline within an <inlineXML> element or as a reference to content stored elsewhere.

SportsML 2.0 / G2 Example:

<contentSet>
  <inlineXML contenttype="application/sportsml+xml">
    <sports-content>
      <sports-event>
      ...

NITF news articles can also be placed in <inlineXML>:

<contentSet>
  <inlineXML contenttype="application/nitf+xml">
    <nitf>
      <body>
      ...

Note: the value of the "contenttype" attribute of <inlineXML> is a MIME type, such as "application/sportsml+xml" or "application/nitf+xml".
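Since the MIME type identifies the payload, a consumer can select the SportsML or NITF content by that attribute alone. The Python sketch below returns the root elements carried in every matching <inlineXML> under the item's <contentSet>. It assumes un-namespaced markup as in the examples, and the inline_payloads function is hypothetical.

```python
import xml.etree.ElementTree as ET


def inline_payloads(news_item: ET.Element, mime_type: str) -> list:
    """Return the child elements of every <inlineXML> under the item's
    <contentSet> whose contenttype equals mime_type."""
    payloads = []
    content_set = news_item.find("contentSet")
    if content_set is None:
        return payloads
    for inline in content_set.findall("inlineXML"):
        if inline.get("contenttype") == mime_type:
            payloads.extend(inline)  # iterating an Element yields its children
    return payloads
```

For example, inline_payloads(item, "application/sportsml+xml") returns the <sports-content> element of the first example above.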

4. QCodes

QCode stands for "qualified code" and is a compact way of identifying a concept such as a league, team, event or player. A QCode contains a scheme alias and a code value separated by a colon. Here is an example:

    <subject qcode="team:l.mlb.com-t.19">
        <name>Philadelphia Phillies</name>
    </subject>

In the example above, "team" is the alias for a scheme that contains the code, or identifier, for the Philadelphia Phillies (l.mlb.com-t.19). The scheme's identifier is a URI listed in the SportsML catalog file, which is referenced by a <catalogRef> element at the top of the news item:

    <catalogRef href="http://sportsml.org/std/catalog/catalog.IPTC-SportsCodesCatalog_1.xml"/>

This file contains a list of scheme URIs and their aliases:

  <catalog>
	  <scheme alias="team" uri="http://cv.sportsml.org/sportscodes/team/" />
	  <scheme alias="spcpnat" uri="http://cv.sportsml.org/sportscodes/sportscpnature/" />
	  <scheme alias="conf" uri="http://cv.sportsml.org/sportscodes/conference/" />
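Resolving a QCode therefore takes two steps. First, look up the alias in the catalog. Second, append the code to the scheme URI, which is how G2 forms the full URI of a concept. The Python sketch below shows both steps. It assumes a catalog shaped like the excerpt above, without XML namespaces. The load_catalog and qcode_to_uri names are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_catalog(xml_text: str) -> dict:
    """Map scheme aliases to scheme URIs from a catalog's <scheme> elements."""
    root = ET.fromstring(xml_text)
    return {s.get("alias"): s.get("uri") for s in root.iter("scheme")}


def qcode_to_uri(qcode: str, catalog: dict) -> str:
    """Expand a QCode by replacing its alias with the scheme URI."""
    alias, sep, code = qcode.partition(":")
    if not sep:
        raise ValueError(f"not a QCode: {qcode!r}")
    if alias not in catalog:
        raise ValueError(f"unknown scheme alias: {alias!r}")
    return catalog[alias] + code
```

With the catalog above, qcode_to_uri("team:l.mlb.com-t.19", catalog) expands to http://cv.sportsml.org/sportscodes/team/l.mlb.com-t.19.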

The "team" scheme has the URI http://cv.sportsml.org/sportscodes/team/, which identifies a "knowledge item", another G2 item class. Knowledge items list any number of concepts (such as baseball teams), along with their identifiers and other concept properties. Here is an excerpt from the knowledge item for "team":

<knowledgeItem>
  <conceptSet>
    <concept>
      <conceptId type="subj:team" qcode="team:l.mlb.com-t.19"/>
      <type qcode="spcpnat:organization"/>
      <name>Philadelphia Phillies</name>
      <broader qcode="div:l.mlb.com-d.nleast">
        <name xml:lang="en-US">NL East</name>
        <broader qcode="conf:l.mlb.com-c.national">
          <name xml:lang="en-US">National</name>
        </broader>
      </broader>
      <sameAs qcode="fssID:Phi"/>
      <sameAs qcode="stID:PHILADELPH"/>
      <sameAs qcode="tsnID:008"/>
      <sameAs qcode="tsnID:022"/>
      <sameAs qcode="dbestID:PHI"/>
    </concept>
    <concept>
      <conceptId type="subj:team" qcode="team:l.mlb.com-t.17"/>
      <type qcode="spcpnat:organization"/>
      <name>Washington Nationals</name>
      ...

The first concept asserts that l.mlb.com-t.19 is the code for the Philadelphia Phillies. Using the "sameAs" element, it also states that several other codes from other content providers, each with its own alias, refer to the same team.
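The sameAs assertions let a consumer normalize codes from different providers to one identifier. The Python sketch below builds a lookup table from a knowledge item like the excerpt above, assuming un-namespaced markup. It maps every sameAs QCode, and the concept's own QCode, to the conceptId QCode. Only sameAs elements that are direct children of a concept are indexed, so the QCodes inside broader are ignored. The same_as_index function is hypothetical.

```python
import xml.etree.ElementTree as ET


def same_as_index(knowledge_item: ET.Element) -> dict:
    """Map each concept's own qcode, and every qcode in its direct <sameAs>
    children, to the qcode given in that concept's <conceptId>."""
    index = {}
    for concept in knowledge_item.iter("concept"):
        concept_id = concept.find("conceptId")
        if concept_id is None:
            continue
        canonical = concept_id.get("qcode")
        index[canonical] = canonical
        for alt in concept.findall("sameAs"):
            index[alt.get("qcode")] = canonical
    return index
```

With the excerpt above, the index resolves both "tsnID:022" and "fssID:Phi" to "team:l.mlb.com-t.19". It does not contain "div:l.mlb.com-d.nleast", which appears only under broader.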

Anyone can create their own controlled vocabulary and reference it through QCodes, in much the same way namespaces are referenced in XML. A provider should make its schemes publicly available as knowledge items.

4.1 QCodes inside SportsML

QCodes apply to all controlled code sets and vocabularies. A proper SportsML implementation requires every key to use the QCode format. As a result, SportsML attributes such as league-key, event-key, team-key and player-key must contain QCodes in order to comply with the G2 standard. Here are some examples of what this looks like:

<event-metadata event-key="event:l.mlb.com-2007-e.19358" date-coverage-type="event" event-status="pre-event" 
start-date-time="20070528T190500-0400">

	...

<team-metadata team-key="team:l.mlb.com-t.26">
    <name first="Arizona" last="Diamondbacks" />
</team-metadata>

	...

<player-metadata player-key="player:l.mlb.com-p.456">
    <name full="Doug Davis" />
</player-metadata>
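Upgrading existing SportsML 1 documents mostly means prefixing each key with its scheme alias. The Python sketch below does this in place. Its KEY_ALIASES table is hypothetical. The event, team and player aliases follow the examples above. The "league" alias for league-key is taken from the contentMeta example in section 2. The sketch treats any value that already contains a colon as a QCode and leaves it alone. That assumption holds for keys like those shown here but may not hold for every legacy vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute -> scheme alias table, following the examples above.
KEY_ALIASES = {
    "league-key": "league",
    "event-key": "event",
    "team-key": "team",
    "player-key": "player",
}


def qualify_keys(root: ET.Element) -> int:
    """Prefix legacy key attributes with their scheme alias, in place.
    Values that already contain a colon are assumed to be QCodes and are
    left unchanged. Returns the number of attributes rewritten."""
    changed = 0
    for elem in root.iter():
        for attr, alias in KEY_ALIASES.items():
            value = elem.get(attr)
            if value is not None and ":" not in value:
                elem.set(attr, f"{alias}:{value}")
                changed += 1
    return changed
```

Because already-qualified values are skipped, running qualify_keys a second time on the same tree changes nothing.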

4.2 QCodes and Data Interoperability

QCodes were designed with "semantic web" technologies such as RDF, RDFa and OWL in mind. These W3C standards offer uniform ways to expose your data on the web in a machine-readable format. A full description of the Semantic Web and its technologies is beyond the scope of this document; see section 6, Further Reading and Tools, for more information.

5. IPTC G2-Standards Naming Format

G2 provides a flexible and culturally neutral system for marking up names. In SportsML 1, the name element has explicit attributes such as "first" and "last". For example:

<player-metadata player-key="player:l.mlb.com-p.456">
	<name full="Doug Davis" first="Doug" last="Davis"/>
</player-metadata>

G2 instead provides a single, repeatable "name" property qualified by two attributes, role and part. For example:

<player-metadata player-key="player:l.mlb.com-p.456">
	<name role="nrol:full">Doug Davis</name>
	<name part="nprt:given">Doug</name>
	<name part="nprt:family">Davis</name>
</player-metadata>

"nrol" and "nprt" are controlled vocabularies which can be expanded as needed. Role can have values like "short", "full", "alternate", "adjectival", "sort", "display", etc. Part can have values like "given", "family", "middle", "salutation", "acadTitle", etc.
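Converting the deprecated attribute form is a tree rewrite: each classic name element becomes several G2 name elements. The Python sketch below covers only the three attributes shown in the example above. NAME_MAP and convert_names are hypothetical, and any other attributes on the old element are dropped. Extend the table before using this on real data. Name elements that carry none of the mapped attributes are treated as already converted and left alone.

```python
import xml.etree.ElementTree as ET

# Mapping from the example above: SportsML 1 name attribute ->
# (G2 qualifying attribute, controlled value). Attributes not listed
# here are not carried over by this sketch.
NAME_MAP = [
    ("full", "role", "nrol:full"),
    ("first", "part", "nprt:given"),
    ("last", "part", "nprt:family"),
]


def convert_names(parent: ET.Element) -> None:
    """Replace each attribute-style <name/> child of parent, in place,
    with one G2-style <name> element per mapped attribute."""
    for old in parent.findall("name"):
        if not any(old.get(attr) is not None for attr, _, _ in NAME_MAP):
            continue  # already G2-style, or nothing to convert
        position = list(parent).index(old)
        parent.remove(old)
        for attr, g2_attr, g2_value in NAME_MAP:
            value = old.get(attr)
            if value is None:
                continue
            new = ET.Element("name", {g2_attr: g2_value})
            new.text = value
            parent.insert(position, new)
            position += 1
```

Applied to the SportsML 1 player-metadata example, convert_names produces the three name elements of the G2 example, in the same order.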

Note: SportsML 2.0 still supports the "classic" name attributes of SportsML 1. However, these are now deprecated and will be removed in the next major version of the specification.

6. Further Reading and Tools