Photo credit: Jill Laurinaitis
By Stuart Myles
Chairman of the Board of Directors, IPTC
IPTC holds face-to-face meetings in several locations throughout the year, although most of the detailed work of the IPTC is now conducted via teleconferences and email discussions. Our Annual General Meeting for 2017 was held in Barcelona in November. As well as being the time for formal votes and elections, the AGM is a chance for the IPTC to look back over the last year and to look ahead to what is in store. What follows is a slightly edited version of my remarks at IPTC’s AGM 2017 in Barcelona.
IPTC has had a good year – the 52nd year for the organization!
We’ve updated our veteran standards, Photo Metadata – our most widely-used standard – and NewsML-G2 – our most comprehensive XML standard, marking its 10th year of development.
We’re continuing to work in partnership with other organizations, to maximize the reach and benefits of our work for the news and media industry. In coordination with CEPIC we organized the 10th annual Photo Metadata Conference, looking to the future of auto tagging and search, examining advanced AI techniques – and considering both their benefits and their drawbacks for publishers. With the W3C we have crafted the ODRL rights standard and are launching plans to create RightsML as the official profile of the ODRL standard, endorsed by both the IPTC and W3C.
We’ve also tackled problems that matter to the media industry with technology solutions which are founded on standards, but go beyond them. The Video Metadata Hub is a comprehensive solution for video metadata management that allows exchange of metadata over multiple existing standards. The EXTRA engine is a Google DNI sponsored project to create an open source rules based classification engine for news.
We’ve had some changes in the make-up of IPTC. Johan Lindgren of TT joined the Board. Bill Kasdorf has taken over as the PR Chair. And we were thrilled to add Adobe as a voting member of IPTC, after many years of working together on photo metadata standards. Of course, with more mixed emotions, we have also learnt that Michael Steidl, the IPTC Managing Director for 15 years, will retire next summer. As has been clear throughout this meeting and, indeed, every day between the meetings on numerous emails and phone calls, Michael is the backbone of the work of the IPTC. Once again, I ask you to join me in acknowledging the amazing contributions and dedication that Michael displays towards the IPTC.
Later today, we will discuss in detail our plans to recruit a successor for the crucial role of the Managing Director. And this is not the only challenge that the IPTC faces. We describe ourselves as “the global standards body of the news media” and that “we provide the technical foundation for the news ecosystem”. As such, just as the wider news industry is facing a challenging business and technical environment, so is the IPTC.
During this meeting, we’ve talked about some of the technical challenges – including the continuing evolution of file formats and supporting technologies, whilst many of us are still working to adopt the technologies from 5 or 10 years ago. We’ve also talked about the erosion of trust in media organizations and whether a combination of editorial and technical solutions can help.
But I thought I would focus on a particular shift in the business and technical environment for news that may well have a bigger impact than all of those. That shift can be traced back to 2014 which, by coincidence, is when I became Chairman of the IPTC. Last week, Andre Staltz published an interesting and detailed article called “The Web Began Dying in 2014, Here’s How”. If you haven’t read it, I recommend it. The article makes a number of interesting points and backs them up with numerous charts and statistics. I will not attempt to summarize the whole thing, but a few key points are worth highlighting.
Staltz points out that, prior to 2014, Google and Facebook accounted for less than 50% of all of the traffic to news publisher websites. Now those two companies alone account for over 75% of referral traffic. Also, through various acquisitions, Google and Facebook properties now share the top ten websites with news publishers – in the USA 6 of the 10 most popular websites are media properties. In Brazil it is also 6 out of 10. In the UK it is 5 out of 10. The rest all belong to Facebook and Google.
Both Facebook and Google reorganized themselves in 2014, to better focus on their core strengths. In 2014, Facebook bought WhatsApp and terminated its search relationship with Bing, effectively relinquishing search to Google and doubling down on social. Also in 2014, Google bought DeepMind and shut down Orkut, its most successful social product. This, along with the reorganization into Alphabet, meant that Google relinquished social to Facebook, allowing it to focus on search and – even more – artificial intelligence. Thus, each company seems happy to dominate its own massive part of the web.
But … does that matter to media companies? Well, Facebook said if you want optimal performance on our website, you must adopt Instant Articles. Meanwhile, Google requires publishers to use its Accelerated Mobile Pages or “AMP” format for better performance on mobile devices. And, worldwide, Internet traffic is shifting from the desktop to mobile devices.
Then, if you add in Amazon, Apple and Microsoft, it is clear that another huge shift is going on. All of the Frightful Five are turning away from the Web as a source of growth and instead turning to building brand loyalty via high end devices. Following the successful strategy of Apple, they are all becoming hardware manufacturers with walled gardens. Already we have Siri, Cortana, Alexa and Google Home. But also think about the investments going on by these companies in AR and VR as ways to dominate social interactions, e-commerce and machine learning over the Internet.
So, just as news companies must confront these shifts in the global business and technology environment, so must the IPTC. During this meeting, we’ve talked about our initial efforts to grapple with metadata for AR, VR and 360 degree imagery. We’ve also discussed techniques which are relevant to news taxonomy and classification, including machine learning and artificial intelligence. At the same time, Facebook, Google and others are not totally in control, as they – along with Twitter – have found themselves having to explain the spread of disinformation on their platforms and are under increased government scrutiny, particularly in the EU. So, all of us, whether we describe ourselves as news publishers or not, are dealing with a rapidly changing and turbulent information, technical and business environment.
What does this mean for IPTC? IPTC is a news technology standards organization. But it is also unique in that we are composed of news companies from around the world. We know from the membership survey that both of these factors – influence over technical solutions and access to technology peers from competitors, partners, diverse organizations large and small – are very important to current members. In order to prosper as an organization, IPTC needs to preserve these unique benefits to members, but also scale them up. This means that we need to find ways to open up the organization in ways that preserve the value of the IPTC and fit with the mission, but also in ways that are more flexible. We need to continue to move beyond saying that the only thing we work on is standards and instead use standards as a component of the technical solutions we develop, as we are doing with EXTRA and the Video Metadata Hub. We need to work with diverse groups focused on solving specific business and journalistic problems – such as trust in the media – and in helping news companies learn the best ways to work with emerging technologies, whether it is voice assistants, artificial intelligence or virtual reality.
I’m confident that – working together – we can continue to reshape the IPTC to better meet the needs of the membership and to move us further forward in support of solving the business and editorial needs of the news and media industry. I look forward to working with all of you on addressing the challenges in 2018 and beyond.
Stuart Myles is the Director of Information Management at Associated Press.
An updated version 2.26 of NewsML-G2 is available as a Developer Release
- XML Schemas and the corresponding documentation are updated
- the Structure Matrix Excel sheet is updated
Packages of version 2.26 files can be downloaded:
- All XML Schemas plus Structure Matrix (about 60MB) from https://www.iptc.org/std/NewsML-G2/NewsML-G2_2.26.zip
- The same without XML Schema documentation in HTML (about 1MB) from https://www.iptc.org/std/NewsML-G2/NewsML-G2_2.26-noXMLdocu.zip
- New: in the newsml-g2 repository on GitHub: https://github.com/iptc/newsml-g2
All changes of version 2.26 can be found on that page: http://dev.iptc.org/G2-Approved-Changes
Reminder of an important decision taken for version 2.25 and applying to version 2.26 too: the Core Conformance Level will not be developed any further, as all recent Change Requests were in fact aiming at features of the Power Conformance Level; changes to the Core Level were only a side effect.
The Core Conformance Level specifications of version 2.24 will stay available and valid; find them at http://dev.iptc.org/G2-Standards#CCLspecs
By Johan Lindgren
The Sports Content Working Group of IPTC started in the early 2000s, initially to develop the XML standard SportsML. But the group has evolved to handle many aspects of reporting sports in the news.
The initial big question for news organisations handling sports is to decide if it should be handled as text or as data. The sports articles have, obviously, more in common with articles about other subjects. It is the results, schedules, statistics and standings that provide the dilemma. You can choose to provide the results ready for display on screen or on paper. Or you can provide the results as detailed marked up data and let the receiver handle the formatting, depending on purpose.
In fact, by using both NewsML-G2 and SportsML from IPTC you can provide both variants in parallel, if you wish. In a NewsML-G2 news item as wrapper you provide one rendition of the content with the results as data in SportsML markup, and in another rendition you provide the same results, but in a displayable format like HTML5.
Vocabularies and Media Topics
Another big issue in handling sports data is knowing all the terms, what they mean and how they are used. The people in the sports group have spent a lot of time on this and provide very extensive vocabularies. Some are found in the Media Topics, maintained by the NewsCodes Working Group of IPTC. The same is true for the new addition to this, called facets. Facets refine the semantics of a Media Topic.
Example: If you try to combine Nordic skiing, female, relay, freestyle, 4×5 km as constituting one combined Media Topic and think of all the variations resulting from alternates to those terms, and then expand that thought to all sports events, the number of Media Topics will be overwhelming. Instead, IPTC chose to minimize the number of Media Topics and instead create a system of facets that qualify these broader topics. So, for example, “male” and “female” can apply to many, many sport competition topics, eliminating the need to create separate Media Topic terms for all of them.
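The arithmetic behind that choice can be sketched in a few lines of Python. The facet dimensions and values below are hypothetical, chosen only to illustrate the effect, and are not taken from the actual IPTC facet vocabularies.

```python
from math import prod

# Hypothetical facet dimensions for one sport; illustrative values only,
# not the actual IPTC facet vocabularies.
facets = {
    "gender": ["male", "female", "mixed"],
    "format": ["individual", "relay", "team sprint"],
    "technique": ["classic", "freestyle"],
    "distance": ["4x5 km", "10 km", "15 km", "30 km"],
}

# Approach 1: one pre-combined Media Topic per combination of facet values.
combined_topics = prod(len(values) for values in facets.values())

# Approach 2: one broad Media Topic plus one reusable term per facet value.
faceted_terms = 1 + sum(len(values) for values in facets.values())

print(combined_topics, faceted_terms)
```

With these made-up numbers, a single sport would need 72 pre-combined topics but only 13 terms when facets are used, and the facet terms can be reused across other sports as well.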
Apart from the topics and their facets there is a huge number of metadata property values maintained by the sports group. These values are listed in 113 vocabularies (they can be downloaded): 37 of them are used for the core of SportsML and the other 76 are used for sport-specific additions. In total there are 1,850 values defined and listed as concepts in 113 knowledge items. The list of metadata values and their explanations is fundamental know-how in sports reporting. You can have names and definitions in several languages.
Example of a code saying the player started the game on the field:
<concept>
<conceptId qcode="spplayerstatus:starter" />
<name xml:lang="en-US">starter</name><name xml:lang="en-GB">starter</name>
<definition xml:lang="en-GB">A member of the lineup that enters the field at the commencement of play.</definition></concept>
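As a minimal sketch of how a receiver might read such a concept, the Python below parses the example with the standard library. It assumes straight quotes and a single <concept> wrapper element, and it leaves out the namespace declarations that a real knowledge item would normally carry; the function name read_concept is hypothetical.

```python
import xml.etree.ElementTree as ET

# The concept from the example above, with straight quotes and an explicit
# <concept> wrapper. Namespace declarations are omitted here for brevity
# (an assumption; a real knowledge item would normally declare them).
CONCEPT_XML = """
<concept>
  <conceptId qcode="spplayerstatus:starter" />
  <name xml:lang="en-US">starter</name>
  <name xml:lang="en-GB">starter</name>
  <definition xml:lang="en-GB">A member of the lineup that enters the field at the commencement of play.</definition>
</concept>
"""

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_concept(xml_text):
    """Return the scheme alias, code and per-language names of a concept."""
    concept = ET.fromstring(xml_text)
    qcode = concept.find("conceptId").get("qcode")
    # A QCode has the form "scheme alias:code"; split on the first colon only.
    scheme, code = qcode.split(":", 1)
    names = {n.get(XML_LANG): n.text for n in concept.findall("name")}
    return {"scheme": scheme, "code": code, "names": names}


concept = read_concept(CONCEPT_XML)
print(concept)
```

The scheme alias (here spplayerstatus) identifies the vocabulary, and the code (starter) identifies the value within it, so the same lookup works for any of the vocabularies.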
SportsML is used by news organisations around the world both for everyday sport reporting and big events. BBC, for example, built their handling of the Olympic results in London around SportsML. It is also used by organisers of so-called fantasy sports leagues. Even by just using the core you can handle most normal news reporting of all sports events and competitions. There are also plugins for eleven sports, when you want to handle very in-depth data of these sports. And more plugins can be added. There are also ways to extend the standard with your own values or constructs. When developing SportsML the aim has always been to handle things in the core if the things are applicable to more than one sport. But some things are very specific to one sport and will instead be placed in its own schema which is imported and linked in proper places.
To illustrate this we can use this snippet from a soccer game:
<team-stats score="0" score-opposing="2" event-outcome="speventoutcome:loss">
The first line is general with the score and outcome. But the two other lines are soccer-specific with a line-formation and the number of corner-kicks this team shot in this game.
SportsML for JSON
Up until now SportsML has mainly been serialized using XML. But with increasing interest in JSON the sports group is working on also providing a schema of SportsML for JSON usage. The work is close to being ready for the first public release. Some details of the schema need to be finalized and then the Working Group will provide samples and some tools. We’re hoping to have this ready to release by early 2018.
The release of 3.0 of SportsML in XML also provided some tools (see our GitHub repository), mainly to transform between the earlier version, 2.2, and 3.0. One of the big developments in 3.0 was the possibility to handle statistics either in generic structures or in specific structures. So there are tools to transform between the two variants. To show this we can compare the above soccer example with the similar generic sample:
<stat stat-type="spsocstat:line-formation" value="433"/>
<stat class="spct:offense" stat-type="spsocstat:corner-kicks" value="2"/>
As you can see, the attribute names become type-values in the generic stat-construction.
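The mapping from attribute names to type-values can be sketched as follows. This is only an illustration of the idea, not one of the IPTC transformation tools: the helper to_generic_stats, its parameters, and the dictionary that stands in for the sport-specific attributes are all hypothetical.

```python
import xml.etree.ElementTree as ET


def to_generic_stats(specific_attrs, scheme="spsocstat", stat_classes=None):
    """Turn sport-specific attributes into generic <stat> elements.

    Each attribute name becomes the code part of a stat-type QCode and the
    attribute value becomes the value. `stat_classes` optionally maps an
    attribute name to a class QCode. Hypothetical helper, not part of the
    IPTC tools.
    """
    stat_classes = stat_classes or {}
    stats = []
    for name, value in specific_attrs.items():
        stat = ET.Element("stat")
        if name in stat_classes:
            stat.set("class", stat_classes[name])
        stat.set("stat-type", f"{scheme}:{name}")
        stat.set("value", str(value))
        stats.append(stat)
    return stats


# Values from the soccer example; the dictionary stands in for the
# attributes of the sport-specific elements.
soccer_attrs = {"line-formation": "433", "corner-kicks": "2"}
generic = to_generic_stats(
    soccer_attrs, stat_classes={"corner-kicks": "spct:offense"}
)
lines = [ET.tostring(stat, encoding="unicode") for stat in generic]
for line in lines:
    print(line)
```

Keeping the vocabulary alias as a parameter means the same function could serve other sports, assuming their statistics follow the same attribute-to-QCode pattern.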
The work in the Sports Content Group is completely done by volunteers. The members of the group work in the news business and contribute to the group as much as their work allows. We welcome all interested persons, e.g. by joining our public discussion forum. The more people who can contribute the better, and there seems to be a never-ending flow of interesting topics when you start talking about sports data.
Johan Lindgren is the Chair of the Sports Content Working Group and a developer at TT Nyhetsbyrån, Sweden.