Day 3 of the Lisbon meeting was all about metadata and controlled vocabularies, rights, and a look to the future of IPTC’s work plan.
We started with Jennifer Parrucci, Senior Taxonomist at The New York Times and lead of the IPTC NewsCodes Working Group, who gave an update on the group’s activities over the past six months. We have been focussing on updating our core subject taxonomy Media Topics, including updates to term labels and definitions, and also integrating and updating mappings to Wikidata entities that were kindly provided by Thad Guidry from the schema.org community.
Integrating Wikidata mappings was an interesting challenge as we didn’t always have good mappings. For example, for “arts, culture, entertainment and media” there is no Wikidata entity that is broad enough to encompass all of those terms. But for the leaves of our tree, most terms had mappings, and for those that didn’t we will be suggesting new terms in Wikidata to accommodate them. We will also look at updating the mappings from Wikidata back to Media Topics now that we have updated the mappings in the other direction. Brendan Quinn presented some new tools used for managing NewsCodes internally, plus a new web tree browser view of Media Topics which will be launched very soon.
Translating Media Topics is another hot issue, with a recent contribution from the Swedish media that is now available as a Swedish language version of Media Topics. We have made it easier to find the language translations in the NewsCodes browser, and have also added some new terms suggested by the Swedish media consortium, which will be using the new Swedish translation of Media Topics as its categorisation system for sharing content in the future. We realise that nearly ten years after moving from SubjectCodes to Media Topics as the standard IPTC subject classification, we still don’t support as many languages in Media Topics as we do in SubjectCodes, so we want to make it as easy as possible to perform translations. Our discussion was based on the useful idea that anything with an existing translation in SubjectCodes can be taken directly into a Media Topics translation, and we can use the SubjectCodes and Wikidata mappings to extract suggested terms to get a translation team started. We have interest in creating Media Topics translations in Portuguese (for both Portugal and Brazil) and Chinese. If you are interested in helping with translations, please let us know.
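To make the translation idea concrete, here is a minimal sketch of how suggested terms could be pulled together for a translation team. It is only an illustration under our own assumptions: the identifiers, labels, dictionary names and the `suggest_translation` function are all hypothetical placeholders, not real NewsCodes data or an IPTC tool.

```python
from typing import Optional, Tuple

# Hypothetical sample data: the IDs and labels below are invented for
# illustration and are not taken from the real NewsCodes or Wikidata.
MEDIATOPIC_TO_SUBJECTCODE = {"medtop:A": "subj:1", "medtop:B": None, "medtop:C": None}
MEDIATOPIC_TO_WIKIDATA = {"medtop:B": "wd:X"}
SUBJECTCODE_LABELS = {"subj:1": {"pt": "economia"}}
WIKIDATA_LABELS = {"wd:X": {"pt": "desporto"}}


def suggest_translation(topic_id: str, lang: str) -> Optional[Tuple[str, str]]:
    """Return (suggested label, source) for a Media Topic, or None.

    An existing SubjectCodes translation is preferred; a label from the
    mapped Wikidata entity is used as a fallback suggestion.
    """
    subject_code = MEDIATOPIC_TO_SUBJECTCODE.get(topic_id)
    if subject_code is not None:
        label = SUBJECTCODE_LABELS.get(subject_code, {}).get(lang)
        if label is not None:
            return label, "subjectcode"

    entity = MEDIATOPIC_TO_WIKIDATA.get(topic_id)
    if entity is not None:
        label = WIKIDATA_LABELS.get(entity, {}).get(lang)
        if label is not None:
            return label, "wikidata"

    # No suggestion available: the translation team starts from scratch.
    return None


for topic in ("medtop:A", "medtop:B", "medtop:C"):
    print(topic, suggest_translation(topic, "pt"))
```

Keeping the source next to each suggestion lets translators treat a SubjectCodes label as an existing translation and a Wikidata label as a starting point that still needs review.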
Johan Lindgren from TT in Sweden spoke about the project that led to the Swedish translations and also discussed how they are approaching handling entities (names and organisations). This prompted a wider discussion, led by Stuart Myles, of how to handle lists of entities and whether IPTC should be working on a standard or a best practice document in that area. We also discussed the idea of a taxonomy for describing images in a stylistic way (such as “happy”, “blue”, or “outdoors”) as opposed to describing the content. Such a standardised controlled vocabulary could be useful to image libraries and AI classification engines. This is an area of active work for us and more information will be available in the coming months. If you want to help, talk to us!
Invited guest Carlos Amaral from local company Priberam demonstrated their text mining and visualisation system created in partnership with Deutsche Welle and other broadcasters for use in browsing stories according to subject, image, extracted entities and keywords.
Stéphane Guérillot from AFP presented his new API for retrieving news content, which led to more discussion of whether IPTC should be standardising an API that could be used by multiple news providers to share their content.
Michael Steidl spoke on RightsML and Blaise Galinier from BBC talked about their current project looking at viewing news content based on rights. Two key insights from Blaise’s talk: first, any demonstration of what is or isn’t usable is always based on the particular user and the context in which they want to use a piece of media. Second, it’s not enough to show a journalist what they can and can’t use; they need to know why a piece of content is “red”, “green” or “amber”.
Everyone had a great time at the 2019 Spring Meeting, and we’re already planning the next one in Ljubljana, Slovenia in October. Members: please save the dates 14 – 16 October 2019. If you’re not a member but you would like to present at the meeting, please get in touch!
Tuesday was our biggest day in terms of content and also in terms of people! We had 40 people in the meeting room, which was a tight squeeze. Thanks to everyone for your understanding!
The topic focus for Day 2 was Photo and Video, so it was natural that the day was kicked off by Michael Steidl, lead of the IPTC Photo and Video Working Groups. As we had a lot of new members and new attendees in the audience, Michael gave an overview of how IPTC Photo Metadata has come to where it is today, used by almost all photography providers and even used in Google Image Search results (see our post from last year on that subject). The Photo Metadata Working Group is currently conducting a survey of Photo Metadata usage across publishers, photo suppliers (such as stock photo agencies and news wires), and software makers. Michael gave a quick preview of some of the results but we won’t spoil anything here: you will have to wait for the full results to be revealed at the 2019 IPTC Photo Metadata Conference in Paris this June. Brendan Quinn also presented a status report on the IPTC Photo Metadata Crawler, which examines usage of IPTC Photo Metadata fields at news providers around the world. Its findings will also be revealed at the Photo Metadata Conference.
Next, invited visitors Ilkka Järstä and Marina Ekroos from Frameright presented their solution to the problem of cropping images for different outlets, for example all of the different sizes required for various social media. They store the crop regions as embedded metadata, which is of great interest to the Photo Metadata Working Group, as we are looking at various options for allowing metadata to describe not only an image as a whole but also a region within an image, in a standardised way.
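As a rough illustration of what region-based metadata might look like, the sketch below stores a crop as fractions of the image size and converts it to pixels for a given rendition. This is our own assumption for the sake of example: the `CropRegion` class, its field names and `to_pixel_box` are hypothetical and do not describe Frameright’s format or any IPTC standard.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CropRegion:
    """Hypothetical crop region, expressed as fractions (0.0 to 1.0) of the image."""
    name: str      # e.g. the outlet or aspect ratio the crop is meant for
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def to_pixel_box(region: CropRegion, image_width: int, image_height: int) -> Tuple[int, int, int, int]:
    """Convert a relative region into a (left, top, right, bottom) pixel box."""
    left = round(region.x * image_width)
    top = round(region.y * image_height)
    right = round((region.x + region.width) * image_width)
    bottom = round((region.y + region.height) * image_height)
    return left, top, right, bottom


# A centred square crop of a 4000 x 2000 pixel landscape image.
square = CropRegion(name="square", x=0.25, y=0.0, width=0.5, height=1.0)
print(to_pixel_box(square, 4000, 2000))
```

Relative coordinates are used here so that the same region would still apply after the image is resized; a real standard might well make a different choice.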
We had a workshop / discussion session on the recently ratified EU Copyright Directive, which will impact all media companies in the next two years. Voted through by the European Parliament this month after intense lobbying from both sides, it could easily have a bigger impact than GDPR, so it’s important for media outlets around the world. Discussion included how and whether IPTC standards could be used to help companies comply with the law. No doubt we will be hearing more about this in the future.
Michael then presented the Video Metadata Working Group’s status report, including promotional activities at conferences and investigations to see what use cases we can gather from various users of video metadata amongst our members and in the wider media industry.
Then Abdul Hakim from DPP showed a practical use of video metadata in the DPP Metadata for News Exchange initiative, which is based on NewsML-G2. The end-to-end demonstration showed metadata being carried through from shot planning through the production process all the way to distribution via Reuters Connect. See our blog post about the Metadata for News Exchange project for more details.
Then Andy Read from BBC presented the BBC’s “Data flow for News” project, taking the principles of metadata being carried through the newsroom along with the content, looking at how to track the cost of production of each item of content and also its “audience value” across platforms to calculate a return on investment figure for all types of content. Iain Smith showed the other side of this project via a live demonstration of the BBC’s newsroom audience measurement system.
After lunch, Gan Lu and Kitty Lan from new IPTC member Yuanben presented their approach to rights protection using blockchain technology. Yuanben run a blockchain-based image registry plus a scanner that detects copyright infringements on the web. Using blockchain as proof of existence has been around for a while but it’s great to see it being used in such a practical context, very relevant for the media industry.
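For readers new to the idea, proof of existence boils down to recording a fingerprint of a file together with who registered it and when. The sketch below is a simplified illustration, not Yuanben’s system: the `ImageRegistry` class is hypothetical, and an in-memory dictionary stands in for the blockchain ledger.

```python
import hashlib
from typing import Dict, Optional, Tuple


class ImageRegistry:
    """Hypothetical registry: a dict stands in for an append-only ledger."""

    def __init__(self) -> None:
        self._records: Dict[str, Tuple[str, int]] = {}

    def register(self, image_bytes: bytes, owner: str, timestamp: int) -> str:
        """Record the SHA-256 fingerprint of an image; the first claim wins."""
        digest = hashlib.sha256(image_bytes).hexdigest()
        # setdefault keeps the earliest record, mimicking an immutable ledger.
        self._records.setdefault(digest, (owner, timestamp))
        return digest

    def lookup(self, image_bytes: bytes) -> Optional[Tuple[str, int]]:
        """Return (owner, timestamp) if this exact file was registered."""
        digest = hashlib.sha256(image_bytes).hexdigest()
        return self._records.get(digest)


registry = ImageRegistry()
registry.register(b"example image bytes", owner="agency", timestamp=1)
registry.register(b"example image bytes", owner="someone else", timestamp=2)
print(registry.lookup(b"example image bytes"))
```

Note that an exact hash like this only matches byte-identical copies, so finding resized or re-encoded images on the web would need a different technique than the one sketched here.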
Lastly, another new member Shutterstock was represented by Lúí Smyth who gave us an overview of Shutterstock’s current projects relating to large-scale image management: they have over 260 million images, with over 1 million images added each week! Shutterstock are using the opportunity of refreshing their systems to re-align with IPTC standards and to learn what their suppliers, partners and distributors expect, and we look forward to helping them tackle shared challenges together.