Day 2 of the IPTC Autumn Meeting 2019 was just as busy as Day 1: we heard from the IPTC NewsCodes Working Group, the AI Expert Group, and the News Architecture Working Group including updates on IPTC’s work on trust and credibility projects. We also had updates from the Video Metadata Working Group, an update on IPTC’s Rights work, and news from the Sports Content Working Group. Phew!
Jennifer Parrucci from the New York Times, lead of the IPTC NewsCodes Working Group, introduced IPTC NewsCodes and discussed recent progress, including cleaning up large parts of the Media Topics vocabulary. The Working Group also announced new language translations coming very soon: Portuguese and Brazilian Portuguese are ready, Chinese is almost ready, and some other language versions are in progress.
We also had an interesting and productive discussion about the workflow and process around Media Topics translations. As the team adds and retires terms and definitions, how should translations be managed? Should we not publish changes until we have translations in all languages? Or should there be a core of languages that require translations? Should we publish interim versions with un-synced changes and less frequent “stable” versions of Media Topics including all translations? We are having success using GitHub issues to manage regular changes to the taxonomy: can technology also help in managing the translation process and if so, which tools? Many ideas and thoughts were shared, including the perspectives of many member organisations who already work across multiple languages.
Tao Chen, VP of Machine Learning at 500px and lead of the new AI Expert Group, gave a great overview of the latest developments in AI affecting the media industry. From practical developments, like removing backgrounds from stock images, detecting copyright infringement and assessing the commercial potential of images, to the dangers of face swapping apps and a potential future of completely generated images that feature no real human beings, we learned a lot about how AI affects us today and tomorrow. We are building up the AI Expert Group to become the place where media technologists can go to learn the latest on AI and Machine Learning issues, apply the latest techniques in the media industry, and share ideas with their peers. If you’re a member and not yet involved, please talk to Tao or Brendan to get started.
Next up, Brendan Quinn spoke about IPTC’s recent work with the Journalism Trust Initiative and The Trust Project on mapping their “trust indicators” to IPTC standards (particularly NewsML-G2) so news providers can show how they comply with trust criteria. Look out for some announcements about this work in the next few weeks. Then Dave Compton of Refinitiv, lead of the News Architecture Working Group, gave an update on recent work on NewsML-G2, including the trust and credibility work, a NewsML-G2 2.28 errata release fixing some small typos, updates to the NewsML-G2 Guidelines and the NewsML-G2 Specification documents, work on making local extensions to Media Topics, and future work, including looking at how to represent auto-generated content and better alignment with ninjs (see Monday’s wrap-up post for more on our recent ninjs updates).
After lunch, Pam Fisher of The Media Institute at University College London spoke about her project to build a read/write API that maps metadata between various video formats. We will link to a demo as soon as it is available. Pam also discussed “compact video signatures”, part of MPEG-7, which are being used to create fingerprints of video content for infringement detection and content matching.
Pam’s talk was very relevant to the next presentation, by Michael Steidl, lead of the Video Metadata Working Group, who gave an update on recent progress. The Working Group has been looking at new video APIs and understanding how IPTC members and others are using video metadata in their work, either with or without IPTC Video Metadata Hub.
In the afternoon Michael Steidl presented again with an update on his work with W3C’s ODRL group, which has an impact on RightsML. Johan Lindgren presented in lieu of Paul Kelly, new lead of the Sports Content Working Group, giving an update on the Working Group’s efforts to interview IPTC members and others about their use of sports data and to position SportsML and our work on SportsJS in the context of the news and media industry.
Finally we bade farewell to Stuart Myles, outgoing Chair of IPTC. We presented Stuart with a small token of our thanks: he has chaired the Board of Directors of IPTC since 2014, and has been involved with IPTC as a delegate since 1999! We will definitely miss his contributions, intelligence, common sense and enthusiasm, and we hope to see him involved with IPTC again in the future in some way.
We are now back after a stimulating and entertaining IPTC Autumn Meeting in beautiful Ljubljana, Slovenia last week!
Thanks very much to Aljoša Rehar from IPTC member Slovenska tiskovna agencija (STA) for inviting us and helping out so much with the organisation, along with his colleague Marjana Polajnar and with support from Marko Grobelnik from another IPTC member organisation in Slovenia, the Jožef Stefan Institute.
Over three days, we heard presentations from all IPTC Working Groups and the new AI Expert Group, and held the 2019 IPTC Annual General Meeting. We also heard presentations from invited startups and research projects such as the Content Personalisation Network from Digital Catapult UK and Slovenian projects EventRegistry, NewsMapper, Embeddia, Finspektor and more. Look out for our detailed post about Wednesday afternoon’s session for more about our invited speakers.
On Monday, Brendan Quinn, Managing Director of IPTC, gave an introduction to the event and all attendees introduced themselves. We had a great turnout with members coming from all over Europe, Asia and both coasts of the US. Brendan also gave an update on recent work of the IPTC Board and some decisions that are coming soon.
Monday’s focus was on both Photo Metadata and JSON standards. We heard from Michael Steidl, lead of the Photo Metadata Working Group, who gave an update on the recent work of the group, including the Photo Metadata Conference 2019 in Paris, recent work with Google, our latest Photo Metadata Survey, and exciting new work on introducing an Image Region capability to the IPTC Photo Metadata Standard. This will let photographers and image creators annotate specific areas of an image with any metadata fields, for example naming each person in an image exactly; identifying products, brands, logos, barcodes or other objects in an image; identifying composite images correctly; and allowing AI annotations to be embedded in the image file rather than distributed in a separate file alongside the image.
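To make the Image Region idea a little more concrete, here is a minimal Python sketch of how a region-plus-metadata annotation could be modelled. It is only an illustration: the class name `ImageRegion`, its field names and the choice of fractional coordinates are assumptions made for this example and are not taken from the IPTC Photo Metadata Standard.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ImageRegion:
    """Hypothetical rectangular region with its own metadata fields.

    Coordinates are fractions of the image size (0.0 to 1.0), an assumption
    chosen so that the region survives resizing of the image.
    """
    x: float        # left edge, as a fraction of image width
    y: float        # top edge, as a fraction of image height
    width: float    # region width, as a fraction of image width
    height: float   # region height, as a fraction of image height
    metadata: Dict[str, str] = field(default_factory=dict)

    def to_pixels(self, image_width: int, image_height: int) -> Tuple[int, int, int, int]:
        """Return (left, top, right, bottom) in pixels for a given image size."""
        left = round(self.x * image_width)
        top = round(self.y * image_height)
        right = round((self.x + self.width) * image_width)
        bottom = round((self.y + self.height) * image_height)
        return left, top, right, bottom


# Illustrative values only: one region naming a person, one marking a logo.
person = ImageRegion(0.25, 0.1, 0.5, 0.4, {"person_shown": "Example Person"})
logo = ImageRegion(0.0, 0.9, 0.1, 0.1, {"object": "logo", "source": "AI annotation"})
```

Storing the metadata with each region, rather than once per image, is what would allow a person, a product and a logo in the same picture to be described separately.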
Johan Lindgren, who has recently moved from leading the Sports Content Working Group to leading the News in JSON Working Group, spoke about the recent work on reviving the group. We are now meeting every two weeks like the other working groups, and plan to make many changes to our main JSON standard, ninjs, in the coming months. Johan presented an overview of how IPTC members are currently using JSON in their news distribution work, either based on ninjs or using their own formats. Based on change requests received in our GitHub project, the working group identified some “quick wins” that we could easily add to ninjs, and so Johan proposed ninjs 1.2 to the Standards Committee. Johan also showed recent work on a new ninjs User Guide to replace the pages at dev.iptc.org, and on a test suite so we know that changes we make to the ninjs schema will be compatible with previous work and not introduce any errors.
The day’s sessions closed with a presentation of the Content Personalisation Network project from Luca and Anthony from Digital Catapult in the UK. The work on tailoring content for users based on metadata is very relevant to our members and we hope to be able to work a lot more with the Digital Catapult team in the future.
Day 1 ended with a group dinner in a restaurant at Ljubljana Castle, overlooking the beautiful Old Town. After a day sitting inside it was great to have some good exercise walking up the steep hill to get there, and we were rewarded with some great local food.
Stay tuned for more updates from the Ljubljana meeting. If you couldn’t make it to Ljubljana, why not attend our next event in Tallinn, Estonia in May 2020?
Last week’s 2019 IPTC Photo Metadata Conference was again hosted in association with the CEPIC Congress. This year’s conference was held in a slightly rainy Paris but at least that meant that we didn’t mind staying indoors in late May.
The event kicked off with an introduction from event chair Stéphane Guérillot from AFP, who is also on the Board of IPTC and Chair of the IPTC Standards Committee. The theme of the afternoon was “putting IPTC metadata to work for your image collections” and the emphasis on practical outcomes was a constant refrain.
The first panel was around the question of “do we still need IPTC Photo Metadata?” Michael Steidl, lead of the IPTC Photo Metadata Working Group, started off by presenting results from the IPTC Photo Metadata surveys that the Working Group undertook earlier this year. Lúí Smyth from Shutterstock showed how metadata has helped them to organise millions of photos from thousands of sources. Isabelle Wirth, photo editor at AFP, discussed how the agency uses IPTC Photo Metadata along with other IPTC standards such as NewsCodes and NewsML-G2 to make content searchable and shareable for their clients. And independent photographer and 3D photogrammetry expert with Deep3D, Simon Brown, explained how metadata was crucial for creating 3D views of sunken shipwrecks via tens of thousands of still photographs and some innovative software. In Simon’s words: “For more than one 3D project, projects with multiple contributors, or projects conducted over a longer period of time, IPTC entry becomes mandatory.”
The next session examined how creating and editing IPTC Photo Metadata could be improved. Sarah Saunders, representing CEPIC, presented results from the IPTC Photo Metadata surveys of both image suppliers and software makers, showing that metadata usage has grown in sophistication but still varies greatly between independent photographers and large companies. Andrew Wiard, photographer and member of the British Press Photographers’ Association, spoke with passion about how we could improve the handling of photo metadata once it leaves the photographer’s desk. This is a constant goal of the Photo Metadata Working Group and will form part of our work plan for the rest of 2019. Mayank Sagar from Image Data Systems showed some exciting tools, with videos showing how their AI algorithms can detect objects ranging from commuters’ luggage and handbags to brands and logos on advertisements in sports footage, and talked about the current limits of AI classification and future issues such as how to handle artificially synthesised images. Andreas Gnutzmann of popular photo management software Fotoware showed how their system is moving to the cloud, putting metadata at its core even more than previously.
The third session looked at the end-user side and how the industry can benefit from photo metadata. Brendan Quinn of IPTC presented the Photo Metadata Crawler project, examining how news publishers around the world are embedding photo metadata in the images used on their sites. Michael Steidl showed results of the Photo Metadata Working Group’s updated analysis of social media systems and sharing platforms, which will be shared through an IPTC news article in the coming months. And Anna Dickson of Google gave us an overview of her history working with images as photo editor at Huffington Post and Dow Jones among others, and discussed how Google are working with metadata and the IPTC, including our shared challenge of encouraging more site owners to publish embedded metadata so that it can be picked up by Google Search and other services. At the event, Google also announced some very interesting features that are currently in the pipeline.
Michael Steidl and Stéphane Guérillot closed out the event by talking about the work that the IPTC Photo Metadata Working Group would be undertaking this year as a result of the discussions and of the survey results.
All slides from the day are available in PDF format from the event page, both to IPTC members and non-members.
Key findings from the Photo Metadata surveys will be shared in future news posts, so please watch this space for updates.
More information about the Google presentation and their proposed new features around image metadata is available to all IPTC members who have joined the Photo Metadata Working Group.
Thanks to all the speakers, to CEPIC for their assistance in hosting the conference, and to everyone who attended for making the event such a success!
We’re excited that the biggest week in the photo metadata calendar has arrived – the IPTC Photo Metadata Conference 2019 will be held in Paris this Thursday, 6 June.
We are looking forward to hearing from some IPTC members: Andreas Gnutzmann from Fotoware, Lúí Smyth from Shutterstock, Isabelle Wirth of Agence France Presse and Michael Steidl, Chair of the Photo Metadata Working Group and honourable member of IPTC. Stéphane Guérillot, CEO of AFP Blue, will be chairing the event.
We will also be hearing from independent photographer Andrew Wiard, representing the British Press Photographers’ Association (BPPA), plus Anna Dickson, Visual Lead, Image Search at Google, who brings her expertise as one of Google’s experts on images along with a history of leading photography teams at Dow Jones and Huffington Post. Mayank Sagar from Image Data Systems will be speaking about the latest developments in automatic image tagging, and Simon Brown of Deep3D will look at the photographer’s view around embedding metadata.
Michael Steidl and Sarah Saunders will be presenting the results of the 2019 Photo Metadata Survey, where we have obtained the views of image creators, publishers and software makers regarding embedded image metadata.
Brendan Quinn, Managing Director of IPTC, will be presenting the IPTC Photo Metadata Crawler, which looks at usage of embedded photo metadata among news publishers.
We’re looking forward to analysing the world of photo metadata from the perspective of image creators and editors, software makers, publishers, search engines and end users.
There are still some tickets available, so please join us! Attendance is free for CEPIC Congress attendees, but if you just want to come for the IPTC event on Thursday afternoon you can register using this form for €100 + VAT.
See you there!
Day 3 of the Lisbon meeting was all about metadata and controlled vocabularies, rights, and a look to the future of IPTC’s work plan.
We started with Jennifer Parrucci, Senior Taxonomist at New York Times and lead of the IPTC NewsCodes Working Group, who gave an update on the group’s activities over the past six months. We have been focussing on updating our core subject taxonomy Media Topics, including updates to term labels and definitions, and also integrating and updating mappings to Wikidata entities that were kindly provided by Thad Guidry from the schema.org community.
Integrating Wikidata mappings was an interesting challenge as we didn’t always have good mappings, for example for “arts, culture, entertainment and media” there is no Wikidata entity that is broad enough to encompass all of those terms. But for the leaves of our tree, most terms had mappings, and for those that didn’t we will be suggesting new terms in Wikidata to accommodate them. We will also look at updating the mappings from Wikidata back to Media Topics now that we have updated the mappings in the other direction. Brendan Quinn presented some new tools used for managing NewsCodes internally, plus a new web tree browser view of Media Topics which will be launched very soon.
Translating Media Topics is another hot issue, with a recent contribution from the Swedish media that is now available as a Swedish language version of Media Topics. We have made it easier to find the language translations in the NewsCodes browser, and have also added some new terms that were suggested by the Swedish media consortium that will be using the new Swedish translation of Media Topics as their categorisation system for sharing content in the future. We realise that nearly ten years after moving from SubjectCodes to Media Topics as the standard IPTC subject classification, we still don’t support as many languages in Media Topics as we do in SubjectCodes, so we want to make it as easy as possible to perform translations. Our discussion was based on the useful idea that anything with an existing translation in SubjectCodes can be taken directly into a Media Topics translation, and that we can use the SubjectCodes and Wikidata mappings to extract suggested terms to get a translation team started. We have interest in creating Media Topics translations in Portuguese (for both Portugal and Brazil) and Chinese. If you are interested in helping with translations, please let us know.
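The idea of seeding a translation from existing SubjectCodes labels can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the Working Group’s actual tooling: the identifiers, the labels and the function names `suggest_translations` and `untranslated` are all hypothetical, and a real implementation would read the mappings from the published NewsCodes files.

```python
from typing import Dict, List, Optional

# Hypothetical sample data. The IDs and labels are made up for illustration
# and are not real NewsCodes entries.
MEDIA_TOPIC_TO_SUBJECT_CODE: Dict[str, str] = {
    "mt:0001": "sc:1001",
    "mt:0002": "sc:1002",
    "mt:0003": "sc:1003",
}

SUBJECT_CODE_LABELS: Dict[str, Dict[str, str]] = {
    "sc:1001": {"en": "sport", "pt": "desporto"},
    "sc:1002": {"en": "politics"},  # no Portuguese label yet
}


def suggest_translations(
    mapping: Dict[str, str],
    labels: Dict[str, Dict[str, str]],
    lang: str,
) -> Dict[str, Optional[str]]:
    """For each Media Topic, reuse the mapped SubjectCode label if one exists."""
    suggestions: Dict[str, Optional[str]] = {}
    for topic_id, subject_code in mapping.items():
        suggestions[topic_id] = labels.get(subject_code, {}).get(lang)
    return suggestions


def untranslated(suggestions: Dict[str, Optional[str]]) -> List[str]:
    """List the Media Topics that a translation team still has to handle."""
    return sorted(topic for topic, label in suggestions.items() if label is None)


suggestions = suggest_translations(MEDIA_TOPIC_TO_SUBJECT_CODE, SUBJECT_CODE_LABELS, "pt")
todo = untranslated(suggestions)
```

Any label recovered this way is only a suggestion: a Media Topic may have been renamed or redefined since its SubjectCode equivalent was translated, so translators would still need to review each one.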
Johan Lindgren from TT in Sweden spoke about the project that led to the Swedish translations and also discussed how they are approaching handling entities (names and organisations). This led to a wider discussion led by Stuart Myles of how to handle lists of entities and whether IPTC should be working on a standard or a best practice document in that area. We also discussed the idea of a taxonomy for describing images in a stylistic way (such as “happy”, “blue”, or “outdoors”) as opposed to describing the content. Such a standardised controlled vocabulary could be useful to image libraries and AI classification engines. This is an area of active work for us and more information will be available in the coming months. If you want to help, talk to us!
Invited guest Carlos Amaral from local company Priberam demonstrated their text mining and visualisation system created in partnership with Deutsche Welle and other broadcasters for use in browsing stories according to subject, image, extracted entities and keywords.
Stéphane Guérillot from AFP presented his new API for retrieving news content, which led to more discussion of whether IPTC should be standardising an API that could be used by multiple news providers to share their content.
Michael Steidl spoke on RightsML and Blaise Galinier from BBC talked about their current project looking at viewing news content based on rights. Two key insights from Blaise’s talk: firstly, any demonstration of what is or isn’t usable is always based on the particular user and the context in which they want to use a piece of media. Secondly, it’s not enough to show a journalist what they can and can’t use; they need to know why a piece of content is “red”, “green” or “amber”.
Everyone had a great time at the 2019 Spring Meeting, and we’re already planning the next one in Ljubljana, Slovenia in October. Members: please save the dates 14 – 16 October 2019. If you’re not a member but you would like to present at the meeting, please get in touch!
Tuesday was our biggest day in terms of content and also in terms of people! We had 40 people in the meeting room, which was a tight squeeze. Thanks to everyone for your understanding!
The topic focus for Day 2 was Photo and Video, so it was natural that the day was kicked off by Michael Steidl, lead of the IPTC Photo and Video Working Groups. As we had a lot of new members and new attendees in the audience, Michael gave an overview of how IPTC Photo Metadata has come to where it is today, used by almost all photography providers and even used in Google Image Search results (see our post from last year on that subject). The Photo Metadata Working Group is currently conducting a survey of Photo Metadata usage across publishers, photo suppliers (such as stock photo agencies and news wires), and software makers. Michael gave a quick preview of some of the results, but we won’t spoil anything here: you will have to wait for the full results to be revealed at the 2019 IPTC Photo Metadata Conference in Paris this June. Brendan Quinn also presented a status report on the IPTC Photo Metadata Crawler, which examines usage of IPTC Photo Metadata fields at news providers around the world. This will also be revealed at the Photo Metadata Conference.
Next, invited visitors Ilkka Järstä and Marina Ekroos from Frameright presented their solution to the problem of cropping images for different outlets, for example all of the different sizes required for various social media. They store the crop regions as metadata embedded in the image file, which is of great interest to the Photo Metadata Working Group, as we are looking at various options for allowing metadata to describe not only an image as a whole but also a region within an image, in a standardised way.
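Both insights can be illustrated with a small Python sketch: the answer depends on the context of use, and the answer always comes with its reasons. This is not the BBC’s system and it does not use RightsML; the classes `UsageContext` and `Restriction`, the function `assess` and the sample restrictions are all hypothetical, chosen only to show the shape of the idea.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class UsageContext:
    """Where a journalist wants to use the item (hypothetical fields)."""
    platform: str
    territory: str


@dataclass(frozen=True)
class Restriction:
    """One rights restriction with a human-readable reason (hypothetical)."""
    reason: str
    platforms_blocked: FrozenSet[str] = frozenset()
    territories_blocked: FrozenSet[str] = frozenset()
    needs_clearance: bool = False


def assess(restrictions: List[Restriction], context: UsageContext) -> Tuple[str, List[str]]:
    """Return a traffic-light status plus the reasons behind it."""
    status = "green"
    reasons: List[str] = []
    for restriction in restrictions:
        blocked = (
            context.platform in restriction.platforms_blocked
            or context.territory in restriction.territories_blocked
        )
        if blocked:
            status = "red"
            reasons.append(restriction.reason)
        elif restriction.needs_clearance:
            if status != "red":  # red always outranks amber
                status = "amber"
            reasons.append(restriction.reason)
    return status, reasons


# Illustrative restrictions only.
restrictions = [
    Restriction("No social media use under the supplier contract",
                platforms_blocked=frozenset({"social"})),
    Restriction("Contains third-party footage: check with the rights desk",
                needs_clearance=True),
]
```

With these sample restrictions the same item is “amber” for web use and “red” for social media, and in both cases the journalist sees the reasons rather than just the colour.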
We had a workshop / discussion session on the recently ratified EU Copyright Directive which will impact all media companies in the next two years. Voted through by the European Parliament this month after intense lobbying from both sides, it could easily be bigger than GDPR, so it’s important for media outlets around the world. Discussion included how and whether IPTC standards could be used to help companies comply with the law. No doubt we will be hearing more about this in the future.
Michael then presented the Video Metadata Working Group’s status report, including promotional activities at conferences and investigations to see what use cases we can gather from various users of video metadata amongst our members and in the wider media industry.
Then Abdul Hakim from DPP showed a practical use of video metadata in the DPP Metadata for News Exchange initiative, which is based on NewsML-G2. He gave an end-to-end demonstration of metadata being carried through from shot planning through the production process all the way to distribution via Reuters Connect. See our blog post about the Metadata for News Exchange project for more details.
Then Andy Read from BBC presented the BBC’s “Data flow for News” project, taking the principles of metadata being carried through the newsroom along with the content, looking at how to track the cost of production of each item of content and also its “audience value” across platforms to calculate a return on investment figure for all types of content. Iain Smith showed the other side of this project via a live demonstration of the BBC’s newsroom audience measurement system.
After lunch, Gan Lu and Kitty Lan from new IPTC member Yuanben presented their approach to rights protection using blockchain technology. Yuanben run a blockchain-based image registry plus a scanner that detects copyright infringements on the web. Using blockchain as proof of existence has been around for a while but it’s great to see it being used in such a practical context, very relevant for the media industry.
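For readers unfamiliar with “proof of existence”, the following Python sketch shows the general principle using only the standard library: each registered image is recorded by its hash, and each record also stores the hash of the previous record so that later tampering is detectable. It is a toy model and makes no claim about how Yuanben’s registry or scanner works; the class `ToyRegistry` and its methods are invented for this example.

```python
import hashlib
import json
from typing import Dict, List


def fingerprint(image_bytes: bytes) -> str:
    """SHA-256 digest of the file contents, used as the image's identity."""
    return hashlib.sha256(image_bytes).hexdigest()


class ToyRegistry:
    """Append-only list of records, each linked to the previous one by hash."""

    def __init__(self) -> None:
        self.blocks: List[Dict[str, str]] = []

    @staticmethod
    def _block_hash(block: Dict[str, str]) -> str:
        payload = json.dumps(block, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def register(self, image_bytes: bytes, owner: str, timestamp: str) -> Dict[str, str]:
        """Record that `owner` held this exact file at `timestamp`."""
        previous = self._block_hash(self.blocks[-1]) if self.blocks else "0" * 64
        block = {
            "fingerprint": fingerprint(image_bytes),
            "owner": owner,
            "timestamp": timestamp,
            "previous": previous,
        }
        self.blocks.append(block)
        return block

    def is_registered(self, image_bytes: bytes) -> bool:
        """True if a byte-identical file has been registered."""
        digest = fingerprint(image_bytes)
        return any(block["fingerprint"] == digest for block in self.blocks)

    def chain_is_intact(self) -> bool:
        """Check that no earlier record has been altered after the fact."""
        for index in range(1, len(self.blocks)):
            if self.blocks[index]["previous"] != self._block_hash(self.blocks[index - 1]):
                return False
        return True
```

Note that a cryptographic hash only matches byte-identical copies. Detecting resized or re-encoded copies on the web would need some form of perceptual matching, which this sketch does not attempt.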
Lastly, another new member Shutterstock was represented by Lúí Smyth who gave us an overview of Shutterstock’s current projects relating to large-scale image management: they have over 260 million images, with over 1 million images added each week! Shutterstock are using the opportunity of refreshing their systems to re-align with IPTC standards and to learn what their suppliers, partners and distributors expect, and we look forward to helping them tackle shared challenges together.
Last week brought IPTC members together for our twice-yearly Face-to-Face Meeting to discuss news credibility, taxonomies and controlled vocabularies, updates in sports standards and much more!
This year’s IPTC Spring Meeting was in Lisbon, Portugal, and over 40 IPTC member delegates, member experts and invited guests gathered for three days to discuss all the latest developments in news and media technology.
On Monday, IPTC Chair and Director of Information Management for Associated Press Stuart Myles gave a great introduction and overview of what was to come in the meeting. After everyone introduced themselves, Stuart discussed some changes that the IPTC Board has been thinking about, including looking at updating the Mission and Vision of the organisation to reflect how we operate in 2019.
Then Robert Schmidt-Nia from dpa Deutsche Presse-Agentur introduced their C-POP project (in collaboration with STT and the Sanoma group in Finland) which follows on from the Performing Content we saw at the previous meeting in Toronto. It was interesting hearing about the agency’s shift in focus from a strict business-to-business model to a “B2B2C” model thinking about what consumers needed and how agencies could help publishers to deliver on the needs of readers and subscribers, ideally using feedback from publishers to agencies on how well their content is performing according to real metrics like loyalty and subscription revenue. IPTC will be involved in the C-POP project so you can expect to hear more about this in the future.
On the same topic, Andy Read from BBC gave an overview of the “Telescope” internal measurement tool, showing how BBC staff can view in real time how their content is being consumed by region, topic or device.
James Logan from the BBC and Brendan Quinn of IPTC gave an overview of IPTC’s work with news trust and credibility projects The Trust Project and the Journalism Trust Initiative. We decided at the Autumn 2018 Meeting that IPTC wouldn’t create its own standard around news credibility, disinformation and “fake news”, but that we would work with existing groups and help them to incorporate their standards in IPTC’s work. With The Trust Project, that has been going well, and we are almost ready to publish some best practices on implementing the Trust Project’s Trust Indicators in NewsML-G2 content. Trust Project indicators are already used in schema.org markup by over 120 news providers so it’s great to see such strong uptake.
Separately we have been working with Reporters Sans Frontières’ Journalism Trust Initiative, which is at an earlier stage and is looking at documenting general standards for trustworthy and ethical journalism. IPTC is part of the JTI’s Technical Task Force, which is working with the drafting teams on making their statements specific enough to be answered with data and indexed by machines. Hopefully it will end up with indicators similar to the Trust Project indicators.
With both news credibility projects, some questions still need to be addressed, such as assessing the credibility of claims (when a news organisation says they are trustworthy, how can you trust them?), and how these trust indicators work in a multi-provider workflow: if a news agency sends some content to a publisher who then merges it with original reportage, who determines the trust indicators that are attached to the final story? There is definitely a lot more work to do!
Joaquim Carreira from local agency Lusa showed us the “Combate Às Fake News” project focussing on media literacy and helping readers to know what to look for, including the idea of a “nutrition label” for news content looking at criteria such as factuality, readability and use of emotional language.
The day was rounded off with Johan Lindgren of Swedish agency TT presenting the recent work of IPTC’s Sports Content Working Group. The group has recently been tidying up the spec and incorporating suggestions for changes, plus looking at eSports and Chess as two non-traditional sports that are both seeing an increase in interest – in the case of eSports, it is becoming a huge industry. Our tests showed that in simple cases eSports results can be addressed with existing SportsML 3 structures, but to handle more detailed play-by-play results we may need to at least introduce a new controlled vocabulary. Please let us know if you would like to implement SportsML for eSports!
Johan also presented the draft of SportsML 3.1 to be voted on by the IPTC Standards Committee.
Stay tuned for an update on Days 2 and 3!
This report was presented by Stuart Myles, IPTC Chairman, at the IPTC Annual General Meeting in Toronto, Canada on October 17 2018.
IPTC has had a good year – the 53rd year for the organization!
We’ve updated key standards, including NewsML-G2, the Video Metadata Hub and the Media Topics, as well as launching RightsML 2.0, a significant upgrade in the way to express machine processable rights for news and media.
Of course, IPTC standards are a means, not an end. The value of the standards is the easier exchange, consumption and handling of news and media by organizations large and small around the world. So it is important that we continue to focus on making our standards straightforward to use and have them adopted as widely as possible. I think we are making progress on the usability front, such as moving away from zip’d PDFs towards actual HTML web pages for documentation of NewsML-G2. Over the last year, we’ve continued to work with other organizations – W3C, Europeana and MINDS – to develop standards, increase adoption – and, perhaps most importantly, to open up IPTC to other perspectives. And we have had a huge win in the recognition of key photo metadata by Google Images. But we clearly need to do more for both usability and adoption. During the course of this meeting, we’ve had some good discussion about what more we can do in both areas and I encourage all members to help spread the word about IPTC standards, and suggest ways we can accelerate adoption.
Of course, the nature of news and media continues to evolve. On the one hand, new forms of storytelling are emerging, such as Augmented Reality and Virtual Reality. Equally, the use of data to power stories continues to increase, in both data-driven stories and data-supported stories. By data-driven stories, I mean journalists reviewing large databases of information and creating stories based on the trends they find. By data-supported stories, I mean content creators using visually-interesting graphics to support their content. The automated production, curation and consumption of news and media is likely to increase for the foreseeable future, driven by both technological improvements and the seductive economics of replacing people with algorithms. And it is not only economics which are driving these changes and challenges, just as it is no longer fill-in-the-blank text stories being written by robot journalists. Synthetic media – such as “deep fakes” – are able to produce increasingly convincing photo, video and audio stories that are indistinguishable from “real” media. Inevitably, the existence and debunking of these fakes will be used to deny legitimate reporting, with the implications of continued erosion of trust in media. All of these trends – AR, VR, data-powered journalism and dealing with trust, credibility and misinformation – are topics which IPTC has discussed over the last few years, but we have not developed any tracks of work to try to address them. In part, this is because these are, by definition, outside of the areas that our member organizations traditionally deal in, and so are quite difficult to tackle in terms of establishing standards.
However, even within the context of standards, IPTC is opening up to new forms of experimentation. As we heard on Monday, the joint project between IPTC and MINDS, to allow for the identification of audience and interest metadata, has led to the introduction of structures within NewsML-G2 to support rapid prototyping and experimentation. I see this as a positive move, with great potential to accelerate the work we do and to help keep it lightweight and relevant.
Of course, IPTC has had significant changes of its own over the last year. We bid goodbye to Michael Steidl as our Managing Director of 15 years, and welcomed Brendan Quinn as our new Managing Director this summer. We’re grateful that we continue to benefit from Michael’s skills and experience, as he has remained the Chairman of the Photo and Video Working Groups. And I think that Brendan has made a great start in his new role in helping us keep the IPTC moving forward.
As part of the handover from Michael to Brendan, we decided to scan a lot of the old paper documents (link available to members only), including various types of IPTC newsletter, dating back to 1967, two years after the organization was founded. I thought I would look back to what IPTC was up to in the year 2000, the year I became a delegate to the IPTC, back when I worked for Dow Jones.
And there I am in the photo at the top of the page. Or, at least, the back of my head. Some things are quite reminiscent of this week’s meeting – the birth of NewsML, a focus on improved communications, cooperation with other organizations e.g. MPEG-7.
Then I thought I would look back on IPTC in 1968, the year I was born:
Some things were similar to today, including a focus on fine technical details such as Alphabet Number 5 and a plan to go to Lisbon next year for a meeting. However, most of the focus in those days was on lobbying against tariffs and satellite monopolies.
So I think it is fair to say that the IPTC has never been just a standards body. It is also, more broadly, a community of practice. We are a group of people from around the world who have a common interest in news and media technology. The process of sharing information and experiences with the group, through these face-to-face meetings and the online development of standards, means that the members of IPTC learn from each other, and so have an opportunity to develop professionally and personally. I hope you will agree that yesterday’s discussion of news search and classification was an excellent example of an exchange of experiences, both good and bad, which can help many of us avoid problems and seize opportunities, and so accelerate our work.
I think it is helpful for us to recognize that IPTC is a community which continues to evolve, as the interests, goals and membership of the organization change. I’m confident that – working together – we can continue to reshape the IPTC to better meet the needs of the membership and to move us further forward in support of solving the business and editorial needs of the news and media industry. I look forward to working with all of you on addressing the challenges in 2019 and beyond.
This is the report of Day 3 of the IPTC Autumn 2018 Meeting in Toronto. See the report from Day 1 and the report from Day 2. All the presentations are available to IPTC members in the IPTC Members Only Zone.
Day 3 of IPTC Autumn Meetings always includes the Annual General Meeting, where all Voting Members can have their say in the future of the organisation. This time new Managing Director Brendan Quinn gave his first MD’s report, alongside Stuart Myles’ Chairman’s Report (which will be posted to the IPTC blog soon). Materials from the AGM are available to members in the IPTC Members Only Zone.
Rounding out the discussions for the three days, we had some broad-ranging and future-facing conversations regarding News Credibility projects, where Stuart Myles took us on a tour of the wide range of projects and initiatives around misinformation, the credibility of news and news sources, and the perceived problems of “fake news.” IPTC or IPTC members are helping out several organisations in their efforts in this area, such as the W3C Credible Web community group and the Journalism Trust Initiative.
We also had a discussion on funding opportunities and potential IPTC projects, which is an internal discussion involving members only.
Lastly, speaking about the future, we had Michael Young from Civil Media speak to us about their plans to use blockchain technologies to power small newsrooms and fulfil their broad goal to “power sustainable journalism throughout the world.” A lot of focus has been on Civil’s Initial Coin Offering, which closed underfunded and will be returning investors’ money, but they have many other activities, including a suite of WordPress-based plugins allowing news providers to join the Civil ecosystem and pledge openness, fairness and transparency according to the Civil Foundation’s constitution. Mike explained how blockchain-based voting and decisions mean that members can be rewarded for pointing out breaches of the constitution, and bad actors can be punished or even removed from the network entirely.
The event ended with a few of us attending the Canadian Journalism Foundation’s event with journalism pundits Vivian Schiller, Jeff Jarvis, Jay Rosen and Matthew Ingram, talking about misinformation and misuse of social media (video recording available via the above link), and ten of us went on a networking and team bonding trip to Niagara Falls and to a local winery on the Thursday.
Overall it was a great Autumn Meeting which set the scene and built the foundation for many more great IPTC meetings to come!
This is the report of Day 2 of the IPTC Autumn 2018 Meeting in Toronto. See the report from Day 1 and the report from Day 3. All the presentations are available to IPTC members in the IPTC Members Only Zone.
Day 2 of the IPTC Autumn 2018 Meeting in Toronto was a deep dive into search and classification. Many of our members are working hard to make their content accessible quickly and easily to their customers, and user expectations are higher than ever, so search is a key part of what they do.
First up we had Diego Ceccarelli from Bloomberg talking through their search architecture. Users of Bloomberg terminals have very high expectations that they will see stories straight away: the system handles 16m queries and 2m new stories and news items per day, with requirements for a median query response time of less than 200ms and for new items to be available in search results in less than 100ms. And as Diego says, “with huge flexibility comes huge complexity.” For example, because customers expect to see the freshest content straight away, the system has no caching at all!
To achieve this, the Bloomberg team use Apache Solr – in fact they have 3 members of staff dedicated to working on Solr full-time, and have contributed a huge amount of code back to the project, including their machine-learning-based “learning to rank” module which can be trained to rank a set of search results in a nuanced way. Bloomberg also worked with an agency to develop open source code used to monitor a stream of incoming stories against queries, used for alerting. Other topics Diego raised included clustering of search results, balancing relevance and timeliness, crowdsourcing data to train ranking systems, combining permissions into search results, and more – a great talk!
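The alerting idea mentioned above, checking each incoming story against a set of stored queries, can be illustrated with a minimal sketch. This is not Bloomberg's code: the `StoredQuery` class, the `match_alerts` function and the "all terms must appear" matching rule are assumptions made purely for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    """A saved alert (hypothetical): every term in `all_of` must appear."""
    name: str
    all_of: frozenset[str]


def tokenize(text: str) -> set[str]:
    # Lowercase the text and keep runs of letters and digits as tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_alerts(story: str, queries: list[StoredQuery]) -> list[str]:
    """Return the names of all stored queries that the story satisfies."""
    tokens = tokenize(story)
    return [q.name for q in queries if q.all_of <= tokens]


queries = [
    StoredQuery("oil-prices", frozenset({"oil", "prices"})),
    StoredQuery("fed-rates", frozenset({"fed", "rates"})),
]

incoming = [
    "Oil prices rise as supply tightens",
    "Fed holds rates steady",
    "Local team wins final",
]

# The usual search direction is reversed: the queries are stored and
# each new document is run against them as it arrives.
for story in incoming:
    print(story, "->", match_alerts(story, queries))
```

A production system would index the stored queries so that each story is only tested against plausible candidates, rather than looping over every query as this sketch does.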
Our heads already reeling with all the information we learned from Bloomberg, we then heard from another search legend, Boerge Svingen, one of the founders of FAST Search in Norway and now Director of Engineering at the New York Times. He spoke about how NYT re-architected their search platform to be based around Apache Kafka, a “distributed log streaming” platform that keeps a record of every article ever published on the Times (since 1851!) and can replay all of them to feed a new search node in around half an hour. The platform is so successful that it is used to feed the “headless CMS” (see yesterday’s report) based on GraphQL which is used to render pages on nytimes.com for all types of devices. Boerge and his team use Protocol Buffers as their schema to keep everything light and fast. More information in Boerge’s slide deck, available to IPTC members.
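The replay idea can be sketched in a few lines. The example below is an in-memory stand-in, not Kafka and not the NYT implementation: `ArticleLog`, `SearchNode` and the article fields are hypothetical names chosen for illustration. It shows how a search node can be rebuilt from scratch simply by reading an append-only log from the first record.

```python
from __future__ import annotations

from collections import defaultdict


class ArticleLog:
    """Append-only log (hypothetical): records are added, never modified."""

    def __init__(self) -> None:
        self._records: list[dict] = []

    def append(self, article: dict) -> int:
        self._records.append(article)
        return len(self._records) - 1  # offset of the new record

    def replay(self, from_offset: int = 0):
        # Yield (offset, record) pairs in the order they were published.
        for offset in range(from_offset, len(self._records)):
            yield offset, self._records[offset]


class SearchNode:
    """A toy inverted index built purely by consuming the log."""

    def __init__(self) -> None:
        self.index: dict[str, set[str]] = defaultdict(set)
        self.next_offset = 0

    def catch_up(self, log: ArticleLog) -> None:
        # Resume from the last offset seen; a fresh node starts at 0.
        for offset, article in log.replay(self.next_offset):
            for word in article["headline"].lower().split():
                self.index[word].add(article["id"])
            self.next_offset = offset + 1

    def search(self, word: str) -> set[str]:
        return self.index.get(word.lower(), set())


log = ArticleLog()
log.append({"id": "a1", "headline": "Election results announced"})
log.append({"id": "a2", "headline": "Results of the marathon"})

node = SearchNode()
node.catch_up(log)

log.append({"id": "a3", "headline": "Budget results published"})
node.catch_up(log)  # only the new record is read

# A brand-new node reaches the same state by replaying everything.
fresh = SearchNode()
fresh.catch_up(log)
print(sorted(fresh.search("results")))
```

Because the log is the single source of truth, the existing node and the freshly replayed node end up with the same index, which is what makes it cheap to add or replace search nodes.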
Next up was Chad Schorr talking about search at Associated Press, discussing their Elastic implementation on Amazon Web Services. Using a devops approach based on “immutable infrastructure” meant that the architecture is now very solid and well-tested. Chad was very open and spoke about issues and problems AP had while they were implementing the project and we had a great discussion about how other organisations can avoid the same problems.
Then Robert Schmidt-Nia from DPA talked about their implementation of a content repository (in effect another “headless CMS”!) based on serialising NewsML-G2 into JSON using a serverless architecture based on AWS Lambda functions, AWS S3 for storage, SQS queues and Elasticsearch. Robert told of how the entire project was built in three months with one and a half developers, and ended up with only 500 lines of code! It can now be used to provide services to DPA customers that could not be provided before, including subsets of content based on metadata such as all Olympics content.
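To give a feel for what serialising NewsML-G2 into JSON involves, here is a minimal sketch using only the Python standard library. It is not DPA's code: the sample item is a pared-down fragment rather than a schema-valid NewsML-G2 document, the qcodes are made up, and the JSON field names (`guid`, `headline`, `subjects`) are assumptions chosen for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by NewsML-G2 elements.
NAR = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

# Pared-down sample with hypothetical qcodes; a real item carries much
# more (itemMeta, catalog references, content sets and so on).
SAMPLE = """\
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
          guid="urn:example:item:1" version="1">
  <contentMeta>
    <headline>Opening ceremony begins</headline>
    <subject qcode="example:olympics"/>
    <subject qcode="example:sport"/>
  </contentMeta>
</newsItem>
"""


def newsitem_to_dict(xml_text: str) -> dict:
    """Flatten a few NewsML-G2 fields into a JSON-friendly dict."""
    root = ET.fromstring(xml_text)
    headline = root.find("nar:contentMeta/nar:headline", NAR)
    subjects = root.findall("nar:contentMeta/nar:subject", NAR)
    return {
        "guid": root.get("guid"),
        "version": int(root.get("version", "1")),
        "headline": headline.text if headline is not None else None,
        "subjects": [s.get("qcode") for s in subjects],
    }


def has_subject(doc: dict, qcode: str) -> bool:
    # Metadata-based subsetting, e.g. "everything tagged Olympics".
    return qcode in doc["subjects"]


doc = newsitem_to_dict(SAMPLE)
print(json.dumps(doc, indent=2))
print(has_subject(doc, "example:olympics"))
```

Once items are flat JSON documents like this, selecting a subset by metadata becomes a simple filter or a search-engine query on the `subjects` field.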
Next, Solveig Vikene and Roger Bystrøm from Norway’s news agency NTB spoke about and gave a live demo of their new image archive search product. They demonstrated how photographers can pre-enter metadata so that they can send their photos to the wire a few seconds after taking them on the camera. Some functions like global metadata search and replace and a feature-rich query builder made their system look very impressive.
Veronika Zielinska from Associated Press spoke about AP’s rule-based text classification systems, showing the complexity of auto-tagging content (down to disambiguating between two US Republican Congressmen both called Mike Rogers!) and the subtlety of AP’s terms (distinguishing between “violent crime” events and the social issue of “domestic violence”), and therefore the necessity of manually creating and maintaining a rules-based system.
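A rule-based tagger of the kind described above can be sketched as follows. The rules here are invented for illustration and are not AP's: the `Rule` structure, the phrase lists and the idea of an exclusion list (`none_of`) are all assumptions, and real rules would be far more numerous and nuanced.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    tag: str
    any_of: tuple[str, ...]        # at least one phrase must appear
    none_of: tuple[str, ...] = ()  # none of these phrases may appear


def contains(text: str, phrase: str) -> bool:
    # Whole-word, case-insensitive phrase match.
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def classify(text: str, rules: list[Rule]) -> list[str]:
    """Return the tags of every rule that fires on the text."""
    tags = []
    for rule in rules:
        hit = any(contains(text, p) for p in rule.any_of)
        blocked = any(contains(text, p) for p in rule.none_of)
        if hit and not blocked:
            tags.append(rule.tag)
    return tags


# Hypothetical rules: the exclusion keeps a story about the social
# issue from also being tagged as a crime event.
RULES = [
    Rule("violent crime",
         any_of=("assault", "armed robbery", "shooting"),
         none_of=("domestic violence",)),
    Rule("domestic violence",
         any_of=("domestic violence", "domestic abuse")),
]

print(classify("Police report an armed robbery downtown", RULES))
print(classify("Charity publishes study on domestic violence and assault rates", RULES))
```

Even this toy version shows why such systems need ongoing manual maintenance: each new edge case tends to require another phrase or another exclusion.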
Stuart Myles then took us on a tour through AP’s automated image classification activities, looking at whether commercial tools are yet up to the task of classifying news content, the value of assembling good training sets but the difficulties in doing so, and the benefits of starting with a relatively small taxonomy that is easier for machine learning systems to understand.
Dave Compton talked us through Thomson Reuters Knowledge Items used by the OpenCalais classifier and how they use the PermID system to unify concepts across their databases of people, organisations, financial instruments and much more. Dave described how Knowledge Items are represented as NewsML-G2 Knowledge Items, and are mapped to Media Topics where possible.
On that subject, Jennifer Parrucci of the New York Times, chair of the IPTC NewsCodes Working Group, gave an update on the latest activities of the group, including the ongoing Media Topic definitions review, adding new Media Topic terms after suggestions by the Swedish media industry, and work with the schema.org team on mapping between schema.org and Media Topics terms.
As you can see, it was a very busy day!