We have made it to the end of 2020. And what a year it has been!
The news and media industry has perhaps been affected less than the travel or hospitality industry, but 2020 was still a hugely eventful year for us all professionally and personally. Congratulations on getting through it, and our thoughts go out to those who have suffered in any way this year.
Of course, our member meetings, planned for Tallinn, Estonia and New York, USA this year, quickly became virtual events held via Zoom. This worked surprisingly well, and even allowed us to bring on some speakers and guests who wouldn’t have been able to attend or present if we had held the events physically.
The IPTC Photo Metadata Conference was very interesting this year: from our usual small room hosted as part of the CEPIC Congress, we went to a virtual event with over 200 attendees. If you missed it, or want to re-visit, videos of the sessions are available on YouTube.
The News in JSON Working Group submitted ninjs 1.3 for approval at the Spring Meeting. The new version added fields for trust indicators and genres, and support for different types of headlines and alternative IDs. The ninjs generator, showing how easy it is to create a ninjs document by filling in a web form, was very popular and was the inspiration for some related tools in other working groups. Since then, the working group has been looking at more features to be included in future versions of ninjs. If you handle news in JSON in any way and you haven’t completed our News in JSON survey, please do it now!
The NewsML-G2 Working Group released NewsML-G2 2.29 in July, which added some fields required for the trust and credibility project, along with a new NewsML-G2 Generator tool based on the ninjs one. The group also participated in the trust and credibility projects described below. The NewsML-G2 specifications and guidelines documents have now been updated to version 2.29.
The Video Metadata Working Group released Video Metadata Hub 1.3 during the summer, which added fields to track the editing of metadata (as opposed to editing the actual video) and a parent video identifier, and updated the mappings to EBUCore and EIDR. The group is hard at work on promoting Video Metadata Hub and creating more introductory materials to help new users understand VMHub and why it is useful.
The NewsCodes Working Group published three updates this year, in March, June and August, and a new update will be published very soon. The NewsCodes Guidelines document was released this year, and is already proving useful both for those wishing to learn how to use NewsCodes better and for the Working Group to establish clear guidelines about when and how to add new terms. MediaTopics is now available in 11 languages and we have more translations coming!
The Photo Metadata Working Group has been very busy, with the biggest news of the year being that Google now supports IPTC Photo Metadata to display licensor information in search results, including a link back to the image owner’s “licence this image” page. The feature was launched in beta in February and launched fully in August. We have had great take-up so far, and the interest in the Photo Metadata Conference (with over 200 people registered) showed that the industry was very keen to hear about it. We also launched updates to the GetPMD tool to support new schema.org mappings, and browser plugins for Chrome and Firefox to enable easy viewing of embedded IPTC Photo Metadata in photographs on the web.
The Sports Content Working Group has had its collective head down in 2020, re-thinking the data model for sports results, statistics and performances. We have been taking a semantic view, looking at using RDF as the main data model for sports data which can then be serialised into JSON, XML and other formats. The intention is that this will also bring the model closer to schema.org in the future. We have some RDF and semantic web experts on the group who are helping with the modelling, and are taking a use-case based approach to make sure that we’re designing something that’s both useful and usable.
A discussion group “spun out” from the NewsCodes Working Group to consider Named Entities for News. So far we have had a couple of meetings to discuss our thoughts on maintaining vocabularies for named entities such as people, companies and places, and to study different approaches used by IPTC member organisations and non-members.
An ongoing project that spans several working groups is the work on Trust and Credibility. After publishing a draft guidelines document in April and running a webinar in September, we plan to publish a 1.0 version in the new year.
All of our Working Groups are always looking for new participants, so if you’re interested in any of these areas, please consider joining IPTC and taking part in a working group!
IPTC appearances at conferences and in the media
There weren’t many conferences in the first part of the year as everyone adjusted to working remotely, but in the second half of the year IPTC people made quite a few appearances at other conferences and webinars.
In July, Brendan Quinn and Robert Schmidt-Nia spoke about NewsML-G2 at an Arab States Broadcasting Union metadata workshop. In September, Michael Steidl spoke on a panel with Google and Alamy at the Perpignan photojournalism conference about Google’s “Licensable Images” feature, and Brendan Quinn hosted a webinar about our work in trust and credibility.
In October, Pam Fisher and Mark Milstein spoke about Video Metadata Hub at the DMLA conference. In November, Brendan Quinn was invited to give a keynote at the FIBEP World Media Intelligence Congress, speaking to the media monitoring / media intelligence industry who also use quite a few IPTC standards.
Also in November, Bill Kasdorf published a column in Publishers Weekly about Media Topics and IPTC Photo Metadata, which raised a lot of interest in the publishing industry. In December, Michael Steidl was invited to present a webinar to IPTC member BVPA about IPTC Photo Metadata.
- We announced the IPTC Startup Membership category in September, and our first Startup Member to join is IMATAG.
- DATAGROUP Consulting Services joined as a Voting Member.
- New Associate Members are CBC / Radio Canada, iMatrics, and DeFodi Images.
- New Individual Members are Margaret Warren and Alison Sullivan.
We’re very happy to have them all on board and joining in the IPTC community!
Some sad news
It was with great shock that we learned in early November that longstanding member Andy Read of the BBC had passed away. He was a key contributor in many areas and his friendliness and enthusiasm will be hugely missed. Rest in peace, friend.
It seems that we have come through the worst 2020 could throw at us and things are looking up for 2021. We are already thinking about next year’s events and how we can learn from 2020 to improve things for members and friends.
Best wishes for the holiday season from all of us at IPTC.
PS: If you have any questions or thoughts about how IPTC could help you, or if you are interested in talking about joining IPTC, please contact Managing Director Brendan Quinn at firstname.lastname@example.org.
A clear majority of professional photo businesses in Europe and North America find IPTC photo metadata highly relevant to their business. That is the message received by IPTC from its 2019 photo industry supplier survey.
According to survey results, eight out of ten photo supplier companies say that data describing images and supporting searches by users is most relevant. Eight out of ten photographers say that metadata to express ownership and usage rights is most important.
These trends are shown by a survey among photo professionals conducted by IPTC, the maker of the industry standard for embedding descriptive, rights and administrative metadata into images. The 2019 IPTC Photo Metadata Survey results were made public on 14 August 2019 and can be downloaded from the iptc.org website.
“We know that taking the time to apply photo metadata is an investment by photo businesses, so it’s good to see that they get a return,” said Michael Steidl, lead of IPTC’s Photo Metadata Working Group. “Still, we are pleasantly surprised by the importance that photo businesses give to metadata.”
The survey investigated how and why IPTC photo metadata are used in 2019, and more than 100 supplier companies and photographers from many European countries and the USA participated. Most respondents to the supplier survey are companies active in the stock images business, but IPTC also received responses from companies dealing with news photos, cultural heritage images and video footage. The primary business areas of photographers are stock images and public relations photos.
The main reason supplier companies apply descriptions of what is depicted in an image is their own business needs, primarily to help users or customers find the image they are looking for. Businesses apply rights and licensing data primarily because of legal requirements, but also to protect their companies’ revenue streams. Administrative data are added to satisfy customer needs.
For photographers, rights are of critical importance
The use of rights data by photographers is driven more by their own business needs than by legal requirements. As photographers are the first party in the supply chain of images, they have a strong interest in asserting who is the creator and the first copyright owner of each creative work. Applying descriptions of the image is driven by both customer needs and the business needs of photographers. Administrative data is also applied mainly for their own business needs, and much less for customer needs than is the case for supplier companies.
IPTC photo metadata – used since 1995
The IPTC photo metadata standard originated in 1995 when Adobe and other makers of image software adopted the IPTC Information Interchange Model (IIM) standard for the panels with fields describing what an image shows, providing the name of the photographer, stating copyright and usage terms, and sharing instructions and more administrative information. In 2005 IPTC published its first Photo Metadata Standard covering fields used by photo professionals and expressed by the IIM format and the then-new XMP format. The IPTC fields were substantially extended in 2008 and since then the standard has been continuously maintained by IPTC, the global standards body of the news media.
For more information, download the full analysis of supplier survey results as a PDF.
Recently conversations on Twitter and various blogs and news sites have reported on Facebook’s use of IPTC embedded photo metadata fields to “track users”. (Reddit.com: “Facebook is embedding tracking data inside the photos you download”, The Australian: “Facebook pics tracking you”, Forbes: “Facebook Embeds ‘Hidden Codes’ To Track Who Sees And Shares Your Photos”, Financial Express: “Beware! Facebook embeds tracking data inside photos you download”).
As the creators and maintainers of the IPTC Photo Metadata Standard, we want to clarify a few points and share our own analysis of the situation.
In Spring 2019, IPTC’s Photo Metadata Working Group conducted its latest round of tests regarding how various social media platforms deal with metadata embedded in uploaded and shared images. The 2019 test results show how Facebook treats image metadata: in the IIM and Exif formats, a few fields related to claiming rights are retained while all others are removed, and in the XMP format all fields are removed.
While this was a small improvement compared to the previous IPTC test in 2016 when all Exif fields were removed, we did not rate Facebook with a “green dot” showing compliance with IPTC standards, as removing metadata embedded by the owner of an image contradicts IPTC’s strong support for keeping metadata persistent.
In addition, in both the 2016 and 2019 tests the Working Group found that two fields in the IIM format do indeed appear to be given values populated by Facebook.
IPTC looks at the facts
IPTC provides a reference image for each version of its Photo Metadata Standard which contains a test value for every specified metadata field. This makes it easy to test which fields are removed or modified.
The reference image of the 2017.1 version of the standard was uploaded to Facebook by Working Group member David Riecks and it can still be seen here. Next, the group used IPTC’s Get IPTC Photo Metadata website, a tool that retrieves the embedded metadata of most images shown on the web. Anyone can use this tool: simply enter the URL of the image into the site’s form and click to see all the metadata embedded in the image.
This test was performed using the URL of the IPTC reference image uploaded to Facebook and the result was shown instantly:
- Embedded metadata fields in the IIM format related to rights were retained: Creator, Creator Job Title, Copyright Notice, Credit Line, Source and Description Writer.
- All embedded metadata using the XMP format were removed by Facebook.
- The Creator and the Copyright Notice in the Exif format were also retained.
- The Instructions field and the Job Id field in IIM show values significantly different from what had been uploaded. The IPTC Working Group assumes these values were inserted by Facebook:
  - The value of the Instructions field starts with FBMD. The IPTC Working Group retrieved this image using “Save As…” and another Facebook user uploaded it to his account. Result: the value was not changed during the second upload to Facebook. These results were shown for the re-uploaded image.
  - The value of the Job Id field looks like a unique identifier. If an uploaded image is downloaded using the “Save As…” function and then uploaded by another Facebook user, this field contains a different value.
- The IPTC Working Group searched for any documentation of these inserted values but found no specification or statement from Facebook. There have been, however, many guesses and assumptions by users and developers.
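The comparison behind these results can be sketched in a few lines of Python. This is only an illustration, not the tooling the Working Group used: the function name `classify_fields`, the test values and the shortened `FBMD` placeholder are all invented here, and the field values are assumed to have been extracted already (for example by copying them from the Get IPTC Photo Metadata output).

```python
def classify_fields(reference, retrieved):
    """Sort each reference field into retained, removed or modified."""
    report = {"retained": [], "removed": [], "modified": []}
    for field, expected in reference.items():
        if field not in retrieved:
            # The field was embedded before upload but is gone afterwards.
            report["removed"].append(field)
        elif retrieved[field] == expected:
            report["retained"].append(field)
        else:
            # The field is still present but its value has changed.
            report["modified"].append(field)
    return report


# Invented test values standing in for those embedded in the reference image.
reference = {
    "Creator": "Test creator",
    "Copyright Notice": "Test copyright notice",
    "Headline": "Test headline",
    "Instructions": "Test instructions",
}

# Invented values standing in for what is read back after the upload.
retrieved = {
    "Creator": "Test creator",
    "Copyright Notice": "Test copyright notice",
    "Instructions": "FBMD-placeholder",
}

report = classify_fields(reference, retrieved)
print(report)
```

Because the reference image carries a known test value in every field, a field that ends up in the `modified` list must have been rewritten somewhere between upload and download.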
Using the Get IPTC Photo Metadata site, anybody can check what Facebook values were applied to their photo. As a user, you can find a Facebook image URL by clicking on the image on the Facebook site and then using the “Copy image address”, “Inspect” or “Inspect Element” function of your web browser, which should show you the URL.
IPTC tests showed that when a Facebook member uploads an image, Facebook removes many fields, keeps only a few related to rights, and replaces or adds values in the Job Id and Instructions fields. The role of these values is not publicly documented by Facebook, so they are currently the subject of significant speculation.
IPTC makes no assumptions about what the metadata values are used for, but Facebook appears to keep the value of the Instructions field constant even when the image is re-uploaded by another user. The Job Id field, on the other hand, changes with each separate upload.
Our recommendations are that all embedded metadata values should be retained by platforms and that no platform should be overwriting user metadata.
IPTC’s 2019 Social Media Platforms survey also looked at the metadata usage of other major social media platforms. Interested parties can find more information at Social Media Sites Photo Metadata Test Results 2019.
The example metadata values embedded into the 2017.1 reference image can be checked by going to https://getpmd.iptc.org and clicking on the green button in Option A labeled Get Photo Metadata of Web Image. No image URL is required, as by default the metadata of this reference image is retrieved and displayed.
For those interested in the technical details of embedded photo metadata, the technical formats IIM and XMP are introduced in the IPTC Photo Metadata User Guide, including a look under the hood of image files.
Home and away teams
<action sequence-number="1" team-idref="team_9572" type="esacttype:remove" comment="Nuke"></action>
<action sequence-number="2" team-idref="team_6134" type="esacttype:remove" comment="Inferno"></action>
<action sequence-number="3" team-idref="team_9572" type="esacttype:choose" comment="Cache"></action>
<action sequence-number="4" team-idref="team_6134" type="esacttype:choose" comment="Train"></action>
<action sequence-number="5" team-idref="team_9572" type="esacttype:remove" comment="Overpass"></action>
<action sequence-number="6" team-idref="team_6134" type="esacttype:remove" comment="Dust2"></action>
<action sequence-number="7" type="esacttype:remaining" comment="Mirage"></action>
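As a rough illustration of how a consumer might read such a sequence, here is a minimal Python sketch using only the standard library. It assumes the action elements are wrapped in a hypothetical `<actions>` root and carry no XML namespace, which a real SportsML document would have; the function name `summarise_actions` is also invented for this example.

```python
import xml.etree.ElementTree as ET

# The actions from the example, wrapped in a hypothetical <actions> root
# so that the fragment parses as a well-formed document.
ACTIONS_XML = """<actions>
<action sequence-number="1" team-idref="team_9572" type="esacttype:remove" comment="Nuke"></action>
<action sequence-number="2" team-idref="team_6134" type="esacttype:remove" comment="Inferno"></action>
<action sequence-number="3" team-idref="team_9572" type="esacttype:choose" comment="Cache"></action>
<action sequence-number="4" team-idref="team_6134" type="esacttype:choose" comment="Train"></action>
<action sequence-number="5" team-idref="team_9572" type="esacttype:remove" comment="Overpass"></action>
<action sequence-number="6" team-idref="team_6134" type="esacttype:remove" comment="Dust2"></action>
<action sequence-number="7" type="esacttype:remaining" comment="Mirage"></action>
</actions>"""


def summarise_actions(xml_text):
    """Return (team id, action name, map name) tuples in sequence order."""
    root = ET.fromstring(xml_text)
    actions = sorted(root.iter("action"),
                     key=lambda a: int(a.get("sequence-number")))
    summary = []
    for action in actions:
        # Drop the "esacttype:" prefix to get the bare action name.
        kind = action.get("type").split(":", 1)[1]
        # team-idref is absent on the final "remaining" action, giving None.
        summary.append((action.get("team-idref"), kind, action.get("comment")))
    return summary


summary = summarise_actions(ACTIONS_XML)
played = [name for _, kind, name in summary if kind != "remove"]
print(played)
```

Filtering out the `remove` actions leaves the maps that were chosen plus the one remaining map, in the order they appear in the sequence.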
Statistics for eSports teams, players and tournaments
<team-stats score="16" event-outcome="speventoutcome:win">
<outcome-totals scoping-label="T" wins="4" />
<outcome-totals scoping-label="CT" wins="12"/>
<stat stat-type="esstat:kills" value="15" />
<stat stat-type="esstat:headshot" value="6" />
<stat stat-type="esstat:assist" value="4" />
<stat stat-type="esstat:flashassist" value="2" />
<stat stat-type="esstat:deaths" value="11" />
<stat stat-type="esstat:KAST" value="78.3" />
<stat stat-type="esstat:ADR" value="68.4" />
<stat stat-type="esstat:FKdiff" value="0" />
By using the generic stat element with its stat-type and value attributes, we can handle any type of statistic.
The esstat: and esacttype: prefixes used in these examples do not currently exist in the IPTC NewsCodes catalog, but could easily be set up if needed. It might be necessary to have different prefixes for different types of eSports games, but that would require some more investigation.
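One consequence of this generic structure is that a reader does not need to know the statistic types in advance. The Python sketch below illustrates the idea under stated assumptions: the `<stats>` wrapper element and the function name `read_stats` are invented for the example, only a few of the statistics shown above are repeated, and XML namespaces are left out.

```python
import xml.etree.ElementTree as ET

# A few of the stat elements from the example, inside a hypothetical
# <stats> wrapper so that the fragment is well-formed on its own.
STATS_XML = """<stats>
<stat stat-type="esstat:kills" value="15" />
<stat stat-type="esstat:deaths" value="11" />
<stat stat-type="esstat:KAST" value="78.3" />
<stat stat-type="esstat:ADR" value="68.4" />
</stats>"""


def read_stats(xml_text):
    """Map every stat-type found in the fragment to its numeric value."""
    root = ET.fromstring(xml_text)
    return {stat.get("stat-type"): float(stat.get("value"))
            for stat in root.iter("stat")}


stats = read_stats(STATS_XML)
print(stats["esstat:kills"] - stats["esstat:deaths"])
```

A new kind of statistic would then only need a new term in the controlled vocabulary, with no change to the reading code.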
Last week brought IPTC members together for our twice-yearly Face-to-Face Meeting to discuss news credibility, taxonomies and controlled vocabularies, updates in sports standards and much more!
This year’s IPTC Spring Meeting was in Lisbon, Portugal, and over 40 IPTC member delegates, member experts and invited guests gathered for three days to discuss all the latest developments in news and media technology.
On Monday, IPTC Chair and Director of Information Management for Associated Press Stuart Myles gave a great introduction and overview of what was to come in the meeting. After everyone introduced themselves, Stuart discussed some changes that the IPTC Board has been thinking about, including looking at updating the Mission and Vision of the organisation to reflect how we operate in 2019.
Then Robert Schmidt-Nia from dpa Deutsche Presse-Agentur introduced their C-POP project (in collaboration with STT and the Sanoma group in Finland), which follows on from the Performing Content we saw at the previous meeting in Toronto. It was interesting to hear about the agency’s shift in focus from a strict business-to-business model to a “B2B2C” model: thinking about what consumers need and how agencies can help publishers to deliver on the needs of readers and subscribers, ideally using feedback from publishers to agencies on how well their content is performing according to real metrics like loyalty and subscription revenue. IPTC will be involved in the C-POP project, so you can expect to hear more about this in the future.
On the same topic, Andy Read from BBC gave an overview of the “Telescope” internal measurement tool, showing how BBC staff can view in real time how their content is being consumed by region, topic or device.
James Logan from the BBC and Brendan Quinn of IPTC gave an overview of IPTC’s work with two news trust and credibility projects: The Trust Project and the Journalism Trust Initiative. We decided at the Autumn 2018 Meeting that IPTC wouldn’t create its own standard around news credibility, disinformation and “fake news”, but that we would work with existing groups and help them to incorporate their standards in IPTC’s work. With The Trust Project, that has been going well, and we are almost ready to publish some best practices on implementing the Trust Project’s Trust Indicators in NewsML-G2 content. Trust Project indicators are already used in schema.org markup by over 120 news providers, so it’s great to see such strong uptake.
Separately, we have been working with Reporters Sans Frontières’ Journalism Trust Initiative, which is at an earlier stage and is looking at documenting general standards for trustworthy and ethical journalism. IPTC is part of the JTI’s Technical Task Force, which is working with the drafting teams on making their statements specific enough to be answered with data and indexed by machines. Hopefully it will end up with similar indicators to the Trust Project indicators.
With both news credibility projects, some questions still need to be addressed, such as assessing the credibility of claims (when a news organisation says they are trustworthy, how can you trust them?), and how these trust indicators work in a multi-provider workflow: if a news agency sends some content to a publisher who then merges it with original reportage, who determines the trust indicators that are attached to the final story? There is definitely a lot more work to do!
Joaquim Carreira from local agency Lusa showed us the “Combate Às Fake News” project focussing on media literacy and helping readers to know what to look for, including the idea of a “nutrition label” for news content looking at criteria such as factuality, readability and use of emotional language.
The day was rounded off with Johan Lindgren of Swedish agency TT presenting the recent work of IPTC’s Sports Content Working Group. The group has recently been tidying up the spec and incorporating suggestions for changes, plus looking at eSports and Chess as two non-traditional sports that are both seeing an increase in interest – in the case of eSports, it is becoming a huge industry. Our tests showed that in simple cases eSports results can be addressed with existing SportsML 3 structures, but to handle more detailed play-by-play results we may need to at least introduce a new controlled vocabulary. Please let us know if you would like to implement SportsML for eSports!
Johan also presented the draft of SportsML 3.1 to be voted on by the IPTC Standards Committee.
Stay tuned for an update on Days 2 and 3!
We were proud to be involved at last week’s Metadata Exchange for News interoperability demo organised by DPP (formerly known as the Digital Production Partnership).
DPP’s “Metadata Exchange for News” is an industry initiative aimed at making the news production process easier.
The DPP team looked around for existing standards on which to base their work, and when they found IPTC’s NewsML-G2, they realised that it exactly matched their requirements. NewsML-G2’s generic PlanningItem and NewsItem structure meant that it could easily be used to manage news production workflows with no customisation required.
We were treated to a demo of a full news production workflow in the DPP’s offices at ITV in London on February 6th.
A full news production workflow
As you can see from the diagram, the workflow involves these steps:
- An editor creates a planning record for a news item using Wolftech’s planning system, describing metadata for the planned story
- The system sends the planning item as NewsML-G2 to Sony’s XDCAM Air system which converts it to Sony’s proprietary planning metadata and sends it directly to a camera
- XDCAM Air retrieves the footage from the camera and links it to the planning metadata using the NewsML-G2 IDs; the footage and metadata are then retrieved from XDCAM Air by some simple custom web services
- The web services send NewsML-G2 NewsItem metadata along with the MP4 video file to Ooyala’s Flex Media Platform via an Amazon Web Services S3 bucket
- Ooyala Flex Media Platform sends the media and metadata to the platforms that require it, in this case the Reuters Connect video browsing and distribution platform.
The NewsML-G2 integrations were built for the demo but the idea is that they will soon become standard features of the products involved. All parties reported that implementing NewsML-G2 was fast and fairly painless!
Thanks to all involved and special thanks to Abdul Hakim of DPP for leading the project and organising the demo day.
Look out for an IPTC Webinar on this topic soon!
This report was presented by Stuart Myles, IPTC Chairman, at the IPTC Annual General Meeting in Toronto, Canada on October 17 2018.
IPTC has had a good year – the 53rd year for the organization!
We’ve updated key standards, including NewsML-G2, the Video Metadata Hub and the Media Topics, as well as launching RightsML 2.0, a significant upgrade in the way to express machine processable rights for news and media.
Of course, IPTC standards are a means, not an end. The value of the standards is the easier exchange, consumption and handling of news and media by organizations large and small around the world. So it is important that we continue to focus on making our standards straightforward to use and have them adopted as widely as possible. I think we are making progress on the usability front, such as moving away from zip’d PDFs towards actual HTML web pages for documentation of NewsML-G2. Over the last year, we’ve continued to work with other organizations – W3C, Europeana and MINDS – to develop standards, increase adoption – and, perhaps most importantly, to open up IPTC to other perspectives. And we have had a huge win in the recognition of key photo metadata by Google Images. But we clearly need to do more for both usability and adoption. During the course of this meeting, we’ve had some good discussion about what more we can do in both areas and I encourage all members to help spread the word about IPTC standards, and suggest ways we can accelerate adoption.
Of course, the nature of news and media continues to evolve. On the one hand, new forms of storytelling are emerging, such as Augmented Reality and Virtual Reality. Equally, the use of data to power stories continues to increase, in both data-driven stories and data-supported stories. By data-driven stories, I mean journalists reviewing large databases of information and creating stories based on the trends they find. By data-supported stories, I mean content creators using visually interesting graphics to support their content. The automated production, curation and consumption of news and media is likely to increase for the foreseeable future, driven by both technological improvements and the seductive economics of replacing people with algorithms. And it is not only economics which are driving these changes and challenges, just as it is no longer fill-in-the-blank text stories being written by robot journalists. Synthetic media – such as “deep fakes” – are able to produce increasingly convincing photo, video and audio stories that are indistinguishable from “real” media. Inevitably, the existence and debunking of these fakes will be used to deny legitimate reporting, with the implication of continued erosion of trust in media. All of these trends – AR, VR, data-powered journalism and dealing with trust, credibility and misinformation – are topics which IPTC has discussed over the last few years, but we have not developed any tracks of work to try to address them. In part, this is because these are, by definition, outside of the areas that our member organizations traditionally deal in and so are quite difficult to tackle in terms of establishing standards.
However, even within the context of standards, IPTC is opening up to new forms of experimentation. As we heard on Monday, the joint project between IPTC and MINDS, to allow for the identification of audience and interest metadata, has led to the introduction of structures within NewsML-G2 to support rapid prototyping and experimentation. I see this as a positive move, with great potential to accelerate the work we do and to help keep it lightweight and relevant.
Of course, IPTC has had significant changes of its own over the last year. We bid goodbye to Michael Steidl as our Managing Director of 15 years, and welcomed Brendan Quinn as our new Managing Director this summer. We’re grateful that we continue to benefit from Michael’s skills and experience, as he has remained the Chairman of the Photo and Video Working Groups. And I think that Brendan has made a great start in his new role in helping us keep the IPTC moving forward.
As part of the handover from Michael to Brendan, we decided to scan a lot of the old paper documents (link available to members only), including various types of IPTC newsletter, dating back to 1967, two years after the organization was founded. I thought I would look back to what IPTC was up to in the year 2000, the year I became a delegate to the IPTC, back when I worked for Dow Jones.
And there I am in the photo at the top of the page. Or, at least, the back of my head. Some things are quite reminiscent of this week’s meeting – the birth of NewsML, a focus on improved communications, cooperation with other organizations e.g. MPEG-7.
Then I thought I would look back on IPTC in 1968, the year I was born:
Some things were similar to today, such as a focus on fine technical details like Alphabet Number 5 and a plan to go to Lisbon next year for a meeting. However, the focus in those days was mainly on lobbying against tariffs and satellite monopolies.
So I think it is fair to say that the IPTC has never been just a standards body. It is also, more broadly, a community of practice. We are a group of people from around the world who have a common interest in news and media technology. The process of sharing information and experiences with the group, through these face to face meetings and the online development of standards, means that the members of IPTC learn from each other, and so have an opportunity to develop professionally and personally. I hope you will agree that yesterday’s discussion of news search and classification was an excellent example of exchange of experiences, both good and bad, which can help many of us avoid problems and seize opportunities, and so accelerate our work.
I think it is helpful for us to recognize that IPTC is a community which continues to evolve, as the interests, goals and membership of the organization change. I’m confident that – working together – we can continue to reshape the IPTC to better meet the needs of the membership and to move us further forward in support of solving the business and editorial needs of the news and media industry. I look forward to working with all of you on addressing the challenges in 2019 and beyond.
This is the report of Day 3 of the IPTC Autumn 2018 Meeting in Toronto. See the report from Day 1 and the report from Day 2. All the presentations are available to IPTC members in the IPTC Members Only Zone.
Day 3 of IPTC Autumn Meetings always includes the Annual General Meeting, where all Voting Members can have their say in the future of the organisation. This time new Managing Director Brendan Quinn gave his first MD’s report, alongside Stuart Myles’ Chairman’s Report (which will be posted to the IPTC blog soon). Materials from the AGM are available to members in the IPTC Members Only Zone.
Rounding out the discussions for the three days, we had some broad-ranging and future-facing conversations regarding News Credibility projects, where Stuart Myles took us on a tour of the wide range of projects and initiatives around misinformation, the credibility of news and news sources, and the perceived problems of “fake news.” IPTC or IPTC members are helping out several organisations in their efforts in this area, such as the W3C Credible Web community group and the Journalism Trust Initiative.
We also had a discussion on funding opportunities and potential IPTC projects, which is an internal discussion involving members only.
Lastly, speaking about the future, we had Michael Young from Civil Media speak to us about their plans to use blockchain technologies to power small newsrooms and fulfil their broad goal to “power sustainable journalism throughout the world.” A lot of focus has been on Civil’s Initial Coin Offering, which closed underfunded and will be returning investors’ money, but they have many other activities, including a suite of WordPress-based plugins allowing news providers to join the Civil ecosystem and pledge openness, fairness and transparency according to the Civil Foundation’s constitution. Mike explained how blockchain based voting and decisions mean that members can be rewarded for pointing out breaches of the constitution, and bad actors can be punished or even removed from the network entirely.
The event ended with a few of us attending the Canadian Journalism Foundation’s event with journalism pundits Vivian Schiller, Jeff Jarvis, Jay Rosen and Matthew Ingram, talking about misinformation and misuse of social media (video recording available via the above link), and ten of us went on a networking and team bonding trip to Niagara Falls and to a local winery on the Thursday.
Overall it was a great Autumn Meeting which set the scene and built the foundation for many more great IPTC meetings to come!
This is the report of Day 2 of the IPTC Autumn 2018 Meeting in Toronto. See the report from Day 1 and the report from Day 3. All the presentations are available to IPTC members in the IPTC Members Only Zone.
Day 2 of the IPTC Autumn 2018 Meeting in Toronto was a deep dive into search and classification. Many of our members are working hard to make their content accessible quickly and easily to their customers, and user expectations are higher than ever, so search is a key part of what they do.
First up we had Diego Ceccarelli from Bloomberg talking through their search architecture. Users of Bloomberg terminals have very high expectations that they will see stories straight away. The system handles 16m queries and 2m new stories and news items per day, with requirements for a median query response time of less than 200ms and for new items to be available in search results in less than 100ms. And as Diego says, “with huge flexibility comes huge complexity.” For example, because customers expect to see the freshest content straight away, the system has no caching at all!
To achieve this, the Bloomberg team use Apache Solr – in fact they have 3 members of staff dedicated to working on Solr full-time, and have contributed a huge amount of code back to the project, including their machine-learning-based “learning to rank” module which can be trained to rank a set of search results in a nuanced way. Bloomberg also worked with an agency to develop open source code used to monitor a stream of incoming stories against queries, used for alerting. Other topics Diego raised included clustering of search results, balancing relevance and timeliness, crowdsourcing data to train ranking systems, combining permissions into search results, and more – a great talk!
Our heads already reeling with all the information we learned from Bloomberg, we then heard from another search legend, Boerge Svingen, one of the founders of FAST Search in Norway and now Director of Engineering at the New York Times. He spoke about how NYT re-architected their search platform to be based around Apache Kafka, a “distributed log streaming” platform that keeps a record of every article ever published on the Times (since 1851!) and can replay all of them to feed a new search node in around half an hour. The platform is so successful that it is used to feed the “headless CMS” (see yesterday’s report) based on GraphQL which is used to render pages on nytimes.com for all types of devices. Boerge and his team use Protocol Buffers as their schema to keep everything light and fast. More information in Boerge’s slide deck, available to IPTC members.
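To illustrate the log-replay idea described above, here is a minimal sketch in Python. It is not the New York Times implementation and does not use Kafka itself: `ArticleLog` and `SearchNode` are hypothetical names, with an in-memory list standing in for the log. It only shows why keeping every published article in an ordered log lets a brand-new search node rebuild its index from scratch.

```python
from collections import defaultdict


class ArticleLog:
    """Append-only, ordered record of published articles (a stand-in for a log topic)."""

    def __init__(self):
        self.records = []

    def append(self, article_id, text):
        self.records.append((article_id, text))
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset=0):
        # Yield (offset, record) pairs starting at the requested offset.
        yield from enumerate(self.records[offset:], start=offset)


class SearchNode:
    """Builds a tiny inverted index by consuming the log in order."""

    def __init__(self):
        self.index = defaultdict(set)
        self.next_offset = 0  # a new node starts at the beginning of the log

    def catch_up(self, log):
        for offset, (article_id, text) in log.read_from(self.next_offset):
            for term in text.lower().split():
                self.index[term].add(article_id)
            self.next_offset = offset + 1

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


log = ArticleLog()
log.append("a1", "Election results announced")
log.append("a2", "Storm hits coast")

node = SearchNode()
node.catch_up(log)            # initial replay from offset 0
log.append("a3", "Election recount ordered")
node.catch_up(log)            # only the new record is consumed

fresh_node = SearchNode()
fresh_node.catch_up(log)      # a new node replays the full history
```

Because both nodes derive their state purely from the log, `fresh_node` ends up with the same index as `node` without any copying between nodes.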
Next up was Chad Schorr talking about search at Associated Press, discussing their Elastic implementation on Amazon Web Services. Using a devops approach based on “immutable infrastructure” means that the architecture is now very solid and well tested. Chad was very open about the issues and problems AP had while implementing the project, and we had a great discussion about how other organisations can avoid the same problems.
Then Robert Schmidt-Nia from DPA talked about their implementation of a content repository (in effect another “headless CMS”!) based on serialising NewsML-G2 into JSON using a serverless architecture built on AWS Lambda functions, Amazon S3 for storage, SQS queues and Elasticsearch. Robert told of how the entire project was built in three months with one and a half developers, and ended up with only 500 lines of code! It can now be used to provide services to DPA customers that could not be provided before, including subsets of content based on metadata, such as all Olympics content.
Next, Solveig Vikene and Roger Bystrøm from Norway’s news agency NTB spoke about and gave a live demo of their new image archive search product. They demonstrated how photographers can pre-enter metadata so that they can send their photos to the wire a few seconds after taking them on the camera. Some functions like global metadata search and replace and a feature-rich query builder made their system look very impressive.
Veronika Zielinska from Associated Press spoke about AP’s rule-based text classification systems, showing the complexity of auto-tagging content (down to disambiguating between two US Republican Congressmen both called Mike Rogers!) and the subtlety of AP’s terms (distinguishing between “violent crime” events and the social issue of “domestic violence”), and therefore the necessity of manually creating and maintaining a rules-based system.
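As a rough illustration of what a hand-maintained rule might look like, here is a small Python sketch. The rules, patterns and names (`Rule`, `RULES`, `classify`) are invented for this example and are not AP’s actual rules or tooling. They only show how requiring several cues together can separate an event tag from a social-issue tag.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    tag: str
    all_of: tuple          # regex patterns that must all match
    none_of: tuple = ()    # regex patterns that must not match


# Hypothetical rules: each tag needs a topic cue plus a context cue.
RULES = [
    Rule(
        "violent crime",
        (r"\b(assault|shooting|stabbing)\b", r"\b(police|arrested|charged)\b"),
    ),
    Rule(
        "domestic violence",
        (r"\bdomestic (violence|abuse)\b",
         r"\b(report|study|campaign|legislation|awareness)\b"),
    ),
]


def classify(text):
    """Return the tags of every rule whose conditions are all satisfied."""
    lowered = text.lower()
    tags = []
    for rule in RULES:
        matches_all = all(re.search(p, lowered) for p in rule.all_of)
        matches_none = not any(re.search(p, lowered) for p in rule.none_of)
        if matches_all and matches_none:
            tags.append(rule.tag)
    return tags


event_tags = classify("Police arrested a man after a shooting downtown.")
issue_tags = classify("A new study on domestic violence calls for legislation.")
```

Even in this toy form, every new edge case means another pattern to write and maintain by hand, which is the maintenance burden the talk highlighted.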
Stuart Myles then took us on a tour through AP’s automated image classification activities, looking at whether commercial tools are yet up to the task of classifying news content, the value of assembling good training sets but the difficulties in doing so, and the benefits of starting with a relatively small taxonomy that is easier for machine learning systems to understand.
Dave Compton talked us through Thomson Reuters Knowledge Items used by the OpenCalais classifier and how they use the PermID system to unify concepts across their databases of people, organisations, financial instruments and much more. Dave described how Knowledge Items are represented as NewsML-G2 Knowledge Items, and are mapped to Media Topics where possible.
On that subject, Jennifer Parrucci of the New York Times, chair of the IPTC NewsCodes Working Group, gave an update on the latest activities of the group, including the ongoing Media Topic definitions review, adding new Media Topic terms after suggestions by the Swedish media industry, and work with the schema.org team on mapping between schema.org and Media Topics terms.
As you can see, it was a very busy day!
This is the report of Day 1 of the IPTC Autumn 2018 Meeting in Toronto. See the report from Day 2 and the report from Day 3. All the presentations are available to IPTC members in the IPTC Members Only Zone.
This week we are in Toronto for the IPTC Autumn Meeting. Unfortunately the weather is not as warm as it was last week but we are still enjoying ourselves immensely and learning a lot from each other!
All presentations are available to members on the members-only event page.
After an introduction from Chair Stuart Myles, we heard an update from Michael Steidl, chair of the Video Metadata and Photo Metadata Working Groups. Michael updated us on work promoting the IPTC Video Metadata Hub standard, talking to manufacturers and software vendors at events like IBC in Amsterdam, and pulling together use cases and success stories from existing users of the standard.
On the IPTC Photo Metadata Standard, Michael shared the news that Google Images now displays IPTC Photo Metadata, and the press coverage we have received since then. We are also working on new technical features in the standard, such as metadata for regions within images. We’re looking for use cases and requirements for storing metadata against regions, so if you have any input, please let Michael, or IPTC Managing Director Brendan Quinn, know!
Dave Compton of Refinitiv, formerly the Financial & Risk business of Thomson Reuters, chair of the NewsML-G2 Working Group, gave an update on recent progress and work towards NewsML-G2 version 2.28 which will be released soon. It will incorporate features for the requirements of auto-tagging systems and a new experimental namespace to be used for potential new updates to NewsML-G2 that aren’t yet ready to be added to the full specification.
The experimental extension to NewsML-G2 is already being used by Gerald Innerwinkler of APA and Robert Schmidt-Nia of DPA, who presented an update on a current project between IPTC and MINDS International looking at metadata for suggesting news stories to users based on psychological and emotional characteristics, plus properties like the likely timeliness for different types of user. Based on the Limbic Map concept from marketing theory, the new proposals are in testing right now.
Chair of the Sports Content Working Group, Johan Lindgren of TT in Sweden, presented an update on SportsML and the work on SportsJS, which is nearing a final version now that JSON Schema will soon support some new properties that we need in order to validate sports content.
Stuart Myles appeared again in his role as chair of the Rights Working Group, updating us on RightsML and where we can take it in the future, including the potential to use RightsML as the basis of blockchain-based rights management systems.
Then we had a focus on “new-generation editorial systems” including a great presentation from Peter Marsh of new IPTC member NEWSCYCLE Solutions on the history and state of the art of content management systems from Tandem-based SII workstations in the 1980s, all the way through to the current wave of headless CMSs as illustrated by this project by The Economist.
Stephane Guerrilot of AFP finished day one presenting AFP’s new-generation system, Iris, which enables AFP customers and partners to search for stories, video and images.
Stay tuned for a report on Day Two!