David Worlock on Digital Revolution: AI will be the future of metadata

Insights Xchange: Conversations Shaping Academic Research

Welcome to the Insights Xchange podcast, brought to you by Cactus Communications (CACTUS). Hosted by Nikesh Gosalia, this podcast is your guide to the world of research and academic publishing. Tune in to hear lively discussions with experts from the academic and publishing realms.

Earlier known as All Things SciComm, the new series covers a variety of topics, from the latest trends in academic publishing to critical issues faced by researchers in an increasingly AI-driven world. Join us for insightful conversations and expert perspectives that will help you navigate the exciting world of academia. Whether you're a researcher, publisher, or just curious about academic insights, this podcast hopes to be your go-to source for understanding the evolving landscape of academic publishing.

Stay ahead, stay informed, and let's explore the fascinating world of research and knowledge together!

March 07, 2022 • ScienceTalks • Season 1 • Episode 2

In this episode, Nikesh Gosalia talks to David Worlock, who started one of the first online publishing services in the mid-80s. David gives his take on how different access to information is today and how it will evolve in the near future. David calls this “The Digital Revolution”. He explains how AI will help obtain information, how metadata is a critical factor, and what role “nanopublishing” will play.

David Worlock is a Cambridge History graduate. He was CEO of EUROLEX (1980–1985), a pioneering development and the UK’s first online service for lawyers. In 2013, the Professional Publishers Association (PPA) honored him with the George Henderson Award for lifetime achievement for his work in publishing and information marketplaces. He can be reached at:

Website: https://www.davidworlock.com/
Blog: https://www.davidworlock.com/category/blog/
Twitter: https://mobile.twitter.com/dworlock
LinkedIn: https://www.linkedin.com/in/davidworlock/

Insights Xchange is a fortnightly podcast brought to you by Cactus Communications (CACTUS).

Nikesh Gosalia

There is another thing that we are seeing, David, and everyone has been talking about the use of AI and automation. We all know the potential of it. But in your opinion, do you think AI can really transform the research cycle, the research communication, workflows, processes? What do you think?

David Worlock

I think about 5 years ago, in the port city of Aarhus, in Denmark, I saw the light on this particular question. I met with a company called UNSILO, now a part of your own company, CACTUS, which I think was a hugely wise decision. They showed me how they were developing ways to check an article's effectiveness as a piece of science publishing. The 25 checks they have since developed, now widely used in the publishing industry, amount to a semi-automated peer review, it seems to me. But they are the antecedents of what, to me, will become an increasingly automated peer review, an ongoing peer review process.

The revolution I described a few moments ago, the fragmentation of the article and the way in which all of that develops, cannot happen without the intensive use of AI.

I think there is a two-part process going on here.

One is our ability to see the connectivities in the marketplace; I referred to knowledge graphs of research groups a moment ago.

The other is the pure automation of process. I do not believe that in 5 years' time anybody will expect an article to be edited by a human being. I think that would be absurd. I really do think that we are going to see increasing numbers of automated acceptance systems. You already see these at Wiley and elsewhere; Sage has one. From those early systems, you will see the development of processing and editorial systems that will enable academics to see at any one time where their research work stands. I think it will take a lot of the time out of the process.

When I reflect that some very high-reputation journal publishers are still taking 2 years to produce articles, this seems to me like bringing the 19th century to bear on the 21st. This cannot go on. The automation of process will be extremely important there, but it will have another effect: it will speed up, again, the research cycle. The more rapidly research comes to market, the more rapidly the turnover of argument and research takes place, and we must expect other effects from that. Incidentally, I expect a lot of this process automation work to come from the Chinese market. We are seeing huge investments in AI in China, and I expect to see really heavyweight contributions from that direction.

Nikesh Gosalia

Fascinating, as always, David.

Before I move on to the next set of questions, let me recap two or three key insights from what we have discussed.

One, as you mentioned, David, is the importance of metadata, which is only going to grow.

Another is that early career researchers are going to be more and more demanding, and we are going to see a greater movement, a revolution, towards impact, engagement, and dissemination, with everything moving online.

Finally, there is automation and the use of artificial intelligence throughout the workflow. As you said, David, perhaps 5 years from now, when we look back, some of today's processes will no longer exist, and we will laugh about having once done them manually. That is already happening.

David Worlock

Can I add one more major trend to those?

Nikesh Gosalia

Yes, please.

David Worlock

It's one we haven't given the emphasis it deserves in this conversation, Nikesh, and that is data. I want to reflect on machines and machine operability, which we mentioned briefly. When machines look at these articles and the evidential or statistical data attached to them, they will treat everything as data.

When we talked a moment ago about the structure of the research article, we were telling a story. This is the way we human beings communicate. We tell stories to each other. It's a narrative world that we live in.

The article tells a story that begins with a claim, a hypothesis, and ends with a proof or a disproof and a result. It's the story of the research process.

Machines do not deal in stories. When it comes to metadata, we have to mark up the data of the article and the data of the evidence in entirely different ways, so that, machine speaking to machine, they can communicate very precisely what was in all of that data. Here, metadata is, again, supremely important. We are making a good start on this. I think the FAIR Foundation and the FAIR Principles were a wonderful start, and I think science has responded. We have a set of protocols upon which to establish data exchange. That is invaluable.

Now, we need to build those protocols into the way science happens, so that the machine intelligence we are employing on a particular inquiry is able to connect seamlessly with the data wherever it is held. This is not just text mining or data mining; that was only the threshold we crossed to get into this world. Machine-to-machine operability is a totally different matter. We are, I think, generally woefully unprepared, although we do have FAIR implementation as a really good guideline to get this going. I expect to see real international action in this area in the next few years.

Nikesh Gosalia

Thank you, David. That was brilliant. I think that's one of the best explanations I have heard of the importance of data. You have put it so well, and I am sure the listeners will benefit from it as well.

A couple of quick questions.

What are one or two emerging trends that you are most excited about over the next 3 to 5 years?

David Worlock

Well, obviously, the data trend is critical. But I also take much encouragement, again, from the APE Conference last week, and I look to much better ways of cleaning up science. We have volunteers such as Retraction Watch doing excellent work, but it is not pervasive. We don't have any internationally recognized convention for withdrawn articles. When an article is withdrawn, the publisher will issue a notice. That notice is not always associated with the article; sometimes it is not even linked to the article. Although the publisher can say, “My conscience is clear; I posted the withdrawal notice,” the fact is that the unwary may step on a landmine at any time.

At a slightly less intense level, reproducibility is going to become a real issue. We have to move from negative to positive. Research that fails to reproduce published findings has not normally been published in leading journals, and neither have articles that successfully reproduced them.

We have to get out of the habit of thinking that this is somehow lesser science. It is proper science; it needs to be reported, and it needs to be flagged on the original article. It is a disgrace that a researcher can go to an article without being able to see immediately whether other scholars have been able to confirm its findings for themselves or whether everybody has failed to reproduce that evidence. We have to get much, much better at that.

In part, that brings us back to my concerns about metadata, because those linkages should be in place, but they are not. Of course, in the old physical world they couldn't be, because they would have taken an age to look up. But that brings me, I think, to a wider point.

There are people, especially in the US, who say we are publishing too much. They say preprint servers are full of material that never gets into journals, and that there is a sort of fog of bad data in the system. Well, the way to disperse the fog is not to shine lights into it, but to have routes that you can follow. Good metadata gives you routes to follow.

It seems to me very, very important that we publish as much as we possibly can: publish in the sense of make available, make searchable, make part of the searchable corpus, as we build better intelligent systems for scanning this material. Then we can find all of these reproducibility issues, and researchers can see exactly what has happened in the lifetime of the article they are looking at. That is what we don't have, and what we have to build.

Don't let's publish less. Let's make sure everything gets published and use our intelligence to sort out what matters.

Nikesh Gosalia

Absolutely! I agree with you, David.

Last but not least, David, for the benefit of our listeners: there is so much information available out there about the scholarly publishing industry. There are reports, articles, and blog posts. What one or two tips do you have for listeners to keep themselves updated about everything that is happening in the industry?

David Worlock

Well, I think it is easier to be well advised if you are well involved. In other words, now that the researcher is involved across the whole cycle, their role is to taste the stream of events in the research ecosystem at every possible stage.

We spoke a little earlier about conferences and seminars coming online. The researcher has to be a participant there. Even earlier, the researcher will be a participant in the process of applying for research grants and seeking backing from the university, from other institutions, from commercial interests, and from state and foundation funding sources. These are powerful players, and one has to interact with them first and foremost. But through the process of research itself, the researcher has to be agile, and that is what being a researcher today calls for: systems agility.

The agile researcher will find that they are blogging, they are on ResearchGate or Academia.edu looking at material and commenting on other people's material, they are in seminars, they are presenting posters, they are at public sector conferences.

Then, of course, they are making their own research available and indeed writing books. We haven't said a word about books, but books are kind of interesting. I think books will morph into something else again.

A book is a very interesting place for automating the collection of a whole variety of pieces of information on a particular topic in one place. Just as our articles now contain videos, manipulable data, graphs whose axes can be changed, interviews, and everything else, so can our books. I sometimes balk at the idea of continuing to call them books. It's a misleading word in an odd way, just as I think the word article will be replaced. So what I see here is a participatory chain.

Because you are participating, you will be reading, and no one part of the chain is more important than another. The society's blog, the professional body's newsletter, and all of these other different pieces of material can now be brought into the mainstream.

Nikesh Gosalia

Absolutely!

David Worlock

Because they are findable and scorable, they can be part of your own opinion forming. But they are also part of your reputation forming as a researcher, and building your online reputation is the other side of the coin of building your online information awareness.

Nikesh Gosalia

Lots to reflect on.

David Worlock

I am just very grateful that, having started reading galley proofs as a sub-editor in 1967, I have emerged into this digital world of 2021, with its huge, bright prospects for improving our ability to communicate the scientific progress of the day.

Nikesh Gosalia

Yes. But that's huge credit to you, David, for having such an open mind and being so excited about the revolution, the disruption, because one can often get really fearful, worry too much about all the changes that could happen, and as a result, decide not to move. But it's very exciting and very motivating to hear you talk so passionately about the disruption that is already happening.

David Worlock

Well, I am very, very excited by it. As a consultant for the last 20 years, I reflect that there is a real luxury in being able to predict things which, at my age, one may never see. On the other hand, I shall write them all down, and you can score me later on.

Nikesh Gosalia

Brilliant!

Thank you, David, for being our guest on All Things SciComm. It's been a real pleasure. As always, lots to reflect on.

David Worlock

Well, I very much enjoyed the conversation, Nikesh, as always, and I do say to all those hundreds of thousands of researchers who work with you: you are at the very beginning of a challenging world. But because it's challenging, you can get more out of it in a digital world than you ever dreamt. I think that's most exciting.

Nikesh Gosalia

Yes. Absolutely, David. I think we echo the sentiment. We are very excited as well about the future and doing our bit, hopefully, towards making that a reality.

Thank you everyone for joining us. I hope all the listeners enjoy this podcast as much as I have enjoyed hosting it. Stay tuned for our next episode.