Reforming research culture to incentivise open data, transparency and openness transcript

Daniel Ridge: In this episode we're joined by David Mellor to discuss themes around data sharing, openness and transparency in open research. David is leader of policy initiatives at the Center for Open Science. We discuss the findings of a recent global survey which Emerald Publishing commissioned in August 2020, which gathered views on change within academia. The survey highlighted a number of themes including Open Data, barriers to publishing open access and measures of impact.

Thanks David. Thank you so much for joining us.

David Mellor: You're welcome. Thank you so much for the invitation.

DR: So, to begin with, can you tell us a little bit about the Center for Open Science?

DM: Sure, yeah, we are a nonprofit company with a mission to increase trust and credibility and reproducibility in scientific research. We've been around for about seven years, and we work to achieve that mission through three kind of main strategies. We're most well-known for some of our reproducibility projects, where we attempt to replicate the results of previously published findings. So, the reproducibility projects in psychology, in social sciences and cancer biology aim to take previously published findings and see if we can come to the same results using the same methods as the original authors. And really the lessons learned from those practices, the barriers to reproducibility that pop up, inform our policy and culture change and advocacy work. So, we work to apply lessons learned from those processes to what types of ideal practices should be encouraged by funders of research and publishers of research. And finally, we work to enable the types of activities for which we advocate. So, a lot of what we're about is large collaborations, data sharing, and specifying in advance how work should go, and so we build and maintain the OSF, which is an online platform that enables data sharing, collaboration and things like preprints.

DR: Okay, yeah, I really want to talk about the OA aspects of what you do, open access, but I'm really interested, first of all, just to kind of hit it off right off the bat: what is wrong with academia? Because I think that's actually a really great place to start, and then see where open access fits into that. So, I know that you guys work a lot on trying to make reforms within academic culture. So, what do you think are some of the biggest problems within academic culture, academic production, consumption, that you think need to be reformed?

DM: Yeah, that's a really tough nut to crack, but it sort of comes down to the fact that individuals are kind of rewarded for a quantity over quality mindset, the number of citations or number of publications or number of grants, which really distorts the desire that the scientific community, that individuals, have to focus more on quality over quantity. And when you ask researchers, “Tell me what you're focused on, quality or quantity,” we all kind of self-identify, we’re all focused on quality over quantity. And when you ask the same people, “Tell me what your colleagues or your competitors are focused on,” they'll often say, “Well, they're just focused on quantity over quality.” And so there's that real mismatch between what we as individual scientists value and what we perceive the scientific community is acting on. So it becomes kind of a collective action problem. We all want to be more open, share more freely, publish just the best, most rigorous research, but it has that perception of almost diminishing one's own reputation. And so, we focus on the most selective journals or the most surprising results, even if that comes at the expense of some good quality indicators that we would like to ideally see more of.

DR: Well, what about things like Impact Factor, and things that are constraining publications by researchers?

DM: Yeah. Impact Factor is a great example of something that everybody loves to hate. It's been around for decades, of course, and it has a utility. It's designed by the library community to sort of measure how many citations on average an individual journal is getting, and to use that to sort of decide where we will subscribe, how to use our limited collections budget. But it's come to represent an aura of quality or of prestige, really, is what it is perceived as these days. You're taking a measure of what's the average number of citations that articles in this journal get every year, and the higher that is, it seems like that's going to be the marker of how good the individual articles are in that journal. And really, they have no relation to each other. There's no kind of rational argument that the Impact Factor would have any sort of relationship to the importance of the research or the quality or rigor that it connects to. So, it's a bit of a conflation of how we would ideally evaluate researchers and research with how we actually end up doing it. It's a shortcut, it's an easy shortcut that we all take. It persists because we can immediately see the perceived prestige that is attached to an article once it gets accepted at a high Impact Factor journal, and because it takes time and effort, and those are both resources that everybody has very little of, to do a better evaluation of an individual research study. It really takes being able to look under the cover, open the hood and explore the data, the conclusions, the rigor of the work to really determine how important and how rigorous any given empirical research article is.

DR: Well, the Center for Open Science is offering a sort of counter to the Journal Impact Factor, which is the Transparency and Openness Promotion Factor, right, or TOP.

DM: Yeah, the TOP Factor.

DR: Can you tell us about that, how it works?

DM: Right, the TOP Factor evaluates the journal policies and author guidelines and evaluates them against the framework provided by the TOP Guidelines. So, the TOP Guidelines have eight specific recommendations for how journals and funders should incentivise or require activities such as data sharing, materials sharing, analytical code sharing and other things, such as the degree to which the journal or the funder encourages replication studies, which I can get into a little bit in a few minutes, and activities such as pre-registration, which is specifying in advance how a study is going to be conducted before you know what the results look like. And these aspects of journal guidelines really get down to how completely the study was reported and what process the authors went through to design, conduct and analyse the study. And with the replication standard, you know, replication studies are kind of the bread and butter of how science is designed to evaluate credibility, but it's something that very few journals want to publish and very few funders want to fund, because it's seen as relatively boring compared to being the first to discover some new correlation or some new impact. So, the TOP Factor is really designed as an alternative to measures such as Altmetrics or the Journal Impact Factor that really evaluate the attention that's being given to or brought to an article or to a journal. And it's really designed to focus on the policies and practices that align with these core values of how science should work. So, does the journal require disclosure of whether or not these practices occur? That's kind of a level one approach, and worth one point each for every time they require disclosure of one of these practices. Does it mandate something like data sharing? You know, so that's what we call two points or level two.
Or does it actually verify? There are some societies that will look at the data, look at the way that it was analysed and check to see if somebody else can come to the same results as are being reported in the journal. And that takes a lot of time, of course, and a lot of effort. Not many societies or journals or publishers can devote that to each article, but it is a reach goal that we advocate for. And so the TOP Factor really evaluates based on those criteria and gives a number of how many points a journal's policies align with. And we've evaluated about 450 journals so far in the TOP Factor database. For many of them, I think the modal response is still just about a zero or a one. So very few journals are taking concrete steps to require disclosure of these types of practices, and that's zero or one out of a possible 29 points, so that's not making much progress, and we really encourage journals to be around the 10 to 15 range.

DR: So, what you’re really trying to do is encourage change through this.

DM: It's a big experiment to see if we can allow comparison between disciplines, between journals and publishers. When editors see this they say, “Oh, we have this journal that I know pretty well is doing this, what does it take to raise my TOP Factor a couple of points to match the peers I have in my community?”

DR: What sort of feedback have you been getting from journal editors and publishers about TOP Factor?

DM: There's two bins, I would say. A lot are saying, “Finally, this is a way that allows us to really focus on the types of science that matter. These are issues that really get to the important things that science should be focusing on, so thank you very much. It makes a good way to show the steps that we're taking, or to allow us to discover which other journals are doing a little better than we are so we can basically copy from them and apply those policies.” Those are the types of conversations that we have had a lot of over the past year with journal editors, talking about these practices. And there's been a lot of pushback. Journals that have traditionally relied on Impact Factor to show their quality or prestige don't necessarily like a lot of the focus that we have on this, so it's controversial in that way.

DR: Well how do these evaluation systems like TOP and Impact Factor, how do they figure into the overall ecosystem of open research and open access?

DM: They are making a lot of noise in some circles, and what we're working to do is bring awareness of it to more and more decision makers. A lot of editors are talking about this during their editorial board meetings. It's not on the radar of everyone, but steps like this are ways to raise awareness about how it's important to focus on these types of issues. There are a couple of other evaluators that really focus a little bit more on some of the issues around preprints and open access and how open it is. We do advocate and support a lot of preprint servers to help you quickly disseminate results free from some of the gatekeeping and from the subscription process that occur otherwise. TOP Factor itself doesn't yet include measures of that, but it will evolve over time and so I can imagine that making its way into it in the future.

DR: In terms of open access, we’ve talked about making reforms in academia, and I'm wondering how open access is part of that reform. I mean, obviously there's been a big shift in people willing to accept it, willing to publish open access, whether they're paying an article processing charge or they're publishing in a gateway that does a pre-publication before review. So, how do you see open access falling into the evolution of this change that we're hoping to see in academia?

DM: I think it's at the forefront of how ideas disseminate through a sometimes sluggish academic culture. It's only been, I don't know, less than 20 years, less than one academic generation, so to speak, that some of the biggest moves in open access have occurred. And some of those initial reactions are still felt by some of the old players. Some of the initial reactions were, “Oh, you have to pay to get published in there, how good can the research actually be?” You know, a lot of that is being diminished by a lot of the good work that open access publishers are doing, focusing on activities such as transparency into the research lifecycle. So the more open the whole research lifecycle is, the less room there is for those types of disparaging comments, because you can just go in and look at how good the research was and evaluate for yourself. And so, I think the move towards an open research process can complement very nicely the move towards open access, because it can directly address some of the concerns that some of the more senior voices have had in the past, and most younger scientists coming up through the pipeline are very familiar with and appreciate the open access model. We know that with Plan S a lot of funders are really getting behind it. They're supporting your work, and part of that comes down to supporting the publication of it so that it can be read by everybody.

DR: You mentioned Plan S just now. I'm curious how a group like the Center for Open Science responds to a plan like that and if there are any plans on the horizon in the United States, for example, that we might have to react to?

DM: Nothing critical. I think it's relatively well known that there's a lot of discussion about whether or not federal funders would do something similar to Plan S, and there's a move towards that and a reaction against it.

DR: For those of the audience who don't know, can you explain a little bit about Plan S, just a brief explanation?

DM: Yeah, Plan S, it came out about 2018 from several key foundations, I think Wellcome Trust being one of the biggest ones behind it, saying, “Coming next year, if we're supporting your work, then we expect that the main outputs of the work be published open access, so that it's not behind the paywall, so that all of society can read and benefit from the work that we're supporting and the work that you're conducting.” And, yeah, as you can imagine, there's a lot of enthusiasm behind it, and there's a lot of pushback against it by some of the research community. I don't want to come off a little bit pejorative, but just cutting through some of the spin, they really want to publish in a traditionally high Impact Factor journal that isn't open access, so, “How can you tell me where to publish if this is going to diminish the venues where I would prefer to publish?” And it all comes back down to those perceptions of prestige, which I would re-emphasise is really a superficial perception.

DR: Well, how do these things, how does open access affect different communities, or maybe how might attitudes be different? I'm just thinking of different communities like the science community or business studies or the liberal arts. I imagine that they have different views of open access.

DM: I can't speak on behalf of really any of them. And one thing that happens actually quite frequently is that I might hear from, or a funder might hear from, you know, a well-known member of a particular community and ascribe that feeling to the entire community, so it's kind of these anecdotes that build up that are generated by some of the most vocal or most well-known or most prominent members of a particular community. That's happened very frequently with Plan S as one concrete example, you know, “How can you tell us to do this?”, and then it seems like the entire community is voicing against it. So, the real way around that is with better data, better opinions about what the whole community actually thinks, and this happened recently with a couple of funders. They surveyed their researcher community and said, “Oh, we're thinking about something like Plan S. We really want articles to be published open access. How would you feel about that?” And some folks didn't like it, but a big majority of researchers did like it. They want their articles to be read. They think it's silly that they put in a whole lot of work on it, and then if you don't belong to a university library consortium, you can't get access to it. They're often happy to share their articles with members of the public who happen to know to email the authors for a copy of the paper. That's not very widely known outside of the academic community. So a lot of researchers really support things like Plan S, they're happy to have things that are open, and we want to get a more accurate perception of what the whole scientific community thinks about issues such as open access, so we're working on the Open Scholarship Survey as a way for funders to survey their community, or university administrators to survey their researchers, on issues around open access, open data, replications and so forth.

DR: Well, there's definitely a push and pull between production and consumption of this research. So, I'm wondering how user experience plays a role in how data is being used, the different kinds of datasets that we want to produce for consumers.

DM: Yeah, the issues around Open Data are so interesting and so complex because of one of the ways that you just mentioned, and even that simple dichotomy really sort of breaks down in different ways as you look at it. So when a researcher collects a data set, who does that belong to? Does it belong to the researcher? Does it belong to the funder that supported their work? The university where the work was conducted? If it involves humans, which obviously a lot of research does, does it belong to the individual participants who took part in that study? Sometimes they are patients, sometimes they're undergraduates in a psychology lab. Sometimes they are members of the public who are filling out surveys on a variety of topics. And so, who the producers of that are will vary. You'll get 10 different answers if you ask 10 different people. So we're falling back on some of these principles that have been around for four decades. The Belmont principles are some of the foundational principles that a lot of researchers look to when considering these types of ethical questions, to ask who the stakeholders are and what's the most equitable and just way to take their rights into account, and to make sure that the data that they provide, for example, is protected to the degree that it should be protected and used to the greatest beneficence of the individuals and of society at large. So, these are complex issues that are solved through listening to all the stakeholders and applying solutions that aren't one size fits all, but that are focused on making the safest and greatest use of data for the widest number of folks.

DR: I know that it's difficult sometimes to get researchers to publish their data just as data sets. How are you trying to encourage that?

DM: Yeah, through a lot of different channels is the short answer, but I'll put them into two different bins of benefits that we tend to see. So let me just step back and begin with kind of a typical author's response when there's a lot of hesitancy around data sharing.

DR: I'll just interject there with some of the stats on that. You know, so, Emerald did this large global survey, and 7% of respondents admitted they didn't know how to share data. And then that also broke down by region, where in the Middle East and North Africa, it was 16% who were unfamiliar with how to do it. So, there is that lack of knowledge, just from the get-go.

DM: Yeah, lack of knowledge is a barrier, as is lack of incentive to do so. There's no requirement, there's no additional credit given, or there's a perception that no credit is given for sharing data, so there's no perceived benefit, there's no perceived requirement or incentive for doing so. And even the most well intentioned, or the one who has the most desire to do so, can very easily have a knowledge gap and just not know the right way to do it. Is it to put it all in the paper as tables and figures? Well, that would look horrendous. Put it in supplemental materials? Those often get detached from articles and make it hard to reuse. So there's a knowledge gap, there's an incentive gap, and trying to sweep all that aside, the strategies to get over those barriers and hurdles fall into a couple of different humps. And it comes down to a lot of what we work on with the TOP Guidelines. So the data policies that we work on with funders and with publishers start with making sure that credit is given and recognition is given for when these activities take place. We know that data sharing isn't required in too many journals or in too many institutions, but when it does happen, it's something that is kind of universally recognised as a good thing for the community, so let's give additional recognition or credit or visibility to those who are taking this voluntary step. And that meshes with the badging program very nicely, where there's just an icon that open data is available to support these underlying findings, and that's just a visual reminder that these data, these findings, are a little bit more transparently reported than is typically seen.

DR: There also seems to be concern with the content of the data. More than half the respondents to the Emerald survey, 61% in North America, said they were concerned over datasets that contain sensitive or personal information, so there is that aspect of it, isn't there?

DM: Yeah, and it's a very persistent concern that comes up legitimately all the time. Our main recommendation in that area comes down to metadata, essentially. First of all, evaluate whether or not the data that are being shared actually are sensitive or whether there would be some concern in sharing that data, and that really starts when you recruit participants into the research pool. Ask them if they would like their data shared. A vast majority of research participants do want that data shared because they see that as being a benefit to the whole scientific process, and those that don't, those can be separated out, as they should be, if that data belongs to the person who's providing it, really. So, it really begins at that step. Later on in the process, sometimes there are simple steps that can be taken to remove identifying information, just to make sure that that comes out where it should. And when that is not possible, there are still steps that can be taken to accrue some of the benefits of data sharing. So at the very least, sharing all of the metadata around the data set, precisely what the variables were and how they were included in the analysis, is a step that's often not included in many manuscripts, but those are in our TOP Guidelines of what to do when data can’t be shared.

DR: Yeah, what do you think publishers should be doing?

DM: I think publishers have a role in a couple of different areas. I think they should help shift the expectation so that transparency is the default, and if it can't be transparent, then here are three things you can do, you know, instead of sharing the full data set. So, shift expectations. There's a big role for training: if you don't know how to share a data set, here are three links to guide you through the process. And really getting back to the TOP Factor, a one or two point on the TOP Factor scale is requiring disclosure, which is simply a data availability statement, you know, tell us if data are available, and if so, how to get them. That's a step above what we see most journals taking, so I think that sort of level one approach is the bare minimum. And then the level two approach is that requirement to make data available, as long as ethical concerns don't prevent it, and when they do prevent it, here are the steps you can take. That level two approach does take a little bit more work because you have to have these kinds of exceptions in place, and much more clear guidance for what to do when those pop up, but it is possible, and I think it should be on the roadmap of moving from a kind of level one to a level two world, where we see transparency becoming more and more of the expectation.

DR: So, I think a really good point to end on would be what should researchers do. So what advice would you have to researchers who come to you and say that they want to publish open access?

DM: So, I'm going to answer that question, but I'm going to step back one minute and we'll come back to it. What we advocate for is kind of a culture change of making these types of activities possible and expected and part of a normal community before we start seeing them being part of the required set of activities. So early on, I would say to researchers: find peers, find colleagues, find groups of like-minded colleagues who are working on or grappling with these similar issues directly, and see what they're doing and see, you know, where they're publishing, or where they are having a good publishing experience, or what types of research practices they are taking on that are showing benefits. One thing I wanted to mention earlier that I got sidetracked on, and I'll come back to your initial question, I promise, was that some of the overall benefits of data sharing often come back to the individual researcher themselves and not the wider scientific community, just because the process of getting ready to share a data set involves a lot of what we call data hygiene, getting everything set and ready to go so that others can take a look at it. Those are good steps that transparency incentivises. Likewise, when deciding where to publish, you'll get more citations, and more people will be able to see the impact of your work. We know that policy makers themselves in the halls of Congress often can't get the articles that they want to see, and these are the people that we really want to be using evidence to inform their decision making. And so, sort of ask yourself who you want to be able to get to this information, and let that inform your opinion. Check out TOP Factor. Journals that are taking great steps to focus on these core issues of how science should be conducted are journals that I would say are better at focusing on what matters. So those are some of the kind of roadmap of suggestions I would give to a researcher in that place.

DR: Great, well thank you so much for talking to me about this today.

DM: I'm really happy to participate. Thank you for the invitation.

DR: We hope you enjoyed this podcast and hearing from David. Keep a lookout for more podcasts coming up which will delve deeper into a number of open research topics.