“Open Science can be enzymatic. High-quality and intentionally sourced data make discovery go faster, especially when those data are public.”
At Astera we believe the future is open and that new technologies need new institutions for innovation. We’re thrilled to introduce John Wilbanks as our first Head of Data to build infrastructure for datasets in service of a more open future of science. Formerly the Head of Data at Biogen Digital Health and Head of Product at the Broad Institute of MIT and Harvard, John brings a wealth of experience in Open Science and a bold vision for how open data speeds knowledge creation.
John has been a key figure in Open Science for more than two decades through his leadership as Chief Commons Officer at Sage Bionetworks, VP of Science at Creative Commons, and as a co-founder of the Access2Research campaign for public access to publicly funded data and literature. At Astera, John will bolster our commitment to developing large, open datasets to catalyze scientific advances.
Recent progress in computer storage and processing can revolutionize research in disparate fields of science, but not without the right data. Our understanding of soil diversity, air quality, evolutionary biology, climate change, materials chemistry, and more might be transformed. Each scientific discipline, from computational mycology to quantum internet physics, could experience a step-change in its development with the right datasets and tools.
“What if you could go into a field that doesn't exist, that sits between fields, and birth the field computationally around open science? We're ready for this breakthrough in how we make knowledge.”
The goal of his evolving role will be to build the public-domain datasets needed to underpin these breakthroughs, which Astera is committed to funding and supporting. John was recently interviewed about his vision for, and expectations of, the future of science. An edited transcript of the conversation is below.
What do you hope to accomplish in your role with Astera?
Fundamental data resources like the Human Genome Project drove a lot of innovation because they allowed lots of people to do whatever kind of research they wanted to do. The data was public domain, and it was standardized. My intention is very much in that spirit, but it also builds on the emergence of massive-scale computational power, storage, and collaboration: What are the other places where public data can really accelerate a scientific discipline?
What factors beyond just “more data” will matter here, or in other fields?
I think about the ecosystem and the tooling environment. With the Human Genome Project, the National Center for Biotechnology Information provided software tools to explore and analyze the genome. If you had a reasonable PC, you could download the reference genome and compute on it. You had an ecosystem that allowed you to move down the line from data to information to knowledge to wisdom. And that is a lesson for us: we can't just release a bunch of data and hope someone finds a valuable use for it. We also have to think about the software and compute environment around those data.
Now in some cases maybe the tool environment is strong enough and you just need to put the data out there. But in other cases, we might need to also be thinking about how we make sure that we have the right tools, we have the right algorithms, we have enough compute, we have all those other pieces.
You’ve been a pioneer in Open Science; how do you see the goals and impact of the movement changing over time?
My early experiences with Open Science were at Creative Commons, and as such were very centered around publishing: making publications open-access. That kind of moved on to making the data open-access. And I think the unique opportunity now is really core to Astera's vision: open data coupled with the explosion of capacity to compute will accelerate what is possible with science, often in new kinds and groups of institutions. Whether it's data creation dropping in price, or just that it's much easier to push those raw data through statistics now, when you pair that capacity with the publicness of what we're doing, there's an opportunity to really blow stuff out. The aim is to lower the price of transformative science for everyone in the way that gigantic taxpayer-funded public data has done, but by building data with a really different approach. We want to build public goods through creative partnerships, fee-for-service approaches, and work with startups, not as a by-product of funding research itself.
When I was introduced to Open Science, the operational point of Open Science was to maximize what I could download onto my computer, or to my local compute: software, data, libraries. Now, to me, Open Science is ready to drive institutional change. We do science in institutions the way we used to do computing in institutions in the 70s and into the 80s. Computers were the size of rooms, so they lived at universities because that's where the electricity and the budgets were. Many scientific disciplines are going through some of that same miniaturization and decentralization. And I think we have this opportunity to say Open Science is about a completely different way of science-ing that is as much technologically mediated as academically mediated.
So it’s not simply that data is accelerating science, but that we may adopt new ways of doing science? What are the implications of that kind of shift?
Many of us are not comfortable with uncertainty, and we are not trained to think about risk estimation. Having public data and learning how to communicate better is going to be really important.
But really we are talking about a new way of doing science where we are generating data for unknown scientists to run unknown analyses. We have to think about the good that can come of that, and we also have to think carefully about the risks: in which fields of science can open data have high impact, and how do we mitigate the risks of its being open?
That’s why I’m here — Astera is about enabling open science to push the knowledge creation world faster.