Rick Hao* says data scientists and machine learning engineers have some substantial challenges ahead of them to ensure that their models and infrastructure achieve the change they want to see.
One exciting possibility offered by artificial intelligence (AI) is its potential to crack some of the most difficult and important problems facing the science and engineering fields.
AI and science stand to complement each other very well, with the former seeking patterns in data and the latter dedicated to discovering fundamental principles that give rise to those patterns.
As a result, the combination stands to massively boost the productivity of scientific research and the pace of innovation in engineering.
For example:
- Biology: AI models such as DeepMind’s AlphaFold offer the opportunity to discover and catalog the structure of proteins, allowing professionals to unlock countless new drugs and medicines.
- Physics: AI models are emerging as the best candidates to handle crucial challenges in realizing nuclear fusion, such as real-time predictions of future plasma states during experiments and improving the calibration of equipment.
- Medicine: AI models are also excellent tools for medical imaging and diagnostics, with the potential to diagnose conditions such as dementia or Alzheimer’s far earlier than any other known method.
- Materials science: AI models are highly effective at predicting the properties of new materials, discovering new ways to synthesize materials and modeling how materials would perform in extreme conditions.
These major deep technological innovations have the potential to change the world.
However, to deliver on these goals, data scientists and machine learning engineers have some substantial challenges ahead of them to ensure that their models and infrastructure achieve the change they want to see.
Explainability
A key part of the scientific method is being able to interpret and explain both the workings and the results of an experiment.
This is essential to enabling other teams to repeat the experiment and verify findings.
It also allows non-experts and members of the public to understand the nature and potential of the results.
If an experiment cannot be easily interpreted or explained, it becomes much harder to test a discovery further, let alone popularize or commercialize it.
When it comes to AI models based on neural networks, we should also treat inferences as experiments.
Even though a model technically generates an inference from patterns it has observed, there is often a degree of randomness and variance to be expected in its output.
This means that understanding a model's inferences requires visibility into the model's intermediate steps and internal logic.
This is an issue facing many AI models that leverage neural networks, which currently serve as "black boxes": the steps between data input and data output are not labeled, and there is no way to explain why a model gravitated toward a particular inference.
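One partial workaround is post-hoc interpretability tooling, which probes a trained model from the outside. As a minimal sketch, the snippet below uses permutation importance from scikit-learn to rank the input features a model actually relies on; the dataset and model here are illustrative assumptions, not tied to any of the examples above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Illustrative data and model; any fitted estimator with a score() method works.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature several times and measure the drop in test accuracy:
# the features whose shuffling hurts most are the ones the model relies on.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

ranked = sorted(zip(X.columns, result.importances_mean), key=lambda p: p[1], reverse=True)
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```

Techniques like this reveal which inputs matter, but not the intermediate reasoning, so they mitigate the black-box problem rather than solve it.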
As you can imagine, this is a major issue when it comes to making an AI model’s inferences explainable.
In effect, this risks confining any real understanding of what a model is doing to the data scientists who develop it and the DevOps engineers responsible for deploying it on computing and storage infrastructure.
This, in turn, creates a barrier to the scientific community being able to verify and peer-review a finding.
But it’s also an issue when it comes to attempts to spin out, commercialize, or apply the fruits of research beyond the lab.
Researchers who want to get regulators or customers on board will find it difficult to win buy-in if they cannot clearly justify their discovery in a layperson's language.
And then there’s the issue of ensuring that an innovation is safe for use by the public, especially when it comes to biological or medical innovations.
Reproducibility
Another core principle in the scientific method is the ability to reproduce an experiment’s findings.
The ability to reproduce an experiment allows scientists to check that a result is not a falsification or a fluke, and that a putative explanation for a phenomenon is accurate.
This provides a way to “double-check” an experiment’s findings, ensuring that the broader academic community and the public can have confidence in the accuracy of an experiment.
However, AI has a major issue in this regard.
Minor tweaks in a model’s code and structure, slight variations in the training data it’s fed, or differences in the infrastructure it’s deployed on can result in models producing markedly different outputs.
This can make it difficult to have confidence in a model’s results.
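Part of the answer is disciplined control over every source of randomness before a run even starts. Below is a minimal sketch of that kind of seed-pinning, assuming a PyTorch-based workflow; the framework, seed value, and flags are illustrative assumptions rather than a prescribed recipe.

```python
import os
import random

import numpy as np
import torch

SEED = 42  # any fixed value works; what matters is recording and reusing it

# Deterministic cuBLAS kernels on CUDA >= 10.2 require this to be set
# before the CUDA context is created.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# Pin the Python, NumPy, and PyTorch random number generators.
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

# Prefer deterministic kernels and raise an error if an operation has no
# deterministic implementation, rather than silently varying between runs.
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False
```

Even with every seed pinned, differences in hardware, library versions, or training data can still shift results, so fixing seeds is necessary but not sufficient: reproducibility also depends on versioning the data and infrastructure a model runs on.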
But the reproducibility issue can also make it extremely difficult to scale a model up.
If a model is inflexible in its code, infrastructure, or inputs, then it’s very difficult to deploy it outside the research environment it was created in.
That’s a huge obstacle to moving innovations from the lab to industry and society at large.
Escaping the theoretical grip
The next issue is a less existential one — the embryonic nature of the field.
Papers on leveraging AI in science and engineering are being published continually, but many remain extremely theoretical, with little concern for translating developments in the lab into practical, real-world use cases.
This is an inevitable and important phase for most new technologies, but it’s illustrative of the state of AI in science and engineering.
AI is currently on the cusp of enabling tremendous discoveries, but most researchers still treat it as a tool for the lab, rather than as a source of transformative innovations for use beyond their desks.
Ultimately, this is a passing issue, but a shift in mentality away from the theoretical and toward operational and implementation concerns will be key to realizing AI’s potential in this domain, and to addressing major challenges like explainability and reproducibility.
In the end, AI promises to help us make major breakthroughs in science and engineering if we take the issue of scaling it beyond the lab seriously.
*Rick Hao is the lead deep tech partner at Speedinvest.
This article first appeared at venturebeat.com.