What has AI ever done for us…?
This year’s Nobel Prizes in both Physics and Chemistry celebrate the seminal contribution of artificial intelligence (AI) to science.
While the Physics prize recognizes John Hopfield and Geoffrey Hinton as pioneers of the technology, the Chemistry prize (shared between Demis Hassabis and John Jumper of Google DeepMind, and the University of Washington’s David Baker) highlights the significant role the technology has played in helping to crack the Protein Folding Problem (PFP) – a conundrum that has puzzled scientists for over 50 years.
As outlined in my recent FEBS Letters paper Solving the Protein Folding Problem…, the PFP is in fact three problems. The folding code concerns the amino acid interactions that drive protein folding; the kinetic question centres on how the folding process occurs so quickly; and the computational problem focuses on predicting a protein’s final shape, given only its amino acid sequence as the starting point.
While several advances have been made over the last 20 years in solving the computational problem (including distributed and citizen-science computing projects such as Folding@home and Foldit), perhaps the most significant recent development has been the release of Google DeepMind’s AlphaFold suite of machine learning models. These deep learning systems can generate protein structure predictions in seconds that are comparable to those achieved, over weeks or months, using experimental approaches such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy.
Indeed, AlphaFold3, DeepMind’s latest iteration (developed in conjunction with Isomorphic Labs), not only accurately predicts a protein’s structure but also informs us of the likely interactions between the protein and other molecules, including DNA, RNA and small-molecule ligands. This improved ability to predict protein complexes holds significant promise for the development of new and improved diagnostics and therapeutics – potentially heralding a new era in medicine.
While some will argue that the protein folding problem is still not completely solved (pointing out that current AI models tell us little about the actual folding process or the dynamic nature of proteins), AI, perhaps more than any other technology, has advanced our understanding of protein structure, providing us with a predicted structure for almost every protein on Earth!
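For readers curious to explore those predictions themselves, the short Python sketch below shows one way to retrieve a model from the public AlphaFold Protein Structure Database over its REST interface. Treat it as a minimal illustration rather than a definitive recipe: the response layout and the pdbUrl field name are assumptions about the public API, and the UniProt accession used (P69905, human haemoglobin subunit alpha) is purely an example.

```python
# Minimal sketch: fetch a predicted structure from the AlphaFold Protein
# Structure Database. Field names are assumed from the public API.
import requests

ACCESSION = "P69905"  # human haemoglobin subunit alpha, used purely as an example
API_URL = f"https://alphafold.ebi.ac.uk/api/prediction/{ACCESSION}"

response = requests.get(API_URL, timeout=30)
response.raise_for_status()
entry = response.json()[0]          # the API is assumed to return a list of prediction records

pdb_url = entry["pdbUrl"]           # assumed field name: URL of the predicted PDB file
structure = requests.get(pdb_url, timeout=30).text

with open(f"{ACCESSION}_alphafold.pdb", "w") as handle:
    handle.write(structure)

print(f"Saved predicted structure for {ACCESSION} ({len(structure.splitlines())} PDB lines)")
```

The downloaded PDB file can then be opened in any molecular viewer – a small, practical taste of what “a structure for almost every protein” actually means.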
However, to quote Mark Manson, “life is essentially an endless series of problems. The solution to one problem is merely the creation of another.” In this context, while revealing the structure of the protein universe is an undoubted boon, it also brings with it the problem of data overload.
How do we deal with the data deluge? Well, Google may have an AI-based solution to that problem too…
Originally developed as Project Tailwind in 2023, NotebookLM was designed by Google as a virtual assistant to help navigate the transition from information to insight. A key distinguishing feature of NotebookLM, compared to other Large Language Models (LLMs) such as ChatGPT, is that it is grounded in the information uploaded to it. This source-grounding (more technically referred to as Retrieval-Augmented Generation (RAG)) directs the LLM to reference Reliable External Knowledge (REK; the supplied source material, such as peer-reviewed academic papers), outside of its training data set, before generating a response. Source grounding helps to reduce the hallucination problem that has plagued other LLMs. Indeed, a recent study by Tozuka et al. (2024) showed that NotebookLM performed significantly better than GPT-4o in a lung cancer staging experiment, suggesting a possible role for NotebookLM in assisting radiologists with image diagnosis in real-life clinical settings.

Furthermore, the uploaded files and NotebookLM dialogue are not visible to other users, and Google asserts that it will not use any of the collected data to train new AI models. This is particularly important when uploading sensitive patient data, as was the case in the Tozuka et al. (2024) study.
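Google has not published NotebookLM’s internals, but the toy Python sketch below captures the general RAG idea: rank the supplied source passages against the user’s question, then prepend the best matches to the prompt so the model answers from that material rather than from its training data alone. The word-overlap scoring, the example passages and the prompt template here are simplifications of my own, not NotebookLM’s implementation.

```python
# Toy illustration of Retrieval-Augmented Generation (RAG); the scoring and
# prompt template are deliberate simplifications, not NotebookLM's internals.
from collections import Counter

SOURCES = {
    "paper_section_1": "The protein folding problem is really three problems: "
                       "the folding code, the kinetic question and the computational problem.",
    "paper_section_2": "AlphaFold predicts a protein's structure from its amino acid "
                       "sequence in seconds, rivalling experimental methods.",
}

def score(query: str, passage: str) -> int:
    """Crude relevance score: count query words that also appear in the passage."""
    query_words = Counter(query.lower().split())
    passage_words = set(passage.lower().split())
    return sum(count for word, count in query_words.items() if word in passage_words)

def build_grounded_prompt(query: str, top_k: int = 1) -> str:
    """Retrieve the best-matching source passages and prepend them to the prompt."""
    ranked = sorted(SOURCES.items(), key=lambda item: score(query, item[1]), reverse=True)
    context = "\n".join(f"[{name}] {text}" for name, text in ranked[:top_k])
    return (
        "Answer using ONLY the sources below; say so if they do not contain the answer.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

# The grounded prompt would normally be sent to an LLM; here we simply print it.
print(build_grounded_prompt("How quickly can AlphaFold predict a structure?"))
```

The point of the exercise is the instruction at the top of the prompt: by constraining the model to the retrieved passages, the system trades breadth for traceability – exactly the property that makes source-grounded answers easier to verify.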
Since its initial release, NotebookLM has continued to expand, using Gemini 1.5’s multimodal capabilities to incorporate ever more sophisticated functionality. One such addition is Audio Overview, a novel feature that converts uploaded documents into a podcast-style audio output. The generated podcast, or ‘deep dive’, features humanized AI hosts discussing the source material in an informative and engaging manner. While the resulting discussion is by no means an exhaustive, or even objective, view of the topic (given that it is restricted to the source material), users can nonetheless guide it – adjusting the depth, or specific direction, of the conversation.
The image below is a link to NotebookLM’s deep dive into the PFP using my FEBS Letters paper Solving the Protein Folding Problem… as the source material.
Pop in your AirPods and enjoy…
References:
Tozuka, R., Johno, H., Amakawa, A., Sato, J., Muto, M., Seki, S., Komaba, A. & Onishi, H. (2024) Application of NotebookLM, a Large Language Model with Retrieval-Augmented Generation, for Lung Cancer Staging. arXiv preprint arXiv:2410.10869.
Image by Gerd Altmann from Pixabay.