In the previous short article, I mentioned that conceptual strategies for designing and planning experiments beyond their technical execution are rarely taught and often acquired through trial and error. This gap can lead to suboptimal experimental design, from inappropriate controls to an inability to formally test or refute the core hypothesis. I also highlighted that different layers of thinking exist in research, with the most complex involving how we approach entirely new biological questions.
In this piece, I want to explore how the recent explosion of genome editing, multiomics, and computational analyses has reshaped how we think about experiments.
Let’s begin with a widely used approach when studying the function of a gene. Genome editing technologies, including the generation of genetically modified animals, have transformed our ability to model human diseases and address this question. However, this has led to a widely accepted assumption: that the most direct way to understand a gene’s function is to knock it out. It’s not uncommon to read proposals or papers with a narrative like this: “Gene X is (over)expressed in a particular disease. To investigate its function, we generated a knockout.”
While knocking out a gene can be informative, particularly when its loss phenocopies a disease, it may not reliably reveal the gene's function. Strong evidence suggests that gene knockouts often trigger compensatory adaptations, so the observed phenotypes may reflect cellular responses to perturbation rather than the gene’s actual role. For instance, many gene deletions are lethal, but this doesn’t mean those genes actively prevent cell death. Even more misleading is the inference that if loss of gene X blocks process Y, then gene X must drive process Y. For example, if knocking out a gene impairs tumour formation, it's sometimes concluded that the gene promotes cancer, when in fact, the phenotype might stem from stress responses or altered network dynamics, not direct oncogenic function.
Therefore, deleting a gene may not be the most effective strategy for studying its function. A more suitable approach, depending on the specific question, might be to express the gene in a controlled context and assess its function under physiologically relevant conditions. Genome editing strategies are fascinating and powerful, but we shouldn’t use them indiscriminately. Don’t fall prey to the law of the hammer, a cognitive bias of over-reliance on a familiar tool: "it is tempting, if the only tool you have is a hammer (CRISPR), to treat everything as if it were a nail (knock it out)."
Often, after generating a knockout, researchers proceed with a growing number of omics analyses, commonly RNA-seq, and sometimes proteomics or metabolomics. The expectation is that these data will somehow reveal what the gene does. Pathway enrichment analyses follow, and from these, one or more pathways are selected, often based on prior knowledge or prevailing trends, as likely mediators of the gene’s effect. Even more fashionable now is resorting to single-cell analyses. As we came to realise that the genotype-phenotype correlation is more complex and heterogeneous than we initially hoped, we created an even more complex scenario in which each cell in our experimental condition exhibits a slightly different behaviour, captured only through dimensionality-reduction and visualisation tools. Yet these analyses merely show, at higher granularity, the response to the gene silencing, rather than the gene’s function. The root of the problem does not change when we apply single-cell approaches to it. In addition, other problems arise. Is this apparent heterogeneity just the result of technical noise, or a true readout of biological complexity? If the latter, is this heterogeneity reflected at the level of the cell’s phenotype? Finally, considering that most of these analyses depend on approaches that are not yet standardised, is this heterogeneity even reproducible?
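The technical-noise question is worth taking seriously: sampling alone can manufacture apparent cell-to-cell heterogeneity. A minimal sketch, assuming a toy model in which 200 genuinely identical cells express a gene at the same rate but are measured at shallow depth (all numbers here are illustrative, not from any real dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 identical cells, each truly expressing a gene at the same rate.
true_rate = 5.0  # mean transcripts captured per cell at shallow depth
counts = rng.poisson(true_rate, size=200)

# Poisson sampling noise alone spreads the measurements out:
# the coefficient of variation is roughly 1/sqrt(rate),
# with no biological heterogeneity involved at all.
cv = counts.std() / counts.mean()
print(f"mean = {counts.mean():.2f}, CV = {cv:.2f}")
```

Cells that are biologically identical thus appear heterogeneous purely through measurement, which is why the noise question has to be answered before any biological interpretation of single-cell spread.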
To gain further insight, someone might propose to “integrate” the datasets, the now-ubiquitous multi-omic integration. But this approach carries conceptual challenges. First, as mentioned earlier, omics analyses typically capture the downstream response to a perturbation, not the direct function of the gene. Second, transcriptomics won’t tell us what a protein actually does, especially if it’s an enzyme. Metabolomics might help, but the compensatory rewiring that occurs following gene deletion can obscure the direct activity of the missing enzyme.
When it comes to data integration, it’s often unclear what is actually meant by “integration.” In reality, the increasing sparsity and heterogeneity of molecular data, from transcriptomics to metabolomics, preclude true integration. At best, researchers can look for targeted correlations or attempt to overlay data onto models—such as metabolic networks enriched with enzyme expression data—but these models are often incomplete or rest on strong assumptions.
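To make the “targeted correlation” alternative concrete, here is a minimal sketch: relating an enzyme’s transcript level to the abundance of its product metabolite across paired samples. The gene, metabolite, and values are hypothetical, chosen only to illustrate the shape of the question:

```python
import numpy as np

# Hypothetical paired measurements across six samples:
# transcript level of an enzyme and abundance of its product metabolite.
enzyme_mrna = np.array([1.2, 2.5, 3.1, 4.0, 5.2, 6.1])
metabolite = np.array([0.8, 1.9, 2.7, 3.5, 4.9, 5.8])

# A single, interpretable Pearson correlation: a targeted question,
# rather than an all-against-all "integration" of whole datasets.
r = np.corrcoef(enzyme_mrna, metabolite)[0, 1]
print(f"r = {r:.2f}")
```

Even a strong correlation here is consistent with several causal structures; its value is that it narrows the hypothesis space around one explicit question, rather than pretending to merge two omics layers wholesale.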
In sum, both conceptual and technical hurdles undermine the ability to use multi-omics to deduce gene function. Just because omics tools are available and affordable does not mean they are the right tools for every question.
Before the omics era, scientists relied more heavily on hypothesis-driven experimentation. While this came with its own risks—such as confirmation bias—it at least followed a logically coherent framework: a clear hypothesis, tested by a focused experiment. Today, we often delegate hypothesis generation to omics analyses, hoping the data will reveal something unexpected. After all, if you look at tens of thousands of genes, you are bound to find something, right? But this frequently results in large, noisy datasets that are difficult to interpret and disconnected from the original biological question.
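The “you will always find something” problem is easy to demonstrate. Testing tens of thousands of genes that contain nothing but noise still yields hundreds of nominal “hits” at p < 0.05, simply because 5% of null tests pass that threshold by chance. A minimal simulation (group sizes and gene count are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_genes, n_per_group = 20_000, 5

# Pure noise: no gene actually differs between "control" and "knockout".
control = rng.normal(size=(n_genes, n_per_group))
knockout = rng.normal(size=(n_genes, n_per_group))

# One t-test per gene, as in a naive differential-expression screen.
_, pvals = stats.ttest_ind(control, knockout, axis=1)
hits = int((pvals < 0.05).sum())
print(f"{hits} 'significant' genes out of {n_genes}, all false positives")
```

Roughly a thousand genes come out “significant” despite there being no signal at all, which is why multiple-testing correction, and more importantly a prior question, are non-negotiable in omics screens.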
In other words, omics analyses should not be used as substitutes for thinking, and they should be carefully designed. They can expand our perspective, but only if the right question is asked in the first place. Omics data will not provide meaningful answers unless guided by a clear conceptual framework.