Recombination is a common form of ideation that involves breaking down and blending existing ideas across domains to create novel solutions.
A hallmark of human innovation is the process of recombination: creating original ideas by integrating elements of existing mechanisms and concepts. In this work, we automatically mine the scientific literature and build CHIMERA: a large-scale knowledge base (KB) of recombination examples. CHIMERA can be used to empirically explore at scale how scientists recombine concepts and draw inspiration from different areas, or to train supervised machine learning models that learn to predict new creative cross-domain directions. To build this KB, we present a novel information extraction task of extracting recombination from scientific paper abstracts, collect a high-quality corpus of hundreds of manually annotated abstracts, and use it to train an LLM-based extraction model. The model is applied to a large corpus of papers in the AI domain, yielding a KB of over 28K recombination examples. We analyze CHIMERA to explore the properties of recombination in different subareas of AI. Finally, we train a scientific hypothesis generation model using the KB, which predicts new recombination directions that real-world researchers find inspiring.
We automatically mine the scientific literature to build CHIMERA, a knowledge base of recombination examples.
Our approach to mining recombination examples begins with building a curated dataset of annotated examples. We then use this dataset to train an information extraction model. Finally, we apply the trained model to arXiv abstracts to collect recombination examples at scale.
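The sketch below illustrates this three-stage flow in Python. All function names and bodies (`load_annotated_abstracts`, `train_extractor`, `mine_arxiv`) are hypothetical placeholders for illustration, not the actual CHIMERA code; the real system fine-tunes an LLM-based extraction model on the annotated corpus.

```python
# Hypothetical sketch of the pipeline: annotate -> train an extraction model ->
# apply it to arXiv abstracts at scale. Function bodies are placeholders.

def load_annotated_abstracts():
    """Stand-in for the manually annotated training corpus."""
    return [
        ("<abstract text>", [{"type": "blend",
                              "concepts": ["classical ML", "quantum computing"]}]),
    ]

def train_extractor(annotated_corpus):
    """Placeholder for fine-tuning an LLM-based extraction model."""
    def extractor(abstract: str):
        # A trained model would return structured recombination records here.
        return []
    return extractor

def mine_arxiv(extractor, abstracts):
    """Apply the trained extractor to a large corpus of abstracts."""
    knowledge_base = []
    for abstract in abstracts:
        knowledge_base.extend(extractor(abstract))
    return knowledge_base

extractor = train_extractor(load_annotated_abstracts())
kb = mine_arxiv(extractor, ["<arXiv abstract 1>", "<arXiv abstract 2>"])
print(f"Mined {len(kb)} recombination examples")
```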
We focus on two recombination types, which we name blends and inspirations. Blends combine multiple concepts to create new approaches (e.g., boosting classical machine learning algorithms using quantum computing), while inspirations adapt ideas from existing concepts to spark new insights (e.g., applying bird flock behavior to coordinate drone swarms).
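To make the distinction concrete, here are two illustrative records based on the examples above. The field names are our own invention for illustration and do not reflect CHIMERA's actual schema.

```python
# Illustrative records for the two recombination types; field names are
# hypothetical and not CHIMERA's actual schema.

blend_example = {
    "type": "blend",
    "concepts": ["classical machine learning algorithms", "quantum computing"],
    "outcome": "quantum-boosted classical ML",
}

inspiration_example = {
    "type": "inspiration",
    "source": "bird flock behavior",         # where the idea comes from
    "target": "coordinating drone swarms",   # where the idea is applied
}

for record in (blend_example, inspiration_example):
    print(record["type"], record)
```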
CHIMERA includes over 28K examples of idea recombination, spanning conceptual blends within and across domains as well as inspiration-driven links such as analogies, reductions, and abstractions.
Inspirational connections are often cross-domain. Of note is the volume of inspiration drawn from brain-related sources, such as cognitive science and q-bio.NC. A possible explanation is that many of our arXiv categories of interest are related to machine learning, where the human brain has historically served as a general source of inspiration.
We observe that while some sources of inspiration (like cognitive science) are commonly shared across related fields, individual domains may also draw on sources unique to them (e.g., from zoology to cs.RO).
Blends often connect the same or similar domains.
Using CHIMERA, we train supervised models that learn how to recombine concepts and predict new scientific ideas.
We experiment with retrievers based on encoders trained prior to the test set cutoff year (2024), and find that fine-tuning them on our data improves the median rank of the gold answer (MedR) by an order of magnitude.
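For reference, the sketch below shows how MedR is typically computed: for each test query, take the 1-based rank of the gold answer in the retriever's ranked candidate list, then report the median over queries (lower is better). The data here is a toy example, not drawn from our experiments.

```python
from statistics import median

def median_rank(ranked_lists, gold_answers):
    """Median 1-based rank of the gold answer across queries (lower is better)."""
    ranks = [candidates.index(gold) + 1
             for candidates, gold in zip(ranked_lists, gold_answers)]
    return median(ranks)

# Toy data: fine-tuning a retriever should push gold answers toward rank 1.
ranked_lists = [["idea_a", "gold_1", "idea_b"], ["gold_2", "idea_c", "idea_d"]]
gold_answers = ["gold_1", "gold_2"]
print(median_rank(ranked_lists, gold_answers))  # -> 1.5
```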
We invited researchers with proven experience—specifically, authors of at least one published paper—to evaluate the suggestions generated by our recombination prediction model against various baselines.
Researchers rated our recombination suggestions as nearly as helpful as the gold answers in inspiring new ideas, providing additional validation for our automated evaluation metrics.
@misc{sternlicht2025chimeraknowledgebaseidea,
  title={CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature},
  author={Noy Sternlicht and Tom Hope},
  year={2025},
  eprint={2505.20779},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.20779},
}