CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature

The Hebrew University of Jerusalem
2025

Abstract

A hallmark of human innovation is the process of recombination: creating original ideas by integrating elements of existing mechanisms and concepts. In this work, we automatically mine the scientific literature and build CHIMERA, a large-scale knowledge base (KB) of recombination examples. CHIMERA can be used to empirically explore, at scale, how scientists recombine concepts and take inspiration from different areas, or to train supervised machine learning models that learn to predict new creative cross-domain directions. To build this KB, we present a novel information extraction task of extracting recombination from scientific paper abstracts, collect a high-quality corpus of hundreds of manually annotated abstracts, and use it to train an LLM-based extraction model. The model is applied to a large corpus of papers in the AI domain, yielding a KB of over 28K recombination examples. We analyze CHIMERA to explore the properties of recombination in different subareas of AI. Finally, we train a scientific hypothesis generation model using the KB, which predicts new recombination directions that real-world researchers find inspiring.

Re-what?


Recombination is a common form of ideation that involves breaking down and blending existing ideas across domains to create novel solutions.

The CHIMERA Knowledge Base

We automatically mine CHIMERA, a knowledge base of recombination examples from across the scientific literature.


Our approach to mining recombination examples begins with building a curated dataset of annotated examples. We then use this dataset to train an information extraction model. Finally, we apply the trained model to arXiv papers to collect recombination examples at scale.

We focus on two recombination types, which we name blends and inspirations. Blends combine multiple concepts to create new approaches (e.g., boosting classical machine learning algorithms using quantum computing), while inspirations adapt ideas from existing concepts to spark new insight (e.g., applying bird flock behavior to coordinate drone swarms).
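To make the distinction between the two types concrete, here is a minimal Python sketch of how a single recombination example might be represented. The class and field names (`Recombination`, `source`, `target`, and so on) and the domain labels attached to the two examples are illustrative assumptions, not the actual CHIMERA schema.

```python
from dataclasses import dataclass
from enum import Enum


class RecombinationType(Enum):
    BLEND = "blend"              # multiple concepts combined into a new approach
    INSPIRATION = "inspiration"  # an idea adapted from an existing concept


@dataclass(frozen=True)
class Recombination:
    """One hypothetical KB entry; all field names are assumptions."""
    kind: RecombinationType
    source: str         # concept that is blended in or drawn upon
    target: str         # concept that is extended or receives the idea
    source_domain: str  # assumed domain label of the source concept
    target_domain: str  # assumed domain label of the target concept

    def is_cross_domain(self) -> bool:
        # A link is cross-domain when its two endpoints carry different labels.
        return self.source_domain != self.target_domain


# The two examples from the text, with domain labels chosen for illustration.
examples = [
    Recombination(
        RecombinationType.BLEND,
        source="quantum computing",
        target="classical machine learning algorithms",
        source_domain="quant-ph",
        target_domain="cs.LG",
    ),
    Recombination(
        RecombinationType.INSPIRATION,
        source="bird flock behavior",
        target="drone swarm coordination",
        source_domain="zoology",
        target_domain="cs.RO",
    ),
]

for ex in examples:
    print(ex.kind.value, "|", ex.source, "->", ex.target)
```

Keeping the type as an explicit field, rather than using two separate record types, is one possible design choice; it lets blends and inspirations be filtered and counted with the same code.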


Scale

CHIMERA includes over 28K examples of idea recombination, spanning both conceptual blends within and across domains, as well as inspiration-driven links such as analogies, reductions, and abstractions.


Use-Case 1: Analysing Recombination in Science

Frequent inspirations in CHIMERA

Frequent inspiration relations between domains. cs.*, q-bio.nc and math.oc are arXiv categories.

Inspirational connections are often cross-domain. Of note is the volume of inspiration drawn from brain-related sources, such as cognitive science and q-bio.nc. One possible explanation is that many of our arXiv categories of interest are related to machine learning, where the human brain has historically served as a general source of inspiration.
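An analysis of this kind can be sketched as a simple count over (source domain, target domain) pairs. The edge list below is made-up toy data and the function names are hypothetical; the sketch only illustrates how frequent domain pairs and the share of cross-domain links could be computed, and is not the analysis code behind the figures.

```python
from collections import Counter

# Toy (source_domain, target_domain) pairs for inspiration links.
# These values are invented for illustration, not taken from CHIMERA.
inspiration_edges = [
    ("cognitive science", "cs.LG"),
    ("cognitive science", "cs.CL"),
    ("q-bio.nc", "cs.LG"),
    ("zoology", "cs.RO"),
    ("cognitive science", "cs.LG"),
    ("cs.LG", "cs.LG"),
]


def domain_pair_counts(edges):
    """Count how often each (source, target) domain pair occurs."""
    return Counter(edges)


def cross_domain_share(edges):
    """Fraction of links whose source and target domains differ."""
    if not edges:
        return 0.0
    cross = sum(1 for source, target in edges if source != target)
    return cross / len(edges)


pair_counts = domain_pair_counts(inspiration_edges)
print(pair_counts.most_common(3))
print(round(cross_domain_share(inspiration_edges), 2))
```

The same two functions could be applied to blend links to compare how often each recombination type stays within a single domain.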

Zoom-in
Common sources of inspiration for leading domains. cs.*, q-bio.nc and math.oc are arXiv categories.

We observe that while some sources of inspiration (like cognitive science) are commonly shared across related fields, individual domains may also draw inspiration from unique sources (e.g., cs.RO from zoology).

Common blends

Frequent blend relations between domains. cs.*, q-bio.nc and math.oc are arXiv categories.

Blends often connect the same or similar domains.

Use-Case 2: Predicting New Recombination Directions

Using CHIMERA, we train supervised models that learn how to recombine concepts for predicting new scientific ideas.

Given a context string and a query concerning the recombination of a certain graph node, our recombination model suggests directions based on knowledge learned from the KB.

We experiment with retrievers based on encoders trained prior to the test set cutoff year (2024), and find that fine-tuning them on our data improves the median rank of the gold answer (MedR) by an order of magnitude.
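For readers unfamiliar with the metric, the following is a minimal sketch of how a median rank of the gold answer could be computed from ranked retrieval outputs. The function names and the toy candidate lists are assumptions made for illustration; this is not the evaluation code used in the paper.

```python
from statistics import median


def gold_rank(ranked_candidates, gold):
    """Return the 1-based position of the gold answer in a ranked list.

    Raises ValueError if the gold answer is not among the candidates.
    """
    return ranked_candidates.index(gold) + 1


def median_rank(rankings, golds):
    """Median of the gold answer's rank over all queries (lower is better)."""
    ranks = [gold_rank(ranked, gold) for ranked, gold in zip(rankings, golds)]
    return median(ranks)


# Toy retriever outputs for three queries, best candidate first (invented data).
rankings = [
    ["bird flock behavior", "ant colonies", "fish schools"],
    ["ant colonies", "fish schools", "bird flock behavior"],
    ["fish schools", "bird flock behavior", "ant colonies"],
]
golds = ["bird flock behavior", "bird flock behavior", "bird flock behavior"]

print(median_rank(rankings, golds))
```

Under this definition, an order-of-magnitude improvement means the gold answer typically appears roughly ten times higher in the ranked list after fine-tuning.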


User Study

We invited researchers with proven experience (specifically, authors of at least one published paper) to evaluate the suggestions generated by our recombination prediction model against various baselines.

Researchers rated our recombination suggestions as nearly as helpful as the gold answers in inspiring new ideas, providing additional validation for our automated evaluation metrics.


BibTeX

@misc{sternlicht2025chimeraknowledgebaseidea,
      title={CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature},
      author={Noy Sternlicht and Tom Hope},
      year={2025},
      eprint={2505.20779},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.20779},
}