QuantumLLMInstruct (QLMMI) is a groundbreaking dataset featuring over 500,000 meticulously curated instruction-following problem-solution pairs tailored for quantum computing. Spanning 90 predefined domains and hundreds of subdomains, QLMMI empowers Large Language Models (LLMs) to tackle intricate quantum challenges, setting a new standard for quantum-focused datasets.
QLMMI leverages advanced reasoning techniques to address unique quantum computing tasks, including Hamiltonian simulations, circuit optimization, and algorithm design. These tasks demand high reasoning complexity, such as solving quantum models with Yang-Baxter equations or designing circuits for quantum algorithms. QLMMI integrates LLM-driven automation for scalable problem generation, solution development, and validation.
The creation of QLMMI follows a robust four-stage process:
QLMMI spans a diverse range of 90 predefined quantum domains, including:
Beyond these predefined domains, LLMs demonstrate emergent capabilities by autonomously generating new quantum fields, showcasing their adaptability and scalability in problem-solving. This adaptability ensures QLMMI remains a dynamic and versatile resource.
QLMMI bridges the gap between LLM advancements and quantum computing challenges, enabling scalable and precise solutions. With its comprehensive domain coverage, rigorous methodologies, and open-access design, QLMMI sets a new benchmark for instructional datasets in computational science.
Our next steps involve fine-tuning models like Qwen-2.5-math family of models on the QLMMI dataset to evaluate their effectiveness in automating quantum circuit synthesis. We anticipate that QuantumLLMInstruct will significantly contribute to the development of efficient and precise quantum circuits.
@misc{sk2025QuantumLLMInstruct,
title={QuantumLLMInstruct: A 500k LLM Instruction-Tuning Dataset with Problem-Solution Pairs for Quantum Computing.},
author={Shlomo Kashani},
year={2025},
eprint={},
archivePrefix={arXiv},
primaryClass={}
}