Structured Cancer Chemotherapy Adverse Events Datasets for LLM Training and Biomedical Innovation
In the era of data-driven science, access to structured, high-quality biomedical datasets can transform how we approach drug discovery, AI healthcare tools, and domain-specific LLM training. At IEEARC, we’ve curated a series of cancer research datasets that go beyond basic statistics — offering molecular targets, adverse events, pharmacological pathways, and herbal medicine insights such as those found in Danggui Buxue Decoction (DBD).
Whether you’re building an AI model for drug response prediction, training a medical LLM, or designing a clinical decision support tool, our data can serve as your foundation.
What We Offer: Dataset Highlights
1. Chemotherapy-Induced Adverse Events Dataset
- 209 cancer patients
- Drug-wise adverse reactions ranked by severity (Oxaliplatin, Paclitaxel, Cisplatin)
- Age and gender distribution, plus the impact of combination vs. single-drug use
- Classification of systemic adverse events (e.g., blood, respiratory, immune)
2. Herbal Compound Analysis: Danggui Buxue Decoction (DBD)
- Active ingredients: Quercetin, Kaempferol, Isorhamnetin, Formononetin
- Targets: PTGS2, PTGS1, NOS2, RXRA, PRKACA
- Mechanisms: PI3K-AKT, TNF, MAPK, and IL-17 signaling
- ADMET profiling: Toxicity, BBB penetration, CYP inhibition, hepatotoxicity status
3. Molecular Docking and Target Binding Affinity
- Binding energies with BMS-related proteins
- Kaempferol-PTGS2: −10.08 kJ/mol
- Quercetin with NOS2 and PRKACA: binding energy magnitude above 9.0 kJ/mol
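As a minimal sketch of how docking results like these might be screened, the snippet below keeps ligand–target pairs whose binding energy is at or below a cutoff, treating more negative values as stronger predicted binding. The `DockingResult` record layout, the `strong_binders` helper, and the −9.0 cutoff are assumptions made for illustration, not the dataset's actual schema. Only the Kaempferol–PTGS2 value is taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DockingResult:
    """Hypothetical record layout; the real dataset's field names may differ."""
    ligand: str
    target: str
    binding_energy: float  # in the unit the dataset reports (kJ/mol in the text)


def strong_binders(results, cutoff=-9.0):
    """Return results with binding energy <= cutoff, most negative first."""
    hits = [r for r in results if r.binding_energy <= cutoff]
    return sorted(hits, key=lambda r: r.binding_energy)


# The only value taken from the text; other pairs would come from the dataset.
results = [DockingResult("Kaempferol", "PTGS2", -10.08)]
for hit in strong_binders(results):
    print(f"{hit.ligand}-{hit.target}: {hit.binding_energy}")
```

The cutoff is a parameter so it can be adjusted to whatever threshold a given screening protocol uses.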
4. Functional Enrichment (GO + KEGG)
- 108 gene targets analyzed
- 570 biological processes, 151 KEGG pathways
- Includes cancer pathways, inflammation, immunity, and hematopoiesis-related pathways
5. Prompt–Completion Format for LLMs
- Data is packaged in an LLM-ready prompt–completion format
- Suitable for Q&A generation, model training, and biomedical search models
Available JSON files:
- Adverse reactions in the study Prompts
- ADMET_Profiling_Core_Ingredients
- Molecular_Docking_Results
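To illustrate how prompt–completion data of this kind could feed a fine-tuning pipeline, here is a minimal sketch that validates records and serializes them as JSON Lines, one pair per line. It assumes each record is an object with `prompt` and `completion` keys. That key naming and the `to_jsonl` helper are hypothetical and should be checked against the actual files. The sample record is a placeholder, not dataset content.

```python
import json


def to_jsonl(records):
    """Serialize prompt-completion pairs as JSON Lines, one record per line.

    Assumes each record is a dict with "prompt" and "completion" keys;
    this naming is hypothetical and may not match the real files.
    """
    lines = []
    for i, rec in enumerate(records):
        missing = {"prompt", "completion"} - rec.keys()
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
        cleaned = {
            "prompt": rec["prompt"].strip(),
            "completion": rec["completion"].strip(),
        }
        # ensure_ascii=False keeps non-ASCII characters readable in the output
        lines.append(json.dumps(cleaned, ensure_ascii=False))
    return "\n".join(lines)


# Placeholder record for illustration only; not taken from the dataset.
sample = [{"prompt": " <question text> ", "completion": " <answer text> "}]
print(to_jsonl(sample))
```

Failing fast on a missing key is a deliberate choice here: silently skipping malformed pairs would make it harder to notice a schema mismatch before training.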
Why This Matters: Use Cases
AI Model Developers
Use these datasets to train models that detect adverse event risks, simulate drug–target interactions, or personalize chemotherapy.
Drug Discovery Teams
Leverage molecular docking and pathway data to identify new targets, repurpose herbal components, or design BMS-preventive co-therapies.
LLM Trainers (BioGPT, ChatDoctor, Med-PaLM, etc.)
Train or fine-tune large models using structured question–answer pairs or prompt-ready pharmacology knowledge.
Academic and Clinical Researchers
Accelerate evidence synthesis, hypothesis validation, or meta-analysis with structured datasets based on published literature.
Who Should Use This Data?
- 🔬 AI & HealthTech Startups
- 📚 Biomedical LLM Development Labs
- 🧪 Pharmaceutical & Drug Repurposing Companies
- 💻 Clinical NLP Engineers
- 🌍 Global Research Institutions & CROs
- 🛒 Data Marketplaces & Insight Platforms
Ready to Access the Data?
We provide the datasets in JSON format, complete with documentation and prompt samples.
👉 Visit: https://ieearc.com/
📩 Contact us for samples: contact@ieearc.com or ieearctechnologies@gmail.com
Final Thoughts
Biomedical innovation is no longer limited by computation — it’s limited by clean, curated data. Our cancer research datasets are designed to bridge this gap and accelerate progress in AI healthcare, LLM reasoning, and therapeutic modeling.
Let’s unlock insights together — one dataset at a time.
