Structured Cancer Chemotherapy Adverse Events Datasets for LLM Training and Biomedical Innovation

In the era of data-driven science, access to structured, high-quality biomedical datasets can transform how we approach drug discovery, AI healthcare tools, and domain-specific LLM training. At IEEARC, we’ve curated a series of cancer research datasets that go beyond basic statistics — offering molecular targets, adverse events, pharmacological pathways, and herbal medicine insights such as those found in Danggui Buxue Decoction (DBD).

Whether you’re building an AI model for drug response prediction, training a medical LLM, or designing a clinical decision support tool, our data can serve as your foundation.

What We Offer: Dataset Highlights

1. Chemotherapy-Induced Adverse Events Dataset

Data from 209 cancer patients
Drug-wise adverse reactions ranked by severity (Oxaliplatin, Paclitaxel, Cisplatin)
Age and gender distribution, and the impact of combination versus single-drug regimens
Classification of systemic adverse events (e.g., blood, respiratory, immune)
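Severity rankings like these come from filtering graded event records and tallying them by organ system. Below is a minimal sketch of that step. It assumes one record per adverse event, with CTCAE-style integer grades in which Grade 3 and 4 are the severe tiers. The `AdverseEvent` layout and the `severe_by_system` helper are hypothetical and are not the dataset's actual schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AdverseEvent:
    # Hypothetical record layout; the dataset's real field names may differ.
    drug: str
    system: str  # organ-system category, e.g. "gastrointestinal"
    grade: int   # assumed CTCAE-style grade, 1 (mild) to 5


def severe_by_system(events: list[AdverseEvent], min_grade: int = 3) -> Counter:
    """Count events at or above min_grade (Grade 3 and 4 = severe) per system."""
    return Counter(e.system for e in events if e.grade >= min_grade)


if __name__ == "__main__":
    # Toy records for illustration only, not values from the dataset.
    sample = [
        AdverseEvent("Oxaliplatin", "gastrointestinal", 2),
        AdverseEvent("Paclitaxel", "respiratory, thoracic and mediastinal", 3),
        AdverseEvent("Cisplatin", "medical examination", 4),
        AdverseEvent("Cisplatin", "medical examination", 3),
    ]
    # Systems ordered by number of severe events, highest first.
    print(severe_by_system(sample).most_common())
```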

2. Herbal Compound Analysis: Danggui Buxue Decoction (DBD)

Active ingredients: Quercetin, Kaempferol, Isorhamnetin, Formononetin
Targets: PTGS2, PTGS1, NOS2, RXRA, PRKACA
Mechanisms: PI3K-AKT, TNF, MAPK, and IL-17 signaling
ADMET profiling: Toxicity, BBB penetration, CYP inhibition, hepatotoxicity status

3. Molecular Docking and Target Binding Affinity

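Binding energies between core ingredients and bone marrow suppression (BMS)-related target proteins
Kaempferol-PTGS2: −10.08 kJ/mol
Quercetin with NOS2 and PRKACA: binding energies stronger than −9.0 kJ/mol (absolute value above 9.0)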
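Here is a minimal sketch of screening docking results with a binding-energy cutoff. It assumes the figures above are in kJ/mol, as reported, and that lower (more negative) energies indicate stronger predicted binding. The cutoff of −9.0 kJ/mol matches the magnitude quoted for quercetin. The `DockingResult` class and the helper names are hypothetical. A kJ-to-kcal conversion is included because many docking tools, AutoDock Vina among them, report scores in kcal/mol, so check which unit a given results file uses before comparing values.

```python
from dataclasses import dataclass

KJ_PER_KCAL = 4.184  # thermochemical calorie


@dataclass(frozen=True)
class DockingResult:
    # Hypothetical record layout for one ligand-target docking pose.
    ligand: str
    target: str
    energy_kj_mol: float  # more negative = stronger predicted binding


def kj_to_kcal(energy_kj: float) -> float:
    """Convert an energy from kJ/mol to kcal/mol."""
    return energy_kj / KJ_PER_KCAL


def strong_binders(results: list[DockingResult], threshold_kj_mol: float) -> list[DockingResult]:
    """Return results at or below the threshold, strongest (most negative) first."""
    hits = [r for r in results if r.energy_kj_mol <= threshold_kj_mol]
    return sorted(hits, key=lambda r: r.energy_kj_mol)


if __name__ == "__main__":
    # Only the Kaempferol-PTGS2 value is given exactly in the text above.
    results = [DockingResult("Kaempferol", "PTGS2", -10.08)]
    for r in strong_binders(results, threshold_kj_mol=-9.0):
        print(f"{r.ligand}-{r.target}: {r.energy_kj_mol} kJ/mol "
              f"({kj_to_kcal(r.energy_kj_mol):.2f} kcal/mol)")
```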

4. Functional Enrichment (GO + KEGG)

108 gene targets analyzed
570 enriched GO biological process terms and 151 enriched KEGG pathways
Includes cancer pathways, inflammation, immunity, and hematopoiesis-related pathways
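GO and KEGG enrichment tools commonly score each term with a one-sided hypergeometric (over-representation) test. They then correct for the fact that hundreds of terms are tested at once. The sketch below shows that standard calculation in plain Python. It is an assumption that the enrichment behind these figures used this exact test together with the Benjamini–Hochberg correction. The function names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= k).

    N: genes in the background universe
    K: background genes annotated to the term or pathway
    n: genes in the query list (e.g. the 108 targets)
    k: query genes annotated to the term or pathway
    """
    total = comb(N, n)
    # Sum P(X = i) for every overlap size i from k up to the largest possible.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In practice, a term is reported as enriched when its adjusted p-value falls below a chosen false-discovery threshold. The threshold itself is a study-level choice.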

5. Prompt–Completion Format for LLMs

Data are packaged in an LLM-ready prompt–completion format
Suitable for Q&A generation, AI model training, and biomedical search models
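Many fine-tuning pipelines expect prompt–completion pairs as JSON Lines, with one JSON object per line. Below is a minimal sketch of validating and serializing such pairs. It assumes records with string `prompt` and `completion` fields, as in this dataset's samples. The helper names are hypothetical. Check the exact field names a given training framework expects against its own documentation.

```python
import json
from typing import Iterable


def validate_pair(record: dict) -> dict:
    """Check for non-empty string 'prompt' and 'completion' fields; drop other keys."""
    for key in ("prompt", "completion"):
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"record is missing a non-empty {key!r}: {record!r}")
    return {"prompt": record["prompt"], "completion": record["completion"]}


def to_jsonl(records: Iterable[dict]) -> str:
    """Serialize validated records as JSON Lines (one object per line)."""
    lines = (json.dumps(validate_pair(r), ensure_ascii=False) for r in records)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [{
        "prompt": "Which system had the highest number of Grade 3 and 4 severe "
                  "adverse reactions in cancer patients?",
        "completion": "The medical examination system had the highest number of "
                      "Grade 3 and 4 adverse reactions, with 33 cases.",
    }]
    print(to_jsonl(sample), end="")
```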

Sample: adverse reactions in the study (prompts, JSON)

[
  {
    "prompt": "Which system had the highest number of Grade 3 and 4 severe adverse reactions in cancer patients?",
    "completion": "The medical examination system had the highest number of Grade 3 and 4 adverse reactions, with 33 cases."
  },
  {
    "prompt": "What percentage of severe adverse reactions involved the medical examination category?",
    "completion": "43.42% of severe adverse reactions were related to the medical examination category."
  },
  {
    "prompt": "What are the common clinical manifestations associated with medical examination-related adverse reactions?",
    "completion": "Febrile neutropenia, decreased white blood cell count, decreased neutrophil count, decreased platelet count, increased alanine aminotransferase, elevated serum creatinine, elevated serum bilirubin, neutropenia, and weight gain."
  },
  {
    "prompt": "Which organ system had the second highest number of adverse reaction cases?",
    "completion": "The respiratory, thoracic, and mediastinal system had the second-highest number, with 56 cases."
  },
  {
    "prompt": "What are the severe symptoms observed in respiratory system-related adverse events?",
    "completion": "Dyspnea, laryngeal edema, abnormal pulse, bronchospasm, and bronchial stenosis."
  },
  {
    "prompt": "How many Grade 3 and 4 events were observed in gastrointestinal adverse reactions?",
    "completion": "There were 7 Grade 3 and 4 events among the gastrointestinal adverse reactions."
  }
]
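The first two samples can be cross-checked against each other. If 33 Grade 3 and 4 cases make up 43.42% of severe reactions, the total is about 76 severe reactions. Below is a minimal sketch of that back-calculation. It assumes the percentage is a simple count ratio rounded to two decimals. The helper names are hypothetical.

```python
def implied_total(count: int, percent: float) -> int:
    """Back out the denominator implied by 'count is percent% of the total'."""
    return round(count * 100 / percent)


def percent_of(count: int, total: int, ndigits: int = 2) -> float:
    """Share of total as a percentage, rounded like the reported figure."""
    return round(100 * count / total, ndigits)


total_severe = implied_total(33, 43.42)
print(total_severe)                  # 76
print(percent_of(33, total_severe))  # 43.42
```

If the recomputed percentage did not round back to the reported value, that would point to a transcription error in one of the two records.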

Sample: ADMET_Profiling_Core_Ingredients / Molecular_Docking_Results (JSON excerpt)

{
  "ADMET_Profiling_Core_Ingredients": {
    "ingredients": [
      "Quercetin",
      "Kaempferol",
      "Isorhamnetin",
      "Formononetin",
      "7-O-methylisomucronulatol"
    ],
    "GI_absorption": {
      "Quercetin": "High",
      "Kaempferol": "High",
      "Isorhamnetin": "High",
      "Formononetin": "High",
      "7-O-methylisomucronulatol": "High"
    },
    "BBB_permeant": {
      "Quercetin": "No",
      "Kaempferol": "No",
      "Isorhamnetin": "No",
      "Formononetin": "Yes",
      "7-O-methylisomucronulatol": "Yes"
    },
    "P-gp_substrate": {
      "Quercetin": "Yes",
      "Kaempferol": "No",
      "Isorhamnetin": "No",
      "Formononetin": "No",
      "7-O-methylisomucronulatol": "No"
    },
    "CYP1A2_inhibitor": {
      "Quercetin": "Yes",
      "Kaempferol": "Yes",
      "Isorhamnetin": "Yes",
      "Formononetin": "Yes",
      "7-O-methylisomucronulatol": "Yes"
    },
    "CYP2C19_inhibitor": {
      "Quercetin": "No",
      "Kaempferol": "No",
      "Isorhamnetin": "No",
      "Formononetin": "No",
      "7-O-methylisomucronulatol": "No"
    },
    "CYP2C9_inhibitor": {
      "Quercetin": "No",
      "Kaempferol": "No",
      "Isorhamnetin": "No",
      "Formononetin": "No",
      "7-O-methylisomucronulatol": "No"
    },
    "CYP2D6_inhibitor": {
      "Quercetin": "Yes",
      "Kaempferol": "Yes",
      "Isorhamnetin": "Yes",
      "Formononetin": "Yes",
      "7-O-methylisomucronulatol": "Yes"
    }
  }
}
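The ADMET sample stores each property as an ingredient-to-value table. Below is a minimal sketch, assuming that layout, of two operations: listing which ingredients carry a given flag, and pivoting the tables into one row per ingredient. The helper names are hypothetical. The embedded JSON is a trimmed excerpt of the sample; a full file would normally be read with `json.load`.

```python
import json

# Trimmed excerpt of the ADMET sample (three properties kept for brevity).
ADMET_JSON = """
{
  "ADMET_Profiling_Core_Ingredients": {
    "ingredients": ["Quercetin", "Kaempferol", "Isorhamnetin",
                    "Formononetin", "7-O-methylisomucronulatol"],
    "BBB_permeant": {"Quercetin": "No", "Kaempferol": "No", "Isorhamnetin": "No",
                     "Formononetin": "Yes", "7-O-methylisomucronulatol": "Yes"},
    "P-gp_substrate": {"Quercetin": "Yes", "Kaempferol": "No", "Isorhamnetin": "No",
                       "Formononetin": "No", "7-O-methylisomucronulatol": "No"},
    "CYP2C9_inhibitor": {"Quercetin": "No", "Kaempferol": "No", "Isorhamnetin": "No",
                         "Formononetin": "No", "7-O-methylisomucronulatol": "No"}
  }
}
"""


def flagged(profile: dict, prop: str, value: str = "Yes") -> list[str]:
    """Ingredients whose entry for `prop` equals `value`, in listed order."""
    table = profile[prop]
    return [name for name in profile["ingredients"] if table.get(name) == value]


def per_ingredient(profile: dict) -> dict[str, dict[str, str]]:
    """Pivot property -> ingredient tables into ingredient -> property rows."""
    props = [key for key in profile if key != "ingredients"]
    return {name: {p: profile[p][name] for p in props} for name in profile["ingredients"]}


profile = json.loads(ADMET_JSON)["ADMET_Profiling_Core_Ingredients"]
print(flagged(profile, "BBB_permeant"))    # ['Formononetin', '7-O-methylisomucronulatol']
print(flagged(profile, "P-gp_substrate"))  # ['Quercetin']
print(per_ingredient(profile)["Quercetin"])
```

The per-ingredient view is convenient when each compound should become one training example or one row in a feature table.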

Why This Matters: Use Cases

AI Model Developers
Use these datasets to train models that detect adverse event risks, simulate drug–target interactions, or personalize chemotherapy.

Drug Discovery Teams
Leverage molecular docking and pathway data to identify new targets, repurpose herbal components, or design BMS-preventive co-therapies.

LLM Trainers (BioGPT, ChatDoctor, Med-PaLM, etc.)
Train or fine-tune large models using structured question–answer pairs or prompt-ready pharmacology knowledge.

Academic and Clinical Researchers
Accelerate evidence synthesis, hypothesis validation, or meta-analysis with structured datasets based on published literature.

Who Should Use This Data?

🔬 AI & HealthTech Startups
📚 Biomedical LLM Development Labs
🧪 Pharmaceutical & Drug Repurposing Companies
💻 Clinical NLP Engineers
🌍 Global Research Institutions & CROs
🛒 Data Marketplaces & Insight Platforms

Ready to Access the Data?

We provide datasets in JSON format, complete with documentation and prompt samples.

👉 Visit: https://ieearc.com/
📩 Contact us for samples: contact@ieearc.com or ieearctechnologies@gmail.com

Final Thoughts

Biomedical innovation is no longer limited by computation; it is limited by access to clean, curated data. Our cancer research datasets are designed to close this gap and accelerate progress in AI healthcare, LLM reasoning, and therapeutic modeling.

Let’s unlock insights together — one dataset at a time.