Warning: Looks like you're using an outdated `kagglehub` version (installed: 0.3.10), please consider upgrading to the latest version (0.3.11).
Path to dataset files: ('/teamspace/studios/this_studio/.cache/kagglehub/datasets/rubanzasilva/symptoms-disease-no-id/versions/1',)
0    I have been experiencing a skin rash on my arms, legs, and torso for the past few weeks. It is red, itchy, and covered in dry, scaly patches.
1    My skin has been peeling, especially on my knees, elbows, and scalp. This peeling is often accompanied by a burning or stinging sensation.
2    I have been experiencing joint pain in my fingers, wrists, and knees. The pain is often achy and throbbing, and it gets worse when I move my joints.
3    There is a silver like dusting on my skin, especially on my lower back and scalp. This dusting is made up of small scales that flake off easily when I scratch them.
4    My nails have small dents or pits in them, and they often feel inflammatory and tender to the touch. Even there are minor rashes on my arms.
sd_df
      label      text
0     Psoriasis  I have been experiencing a skin rash on my arms, legs, and torso for the past few weeks. It is red, itchy, and covered in dry, scaly patches.
1     Psoriasis  My skin has been peeling, especially on my knees, elbows, and scalp. This peeling is often accompanied by a burning or stinging sensation.
2     Psoriasis  I have been experiencing joint pain in my fingers, wrists, and knees. The pain is often achy and throbbing, and it gets worse when I move my joints.
3     Psoriasis  There is a silver like dusting on my skin, especially on my lower back and scalp. This dusting is made up of small scales that flake off easily when I scratch them.
4     Psoriasis  My nails have small dents or pits in them, and they often feel inflammatory and tender to the touch. Even there are minor rashes on my arms.
...   ...        ...
1195  diabetes   I'm shaking and trembling all over. I've lost my sense of taste and smell, and I'm exhausted. I occasionally get palpitations or a speeding heart.
1196  diabetes   Particularly in the crevices of my skin, I have skin rashes and irritations. My skin bruises and cuts take a while to heal as well.
1197  diabetes   I regularly experience these intense urges and the want to urinate. I frequently feel drowsy and lost. I've also significantly lost my vision.
1198  diabetes   I have trouble breathing, especially outside. I start to feel hot and start to sweat. I frequently have urinary tract infections and yeast infections.
1199  diabetes   I constantly sneeze and have a dry cough. My infections don't seem to be healing, and I have palpitations. My throat does ache occasionally, but it usually gets better.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define the model name
model_name = "meta-llama/Llama-2-7b-hf"

# Initialize the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model with 8-bit quantization directly to GPU
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map=0,       # Use first GPU
    load_in_8bit=True,  # Use 8-bit quantization to reduce memory usage
)
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
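The warning above asks for a `BitsAndBytesConfig` passed through `quantization_config`. A minimal sketch of that form, assuming a recent `transformers` release with `bitsandbytes` installed; `QUANT_KWARGS` and `load_quantized` are illustrative names that do not appear in the notebook:

```python
# Quantization settings kept in a plain dict so they can be inspected
# without importing transformers. (Illustrative name, not from the notebook.)
QUANT_KWARGS = {"load_in_8bit": True}


def load_quantized(model_name, device_map=0):
    """Load a causal LM in 8-bit via BitsAndBytesConfig instead of load_in_8bit."""
    # Imported lazily so that defining this helper does not need the GPU stack.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(**QUANT_KWARGS)
    return AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map=device_map,             # 0 = first GPU, as in the cell above
        quantization_config=quant_config,  # replaces the deprecated keyword
    )


# Usage (needs a CUDA GPU and access to the gated Llama 2 weights):
# model = load_quantized("meta-llama/Llama-2-7b-hf")
```

Loading this way should behave the same as `load_in_8bit=True` while avoiding the deprecation message.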
# Define the symptoms text
symptoms = "I have been experiencing a severe headache for the last few days. It's worse in the mornings and associated with nausea and vomiting. I feel a bit lightheaded, and my vision is blurry at times."

# Create the prompt with clear instructions
prompt = f"Patient symptoms: {symptoms}\n\nTop 3 possible diagnoses with confidence levels:"

# Tokenize the input prompt
toks = tokenizer(prompt, return_tensors="pt")
# Generate prediction following Jeremy's implementation
# Move tokens to GPU, generate with max_new_tokens=40, then move result back to CPU
res = model.generate(**toks.to("cuda"), max_new_tokens=40).to('cpu')
# Decode the generated tokens to text
diagnosis = tokenizer.batch_decode(res)[0]

# Print the full output
print("Complete model output:")
print(diagnosis)
Complete model output:
<s> Patient symptoms: I have been experiencing a severe headache for the last few days. It's worse in the mornings and associated with nausea and vomiting. I feel a bit lightheaded, and my vision is blurry at times.
Top 3 possible diagnoses with confidence levels:
1. [Migraine (94% confidence)](https://en.wikipedia.org/wiki/Migraine)
2. [Sinusitis (92% confidence
# Basic parsing to extract just the generated diagnoses
# This might need adjustment based on the actual output format
marker = "\n\nTop 3 possible diagnoses with confidence levels:"
if marker in diagnosis:
    # Extract only the part after our prompt
    generated_text = diagnosis.split(marker)[1].strip()
else:
    # Marker not found: fall back to the full decoded text
    # (splitting on the prompt here would raise IndexError)
    generated_text = diagnosis.strip()

print("\nExtracted diagnoses:")
print(generated_text)
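Prompt stripping can also be written with `str.partition`, which never raises when the prompt is absent and makes it easy to drop the leading `<s>` marker that `batch_decode` leaves in the text. A minimal sketch; `extract_completion` is a hypothetical helper and the short prompt below is a toy string, not notebook data:

```python
def extract_completion(decoded, prompt, bos_token="<s>"):
    """Return only the text generated after the prompt."""
    text = decoded.strip()
    # Drop the beginning-of-sequence marker kept by batch_decode by default.
    if text.startswith(bos_token):
        text = text[len(bos_token):].lstrip()
    # partition never raises: `found` is empty when the prompt is absent.
    _, found, completion = text.partition(prompt)
    return completion.strip() if found else text


# Toy input shaped like the decoded output above.
prompt = "Patient symptoms: headache\n\nTop 3 possible diagnoses with confidence levels:"
decoded = "<s> " + prompt + "\n1. Migraine\n2. Tension headache"
print(extract_completion(decoded, prompt))
```

When the prompt is not found, the helper returns the decoded text unchanged (minus the `<s>` marker) rather than failing.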
# Try with sampling for more diverse outputs
res_with_sampling = model.generate(
    **toks.to("cuda"),
    max_new_tokens=40,
    do_sample=True,   # Enable sampling
    temperature=0.7,  # Control randomness (lower = more focused)
).to('cpu')

# Decode sampled response
diagnosis_with_sampling = tokenizer.batch_decode(res_with_sampling)[0]
print("\nOutput with sampling enabled:")
print(diagnosis_with_sampling)
Output with sampling enabled:
<s> Patient symptoms: I have been experiencing a severe headache for the last few days. It's worse in the mornings and associated with nausea and vomiting. I feel a bit lightheaded, and my vision is blurry at times.
Top 3 possible diagnoses with confidence levels:
1. Migraine
2. Tension headache
3. Cluster headache
My 2nd most confident diagnosis is migraine. I am not confident about my
# Multiple symptom descriptions
symptom_list = [
    "Persistent cough, fever of 101°F for 5 days, and fatigue.",
    "Red, itchy rash on face and arms, started after camping trip.",
    "Joint pain in fingers and wrists, worse in the morning, with stiffness.",
]

# Create prompts for each symptom description
prompts = [
    f"Patient symptoms: {s}\n\nTop 3 possible diagnoses with confidence levels:"
    for s in symptom_list
]

# Process each prompt
for prompt in prompts:
    # Tokenize
    toks = tokenizer(prompt, return_tensors="pt")

    # Generate (using Jeremy's style)
    res = model.generate(**toks.to("cuda"), max_new_tokens=40, do_sample=True).to('cpu')

    # Decode
    diagnosis = tokenizer.batch_decode(res)[0]

    # Print result
    print("\n" + "=" * 50)
    print(prompt)
    print("-" * 50)
    print(diagnosis.split(prompt)[1].strip() if prompt in diagnosis else diagnosis)
==================================================
Patient symptoms: Persistent cough, fever of 101°F for 5 days, and fatigue.
Top 3 possible diagnoses with confidence levels:
--------------------------------------------------
1. Influenza A
2. Influenza B
3. Influenza C
### 1. Influenza A
|
==================================================
Patient symptoms: Red, itchy rash on face and arms, started after camping trip.
Top 3 possible diagnoses with confidence levels:
--------------------------------------------------
1. Poison ivy
2. Shingles
3. Lyme disease
### 1. Poison ivy
- **Confidence level:**
==================================================
Patient symptoms: Joint pain in fingers and wrists, worse in the morning, with stiffness.
Top 3 possible diagnoses with confidence levels:
--------------------------------------------------
1. 20%: Bursitis
2. 20%: Carpal Tunnel Syndrome
3. 20%: Arthritis
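The base model formats its lists differently on every run (`1. Migraine`, `1. 20%: Bursitis`, `[Migraine (94% confidence)](...)`), and some lines are cut off by `max_new_tokens`. A rough sketch of a parser that tolerates the formats seen in these outputs; `parse_diagnoses` and its regexes are assumptions written for this illustration and will likely need adjusting for other outputs:

```python
import re

ITEM_RE = re.compile(r"^\s*(\d+)\.\s+(.*\S)\s*$")  # lines like "1. something"
PCT_RE = re.compile(r"(\d{1,3})\s*%")              # "94%" or "20 %"


def parse_diagnoses(completion):
    """Return [(name, confidence_or_None), ...] for numbered lines in the text."""
    results = []
    for line in completion.splitlines():
        match = ITEM_RE.match(line)
        if not match:
            continue  # skips headings, bullets, and free-text lines
        body = match.group(2)
        pct = PCT_RE.search(body)
        confidence = int(pct.group(1)) if pct else None
        body = re.sub(r"\]\(.*$", "", body)                 # markdown link target
        body = re.sub(r"\\strong\{([^}]*)\}", r"\1", body)  # \strong{...} wrapper
        body = PCT_RE.sub("", body)                         # the percentage itself
        body = re.sub(r"\bconfidence\b", "", body, flags=re.IGNORECASE)
        body = re.sub(r"\(\s*\d*\s*$", "", body)            # truncated "(40" tail
        results.append((body.strip(" []():"), confidence))
    return results


sample = "1. 20%: Bursitis\n2. 20%: Carpal Tunnel Syndrome\n3. 20%: Arthritis"
print(parse_diagnoses(sample))
```

A truncated percentage such as `(40` is reported as `None` rather than guessed.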
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import pandas as pd
import random

# Load your datasets
# sd_df = pd.read_csv('path/to/symptom_disease_no_id_col.csv')

# Load model and tokenizer
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map=0, load_in_8bit=True)

# Select a random symptom from your dataset to test
random_idx = random.randint(0, len(sd_df) - 1)
test_symptoms = sd_df.iloc[random_idx]['text']
actual_diagnosis = sd_df.iloc[random_idx]['label']

# Create a prompt with the symptoms
prompt = f"Patient symptoms: {test_symptoms}\n\nTop 3 possible diagnoses with confidence levels:"

# Tokenize the input
toks = tokenizer(prompt, return_tensors="pt")

# Generate prediction (Jeremy's style)
res = model.generate(**toks.to("cuda"), max_new_tokens=50, do_sample=True, temperature=0.7).to('cpu')

# Decode the response
prediction = tokenizer.batch_decode(res)[0]

# Print results
print(f"SYMPTOMS: {test_symptoms}")
print(f"ACTUAL DIAGNOSIS: {actual_diagnosis}")
print(f"MODEL PREDICTION:\n{prediction}")
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
SYMPTOMS: I am having some diarrhea and constipation, which has been quite concerning. In my stomach, there is a severe, painful ache. I'm constantly exhausted and don't feel like eating anything.
ACTUAL DIAGNOSIS: Typhoid
MODEL PREDICTION:
<s> Patient symptoms: I am having some diarrhea and constipation, which has been quite concerning. In my stomach, there is a severe, painful ache. I'm constantly exhausted and don't feel like eating anything.
Top 3 possible diagnoses with confidence levels:
1. \strong{Acute appendicitis} (80%)
2. \strong{Diverticulitis} (60%)
3. \strong{Ulcerative colitis} (40
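To go from eyeballing one random row to a number, the comparison between the dataset label and the generated text can be scripted. A crude sketch that treats a prediction as a hit when the label appears anywhere in the completion; `label_in_completion` and `hit_rate` are hypothetical helpers, and substring matching is an assumption that will miss synonyms and alternate spellings:

```python
def label_in_completion(label, completion):
    """Case-insensitive check: does the dataset label appear in the generated text?"""
    return label.strip().lower() in completion.lower()


def hit_rate(pairs):
    """Fraction of (label, completion) pairs where the label is mentioned."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(label_in_completion(label, text) for label, text in pairs)
    return hits / len(pairs)


# The completion printed above, checked against its dataset label.
completion = r"""1. \strong{Acute appendicitis} (80%)
2. \strong{Diverticulitis} (60%)
3. \strong{Ulcerative colitis} (40"""
print(label_in_completion("Typhoid", completion))
```

Feeding `hit_rate` one `(label, completion)` pair per sampled row of `sd_df` would give a rough top-3 mention rate for the untuned model.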