WARNING: BNB_CUDA_VERSION=122 environment variable detected; loading libbitsandbytes_cuda122.so.
This can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
# Test prompts: Malay questions, translation tasks, and one grammar correction.
texts = [
    'kwsp itu apa?',
    "Apakah maksud perkataan 'integriti'?",
    "Terangkan perbezaan antara 'hormat' dan 'takut.'",
    "Translate the following English sentence to Malay: 'The cat is sitting on the table.'",
    "Translate the following Malay sentence to English: 'Saya suka makan nasi lemak.'",
    "Correct the following sentence: Me wants to go store.",
]

for text in texts:
    # Feed the raw prompt to the model with no chat template.
    inputs = tokenizer(text, return_tensors="pt")
    # Greedy decoding, at most 64 new tokens.
    outputs = model.generate(**inputs, max_new_tokens=64, pad_token_id=tokenizer.eos_token_id, do_sample=False)
    decoded = tokenizer.batch_decode(outputs)
    print('input:', text)
    print(decoded[0])
    print('***' * 40)
/home/sani/miniconda3/envs/mergekit/lib/python3.10/site-packages/transformers/generation/utils.py:1510: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
warnings.warn(
input: kwsp itu apa?
kwsp itu apa? (What is the name of the place?) (2) Apa nama kota itu? (What is the name of the city?) (3) Apa nama negara itu? (What is the name of the country?) (4) Apa nama pulau itu? (What is the name of the
************************************************************************************************************************
input: Apakah maksud perkataan 'integriti'?
Apakah maksud perkataan 'integriti'? (1 markah)
2. Apakah maksud perkataan 'kejujuran'? (1 markah)
3. Apakah maksud perkataan 'kebenaran'? (1 markah)
4. Apakah maksud perkataan 'keadilan'? (1 markah)
5
************************************************************************************************************************
input: Terangkan perbezaan antara 'hormat' dan 'takut.'
Terangkan perbezaan antara 'hormat' dan 'takut.'"
"Explain the difference between 'hormat' and 'takut.'"
"Menghormati" = "to respect"
"Menghormati" = "to respect"
"Menghormati" = "to respect"
"Menghorm
************************************************************************************************************************
input: Translate the following English sentence to Malay: 'The cat is sitting on the table.'
Translate the following English sentence to Malay: 'The cat is sitting on the table.' (Use the appropriate pronoun.)
* [2] Translate the following English sentence to Malay: 'The cat is sitting on the table.' (Use the appropriate pronoun.)
* [3] Translate the following English sentence to Malay: 'The cat is sitting on the table.' (Use the appropriate pronoun.)
*
************************************************************************************************************************
input: Translate the following Malay sentence to English: 'Saya suka makan nasi lemak.'
Translate the following Malay sentence to English: 'Saya suka makan nasi lemak.' (I like eating nasi lemak.)"
"""
# Your code here
return "I like eating nasi lemak."
<|end_of_text|>
************************************************************************************************************************
input: Correct the following sentence: Me wants to go store.
Correct the following sentence: Me wants to go store. The sentence is incorrect because the subject and verb do not agree. The subject is Me, which is singular. The verb is wants, which is plural. The sentence should be corrected to read: I want to go to the store. The subject and verb agree. The sentence is correct. The subject is I, which
************************************************************************************************************************
Basically, the base model is good at generating the next sentence, not at answering questions.
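The UserWarning in the run above says `input_ids` was on cpu while the model was on cuda. A minimal sketch of the fix the warning suggests is below. The helper name `move_batch_to_device` and the `FakeTensor` stand-in are hypothetical and exist only so the sketch runs without a GPU or torch installed.

```python
def move_batch_to_device(batch, device):
    """Return a plain dict with every value of `batch` moved to `device`.

    Works on any dict-like object whose values expose `.to(device)`.
    """
    return {name: value.to(device) for name, value in batch.items()}


class FakeTensor:
    """Hypothetical stand-in for a tensor; only tracks which device it is on."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        # Like a real tensor, return a new object instead of mutating in place.
        return FakeTensor(device)


batch = {"input_ids": FakeTensor(), "attention_mask": FakeTensor()}
moved = move_batch_to_device(batch, "cuda")
print({name: tensor.device for name, tensor in moved.items()})
```

In the notebook this would probably look like `inputs = move_batch_to_device(tokenizer(text, return_tensors="pt"), model.device)`, or more simply `tokenizer(text, return_tensors="pt").to(model.device)`, before calling `model.generate(**inputs, ...)`.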
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
for text in texts:
    # Wrap each prompt as a single user turn in the chat template.
    msg = [
        {"role": "user", "content": text},
    ]
    encodeds = tokenizer.apply_chat_template(msg, return_tensors="pt")
    outputs = model.generate(input_ids=encodeds.to("cuda"), max_new_tokens=64, pad_token_id=tokenizer.eos_token_id, do_sample=False)
    decoded = tokenizer.batch_decode(outputs)
    print('input:', text)
    # Keep only the text after the second header, up to the end-of-turn token.
    print(decoded[0].split("<|end_header_id|>")[2].split("<|eot_id|>")[0])
    print('***' * 40)
input: kwsp itu apa?
Kwsp is an abbreviation for "Kumpulan Wang Simpanan Pekerja" which is a Malay phrase that translates to "Employee Provident Fund" in English.
Kwsp is a mandatory savings plan in Malaysia that requires employers to contribute a certain percentage of their employees' salaries to
************************************************************************************************************************
input: Apakah maksud perkataan 'integriti'?
Perkataan "integriti" dalam bahasa Melayu bererti "kesucian" atau "kesederhanaan". Ia merujuk kepada sifat seseorang yang memiliki moral yang tinggi, jujur, dan tidak mudah terpengar
************************************************************************************************************************
input: Terangkan perbezaan antara 'hormat' dan 'takut.'
A great question!
In Malay, "hormat" and "takut" are two words that are often confused with each other, but they have distinct meanings.
**Hormat** (respect) refers to a feeling of admiration, esteem, or reverence for someone or something. It is
************************************************************************************************************************
input: Translate the following English sentence to Malay: 'The cat is sitting on the table.'
The translation of the sentence "The cat is sitting on the table" in Malay is:
"Kucing itu duduk di atas meja."
Here's a breakdown of the translation:
* "The cat" is translated to "Kucing itu", where "itu" is a pronoun meaning
************************************************************************************************************************
input: Translate the following Malay sentence to English: 'Saya suka makan nasi lemak.'
The translation of the Malay sentence "Saya suka makan nasi lemak" to English is:
"I like eating nasi lemak."
(Note: "Nasi lemak" is a popular Malaysian dish made with fragrant rice cooked in coconut milk, pandan leaves, and sometimes served
************************************************************************************************************************
input: Correct the following sentence: Me wants to go store.
A classic example of a sentence that needs some correction!
The corrected sentence would be:
"I want to go to the store."
Here's what changed:
* "Me" is a subject pronoun (e.g., "Me is going to the store"), but in this sentence, we need a first
************************************************************************************************************************
Sometimes Instruct replies in Malay, because it already knows the context; it was just taught to reply only in English. Basically, the base model already knows about Malaysia (nasi lemak, kucing, etc.), but during instruction tuning it was taught to reply in English only.
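The chat loop above pulls the reply out with `split("<|end_header_id|>")[2]`, which only works when the decoded string holds exactly one user turn before the assistant turn. A slightly more defensive sketch is below. The function name `extract_assistant_reply` is hypothetical, and `sample` is a hand-written string in the Llama 3 chat layout, not real model output.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"
END_OF_TURN = "<|eot_id|>"


def extract_assistant_reply(decoded: str) -> str:
    """Return the first assistant turn in a decoded Llama 3 chat string.

    Returns "" when there is no assistant header. If generation was cut off
    before <|eot_id|>, everything after the header is returned.
    """
    _, found, tail = decoded.partition(ASSISTANT_HEADER)
    if not found:
        return ""
    reply, _, _ = tail.partition(END_OF_TURN)
    return reply.strip()


# Hand-written example, shortened for illustration.
sample = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "kwsp itu apa?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    'KWSP adalah singkatan dari "Kumpulan Wang Simpanan Pekerja".<|eot_id|>'
)
print(extract_assistant_reply(sample))
```

Searching for the assistant header instead of counting headers means the helper keeps working if a system message is added to `msg` later.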
Time to show the difference, for Malay, between the fine-tuned base model and the instruct model transformed from that base.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
for text in texts:
    # Same raw-prompt loop as before, now run against the fine-tuned base model.
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64, pad_token_id=tokenizer.eos_token_id, do_sample=False)
    decoded = tokenizer.batch_decode(outputs)
    print('input:', text)
    print(decoded[0])
    print('***' * 40)
input: kwsp itu apa?
kwsp itu apa? (What is the name of the place?) (2) Apa nama kota itu? (What is the name of the city?) (3) Apa nama negara itu? (What is the name of the country?) (4) Apa nama pulau itu? (What is the name of the
************************************************************************************************************************
input: Apakah maksud perkataan 'integriti'?
Apakah maksud perkataan 'integriti'? (1 markah)
2. Apakah maksud perkataan 'kejujuran'? (1 markah)
3. Apakah maksud perkataan 'kebenaran'? (1 markah)
4. Apakah maksud perkataan 'keadilan'? (1 markah)
5
************************************************************************************************************************
input: Terangkan perbezaan antara 'hormat' dan 'takut.'
Terangkan perbezaan antara 'hormat' dan 'takut.'"
"Explain the difference between 'hormat' and 'takut.'"
"Menghormati" = "to respect"
"Menghormati" = "to respect"
"Menghormati" = "to respect"
"Menghorm
************************************************************************************************************************
input: Translate the following English sentence to Malay: 'The cat is sitting on the table.'
Translate the following English sentence to Malay: 'The cat is sitting on the table.' (Use the appropriate pronoun.)
* [2] Translate the following English sentence to Malay: 'The cat is sitting on the table.' (Use the appropriate pronoun.)
* [3] Translate the following English sentence to Malay: 'The cat is sitting on the table.' (Use the appropriate pronoun.)
*
************************************************************************************************************************
input: Translate the following Malay sentence to English: 'Saya suka makan nasi lemak.'
Translate the following Malay sentence to English: 'Saya suka makan nasi lemak.' (I like eating nasi lemak.)"
"""
# Your code here
return "I like eating nasi lemak."
<|end_of_text|>
************************************************************************************************************************
input: Correct the following sentence: Me wants to go store.
Correct the following sentence: Me wants to go store. The sentence is incorrect because the subject and verb do not agree. The subject is Me, which is singular. The verb is wants, which is plural. The correct sentence is: I want to go to the store. The subject is I, which is singular. The verb is want, which is singular. The subject
************************************************************************************************************************
The model above is the Llama 3 base fine-tuned on Malay. The output does not differ much, but if you inspect the next-token probabilities, they differ a lot.
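Greedy decoding (`do_sample=False` in the loops above) only ever emits the highest-probability token, so two models can print almost the same text while assigning quite different probabilities underneath. The sketch below illustrates that with toy numbers. The four-token vocabulary and both logit vectors are made up. In the notebook the logits would presumably come from something like `model(**inputs).logits[0, -1]` for each model.

```python
import math


def softmax(logits):
    """Turn raw logits into probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q has no zero entries where p is positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def top_token(probs, vocab):
    """Return the (token, probability) pair that greedy decoding would pick."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return vocab[best], probs[best]


# Hypothetical vocabulary and logits, for illustration only.
vocab = ["(", "KWSP", "adalah", "?"]
base_logits = [2.0, 1.8, 0.5, 0.1]
tuned_logits = [2.0, 1.9, 1.7, 0.1]

p_base = softmax(base_logits)
p_tuned = softmax(tuned_logits)

print("base picks :", top_token(p_base, vocab))
print("tuned picks:", top_token(p_tuned, vocab))
print("KL(base || tuned):", kl_divergence(p_base, p_tuned))
```

Both toy distributions pick the same top token, yet the divergence between them is non-zero, which is the pattern described above: same greedy output, different probabilities.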
from peft import PeftModel

# Load the "instruct_vector" adapter on top of the fine-tuned base model.
model = PeftModel.from_pretrained(model, "instruct_vector", device_map=device_map, quantization_config=nf4_config)

for text in texts:
    # Wrap each prompt as a single user turn in the chat template.
    msg = [
        {"role": "user", "content": text},
    ]
    encodeds = tokenizer.apply_chat_template(msg, return_tensors="pt")
    outputs = model.generate(input_ids=encodeds.to("cuda"), max_new_tokens=64, pad_token_id=tokenizer.eos_token_id, do_sample=False)
    decoded = tokenizer.batch_decode(outputs)
    print('input:', text)
    # Keep only the text after the second header, up to the end-of-turn token.
    print(decoded[0].split("<|end_header_id|>")[2].split("<|eot_id|>")[0])
    print('***' * 40)
input: kwsp itu apa?
KWSP adalah singkatan dari "Kumpulan Wang Simpanan Pekerja" atau "Employee Provident Fund" dalam bahasa Inggris. KWSP adalah sebuah institusi yang bertugas mengumpulkan dan menyimpan dana pensiun untuk pekerja Malaysia. Dana tersebut akan digunakan untuk
************************************************************************************************************************
input: Apakah maksud perkataan 'integriti'?
Perkataan 'integriti' dalam bahasa Melayu bermaksud 'kesucian' atau 'kesahihan'. Ia juga dapat dikaitkan dengan 'kesetiaan' atau 'kesetiaan kepada nilai atau prinsip tertentu'.
Dalam kont
************************************************************************************************************************
input: Terangkan perbezaan antara 'hormat' dan 'takut.'
"Hormat" dan "takut" adalah dua kata yang memiliki makna yang berbeza, walaupun mereka memiliki beberapa kesamaan.
"Hormat" adalah kata yang bermakna "penghormatan", "penghargaan", atau "penghormatan yang layak".
************************************************************************************************************************
input: Translate the following English sentence to Malay: 'The cat is sitting on the table.'
Here's the translation:
"Kucing itu duduk di atas meja."
Note: I've used the word "itu" to indicate the cat, since it's a specific cat being referred to. If you want to use a generic word for cat, you can use "kucing" alone.
************************************************************************************************************************
input: Translate the following Malay sentence to English: 'Saya suka makan nasi lemak.'
The Malay sentence "Saya suka makan nasi lemak" translates to "I like to eat nasi lemak." Nasi lemak is a popular dish in Malaysia and Singapore, consisting of rice cooked in coconut milk and served with a variety of accompaniments.
************************************************************************************************************************
input: Correct the following sentence: Me wants to go store.
The corrected sentence would be: I want to go to the store.
************************************************************************************************************************
Ta-da: the Llama 3 Instruct transformed from that base can now answer in Malay when asked in Malay. So far I only use this model for experiments; if anyone is interested, I will push it to Hugging Face.
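The notebook does not show how the `instruct_vector` adapter was built. One plausible reading, and it is only an assumption here, is task-vector arithmetic: take the weight difference between Instruct and base, then add it to the Malay fine-tuned base. The toy sketch below illustrates that idea on plain Python lists. All names and numbers are hypothetical, and this is not the author's actual procedure.

```python
def task_vector(instruct, base):
    """Per-parameter difference (instruct - base)."""
    return {
        name: [i - b for i, b in zip(instruct[name], base[name])]
        for name in base
    }


def apply_task_vector(target, vector, scale=1.0):
    """Add a (scaled) task vector to another set of weights."""
    return {
        name: [t + scale * v for t, v in zip(target[name], vector[name])]
        for name in target
    }


# Toy single-parameter "models" (hypothetical values).
base = {"layer.weight": [0.10, 0.20, 0.30]}
instruct = {"layer.weight": [0.15, 0.10, 0.30]}
malay_base = {"layer.weight": [0.12, 0.25, 0.28]}

vector = task_vector(instruct, base)
merged = apply_task_vector(malay_base, vector)
print(merged)
```

Under this assumption the merged weights keep what the Malay fine-tune changed and add the shift that instruction tuning introduced, which would fit the behaviour seen above: chat-style answers, now in Malay.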