VLM Quantization and custom_mm Datasets#
llmc currently supports calibrating and quantizing VLM models using custom_mm datasets.
VLM Quantization#
The currently supported models are:
llava
intervl2
llama3.2
qwen2vl
More VLM models are under development.
Here is an example configuration. You can refer to the Calibration Dataset Template on GitHub.:
model:
type: Llava
path: model path
tokenizer_mode: slow
torch_dtype: auto
calib:
name: custom_mm
download: False
path: calib data path
apply_chat_template: True
add_answer: True # Defalut is False. If set it to Ture, calib data will add answers.
n_samples: 8
bs: -1
seq_len: 512
padding: True
custom_mm datatsets#
The format of the custom_mm dataset is as follows:
img_txt-datasets/
├── images/
│ ├── image1.jpg
│ ├── image2.jpg
│ ├── image3.jpg
│ └── ... (other images)
└── img_qa.json
Example format of img_qa.json:
[
{
"image": "images/0a3035bfca2ab920.jpg",
"question": "Is this an image of Ortigia? Please answer yes or no.",
"answer": "Yes"
},
{
"image": "images/0a3035bfca2ab920.jpg",
"question": "Is this an image of Montmayeur castle? Please answer yes or no.",
"answer": "No"
},
{
"image": "images/0ab2ed007db301d5.jpg",
"question": "Is this a picture of Highgate Cemetery? Please answer yes or no.",
"answer": "Yes"
}
]
The “answer” field is optional. The custom_mm dataset can include calibration data that contains only text (except for llama3.2).
VLM Evaluation#
LLMC integrates lmms-eval for evaluations on various downstream datasets. In the config’s eval section, the type should be specified as “vqa”, and the downstream evaluation datasets in the name should follow the standards set by lmms-eval.
eval:
type: vqa
name: [mme] # vqav2, gqa, vizwiz_vqa, scienceqa, textvqa