Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

Haoning Wu, Zicheng Zhang, Weixia Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Yixuan Gao, Annan Wang, Erli Zhang, Wenxiu Sun, Qiong Yan, Xiongkuo Min, Guangtao Zhai, Weisi Lin

Nanyang Technological University · Shanghai Jiao Tong University · SenseTime Research

ICML 2024

Presenter: Hao-Ting Li (李皓庭)
Link: https://aquastripe.github.io/slides/2024/Q-Align/

Introduction

  • Objective:
    • Teach Large Multi-Modality Models (LMMs) for visual rating using discrete text-defined levels.
    • Unify Image Quality Assessment (IQA), Image Aesthetic Assessment (IAA), and Video Quality Assessment (VQA) into a single model.

The Q-Align in Comparison with Its Baseline and Existing SOTA

[Figure: the Q-Align in comparison with its baseline and existing SOTA]

Limitations of Existing Methods: Traditional Methods

  • Traditional visual scoring models often struggle with generalization, especially on out-of-distribution (OOD) data.
  • They usually suffer compromised performance when handling different scoring scenarios together (e.g., mixing multiple datasets), making it challenging to train one unified model for different situations.
  • Human raters typically use discrete text-defined levels (e.g., excellent, good) instead of exact scores, which are challenging for models to replicate.

Limitations of Existing Methods: LMMs

  • Large multi-modality models (LMMs) can understand high-level visual contents well (Liu et al., 2023a; Ye et al., 2023a), effectively perceive low-level visual attributes (Zhang et al., 2023a), and, more importantly, possess reasoning ability benefiting from their strong language decoders (Liu et al., 2023c).
  • While these abilities have proven fundamental to a more accurate and robust visual scorer, existing studies (Wu et al., 2023e) show that LMMs still fall short of predicting scores that are consistent with human preferences. Therefore, in our study, we investigate the important last mile for them:
    How to teach LMMs to predict scores aligned with human preferences?

Observations

  • As observed in recent explorations (Wu et al., 2023e), LMMs show behaviour patterns similar to humans when instructed to score:
    • Text-Defined Levels:
      • LMMs prefer responding with levels like "good" or "poor".
    • Numerical Scores:
      • Accuracy is significantly lower when LMMs are asked to predict numerical scores.
  • Therefore, it might not be optimal to directly tune LMMs to output scores.

Contributions

  • An effective syllabus to teach LMMs to score:
    • Emulating the human rating process.
    • Using discrete text-defined levels for training.
  • A family of more capable visual assessors (Q-Align):
    • The proposed Q-Align achieves state-of-the-art accuracy and generalization ability on multiple visual assessing tasks.
    • It also achieves competitive performance with less data and converges in fewer training iterations.
  • A unified model for visual scoring (OneAlign):
    • With IQA, IAA, and VQA each effectively learned under the same structure, we further propose OneAlign, which unifies all three tasks in one model.
    • We hope this may open a new paradigm for visual scoring tasks.

Related Works: Image Quality Assessment (IQA)

Image quality assessment (IQA) mainly focuses on the impact of distortions and other quality issues in images on human perception.

  1. Hand-crafted features: built on prior statistical knowledge (Wang et al., 2004; Mittal et al., 2012; 2013).
  2. Data-driven end-to-end deep neural networks: NIMA (Talebi & Milanfar, 2018), DBCNN (Zhang et al., 2020), and HyperIQA (Su et al., 2020).
  3. Transformers: MUSIQ (Ke et al., 2021)
  4. Vision-language models: CLIP (Radford et al., 2021). CLIP-IQA+ (Wang et al., 2022) designs a few-shot learning scheme via CoOp (Zhou et al., 2022), and LIQE (Zhang et al., 2023b) further develops a multi-task learning scheme based on CLIP.

Related Works: Image Quality Assessment (IQA)

  • Nevertheless, these methods typically rely on visual-text similarity to predict quality scores, which leaves their performance slightly inferior to that of pure visual methods.
  • In contrast, the proposed Q-Align significantly advances the state of the art on IQA, while simultaneously further improving OOD generalization ability.

Related Works: Image Aesthetic Assessment (IAA)

In comparison with IQA, image aesthetic assessment (IAA) (Murray et al., 2012) is a more complicated visual scoring task. While visual quality also influences visual aesthetics, higher-level visual attributes such as content, lighting, color, and composition (Kong et al., 2016) are considered more important for IAA.

  1. Deep-neural-network-based methods: NIMA and MLSP (Hosu et al., 2019).
  2. VILA (Ke et al., 2023) advances IAA performance by learning vision-language correspondence between images and aesthetic comments (Ghosal et al., 2019) through joint contrastive and captioning pre-training (Yu et al., 2022).

Based on LMMs with rich prior knowledge, the proposed Q-Align can remarkably outperform CLIP-based approaches without extra pre-training.

Related Works: Video Quality Assessment (VQA)

Video quality assessment (VQA) also has a complicated focus: several studies have found that scores are affected not only by quality issues, but also by contents (Li et al., 2019) and even aesthetics (Wu et al., 2023d).

  1. Traditional approaches to VQA are typically based on handcrafted features: TLVQM (Korhonen, 2019), VIDEVAL (Tu et al., 2021a), and RAPIQUE (Tu et al., 2021b).
  2. Deep-learning-based methods: VSFA (Li et al., 2019), BVQA (Li et al., 2022), DisCoVQA (Wu et al.), SimpleVQA (Sun et al., 2022), FASTVQA (Wu et al., 2022; 2023a)

Nevertheless, while the goal of VQA is similar to that of IQA (or IAA), the need to input videos has hindered methods from tackling this task with the same model structure as image scoring approaches.

Related Works: Video Quality Assessment (VQA)

A typical example is CLIP-based attempts: since CLIP is image-based, it can achieve good zero-shot VQA capability through frame-by-frame inference (Wu et al., 2023b), but training CLIP-based methods on VQA datasets is extremely challenging (Wu et al., 2023c) and performs worse than specially-designed VQA models.

In the proposed Q-Align, we utilize the language decoder to assemble videos as sequences of frames, so as to unify VQA with IQA/IAA under one structure, outperforming complicated specifically-designed architectures.

Related Works: LMMs for Visual Scoring

Some recent investigations have discussed the possibilities for adopting Large Multimodality Models (LMMs) for visual scoring.

  1. Q-Bench (Wu et al., 2023e) proposes a binary softmax strategy, enabling LMMs to predict quantifiable quality scores by extracting the softmax pooling result on the logits of two frequent tokens (good/poor); see the sketch after this list.
  2. Based on this strategy, Q-Instruct (Wu et al., 2023f) observes that fine-tuning with text question-answering on related low-level queries can also improve the visual scoring abilities of LMMs.
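To make item 1 concrete, the binary softmax strategy reduces to a two-token softmax over the logits of "good" and "poor". A minimal sketch (the function and argument names are ours, not Q-Bench's code):

```python
import math

# A minimal sketch of Q-Bench's binary softmax strategy: a quality score in (0, 1)
# computed from the LMM's logits for the two frequent tokens "good" and "poor".
def binary_softmax_score(logit_good: float, logit_poor: float) -> float:
    return math.exp(logit_good) / (math.exp(logit_good) + math.exp(logit_poor))

print(binary_softmax_score(2.1, -0.5))  # ≈ 0.93, i.e. strongly toward "good"
```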

Related Works: LMMs for Visual Scoring

  • Given insights from these studies, we design the Q-Align syllabus to systematically emulate the human rating and post-processing in visual scoring.
  • Moreover, we demonstrate that the binary softmax strategy in Q-Bench is a simplified version of the process by which MOS values are collected from human ratings.
  • Our experiments prove that with appropriate alignment strategies, LMMs can be more capable and robust visual scorers with the same (and even less) data used.

Methodology: Insights

  • How do humans rate?
  • How do LMMs rate?

Insight 1: How Do Humans Rate?

[Figure: how humans rate]

The Q-Align Syllabus

[Figure: the Q-Align syllabus]

Insight 2: How Do LMMs Rate?

  • Since LMMs are fundamentally designed to understand and generate human-like text, they should in theory share similar behaviour patterns with humans.
  • To validate this, we prompt five LMMs with the following instruction and count their response statistics: <img> Rate the quality of the image.

Insight 2: How Do LMMs Rate?

[Figure: response statistics of LMMs when instructed to rate]

  • Before specific alignment, LMMs predominantly respond with qualitative adjectives.
  • Thus, if we use scores as the learning objective for LMMs, they need to first formally learn to output scores, and then learn how to score accurately.
  • To avoid this additional formatting cost, we choose rating levels instead as the targets of Q-Align.

Conversion between Rating Levels and Scores

  • Training
    • scores → rating levels
  • Inference
    • rating levels → scores

Training: Scores → Rating Levels

  • Equidistant Interval Partition
    • Convert scores to levels using equidistant intervals:

      $$L(s) = l_i, \quad \text{if } \; m + \frac{i-1}{5}(M-m) < s \le m + \frac{i}{5}(M-m)$$

      where $\{l_i\}_{i=1}^{5} = \{\text{bad}, \text{poor}, \text{fair}, \text{good}, \text{excellent}\}$ are the standard text rating levels as defined by ITU (ITU, 2000), and $M$ and $m$ denote the maximum and minimum of the score range.
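A minimal Python sketch of this equidistant partition (the helper name and the 0–100 example range are illustrative, not from the paper):

```python
import math

# Minimal sketch of the equidistant score-to-level conversion (Eq. 1).
LEVELS = ["bad", "poor", "fair", "good", "excellent"]  # ITU text rating levels l_1..l_5

def score_to_level(score: float, s_min: float, s_max: float) -> str:
    """Map a MOS in [s_min, s_max] onto one of the five equidistant text levels."""
    width = (s_max - s_min) / len(LEVELS)   # width of each of the 5 intervals
    i = math.ceil((score - s_min) / width)  # interval index per Eq. 1
    i = min(max(i, 1), len(LEVELS))         # clamp the two boundary cases
    return LEVELS[i - 1]

print(score_to_level(72, 0, 100))  # a MOS of 72 on a 0-100 scale -> "good"
```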

Training: Scores → Rating Levels

  • Precision of the Conversion
[Figure: precision of the score-to-level conversion]
    • Considering that MOS values inherently carry some randomness within this precision range, we believe the converted rating levels are sufficiently accurate to serve as training labels.

Inference: Rating Levels → Scores

  • After training, we need to convert the rating levels back to scores.
  • Primarily, simulating the post-processing on human ratings (Fig. 3 right), we first define the reverse mapping from text-defined rating levels back to scores, as follows:

    $$G(l_i) = i, \qquad \text{i.e. } \{\text{bad} \to 1,\ \text{poor} \to 2,\ \text{fair} \to 3,\ \text{good} \to 4,\ \text{excellent} \to 5\}$$
Inference: Rating Levels → Scores

  • For human opinion collection (Type 1), the MOS values are calculated via the weighted average of the converted scores $G(l_i)$ and the frequencies $f_{l_i}$ of each level:

    $$\mathrm{MOS} = \sum_{i=1}^{5} f_{l_i}\, G(l_i)$$
Inference: Rating Levels → Scores

  • For LMMs, we substitute the frequencies $f_{l_i}$ with the LMM-predicted probabilities for each rating level.
  • Given that the prediction at the <LEVEL> token is a probability distribution (with logits denoted as $\mathcal{X}$) over all possible tokens of the language model, we conduct a close-set softmax on $\{\mathcal{X}_{l_i}\}_{i=1}^{5}$ to get the probabilities $p_{l_i}$ for each level, which sum to 1:

    $$p_{l_i} = \frac{e^{\mathcal{X}_{l_i}}}{\sum_{j=1}^{5} e^{\mathcal{X}_{l_j}}}$$

    and the final predicted score of the LMM is denoted as:

    $$S_{\mathrm{LMM}} = \sum_{i=1}^{5} p_{l_i}\, G(l_i) = \sum_{i=1}^{5} i \cdot \frac{e^{\mathcal{X}_{l_i}}}{\sum_{j=1}^{5} e^{\mathcal{X}_{l_j}}}$$
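A minimal Python sketch of this inference conversion, assuming we can read the logits the LMM assigns to the five level tokens at the <LEVEL> position; names are illustrative:

```python
import math

LEVELS = ["bad", "poor", "fair", "good", "excellent"]  # G(l_i) = i, for i = 1..5

def levels_to_score(level_logits: dict) -> float:
    """Close-set softmax over the five level tokens, then the expectation of G(l_i)."""
    exps = {l: math.exp(level_logits[l]) for l in LEVELS}
    z = sum(exps.values())
    probs = {l: e / z for l, e in exps.items()}                   # p_{l_i}, sums to 1
    return sum((i + 1) * probs[l] for i, l in enumerate(LEVELS))  # S_LMM in [1, 5]

# Hypothetical logits with most mass on "good": the score lands between 3 and 5.
print(levels_to_score({"bad": -2.0, "poor": -1.0, "fair": 0.5, "good": 2.0, "excellent": 1.0}))
```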

Inference: Rating Levels → Scores

  • The inference conversion is theoretically equivalent to the MOS collection process from a set of human ratings in levels.
  • Moreover, it is the general expression form of the binary softmax strategy proposed by Wu et al. (2023e), $S_{\text{Q-Bench}} = \frac{e^{\mathcal{X}_{\text{good}}}}{e^{\mathcal{X}_{\text{good}}} + e^{\mathcal{X}_{\text{poor}}}}$, which can be considered a simplified version of Eq. 4 with only two rating levels.

Model Structure

[Figure: model structure of Q-Align (Fig. 4)]

Model Structure

  • The model structure of Q-Align (Fig. 4) is based on the recently published open-source LMM mPLUG-Owl2 (Ye et al., 2023b), which has demonstrated exceptional visual perception ability as well as good language understanding ability.
  • In the adopted structure, besides the visual encoder that converts images into embeddings, an additional visual abstractor further reduces the number of tokens per image (from 1,024 to 64). Under the context length (2,048) for LLaMA 2 (Touvron et al., 2023), we can feed as many as 30 images (vs. 2 without the abstractor) together during supervised fine-tuning (SFT); a quick sanity check of this budget follows below. This allows us to input a video as a sequence of images to the LMM, and to unify image (IQA, IAA) and video (VQA) scoring tasks under one structure.
  • Q-Align uses the common GPT (Radford et al., 2019) loss, i.e., cross-entropy between labels and output logits.
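A quick sanity check of the frame budget above, assuming the quoted figures (1,024 raw vs. 64 abstracted tokens per image, 2,048-token context); this arithmetic is ours, not from the paper:

```python
# Rough token-budget arithmetic for multi-image input (assumed figures, see text).
RAW_TOKENS_PER_IMAGE = 1024        # visual encoder output, without the abstractor
ABSTRACTED_TOKENS_PER_IMAGE = 64   # after the visual abstractor
CONTEXT_LENGTH = 2048              # context length assumed above

print(CONTEXT_LENGTH // RAW_TOKENS_PER_IMAGE)         # 2  -> only 2 images fit without the abstractor
print(CONTEXT_LENGTH // ABSTRACTED_TOKENS_PER_IMAGE)  # 32 -> ~30 frames fit, leaving room for the text prompt
```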

Examples of Conversation Formats

  • Image Quality Assessment (IQA)
    #User: <img> Can you evaluate the quality of the image? 
    #Assistant: The quality of the image is <level>.
    
  • Image Aesthetic Assessment (IAA)
    #User: <img> How is the aesthetics of the image? 
    #Assistant: The aesthetics of the image is <level>.
    
  • Video Quality Assessment (VQA)
    #User: <img> Rate the quality of the video. 
    #Assistant: The quality of the video is <level>.
    

The user queries are randomly chosen from a group of paraphrases as an augmentation. Following Zheng et al. (2023), only the LMM responses (after #Assistant:) are supervised.
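A minimal sketch of how response-only supervision could be implemented, assuming a Hugging Face-style tokenizer; the IGNORE_INDEX convention and the helper below are illustrative, not the authors' code:

```python
IGNORE_INDEX = -100  # tokens with this label are excluded from the cross-entropy loss

def build_iqa_sample(tokenizer, level: str):
    """Tokenize one IQA conversation; supervise only the assistant response."""
    # In practice the <img> placeholder is replaced by visual tokens from the encoder.
    prompt = ("#User: <img> Can you evaluate the quality of the image?\n"
              "#Assistant: ")
    response = f"The quality of the image is {level}."
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]
    input_ids = prompt_ids + response_ids
    # Mask the user query so the GPT loss is computed only on the response tokens.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return {"input_ids": input_ids, "labels": labels}
```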

Experiments

  • Experimental Settings
  • Datasets
  • Results on Individual Tasks
  • The OneAlign
  • Cost Analysis
  • Ablation Studies

Experimental Setup

  • Training Details:
    • Fine-tune from the pre-trained weights of mPLUG-Owl2 (Ye et al., 2023b)
    • Batch sizes: 64 for IQA/VQA, 128 for AVA.
    • Learning rate:
    • Epochs: 2 for standard, 4 for few-shot settings
  • Devices
    • Training: 4 NVIDIA A100 80G
    • Inference: RTX 3090 24G

Datasets

  • IQA:
    • Training: KonIQ-10k (in-the-wild), SPAQ (11K, in-the-wild), KADID-10k (synthetic)
    • Testing: the same as the training sets, plus four unseen datasets: LIVE Challenge (1.1K, in-the-wild), AGIQA-3K (AI-generated), and LIVE and CSIQ (both synthetic)
  • IAA: AVA dataset
    • Following Hou et al. (2023), we conduct experiments on the OFFICIAL train-test split with 236K training images and 19K test images.
  • VQA:
    • Training: LSVQ (28K)
    • Testing: LSVQ (test), LSVQ (1080p), KoNViD-1k, MaxWell

Q-Align and Fewshot Q-Align on IQA

[Figure: Q-Align and few-shot Q-Align results on IQA]

Mix-Data Experiments for Q-Align on IQA

[Figure: mix-data results for Q-Align on IQA]

  • Traditional IQA methods (Zhang et al., 2020; Ke et al., 2021; Zhang et al., 2023b) have been reported to experience reduced accuracy when mixing datasets.
  • Q-Align retains or even improves the accuracy on individual datasets while mixing datasets.

Q-Align Performance on IAA

[Figure: Q-Align performance on IAA]

Q-Align Performance on VQA

[Figure: Q-Align performance on VQA]

  • Sparse frame input (1 fps) yields superior results (see the sketch after this list).
  • Excellent OOD generalization, outperforming other VQA models.
  • Since we haven’t yet input all frames of the video into the LMM, there is still room for future improvements.
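A minimal sketch of the 1 fps sparse sampling referenced above, using OpenCV as an assumed decoder (the actual pipeline's decoding and preprocessing may differ):

```python
import cv2  # assumed decoder for this sketch; not necessarily what the paper uses

def sample_frames_1fps(video_path: str):
    """Keep roughly one frame per second and return them as RGB arrays."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS metadata is missing
    step = max(int(round(fps)), 1)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        idx += 1
    cap.release()
    return frames  # each frame is later embedded and fed to the LMM as an <img> token sequence
```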

The OneAlign

[Figure: OneAlign results]

  • All multi-task variants show improved performance over single-task variants.
  • OneAlign remarkably improves OOD generalization on several unseen datasets.
  • We hope that OneAlign can be widely applied to real-world scenarios, pioneering a paradigm shift in this field.

Cost Analysis: Training Cost

[Figure: training cost comparison]

  • Q-Align converges in fewer iterations than existing IQA methods, including CLIP-based methods.
    • in 2 epochs with batch size 64
  • With 4× A100 80G GPUs, it requires only 9 minutes to converge, which costs < 2 USD from most cloud GPU providers.

Cost Analysis: Inference Latency

[Figure: inference latency]

  • Achieving more than 20× faster-than-real-time inference on videos, its low latency allows wider real-world applications of LMM-based visual scorers.

Ablation Studies: Q-Align vs Training with Scores

[Figure: Q-Align vs. training with scores]

  • Level-based alignment (Q-Align) brings improvements on cross-dataset (OOD) settings.
  • Score-based alignment cannot surpass the existing SOTA in any setting.

Qualitative Analysis

[Figure: qualitative comparison]

  • Q-Align produces finer-grained evaluations.
  • The top-1 and top-2 predicted levels are always adjacent rating levels, never non-adjacent ones.

Conclusion

  • In conclusion, our paper marks a significant stride in the realm of visual scoring by innovatively instructing Large Multi-modality Models (LMMs) with discrete text-defined levels (e.g., good, poor) rather than direct scores (e.g., 3.45, 1.77).
  • This syllabus, named Q-Align, achieves remarkable improvements over state-of-the-art IQA, IAA, and VQA approaches under one general structure, and further unifies all three tasks under a single model, the OneAlign.
  • The Q-Align unlocks the potential of LMMs in predicting accurate and robust visual scores, pioneering a promising direction for future explorations.

Summary

  • Adapt large multi-modality models (LMMs) to IQA, IAA, and VQA
  • Observation: language models handle text-defined levels better than exact numbers
  • Q-Align: an LMM for visual-score regression tasks
    • training: label discretization (scores → rating levels)
    • inference: label digitization (rating levels → scores)

  • This paper makes significant progress in visual scoring by instructing LMMs with discrete text-defined levels.
  • The syllabus, named Q-Align, achieves notable improvements on the three tasks and is further unified into a single OneAlign model.
  • Q-Align has great potential and opens a new direction for this field.