Face recognition in unconstrained environments such as surveillance, video, and web imagery must contend with extreme variation in pose, blur, illumination, and occlusion, where conventional visual quality metrics fail to predict whether inputs are truly recognizable to the deployed encoder. Existing face image quality assessment (FIQA) methods typically rely on visual heuristics, curated annotations, or computationally intensive generative pipelines, leaving their predictions detached from the encoder's decision geometry. We introduce TransFIRA (Transfer Learning for Face Image Recognizability Assessment), a lightweight and annotation-free framework that grounds recognizability directly in embedding space.
TransFIRA delivers three advances: (i) a definition of recognizability grounded entirely in the deployed encoder's embedding geometry, via the Class Center Angular Similarity (CCS) and Class Center Angular Separation (CCAS); (ii) an annotation-free, lightweight prediction head that regresses these scores directly from images; and (iii) a recognizability-informed aggregation strategy that filters and weights samples when building templates.
Experiments confirm state-of-the-art results on faces, strong performance on body recognition, and robustness under cross-dataset shifts. Together, these contributions establish TransFIRA as a unified, geometry-driven framework for recognizability assessment — encoder-specific, accurate, interpretable, and extensible across modalities — significantly advancing FIQA in accuracy, explainability, and scope.
TransFIRA adapts a pretrained encoder to predict recognizability—the likelihood that an image will be correctly identified—directly from the encoder’s embedding geometry.
The framework consists of three stages: (i) defining recognizability within the encoder's embedding space via CCS, NNCCS, and CCAS; (ii) training a lightweight head, end-to-end with the backbone, to predict these scores directly from images; and (iii) using the predicted scores to filter and weight samples during template aggregation.
Recognizability is defined entirely within the embedding space of the chosen encoder, ensuring that it reflects the model's actual discrimination ability rather than superficial factors such as blur, illumination, or occlusion. For each image \(x_i\) with embedding \(z_i = \phi(x_i)\) and identity label \(y_i\), we compute a set of class-center similarities that quantify how well the embedding aligns with its identity, where \(\mu_c\) denotes the center of class \(c\) in embedding space.
The Class Center Angular Similarity (CCS) measures how closely an embedding aligns with the center of its own class:
\( CCS_{x_i} = \frac{z_i^\top \mu_{y_i}}{\|z_i\|_2 \, \|\mu_{y_i}\|_2} \)
The Nearest Nonmatch Class Center Angular Similarity (NNCCS) measures its similarity to the most confusable impostor class:
\( NNCCS_{x_i} = \max_{\,j \neq y_i} \frac{z_i^\top \mu_j}{\|z_i\|_2 \, \|\mu_j\|_2} \)
Their difference defines the Class Center Angular Separation (CCAS):
\( CCAS_{x_i} = CCS_{x_i} - NNCCS_{x_i} \)
A natural cutoff emerges at CCAS > 0, indicating that an embedding is closer to its own class center than to any impostor. This provides a principled, parameter-free definition of recognizability grounded in the encoder’s decision geometry.
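As a concrete illustration, here is a minimal PyTorch sketch of these three quantities, assuming embeddings are compared against per-class centers held in a `(C, D)` tensor; the function name and the choice of class centers (e.g., per-class embedding means) are illustrative, not taken from the paper's code.

```python
import torch
import torch.nn.functional as F

def recognizability_scores(z, labels, centers):
    """z: (N, D) embeddings; labels: (N,) class ids; centers: (C, D) class centers."""
    z = F.normalize(z, dim=1)                              # unit-norm embeddings
    centers = F.normalize(centers, dim=1)                  # unit-norm class centers
    sims = z @ centers.T                                   # (N, C) cosine similarities
    ccs = sims.gather(1, labels.view(-1, 1)).squeeze(1)    # CCS: similarity to own center
    impostors = sims.scatter(1, labels.view(-1, 1), float("-inf"))
    nnccs = impostors.max(dim=1).values                    # NNCCS: most confusable impostor
    ccas = ccs - nnccs                                     # CCAS > 0 => recognizable
    return ccs, nnccs, ccas

# One common assumption for the centers: per-class embedding means,
# i.e., centers[c] = z[labels == c].mean(dim=0) over the training set.
```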
To predict recognizability directly from images, TransFIRA extends a pretrained backbone with a lightweight recognizability prediction head implemented as a small MLP. The network outputs predicted scores for both CCS and CCAS:
\( \hat{\mathbf{r}}_i = [\hat{CCS}_{x_i},\, \hat{CCAS}_{x_i}]^\top = h_\psi(\phi(x_i)) \)
Training is performed end-to-end with mean squared error against ground-truth recognizability labels derived from the encoder itself. Fine-tuning both the backbone and head ensures recognizability remains encoder-specific, efficient to train, and fully aligned with the model’s internal representation.
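The loop below is a minimal training sketch under stated assumptions: `phi` is the pretrained encoder, `loader` yields image batches with `(B, 2)` targets holding CCS/CCAS labels precomputed as above, and the hidden width and learning rate are placeholders rather than values from the paper.

```python
import torch
import torch.nn as nn

class RecognizabilityHead(nn.Module):
    """Small MLP h_psi mapping an embedding to predicted [CCS, CCAS]."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),         # outputs [CCS_hat, CCAS_hat]
        )

    def forward(self, z):
        return self.mlp(z)

# `phi` (pretrained encoder) and `loader` are assumed to be defined elsewhere.
head = RecognizabilityHead(dim=512)
optimizer = torch.optim.Adam(list(phi.parameters()) + list(head.parameters()), lr=1e-4)
criterion = nn.MSELoss()

for images, targets in loader:            # targets: (B, 2) ground-truth [CCS, CCAS]
    preds = head(phi(images))             # end-to-end: gradients reach the backbone too
    loss = criterion(preds, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```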
In template-based recognition benchmarks such as BRIAR and IJB-C, multiple images of a subject are combined into a single representation. TransFIRA uses predicted recognizability scores to guide this aggregation through two complementary steps: filtering, which discards images whose predicted CCAS is non-positive, and weighting, which combines the surviving embeddings in proportion to their predicted recognizability.
These operations form a recognizability-informed aggregation strategy that is both interpretable and parameter-free. Filtering ensures only geometrically valid samples are included, while weighting strengthens alignment with the class center, jointly improving accuracy and explainability.
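A minimal sketch of this aggregation follows, assuming per-image embeddings and the predicted CCAS scores from the head above; the fallback for templates with no positive-CCAS image, and the use of CCAS itself as the weight, are illustrative choices rather than details prescribed by the text.

```python
import torch
import torch.nn.functional as F

def aggregate_template(z, ccas_hat):
    """z: (N, D) per-image embeddings; ccas_hat: (N,) predicted CCAS scores."""
    keep = ccas_hat > 0                          # filtering: geometrically valid samples only
    if not keep.any():                           # degenerate template: keep the single best image
        keep = ccas_hat == ccas_hat.max()
        weights = torch.ones_like(ccas_hat[keep])
    else:
        weights = ccas_hat[keep]                 # weighting: proportional to predicted CCAS
    weights = weights / weights.sum()
    template = (weights.unsqueeze(1) * z[keep]).sum(dim=0)
    return F.normalize(template, dim=0)          # unit-norm template embedding
```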
@article{Tu2025TransFIRA,
author = {Tu, Allen and Narayan, Kartik and Gleason, Joshua and Xu, Jennifer and Meyn, Matthew and Goldstein, Tom and Patel, Vishal M.},
title = {TransFIRA: Transfer Learning for Face Image Recognizability Assessment},
journal = {Preprint},
year = {2025},
url = {https://transfira.github.io/}
}