Degendering Resumes for Fair Algorithmic Resume Screening
Abstract
Gendered information in IT firm resumes is substantial and can be reduced using lexicon-based methods, but this reduction negatively impacts resume screening algorithm performance, and gender debiasing techniques for NLP models are ineffective.
We investigate whether it is feasible to remove gendered information from resumes to mitigate potential bias in algorithmic resume screening. Using a corpus of 709k resumes from IT firms, we first train a series of models to classify the self-reported gender of the applicant, thereby measuring the extent and nature of gendered information encoded in resumes. We then conduct a series of gender obfuscation experiments, where we iteratively remove gendered information from resumes. Finally, we train a resume screening algorithm and investigate the trade-off between gender obfuscation and screening algorithm performance. Results show: (1) There is a significant amount of gendered information in resumes. (2) Lexicon-based gender obfuscation method (i.e. removing tokens that are predictive of gender) can reduce the amount of gendered information to a large extent. However, after a certain point, the performance of the resume screening algorithm starts suffering. (3) General-purpose gender debiasing methods for NLP models such as removing gender subspace from embeddings are not effective in obfuscating gender.
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper