arxiv:2306.02411

A Topological Approach to Measuring Training Data Quality

Published on Jun 4, 2023

Authors:

Abstract

A novel topological method using morphisms between persistence modules evaluates training dataset quality relative to full datasets to explain poor model performance.

AI-generated summary

Data quality is crucial for the successful training, generalization and performance of artificial intelligence models. Furthermore, it is known that the leading approaches in artificial intelligence are notoriously data-hungry. In this paper, we propose the use of small training datasets towards faster training. Specifically, we provide a novel topological method based on morphisms between persistence modules to measure the training data quality with respect to the complete dataset. This way, we can provide an explanation of why the chosen training dataset will lead to poor performance.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2306.02411 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2306.02411 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2306.02411 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.