Kamichanw committed
Commit 444f4df · verified · 1 Parent(s): e3e59ad

Update README.md
Files changed (1): README.md (+66 -3)
README.md CHANGED
@@ -1,12 +1,75 @@
  ---
- title: Cider
  emoji: 🐨
  colorFrom: blue
  colorTo: red
  sdk: gradio
- sdk_version: 4.41.0
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: CIDEr
  emoji: 🐨
  colorFrom: blue
  colorTo: red
  sdk: gradio
+ sdk_version: 3.19.1
  app_file: app.py
  pinned: false
  ---

# CIDEr Metric for Image Captioning Evaluation

## CIDEr Description
CIDEr (Consensus-based Image Description Evaluation) is a widely used metric for evaluating the quality of machine-generated image captions. It measures how well a candidate caption agrees with a set of human-written reference captions, rewarding n-grams that are frequent in the references for that image but rare across the rest of the corpus. Concretely, the score is a weighted average, over n-gram lengths, of the TF-IDF-weighted cosine similarity between the candidate and the reference captions.

The CIDEr score is defined as:

$$
\text{CIDEr}_n(c_i, S_i) = \frac{1}{m} \sum_{j=1}^{m} \frac{g^n(c_i) \cdot g^n(s_{ij})}{\lVert g^n(c_i) \rVert \, \lVert g^n(s_{ij}) \rVert}, \qquad \text{CIDEr}(c_i, S_i) = \sum_{n=1}^{N} w_n \, \text{CIDEr}_n(c_i, S_i)
$$

where:
- $c_i$ is the candidate caption for image $i$,
- $S_i = \{s_{i1}, \dots, s_{im}\}$ is the set of $m$ reference captions for image $i$,
- $N$ is the maximum n-gram length (typically 4),
- $w_n$ is the weight for n-grams of length $n$ (uniform weights, $w_n = 1/N$),
- $g^n(\cdot)$ is the vector of TF-IDF weights of all n-grams of length $n$ in a caption: the term frequency (TF) of each n-gram in the caption, multiplied by its inverse document frequency (IDF) computed over the reference sets of all images.

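To make the formula concrete, here is a minimal, self-contained sketch of the computation. It only illustrates the definitions above and is not this Space's actual implementation (all helper names are made up); real implementations additionally apply tokenization/stemming and, for the CIDEr-D variant, a Gaussian length penalty.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of length n.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf_vector(tokens, n, idf):
    # g^n(.): TF-IDF weight for each n-gram of length n in a caption.
    counts = Counter(ngrams(tokens, n))
    total = sum(counts.values()) or 1
    return {g: (c / total) * idf.get(g, 0.0) for g, c in counts.items()}

def cosine(u, v):
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cider_score(candidate, references, corpus, n_max=4):
    # corpus holds one reference set per image; it doubles as the
    # "document" collection over which IDF is computed, as in the paper.
    score = 0.0
    for n in range(1, n_max + 1):
        docs = [{g for ref in refs for g in ngrams(ref.split(), n)} for refs in corpus]
        df = Counter(g for doc in docs for g in doc)
        idf = {g: math.log(len(docs) / d) for g, d in df.items()}
        cand = tfidf_vector(candidate.split(), n, idf)
        sims = [cosine(cand, tfidf_vector(ref.split(), n, idf)) for ref in references]
        score += (1.0 / n_max) * sum(sims) / len(references)  # w_n = 1/N, mean over m refs
    return score

# Toy usage: with two images in the corpus, n-grams shared by both
# reference sets get IDF 0, while image-specific n-grams carry weight.
refs = ["A cat is sitting on a mat.", "A feline rests on the mat."]
corpus = [refs, ["A dog runs through the grass.", "A brown dog runs outside."]]
print(cider_score("A cat sits on a mat.", refs, corpus))
```
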
## How to Use
To use the CIDEr metric, load it via the `evaluate` library and pass the predicted and reference captions to `compute`. The metric tokenizes the captions and computes the CIDEr score.

### Inputs
- **predictions** *(list of str)*: The predicted captions generated by the model.
- **references** *(list of list of str)*: A list of lists, where each inner list contains the reference captions for the corresponding prediction.
- **n** *(int, optional, defaults to 4)*: The maximum n-gram length used when computing the score.
- **sigma** *(float, optional, defaults to 6.0)*: The standard deviation of the Gaussian length penalty (as used in the CIDEr-D variant).

### Output Values
- **CIDEr** *(float)*: The computed CIDEr score. Scores are non-negative, and higher values indicate closer agreement between the predicted and reference captions; the reported range depends on the implementation's scaling convention.

### Examples

```python
>>> from evaluate import load
>>> cider = load("Kamichanw/CIDEr")
>>> predictions = ["A cat sits on a mat."]
>>> references = [["A cat is sitting on a mat.", "A feline rests on the mat."]]
>>> score = cider.compute(predictions=predictions, references=references)
>>> print(score['CIDEr'])
0.0
```

Note the 0.0 here: with a single image in the batch, the corpus-level IDF statistics degenerate (every n-gram occurs in the only reference set, so its IDF weight vanishes), and the score collapses to zero. Scores become meaningful over larger evaluation sets.

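The `n` and `sigma` inputs documented above can also be passed to `compute`. Below is a sketch of a slightly larger evaluation batch (the captions are made up for illustration); with more images, the corpus-level IDF statistics, and therefore the scores, become more informative:

```python
>>> predictions = ["A dog runs through the grass.", "Two people ride horses on a beach."]
>>> references = [
...     ["A dog is running in a grassy field.", "A brown dog runs outside."],
...     ["Two riders on horseback at the beach.", "People are riding horses along the shore."],
... ]
>>> score = cider.compute(predictions=predictions, references=references, n=4, sigma=6.0)
```
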
## Limitations and Bias
The CIDEr metric primarily measures n-gram overlap between predicted and reference captions, so it may not adequately capture semantic nuances or variations in phrasing that convey the same meaning. Moreover, CIDEr tends to favor longer captions with more word overlap, potentially biasing against concise but accurate captions.

## Citation
If you use the CIDEr metric in your research, please cite the original paper:

```bibtex
@inproceedings{vedantam2015cider,
  title={{CIDEr}: Consensus-based image description evaluation},
  author={Vedantam, Ramakrishna and Lawrence Zitnick, C and Parikh, Devi},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={4566--4575},
  year={2015}
}
```

## Further References
- [coco-caption GitHub repository (reference CIDEr implementation)](https://github.com/tylin/coco-caption)
- [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/)