Kamichanw committed
Commit 444f4df · verified · 1 Parent(s): e3e59ad

Update README.md
Files changed (1): README.md (+66 -3)
README.md CHANGED
@@ -1,12 +1,75 @@
  ---
- title: Cider
  emoji: 🐨
  colorFrom: blue
  colorTo: red
  sdk: gradio
- sdk_version: 4.41.0
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: CIDEr
  emoji: 🐨
  colorFrom: blue
  colorTo: red
  sdk: gradio
+ sdk_version: 3.19.1
  app_file: app.py
  pinned: false
  ---

# CIDEr Metric for Image Captioning Evaluation

## CIDEr Description
CIDEr (Consensus-based Image Description Evaluation) is a widely used metric for evaluating the quality of machine-generated image captions. It measures how well a candidate caption agrees with a set of human-written reference captions, rewarding n-grams that are frequent in the references for that image but rare across the rest of the corpus. Concretely, the score is a weighted average, over n-gram lengths, of the TF-IDF-weighted cosine similarity between the candidate and the reference captions.

The CIDEr score is defined as:

$$
\text{CIDEr}_n(c_i, S_i) = \frac{1}{m} \sum_{j=1}^{m} \frac{g^n(c_i) \cdot g^n(s_{ij})}{\lVert g^n(c_i) \rVert \, \lVert g^n(s_{ij}) \rVert}, \qquad \text{CIDEr}(c_i, S_i) = \sum_{n=1}^{N} w_n \, \text{CIDEr}_n(c_i, S_i)
$$

where:
- $c_i$ is the candidate caption for image $i$,
- $S_i = \{s_{i1}, \dots, s_{im}\}$ is the set of $m$ reference captions for image $i$,
- $N$ is the maximum n-gram length (typically 4),
- $w_n$ is the weight for n-grams of length $n$ (uniform weights, $w_n = 1/N$),
- $g^n(\cdot)$ is the vector of TF-IDF weights of all n-grams of length $n$ in a caption: the term frequency (TF) of each n-gram in the caption, multiplied by its inverse document frequency (IDF) computed over the reference sets of all images.

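To make the formula concrete, here is a minimal, self-contained sketch of the computation. It only illustrates the definitions above and is not this Space's actual implementation (all helper names are made up); real implementations additionally apply tokenization/stemming and, for the CIDEr-D variant, a Gaussian length penalty.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of length n.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf_vector(tokens, n, idf):
    # g^n(.): TF-IDF weight for each n-gram of length n in a caption.
    counts = Counter(ngrams(tokens, n))
    total = sum(counts.values()) or 1
    return {g: (c / total) * idf.get(g, 0.0) for g, c in counts.items()}

def cosine(u, v):
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cider_score(candidate, references, corpus, n_max=4):
    # corpus holds one reference set per image; it doubles as the
    # "document" collection over which IDF is computed, as in the paper.
    score = 0.0
    for n in range(1, n_max + 1):
        docs = [{g for ref in refs for g in ngrams(ref.split(), n)} for refs in corpus]
        df = Counter(g for doc in docs for g in doc)
        idf = {g: math.log(len(docs) / d) for g, d in df.items()}
        cand = tfidf_vector(candidate.split(), n, idf)
        sims = [cosine(cand, tfidf_vector(ref.split(), n, idf)) for ref in references]
        score += (1.0 / n_max) * sum(sims) / len(references)  # w_n = 1/N, mean over m refs
    return score

# Toy usage: with two images in the corpus, n-grams shared by both
# reference sets get IDF 0, while image-specific n-grams carry weight.
refs = ["A cat is sitting on a mat.", "A feline rests on the mat."]
corpus = [refs, ["A dog runs through the grass.", "A brown dog runs outside."]]
print(cider_score("A cat sits on a mat.", refs, corpus))
```
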
## How to Use
To use the CIDEr metric, load it via the `evaluate` library and pass the predicted and reference captions to `compute`. The metric tokenizes the captions and computes the CIDEr score.

### Inputs
- **predictions** *(list of str)*: The predicted captions generated by the model.
- **references** *(list of list of str)*: A list of lists, where each inner list contains the reference captions for the corresponding prediction.
- **n** *(int, optional, defaults to 4)*: The maximum n-gram length used when computing the score.
- **sigma** *(float, optional, defaults to 6.0)*: The standard deviation of the Gaussian length penalty (as used in the CIDEr-D variant).

### Output Values
- **CIDEr** *(float)*: The computed CIDEr score. Scores are non-negative, and higher values indicate closer agreement between the predicted and reference captions; the reported range depends on the implementation's scaling convention.

### Examples

```python
>>> from evaluate import load
>>> cider = load("Kamichanw/CIDEr")
>>> predictions = ["A cat sits on a mat."]
>>> references = [["A cat is sitting on a mat.", "A feline rests on the mat."]]
>>> score = cider.compute(predictions=predictions, references=references)
>>> print(score['CIDEr'])
0.0
```

Note the 0.0 here: with a single image in the batch, the corpus-level IDF statistics degenerate (every n-gram occurs in the only reference set, so its IDF weight vanishes), and the score collapses to zero. Scores become meaningful over larger evaluation sets.

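The `n` and `sigma` inputs documented above can also be passed to `compute`. Below is a sketch of a slightly larger evaluation batch (the captions are made up for illustration); with more images, the corpus-level IDF statistics, and therefore the scores, become more informative:

```python
>>> predictions = ["A dog runs through the grass.", "Two people ride horses on a beach."]
>>> references = [
...     ["A dog is running in a grassy field.", "A brown dog runs outside."],
...     ["Two riders on horseback at the beach.", "People are riding horses along the shore."],
... ]
>>> score = cider.compute(predictions=predictions, references=references, n=4, sigma=6.0)
```
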
## Limitations and Bias
The CIDEr metric primarily measures n-gram overlap between predicted and reference captions, so it may not adequately capture semantic nuances or variations in phrasing that convey the same meaning. Moreover, CIDEr tends to favor longer captions with more word overlap, potentially biasing against concise but accurate captions.

## Citation
If you use the CIDEr metric in your research, please cite the original paper:

```bibtex
@inproceedings{vedantam2015cider,
  title={{CIDEr}: Consensus-based image description evaluation},
  author={Vedantam, Ramakrishna and Lawrence Zitnick, C and Parikh, Devi},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={4566--4575},
  year={2015}
}
```

## Further References
- [coco-caption GitHub repository (reference CIDEr implementation)](https://github.com/tylin/coco-caption)
- [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/)