ProCreations committed (verified) · Commit afdc6b5 · 1 Parent(s): 49f62df

Update README.md

Files changed (1): README.md (+39 −3)
README.md CHANGED
@@ -1,3 +1,39 @@

The previous README consisted only of the three removed front-matter lines (`---` / `license: mit` / `---`); the update replaces it with the full 39-line model card below.

---
license: mit
language:
- en
pipeline_tag: zero-shot-image-classification
tags:
- vision
- simple
- small
---

# tinyvvision 🧠✨

**tinyvvision** is a compact vision-language model trained on a fully synthetic curriculum, designed to demonstrate real zero-shot capability in a minimal setup. Despite its small size (~630k parameters), it aligns images and captions by learning a shared visual-language embedding space.

## What tinyvvision can do:
- Match simple geometric shapes (circles, stars, hearts, triangles, etc.) to descriptive captions (e.g., "a red circle", "a yellow star").
- Perform genuine zero-shot generalization, meaning it can correctly match captions to shapes and colors it has never explicitly encountered during training.

## Model Details:
- **Type**: Contrastive embedding (CLIP-style, zero-shot)
- **Parameters**: ~630,000 (tiny!)
- **Training data**: Fully synthetic; randomly generated shapes, letters, numbers, and symbols paired with descriptive text captions.
- **Architecture** (a minimal sketch follows this list):
  - **Image Encoder**: Simple CNN
  - **Text Encoder**: Small embedding layer + bidirectional GRU
  - **Embedding Dim**: 128-dimensional shared embedding space

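For a concrete picture of the layout above, here is a minimal PyTorch sketch of what a model matching this description could look like. The class names (`SimpleCNN`, `TextGRU`), layer widths, vocabulary size, and temperature are illustrative assumptions, not the released implementation; only the overall shape (small CNN image encoder, embedding + bidirectional GRU text encoder, shared 128-dimensional space, CLIP-style contrastive objective) follows the list above, and the widths shown do not necessarily add up to the ~630k parameter count.

```python
# Hypothetical sketch of a tinyvvision-style model (not the released code).
# Only the overall layout follows the model card: small CNN image encoder,
# embedding + bidirectional GRU text encoder, shared 128-dim embedding space,
# and a CLIP-style symmetric contrastive objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 128  # shared embedding dimension from the model card


class SimpleCNN(nn.Module):
    """Tiny CNN image encoder projecting into the shared space."""

    def __init__(self, embed_dim: int = EMBED_DIM):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        x = self.features(images).flatten(1)        # (B, 64)
        return F.normalize(self.proj(x), dim=-1)    # (B, 128), unit length


class TextGRU(nn.Module):
    """Small embedding layer + bidirectional GRU text encoder."""

    def __init__(self, vocab_size: int = 256, embed_dim: int = EMBED_DIM):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 64)
        self.gru = nn.GRU(64, embed_dim // 2, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        _, h = self.gru(self.embed(token_ids))      # h: (2, B, 64)
        h = torch.cat([h[0], h[1]], dim=-1)         # concat both directions -> (B, 128)
        return F.normalize(self.proj(h), dim=-1)


def clip_style_loss(img_emb, txt_emb, temperature: float = 0.07):
    """Symmetric cross-entropy over the image-caption similarity matrix."""
    logits = img_emb @ txt_emb.t() / temperature    # (B, B), diagonal = true pairs
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```

Normalizing both embeddings makes the dot products in the loss equivalent to cosine similarities, which is also the quantity used for matching at inference time.
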
## Examples of Zero-Shot Matching:
- **Seen during training**: "a red circle" → correctly matches the drawn red circle.
- **Never seen**: "a teal lightning bolt" → correctly matches a hand-drawn lightning bolt shape, despite never having encountered one during training.

## Limitations:
- tinyvvision is designed as a demonstration of zero-shot embedding and generalization on synthetic data. It is not trained on real-world images or complex scenes. While robust within its domain (simple geometric shapes and clear captions), results may degrade significantly on more complicated or out-of-domain inputs.

## How to Test tinyvvision:
Check out the provided inference script to test your own shapes and captions. Feel free to challenge tinyvvision with new, unseen combinations to explore its generalization capability; a generic matching recipe is sketched below.

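The repository's own inference script is the place to start; the snippet below only illustrates the generic zero-shot matching recipe that any CLIP-style encoder pair supports: embed one image and several candidate captions, then rank the captions by cosine similarity. The drawing helper, the byte-level `tokenize` stand-in, and the encoder objects are assumptions for the sketch (they reuse the hypothetical `SimpleCNN`/`TextGRU` classes above), not the actual preprocessing used by tinyvvision.

```python
# Illustrative zero-shot matching recipe; placeholder preprocessing, not the
# repository's inference script.
import torch
import torch.nn.functional as F
from PIL import Image, ImageDraw


def draw_red_circle(size: int = 64) -> torch.Tensor:
    """Render a simple test image and convert it to a (1, 3, H, W) float tensor."""
    img = Image.new("RGB", (size, size), "white")
    ImageDraw.Draw(img).ellipse([12, 12, size - 12, size - 12], fill="red")
    pixels = torch.tensor(list(img.getdata()), dtype=torch.float32) / 255.0
    return pixels.view(size, size, 3).permute(2, 0, 1).unsqueeze(0)


def tokenize(caption: str, max_len: int = 32) -> torch.Tensor:
    """Toy byte-level tokenizer; a stand-in for whatever the real script uses."""
    ids = [min(ord(ch), 255) for ch in caption[:max_len]]
    ids += [0] * (max_len - len(ids))
    return torch.tensor(ids).unsqueeze(0)


@torch.no_grad()
def rank_captions(image_encoder, text_encoder, image, captions):
    """Score every caption against one image and return them best-first."""
    img_emb = image_encoder(image)                                       # (1, 128)
    txt_emb = torch.cat([text_encoder(tokenize(c)) for c in captions])   # (N, 128)
    sims = F.cosine_similarity(img_emb, txt_emb)                         # (N,)
    return sorted(zip(captions, sims.tolist()), key=lambda pair: -pair[1])


# Usage with trained encoders loaded from the released checkpoint:
# image = draw_red_circle()
# candidates = ["a red circle", "a yellow star", "a teal lightning bolt"]
# for caption, score in rank_captions(image_encoder, text_encoder, image, candidates):
#     print(f"{score:+.3f}  {caption}")
```

The candidate list can contain anything; per the examples above, even descriptions the model never saw during training are fair game.
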
✨ **Enjoy experimenting!** ✨