Update README.md
## 2. Evaluation Results

We evaluate DMind-1 and DMind-1-mini using the [DMind Benchmark](https://huggingface.co/datasets/DMindAI/DMind_Benchmark), a domain-specific evaluation suite designed to assess large language models in the Web3 context. The benchmark includes 1,917 expert-reviewed questions across nine core domain categories, featuring both multiple-choice and open-ended tasks that measure factual knowledge, contextual reasoning, and other domain-relevant abilities.

To complement accuracy metrics, we conducted a **cost-performance analysis** by comparing benchmark scores against publicly available input token prices across 24 leading LLMs. In this evaluation:

- **DMind-1** achieved the highest Web3 score while maintaining one of the lowest input token costs among top-tier models such as Grok 3 and Claude 3.5 Sonnet.
- **DMind-1-mini** ranked second, retaining over 95% of DMind-1's performance with greater efficiency in latency and compute.

Both models are uniquely positioned in the most favorable region of the score vs. price curve, delivering state-of-the-art Web3 reasoning at significantly lower cost. This balance of quality and efficiency makes the DMind models highly competitive for both research and production use.
## 3. Use Cases