Spaces: double-ai/FormulaOne-Leaderboard

galb-dai committed on Aug 13

Commit 5662cd4

1 Parent(s): e5e5305

Updated.

Files changed (1)

src/about.py +10 -0

src/about.py CHANGED

@@ -44,6 +44,8 @@ WHAT_IS_F1_HTML_TOP = f"""
 """
 # Bottom is split so we can insert real Gradio media (images/video) from app.py.
+# Up to before the first figure (bag_modifications.png)
 WHAT_IS_F1_HTML_BOTTOM_A = """
 <div class="f1-container">
   <section>
@@ -61,11 +63,13 @@ WHAT_IS_F1_HTML_BOTTOM_A = """
     <!-- bag_modifications figure inserted via gr.Image in app.py -->
 """
+# After the first figure, before the video
 WHAT_IS_F1_HTML_BOTTOM_B = """
     <p class="mb-4 f1-p">An algorithm can then traverse this tree of bags, solving the problem piece by piece using dynamic programming. This process involves designing a “state” that summarises all necessary information about the partial solution within a bag, and then defining how this state transforms as vertices are introduced, forgotten, or bags are merged.</p>
     <!-- Video inserted via gr.Video in app.py -->
 """
+# Text immediately after the video; opens Evaluation section header/content (up to before Warmup figure)
 WHAT_IS_F1_HTML_AFTER_VIDEO = """
     <p class="f1-p">The deceptive simplicity of the problem statements belies the <strong>extraordinary difficulty</strong> of discovering the correct dynamic programming solution. This process is riddled with subtle combinatorial and logical pitfalls, demanding a profound understanding of the problem’s underlying structure. For a detailed walkthrough of the fifteen interdependent reasoning steps required to solve a single hard problem &mdash; <code>Maximal-Cluster-Graph</code> &mdash; <a href="https://arxiv.org/pdf/2507.13337#appendix.A" target="_blank" rel="noopener noreferrer" class="f1-a">see the appendix of our paper</a>.</p>
   </section>
@@ -81,17 +85,23 @@ WHAT_IS_F1_HTML_AFTER_VIDEO = """
     </ul>
     <p class="mb-4 f1-p">To support research and encourage community contributions, the <code>FormulaOne-Warmup</code> dataset is released as a public resource for training and fine-tuning models. The complete test suite for all 100 Warmup problems is available, alongside a standalone evaluation environment, in our <a href="https://github.com/double-ai/formulaone-dataset/tree/main" target="_blank" rel="noopener noreferrer" class="f1-a">GitHub repository</a>.</p>
     <p class="f1-p">To maintain the integrity of the core benchmark, only a minimal subset of tests is released for the Tier 1 and Tier 2 problems.</p>
+"""
+# *** THIS WAS MISSING BEFORE ***
+# Evaluation: begins the "Model Accuracy" subsection and the Warmup paragraph, up to (but not including) the Warmup figure.
+WHAT_IS_F1_HTML_EVAL_BEFORE_WARMUPFIG = """
     <h2 class="f1-h2">Model Accuracy</h2>
     <p class="mb-4 f1-p">On the <strong>FormulaOne-Warmup</strong> problems, frontier models perform reasonably well. This confirms they have a foundational capability for these types of algorithmic tasks.</p>
     <!-- warmup_performance figure inserted via gr.Image in app.py -->
 """
+# Between Warmup and Tier 1 figures
 WHAT_IS_F1_HTML_AFTER_WARMUPFIG = """
     <p class="mb-4 f1-p">However, as the reasoning depth increases in <strong>Tier 1</strong>, and solutions require the discovery and integration of novel and more complex state representations, model performance drops off sharply.</p>
     <!-- tier1_performance figure inserted via gr.Image in app.py -->
 """
+# Tail after Tier 1 figure (closes evaluation section + container)
 WHAT_IS_F1_HTML_AFTER_TIER1FIG_TAIL = """
     <p class="f1-p">This trend culminates in <strong>Tier 2</strong>, where the difficulty is characteristic of exploratory research problems. On this set of 20 problems, no current frontier model solves even a single one. This result starkly illustrates the gap that remains between high performance on existing benchmarks and the deep algorithmic reasoning required for truly complex problems.</p>
   </section>

 """
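This commit splits the page HTML across several module-level strings so that `app.py` can place `gr.Image` and `gr.Video` components between them, and the `THIS WAS MISSING BEFORE` comment indicates that the `WHAT_IS_F1_HTML_EVAL_BEFORE_WARMUPFIG` definition was previously absent. When markup is cut into pieces like this, a dropped or misordered segment is easy to miss. The following is a hypothetical sanity check, not code from the repository: it assumes the segments are known in display order, concatenates them, and uses the standard library's `html.parser.HTMLParser` to report unclosed or unexpected tags. The names `TagBalanceChecker` and `check_segments` are illustrative, and treating the concatenation as one document is only an approximation of how separately rendered fragments are displayed.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so the stack check skips them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Tracks open tags on a stack and records closing tags that do not match."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.errors: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def check_segments(segments: list[str]) -> list[str]:
    """Concatenate HTML segments in render order and list any balance problems."""
    checker = TagBalanceChecker()
    checker.feed("".join(segments))
    checker.close()
    problems = list(checker.errors)
    problems.extend(f"unclosed <{tag}>" for tag in checker.stack)
    return problems


if __name__ == "__main__":
    # Toy fragments (not the real page text): the container <div> is never closed.
    demo = [
        '<div class="f1-container">\n  <section>\n',
        '    <!-- figure inserted via gr.Image in app.py -->\n',
        '    <p class="f1-p">Tier 2 text.</p>\n  </section>\n',
    ]
    print(check_segments(demo))  # ['unclosed <div>']
```

Placeholder comments such as `<!-- warmup_performance figure inserted via gr.Image in app.py -->` are ignored by the parser, so calling `check_segments` on the real strings from `src/about.py`, in their display order, would return an empty list only if the combined markup is balanced.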
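The `WHAT_IS_F1_HTML_BOTTOM_B` text describes dynamic programming over a tree decomposition: a per-bag "state" plus rules for how it changes when a vertex is introduced, forgotten, or two bags are merged. As an illustration of that pattern only (not a FormulaOne problem and not the authors' solution), here is a minimal sketch for the textbook Maximum Independent Set problem over a nice tree decomposition. It assumes the decomposition is valid for the graph and is supplied as nested tuples (`("leaf",)`, `("introduce", v, child)`, `("forget", v, child)`, `("join", left, right)`), a hypothetical encoding chosen for brevity; `solve` and `max_independent_set` are illustrative names.

```python
from typing import Dict, FrozenSet, Set, Tuple

# A table maps S (the part of the current bag inside the solution) to the size of
# the largest independent set of the processed subgraph that meets the bag exactly in S.
Table = Dict[FrozenSet[str], int]


def solve(node: tuple, adj: Dict[str, Set[str]]) -> Tuple[FrozenSet[str], Table]:
    """Return (bag, table) for a node of a nice tree decomposition (hypothetical tuple encoding)."""
    kind = node[0]
    if kind == "leaf":
        # Empty bag: the only partial solution is the empty set.
        return frozenset(), {frozenset(): 0}
    if kind == "introduce":
        _, v, child = node
        bag, table = solve(child, adj)
        new = {}
        for s, size in table.items():
            new[s] = size  # v stays out of the solution
            # In a valid decomposition every processed neighbour of v is in the bag,
            # so v may enter whenever none of its neighbours is in S.
            if not (adj[v] & s):
                new[s | {v}] = size + 1
        return bag | {v}, new
    if kind == "forget":
        _, v, child = node
        bag, table = solve(child, adj)
        new = {}
        for s, size in table.items():
            key = s - {v}  # v leaves the bag; keep the best value for what remains
            new[key] = max(new.get(key, size), size)
        return bag - {v}, new
    if kind == "join":
        _, left, right = node
        bag, t1 = solve(left, adj)
        _, t2 = solve(right, adj)
        # Both children share the same bag; vertices of S are counted twice, so subtract once.
        return bag, {s: t1[s] + t2[s] - len(s) for s in t1.keys() & t2.keys()}
    raise ValueError(f"unknown node kind: {kind!r}")


def max_independent_set(root: tuple, adj: Dict[str, Set[str]]) -> int:
    """Size of a maximum independent set, read off the root table."""
    _, table = solve(root, adj)
    return max(table.values())


if __name__ == "__main__":
    # Path a - b - c with bags {a, b} and {b, c}; the optimum is {a, c}.
    adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    leaf = ("leaf",)
    root = ("forget", "c", ("forget", "b", ("introduce", "c",
            ("forget", "a", ("introduce", "b", ("introduce", "a", leaf))))))
    print(max_independent_set(root, adj))  # 2
```

Each table has at most 2^(bag size) entries, which is why this style of DP is efficient when the decomposition has small width; the hard part the page refers to is designing the state for problems where a simple subset of the bag is not enough.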