
Di Zhang

di-zhang-fdu

AI & ML interests

AI4Chem, LLM, Green LLM

Organizations

AI4Chem, NVIDIA-Eagle

di-zhang-fdu's activity

posted an update 12 days ago
Our new paper MOOSE-Chem3 introduces experiment-guided hypothesis ranking, a novel setting in which candidate hypotheses are prioritized based on experimental feedback from previously tested hypotheses.

To support research in this area, the work proposes a simulator grounded in three domain-informed assumptions that can generate simulated experimental feedback without requiring costly real-world trials.

The simulator is validated on a curated dataset of 124 chemistry hypotheses, and the resulting method outperforms strong pre-experiment baselines. This enables scalable research on feedback-driven hypothesis discovery strategies in scientific domains where empirical validation is expensive or slow.
MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback (2505.17873)
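
For intuition, here is a minimal sketch of the experiment-guided ranking loop under toy assumptions: `simulate_feedback` and `similarity` below are hypothetical stand-ins for the paper's domain-grounded simulator and a real hypothesis-similarity measure.

```python
# Toy sketch of experiment-guided hypothesis ranking (not the paper's code).

def simulate_feedback(hypothesis: str) -> float:
    """Hypothetical stand-in for the domain-grounded simulator:
    returns a deterministic pseudo-reward in [0, 1)."""
    return (sum(map(ord, hypothesis)) % 100) / 100.0

def similarity(a: str, b: str) -> float:
    """Toy Jaccard word overlap; a real system would use embeddings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def rank_hypotheses(candidates: list[str], budget: int = 3):
    """Greedily test the top-ranked hypothesis, record its simulated
    feedback, and re-score untested candidates by similarity-weighted
    feedback from everything tested so far."""
    tested: dict[str, float] = {}
    order = []
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        def score(h: str) -> float:
            return sum(similarity(h, t) * f for t, f in tested.items())
        remaining.sort(key=score, reverse=True)
        chosen = remaining.pop(0)
        tested[chosen] = simulate_feedback(chosen)
        order.append((chosen, tested[chosen]))
    return order

hypotheses = [
    "ligand X accelerates the coupling reaction",
    "solvent polarity controls the coupling reaction yield",
    "temperature above 80 C degrades the catalyst",
]
for h, fb in rank_hypotheses(hypotheses):
    print(f"{fb:.2f}  {h}")
```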
reacted to Kseniase's post with 🚀 12 days ago
12 Types of JEPA

JEPA, or Joint Embedding Predictive Architecture, is an approach to building AI models introduced by Yann LeCun. Unlike generative models that predict the next token or pixel, it predicts the representation of a missing or future part of the input in an abstract embedding space. This encourages conceptual understanding rather than low-level pattern matching, nudging models toward more abstract reasoning.
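
As a rough illustration of the shared recipe behind these variants (a sketch, not any specific paper's model), assume toy MLP encoders and random "patch" features:

```python
import torch
import torch.nn as nn

dim, n_patches = 64, 16

# Context encoder, target encoder, and predictor (toy MLPs here;
# real JEPA variants use ViTs or domain-specific encoders).
context_encoder = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
target_encoder = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
predictor = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

x = torch.randn(8, n_patches, dim)      # batch of "patch" features
mask = torch.rand(8, n_patches) < 0.5   # which patches are hidden

# Targets come from the full input; no gradients flow into the target
# encoder (in practice it is typically an EMA copy of the context encoder).
with torch.no_grad():
    targets = target_encoder(x)

# The context encoder only sees visible patches (masked ones zeroed here
# for simplicity); the predictor fills in the masked positions.
context = context_encoder(x * (~mask).unsqueeze(-1))
pred = predictor(context)

# The loss lives in embedding space -- no pixel or token reconstruction.
loss = ((pred - targets) ** 2)[mask].mean()
loss.backward()
print(loss.item())
```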

Here are 12 types of JEPA you should know about:

1. I-JEPA -> Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (2301.08243)
A non-generative, self-supervised learning framework for images. It masks parts of an image and predicts the representations of the masked parts from the visible context

2. MC-JEPA -> MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features (2307.12698)
Simultaneously interprets video data - dynamic elements (motion) and static details (content) - using a shared encoder

3. V-JEPA -> Revisiting Feature Prediction for Learning Visual Representations from Video (2404.08471)
Presents vision models trained by predicting future video features, without pretrained image encoders, text, negative sampling, or reconstruction

4. UI-JEPA -> UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity (2409.04081)
Masks unlabeled UI sequences to learn abstract embeddings, then adds a fine-tuned LLM decoder for intent prediction.

5. Audio-based JEPA (A-JEPA) -> A-JEPA: Joint-Embedding Predictive Architecture Can Listen (2311.15830)
Masks spectrogram patches with a curriculum, encodes them, and predicts hidden representations.

6. S-JEPA -> S-JEPA: towards seamless cross-dataset transfer through dynamic spatial attention (2403.11772)
Signal-JEPA is used in EEG analysis. It adds a spatial block-masking scheme and three lightweight downstream classifiers

7. TI-JEPA -> TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems (2503.06380)
Text-Image JEPA uses self-supervised, energy-based pre-training to map text and images into a shared embedding space, improving cross-modal transfer to downstream tasks

Find more types below 👇

Also, explore the basics of JEPA in our article: https://www.turingpost.com/p/jepa

If you liked it, subscribe to the Turing Post: https://www.turingpost.com/subscribe
posted an update 3 months ago
posted an update 6 months ago
posted an update 6 months ago
replied to their post 6 months ago

We will write a short technical report on current progress.

reacted to their post with 🚀 6 months ago
replied to their post 6 months ago
posted an update 6 months ago
posted an update 6 months ago
LLaMA-O1 Base and SFT models will be uploaded to HF today.
The RLHF pipeline is already in place; we are still waiting on data sampling.
replied to jwu323's post 6 months ago
reacted to jwu323's post with 🚀 6 months ago
We are excited to announce a new internal project, Rome, focused on advancing LLM reasoning. The code and accompanying paper will be released soon. Stay tuned!
replied to their post 7 months ago
replied to their post 7 months ago

main.py is the entry point for fine-tuning, but the code needs further improvement; see 'Call for contributors'.

posted an update 7 months ago
Discovered an outrageous bug on the ChatGPT official website, especially for those using ad-blocking plugins. Please make sure to add browser-intake-datadoghq.com to your ad block whitelist. The ChatGPT webpage relies on this site for heartbeat detection, but since it belongs to an ad tracking network, it's included in major ad-blocking lists. (If you're using Clash, also remember to add it to the whitelist.) Failing to do so may cause the ChatGPT web interface to display a greyed-out send button after clicking, with no response.

For users with Chinese IP addresses, consider adding this domain to the rules of your U.S. node, since the response headers from this site report the user's physical location to GPT.
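
If you want to verify whether the domain is blocked outside the browser (an in-browser ad blocker won't show up here), a quick DNS-level check, assuming Python is available:

```python
# Checks whether browser-intake-datadoghq.com is blocked at the DNS or
# proxy level (e.g., by a Clash rule). This will NOT detect in-browser
# ad blockers, which filter requests inside the browser itself.
import socket

host = "browser-intake-datadoghq.com"
try:
    addrs = socket.getaddrinfo(host, 443)
    print(f"{host} resolves to {addrs[0][4][0]} - not blocked at DNS level")
except socket.gaierror as e:
    print(f"{host} failed to resolve ({e}) - likely blocked at DNS level")
```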
posted an update 7 months ago
LLaMA-O1: Open Large Reasoning Model Frameworks For Training, Inference and Evaluation With PyTorch and HuggingFace
Large Reasoning Models powered by Monte Carlo Tree Search (MCTS), Self-Play Reinforcement Learning, PPO, AlphaGo Zero's dual-policy paradigm, and Large Language Models!
https://github.com/SimpleBerry/LLaMA-O1/

What will happen when you compound MCTS ❤ LLM ❤ Self-Play ❤ RLHF?
Just a little bite of strawberry! 🍓
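
For a feel of the core loop, here is a stripped-down sketch in the spirit of MCT Self-Refine, not the repo's actual implementation: `refine` and `reward` are hypothetical stand-ins for an LLM refinement call and a scalar evaluator.

```python
import math, random

class Node:
    def __init__(self, answer, parent=None):
        self.answer, self.parent = answer, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    # Upper confidence bound: balance exploitation and exploration.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def refine(answer):
    """Stand-in for an LLM call that critiques and rewrites an answer."""
    return answer + "'"

def reward(answer):
    """Stand-in for a scalar reward (e.g., pairwise preference or verifier)."""
    return random.random()

def mcts(root_answer, iters=50):
    root = Node(root_answer)
    for _ in range(iters):
        node = root
        # 1. Selection: walk down by UCB until reaching a leaf.
        while node.children:
            node = max(node.children, key=ucb)
        # 2. Expansion: ask the model for a refined answer.
        child = Node(refine(node.answer), parent=node)
        node.children.append(child)
        # 3. Evaluation and 4. backpropagation of the reward.
        r = reward(child.answer)
        while child:
            child.visits += 1
            child.value += r
            child = child.parent
    return max(root.children, key=lambda n: n.value / n.visits).answer

print(mcts("draft answer"))
```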

Past related works:
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning (2410.02884)
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B (2406.07394)
posted an update 10 months ago
replied to their post 11 months ago

ๅŽ็ซฏๅผ‚ๅธธ๏ผŒๆŒ‚ๆމไบ†๏ผŒๅœจไฟฎๅค

posted an update 11 months ago
Preview:
We will open-source the 2.5B ChemVLM and the tool-enhanced ChemLLM-7B in the near future.
posted an update 12 months ago