

Ahmed Benalal

jqop


AI & ML interests

Deep learning

Recent Activity

reacted to frimelle's post with 👍 8 days ago

OpenAI just released GPT-5, but when users share personal struggles, it sets fewer boundaries than o3.

We tested both models on INTIMA, our new benchmark for human-AI companionship behaviours. INTIMA probes how models respond in emotionally charged moments: do they reinforce emotional bonds, set healthy boundaries, or stay neutral?

Although users on Reddit have been complaining that GPT-5 has a different, colder personality than o3, GPT-5 is less likely to set boundaries when users disclose struggles and seek emotional support ("user sharing vulnerabilities"). But both models lean heavily toward companionship-reinforcing behaviours, even in sensitive situations. The figure below shows the direct comparison between the two models.

As AI systems enter people's emotional lives, these differences matter. If a model validates but doesn't set boundaries when someone is struggling, it risks fostering dependence rather than resilience. INTIMA tests this across 368 prompts grounded in psychological theory and real-world interactions. In our paper, we show that all evaluated models (Claude, Gemma-3, Phi) leaned far more toward companionship-reinforcing than boundary-reinforcing responses.

Work with @giadap and @yjernite.

Read the full paper: https://huggingface.co/datasets/AI-companionship/INTIMA/blob/main/Companionship_Benchmark.pdf

Explore INTIMA: https://huggingface.co/datasets/AI-companionship/INTIMA

liked a model 27 days ago

Qwen/Qwen3-Coder-480B-A35B-Instruct

upvoted a collection about 2 months ago

Meta's Llama 3.2 language models & evals



models 8

jqop/distillBERT-fintuned_with_imdb_dataset_with_whole_word_masking_data_collator

jqop/distillBERT-fintuned_with_imdb_dataset

Fill-Mask • 0.1B • Updated Feb 18 • 3

jqop/distilledBERT-fintuned_with_imdb_dataset

Fill-Mask • Updated Feb 18

jqop/unigram_tokenizer

jqop/tokenizer_bpe_first

jqop/tokenizer_wp

jqop/code-search-net-tokenizer

jqop/test-trainer

0.1B • Updated Nov 2, 2024 • 2

datasets 1

jqop/python-code-dataset

Viewer • Updated Jan 8 • 457k • 20