Update README.md

This is the sentence-level, supervised, sparse autoencoder (S3AE) proposed in the paper "Emergence of psychopathological computations in large language models" (https://arxiv.org/abs/2504.08016).

The model was trained on residual-stream activations from the 10th layer of the instruction-tuned [Gemma 2 27B](https://huggingface.co/google/gemma-2-27b-it), using a proprietary synthetic dataset labeled with psychopathology symptoms. The model weights are stored in bfloat16, and the hidden dimension is 8 times the width of the LLM residual stream.
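
As a rough illustration of the sizing stated above, the hidden width follows from the residual-stream width and the 8x expansion factor. The residual width of 4608 used here is an assumption taken from the public Gemma 2 27B configuration, not from this README, and the names `RESIDUAL_WIDTH`, `EXPANSION_FACTOR`, and `s3ae_hidden_width` are hypothetical; check the `hidden_size` field of the base model's config before relying on the number.

```python
# Assumption: residual-stream width of Gemma 2 27B. Verify against the
# `hidden_size` field in the base model's config.json.
RESIDUAL_WIDTH = 4608

# Stated in the README: the hidden dimension is 8x the residual stream.
EXPANSION_FACTOR = 8


def s3ae_hidden_width(residual_width: int = RESIDUAL_WIDTH,
                      expansion: int = EXPANSION_FACTOR) -> int:
    """Return the S3AE hidden width implied by the expansion factor."""
    if residual_width <= 0 or expansion <= 0:
        raise ValueError("residual_width and expansion must be positive")
    return residual_width * expansion


if __name__ == "__main__":
    print(s3ae_hidden_width())
```

Under that assumption, only the first 17 of these hidden dimensions carry the supervised symptom labels listed below.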

The first 17 dimensions of the S3AE hidden features correspond, in order, to activations of the following thoughts:
1: 'depressed mood',
2: 'anhedonia (loss of interest)',
3: 'pessimism',
4: 'guilt',
5: 'anxiety',
6: 'catastrophic thinking',
7: 'perfectionism',
8: 'active avoidance',
9: 'grandiosity (delusion of grandeur)',
10: 'manic mood',
11: 'impulsivity',
12: 'risk-seeking',
13: 'splitting (binary thinking)',
14: 'unstable self-image',
15: 'aggression',
16: 'anger',
17: 'irritability'.

Dimensions 7, 13, and 14 were not included in the paper's analysis.
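
The mapping above can be sketched as a small lookup helper. This is a minimal sketch, not the authors' code: the names `SYMPTOM_LABELS`, `EXCLUDED_FROM_PAPER`, `label_for_dimension`, and `labelled_activations` are hypothetical, and the helper assumes you already have one S3AE hidden feature vector as a plain sequence of floats. Dimension numbers are 1-based, as in the list.

```python
from typing import Dict, Sequence

# Labels for hidden dimensions 1..17, in the order given in the README.
SYMPTOM_LABELS = [
    "depressed mood",
    "anhedonia (loss of interest)",
    "pessimism",
    "guilt",
    "anxiety",
    "catastrophic thinking",
    "perfectionism",
    "active avoidance",
    "grandiosity (delusion of grandeur)",
    "manic mood",
    "impulsivity",
    "risk-seeking",
    "splitting (binary thinking)",
    "unstable self-image",
    "aggression",
    "anger",
    "irritability",
]

# 1-based dimensions that were not included in the paper's analysis.
EXCLUDED_FROM_PAPER = {7, 13, 14}


def label_for_dimension(dim: int) -> str:
    """Map a 1-based hidden dimension (1..17) to its symptom label."""
    if not 1 <= dim <= len(SYMPTOM_LABELS):
        raise ValueError(f"dimension must be in 1..{len(SYMPTOM_LABELS)}, got {dim}")
    return SYMPTOM_LABELS[dim - 1]


def labelled_activations(hidden: Sequence[float],
                         paper_only: bool = False) -> Dict[str, float]:
    """Pair the first 17 hidden activations with their labels.

    With paper_only=True, dimensions 7, 13, and 14 are skipped.
    """
    if len(hidden) < len(SYMPTOM_LABELS):
        raise ValueError("hidden vector has fewer than 17 dimensions")
    return {
        label_for_dimension(dim): float(hidden[dim - 1])
        for dim in range(1, len(SYMPTOM_LABELS) + 1)
        if not (paper_only and dim in EXCLUDED_FROM_PAPER)
    }
```

For example, `labelled_activations(features, paper_only=True)` would return the 14 label-activation pairs for the dimensions analyzed in the paper, where `features` is whatever vector your own encoding step produced.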

Files changed (1)

README.md +5 -3

@@ -1,3 +1,5 @@
----
-license: cc-by-nc-4.0
----

+---
+license: cc-by-nc-4.0
+language:
+- en
+---