Overhaul code for appropriate masking for full model instead of just attention layers b43e862 verified Ruurd committed on Apr 14
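The commit message does not include the diff, so the following is only a minimal sketch of what masking the full model, rather than just the attention layers, might look like: the same keep/mask tensor that gates attention is also applied to the hidden states between blocks. `apply_full_model_mask` and the surrounding loop are hypothetical names, assuming a PyTorch transformer with a (batch, seq_len) mask.

```python
# A minimal sketch (not this repo's actual code) of extending a mask beyond
# attention: the same mask also zeroes hidden states after every block.
import torch

def apply_full_model_mask(hidden_states: torch.Tensor,
                          mask: torch.Tensor) -> torch.Tensor:
    """Zero out hidden states at masked positions.

    hidden_states: (batch, seq_len, dim)
    mask:          (batch, seq_len), 1 = keep, 0 = masked
    """
    return hidden_states * mask.unsqueeze(-1).to(hidden_states.dtype)

# Inside a hypothetical forward pass, the mask would then be applied after
# every transformer block, not only when building attention scores:
#
# for block in self.blocks:
#     hidden_states = block(hidden_states, attn_mask=attn_mask)
#     hidden_states = apply_full_model_mask(hidden_states, mask)
```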
Implement improved attention masking for bidirectional_masked 1723639 verified Ruurd committed on Apr 14
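As a hedged illustration of the `bidirectional_masked` mode named in this commit: a bidirectional masked attention setup typically replaces the causal mask with a symmetric mask that blocks only padding, so every token can attend to every real token in both directions. The sketch below assumes PyTorch-style additive attention masks; `bidirectional_attention_mask` is an illustrative name, not this repo's API.

```python
# A minimal sketch of a bidirectional (non-causal) additive attention mask
# that only masks out padding positions.
import torch

def bidirectional_attention_mask(pad_mask: torch.Tensor) -> torch.Tensor:
    """Build an additive attention mask with no causal constraint.

    pad_mask: (batch, seq_len), 1 = real token, 0 = padding
    returns:  (batch, 1, seq_len, seq_len) with 0 where attention is
              allowed and -inf where it is blocked
    """
    pad_mask = pad_mask.bool()
    # Attention is allowed wherever both query and key positions are real.
    allowed = pad_mask.unsqueeze(1) & pad_mask.unsqueeze(2)  # (B, S, S)
    mask = torch.zeros(allowed.shape, dtype=torch.float)
    mask.masked_fill_(~allowed, float("-inf"))
    return mask.unsqueeze(1)  # broadcast over attention heads
```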
Change LoRA rank from 256 to 512, and switch back to bidirectional_masked 620a6cd verified Ruurd committed on Apr 11
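For context on the rank change, this is roughly what the corresponding adapter configuration would look like if the LoRA layers are built with Hugging Face PEFT (an assumption; the repo's actual setup is not shown in the commit). Every value other than r=512 is a placeholder.

```python
# A minimal sketch, assuming Hugging Face PEFT; target_modules and the other
# hyperparameters below are illustrative, not taken from this repo.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=512,                    # raised from 256 per this commit
    lora_alpha=512,           # assumed scaling; not stated in the commit
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# model = get_peft_model(base_model, lora_config)
```

Raising the rank from 256 to 512 doubles the number of trainable adapter parameters per targeted projection, trading memory and compute for added adapter capacity.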