SpeakLeash a.k.a. Spichlerz!

An open collaboration project to build a language modeling dataset of at least 1 TB, composed of diverse texts in Polish. Our aim is to enable machine learning research and to train a Generative Pre-trained Transformer model on the collected data.

Latest news:

