Categories
AI dataset

We are leading

As promised, more data from the blogs and education category is now in our granary! To give you an idea of the task we’re facing, this category alone contains 2.9 million files, and that’s just a fraction of what we’ve collected. We have also added a set of data from job listings. As a result, our project currently holds the largest amount of Polish-language data!

Categories
AI dataset

Happy Easter!

We wish you much peace and joy in the coming days!

In the meantime, we can report the import of more data. As promised, another batch from the blogs and education category has arrived which, together with the previous texts, gives us more than 145 GB of text data. You can see more details on our Speakleash Dashboard.

Happy Easter!

Categories
AI dataset

141GB

Another 3 datasets are already in our granary! They come from general media as well as from blog sites. Our dataset currently stands at 141 GB, and you can be sure that more data from these areas, media and blogs, will follow in the near future.
Below you can see the distribution of each category on a pie chart.

Categories
AI dataset

We don’t stop

We have big plans and an amazing team, but there is too much data for our current team to reach our ambitious goal by the deadline.

So if you know Python and love data, please write to us. We need your help right now!

To end with some positive news, another 6 GB from the legal category is already in SpeakLeash. For more details, visit our dashboard (https://speakleash.streamlit.app/).

Categories
AI dataset

Spring has come!

We welcome spring with more great news! Thanks to the acquisition of data from the media and online store categories, we have exceeded 120 GB of data! A big thank you to the whole team for their hard work, which is an inspiration to all of us.
How much do you think we will be able to collect this spring?

Categories
AI dataset

Another milestone

After months of research and talks, we can say we have reached a milestone in our mission: over 100 GB of pure text data! It includes Wikipedia, theses and novels. What do you think about it? What data would you like to add to train the first Polish GPT? Don’t hesitate to take a look here: https://speakleash.streamlit.app/.

Categories
AI dataset

BIG ANNOUNCEMENT!!

From now on, you can see a live dashboard on our webpage extension (https://speakleash.streamlit.app/)! It lets you track how our work is going, including the volume of data, the distribution of data across industries and much more. You can also apply filters to fit your needs. If you have any questions about the dashboard or SpeakLeash in general, don’t hesitate to ask.

Categories
AI dataset

Social & GitHub are live!

We are happy to announce that our social platforms & GitHub are live! You can find the links in the Community & Contact section. If you want to stay updated on our progress, make sure to follow us.

Categories
AI dataset

SpeakLeash a.k.a. Spichlerz has an official blog!

A project to build a dataset of at least 1 TB containing diverse Polish texts for language modeling now has an official blog! Keep an eye on it for the latest information about the development of the dataset. Here is the SpeakLeash.org blog RSS feed.