🚀 This Thursday (7 December) at Pytech Summit (https://pytechsummit.pl/) you will have the opportunity to hear Szymon Baczyński and Igor Ciuciura, who will be representing the Speakleash team. 🎙️
The talk will be about creating the Speakleash package as a data management tool.
The conference takes place online. Book your ticket today! 🎟️
We are very proud to announce that we have become official partners of a unique hackathon – Hack to the Rescue!
Hack to the Rescue is the world’s largest Generative AI event. Its goal this year is to search for the most effective solutions to help nonprofit organizations deal with the most pressing challenges of the modern world. It is an online event that will take place on June 14-15.
We would like to point out that the mentors of this extraordinary hackathon include Maria Filipkowska and Adrian Gwoździej, who work with us on the Speakleash project every day! We are extremely pleased to have them with us; their attitude motivates us to keep developing.
We invite you to read the details of the event at the link: https://hacktotherescue.org/
We have barely finished enjoying the Python Data Summit webinar, and another presentation is already waiting for us! You are warmly invited to the conference, which will be held on June 15-16. In addition to the standard duo of Sebastian Kondracki from Deviniti and Adrian Gwoździej from BTC and Bank Pekao S.A., the speakers will include Maria Filipkowska and Grzegorz Urbanowicz. The presentation will discuss the achievements of the SpeakLeash project to date and compare them with other initiatives. There will certainly be other interesting topics as well. For everyone who wants to attend the conference, we have a code for a 20% discount. See you there, you can’t miss it!
The temperature outside is rising, but that has nothing to do with our data collection rate. The end of May is behind us, and our counter now starts with a “3”: 302 GB, to be exact! It is worth mentioning that only two months ago, at the end of March, we had just 120 GB. This gives an optimistic outlook for further updates, which will appear as soon as possible.
The most recent batch of nearly 50 GB includes women’s, sports, and health forums, as well as public information.
Please visit our dashboard, where you will learn much more about the data we collected.
A person does not live by data alone. As interest in our project has been greater than we could have expected, we are reaching out to you.
Tomorrow our representatives, Sebastian Kondracki from Deviniti and Adrian Gwoździej from BTC and Bank Pekao S.A., will talk about how to effectively obtain large text datasets in Python, using the SpeakLeash.org project as an example.
The conference will take place tomorrow, May 18, at 1:00 p.m. You are cordially invited. Don’t miss it!
Another week, another update! This time we have passed the magic number that marks a quarter of our goal. 255.1 GB, or 255,100 MB (which sounds even more impressive), is the exact volume of Polish text data we have managed to collect so far. As last time, the newly collected data came from the forums and education categories.
Knowing our researchers, next week we will be even closer to our goal, as the pace of data collection is growing exponentially. This is also thanks to the people who, inspired by the idea, have joined the project in recent weeks to help with our work. If you are interested in being a part of something big, don’t hesitate to write to us! Contact link in the comments. 👇
We come with positive news! As promised, we managed to exceed 200 GB before May. What is more, the counter currently stands at over 217 GB, although it may well have changed while we were writing this post 🙂
The main sources of the newly acquired data are lifestyle and beauty forums.
We can’t describe the enormity of the work and dedication of our experts. Thank you!
As a result, compared with the last update of the OSCAR project from 23.01 (https://lnkd.in/gEgAQygg), we surpass that project in dataset size by as much as 40%.
We will keep you updated on further changes and look forward to sharing more news.
As promised, more data from the blogs and education category is now in our granary! To give you an idea of the task we’re facing, the data from this category alone amounts to 2.9 million files, and that’s just a fraction of what we’ve collected. Another newly added dataset consists of job listings. As a result, our project currently has the largest amount of Polish data!
We wish you much peace and joy in the coming days!
In the meantime, we can report the import of more data. As promised, it is another batch from the blogs and education category which, together with the previous texts, gives us more than 145 GB of text data. You can see more details on our dashboard: Speakleash Dashboard – Streamlit