I recently took part in a popular conference for Data Science called PyData. The conference was in Eindhoven and was organized mainly by the companies ASML and Greenhouse, among others. It happened over two days, the first with hands-on workshops and the second with presentations/talks. But first things first, what is PyData?
What is PyData?
PyData is an educational program of NumFOCUS that brings together developers and users of data analysis tools so they can learn from each other and share ideas. NumFOCUS is a non-profit charity that promotes open practices in research, data, and scientific computing. Under the PyData name, conferences are organized regularly around the world, bringing together companies and individuals from the local Data Science ecosystem. You can learn more about PyData and NumFOCUS, and find upcoming conferences, on their official websites. At this point I hear you wondering: “How did he end up at PyData Eindhoven?”
How Did I End Up at PyData Eindhoven?
I had heard of PyData Amsterdam from colleagues in my PDEng program and instantly became interested in taking part in it some day. When I heard that the conference would be happening for the first time in Eindhoven, where I am currently pursuing my PDEng, I immediately took the opportunity to sign up along with some of my colleagues. I participated in both days of the conference and I will start by describing them individually, before giving my final verdict on the conference as a whole.
First Day Of PyData Eindhoven – Tutorials/Workshops
The first day of the conference took place at the Greenhouse office in the center of Eindhoven, with attendees from all over the world. Fewer people attended than on the second day, which makes sense, since the tutorial sessions were the more “premium” part of the conference: you could buy a ticket for the talks alone, but not for the tutorials alone. There were also fewer seats available for the tutorials, so some people could not get a ticket for them.
As the image above shows, there were four practical tutorials in total. The topics covered were:
- Image Recognition with Deep Learning: This first tutorial was pretty basic if you were already familiar with Deep Learning and Computer Vision. The presenters made it interesting by tailoring the problem to rooftops in The Netherlands. They also added a small challenge for the next day, which spiced things up a bit. Pipple was the company that prepared the challenge, and it has made the notebook available on its GitHub repo here. It’s a nice notebook to go through.
- Apache Airflow for Scheduling Machine Learning Tasks: This tutorial was given by Big Data Republic. It was a very interesting workshop on Apache Airflow, a tool that seems to be all the rage lately. I was looking forward to it, and it was good to get a glimpse of what Airflow is all about. It seems to be a very helpful tool for scheduling Machine Learning pipelines and the different tasks of an ETL data pipeline, for example when new data arrives and you need to retrain your model. The notebook used in the workshop can be found on this GitHub repo.
- The use of Generative Models, such as GANs and RNNs, for artificial text and image generation: This tutorial was given jointly by ASML and two master’s students (I don’t remember at which company they were doing their theses :D). It was more of a fancy demonstration of GANs, RNNs, and Variational Autoencoders (VAEs). The image generation was done with VAEs and GANs and was based on the Fashion-MNIST dataset. The text generation was done with RNNs and LSTMs and was based on Shakespeare and Sherlock Holmes text datasets. The speakers also spent some time explaining the basic concepts of RNNs, LSTMs, and VAEs, but there was not enough time to really make an impact. At the very least, the showcase was an interesting demonstration of these fancy techniques. You can find the code for this workshop on this GitHub repo link.
- An introduction to MLflow as a form of “Machine Learning Version Control”: This final tutorial was given by Signify and was an introduction to the MLflow framework. MLflow is an open source platform for managing Machine Learning workflows. The speakers showed some examples of how they use MLflow in their own work and the use cases it serves. It was very interesting to see that a framework like this exists. Some kind of version control is becoming more and more essential for people working with Machine Learning. While traditional software engineering practices can be used for this purpose, a tool like MLflow seems to be an ideal version control framework designed specifically for Machine Learning. As with the previous tutorials, the code can be found on this GitHub repository.
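To make the idea of “Machine Learning Version Control” a bit more concrete, here is a minimal sketch in plain Python of what recording and comparing training runs could look like. To be clear, this is not MLflow's API and not the code from the workshop: the function names `log_run`, `load_runs`, and `best_run`, the JSON-lines file, and the example numbers are all assumptions I made up for illustration.

```python
import json
import tempfile
import time
from pathlib import Path


def log_run(store: Path, params: dict, metrics: dict) -> dict:
    """Append one training run (its parameters and metrics) to a JSON-lines file."""
    record = {"timestamp": time.time(), "params": params, "metrics": metrics}
    with store.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_runs(store: Path) -> list:
    """Read back every recorded run, oldest first."""
    if not store.exists():
        return []
    with store.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(store: Path, metric: str) -> dict:
    """Return the run with the highest value for the given metric.

    Raises ValueError if no runs have been recorded yet.
    """
    return max(load_runs(store), key=lambda run: run["metrics"][metric])


if __name__ == "__main__":
    # Hypothetical example: two runs with made-up hyperparameters and scores.
    store = Path(tempfile.mkdtemp()) / "runs.jsonl"
    log_run(store, {"learning_rate": 0.1, "epochs": 5}, {"accuracy": 0.81})
    log_run(store, {"learning_rate": 0.01, "epochs": 20}, {"accuracy": 0.88})
    print(best_run(store, "accuracy")["params"])
```

The point is only that every run leaves behind a record of which settings produced which results, so you can later look up the best one. As far as I understood from the tutorial, a framework like MLflow covers this kind of bookkeeping and much more.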
Second Day of PyData Eindhoven – Talks
The second day of the conference had a different format: short talks of half an hour to an hour, with two talks running in parallel at any given time. All the talks took place in one of the ASML buildings (a very beautiful one) in Veldhoven. The schedule was as follows:
Naturally, I was not able to follow all the talks, but I did attend several of them. Of all the talks I attended, the one I found most interesting was Untitled12.ipynb by Vincent Warmerdam of GoDataDriven. The talk was very fun and inspiring, with the speaker coding in real time. He showed some very amusing bad practices that most people working with Jupyter Notebooks and the Data Science stack fall victim to.
PyData Eindhoven – Final Verdict
This was my first time attending a PyData conference. It was also the first time the conference was organized in Eindhoven. All in all, it was a very interesting two days. I had the chance to widen my perspective on what people work on in Data Science and got to know some very interesting data-driven companies. If you have the chance to attend a PyData conference, be sure to take it, provided you are a Data Science/Machine Learning/Data Engineering/AI/Software/Python enthusiast.
Question of the Day: Have you attended a PyData conference in the past and how did you like it? Be sure to share your experiences in the comment section below.