Edit 2019: mytaxi has since been renamed Free Now. The following text was published in summer 2017 and therefore still uses the old name.
As a customer, you usually only use the mytaxi app to book a taxi, yet there are many IT processes to discover within the company itself. Tereza Iofciu was born in Romania and studied computer science less out of passion than out of pragmatism, and yet she found in IT the perfect environment for her personal development.
I am a data engineer at mytaxi in Hamburg and in my free time I am an artist. Few people can tell by looking at me that I have a PhD and many are quite surprised and impressed when I say that it is in Informatics. I personally do not really get why it is such a big deal. Doing a PhD is not that hard and it can be quite fun and rewarding. One gets to meet and work with many smart people around the globe and also travel a lot.
I obviously did not plan any of that, nor did I dream of it as a child. I come from Bucharest, Romania, where the school and university systems are a bit different from those in Germany. In the 11th grade you pretty much had to decide what you wanted to study, as each university had its own set of exams. Choosing to study Informatics was a purely pragmatic decision. I was good at Math and Physics but I did not really want to be a teacher. I really liked painting, but for studying art, well, those kids were starting to prepare in the 5th grade already.
In 2000 studying informatics seemed like a 'safe future' thing to do. So I got into the Computer Science University, where we had a ratio of four girls to every 25 students and learning was like a full-time job. During my studies I got involved in all sorts of extracurricular activities, like the Imagine Cup competition organized by Microsoft, an experience which got me out of the normal 'student goes to university' mode and opened up a lot of opportunities.
One of those opportunities was coming to Hannover in 2005 to work on my final thesis. After my studies I decided to stay at the L3S Research Center and work on my PhD on User Profiling and Entities on the Web. At the time the topic of my research was called Information Retrieval; now it falls under the Data Science umbrella. During this time I also did a three-month internship with Microsoft Research in Cambridge. There I got to meet a lot of bright students and mentors and experience an environment where academia meets industry.
In 2011 I decided to switch to industry, moved to Hamburg and worked for XING as a Data Scientist for three and a half years. I then took a year off and got to explore art and the startup world. There are quite a lot of startup events happening all year round, and they are a good opportunity to find new challenges and meet lots of like-minded people.
I eventually started to miss working with data and began looking for new opportunities. Unfortunately, if you want to work with data, you have fewer options in the EU than in the US, especially if you do not want to work in the gaming industry or market research. I was happy to find out last year that mytaxi has a data team in Hamburg. If you have ever had to order a taxi in Germany, Spain, Italy, Poland and so on, you have probably heard of mytaxi. I call it the fair-trade Uber. I like fair trade, and mytaxi is building a product that is not all about profit but also about giving something back to the community.
One of the fun facts about joining mytaxi is that I significantly increased the share of women developers in the company, which is always a good thing. The data team at mytaxi was new and small, so in the past year we all learned and achieved quite a lot. Working in the data engineering domain definitely requires autonomy, as there are usually not many experts in the field, and the chance that those experts are in your company is also quite small.
"Working in data engineering definitely requires autonomy"
Many are probably wondering: 'What does a data team do?' We are building a Data Science Infrastructure. This means constantly researching which tools best fit our requirements, setting up and maintaining data clusters, building data processing pipelines that give the analysts clean data, and building Data Products.
In the past year we have successfully set up our Hadoop cluster and rewritten the data processing pipeline, which prepares data for the analysts. The BI analyst team at mytaxi uses Tableau as an analytical tool for building dashboards, which are then viewed by various teams in the company, from Marketing to Product teams. Our main data processing workflow loads data stored by various microservices at mytaxi, processes it through various cleaning and aggregation steps and then transforms it so that the BI team can interact with the data efficiently. Some of these tasks run in parallel and some depend on each other. For managing such workflows we use Airflow, an open source project written in Python by Airbnb.
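To make the idea of tasks that partly run in parallel and partly depend on each other more concrete, here is a minimal sketch. It does not use Airflow itself and is not our actual pipeline: the task names are made up for illustration, and the scheduling relies only on `graphlib` from the Python standard library (Python 3.9 or newer). It groups tasks into "waves", where all tasks in one wave could run at the same time because everything they depend on has already finished.

```python
from graphlib import TopologicalSorter

# Hypothetical task names, invented for this example.
# Each task maps to the set of tasks that must finish before it can start.
dependencies = {
    "load_bookings": set(),
    "load_drivers": set(),
    "clean_bookings": {"load_bookings"},
    "clean_drivers": {"load_drivers"},
    "aggregate": {"clean_bookings", "clean_drivers"},
    "export_for_bi": {"aggregate"},
}


def execution_waves(deps):
    """Group tasks into waves; tasks in the same wave do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


waves = execution_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

A workflow manager such as Airflow works from the same kind of dependency graph, and adds scheduling, retries and monitoring on top of it.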
In our day-to-day work in the data team we experiment and get to work with various technologies, aka all the buzzwords, such as Hive, HDFS, Kafka, Spark, Airflow, Ranger, Presto and Zeppelin. The one thing they have in common is that they are all open source: you can look up their code on GitHub and anyone is welcome to contribute. This makes now a very exciting time. Industry and academia used to be very different: in academia you got to publish your research, both methodology and results, and have it peer reviewed, while industry was like a walled garden. Now open source is opening up industry to collaboration as well. Our cluster is slowly growing into a data hub for the mytaxi data, providing a centralized place for people who want to do data analysis. This also means that more and more of the business and product decisions depend on the data that the analysts prepare and we provide. The cluster and all the data workflows have to be 100 percent reliable, and we are responsible for monitoring this.
As you can see, the beginning of a data team is more about administration, trial and error, and tinkering. From this point on, now that we have a first working version of a data science infrastructure, we can start focusing on the fun part of data engineering: building data products and getting to play with cool machine learning and recommender system algorithms.
While I was growing up, studying and working in my first jobs, the topic 'Women in STEM' (or the lack thereof) did not really come up. This does not mean that it is not a real issue: I still know very few women developers. What I do mean is that it never played a role in my career decisions; I never thought I couldn't do something because I was a girl. So if you like math, physics, numbers and so on, do not be afraid of getting into science. It can be fun!
Dr. Tereza Iofciu earned her doctorate on the topic "User Profiling and Entities on the Web" and works at mytaxi as a Data Engineer. There she built up a data team together with her colleagues.
You can also find all jobs at mytaxi on the website!