Portraits | The Higgs Machine Learning Challenge

We contacted some participants who were among the top three on our leaderboard at various points in time and interviewed them to get to know the people behind the Kaggle pseudonyms. Thank you to those who agreed, and to Abha Eli Phoboo of the ATLAS Outreach Team for putting this together. If you reach the top three (first, second or third!) of our leaderboard and would like to be featured, do contact us: higgsml_at_lal.in2p3.fr.

Meet the competitors:

Tim Salimans from The Netherlands

Team: Tim Salimans

I’m a data science consultant applying my skills to all kinds of real-world business problems and occasionally a Kaggle competition. I studied a little bit of physics as an undergrad but got my PhD in Econometrics. During that time, I spent three months at Microsoft Research and did other machine learning related work. My earlier Kaggle successes include winning the Observing Dark Worlds challenge to detect dark matter halos.

Competitions like these offer a great platform to try out new modelling techniques and update skills. The competitive element makes them fun. Science-based competitions like the Higgs ML are nice opportunities to learn something new beyond Machine Learning.

Technique: Compared to the standard black-box approach, I prefer more formal probabilistic modelling of a problem. That way, one can incorporate domain knowledge and solve a problem in an elegant and interpretable way. For this particular challenge, the conventional approach seems quite hard to beat.

Gilles Louppe from Belgium

Team: T.A.G.

I’m finishing a PhD in Machine Learning, and my expertise is in random forests and other tree-based methods such as boosting. As a byproduct of my research, I’ve also become a core developer of the scikit-learn Python library. My main contributions include ‘sklearn.tree’ and ‘sklearn.ensemble’. Folks at scikit-learn call me "the tree hugger".

I have a strong interest in theoretical work in Machine Learning and also like working on actual problems. Competitions often offer nice, challenging problems on which to sharpen skills. This challenge is of special interest to me because I’ve been selected as a CERN fellow, starting next September, and this is an ideal opportunity to understand what my future colleagues are working on.

Technique: Our main tools are tree-based methods. We’ve been on hold for a few weeks now, but in the short term we plan to try domain-specific ideas directly within the tree and boosting algorithms.

Gábor Melis from Hungary

Team: Gábor Melis

I live just outside Budapest and work remotely as a Lisp developer and consultant for Franz Inc. A graduate in Software Engineering & Mathematics, the second of which is rather rusty, I have always been drawn to competitive games and AI.

Following the progress of Machine Learning and writing ML libraries can become an activity that is detached from reality, and motivation flees. Contests offer pressure and focus. With prizes and feedback on what works best, they also anchor ideas to the real world. The Higgs Challenge is cool and comes with bragging rights. It would also be great to actually help science a bit.

Technique: My usual algorithm is: come up with a plan on how to win with a mostly automated approach, get fed up with the slow progress the computer makes, implement a couple of promising learning methods, spend a lot of time trying various tweaks, eventually break down and understand the domain. I’m a Lisper with a growing collection of ML libraries, but I also use other code (SciPy, R) when time is scarce or the algorithm is too boring. When hunting the last 0.1%, knowing the implementation inside out can be important.

Mathieu Cliche

Team: Mathieu Cliche

I’m a PhD student in theoretical particle physics, and my research focuses on dark matter. I got interested in Machine Learning several months ago and have been improving my coding skills, transitioning from MATLAB to Python.

The Challenge is interesting, but I doubt that my physics background will be helpful in the end. A clever algorithm should be able to figure out any physics insight I might have. The challenge is a nice place to experiment with different machine learning techniques, and it’s incredibly addictive! I thought I might have an edge over the other competitors since I have some, though very limited, domain knowledge of physics. I was also made aware that Lubos Motl, a famous physics blogger, was participating and I thought it would be cool to beat him ;).

Technique: I'm only using Python for this competition. I've coded my own learning algorithm, though it's essentially a twist on a well-known one. I'm not actually using any physics knowledge so far, but if I run out of ideas I might try to.

Triskelion (Hendrik Jacob van Veen) from The Netherlands

Team: T.A.G.

I studied Cognitive Artificial Intelligence at the University of Utrecht but did not finish my Bachelor’s. I work as a front-end data engineer at Zorgon, a Dutch company specialising in healthcare information management, and as a supplier of security and safety services for Google. I also run a Machine Learning blog at MLWave.com.

I participate in Machine Learning Challenges mostly to play around with datasets and learn about (practical) machine learning. This is a high-profile challenge with top-quality contestants and a great data set. The idea that you are working on real data of the Higgs boson is fascinating and humbling.

Technique: My approach is to get a high score as fast as possible; I call it ‘automated ML’. It’s a near-black-box approach to classification problems, with regard to both the domain expertise and the algorithms used. My favourite tools are Vowpal Wabbit and Sofia-ML, both multi-purpose tools for fast learning over large datasets. For this challenge, we are evaluating all the tools we can get our hands on and trying out many different approaches.

Abhishek Thakur from India

Team: T.A.G.

I’m a Data Scientist at mbr-targeting in Berlin. I’m originally from India but live in Germany. I’ve been interested in image processing and computer vision since my undergraduate years and started with identification of cancer cells using random forests during an internship in the UK. I’m interested in object recognition, pattern matching and machine learning.

Participating in Machine Learning Challenges is my favourite pastime. This challenge is special because active research is still going on in this field and there is a chance for the winning solution to be applied to real data. It’s a great way to exchange knowledge and ideas on forums, and if you win and your code is used, wouldn't it be cool to say that you also contributed a tiny bit toward the Higgs boson search?

Technique: We're mainly using Python and scikit-learn, and we also refer to open-source libraries to develop modules specific to this competition. It’s too soon to talk about the exact approach, but we haven't used any feature engineering or complicated models.