Who can participate?
Everybody except LAL employees and members of the committees! Students, computer scientists and physicists of all backgrounds just have to create a Kaggle account and sign the standard "terms and agreements", which specify, in particular, the conditions under which they can copy and use the ATLAS data file.
By the way, the file will not remain available after the end of the challenge.
What kind of scientific outcome do we expect?
Up to now, physicists have read the papers of machine learning scientists and tried (or not) the new algorithms on their own data. With this challenge, machine learning scientists will be able to test their new algorithms directly on the data, which seems to be a much more efficient way to find the best methods. It also opens the possibility for anyone with an interest in data science to participate. The "HEP meets ML" Award winner will be invited to CERN to discuss his or her approach with physicists. Depending on the participation and scientific outcome, the results may lead to publications in 2015.
Typical questions and reactions from the people we have talked to so far:
- Can we improve the Higgs signal significance by using more advanced algorithms?
- Will the winners be data scientists with sophisticated algorithms and no knowledge of physics, or high energy physicists using e.g. well-known Boosted Decision Trees on clever combinations of variables, or ...? Place your own bet...
- What is the impact of realistic conditions (full simulation, limited statistics) compared to simplified open-source data sets?
- Can this exercise help to compare methods used in HEP (BDTs versus NNs, for example)?
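As a rough illustration of what "signal significance" means in the questions above, the sketch below computes the approximate median significance (AMS) from the weighted number of selected signal events s and selected background events b. This FAQ does not define the formula; the AMS form and the regularisation term b_reg = 10 are assumptions made here, and the function name and the event counts are purely illustrative.

```python
import math


def approximate_median_significance(s, b, b_reg=10.0):
    """Approximate median significance for s signal and b background events.

    s, b  : weighted sums of selected signal / background events (>= 0)
    b_reg : regularisation term added to b (assumed value, not from the FAQ)
    """
    if s < 0 or b < 0:
        raise ValueError("s and b must be non-negative")
    if b + b_reg <= 0:
        raise ValueError("b + b_reg must be positive")
    inner = (s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(2.0 * max(inner, 0.0))


# Illustrative numbers only: two hypothetical selections on the same sample.
loose_selection = approximate_median_significance(s=300.0, b=20000.0)
tight_selection = approximate_median_significance(s=200.0, b=5000.0)
print("loose selection:", loose_selection)
print("tight selection:", tight_selection)
```

When s is much smaller than b, this quantity is close to the familiar s / sqrt(b), which is why a tighter selection that keeps less signal can still give a higher significance.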
What about real data?
Real ATLAS data are provided to students for the International Masterclasses: they learn how to recognize particle types, measure their energy, and combine them to do physics. But for ML training and scoring we need to know the true origin of each event, signal or background, which is only possible if we use simulations. Moreover, in the real analysis, the background is estimated from real data using procedures that are complicated to explain and to put in place.
If I'm doing well at the challenge, does that mean I could do physics for real?
It is a good start. But doing a physics analysis is much more than optimizing the signal significance. Physics students learn during the first months of their PhD that most of their time may be spent on evaluating or reducing the systematic uncertainty on their results. This means quantifying the known unknowns. It is a fascinating topic, but we have deliberately left it aside for the sake of simplicity.
Can I use the data from the challenge for a master's thesis?
The rules one has to agree to when participating in the Challenge clearly state that use of the data is limited to the Challenge, within the lifetime of the Challenge. But it is perfectly OK for tutorials, courses, etc. to include participation in the Challenge.