The goal of the challenge is to develop machine learning models that can predict survival in breast cancer. We've been given access to a remotely-hosted R system on which to develop our models and, on that system, use of molecular and clinical data from the METABRIC study of breast cancer. We run our models on this remote system and they're scored using the concordance index, a nonparametric statistic for survival analysis that measures how well the predicted ranking of patients agrees with their observed survival.
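For anyone unfamiliar with that metric, here is a minimal sketch (in R, since that's the language the challenge platform hosts) of how a concordance index can be computed with the survival package. The data frame clinical and its columns (time, status, age, tumour_size) are hypothetical stand-ins for illustration only, not the actual challenge data or its scoring code.

    library(survival)

    # Fit a simple Cox model as a stand-in for a real submission;
    # 'clinical' is a hypothetical data frame with survival times,
    # event indicators and a couple of clinical covariates.
    fit <- coxph(Surv(time, status) ~ age + tumour_size, data = clinical)

    # Concordance index of the fitted model: the proportion of comparable
    # patient pairs whose predicted risk ordering agrees with their
    # observed survival ordering (0.5 = random, 1 = perfect).
    concordance(fit)$concordance

    # The same idea for an arbitrary vector of risk predictions
    # (reverse = TRUE because higher predicted risk should mean shorter survival).
    clinical$risk_score <- predict(fit, type = "risk")
    concordance(Surv(time, status) ~ risk_score, data = clinical,
                reverse = TRUE)$concordance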
What makes this challenge a bit different is that it's both competitive and also collaborative. Not only are we competing against one another to get the best-performing model, but once someone has submitted a model, I can download it and inspect their code to see how it works! This is very ambitious (and certainly not without its issues), but aims to create a hybrid competition/crowdsourcing approach that can produce very strong solutions to the scientific goal of interest.
Having put a lot of hours into working on this challenge over the last few months, I have developed some opinions on it. So, in no particular order, here are my thoughts on the challenge:
- Incentives for academics. In addition to some small financial prizes along the way, the big prize on offer is co-authorship on a journal paper. This is a strong incentive for academics (such as myself) who want to compete in a challenge like this. I'm very enthusiastic about the whole concept and would love to join in with future challenges, but in order to justify spending my work time on it there needs to be some kind of academic return, and co-authorship fits the bill nicely! Currently, I think only the top couple of teams (?) get this prize; extending it would be a good plan. Certainly, my experience is that many more than two academic teams have contributed significantly to the success of this challenge.
- Incentives for non-academics. Of course, it's also important to have rewards on offer for non-academics. The real strength of such a challenge comes from having a diverse community of competitors. I presume the small financial prizes are nice in this regard; it'd be really interesting to hear from some of the non-academic competitors what their views are on this.
- Sharing of code. This has been a very innovative (and brave!) aspect of the challenge. I don't think the organisers quite nailed every aspect of it, but I think the general approach is very powerful and certainly worth persisting with. I wonder if the sharing should be constrained in some way - perhaps code can only be accessed 48 hours after it is submitted?
- Blitzing the leaderboard? In this challenge we could make as many submissions as we liked to the leaderboard (and I'm as guilty of this as anyone :-) ). This worries me as it could lead to a lot of over-fitting to the leaderboard data (see the toy simulation after this list). Maybe in future challenges there should be a limit - say 5 submissions per day?
- Challenge length. In total the challenge was approximately three months long; two to three months feels about right to me.
- Competitive vs. collaborative. Another research model that's relevant here is the Polymath Project. Essentially, one can imagine a sliding scale between competition and collaboration. Polymath lives at one end, with things like the Netflix Prize and Kaggle competitions at the other. This challenge lives somewhere in the middle. I like the idea of blending the two concepts.
- Populations of ideas vs. monoculture. A competition is great for generating a wide range of ideas. Once people start sharing, I expect (as happened in this challenge) that the pool of ideas tends towards more of a monoculture.
- Building an ongoing community. This challenge has been a great way of starting up a research community (a smart mob :-) ). It would be great to harness this community on an ongoing basis.
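As a concrete (if cartoonish) illustration of the leaderboard over-fitting worry above, here is a toy R simulation with entirely made-up data and pure-noise "submissions" - nothing here is taken from the real challenge. Picking whichever random submission happens to score best on a fixed public split gives a flattering public score that evaporates on held-out data.

    library(survival)
    set.seed(1)

    n <- 200
    time    <- rexp(n)              # simulated survival times (pure noise)
    status  <- rbinom(n, 1, 0.7)    # simulated event indicators (pure noise)
    public  <- 1:100                # patients scored on the public leaderboard
    holdout <- 101:200              # patients held out for final validation

    cindex <- function(idx, score) {
      concordance(Surv(time[idx], status[idx]) ~ score[idx],
                  reverse = TRUE)$concordance
    }

    # 500 "submissions" that are nothing but random numbers
    scores   <- replicate(500, rnorm(n))
    public_c <- apply(scores, 2, cindex, idx = public)

    best <- which.max(public_c)
    public_c[best]                    # looks impressive on the public leaderboard...
    cindex(holdout, scores[, best])   # ...but is back near 0.5 on held-out data

A submission cap (or a leaderboard computed on a rotating subset of the data) limits how far this kind of selection effect can be pushed.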
Sharing code
Sharing code means sharing ideas, and this has allowed us to benefit from each other's ideas during the challenge. I'm sure this has led to better overall results. However, it also has some quirks that might need tweaking.
First is the phenomenon of 'sniping'. Someone else can spend a month developing an awesome model, but once it's been submitted to the leaderboard I can download it straight away, spend 30 minutes applying my favourite tweak and then resubmit the (possibly improved) new model, jumping above the hard-working competitor on the leaderboard. Of course, overall this leads to better models, which is the collective aim of the challenge. But I think care needs to be taken to ensure that credit (and reward in general) is given where it's due. It can be a bit dissatisfying when this happens to you!
The other consideration is that after a while of sharing models, we end up with a monoculture. Examining the high-ranking models over the last couple of weeks shows that almost all of them are based on those of the Attractor Team (with some chunks of my own code scattered around, I was gratified to see!). This is not surprising, as the Attractor Team won both of the monthly incremental prizes, but when this happens it's probably an indication that we've got about as far as we can with the challenge. Now is probably a good time to stop :-)
So, what would I change? I might suggest something a bit more like the following:
A possible model for future challenges
The 21st Century Scientist Speculative Future Challenge (21SFC) would look like this:
Stage 1 (initial competition) - a month-long competition to top the leaderboard. No one can access other people's code and, at the end of the month, a prize is awarded on the basis of a held-out validation set. After the deadline, all code for Stage 1 is made available.
Stage 2 (competition/code sharing) - another month-long competition to top a new leaderboard. Everyone has access to the Stage 1 models, but Stage 2 code is either unavailable or only accessible 48 hours after it has been submitted. At the end of the month, a prize is awarded on the basis of a held-out validation set.
(it might be worth re-randomising the training, test, validation sets for stage 2)
Stage 3 (collaboration) - a non-competitive stage. The aim here is to work as a team to pull together everything that has been learned, produce one (or a small number of) good, well-written models, and publish a paper of the results.
The author of the paper is the "21SFC collaboration", with an alphabetised list of contributors given. There could be different ways to qualify for authorship:
- Placing in the top-n in either stage 1 or 2
- Making significant contributions in stage 3 (the criteria for this would need to be established)
--------
This post has turned into a long one, and I hope I've communicated the intended positive tone. I enjoyed the Sage/DREAM BCC a great deal and I think this is a hugely powerful way of getting answers to scientific problems. I'm certainly going to take a look at whatever the next challenge is (I know there are some in the pipeline), and I would certainly recommend you do the same.
Hi Rich,
As one of the organizers of the challenge, I first of all want to thank you for your thoughtful comments. This was the first time anyone at Sage has attempted something like this... running the challenge has felt a bit like building a plane while simultaneously trying to fly it, which is both exciting and scary. We hope to take what we learned and run an even better challenge next time because, despite all the things that didn't go ideally, we seem to have lots of people engaged in the work.
On incentives, we picked the single-winner approach as the most straightforward way to motivate, but we fully agree that giving multiple groups recognition is part of what we want to do. Defining criteria for multiple authorship could be tricky... taking all participants might result in an extremely long author list and "diminish" the prize, especially if people make token contributions just to get on the list. The other alternatives I can think of are having the organizers make subjective judgements, having the contestants vote, or defining a few different criteria to win and get on the list.
Sharing code, we agree, was quite challenging to think through. We should have more technical capabilities in the future to implement suggestions like the one you made to give a time lag before code is public. We are also working on capabilities in Synapse to allow people more ability to reference each other's work.
Blitzing the leaderboard was a bit of a concern for us due to overfitting, but conversely there's a lot of research showing the importance of rapid feedback in learning. We chose to err on the side of permitting over-fitting, with the second validation data set as a guard to catch the guilty.
Hi - great to hear from you!
I entirely take your point that it's very tricky to think of optimal ways to reward participation. I strongly suspect that there isn't an "ideal" system :-) I think the key point is to avoid situations where there are N prizes and N+1 teams who all put in huge effort and score within a fraction of a percent of one another on the leaderboard.
That said, I can also see a strong argument for keeping the system fairly simple, and of course diminishing the prize could certainly be a problem.
What about something like the following: prizes are awarded to the top-N teams on the leaderboard (with say N=3 or 5). These are scaled in some way according to place - e.g. for authorship, the leaderboard place determines the authorship order. There is then one additional rule: you (the organisers) reserve the right to award additional "special mention" prizes if there is a team outside the winning places that you feel has contributed significantly - for example, if all the winning entries use a big chunk of that team's code. "Special mention" teams could then be included as extra authors for such exceptional cases.
Doing this, one could set the expectation that competitors shouldn't in any way count on this "special mention" prize, but it gives a bit of flexibility if an "N+1 into N" type of situation arises.
Hi Rich,
I think we are likely to invite the top two scoring teams to present at the DREAM conference, which is a bit along these lines.
I also like the way you are thinking. We hope to run more of these events in the future, and as we get better at managing the logistics around these challenges we hope to experiment with ways to make them more collaborative and less winner-take-all.
Mike
Hi Mike,
I think experimentation will be very interesting. There are lots of different ways one can imagine organising this kind of challenge; it'll be fascinating (and hopefully very informative) to see different ideas in action and find out how they work in practice!
Hi Rich,
Thanks for your interesting entry.
We know from the previous DREAM experiences (I am the founder of DREAM) that the Wisdom of Crowds (the fact that if you aggregate predictions, the aggregate prediction is often better than the best individual prediction) works best when we integrate independent predictions.
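To make the aggregation concrete: the simplest version is just to average the ranks of each team's predictions. A minimal R sketch (the predictions matrix here is purely hypothetical, one column per team and one row per patient, with higher values meaning higher predicted risk) might look like this:

    # Minimal sketch of rank-average aggregation across teams.
    aggregate_predictions <- function(predictions) {
      rowMeans(apply(predictions, 2, rank))   # rank within each team, then average
    }

    # The aggregate is itself just another risk score, so it can be scored
    # with the same concordance index as any individual submission.

The gain from this kind of averaging depends on the individual predictions making different errors, which is why independence matters.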
The SAGE-DREAM challenge emphasizes competition and sharing via a commons platform that has the leaderboard and the ability of anybody to read everybody else's code.
Interestingly, as you incisively point out in your monoculture comment, it may happen that the latter stifles true innovation rather than promoting it, in the sense that teams tend to copy the code of the winner. This, in turn, hampers the wisdom of crowds.
What new rule could be imposed so that not everybody copies the same code and we don't get trapped in a local maximum of performance? I believe that a credit system, a sort of free market of algorithms, could discourage frivolous copying while giving more credit to the methods that are copied. There would then be two awards: one for the best method on the leaderboard (objective performance), and one for having contributed the most to it (the teams that earn the most credit). Of course there will be problems with this idea, but it doesn't sound so crazy in principle.
Thanks for your thought provoking piece.
Gustavo
I think the idea of a credit system is really interesting. It would potentially open up a range of other ways for people to meaningfully contribute to the challenge. I for one would really like something like this - I'd be very happy to build useful specific models/functions for other people to use, if I was getting credit for it :-)
And I agree that having two types of award - one based on the leaderboard, and one based on credited contributions - would be a good way to go.
I think the key is that whatever metrics (credit, leaderboard position) are used need to map well to the "value" of given contributions. If this is the case, then the right kind of contributions are being incentivised. I imagine this could be quite tricky, but I could certainly also imagine it being possible!