A VIEW FROM INSIDE THE DEMAND SPHERE: AGILE DATA SCIENCE

For the third post in our View from Inside the Demand Sphere series, I thought we would step away from the nitty-gritty technical issues a bit and take a look at what appears to be an open question across the data science community: what is the best way to manage a data science team?

There seems to be a growing consensus that data science teams need to be made up of people with a mix of complementary technical skills (possibly a reflection of the interdisciplinary origins of data scientists themselves), but how best to manage those people once they’re on board is still far from settled.

Gi recently closed an investment round and the change in circumstance has taken us to an inflexion point in how we manage and approach data problems. Before the investment, we ran our data science/analytics tasks as engineering tasks. We voted story points on them, assigned a definition of done and broke them down in exactly the same way as we would an engineering task. We had to work this way to get output from each task and get something in front of a customer that we and they could learn from and they would pay for. The tension between exploring ideas and getting output in front of a customer was ever-present, but even in this tightly bounded and scoped model we learnt lots about how to approach a data problem (as well as what our business was actually going to look like).

We built a loose process for data problems within this framework, which consisted of setting a clear definition of done and a hypothesis to test, carrying out an exploratory data analysis (both automated and manual, ie eyeball-based), and then taking an iterative approach to solving the problem. In practice, that means starting simple, holding regular team reviews and discussions, spotting and avoiding cans of worms, and adhering to the 80:20 rule.

This approach, although it sounds restrictive, proved very successful at delivering outcomes from data tasks and getting those outcomes deployed to the live system for our customers to work with. Believe it or not, I’ve interviewed a lot of data science people who want more structure in their work, not less. Stories of legions of data scientists fiddling away at pet problems in the latest #trending language are not uncommon. One example of where the approach yielded great results was in our S2DS_School project. As part of the summer school, we took on four ex-PhDs and threw them in at the deep end of a real data problem. We wanted the work to yield as much value as possible for Gi and to give the team the most ‘realistic’ experience of what it is like to work in a data-driven startup, so we left it as late as possible to decide on the problem (so it was as relevant to our customers as possible) and managed the work in just the same way as our other data tasks.

In the relatively short time they were with us (described here in more detail) the S2DS guys generated a tangible output which added genuine business value to our product. That’s no mean feat given that they were coming in from a standing start. I have seen lots of commercial data work fail because the objectives were too vague. Although we are still customer focussed, now that we have investment we are in a position to start shifting the balance towards more exploration and less tightly bounded and closely scoped data work. The challenge we face is how to shift this balance whilst still maintaining a productive (and rapidly growing) team: ensuring that we still deliver value to our customers and contribute to the product we want to build.

How do we do this?

In reality, I expect we’ll undergo an incremental transition to a different but related way of working: many of the agile software development principles will still apply. After all, software development is on a long journey from waterfall to more agile approaches, and we would be foolish to discard all of the lessons that have been (often painfully) learnt. Some aspects of agile work won’t apply as much – eg test driven development: how do you write tests when you might not know what question you’re asking? But some of what we have retained – stand ups, definitions of done – could still apply. Definitions of done might have to be looser to gain value from the margins rather than just the low hanging fruit. Task breakdowns will be more uncertain as fewer of the unknowns can be de-risked upfront with lightning tasks. We’re not the only ones looking at this problem – it looks like this is a conversation going on across the data science community.

As the exploration aspect of the work becomes a greater and greater proportion of the whole, it will increasingly be up to individuals to manage their own work and know when to back off and when to dive in. Communication skills, both within the team and with external stakeholders, will be more important than ever, which is why we place a very high value on them as part of our recruitment process at Gi.

We’ll post again as our thoughts and processes progress. In the meantime I’d love to hear from others who face a similar challenge.