Commenting on his paper, Kyle Findlay, Director Global Data Science Team, Innovation, Kantar, said: “Everyone talks about 'big data' like some magical panacea, but when the chips are down, who really has it in them to come to the party, and who gets left behind?
Having bootstrapped a data science team within a large research organisation, I've faced many challenges while setting up an effective data science function. Most of the challenges are around mindset. Traditional corporates just don't think like agile startups (no surprises there!). But, given this mindset difference, how does one even go about embedding a business unit that no one had even heard of five years ago? How does one even begin changing a decades-old paradigm of how business is conducted? And, how does one do this without stepping on the toes of all the established, siloed business functions that instead need to be brought along for the ride; all while assuring them that they will only increase their business relevance and impact in the process rather than make themselves redundant?
All these questions and more rear their inconvenient little heads when a large organisation makes the decision to invest in 'big data' and data science to become a more data-driven organisation.
The paper goes into more detail on the mindset shifts that need to occur for organisations to leverage emerging data sources and tools effectively. It is aimed at organisations that deal with data (i.e. most organisations, even if they don’t realise it) and are faced with the challenges posed by ‘big data’ (which I use as a bit of a catch-all to refer to data science, machine learning and artificial intelligence). The paper is based on our own experiences as a data science team in a large organisation. It is a thought piece about the paradigm shifts we've had to make (and continue to make) in order to become a more data-driven organisation. It is focused on the market research industry, but readers from other industries might see some parallels with their own experiences. It briefly discusses the industry dynamics precipitating these changes and then discusses the following areas where our paradigm has shifted or needs to shift,” said Kyle Findlay.
Here's the crib notes version of each mindset shift:
From stats and marketing science (and 'business intelligence') to data science: Our industry has built up a paradigm around the way that we conceptualise data. That paradigm is exemplified by SPSS and its rows-and-columns matrix structure. We need to start thinking about data in the way that technology companies do, with a focus on databases, standards and non-matrix-style data such as JSON. Changing how we think about data opens up an entire vista of new ways of using that data.
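To make the matrix-versus-JSON contrast concrete, here is a minimal Python sketch. The survey response, its field names and the `flatten` helper are all hypothetical and only serve as illustration; the point is that a nested, variable-length record does not fit one fixed row, although it can be reshaped into rows when a matrix is needed.

```python
import json

# Hypothetical survey response (assumed field names). The list of brands can
# be any length, so it does not map onto one fixed row of columns.
raw = """
{
  "respondent_id": 101,
  "country": "ZA",
  "brands_used": [
    {"brand": "A", "rating": 4},
    {"brand": "B", "rating": 2}
  ]
}
"""


def flatten(response):
    """Reshape one nested response into one flat row per brand mentioned."""
    return [
        {
            "respondent_id": response["respondent_id"],
            "country": response["country"],
            "brand": item["brand"],
            "rating": item["rating"],
        }
        for item in response["brands_used"]
    ]


response = json.loads(raw)
rows = flatten(response)
for row in rows:
    print(row)
```

Keeping the nested form as the stored record and deriving the flat rows on demand is one possible design choice; it means the same source data can feed a matrix-style tool and other uses alike.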
From ad hoc projects to repeatable data pipelines at scale: When approaching new data products, we shouldn’t focus on ways of repeating the same product from scratch every time a new client comes along. Instead, we need to approach new products from the beginning with the full end-to-end pipeline in mind so that we can build it once and benefit from automation and scalability. Where is the data coming from? What processing needs to be applied to it? And, how are we delivering the outputs in an automated, perhaps even self-service, manner? This is a technology and development discussion that also involves operations, client-facing and technical folks.
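The three questions above (where the data comes from, what processing is applied, how outputs are delivered) can be sketched as a small reusable pipeline. This is a minimal illustration under assumptions, not a description of any actual system: the `Pipeline` class, the step functions and the client data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

Record = dict  # one unit of data moving through the pipeline


@dataclass
class Pipeline:
    """Hypothetical end-to-end pipeline: source -> processing steps -> delivery."""
    source: Callable[[], Iterable[Record]]
    steps: List[Callable[[Record], Record]]
    deliver: Callable[[List[Record]], None]

    def run(self) -> List[Record]:
        records = list(self.source())
        for step in self.steps:
            records = [step(r) for r in records]
        self.deliver(records)
        return records


def parse_score(record):
    # Processing step: convert the raw text score to an integer.
    return {**record, "score": int(record["score"])}


def add_band(record):
    # Processing step: label each record using an assumed threshold of 8.
    return {**record, "band": "high" if record["score"] >= 8 else "low"}


def build_pipeline(source, deliver):
    # The processing is defined once; only the source and delivery vary per client.
    return Pipeline(source=source, steps=[parse_score, add_band], deliver=deliver)


def client_a_data():
    return [{"client": "A", "score": "7"}, {"client": "A", "score": "9"}]


def client_b_data():
    return [{"client": "B", "score": "8"}]


outbox = []  # stand-in for an automated delivery channel
results_a = build_pipeline(client_a_data, outbox.extend).run()
results_b = build_pipeline(client_b_data, outbox.extend).run()
print(outbox)
```

When a new client comes along, only a new source function is written; the processing and delivery are reused rather than rebuilt from scratch.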
From monolithic to agile product development: Gone are the days of planning for six months and then spending a couple of years building something that is no longer fit for purpose by the time it’s complete because the world has moved on. Instead, we need to take a leaf out of the world of software development by adopting agile product development methodologies that let us create minimum viable products (MVPs), get feedback on them and then either fail fast (if necessary) or continue to tailor the solution to internal and external client needs.
From centralised technology control to decentralised technology empowerment: Technology continues to creep into an ever-broader variety of business functions, and more and more teams have the technical know-how to leverage it. Therefore, technology is no longer the sole purview of operations, which centralises all development, deployment and maintenance. Instead, technology decisions need to be made on the basis of how they can empower a broad, decentralised range of teams in a scalable way that still allows for some structure. Thus, it’s important to focus on the ‘plumbing’ of your organisation’s infrastructure that teams can build on top of in a non-restrictive way. It’s about empowering rather than controlling.
From operations to dev ops: Operations has always played a central role in ensuring that an organisation, well, operates. That used to be about keeping your email inbox clear, protecting your sensitive data and ensuring that other systems run smoothly. Their role has now expanded in vital ways, though: they are the key to empowering an organisation through technology. As such, they need a clear mandate around building and maintaining the plumbing that empowers a technology-driven organisation. For example, they need to play a role in defining the data and API standards within the organisation, in empowering colleagues through effective collaboration platforms such as Slack and Microsoft Teams, and in empowering technical teams through access to cloud infrastructure, sandbox environments and continuous integration pipelines, amongst other things. This is the plumbing that teams can build on.
From silos to cross-disciplinary collaboration: As more teams become technically empowered and as sexy “big data” approaches become more mainstream and core to data products, collaboration across silos becomes more and more important. In my presentation, I used the analogy of the A-Team. Effective data products require a range of specialists to come together to tackle a task, including operations, developers, data scientists and client-facing colleagues who know the business issue.
When to outsource technology requirements versus building them yourself? This is a perennial conversation in many industries. Traditional wisdom says that if it’s not your core business, let others do it. This used to hold for anything technology related. Unfortunately (or excitingly, depending on your stance) though, business is technology. It is no longer possible to wash your hands of all technology involvement. As already discussed, I’d recommend ensuring that you focus on getting your own technological plumbing in place and then mix and match what you buy from third parties and what you build yourself on top of that. Related to this, many third parties will say that their field is too complex to dabble in part-time, so you should let them provide those abilities for you (artificial intelligence companies come to mind here). I don’t think this is a zero-sum game though. For example, in the field of market research, if we were to outsource everything technical such as AI applications, we’d never build up expertise and familiarity in these areas ourselves. In addition, when we rely on third parties to do it for us, we introduce many bureaucratic, communication and integration ‘friction costs’ that are often not accounted for up front. These friction costs mean that internal stakeholders are not able to access these products in a scalable way, which prevents us from building expertise in these areas or selling them effectively. As a result, I strongly believe in providing basic access to such approaches internally and only ‘upselling’ to formal products when it makes sense, by providing a ‘glide path’ for internal colleagues to follow along this chain.
When to use commercial and open source technologies? This is a false dichotomy. Use the best tool for the job. Often this is open source because it is free and cutting edge; other times it’s a commercial solution that offers best-in-class outputs and support and integrates well with your existing systems. The important thing to remember here is that whatever you use should plug into your data system at the ‘plumbing’ level. In other words, if you are a third party provider, I don’t care how sexy your dashboard is; if it doesn’t speak to my other systems at an API/service level so that I can easily integrate it into my larger data ecosystem, you are just introducing unnecessary friction to my data product pipeline.
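One way to picture integration at the ‘plumbing’ level is a shared service interface that both an open source tool and a commercial product sit behind, so the rest of the pipeline does not care which one is in use. The sketch below is purely illustrative: the sentiment-scoring task, the `SentimentService` interface, both scorer classes and the vendor client are hypothetical stand-ins, not real products or APIs.

```python
from typing import Callable, List, Protocol


class SentimentService(Protocol):
    """Assumed service-level contract that any tool must satisfy."""

    def score(self, text: str) -> float: ...


class OpenSourceScorer:
    """Stand-in for an in-house or open source model (toy word-matching logic)."""

    POSITIVE = {"good", "great", "love"}

    def score(self, text: str) -> float:
        words = text.lower().split()
        if not words:
            return 0.0
        return sum(w in self.POSITIVE for w in words) / len(words)


class VendorScorer:
    """Stand-in for a commercial product wrapped behind the same contract."""

    def __init__(self, client: Callable[[str], float]):
        self._client = client  # hypothetical vendor API call

    def score(self, text: str) -> float:
        return self._client(text)


def average_score(service: SentimentService, texts: List[str]) -> float:
    # Downstream code depends only on the contract, not on who built the tool.
    return sum(service.score(t) for t in texts) / len(texts)


texts = ["great product", "bad product"]
in_house = average_score(OpenSourceScorer(), texts)
vendor = average_score(VendorScorer(lambda text: 1.0), texts)
print(in_house, vendor)
```

A dashboard-only product could not be wrapped this way, which is the friction described above: there is nothing at the service level for the rest of the data ecosystem to call.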
“That’s the shorthand version of the paper. Hopefully there are some interesting talking points in there. The reality is that we all leverage technology to operate in the modern business world. It’s up to us to ensure that we have our technology house in order, as no one else is going to do it for us.”