March 3, 2019

10 Best Practices in Data Strategy

This post is a brief introduction aimed at helping you have a better understanding of what you should absolutely, definitely know regarding data.

So, you’re working in marketing/product and your new boss tells you “Hey, I’ve tried looking somewhere for our data strategy, but can’t find anything. Could you help me put everything together?”

If you’re lost regarding data, you might be thinking something along the lines of: “D..Data? What is that? Oh jeez, what is (s)he talking about? Ah, maybe they just want my Excel files!”

Data is just another piece of information, that’s it. It could be in the cloud, your external hard-drive, or your desktop. I’m sure you’ve already manipulated (tons of) Excel files: well, congrats, that means you’re already pretty familiar with data.

Data is a mindset

Something I’d really like to emphasize is that DATA IS A MINDSET.

You don’t need coding skills to get that mindset. And with it, you can work wonders.

Always think like this:


The moment you understand WHY you need (specific) data, you’ll be able to think about how you can manage your project and finally find out WHAT data you need.

When you think about data, always keep in mind that it’s there to answer questions:

  • How long does it take for our customers to churn?
  • Why do customers shift to a freemium alternative?

You absolutely must NOT do it the other way around: “So, we have tons of data in a big data warehouse, what do we do with it?” It’s a trap, don’t fall into it.


So, data is a mindset, one that should be shared with the whole organization across teams. This is what we call being data-driven.

Really, working with data has more to do with processes than tech.

You might have experienced frustrating team communication: after all, if you’re in marketing, it’s oftentimes hard to know what the tech team is doing, if the sales team is aligned with both teams, and if customer support is doing the right thing.

Cross-team collaboration and data are closely intertwined, even though they’re rarely seen as such because data is not often shared as a company value. But that’s what you should be aiming for when you onboard everyone.

There’s a very good book by John Doerr, Measure What Matters, that describes the Objectives and Key Results (OKR) method. This method should help you propagate a data-driven culture, without doing anything more tech-related.


Know how to ♥️ KPIs

When working with data, key performance indicators are required. Unfortunately, you’re probably already fed up with KPIs. But it’s pretty useless to work on data without good indicators.

Remember that data is here to answer questions. Well, how do you keep track of the number of customers churning… without KPIs?

The idea is not to track everything. A better idea would be to assign KPIs to teams and people. The OKR methodology I mentioned works really well with KPIs, because it doesn’t force you to focus solely on KPIs but rather on the overall and long-term objective (mostly because objectives are set for at least a quarter).

The problem with KPIs is that they don’t allow you to communicate objectives very well. I mean: “Our goal is to increase revenue by 5% this month”. Ok, cool. But there’s no sense of purpose or mission, there’s no “why”.

You don’t get anyone excited by just telling people to “reach for an amazing objective”. There may be some people driven by numbers, but even they’ll get tired of that in the end.

And then there’s the difference between good KPIs and vanity KPIs. There’s kind of a mental barrier at play here: if you see that you have 10K users, you’ll feel great. However, if you look at the Daily Active Users (DAU) and you only see 100 … there’s a good chance you’ll be disappointed.

Depending on your business, it is critical to correctly assess what to track. For example, if you’re in the SaaS industry, a good start would be to check your MRR, Churn and LTV, because you definitely can’t go wrong with those.
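
To make those three indicators concrete, here is a minimal Python sketch. It assumes deliberately simplified definitions: MRR as the sum of monthly prices for customers still active at month end, churn as the share of customers active at the start of the month who left, and LTV as average revenue per customer divided by monthly churn. The `Subscription` record and the sample numbers are made up for illustration, and your finance team may well use more nuanced formulas.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    customer: str
    monthly_price: float   # what the customer pays per month
    active_at_start: bool  # subscribed on the first day of the month
    active_at_end: bool    # still subscribed on the last day of the month


def mrr(subs):
    """Monthly recurring revenue: sum of prices for customers active at month end."""
    return sum(s.monthly_price for s in subs if s.active_at_end)


def churn_rate(subs):
    """Share of customers active at the start of the month who left during it."""
    start = [s for s in subs if s.active_at_start]
    lost = [s for s in start if not s.active_at_end]
    return len(lost) / len(start) if start else 0.0


def ltv(subs):
    """Simplified lifetime value: average revenue per customer / monthly churn."""
    active = [s for s in subs if s.active_at_end]
    churn = churn_rate(subs)
    if not active or churn == 0:
        return None  # undefined without customers or without any churn
    return (mrr(subs) / len(active)) / churn


# Made-up sample data for one month
subs = [
    Subscription("acme", 100.0, True, True),
    Subscription("globex", 50.0, True, True),
    Subscription("initech", 50.0, True, False),  # churned this month
    Subscription("umbrella", 200.0, True, True),
    Subscription("hooli", 50.0, False, True),    # new this month
]

print("MRR:", mrr(subs))
print("Churn:", churn_rate(subs))
print("LTV:", ltv(subs))
```

Notice that none of this requires Big Data: a spreadsheet export with one row per subscription is enough to get started.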

Smart is the new big (data)

Everybody is briefed and (almost) data-driven, and you know pretty much where you’re headed thanks to clear objectives and KPIs.

Now, there are a few things you should know:

  • You don’t need Big Data
  • You don’t need AI

Keep. It. Simple.

You’ve probably read great articles about how “Big Data and AI are revolutionizing the X industry”. But in 2019, there still aren’t many companies doing ACTUAL Big Data or AI.

And so we’ve got a cloudy vision of data, and it is perhaps even the reason why you think data is not accessible. That’s a thing of the past.

You need processes, not technology.

A good example is one of our corporate customers: they wanted to extract market data to analyze their US market share. Well, we’re just extracting data from 20 websites, with fewer than 1M items per month.

That’s enough to power up a strong internal analytical tool without having to dive into a complex machine learning algorithm.

What you really need is metadata, not data. Think about it: you have 10,000 PDFs. Ok, great. What do they say? What are they about? Hmm, pretty useless.

Now, you have a simple Excel file with 10K rows referencing each PDF by category, edition, author, etc. Wow, your PDFs have taken on so much more value!

I mean, you could do a web platform that allows users to search for specific content in your valuable PDF collection, right? (How awesome! I know, I know ...).
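
As a rough illustration of why that metadata file is so valuable, here is a minimal Python sketch of a search over it. The `catalog` rows, file names and the `search` helper are all hypothetical; in practice the rows would come from your Excel or CSV export.

```python
# Hypothetical metadata index: one row per PDF, like the Excel file described above
catalog = [
    {"file": "doc_0001.pdf", "category": "finance", "edition": 2018, "author": "A. Martin"},
    {"file": "doc_0002.pdf", "category": "legal", "edition": 2019, "author": "B. Chen"},
    {"file": "doc_0003.pdf", "category": "finance", "edition": 2019, "author": "A. Martin"},
]


def search(rows, **criteria):
    """Return the file names whose metadata match every given criterion."""
    return [
        row["file"]
        for row in rows
        if all(row.get(key) == value for key, value in criteria.items())
    ]


print(search(catalog, category="finance"))
print(search(catalog, category="finance", edition=2019))
```

The PDFs themselves are never opened: every question is answered from the metadata alone.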

What I’m trying to say is that no matter the volume of data we’re talking about, if you can’t make sense of it, then it’s useless.

This is not the perfect example because in this case, you probably would have had to classify each PDF using Machine Learning (AI). But that would not be the case if the PDFs were correctly saved and categorized in the first place.

So always remember this: SEMANTIC wins.

Semantic Web is something defined by the W3C as “a Web of data — of dates and titles and part numbers and chemical properties and any other data one might conceive”. There’s actually a lot more under the hood: Google is doing it with the Knowledge Graph; Facebook is too; and they are not the only ones.

For example, schema.org notation and guidelines are trying to set clear rules about how to deal with semantics on a website. It’s still not widely used but I’d say we’ll get there someday (at least if the Internet remains free as a bird ...).
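
To give you an idea of what schema.org markup looks like, here is a small Python sketch that builds a JSON-LD description of a book (JSON-LD is one of the formats schema.org supports). The `book_jsonld` helper and the sample book are made up for illustration; `Book`, `Person`, `name`, `author`, `bookEdition` and `genre` are taken from the schema.org vocabulary.

```python
import json


def book_jsonld(title, author, edition, genre):
    """Describe a book with schema.org vocabulary, as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author},
        "bookEdition": edition,
        "genre": genre,
    }


# Made-up example book
markup = book_jsonld("An Example Guide to Data", "Jane Example", "2nd", "Business")

# On a web page, this JSON usually sits inside a <script type="application/ld+json"> tag
print(json.dumps(markup, indent=2))
```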

Data Person & Strategy

I’m not even talking about having a CDO or whatever. The person accountable for data doesn’t have to be (shouldn’t be?) a techy. Best case scenario, your product person will handle this.

Why? Because they’re generally the key between marketing, sales and tech. They already know how to navigate between teams and HAVE to make data-driven decisions based on product behavior.

They don’t need to have tech knowledge, but they absolutely need to understand how the company ties its value proposition to data. Regardless of product features, this has to do with how the company serves its customers.

If you’re selling a CRM, the value you provide is how you handle relationships. You need to have a deep understanding of where the pain points occur: is it email, auto-management, smart enrichment?

Depending on the CRM’s goal, say smart data enrichment, you’ll focus on specific data to best serve customers: you need to have a clear understanding of which type of data you provide (B2B business), where you collect it, how you do it and so on (you already know why, it’s your specialty 😀!). This will probably also give you a clear understanding of your competitive edge.

Take pagesjaunes.fr for example. Their data strategy is somewhat … nonexistent. I mean, there’s no way to easily extract data from the service (even by paying for it). When your main business is to provide business data, you must provide users a way to gather this data.

Anyway, we did it for them with our Pages Jaunes Search bot 🙄.


Obviously, in the GDPR era, one should care about security. I’m not even talking about securing passwords and so on (how’s that post-it on your keyboard?).

The more you handle data across teams, the more you should pay attention to how you distribute it and who has access to what: developers don’t need to have access to detailed customer data or sales reports.

It’s a pain, but it’ll force you to be more consistent when designing how data flows across systems. But don’t leave this entirely in the hands of your legal department or external DPO. If you don’t control it by design, you never really will.

Data Analytics

There’s often confusion between data analysts and data scientists. These days, everybody is one or the other. The problem with data jobs is that the line can be pretty thin, because resources are scarce and managers do not necessarily have the correct knowledge.

All in all, it has to do with statistics. But you’ll never ask an analyst to build a predictive model to try and determine the number of customers churning the next month. That’s the job of the data scientist.

Depending on your organization’s level of “data-driveness” (did I just make up that word?), an analyst might know how to code, or not.

The job of the analyst is to be able to draw conclusions from multiple datasets to answer business questions. That’s it.

You don’t ask them to build and optimize models or collect and aggregate data, that’s the job of the data scientist. Of course, you can and should ask for their recommendations, because they’re generally more conscious of the business goals.

If you’ve paid attention to what I’m saying, you could even be thinking “so in the end … everyone is a data analyst!”. Well, yes and no :)

Everyone should be able to easily connect the dots for specific datasets regarding their job. If a product manager can’t analyze user funnels and usage metrics, it’s a problem, because they won’t be able to understand why the product is under-performing and how to enhance it.

A good way to summarize data analytics would be to picture it as a quartet:

I present to you the Data Analytics Quartet

  • 👨🏻‍🔧 Engineers for your architecture and the technical aspects (data architect, scientists, etc.)
  • 💾 The architecture :)
  • 👩🏻‍💻 Business Analysts
  • 🛠 Tools to work that data

Single Source of Truth

This is more of a tip than anything else. I talked about how your organization should be data-driven. Well, this topic kind of covers it all.

I’ll quote Wikipedia here: “In information systems design and theory, single source of truth is the practice of structuring information models and associated data schema such that every data element is stored exactly once”.

What the hell does that mean? Well, imagine that you’re a support agent working on social networks such as Twitter. A customer, @HooverBuyer, pings your company “My hoover does not work #notWhatItUsedToBe @Hoover”.

Thing is, if you haven’t designed your customer technology stack with the idea that multiple people from different departments and teams might need to access specific data about your customers… your agents will lose a lot of time.

In this particular case, if the Twitter agent has to ask customer service for data/intel and wait for it, it’ll make their life a living hell. At scale (think every team), it can become a nightmare.

If this sounds at all familiar, you have a serious problem because data isn’t flowing correctly. Most companies with this problem actually don’t realize it until they’re faced with someone telling them.

Single source of truth does not necessarily apply to customers only. Another, more technical, example might be logging (lines of text that explain why an application is not working) or even banking systems that need to correctly aggregate data into a golden source.
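
Going back to the @HooverBuyer example, here is a minimal Python sketch of the idea. The `CustomerStore` class and its fields are hypothetical: the point is only that the customer record lives in one place, and every team looks it up there instead of keeping its own copy.

```python
class CustomerStore:
    """Hypothetical single source of truth: each customer record is stored exactly once."""

    def __init__(self):
        self._records = {}    # customer_id -> record (the only copy)
        self._by_handle = {}  # lowercase Twitter handle -> customer_id (an index, not a copy)

    def add(self, customer_id, name, twitter):
        self._records[customer_id] = {
            "id": customer_id,
            "name": name,
            "twitter": twitter,
            "open_tickets": [],
        }
        self._by_handle[twitter.lower()] = customer_id

    def by_id(self, customer_id):
        return self._records[customer_id]

    def by_twitter(self, handle):
        return self._records[self._by_handle[handle.lower()]]


store = CustomerStore()
store.add("c-1", "Hoover Buyer", "@HooverBuyer")

# Customer service logs a ticket against the shared record...
store.by_id("c-1")["open_tickets"].append("Hoover does not start")

# ...and the Twitter agent sees it right away, without asking anyone
print(store.by_twitter("@hooverbuyer")["open_tickets"])
```

Both teams read and write the same record, so nobody has to wait on another department for the latest information.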

Web Data

Whether you’re working in a big corporation or in a startup, internal data is never enough. You pretty much always need to enrich that data with what you can find on the web.

I mean, that’s why we created Captain Data: to give anyone easy access to web data. This is what we call web scraping: a technique to automatically extract data from websites.

You can read about web scraping here or in our blog.

Working with web data can be a bit tedious. You have to target specific websites, analyze how to aggregate data from multiple sources AND take into account your internal stack & needs.

And you’re often tempted to “crawl” everything: to extract a lot of content you’ll use later by analyzing it. Again, you probably don’t need that much data - try to stick to a reasonable number of websites, not thousands of them.


An API is an “Application Programming Interface”. In a world of data, APIs are your friends.

They’re just a must-have. You can connect an API to any third party service and developers are very used to them. You could think of an API as a direct interface to your data (where you have total control).

For example, when you’re using a tool that connects to Stripe and you ask it “How many clients do I have with over 1K recurring revenue per month (MRR) in the last 6 months?”, it uses an API to ask Stripe for that kind of data.
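
Here is a rough Python sketch of what such a tool does behind the scenes. To be clear, this is NOT Stripe’s real API: the `api.example.com` endpoint, the `min_mrr` and `months` parameters, and the response format are all assumptions made up for illustration, and the response is a canned string so that the example runs without any network access.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, not Stripe's real API
BASE_URL = "https://api.example.com/v1/customers"


def build_request_url(min_mrr, months):
    """Turn the business question into a URL the (hypothetical) API understands."""
    return f"{BASE_URL}?{urlencode({'min_mrr': min_mrr, 'months': months})}"


def count_customers(response_body):
    """Read the JSON answer and count the customers it contains."""
    return len(json.loads(response_body)["customers"])


url = build_request_url(min_mrr=1000, months=6)

# Canned response standing in for what the service would send back
canned_response = '{"customers": [{"id": "c-1", "mrr": 1200}, {"id": "c-2", "mrr": 4500}]}'

print(url)
print(count_customers(canned_response))
```

The question goes out as a structured request, the answer comes back as structured data, and no human has to export a spreadsheet in between.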

At Captain Data, we’re building APIs on the fly. Because 99% of websites do not offer an easy way to access their data, we’re providing companies with tools and pre-made bots to extract such data in real-time.


In a world of increased complexity, technology has made it … easier. You don’t have to start from scratch every time. This is definitely not an exhaustive list, but it should give you a pretty good overview.

These are mostly non-technical tools.

Product & Marketing

Warehouse & connectors

Visualization & Dashboard

Data Extraction