How to enrich a business database and enhance its quality
The web is full of data about companies: articles written on specialized websites, social networks, directories of companies, showcase websites, etc.
However, extracting and processing this data is a complex task that requires expertise. It's in this context that one of our clients, a French tech startup operating in the open innovation market, relies on our know-how.
Because our client has an unqualified, low-volume database, the objective of the project is to harvest data on innovative companies on a recurring basis, at least once a week, to enrich its in-house business database.
The first step in our work is to understand the company's value proposition as well as how it works under the hood. This makes it easier for us to identify the extraction and enrichment challenges and match them to the client's needs.
The second step consists of taking an inventory of the existing architecture and/or data requirements:
- What's the current data volume, and what's the target?
- What's the existing data model, and does it require any update?
- What's the minimum quality level required regarding company data?
- How does the internal data integration process work?
Selecting and aggregating sources
Not all data sources (sites) are equal in terms of quality. Some websites provide a better description than others, and some information might be missing on one particular site. We need to work in depth on the selection of sources in order to decide how to aggregate the data.
We analyze all the sites one after the other and then issue a recommendation based on the inventory previously made. Our goal is to build and validate the entire process, from data collection to data aggregation, to ensure the highest quality of data.
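To make the aggregation step more concrete, here is a minimal sketch of one possible approach: merging records about the same company field by field, preferring the most trusted source. It is an illustration only, not our production pipeline. The source names, the field names, the `SOURCE_PRIORITY` ranking and the name-based matching key are all assumptions made for the example.

```python
# Hypothetical source names, ordered from most to least trusted.
SOURCE_PRIORITY = ["directory_a", "directory_b", "company_site"]


def normalize_name(name: str) -> str:
    """Crude matching key (an assumption): lowercase, drop punctuation, collapse spaces."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def aggregate(records: list[dict]) -> dict[str, dict]:
    """Merge records describing the same company, field by field.

    For each field, the first non-empty value found in priority order wins,
    so a lower-ranked source only fills gaps left by higher-ranked ones.
    """
    rank = {source: i for i, source in enumerate(SOURCE_PRIORITY)}
    # Unknown sources are ranked last.
    ordered = sorted(records, key=lambda r: rank.get(r["source"], len(rank)))

    merged: dict[str, dict] = {}
    for record in ordered:
        company = merged.setdefault(normalize_name(record["name"]), {})
        for field, value in record.items():
            if field == "source":
                continue
            if value not in (None, "") and field not in company:
                company[field] = value
    return merged


# Example with made-up data: two sources describe the same company.
records = [
    {"source": "company_site", "name": "Acme, Inc.",
     "description": "Long description", "city": "Paris"},
    {"source": "directory_a", "name": "ACME Inc",
     "description": "", "website": "https://acme.example"},
]
result = aggregate(records)
```

In this sketch the higher-ranked directory supplies the name and website, while the company's own site fills in the description and city that the directory lacks. A real pipeline would likely need a more robust matching key than a normalized name, for example a registration number or a domain name.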
Data enrichment as an innovation lever
Given our client's activity, it is essential to strengthen the quality of its database. Better data makes it possible to screen companies more effectively and with a deeper understanding of each one.
Indeed, open innovation (our client's activity) consists of:
- Monitoring innovation to enhance agility and innovation initiatives inside companies
- Building long-lasting relationships between innovative companies and SMBs / corporates
The project results in a daily extraction of business data from a dozen directories.
Thanks to this enrichment project, our client has seen its database double in both quality and volume.