Big Data-based Identification and Mapping of Temporal Dynamics of Industrial Clusters

Funder: Office for National Statistics

Collaborators: Prof Savvas Papagiannidis & Dr Eric See-To


This research aimed to develop an evidence-based empirical framework for identifying industrial clusters and detecting their evolution over time. The framework explored how time series of unstructured text descriptions of business activities, collected as big data, can be transformed into industrial cluster dynamics. Combining topic modelling and web crawler technologies, we advanced current best practice in four respects: cost, objectivity, flexibility, and granularity. Data on cluster dynamics can be generated on demand when policy needs arise. This contrasts with the current practice of using firms' self-reported data in a predefined classification system, which is costly and time-consuming to update. Using publicly available textual descriptions of business activities also offers a more direct inference about the underlying business activities, and the level of granularity is bounded only by the availability of data. The methodology contributes to the growing body of big data and business analytics research. It will provide the empirical foundations for advancing our understanding of the formation and dissolution of industrial clusters over time, and help advance theoretical inquiry in economic geography.

If you need more information about this project please contact us.

Project Report: The report will be available once the project is over.

Interactive Report

Project Timeline

January 2020


November 2019
Visualisation and report write-up

We used Power BI to visualise the clusters cross-sectionally and longitudinally.

September / October 2019
Analysis of data and refinements

We analysed the data longitudinally, exporting the cluster information and the list of companies per cluster.

June 2019
July Data Collection

Six regions were selected for download. The download covered up to 10 pages per company, on a quarterly basis from 2000 to 2019.
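
To give a rough sense of the scale this implies, the sketch below enumerates the quarterly snapshots and the resulting upper bound on pages per company. It is illustrative only: the function name is hypothetical, and it assumes that all four quarters of both 2000 and 2019 are covered, which the description above does not state.

```python
# Hypothetical sketch, not the project's collection software.
# Assumption: every quarter from Q1 2000 to Q4 2019 is included.

MAX_PAGES_PER_SNAPSHOT = 10  # "up to 10 pages per company" per quarterly download


def quarterly_periods(start_year, end_year):
    """Return (year, quarter) pairs for every quarter in the inclusive year range."""
    return [
        (year, quarter)
        for year in range(start_year, end_year + 1)
        for quarter in range(1, 5)
    ]


periods = quarterly_periods(2000, 2019)

# Upper bound on pages stored for a single company across the whole period.
max_pages_per_company = len(periods) * MAX_PAGES_PER_SNAPSHOT

print(len(periods))            # 80
print(max_pages_per_company)   # 800
```

Under these assumptions, each company contributes at most 80 snapshots and 800 pages, before multiplying by the number of companies in the six selected regions.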

May 2019
Draft data collection software and refined algorithm

The scale of the data collection and storage required adopting alternatives to the originally envisaged approaches.

April 2019
Downloaded sample

The sample comprised UK and Irish companies with Turnover > £1.5 million, Profits > £150,000 and Shareholder Funds > £1.5 million: 232k companies in 22 regions.
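
The selection criteria above can be expressed as a simple filter. The sketch below is a minimal illustration, assuming that all three thresholds must hold at once (the description lists them without saying so explicitly). The `Company` record, the `in_sample` helper and the example companies are all hypothetical.

```python
from dataclasses import dataclass

# Thresholds in GBP, taken from the sample description.
MIN_TURNOVER = 1_500_000
MIN_PROFIT = 150_000
MIN_SHAREHOLDER_FUNDS = 1_500_000


@dataclass
class Company:
    """Hypothetical record; field names are not taken from the project's data source."""
    name: str
    turnover: float
    profit: float
    shareholder_funds: float


def in_sample(company):
    """Assumption: a company qualifies only if it exceeds all three thresholds."""
    return (
        company.turnover > MIN_TURNOVER
        and company.profit > MIN_PROFIT
        and company.shareholder_funds > MIN_SHAREHOLDER_FUNDS
    )


# Made-up companies, for illustration only.
companies = [
    Company("Alpha Ltd", 2_000_000, 200_000, 1_800_000),  # exceeds all thresholds
    Company("Beta Ltd", 2_000_000, 100_000, 1_800_000),   # profit too low
    Company("Gamma Ltd", 1_500_000, 200_000, 1_800_000),  # turnover not strictly above
]

selected = [c.name for c in companies if in_sample(c)]
print(selected)  # ['Alpha Ltd']
```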

April 2019
Project Commissioned

Project plan reviewed and milestones agreed.