Company Clustering

For this task we adapted the web-mining methodology and analysis proposed by Papagiannidis et al. [1].

The data analysis used the processed text and meta-information to identify the business activities of each company and then clustered the companies into “topics”. A topic is a collection of terms describing an activity, much as SIC codes describe the primary activities of a company. In this study, “topics” for documents correspond to “clusters” for companies. We used Latent Dirichlet Allocation (LDA) for the clustering. LDA is an unsupervised graphical model that can automatically discover latent topics in unlabelled data; it is explained in detail by Blei et al. [2] and by Griffiths and Steyvers [3]. Under LDA, each document is modelled as a mixture of K latent topics, where each topic k is a multinomial distribution φk over a V-word vocabulary. Given an input corpus W, the LDA learning process consists of calculating Φ, a maximum-likelihood estimate of the model parameters. Given this model, topic distributions can be inferred for arbitrary documents. For practical purposes, each document can then be assigned the topic with the highest probability.
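The modelling step described above can be illustrated with a minimal sketch. It fits LDA with a collapsed Gibbs sampler in the spirit of Griffiths and Steyvers [3]; this is an assumption made for illustration, since the text does not state which estimation procedure or software the study used. The function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the toy corpus are all hypothetical.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    Returns (theta, phi, vocab): per-document topic mixtures, per-topic
    word distributions, and the vocabulary that indexes the columns of phi.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # tokens of doc d in topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # tokens of word v in topic k
    topic_total = [0] * num_topics                      # all tokens in topic k

    def update(d, v, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][v] += delta
        topic_total[k] += delta

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, z_d):
            update(d, word_id[w], k, +1)
        assignments.append(z_d)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                update(d, v, assignments[d][i], -1)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                update(d, v, k, +1)

    # Smoothed estimates of the document-topic and topic-word distributions.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(n + beta) / (topic_total[k] + V * beta) for n in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, vocab


# Hypothetical toy corpus: one token list per company website.
docs = [
    ["rfid", "tracking", "location", "rfid"],
    ["tracking", "rfid", "site"],
    ["bluetooth", "module", "wifi", "bluetooth"],
    ["module", "bluetooth", "mobile"],
]
theta, phi, vocab = fit_lda(docs, num_topics=2)

# Assign each document (company) to its highest-probability topic (cluster).
clusters = [max(range(len(row)), key=row.__getitem__) for row in theta]
```

Here `theta[d]` plays the role of the topic mixture of document d and `phi[k]` the role of φk; the final line applies the highest-probability rule to turn topic mixtures into hard cluster assignments.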

Company clustering

The data analysis was used to identify the activities of companies and to cluster those companies into groups. This analysis enabled us to assign each company to the cluster that best describes its profile. The table below presents the cluster IDs, the keywords describing company activities and the number of companies belonging to each cluster.
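As a rough illustration of how such a summary can be derived from a fitted topic model, the sketch below lists the highest-probability terms of each topic as keywords and counts the companies assigned to it. The function `summarise_clusters` and the small `phi`, `vocab` and `assignments` values are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def summarise_clusters(phi, vocab, assignments, top_n=10):
    """Return (cluster_id, keywords, size) rows with 1-based cluster IDs.

    phi[k][v] is the probability of vocabulary term v under topic k, and
    assignments[d] is the 0-based topic assigned to company d.
    """
    sizes = Counter(assignments)
    rows = []
    for k, word_probs in enumerate(phi):
        # Rank vocabulary indices by probability under topic k.
        ranked = sorted(range(len(vocab)), key=lambda v: word_probs[v], reverse=True)
        keywords = [vocab[v] for v in ranked[:top_n]]
        rows.append((k + 1, keywords, sizes.get(k, 0)))
    return rows


# Hypothetical two-topic model over a four-word vocabulary.
vocab = ["rfid", "tracking", "bluetooth", "module"]
phi = [
    [0.60, 0.30, 0.05, 0.05],
    [0.05, 0.05, 0.50, 0.40],
]
assignments = [0, 0, 1]  # three companies, two in the first cluster

for cluster_id, keywords, size in summarise_clusters(phi, vocab, assignments, top_n=2):
    print(cluster_id, ", ".join(keywords), size)
# prints:
# 1 rfid, tracking 2
# 2 bluetooth, module 1
```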

By cluster size, the clusters fall into three broad categories. The category with the biggest clusters comprises Clusters 1, 2, 4 and 7. Cluster 1 consists of 76 companies that serve the manufacturing and retail industries by deploying tools for network analytics, aimed at greater business value and reduced costs. The tools include, but are not limited to, edge IoT platforms, machine vision and learning devices, and smart cameras that deliver operational efficiency. Cluster 2 is the biggest, with 102 companies. Their main activity is the production and deployment of IoT driven by radio frequency identification (RFID) systems, which identify objects and record data through radio waves and are applied in different sectors. Cluster 4 comprises 43 companies engaged in the development of connected information technology in line with cyber-security standards. They also produce solutions for smart buildings and solid-state battery technology that powers wireless sensors in industrial environments. Cluster 7 comprises 42 companies offering engineering, consultancy and strategic solutions aimed at improving indoor conditions, such as building safety and room lighting.

Clusters 3, 5, 6 and 9-12 contain fewer companies. Cluster 3 consists of 18 companies whose activity is represented by a combination of keywords relating to wireless technology and transmitters for monitoring environmental indicators. For example, Tempus technology and other wireless devices equipped with temperature probes and related features enable the accurate sensing and measurement of environmental conditions (e.g. temperature ranges) for a comfortable life in private and public settings. Cluster 5 is represented by 25 companies that are primarily involved in the design, development, manufacturing and/or distribution of Bluetooth-enabled products. Companies like LM Technologies offer low-energy Bluetooth modules, which can be applied in different industries and for different tasks, such as remote monitoring of vehicles, factory monitoring, car diagnostics and wireless printing. Cluster 6 is represented by 14 companies that provide solutions for acquiring and logging data about water status in real time. Such solutions include sensors, meters, loggers, displays and smart scanners that measure and report water levels, flow and pressure. Cluster 9 contains 20 companies that offer security control systems, network and infrastructure security solutions, internet access services, various wireless technologies and wireless network testing services. Cluster 10 includes 18 companies involved in delivering solutions for health-related purposes, including the development of devices for health monitoring, as well as the testing and certification of products and materials. Cluster 11 has 14 companies offering value-added technology for addressing business and technical challenges. For example, by integrating intelligence into customer service systems, these companies enable the semi-automation of communication processes (e.g. predictive phone dialling) and a higher quality of customer support. The 11 companies that form Cluster 12 develop and distribute sensor technology and software to global markets, such as carbon dioxide sensors and software for managing smart homes.

Clusters 8 and 13-15 fall into the category with the fewest companies. Cluster 8 comprises nine companies, which seem to be engaged in research into and development of smart industrial and urban infrastructures. They integrate automation into smart grids to ensure power quality and reliability, and use IoT to transform the future of electricity. The four companies in Cluster 13 focus on products (e.g. storage and computing devices, servers and integrated systems) and services that help companies fast-track innovation opportunities, address digital transformation challenges and manage infrastructure in organisations across the healthcare, financial, industrial, transport, consumer, public and other sectors. Cluster 14 is formed by six companies, which benefit businesses by providing unified communication services, cloud-based portfolio analytics and asset pricing services. These companies also provide expertise in implementing security solutions for commercial and residential areas. The seven companies in Cluster 15 are engaged in, or partly involved in, the delivery of reliable, safe and secure cloud solutions and data management for improving business performance. The solutions include, but are not limited to, cloud security services, cloud computing, cloud migrations, cloud storage, operations management and data analytics, which aim to protect workloads and services, meet clients’ business needs and improve their competitiveness.
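The three size categories described above can be reproduced from the cluster sizes reported in the text. The sketch below is illustrative only: the cut-offs of 40 and 10 companies and the labels "large", "medium" and "small" are assumptions chosen to match the grouping in the prose, since no explicit thresholds are stated.

```python
# Cluster sizes as reported in the text (cluster ID -> number of companies).
cluster_sizes = {
    1: 76, 2: 102, 3: 18, 4: 43, 5: 25,
    6: 14, 7: 42, 8: 9, 9: 20, 10: 18,
    11: 14, 12: 11, 13: 4, 14: 6, 15: 7,
}


def size_category(size, large_min=40, medium_min=10):
    """Map a cluster size to a category using assumed cut-offs."""
    if size >= large_min:
        return "large"
    if size >= medium_min:
        return "medium"
    return "small"


# Group cluster IDs by category, keeping IDs in ascending order.
categories = {"large": [], "medium": [], "small": []}
for cluster_id in sorted(cluster_sizes):
    categories[size_category(cluster_sizes[cluster_id])].append(cluster_id)

total_companies = sum(cluster_sizes.values())

print(categories["large"])   # [1, 2, 4, 7]
print(categories["medium"])  # [3, 5, 6, 9, 10, 11, 12]
print(categories["small"])   # [8, 13, 14, 15]
print(total_companies)       # 409
```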

| Cluster ID | Keywords | Cluster size |
| --- | --- | --- |
| 1 | network, analytics, value, manufacturing, retail, share, channel, touch, latest, director | 76 |
| 2 | site, enterprise, driver, process, rfid, sector, tracking, expert, location, info | 102 |
| 3 | wireless, transmitter, vehicle, range, communication, life, feature, tempus, insurance, vitality | 18 |
| 4 | system, internetofthings, information, energy, battery, industrial, connectivity, standard, connected, download | 43 |
| 5 | lm, product, team, mobile, design, monitoring, module, bluetooth, wifi, building | 25 |
| 6 | data, logger, meter, , temperature, pressure, group, flow, display, water | 14 |
| 7 | smart, work, engineering, room, consultancy, artificialintelligence, safety, medium, accessory, strategy | 42 |
| 8 | contact, power, office, asset, integration, risk, quality, sale, multi, research | 9 |
| 9 | service, security, control, device, access, resource, key, people, infrastructure, iota | 20 |
| 10 | company, project, client, health, job, manager, operation, delivery, payment, machine | 18 |
| 11 | support, view, user, automation, phone, kit, model, integrated, many, facility | 14 |
| 12 | sensor, technology, development, software, case, experience, market, world, cost, city | 11 |
| 13 | business, management, new, high, technical, cyber, financial, space, manage, organisation | 4 |
| 14 | industry, digital, report, benefit, working, commercial, statpro, investor, utility, construction | 6 |
| 15 | solution, application, partner, cloud, customer, platform, testing, secure, innovation, developer | 7 |

1. Papagiannidis, S., et al., Identifying industrial clusters with a novel big-data methodology: Are SIC codes (not) fit for purpose in the Internet age? Computers & Operations Research, 2018. 98: p. 355-366.
2. Blei, D.M., A.Y. Ng, and M.I. Jordan, Latent Dirichlet allocation. Journal of Machine Learning Research, 2003. 3: p. 993-1022.
3. Griffiths, T.L. and M. Steyvers, Finding scientific topics. Proceedings of the National Academy of Sciences, 2004. 101(suppl. 1): p. 5228-5235.