These days, a large consumer goods company generates almost as much data as a medium-sized internet company. More and more management teams hope to manage users and data the way internet companies do, and to make decisions based on data. Yet huge, scattered datasets stand in the way of the IT, management, and analysis departments that are eager for nimble, real-time data analysis. Consulting firms, which once played a huge role, can no longer keep up with the volume of data or extract insight from it; they simply do not have enough people. We began working with a leading Chinese fashion consumer goods company to build its data platform, and within only six months we had put in place data gathering and analysis, consumer profiling and a membership system, and tracking and capturing of external data. We would like to share what we learned in these three areas, in the hope of helping companies streamline their operations with the power of data.
The term “big data” describes a concept or mindset rather than a specific tool or technology. It can be understood as an umbrella term for a set of algorithms, technologies, and tools, including data mining, machine learning, natural language processing, and distributed computing. Business intelligence (BI) has a much longer history than big data in the corporate world. Tech giants such as IBM, Oracle, Microsoft, Informatica, SAP, Sybase, and Teradata led the adoption of BI software, and waves of smaller players followed suit. BI is also a general term covering a wide range of tools and technologies, such as data warehouses (or data marts), query and reporting, data analysis, data mining, and data backup and restoration, to name just a few. So what distinguishes the two?
A large consumer goods company needs at least 10 IT systems to support its daily operations, including: 1) a distribution system that allocates goods to thousands of storefronts and processes as many as 100,000 orders; 2) an e-commerce order system that handles order management and customer service across platforms such as JD.com, Vipshop, Yihaodian, Jumei, Amazon, Dangdang, and Youzan; 3) a warehouse management system that arranges logistics nationwide and records inventory for thousands of SKUs; 4) a BI system that collects data from every major business segment and produces daily statistical charts; and 5) other systems for finance, HR, performance management, and brand/branch orders, each with large daily data volumes.
The most common use of enterprise BI software is to integrate data from all these IT systems into statistics and records, so that the front-end charts and figures calculated by the system give a clearer picture of the company’s daily operations. Apart from its ETL (extract, transform, load) component, BI software is easily generalized, works across industries, and suits universal needs.
This is why the selling point of BI software is data monitoring: it produces reports and charts broken down by time, distribution, or segment. Chart 1 shows the average price and sales at this company over two years. There is a clear general upward trend; January and February are slow months, while sales pick up sharply at the end of each quarter. Prices barely changed over the two years, though they are higher in winter than in summer. Chart 2 shows the sales distribution across the company’s brands and each brand’s share: a mainstream brand performs strongly, and a couple of sub-brands also achieve impressive sales. Companies should monitor such sales distribution charts closely and regularly so that they can adjust resource allocation and development strategy in a timely manner.
BI software is most useful for collecting all business data and turning it into visual charts for long-term monitoring, without further programming. These charts stay up to date, unlike the static reports delivered by traditional management consulting firms, which are essentially obsolete the day after delivery.
Data engineers generally regard BI software as a data analysis tool, a foundation on which deeper insight is built. Data scientists bring insight, intervention, and industry knowledge to these numbers, and can produce reports more sophisticated than those generated by BI software. These higher-level reports can inform product design, marketing plans, membership schemes, and after-sales service, so that data becomes a driver of business growth, just as it is in internet companies.
Chart 3 shows CDF (cumulative distribution function) curves, with days on the X axis and the cumulative percentage of users on the Y axis. 37% of users make a second purchase within one month (30 days) of their first. 45% make a third purchase within one month of their second, and 51% make a fourth purchase within one month of their third. The curve shifts further to the left with each successive purchase, showing that customers buy more frequently once brand recognition is established. Hence, the best time to build brand awareness and attract new customers is the end of each quarter, that is, every 3 to 4 months, while for reconnecting with existing customers an interval of 1 to 2 months is optimal. This is a typical case in which data engineers, drawing on their industry knowledge and experience, find insights in the numbers that BI software cannot. Repeat purchases at intervals are a scenario specific to consumer goods, and finding patterns in them requires more complex, tailored statistical tools. By writing statistical scripts and complex, multi-step SQL (Structured Query Language) queries, data engineers proved the value of their manual work.
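The repurchase curves in Chart 3 can be approximated from a plain order log. The sketch below assumes rows of `(user_id, order_date)` and computes, for each k, the gaps in days between a user’s k-th and (k+1)-th purchases, plus the empirical CDF at a given day. The function names and sample data are hypothetical, and it assumes the denominator is users who actually made the (k+1)-th purchase; the article does not state which denominator it used.

```python
from collections import defaultdict
from datetime import date

def repurchase_gaps(orders):
    """Group (user_id, order_date) rows by user and return a dict mapping
    k -> list of day gaps between each user's k-th and (k+1)-th purchase."""
    by_user = defaultdict(list)
    for user_id, order_date in orders:
        by_user[user_id].append(order_date)
    gaps = defaultdict(list)
    for dates in by_user.values():
        dates.sort()
        for k in range(1, len(dates)):
            gaps[k].append((dates[k] - dates[k - 1]).days)
    return gaps

def ecdf(values, day):
    """Empirical CDF: fraction of gaps that are at most `day` days."""
    return sum(1 for v in values if v <= day) / len(values)

# Hypothetical order log: user "a" bought three times, "b" twice, "c" once.
orders = [
    ("a", date(2016, 1, 1)), ("a", date(2016, 1, 20)), ("a", date(2016, 2, 5)),
    ("b", date(2016, 1, 3)), ("b", date(2016, 3, 10)),
    ("c", date(2016, 1, 5)),
]
gaps = repurchase_gaps(orders)
for k in sorted(gaps):
    print(f"purchase {k} -> {k + 1}: {ecdf(gaps[k], 30):.0%} within 30 days")
```

Evaluating `ecdf` over a range of days for each k yields one curve per purchase number; a leftward shift for larger k is the pattern described above.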
Beyond complex, highly customized statistical logic, processing and exploring unstructured data is also difficult for BI software. For large consumer goods companies, operating across all e-commerce channels has become the new normal. Every day, tens of thousands of orders pour in through platforms such as JD.com, Tmall, Vipshop, Yihaodian, and Jumei, carrying vast amounts of data, including user location, identity, occupation, and spending power. With some programming and a mapping API (application programming interface), companies can research their users precisely. Chart 4 is a heat map we drew from the shipping address of each order we received. A large proportion of users clearly cluster around Zhongguan Village, followed by clusters of college dormitory buildings in Beijing’s Haidian District. Red markers on the map show the brand’s brick-and-mortar shops, which cover Zhongguan Village, Peking University, and the Wudaokou area. The company should reconsider its Anzhenli store, around which we see little activity, and consider opening storefronts around Zhichun Road and Mudan Garden, where we see relatively heavy traffic.
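Once shipping addresses have been geocoded through a mapping API (that call is not shown, and no particular provider is assumed), a heat map reduces to counting orders per grid cell. The sketch below bins hypothetical `(lat, lng)` pairs into cells of 0.01 degrees, an assumed size of roughly 1 km north-south, and reports the busiest cells.

```python
import math
from collections import Counter

def grid_counts(points, cell_deg=0.01):
    """Count orders per grid cell; a cell is (floor(lat/size), floor(lng/size))."""
    counts = Counter()
    for lat, lng in points:
        counts[(math.floor(lat / cell_deg), math.floor(lng / cell_deg))] += 1
    return counts

def hottest_cells(counts, cell_deg=0.01, top=3):
    """Return ((center_lat, center_lng), order_count) for the busiest cells."""
    return [(((i + 0.5) * cell_deg, (j + 0.5) * cell_deg), n)
            for (i, j), n in counts.most_common(top)]

# Hypothetical geocoded shipping addresses.
points = [(39.983, 116.316), (39.984, 116.318), (39.986, 116.311), (39.945, 116.405)]
counts = grid_counts(points)
print(hottest_cells(counts, top=2))
```

The per-cell counts can be fed to any heat-map layer; overlaying store locations then shows which stores sit in quiet cells and which busy cells have no store.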
As valuable as the heat map is to companies, it’s not easily created by traditional consulting firms.
Even data that seems less important and is not covered by daily monitoring can provide great insights. Chart 5 shows the times at which people place orders online. On almost every weekend, orders are spread evenly across the day, except around midnight. Workdays are more interesting: orders spike between 9 and 10 a.m., suggesting that many office workers start their day by buying their favorite items. The message for e-commerce companies? Send marketing promotions between 8 and 10 each morning to grab attention.
Much the same reasoning applies to Chart 6, which shows the purchasing pattern by day of the week. Monday and Tuesday are consistently the busiest days, while weekends are quiet. One reading of this pattern is that online shopping serves office workers as a cure for the “Monday blues,” a way to relieve pressure, and that the urge to shop accordingly ebbs as the weekend approaches.
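Both charts come from the same order timestamps. A minimal sketch, assuming timestamps are already in local time as Python `datetime` objects (the sample data is hypothetical), splits hour-of-day counts into workdays and weekends and tallies orders by day of the week.

```python
from collections import Counter
from datetime import datetime

def order_time_profile(timestamps):
    """Return hour-of-day histograms for workdays and weekends, plus a
    day-of-week histogram (0 = Monday ... 6 = Sunday)."""
    workday_hours, weekend_hours, weekdays = Counter(), Counter(), Counter()
    for ts in timestamps:
        dow = ts.weekday()
        weekdays[dow] += 1
        if dow < 5:
            workday_hours[ts.hour] += 1
        else:
            weekend_hours[ts.hour] += 1
    return workday_hours, weekend_hours, weekdays

timestamps = [  # hypothetical order times; 14 Nov 2016 was a Monday
    datetime(2016, 11, 14, 9, 30), datetime(2016, 11, 14, 9, 45),
    datetime(2016, 11, 15, 21, 5), datetime(2016, 11, 19, 15, 0),
]
workday, weekend, by_day = order_time_profile(timestamps)
print("busiest workday hour:", workday.most_common(1))
print("orders by weekday:", sorted(by_day.items()))
```

This ignores public holidays; treating them as weekend days would need a holiday calendar.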
Patterns like those in Charts 5 and 6 usually go undetected by BI software, but data engineers can capture them and turn them into insights with commercial value. Consulting firms lack the capacity to integrate such a sea of numbers and process it daily, or even hourly.
In the broad sense, data science (big data) and BI do not differ significantly. Both cover a wide range of services whose core is advancing the business through data processing and analysis. The BI we refer to in everyday conversation, however, is the BI software or suite that vendors provide for business charts and statistical monitoring, which is separate from data science work. BI software offers abstract, convenient summaries of data together with statistics and visualization tools, and so covers part of the work of data science. But to dig deeper and produce multi-level analysis and insights of real industry value, data engineers must join the team and build dedicated data systems.
Create user profiles and user systems
For product and service providers alike, getting user profiles right is an important part of data mining. Many internet companies depend on accurate and complete user profiles for their survival. We have all heard about the miracles that user analysis can work, and the success stories of companies that know how to use it. Amazon and Alibaba use machine learning to analyze users’ browsing behavior, changes to their shopping carts, and items they have previously purchased. On this basis they build recommendation systems that show particular goods to particular customers, effectively boosting click-through rates and sales. Other examples include app stores that recommend applications based on past installations, and music, book, or news websites that use collaborative filtering to provide customized content.
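To make “collaborative filtering” concrete, here is a minimal user-based variant, not the method of any company named above. Users are compared by cosine similarity over their interactions, and items a user has not yet bought are scored by the similarity-weighted interactions of other users. The data and function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two users' {item: score} dicts."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(target, ratings, top=2):
    """Rank items the target has not interacted with by the summed
    similarity-weighted scores of all other users (user-based CF)."""
    scores = {}
    for other, items in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], items)
        if sim <= 0:
            continue
        for item, value in items.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * value
    return sorted(scores, key=scores.get, reverse=True)[:top]

ratings = {  # hypothetical purchase indicators (1 = bought)
    "u1": {"jacket": 1, "scarf": 1},
    "u2": {"jacket": 1, "scarf": 1, "boots": 1},
    "u3": {"jacket": 1, "hat": 1},
    "u4": {"sandals": 1},
}
print(recommend("u1", ratings))
```

Production systems typically use item-based or matrix-factorization variants for scale; the user-based form is chosen here only because it is the easiest to follow.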
By comparison, the scattergun approaches of market research and sample surveys traditionally used by consultancies look almost crude.
Compared to internet companies, consumer goods companies may have less mind-boggling amounts of user behavior data. But they still have a lot. Moreover, their user information and transaction data, which is distributed among various IT systems, is usually more dependable. Having collected, filtered, and analyzed the data, we found that it is of better quality than we thought and it is good enough to support the creation of some really interesting user profiles.
User data analysis has its roots in machine learning. Whatever purpose user data serves, such as customer segmentation or precision marketing, it must first be processed and represented as feature vectors of the same dimension. Many algorithms, such as clustering, regression, correlation analysis, and classification, require objects to be represented numerically for processing and statistical analysis. For structured data, feature extraction usually starts with tagging: purchase channel, purchase frequency, age, gender, family information, and so on. Clear tags produce more complete user profiles and make machine learning more effective (for example, in terms of accuracy and speed of convergence).
We defined dozens of user tags for the consumer goods company; Figure 7 shows some of them. They come from three sources. The first is the IT systems themselves, for example membership card information (gender, age, birthday), purchase channels, and loyalty points. The second is calculation or analysis, which yields information such as how responsive a user is to promotions, what colors and styles they prefer, and whether they usually stick to one brand or often try different brands. The third source is inference. If a user’s shipping address includes words such as “dorm,” “school,” or “university,” he or she is very likely a student. Similarly, if the address mentions “Tencent Tower” or “technology park,” he or she is probably a white-collar worker, and quite possibly a technology professional. Tags in this category are often designed to reflect the specifics of the industry. For example, does the customer prefer the latest looks to classics (fashion)? Do they prefer cheaper or discounted products (price sensitivity)? Or do they prefer expensive products and limited editions?
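Inferred tags of the third kind can start as simple keyword rules over the address text. The sketch below uses English stand-ins for what, in practice, would be Chinese keywords; the rule table and tag names are hypothetical, and plain substring matching will produce some false positives (for example “preschool”).

```python
# Hypothetical keyword rules; real rules would use Chinese keywords tuned on
# actual shipping addresses.
INFERENCE_RULES = {
    "student": ("dorm", "school", "university"),
    "white_collar": ("tower", "technology park"),
    "tech_professional": ("tencent", "technology park"),
}

def infer_tags(address):
    """Return every inferred tag whose keywords appear in the address."""
    text = address.lower()
    return {tag for tag, keywords in INFERENCE_RULES.items()
            if any(k in text for k in keywords)}

print(infer_tags("Building 3, Peking University dorm area"))
print(infer_tags("Tencent Tower, Nanshan District"))
```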
Once the tags are created, the next step is to discretize the data, that is, to split each tag into multiple 0/1 tags so that it suits machine learning algorithms for clustering, classification, prediction, or correlation analysis. The result is a set of feature vectors with thousands of dimensions.
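Discretization amounts to one-hot encoding: numeric tags are first bucketed, then every (tag, value) pair becomes its own 0/1 column. A minimal sketch with hypothetical fields and illustrative age-band edges:

```python
def age_band(age):
    """Discretize a numeric age into a coarse band (edges are illustrative)."""
    for upper, label in ((25, "<25"), (35, "25-34"), (45, "35-44")):
        if age < upper:
            return label
    return "45+"

def binarize(users, fields):
    """Expand categorical tags into 0/1 columns, one per (field, value) pair.
    Returns the column list and an N x M matrix as a list of lists."""
    columns = sorted({(f, u[f]) for u in users for f in fields if f in u})
    index = {col: j for j, col in enumerate(columns)}
    matrix = []
    for u in users:
        row = [0] * len(columns)
        for f in fields:
            if f in u:
                row[index[(f, u[f])]] = 1
        matrix.append(row)
    return columns, matrix

users = [  # hypothetical tagged members
    {"gender": "F", "channel": "tmall", "age": age_band(23)},
    {"gender": "M", "channel": "store", "age": age_band(38)},
]
columns, matrix = binarize(users, ["gender", "channel", "age"])
print(columns)
print(matrix)
```

With millions of users a sparse representation would replace the dense lists, but the column logic is the same.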
Doesn’t this suggest that the days of Excel, still the workhorse of consultancies, are numbered?
Association rule learning
Association rule learning is a machine learning technique widely used in the retail industry. The most famous story about association rule mining is the “beer and diapers” story: supermarket shoppers who buy diapers also tend to buy beer. The anecdote has turned out to be no more than a myth, a case study invented to illustrate the point. Nevertheless, this oft-repeated parable shows how important association rule learning is in retail. If the story were told in China, it would be about “instant noodles and ham sausages.”
Unlike shopping-cart (market basket) analysis, our data mining uses the user as the basic unit. Feature vectors are built from the extracted user tags. The table below is a simple demonstration.
We built an N×M feature matrix, where N is the number of users (on the order of millions) and M is the number of feature dimensions (on the order of thousands of binary tags). We used Apriori (see Note 1) to mine association rules, setting thresholds on support, confidence, and lift to select rules that meet our requirements. Because the rules we identified may touch on user privacy, the table below is only a demonstration. The antecedent is the user’s location; the consequent is the highest level of promotion sensitivity. The results are as follows.
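The Apriori workflow can be sketched in pure Python: find frequent itemsets level by level (join, prune, count), then derive rules that pass the confidence and lift thresholds. This is a teaching sketch, not the implementation used in the project; at the scale described (millions of users, thousands of tags), an optimized library implementation would be needed. The tag names and thresholds are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def keep_frequent(candidates):
        # Count each candidate once; keep those meeting the support threshold.
        found = {}
        for c in candidates:
            support = sum(1 for t in sets if c <= t) / n
            if support >= min_support:
                found[c] = support
        return found

    level = keep_frequent({frozenset([i]) for t in sets for i in t})
    frequent = dict(level)
    k = 2
    while level:
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        pruned = {c for c in joined
                  if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = keep_frequent(pruned)
        frequent.update(level)
        k += 1
    return frequent

def association_rules(frequent, min_confidence, min_lift):
    """Generate (antecedent, consequent, support, confidence, lift) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for combo in combinations(itemset, r):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                confidence = support / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if confidence >= min_confidence and lift >= min_lift:
                    rules.append((antecedent, consequent, support, confidence, lift))
    return rules

# Hypothetical binary user tags: one set of tags per user.
users = [
    {"loc:Shanghai", "promo:high"}, {"loc:Shanghai", "promo:high"},
    {"loc:Shanghai", "promo:low"}, {"loc:Beijing", "promo:low"},
    {"loc:Beijing", "promo:low"}, {"loc:Beijing", "promo:high"},
    {"loc:Chengdu", "promo:low"}, {"loc:Chengdu", "promo:low"},
]
frequent = apriori(users, min_support=0.2)
for a, c, s, conf, lift in association_rules(frequent, 0.6, 1.5):
    print(sorted(a), "=>", sorted(c), f"support={s:.2f} conf={conf:.2f} lift={lift:.2f}")
```

To reproduce the location-to-promotion table, the output can be filtered to rules whose antecedent tags start with `loc:` and whose consequent is `promo:high`.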
Users in Shanghai, Jiangsu, and Zhejiang are clearly the most responsive to promotions. They have the highest participation rates, and their lift values all exceed 2; in Shanghai, the lift is as high as 3.3.
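For reference, the standard definitions behind these figures, for a rule X ⇒ Y (here X is a location tag and Y the highest promotion-sensitivity tag), are as follows; lift (sometimes rendered as “augmentation”) compares the rule’s confidence with the base rate of Y:

$$
\mathrm{supp}(X \Rightarrow Y) = P(X, Y), \qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{P(X, Y)}{P(X)} = P(Y \mid X), \qquad
\mathrm{lift}(X \Rightarrow Y) = \frac{P(Y \mid X)}{P(Y)}
$$

A lift of 1 means the two tags are independent. A lift of 3.3 for Shanghai therefore means that a Shanghai user is about 3.3 times as likely to fall into the highest promotion-sensitivity tier as a user drawn at random from the whole customer base.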
Another example is color association rules. The table below shows users’ color preferences across products and SKUs. There are some quite strong associations, for example between gold and silver, and between brown and green. Customers who buy purple and beige goods are more likely to buy gold next. If shop assistants and online teams could base marketing decisions, such as color recommendations or stock distribution, on this information, their jobs would be much easier.
It is worth noting that association rule learning requires the antecedent and the consequent to be independent. During feature extraction, some dimensions are derived from the same or related fields, for example a customer’s zodiac sign and birth month. Unless independence is ensured, you may end up with trivial rules such as “many people born in November are Scorpios.”
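One simple guard is to record which source field each tag was derived from and discard rules whose antecedent and consequent share a source. The mapping below is hypothetical, and rules are assumed to be tuples of `(antecedent, consequent, *statistics)` with tag strings of the form `field:value`.

```python
# Hypothetical map from tag prefix to the raw field it was derived from.
# Birth month and zodiac sign both come from the birthday, so they share a source.
SOURCE_FIELD = {
    "birth_month": "birthday",
    "zodiac": "birthday",
    "loc": "address",
    "promo": "promotion_response",
}

def source_of(tag):
    """Map a 'prefix:value' tag to its source field (default: the tag itself)."""
    return SOURCE_FIELD.get(tag.split(":", 1)[0], tag)

def independent_rules(rules):
    """Keep only rules whose antecedent and consequent share no source field."""
    kept = []
    for antecedent, consequent, *stats in rules:
        sources = {source_of(t) for t in antecedent}
        if sources.isdisjoint(source_of(t) for t in consequent):
            kept.append((antecedent, consequent, *stats))
    return kept

rules = [
    (frozenset({"birth_month:11"}), frozenset({"zodiac:scorpio"}), 0.05, 0.6, 8.0),
    (frozenset({"loc:Shanghai"}), frozenset({"promo:high"}), 0.25, 0.67, 1.78),
]
print(independent_rules(rules))
```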
RFM, which stands for recency, frequency, and monetary, is a classic method for analyzing customer value. The three attributes are used for clustering and identifying valuable customers in order to improve business decision-making and marketing.
RFM has the virtue of simplicity: all it needs is a table of purchases with the customer, the purchase date, and the purchase amount. For example, given the number of months since each customer’s last purchase, the number of purchases in the last X months, and the mean or total value of that customer’s orders, we can set a benchmark for each of the three dimensions, weight the dimensions, and group customers using k-means clustering (see Note 2). By comparing each group’s scores on the three dimensions with the benchmarks, we decide which customers to retain and which to develop. Marketing strategies are then tailored accordingly to raise repurchase and conversion rates (by guiding or re-engaging customers). There is no standard way to determine the weight given to each dimension; a popular technique is the analytic hierarchy process, or AHP (see Note 3). Industry and company characteristics should also be taken into account for optimal results.
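A minimal end-to-end sketch of this pipeline: raw R, F, and M per customer, min-max scaling (recency inverted so that recent buyers score high), hypothetical weights, then k-means. Min-max scaling and the farthest-first initialization are assumptions chosen for brevity and determinism; quintile scoring and k-means++ initialization are common alternatives.

```python
import math
from datetime import date

def rfm_table(transactions, today):
    """Aggregate (customer_id, order_date, amount) rows into raw
    (recency_days, frequency, monetary_total) per customer."""
    acc = {}
    for cid, day, amount in transactions:
        last, freq, total = acc.get(cid, (day, 0, 0.0))
        acc[cid] = (max(last, day), freq + 1, total + amount)
    return {cid: ((today - last).days, f, m) for cid, (last, f, m) in acc.items()}

def weighted_scores(rfm, weights):
    """Min-max scale each dimension to [0, 1], invert recency so that recent
    buyers score high, then multiply by the per-dimension weights."""
    columns = list(zip(*rfm.values()))
    lo = [min(c) for c in columns]
    hi = [max(c) for c in columns]

    def scale(value, j):
        s = (value - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
        return 1.0 - s if j == 0 else s

    return {cid: tuple(w * scale(v, j) for j, (v, w) in enumerate(zip(vals, weights)))
            for cid, vals in rfm.items()}

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new == labels:
            break  # assignments unchanged: converged
        labels = new
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels

today = date(2016, 12, 31)
transactions = [  # hypothetical purchases
    ("A", date(2016, 12, 20), 800.0), ("A", date(2016, 12, 1), 600.0),
    ("A", date(2016, 11, 15), 700.0), ("B", date(2016, 12, 25), 900.0),
    ("B", date(2016, 12, 10), 500.0), ("B", date(2016, 11, 20), 650.0),
    ("C", date(2016, 3, 1), 120.0), ("D", date(2016, 2, 15), 90.0),
]
rfm = rfm_table(transactions, today)
weights = (0.2, 0.3, 0.5)  # hypothetical R, F, M weights (e.g. from AHP)
scores = weighted_scores(rfm, weights)
customers = list(scores)
labels = kmeans([scores[c] for c in customers], k=2)
print(dict(zip(customers, labels)))
```

Because the weights multiply the scaled dimensions before clustering, a heavier weight on M makes spending differences dominate the distance between customers.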
Figure 8 shows RFM-identified customer segments. The numbers of customers and their proportions are very clearly presented. The insights can also be included as tags into customer profiles and CRM so that the company can start to understand and target its best customers.
Figure 9 shows the distribution of the customer segments across various dimensions. The three customer groups have clearly different recency, frequency, and monetary scores, and they also differ considerably on other, independent characteristics.
For consumer goods companies, the ultimate goal of investing in data mining and user profiles is better performance, so translating insights from data analysis into tangible results is vital. All the tags and association rules must eventually reach customers through one channel or another. That channel can be a powerful CRM system that targets customers with different tags differently; a membership app that sends personalized promotional offers or the latest products to customers; or the company’s own stores on e-commerce platforms such as tmall.com or JD.com, which both generate data and put it to use.
Use external data to understand trends
The boom in e-commerce platforms and social networks means that web crawlers and parsers can retrieve huge amounts of highly structured industry information from the internet. So besides analyzing data generated inside the organization, a company can also monitor and analyze data from major online platforms, which can be equally valuable and help it truly understand the market. The ability of crawlers to gather and store data also dwarfs anything consultancies have traditionally been capable of.
Below, we share some of the ideas behind our data mining on tmall.com. We collected 5 months of data for one subcategory, covering 5,000 brands, 7,000 online retailers, 24,000 items, 1 million SKUs, and 26 million customer comments.
As the next figure shows, tmall.com already does a fairly good job of structuring each item. Women’s apparel, for instance, has nearly 20 properties. By analyzing the prices and sales of the 24,000 items, we can determine industry trends, for example the price range of different styles, which models are the most popular, and which colors of which brands are preferred.
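Once item pages have been parsed into records, trend questions such as “what is the price range of each style?” become group-by aggregations. The sketch below assumes hypothetical records with `style`, `price`, and `sales` fields; the actual property names on tmall.com are not assumed.

```python
from collections import defaultdict
from statistics import median

def style_summary(items):
    """Group item records by their `style` property and summarise the price
    range and units sold for each style."""
    groups = defaultdict(list)
    for item in items:
        groups[item["style"]].append(item)
    summary = {}
    for style, rows in groups.items():
        prices = [r["price"] for r in rows]
        summary[style] = {
            "items": len(rows),
            "price_range": (min(prices), max(prices)),
            "median_price": median(prices),
            "units_sold": sum(r["sales"] for r in rows),
        }
    return summary

items = [  # hypothetical scraped records
    {"style": "Korean", "price": 199, "sales": 1200},
    {"style": "Korean", "price": 259, "sales": 800},
    {"style": "Korean", "price": 329, "sales": 400},
    {"style": "Minimalist", "price": 459, "sales": 300},
    {"style": "Minimalist", "price": 399, "sales": 350},
]
summary = style_summary(items)
for style in sorted(summary, key=lambda s: summary[s]["units_sold"], reverse=True):
    print(style, summary[style])
```

Grouping by a different property (model, color, brand) only changes the key used in the first loop.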
Figure 10 shows the sales of several styles. Last year, Taobao’s Singles’ Day (November 11) and December 12 shopping festivals clearly boosted sales of all styles, most noticeably the Korean style, which outsold the rest.
User decision semantics
“What are we talking about when we talk about our products?” Every brand wants an answer to this question. Limited by their capacity, consumer goods companies and consultancies can only use traditional consulting methods, such as focus groups and questionnaires, to try to extract patterns from very small samples.
But in today’s digital era, things are much simpler. Back to the example of tmall.com: in just 5 months, 24 million user comments were collected. Each refers to a specific item, with SKU-level granularity and a clear timestamp, allowing us to dig in and observe users.
Figure 11 shows time series of the different occasions mentioned in the comments. We assigned about 10 keywords to each occasion, segmented the comments into words, built a Chinese index, and generated daily counts by matching the occasion-specific keywords. Some interesting insights emerge. Setting aside the impact of the November 11 and December 12 shopping festivals, the number of users mentioning “wedding” peaked in mid-September, possibly in anticipation of the wedding season that comes with the October 1 National Day holiday. Mentions of “travel” had two peaks, in early August and early October, suggesting that many people were gearing up for travel during the summer holiday and the National Day holiday.
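The counting step can be sketched as follows, assuming comments have already been segmented into tokens (for Chinese text this requires a word segmenter, not shown). The occasion lexicon is a hypothetical, much shorter stand-in for the roughly 10 keywords per occasion described above. Dividing by each day’s total comment volume is an optional extra that keeps festival-day surges from swamping the series.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical occasion lexicon (English stand-ins for Chinese keywords).
OCCASIONS = {
    "wedding": {"wedding", "bride", "bridesmaid"},
    "travel": {"travel", "trip", "vacation"},
}

def daily_occasion_counts(comments):
    """comments: iterable of (day, tokens). For each occasion, count per day
    how many comments mention at least one of its keywords."""
    counts = defaultdict(Counter)
    for day, tokens in comments:
        words = set(tokens)
        for occasion, keywords in OCCASIONS.items():
            if words & keywords:
                counts[occasion][day] += 1
    return counts

def daily_share(counts, comments):
    """Normalise counts by the total number of comments on each day."""
    totals = Counter(day for day, _ in comments)
    return {occ: {d: n / totals[d] for d, n in c.items()} for occ, c in counts.items()}

comments = [  # hypothetical pre-segmented comments
    (date(2016, 9, 15), ["perfect", "for", "my", "wedding"]),
    (date(2016, 9, 15), ["bride", "loved", "it"]),
    (date(2016, 10, 2), ["wore", "it", "on", "my", "trip"]),
    (date(2016, 10, 2), ["nice", "color"]),
]
counts = daily_occasion_counts(comments)
print(daily_share(counts, comments))
```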
Tmall.com internally ranks each of its customers from T0 to T4. T0 customers are entry-level customers who spend little; T4 customers are top customers who spend far more, and far more often, than others. Customer rankings are visible on the comments page, so we also analyzed the purchase occasions of customers in different ranks. Figures 12 and 13 show the occasions discussed by T4 and T0 customers. T4 users clearly talk much more about “wedding,” “travel,” and “driving” than T0 customers, whereas T0 customers mention “shopping,” “students,” “office work,” and “commuting” more often. Merchants can use this information to target different customer segments accordingly.
Brand positioning and pricing strategies
Data from e-commerce platforms also lets merchants learn more about their competitors’ brand positioning and pricing strategies. Figure 14 shows the number of items listed by the top 5 brands of each of the four major groups in the industry. Only Brand No. 1 of Group B has a towering advantage over the other brands in its group. In each of the other three groups, the top brand is followed by brands of similar weight.
Figure 15 shows the sales contributed by each brand. Comparing it with Figure 14, we see that within Group A, each brand’s sales are proportional to its number of items. Within Group B, Brand No. 1 is also the overwhelming contributor to sales. Within Group C, the brands have similar numbers of items, yet revenue comes mainly from Brand No. 1. This suggests that Group C’s multi-brand strategy is not particularly successful.
In the past, it was next to impossible for consulting firms to lay their hands on their competitors’ data.
Even the most successful brands sometimes misjudge their target customers and, consequently, their market positioning. Figure 16 shows the number of items in each price range for each group, and Figure 17 shows the sales in each price range. Group C stands out immediately: although most of its products fall into the 200-400 yuan bracket, most of its sales come from the 400-600 yuan bracket. In other words, although Group C positions itself as a high-street brand, its customers simply prefer its premium products.
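The comparison between Figures 16 and 17 boils down to setting each price bracket’s share of items beside its share of sales. The sketch below assumes brackets that include the lower bound and exclude the upper one, and uses hypothetical `(price, sales_amount)` pairs for a single group.

```python
from collections import Counter

# Price brackets in yuan: lower bound inclusive, upper bound exclusive (assumed).
BRACKETS = [(0, 200), (200, 400), (400, 600), (600, float("inf"))]

def bracket_of(price):
    """Return the label of the bracket containing `price`."""
    for lo, hi in BRACKETS:
        if lo <= price < hi:
            return f"{lo}-{hi}" if hi != float("inf") else f"{lo}+"
    raise ValueError(f"price out of range: {price}")

def bracket_shares(items):
    """items: (price, sales_amount) pairs for one group. Returns
    {bracket: (share_of_items, share_of_sales)} so mismatches stand out."""
    n_items, sales = Counter(), Counter()
    for price, amount in items:
        b = bracket_of(price)
        n_items[b] += 1
        sales[b] += amount
    total_items, total_sales = sum(n_items.values()), sum(sales.values())
    return {b: (n_items[b] / total_items, sales[b] / total_sales) for b in n_items}

group_c = [  # hypothetical items: most listed at 200-400, most sales at 400-600
    (259, 30000), (299, 20000), (349, 25000), (389, 15000),
    (459, 120000), (529, 90000),
]
for bracket, (item_share, sales_share) in sorted(bracket_shares(group_c).items()):
    print(f"{bracket}: {item_share:.0%} of items, {sales_share:.0%} of sales")
```

A bracket whose sales share far exceeds its item share, as 400-600 does here, is where customers are voting for more products.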
Word-of-mouth monitoring and sentiment analysis
Public opinion monitoring technology in China is currently not as advanced as some may think. In fact, we could not even find a suitable domain-specific Chinese word segmentation tool or sentiment analysis tool. So we built our own, using keywords plus dependency grammar to extract meaning and analyze sentiment. Compared with using an SVM (support vector machine) or a neural network simply to classify a comment as positive or negative, our approach, which combines rules with machine learning, has an advantage: besides positive/negative analysis, it can identify the subject of a comment and its modifiers, which can then feed more detailed customer analysis.
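The rule half of such a system can be sketched over the output of a dependency parser (the parser is not shown; relation labels follow Universal Dependencies style such as `amod` and `nsubj`, which is an assumption). Two rules link an opinion word to the aspect it modifies, and a negator attached to the opinion word flips its polarity. The lexicons are hypothetical English stand-ins, and the machine-learning half of the approach is not covered.

```python
# Hypothetical lexicons; the real ones were domain-specific Chinese word lists.
ASPECTS = {"delivery": "logistics", "courier": "logistics",
           "service": "customer service", "fabric": "quality",
           "color": "color", "price": "value for money"}
OPINIONS = {"fast": 1, "slow": -1, "soft": 1, "rude": -1, "cheap": 1, "faded": -1}
NEGATORS = {"not", "never", "no"}

def extract(parse):
    """parse: list of (word, head_index, relation) per token, head -1 = root.
    Returns (aspect, opinion_word, polarity) triples."""
    results = []
    for i, (word, head, rel) in enumerate(parse):
        if word not in OPINIONS:
            continue
        polarity = OPINIONS[word]
        # A negator attached directly to the opinion word flips its polarity.
        if any(h == i and w in NEGATORS for w, h, _ in parse):
            polarity = -polarity
        targets = []
        # Rule 1: attributive adjective, e.g. "soft fabric" (amod -> noun).
        if rel == "amod" and head >= 0:
            targets.append(parse[head][0])
        # Rule 2: predicative adjective, e.g. "delivery was slow" (nsubj <- adj).
        targets += [w for w, h, r in parse if h == i and r == "nsubj"]
        results += [(ASPECTS[t], word, polarity) for t in targets if t in ASPECTS]
    return results

# "the delivery was not fast": "fast" is the root, "delivery" its subject.
s1 = [("the", 1, "det"), ("delivery", 4, "nsubj"), ("was", 4, "cop"),
      ("not", 4, "advmod"), ("fast", -1, "root")]
s2 = [("soft", 1, "amod"), ("fabric", -1, "root")]
print(extract(s1), extract(s2))
```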
Figure 18 shows the positive comments received by the four groups. Group B looks very strong on “value for money”; it clearly knows how to use low prices to create appeal. Group C evidently lags behind the others on “customer service.”
Figure 19 shows the time series of negative comments. The number of complaints, usually stable, soared during the Singles’ Day shopping festival. Most complaints concerned logistics and customer service. Complaints about damaged packages, receiving the wrong color, or disliking the color also rose, but only slightly.
In the digital era, consumer goods companies, like internet companies, use internally and externally generated data to build user profiles and membership systems, making production, operations, and sales more targeted and segmented. These demands far exceed what the human brain, or traditional consultancies armed with Excel, can handle. Data technology is like a palantír in J.R.R. Tolkien’s The Lord of the Rings, a crystal ball used by men and elves to see events in any part of the world. Human vision, by contrast, however acute, reaches only as far as the eye can see.
Everyone is talking about data. And what’s the response from high-end service providers such as investment banks, consultancies, accounting firms, and law firms? Well, too much talk and no action. But soon they, especially consultancies, will have to decide what to do next. In the future, when they are planning big data strategies for their clients, the clients may ask: “But how about your own big data strategy?”
Notes
1. Apriori is an algorithm for frequent itemset mining and association rule learning. It uses a “bottom-up” approach in which frequent itemsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found. Support, confidence, and lift, mentioned in the text above, are core concepts of Apriori. It is widely applied in domains such as business and internet security.
2. K-means clustering aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. It uses iterative refinement: each observation is assigned to the nearest of the k cluster centers, each center is then recomputed as the mean of the observations assigned to it, and the two steps repeat until the assignments stop changing. This converges to a local optimum, so the result can depend on the initial centers.
3. The analytic hierarchy process (AHP) is a structured, simple, flexible, useful, and multi-criteria technique for organizing and analyzing complex decisions. It was developed by American operations researcher Thomas L. Saaty in the 1970s. AHP has unique advantages when important elements of the decision are difficult to quantify. Once the hierarchy is built, the decision makers systematically evaluate its various elements by comparing them to each other two at a time, with respect to their impact on an element above them in the hierarchy. In making the comparisons, the concrete data about the elements from the analysts and the judgments about the elements’ relative meaning and importance from the experts are combined. AHP converts these evaluations to numerical values that can be processed and compared over the entire range of the problem. A numerical weight is derived for each element of the hierarchy, allowing diverse and often incommensurable elements to be compared to one another in a rational and consistent way. Then a ranking can be created, putting a set of alternatives in order from most to least desirable.
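The weight derivation in AHP can be sketched as follows: the priority weights are the normalized principal eigenvector of the reciprocal pairwise-comparison matrix (found here by power iteration), and the consistency ratio CR = CI / RI, with CI = (λmax − n) / (n − 1), checks whether the judgments hang together. By Saaty’s convention a CR below 0.1 is acceptable. The RFM judgments in the example are hypothetical and deliberately perfectly consistent.

```python
# Commonly cited Saaty random consistency indices for matrices of size n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(matrix, max_iter=1000, tol=1e-12):
    """Priority weights from a reciprocal pairwise-comparison matrix via the
    principal eigenvector (power iteration), plus the consistency ratio."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(max_iter):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        v = [x / total for x in v]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        w = v
        if converged:
            break
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX.get(n, 0.0)
    return w, (ci / ri if ri else 0.0)

# Hypothetical RFM judgments: F is 2x as important as R, M is 4x R and 2x F.
judgments = [
    [1.0, 1 / 2, 1 / 4],  # recency compared with (R, F, M)
    [2.0, 1.0, 1 / 2],    # frequency
    [4.0, 2.0, 1.0],      # monetary
]
weights, cr = ahp_weights(judgments)
print([round(x, 3) for x in weights], round(cr, 3))
```

The resulting weights could serve as the per-dimension RFM weights discussed in the main text.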
Note from the editors. This article was originally published in our Chinese edition, developed with Shanghai Jiao Tong University, SJTU ParisTech Review.