Alternative data, federated learning and supply chain finance

By Professor Houmin Yan

欲於待, 則書之成未有日也
宋, 戴侗, 《六書故》

Were I to await perfection, my book would never be finished
Dai Tong, History of Chinese Writing, Song Dynasty

Professor Houmin Yan, Chair Professor of Management Sciences and Director of the Hong Kong Laboratory for AI-Powered Financial Technologies Ltd., charts the chequered history of alternative data, the ensuing concerns over data privacy, and how a federated learning model working in the context of supply chain finance can provide rich and transparent alternative data.

I clearly remember my excitement at the news of AlphaGo defeating the top human Go players back in 2017. To crack this complicated game, AlphaGo used 300,000 games to train the model. The occasion had been anticipated by an article about mastering the Go game with deep neural networks published in Nature the previous year. Subsequently, another article about applications of reinforcement learning for Go was published in Science. It generated great interest among my colleagues. Jeff Hong organised a study group on machine learning in the college and many young faculty and PhD students participated. We all hoped that with modern data technologies, AI algorithms would demonstrate their full potential. However, for many industry applications, data remained the issue. Did we have sufficient data to train AI? Could we answer concerns over data security and privacy? And in the light of these questions, were new algorithms or computational architecture necessary?

A detour to Cambridge

"Farewell to Cambridge," a poem by the modern Chinese romantic poet Xu Zhimo, was at the back of my mind.

Around this time three years ago, I was assigned by the European Foundation for Management Development (EFMD) as a member of the EQUIS peer review team for Nova School of Business and Economics, Portugal. It happened that CB had an EMBA overseas module running at Cambridge University at the time. The overseas module started on a Saturday and the EQUIS accreditation started on the following Monday. Knowing that I had not been to Cambridge before, my EMBA colleagues asked me if I could make a detour. How could I reject such an appealing proposal? There was the famous statue of Isaac Newton at Trinity College to see, and besides, "Farewell to Cambridge" was at the back of my mind. This poem by the modern Chinese romantic poet Xu Zhimo, who strove to loosen the traditional Chinese form and to reshape it using influences from Western poetry, had been with me since my school days. The prospect of a visit to historical Cambridge was alluring, but my visit was to pull me in more contemporary directions.

Early warning signs: Cambridge Analytica and the expandability of data

Facebook's default settings allowed Cambridge Analytica to harvest respondents' Facebook friends, a dramatic example of the expandability of alternative data.

On the first day I attended a lecture by the Dean of Judge Business School, Christoph Loch. Regrettably, I missed a second lecture by David Stillwell on the use of personal data. This was a big miss, since that week the New York Times and the Guardian reported that the British consulting firm Cambridge Analytica had obtained the personal data of millions of Facebook users. Stillwell and Aleksandr Kogan had developed an app surveying limited numbers of Facebook users for academic use, but Facebook's then default settings also allowed Cambridge Analytica to harvest respondents' Facebook friends, a dramatic example of the expandability of "alternative data," in this case with significant negative outcomes. Without user consent, Cambridge Analytica employed the data predominantly for the political campaigns of Ted Cruz and Donald Trump. The scandal was widely reported and resulted in Cambridge Analytica going bankrupt and Facebook settling the case for billions of US dollars. Later in 2018, the EU's General Data Protection Regulation (GDPR) took effect.

Ant Group uses accumulated transaction data for loan approval

Ant uses accumulated transaction data to continuously optimise its business decision-making algorithms.

Another landmark instance of the use of alternative data relates to the Ant Group. Starting in 2004 as Alipay, Ant Group gradually began to provide innovative financial services such as financing, wealth management, and insurance to both consumers and companies. Its approach differed from traditional bank loan approval processes, which were heavily based on a firm's financial and accounting data, and in turn focused on various financial ratios, such as the ratio of the loan to the borrower's total assets, the current ratio, the leverage ratio, and the liquidity ratio. The Ant Group's approach uses accumulated transaction data (alternative supply chain data) to continuously optimise and train its business decision-making algorithms, thereby improving target customer identification and customer acquisition capabilities. For loans, Ant Group automatically sends repayment reminders to the borrower, with most repayments set to be made automatically through the borrower's Alipay account. According to investment bank reports, the company has the right to directly deduct both principal and interest from the borrower's Alipay account. By continuing to expand the use of transaction data, this alternative approach and alternative data clearly contributed to Ant's business model for facilitating billions of loans. The last-minute halting of Ant's IPO in November 2020 was probably another example of concerns over data monopoly and privacy.

The regulatory agencies suggest alternative data for credit scoring

The most common AI techniques have a central data processor that poses risks in cross-sharing customer data.

The use of customer data has understandably drawn the attention of the regulatory authorities. In November 2020, the Hong Kong Monetary Authority (HKMA) and the Hong Kong Applied Science and Technology Research Institute (ASTRI) released a white paper, "Alternative Credit Scoring of Micro-, Small and Medium-sized Enterprises (MSMEs)." The paper argued that, through Financial Technology (FinTech), an alternative approach and alternative data could facilitate loans to small and medium-sized enterprises. The paper indicated: "There has yet to be a treasure hunt to develop models that use alternative data for credit scoring. Machine learning and AI techniques seem to offer the best chance by far for the financial industry to crack the code." The most common machine learning and AI techniques have a central data processor that collects data from various sources. However, this centralisation poses risks in cross-sharing customer data. In contrast with the traditional learning approach, in federated learning each contributor uses its own raw data to train models and obtain intermediate results. To reflect a joint effort in developing useful models, the intermediate results, rather than the raw data, are shared among the raw data contributors.
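To make the idea of sharing intermediate results rather than raw data concrete, here is a minimal sketch in Python. It assumes a weighted-averaging scheme (often called federated averaging), which is one common choice and not necessarily the one envisaged in the white paper. The data are synthetic, and the names `local_update`, `federated_round`, and `make_party` are hypothetical.

```python
import random

def local_update(w, data, lr=0.01, epochs=5):
    """One party refines the shared weight on its own private (x, y) pairs."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the one-parameter model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w  # only this number leaves the party, never the raw data

def federated_round(w_global, parties):
    """The coordinator averages the parties' weights, weighted by sample count."""
    total = sum(len(d) for d in parties)
    return sum(local_update(w_global, d) * len(d) for d in parties) / total

def make_party(n, true_w, seed):
    """Synthetic private data set (illustration only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 5) for _ in range(n)]
    return [(x, true_w * x + rng.gauss(0, 0.1)) for x in xs]

# Two data owners whose records never leave their own premises.
parties = [make_party(50, 2.0, seed=1), make_party(80, 2.0, seed=2)]

w = 0.0
for _ in range(20):
    w = federated_round(w, parties)
```

After a few rounds the shared weight `w` settles close to the value used to generate the data, although the coordinator has only ever seen the parties' weights. A production system would usually add encryption or secure aggregation on top of this exchange.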

With federated learning the data is available but not visible

Federated learning takes advantage of recent developments in machine learning whilst maintaining data security and privacy.

To address data security and privacy issues, Google developed the concept of federated learning, which provides a machine learning environment in which datasets remain distributed and data leakage is prevented. This approach was elaborated by Professor Yang Qiang of the Hong Kong University of Science and Technology in "Federated Machine Learning: Concept and Applications," published in 2019. Yang reckons that, with different alternative data sets, federated learning can be a good vehicle, taking advantage of recent developments in machine learning whilst maintaining data security and privacy.

Taking a two-party machine learning example, Yang classifies federated learning as horizontal, vertical, or transfer learning, based on varying data needs. Horizontal learning applies when the feature data are similar but the data are owned by different companies who serve different customers. Vertical learning applies when the feature data are different and the data are owned by different companies who serve the same customers. Transfer learning applies when the feature data are different and the customers served are also different. With federated learning structures, the objective is to make the other party's data available but not visible, and to make use of the other party's data without changing data ownership.
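The three-way classification above can be expressed as a small decision rule. The sketch below is illustrative only: the function name, the use of set overlap, and the 0.5 threshold are my own assumptions and do not come from Yang's paper.

```python
def classify_federated_setting(features_a, features_b,
                               customers_a, customers_b, threshold=0.5):
    """Label a two-party setting as horizontal, vertical, or transfer learning.

    The overlap measure (Jaccard index) and the threshold are assumptions
    made for this illustration.
    """
    def overlap(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if (a | b) else 0.0

    similar_features = overlap(features_a, features_b) >= threshold
    same_customers = overlap(customers_a, customers_b) >= threshold

    if similar_features and not same_customers:
        return "horizontal"   # similar features, different customers
    if same_customers and not similar_features:
        return "vertical"     # different features, same customers
    if not similar_features and not same_customers:
        return "transfer"     # different features, different customers
    return "overlapping"      # both overlap: a case the taxonomy above does not cover

# Two lenders that record the same ratios for different borrowers.
setting = classify_federated_setting(
    {"current_ratio", "leverage_ratio"}, {"current_ratio", "leverage_ratio"},
    {"firm_1", "firm_2"}, {"firm_3", "firm_4"},
)
```

A bank and a logistics platform serving the same firms, but recording different information about them, would instead fall into the vertical case.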

The key issues for the financial industry are expandability and the lack of transparency of AI technologies.

In the last few years, various algorithms and architectures have been proposed for financial applications, but the industry has generally fallen short of fully adopting AI and machine learning for loan arrangements. One could argue that there is a lack of regulatory guidance on the application of AI algorithms. Obviously, current bank systems' governance processes for technology, digitisation, and related services deployments might not remain fit-for-purpose in AI-governed environments. But the key issues for the financial industry are probably concerns over the expandability and lack of transparency of AI technologies.

Safety first: Banks favour transparent algorithms

The EU's General Data Protection Regulation requires AI algorithms to explain their decision-making.

AI algorithms, such as deep learning algorithms implemented by neural networks, have been described as a "black box." Banks favour transparent algorithms characterised by clarity. For example, if age is used as a factor in the credit screening process for lending in a traditional bank, can traditional algorithms, such as logistic regression or decision trees, provide clarity on how age has played a role in the decision-making, i.e. approximate the relationship between inputs (e.g. age) and outputs (probability of default)? The decision-tree based machine learning algorithm has been considered a promising candidate for understanding, interpretation, and visualisation. It has also been widely tested and found to be comparable with deep learning algorithms, although this depends on the nature of the application. If data is highly structured, the decision-tree based algorithm competes very well with deep learning algorithms. But it may produce complex trees and become unstable, because small variations in data can result in different tree structures. Another noticeable feature of the decision-tree based algorithm is the lack of support for horizontal federated learning. In fact, in addition to requiring data privacy and ownership, the EU's General Data Protection Regulation also requires AI algorithms to explain their decision-making. The research frontier is therefore two-fold: firstly, to enhance cryptographic operations in decision-tree based algorithms, and secondly, to add linear proxy models or decision-tree structures to deep learning algorithms.
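To illustrate what clarity means in the logistic regression case, the following sketch shows how the role of age can be read directly from a single coefficient. The coefficients are hypothetical values chosen for illustration, not estimates from any real loan book, and the second input (`leverage`) is likewise an assumption.

```python
import math

# Hypothetical coefficients, for illustration only (not fitted to real data).
INTERCEPT = -1.0
BETA_AGE = -0.03       # change in log-odds of default per additional year of age
BETA_LEVERAGE = 0.8    # change in log-odds of default per unit of leverage

def default_probability(age, leverage):
    """Logistic regression: probability of default given the two inputs."""
    z = INTERCEPT + BETA_AGE * age + BETA_LEVERAGE * leverage
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

# Compare two applicants who differ only in age.
p_30 = default_probability(age=30, leverage=1.5)
p_40 = default_probability(age=40, leverage=1.5)

# The odds ratio depends only on BETA_AGE and the age gap,
# whatever the value of the other input.
odds_ratio = odds(p_40) / odds(p_30)
```

Here `odds_ratio` equals `exp(10 * BETA_AGE)`, so a credit officer or a regulator can state exactly how ten additional years of age shift the odds of default. A deep neural network offers no comparably direct reading.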

The promising role of supply chain finance

Supply chain finance, with its rich alternative structured data, can be a promising domain for AI for loan issuing and risk management.

To this end, we wish to suggest that supply chain finance can be a promising domain for AI and machine learning applications: it has rich alternative structured data, which banks have not made use of, or have no access to, for loan issuing and risk management. A supply chain is a network formed by manufacturers, suppliers, and distributors. The network has three flows: flows of physical goods, flows of information about the goods, and flows of payments for the goods and services. It involves processes such as resource integration, goods design and manufacturing, procurement and production, logistics, and sales and services. Supply chain finance involves financing and financial management for supporting the above-mentioned supply chain processes. Typical supply chain finance models include accounts receivable financing, inventory financing, prepayment financing, and credit financing. The following diagram represents the relationships of supply chain, supply chain management, and supply chain finance.

Leveraging the traditional loan indicators, such as the ratio of loans to the total assets of the borrower, the current ratio, leverage ratio, liquidity ratio, and profit ratio, and continuously improved credit risk assessments resulting from accurate models, FinTech applications in supply chain finance aim at speeding up the credit scoring/lending process and strengthening risk management, including liquidity management, business line allocation, product line allocation, and pricing.
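For readers less familiar with the traditional loan indicators, the sketch below computes them from a simplified set of balance sheet figures. Definitions of these ratios vary between institutions. Here the leverage ratio is assumed to be debt-to-equity, the liquidity ratio is assumed to be the quick ratio, and the profit ratio is assumed to be the net profit margin. The class name, field names, and figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BorrowerFinancials:
    loan_amount: float
    total_assets: float
    current_assets: float
    current_liabilities: float
    inventory: float
    total_debt: float
    total_equity: float
    net_profit: float
    revenue: float

def loan_indicators(f: BorrowerFinancials) -> dict:
    """Traditional accounting-based indicators used in loan approval."""
    return {
        "loan_to_total_assets": f.loan_amount / f.total_assets,
        "current_ratio": f.current_assets / f.current_liabilities,
        # Assumption: leverage measured as debt-to-equity.
        "leverage_ratio": f.total_debt / f.total_equity,
        # Assumption: liquidity measured as the quick ratio.
        "liquidity_ratio": (f.current_assets - f.inventory) / f.current_liabilities,
        # Assumption: profit ratio measured as net profit margin.
        "profit_ratio": f.net_profit / f.revenue,
    }

# Made-up figures for a small supplier.
example = BorrowerFinancials(
    loan_amount=200.0, total_assets=1000.0, current_assets=400.0,
    current_liabilities=200.0, inventory=100.0, total_debt=500.0,
    total_equity=500.0, net_profit=50.0, revenue=500.0,
)
indicators = loan_indicators(example)
```

All of these indicators come from periodic financial statements. Transaction-level supply chain data, by contrast, can be observed continuously.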

Exploiting transaction data

Exploiting transaction data has also become a key competitive advantage in selling. Recently, I started working with a new IDBA student, Mr Lie Li. After graduating in the UK, Mr Li started a new business in cross-border e-commerce. His business has turned out to be very successful. Mr Li demonstrated to me examples of how he exploits supply chain data in seeking best-selling products on eBay and Amazon. Take the eBay data, for example: it not only provides dynamic pricing and sales data but also supplies website click information such as "other items customers have viewed."

Laboratory for Artificial Intelligence Powered Financial Technologies

In his recent budget speech, Hong Kong Financial Secretary Paul Chan indicated that the InnoHK Research Cluster programme would be officially announced in March, an announcement that has been delayed twice. When reporters asked for details of the programme, Secretary Chan declined to give further advance information. My friends asked me for my opinion of the Financial Secretary's remarks on the InnoHK programme. I told them to take the remarks at face value. "If you really want to read between the lines," I added, "probably the CE plans to make the announcement, and he does not want to steal the limelight from his boss." In fact, CityU will host three laboratories, and the Laboratory for Artificial Intelligence Powered Financial Technologies (AIFT) is one of the three.

Rich alternative data

AIFT has identified three R&D themes: AI-driven financial services, AI-enhanced financial technology, and social media analytics.

AIFT will work with City University and Columbia University to assemble a team of professors and research students to conduct academic research, train local talent, and form start-up/spin-off companies. AIFT has identified three R&D themes: AI-driven financial services, AI-enhanced financial technology, and social media analytics. Among them, supply chain financing is one of the intended projects. We plan to extract business value from data, and to develop augmented intelligence to facilitate credit and risk management decision-making. We have developed a conceptual framework as follows.

Accounting and financial information provide the traditional data. Supply chain information and market sentiment belong to rich alternative data. Because of the multi-party nature of supply networks, we propose a federated learning environment empowered by gradient boosting decision trees. In addition to traditional and accounting-based loan indicators, we plan to make use of modern portfolio management and asset pricing theories to evaluate creditworthiness.
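One building block of gradient boosting decision trees in a multi-party setting can be sketched as follows. Each party summarises its own records into a histogram of residuals, and only the histograms are combined to choose a split for the next tree. This is a simplified illustration under my own assumptions (a single feature, squared-error loss, agreed bin edges, no encryption) and should not be read as the AIFT design. The function names and figures are hypothetical.

```python
def local_histogram(rows, edges):
    """Summarise one party's (feature_value, residual) pairs per bin.

    Returns a list of [residual_sum, count]; the raw rows stay with the party.
    """
    hist = [[0.0, 0] for _ in range(len(edges) + 1)]
    for x, r in rows:
        b = sum(x > e for e in edges)   # index of the bin that x falls into
        hist[b][0] += r
        hist[b][1] += 1
    return hist

def merge(histograms):
    """Coordinator adds up the parties' histograms bin by bin."""
    merged = [[0.0, 0] for _ in histograms[0]]
    for h in histograms:
        for i, (s, n) in enumerate(h):
            merged[i][0] += s
            merged[i][1] += n
    return merged

def best_split(hist, edges):
    """Pick the bin edge that best separates the residuals (squared-error gain)."""
    total_s = sum(s for s, _ in hist)
    total_n = sum(n for _, n in hist)
    best = None
    left_s, left_n = 0.0, 0
    for i, edge in enumerate(edges):
        left_s += hist[i][0]
        left_n += hist[i][1]
        right_s, right_n = total_s - left_s, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        gain = left_s ** 2 / left_n + right_s ** 2 / right_n
        if best is None or gain > best[0]:
            # (gain, threshold, left leaf value, right leaf value)
            best = (gain, edge, left_s / left_n, right_s / right_n)
    return best

edges = [10, 20, 30]   # bin boundaries agreed by all parties in advance
party_a = [(5, -1.0), (12, -0.8), (25, 1.1)]   # made-up (feature, residual) pairs
party_b = [(8, -1.2), (22, 0.9), (35, 1.0)]

split = best_split(merge([local_histogram(party_a, edges),
                          local_histogram(party_b, edges)]), edges)
```

With these figures the chosen threshold is 20, and the two leaf values are the mean residuals on either side of it. Because each split is a readable rule of the form "feature at most 20," the resulting model remains open to inspection, which is the attraction of tree-based methods for lending.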

Novel research results at CityU

Novel research results at CityU also shed new light on exploiting supply chain alternative data and developing machine learning algorithms. For example, Dr Junming Liu of the Department of Information Systems is conducting research to find best-selling products. He has developed a GCN-LSTM Deep Ranking model that leverages social media activities and their implicit influences on product popularity. This information can be used, in conjunction with supply chain transaction information, as a leading indicator for future sales. Dr Yining Dong of the School of Data Science is working on algorithms to overcome the notorious lack of interpretability of machine learning algorithms by adapting gradient boosting decision trees to federated learning. We are quite confident that AIFT will provide useful results and we welcome faculty, students, and alumni to join in our efforts.

Professor Houmin Yan
Chair Professor
Department of Management Sciences