• The Role of Continuous and Discrete Data in ML Models

    Disclaimer
    The content published on this blog is original and authored by me. Portions of the articles have been prepared and submitted as discussion writeups for the Professional Certification in Machine Learning and Artificial Intelligence program at Imperial College London, Cohort January 2025.

    All rights are reserved by the author. Unauthorized copying, reproduction, or distribution of any part of the content, in any form or medium, is strictly prohibited without prior written permission.

    Context –

    In the card payments ecosystem, schemes such as Visa and Mastercard play a vital role throughout the entire transaction lifecycle. They typically use the ISO 8583 standard for data exchange during the authorization phase. Their members, acquirers (e.g., Worldpay) and card issuers (banks), may adhere to either the ISO 8583 or the ISO 20022 standard, depending on their specific requirements and infrastructure.

    During transactional information exchanges, a vast amount of related data must be communicated in ISO format, encompassing numerical data (Continuous / Discrete) as well as categorical data.

    Numerical Data –

    Consider an example scenario in which Mastercard sends an ISO 8583 card authorisation message to the issuer bank for approval or decline while a POS transaction takes place.

    Continuous –

    The ISO message includes various continuous numerical data points, such as the transaction, reconciliation, and cardholder billing amounts, the conversion rate for reconciliation, and the conversion rate for cardholder billing.

    Additionally, precision may vary, as not all global currencies use two decimal places: some have none, while others extend up to four decimals. Because these fields hold continuous numerical data, their values can span a wide range (e.g., a high-value transaction will carry a larger amount). When such a continuous variable is the target, a regression model is usually the natural choice of machine learning algorithm, for example linear regression or XGBoost to predict how conversion rates will change over the next three months.
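
    The varying precision described above can be sketched as follows. This is a minimal illustration, assuming amounts arrive as integers in minor units; the MINOR_UNITS table is a small hand-picked subset and the function name to_major_units is hypothetical, not part of any scheme specification.

```python
from decimal import Decimal

# Illustrative subset of currency exponents (assumed for this sketch);
# a real system would load the complete ISO 4217 table.
MINOR_UNITS = {"GBP": 2, "USD": 2, "JPY": 0, "KWD": 3}


def to_major_units(amount_minor: int, currency: str) -> Decimal:
    """Convert an integer amount in minor units to a decimal amount."""
    exponent = MINOR_UNITS[currency]
    # scaleb shifts the decimal point without any floating-point rounding.
    return Decimal(amount_minor).scaleb(-exponent)


print(to_major_units(12345, "GBP"))  # 123.45
print(to_major_units(12345, "JPY"))  # 12345
print(to_major_units(12345, "KWD"))  # 12.345
```

    Using Decimal rather than float keeps the amount exact, which matters when the same digits mean different values in different currencies.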

    Discrete –

    In addition to numerous continuous fields, ISO 8583 messages within the ecosystem also transmit various discrete data.

    A good example of discrete data in this context is the ISO currency code (Ref – https://developer.mastercard.com/mastercard-send/documentation/implementation/currency-codes/). Every currency is assigned a unique three-digit numeric code.

    The transaction must include the currency codes for the transaction, reconciliation, and cardholder billing (e.g., 826 is GBP, 840 is USD).

    If the message does not contain valid values, the transaction may be rejected.
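
    A check of this kind might look like the sketch below. The field names and the function invalid_currency_fields are hypothetical and chosen for readability; they are not ISO 8583 data element names, and the code table is only a small subset.

```python
# Illustrative subset of ISO 4217 numeric codes; a real system would load the full list.
CURRENCY_CODES = {"826": "GBP", "840": "USD", "978": "EUR", "392": "JPY"}


def invalid_currency_fields(message: dict) -> list:
    """Return the names of currency-code fields that are missing or not recognised."""
    # Hypothetical field names, used only for this sketch.
    required = (
        "transaction_currency",
        "reconciliation_currency",
        "billing_currency",
    )
    return [name for name in required if message.get(name) not in CURRENCY_CODES]


message = {
    "transaction_currency": "826",
    "reconciliation_currency": "840",
    "billing_currency": "000",  # not an assigned code in this table
}
print(invalid_currency_fields(message))  # ['billing_currency']
```

    The codes are kept as strings because they are identifiers, not quantities.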

    Another example of discrete data is the “Reason Code” used when submitting transactions to the issuer bank. Transactions can be sent to the issuer for various purposes, such as Authorization, First Presentment during clearing, or Second Presentment for dispute resolution. The “Reason Code” field must contain a valid value from a predefined set of options. Any value outside this set is considered invalid and will be rejected by the system.

    These values are not used in any mathematical calculation; they serve purely as identifiers.
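
    Because such codes are identifiers, feeding them to a model as raw numbers would suggest that 840 is "larger" than 826. One common way around this is one-hot encoding, sketched here in plain Python. The helper one_hot is hypothetical and the sample codes are made up for illustration.

```python
def one_hot(values):
    """One-hot encode a list of discrete codes, one column per distinct code."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


codes = ["826", "840", "826", "978"]  # made-up sample of currency codes
categories, rows = one_hot(codes)
print(categories)  # ['826', '840', '978']
print(rows)        # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```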

    In the context of machine learning, discrete data plays a crucial role in representing specific values, which can significantly affect how models are trained, evaluated, and interpreted. As a target variable, discrete data can be modelled as a regression or a classification task, using algorithms such as logistic regression, SVM, or random forest.

    It is not advisable to assign weight or importance to a field based solely on whether it is continuous, discrete, or categorical; the value of transaction data comes from the characteristics of its fields.

    From a machine learning perspective, however, understanding the nature of the variable is essential to choosing a suitable algorithm. For example, logistic regression is not suited to predicting a continuous numerical target. Similarly, linear regression is a poor fit for a Boolean target.
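
    The idea of matching the algorithm to the target type can be sketched as a simple rule of thumb. The function suggest_task and its rules are assumptions made for illustration, not a definitive selection procedure.

```python
def suggest_task(target_values):
    """Suggest a modelling task from the kind of values in the target column."""
    distinct = set(target_values)
    if not distinct:
        raise ValueError("target column is empty")
    if distinct <= {True, False}:  # also matches 0/1, which equal False/True
        return "binary classification (e.g. logistic regression)"
    if all(isinstance(v, str) for v in distinct):
        return "classification (e.g. SVM or random forest)"
    if all(isinstance(v, float) for v in distinct):
        return "regression (e.g. linear regression)"
    # Integers may be counts or labels such as currency codes.
    return "review manually: are these numbers quantities or labels?"


print(suggest_task([True, False, True]))
print(suggest_task([1.0725, 1.0731, 1.0698]))
print(suggest_task([826, 840, 978]))
```

    The last case is deliberately left to a human, since a numeric code and a genuine count look the same to the type system.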

    In data analytics too, when generating visualizations or insights such as histograms and heat maps, the characteristics of the field, whether continuous or discrete, play a crucial role. Selecting a visualization that suits the type of variable is essential to convey meaning effectively and add value.
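
    The difference shows up even before any chart is drawn: continuous values need to be grouped into bins, while discrete values are simply counted. The sketch below uses made-up sample data, and the helper bin_counts is hypothetical.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Group continuous values into fixed-width bins, as a histogram would."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


amounts = [12.50, 18.99, 45.00, 47.25, 130.00]    # made-up transaction amounts
currencies = ["826", "840", "826", "826", "978"]  # made-up currency codes

print(bin_counts(amounts, 50))    # {0: 4, 100: 1}
print(dict(Counter(currencies)))  # {'826': 3, '840': 1, '978': 1}
```

    The binned result would feed a histogram, and the per-code counts would feed a bar chart.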

    These examples are drawn from my professional responsibilities and represent a routine part of my daily work. Expertise in this area comes from a clear understanding of the nature of data and its implications.