Posted by Rory Barrett
3 Data Models to Instantly Improve Your Data Mining Strategy
Data mining has been described in many different ways, whether as Big Data, Data Science, Data Analytics or even Business Intelligence, to name just a few. The Oxford University Press English Dictionary © (2018) gives a simple definition: “The practice of examining large pre-existing databases in order to generate new information.” Did you catch that? I’m not sure I did either. In this age of big data, we are surrounded by continuous data sources; technologies of all kinds have permanently invaded our day-to-day lives and contribute to a constant overload of large data streams. In 1982, in his first book “Megatrends”, John Naisbitt coined the phrase “We are drowning in information but starved for knowledge.” More than 30 years ago, it was already becoming evident to technology experts that large amounts of data, without the ability to understand them, were quickly becoming a problem. In today’s technology-driven society, one can only imagine how much this problem has grown.
"We are drowning in DATA but starved for INSIGHT"
But why does it matter to you? Data mining has grown to play a critical part in strengthening AML (anti-money laundering) compliance programs in industries such as Education, Healthcare, Transport, Retail and Commerce, Utilities, Manufacturing, Research, Finance and Telecommunications, or in fewer words… everything. Data mining is heavily used in credit card analysis, fraud analysis, patient diagnostic analysis, logistics management, speech analysis, power usage analysis and much more. Data mining impacts our business at its core. It allows us to identify patterns in large collections of data and, with them, gives us insight into what our data really means. Along with the ability to identify patterns comes the potential to predict behaviours and trends. For this reason alone, it is important that we get a firm grip on what is really hidden within our data. Whether your focus is improving the quality of your services, optimizing product offerings or combating fraudsters, data mining cannot be ignored.
Usually, when we try to use data to solve a problem, the data we want is disjointed: it doesn’t necessarily come from one place or a single source, and that can be a real challenge in and of itself. For example, if we’re trying to identify potential cases of collusion, there are a number of factors to consider. First, we need to identify the data source of the suspicious transactions. Then, to see details about the various activities, we might need access to certain customer records so that we can link the suspicious transactions to the details of those involved.
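To make the linking step concrete, here is a minimal sketch of joining two separate sources: a list of transactions (already flagged as suspicious by some earlier screening step) and a set of customer records. The record fields, the sample data and the “shared address” check are all hypothetical, chosen only to illustrate how linking suspicious transactions to the people involved can surface a possible collusion signal; a real AML program would use its own data sources and rules.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Transaction:
    txn_id: str
    customer_id: str
    amount: float
    suspicious: bool  # hypothetical flag, assumed set by an upstream screening rule

@dataclass
class Customer:
    customer_id: str
    name: str
    address: str

def link_suspicious(transactions, customers):
    """Join each suspicious transaction to the record of the customer involved."""
    by_id = {c.customer_id: c for c in customers}  # index the second source by its key
    return [(t, by_id[t.customer_id])
            for t in transactions
            if t.suspicious and t.customer_id in by_id]

def shared_addresses(linked):
    """Group the customers behind suspicious transactions by address.

    Several customers sharing one address is used here as a hypothetical
    collusion signal worth a closer look, not as proof of anything."""
    groups = defaultdict(set)
    for _, customer in linked:
        groups[customer.address].add(customer.customer_id)
    return {addr: sorted(ids) for addr, ids in groups.items() if len(ids) > 1}

# Hypothetical sample data from two separate sources.
transactions = [
    Transaction("T1", "C1", 9500.0, True),
    Transaction("T2", "C2", 120.0, False),
    Transaction("T3", "C3", 9800.0, True),
]
customers = [
    Customer("C1", "A. Brown", "12 Oak Ave"),
    Customer("C2", "B. Green", "4 Pine St"),
    Customer("C3", "C. White", "12 Oak Ave"),
]

linked = link_suspicious(transactions, customers)
print([(t.txn_id, c.name) for t, c in linked])
print(shared_addresses(linked))
```

Indexing the customer records in a dictionary keeps the join to a single pass over the transactions, which matters once the sources grow large.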
"We’re experiencing a phenomenon called the Data Explosion Problem"
In today’s world, we’re experiencing a phenomenon we’re calling the Data Explosion Problem.
Technology has advanced so rapidly over the last couple of decades, and we’ve been generating so much data along the way, that we don’t know what to do with it. We’ve become experts at collecting data on almost anything and everything that happens in our society, but we haven’t really been able to do much with it. Until now. Data mining allows us to finally put all that data to good use by managing these large data sets and analyzing them to gain new insights that help us respond better to threats, quickly identify new opportunities and gain a competitive edge.
Data mining uses different strategies, known as data models, to gain new knowledge from your information.
1) Anomaly Detection
Point this model at any group of data, whether a set of customers, a set of transactions or a combination of the two, and it will analyze the information you feed it and identify patterns across the different data elements. These patterns show you what would be considered normal, or expected, behaviour in your data, and you may also find deviations from them, known as outliers. An outlier is anything that stands out: something that simply does not fit the pattern as well as the rest of the data.
This model is commonly used in fraud detection strategies because it can quickly identify unusual behaviour without requiring a concrete definition of what counts as fraud. And because the model learns from the data itself, the patterns it determines change as your data, and your customers’ behaviour, change.
2) Clustering
Clustering is very similar to anomaly detection in its ability to match patterns. However, rather than singling out anomalies, it shows you how certain networks or groups are related, based purely on patterns in the data. It needs no predefined groups: the groupings come entirely from the data.
Clustering is very effective in crime analysis, where it can identify patterns in where, when or how different incidents of crime occur.
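The article doesn’t name a particular clustering algorithm, so the sketch below uses k-means, one of the most common choices, to group hypothetical crime incidents by map coordinates. The incident data and the deterministic farthest-point initialisation are assumptions made to keep the example small and repeatable; standard k-means++ chooses its starting centroids randomly.

```python
import math

def kmeans(points, k, iterations=100):
    """Group 2-D points into k clusters; returns (labels, centroids)."""
    # Deterministic farthest-point initialisation (an assumption for repeatability).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids

# Hypothetical incident locations (x, y) on a city grid.
incidents = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (8.0, 8.2), (8.3, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(incidents, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

The same approach extends to other features, such as time of day or type of incident, as long as they are scaled so that no single feature dominates the distance calculation.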
3) Neural Networks
This model is designed to predict decisions based on past behaviour, and to “learn” and “improve” as it sees more data. People are often considered creatures of habit, and it’s true: when you do something once and it works, you’re more likely to do it again. Using elements of machine learning, you can take what you already know about fraud and create a model that begins to predict incidents of fraud before they occur.
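To show what “learning” from past behaviour means in practice, here is a minimal sketch of the smallest possible neural network: a single sigmoid neuron (mathematically equivalent to logistic regression) trained with gradient descent on hypothetical, already-labelled transactions. The features, the labels and the learning settings are all assumptions for illustration; a production fraud model would use a proper machine learning library, far more data and usually several layers.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_neuron(samples, labels, lr=0.5, epochs=2000):
    """Fit one sigmoid neuron with stochastic gradient descent on log-loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # derivative of log-loss w.r.t. the neuron's input sum
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def fraud_score(weights, bias, x):
    """Score between 0 and 1; higher means the transaction looks more like past fraud."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical history: [scaled amount, is_foreign, is_night]; label 1 = confirmed fraud.
history = [[0.10, 0, 0], [0.20, 0, 1], [0.15, 1, 0],
           [0.90, 1, 1], [0.80, 1, 1], [0.95, 0, 1]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_neuron(history, labels)
print(round(fraud_score(weights, bias, [0.85, 1, 1]), 3))  # a new, unseen transaction
```

Retraining on fresh labelled cases is how such a model “improves”: each newly confirmed fraud or legitimate transaction nudges the weights.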
In a recently concluded webinar, it became evident that there are many ways we can use data to make an impact, not only by improving the things we do every day but by discovering new ways to better our future. Technology has reached a point where almost anyone can start data mining on their own and find some way for it to benefit themselves and their company. With every day that passes, more and more people are seeing the benefits of this technology, so why wait?
Topics: Risk and Compliance
Rory Barrett is a Business Analyst and Project Manager at Symptai Consulting. He has spent years contributing his knowledge of modern technology practices to projects in Business Assurance, IT Audit and IT Security, specializing in designing and implementing data analytics for anti-money laundering programs.