Exploratory Analysis of Financial Data

Financial data is not particularly special in its own right: after all, like any other data, it simply describes some aspect of the real world. It is therefore subject to the same limitations, and its utility depends just as much on completeness, trustworthiness, timeliness and context as that of any other type of data. By definition, however, it underpins the ability to map the success trajectory of a business or enterprise, and for this reason it needs to be analysed carefully and methodically.

Certain approaches are common to the exploration of all financial data, such as the use of time-series analytic methods and the treatment of missing data. It would be simplistic, however, to treat all financial data in the same way.

To take full advantage of this data, one should ideally possess the relevant business knowledge. This is because "financial data" is a catch-all term that can describe anything from overseas exchange transactions to cheese sales at the local dairy, company audit reports, stock fluctuations, point-of-sale transactions at the supermarket and so on. The diversity of subject matter experts throughout the finance industry reflects both this heterogeneity of purpose and the multiplicity of approaches taken when exploring each area.

Ideally, the analysis will fall to such an expert, who will quickly be able to make valid assumptions about the data, know what is relevant and what to ignore, and have an idea of which analyses to run. They will also usually have the deepest understanding of the question being answered, and consequently know where best to look for the answer.

More often than not, however, the person tasked with the job of finding insights from within the data is a data analyst/programmer with more knowledge of the analytic tools at their disposal than of the subject matter at hand. So what should one do in this case? Are there certain analyses that one should always run when exploring financial data? Are there findings about such data one needs to generate in all cases?

As I see it, there are normally two broad stages to exploring financial data. The first is essentially a look at the generic properties of the data: investigating the context in which it was collected and the purposes for which it is to be used. This is essential before any further analysis can be done, and is recommended for any type of data, not just financial data.

Only once you have established the data's provenance and the context in which it was generated and collected can you move on to the second stage: getting to grips with what it contains. To do this, you really need to get into its guts, pry apart the strands that hold it together, and test as many assumptions about it as you can.

If you are not familiar with the subject area, you may need to look deeper into the data than would ordinarily be warranted. Nothing about it can be assumed, and a comprehensive analytic approach is required. The aim is twofold: first, to look for patterns in the data, and second, to learn as much as possible about the subject area, so that you are comfortable discussing it with an expert in the field.

How to Begin?

So how should someone go about this dual exploratory/learning exercise? Where does one begin?

These are the steps I usually take when in this situation:

Ask for any relevant documentation: this may include data dictionaries and data profiles covering the data sources of interest, product disclosure statements, online material such as websites related to product development, technical specifications for related projects, and any documentation already written about the current project.

Talk to subject matter experts: these are generally people within the organisation who have dealt with the data sources of concern and are familiar with the business rules used to generate them. Sometimes you will want to talk to people outside your organisation, for example the vendor of a piece of software the business purchased that generates the data you are now considering using, or a consultant who previously worked on the data but has since moved on from the organisation.

Data Maturity Assessment: Find out whether a data maturity assessment has been commissioned recently. If you are in luck and one has, getting your hands on it will show you the big picture of the state of the data, including an idea of its quality, which affects the reliability of any modelling you base on it.

Discover Alternative Data Sources: It is often the case in the financial world that more than one source or version of the data you are after is available. For instance, a limited set of data recording ATM transactions from over a decade ago may exist in an archived file; a slightly more detailed version covering the period from then until a couple of years ago may be available in the enterprise data warehouse; and a third, even more detailed and up-to-date version may be available in the operational database. It is up to you to decide which data to use, but you first need to know that they all exist.

Engage in Exploratory Data Analysis (EDA): This is where you get to dive into the data yourself, and may involve examination of multiple data sources in different formats. This is really the crux of this article, and so we will now delve into this topic in more detail.

First off, gaining access to the data may necessitate arranging the appropriate permissions to the particular platforms that you will need. This often means filling in forms and waiting until the appropriate tech team has actioned your requests. It is therefore often best to arrange such access before engaging in the steps mentioned above, so that by the time you get around to EDA, everything has been organised.

EDA - Step by Step

But what are the steps one should take when conducting an EDA exercise for financial data?

1. Digging in at column level

Histograms:

Histograms should be generated for each numeric column to get an idea of the data distribution. One thing to consider here is the bin size. A typical approach is to accept the default settings, but if the default bin size is too large, you may lose detail that would only be exposed with narrower ranges. You may also want to experiment with unequal bin sizes, which can highlight disparities in certain data ranges and allow you to home in on segments of interest.
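As a minimal sketch of the bin-size trade-off, the example below uses NumPy's `np.histogram` to compute bin counts (rather than draw charts) for synthetic, hypothetical transaction amounts. The data and the custom bin edges are assumptions chosen for illustration; with matplotlib installed, the same `bins` argument can be passed to `plt.hist` to draw the chart.

```python
import numpy as np

# Hypothetical transaction amounts: mostly small payments plus a few large transfers.
rng = np.random.default_rng(seed=0)
amounts = np.concatenate([
    rng.gamma(shape=2.0, scale=30.0, size=1_000),  # everyday purchases
    rng.uniform(5_000, 10_000, size=20),           # occasional large transfers
])

# NumPy's default: 10 equal-width bins spanning the full range of the data.
default_counts, default_edges = np.histogram(amounts, bins=10)

# Narrower equal-width bins expose detail that wide bins lump together.
fine_counts, fine_edges = np.histogram(amounts, bins=200)

# Unequal bins (edges are an illustrative assumption): fine resolution where
# most values lie, coarse bins in the long tail.
custom_edges = [0, 25, 50, 75, 100, 150, 250, 500, 1_000, 5_000, 10_000]
custom_counts, _ = np.histogram(amounts, bins=custom_edges)

print("default bins:", default_counts)
for lo, hi, n in zip(custom_edges[:-1], custom_edges[1:], custom_counts):
    print(f"{lo:>6} to {hi:>6}: {n}")
```

With this data, the ten default-width bins put every one of the everyday purchases into the first bin, whereas the unequal edges show how those purchases are spread out while still isolating the large transfers in a single tail bin.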

Category Count Tables:

This refers to determining the count or percentage of each value present in categorical columns. This could be limited to, for example, the 10 most frequent or the 5 least frequent values. This type of analysis gives you a much better idea of the relative importance of certain values within the data set.
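A minimal pandas sketch of a category count table, assuming a hypothetical `payment_method` column in point-of-sale data. `value_counts` returns counts sorted from most to least frequent, and `normalize=True` turns them into proportions.

```python
import pandas as pd

# Hypothetical point-of-sale records with a categorical payment-method column.
sales = pd.DataFrame({
    "payment_method": ["card", "card", "cash", "card", "mobile",
                       "cash", "card", "voucher", "card", "mobile"],
})

# Absolute counts, sorted from most to least frequent.
# (Pass dropna=False to count missing values as a category of their own.)
counts = sales["payment_method"].value_counts()
table = counts.to_frame(name="count")

# Percentages; assignment aligns on the category labels.
shares = sales["payment_method"].value_counts(normalize=True)
table["percent"] = (shares * 100).round(1)

print(table.head(10))  # up to ten most frequent values
print(table.tail(5))   # up to five least frequent values
```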

Time series charts:

These are the workhorse of financial data visualisation. They are used to uncover patterns, or unexpected deviations, in chronological data, and are relatively easy to interpret. Using different aggregation levels is important here: at too detailed a level, noise may obscure the signal you are looking for, while at overly consolidated time segments, patterns may be missed. Time series charts are great for spotting trends and seasonality, detecting outliers and comparing scenarios. If you spot something unusual, it may indicate that a deeper look is required.
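The sketch below shows the same hypothetical daily sales series at three aggregation levels using pandas `resample` ("W" groups into weeks ending on Sunday, "MS" into calendar months labelled by their first day). The synthetic data, the 28-day window and the 3-standard-deviation outlier threshold are all assumptions for illustration. Plotting is left out; with matplotlib installed, `weekly.plot()` would chart a series directly.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales for one year: weekend uplift + slow upward trend + noise.
rng = np.random.default_rng(seed=1)
days = pd.date_range("2023-01-01", "2023-12-31", freq="D")
weekly_cycle = np.where(days.dayofweek >= 5, 150.0, 100.0)  # Sat/Sun are higher
trend = np.linspace(0.0, 50.0, len(days))
sales = pd.Series(weekly_cycle + trend + rng.normal(0, 20, len(days)), index=days)

# The same data at three aggregation levels.
daily = sales                         # most detail, most noise; weekly cycle visible
weekly = sales.resample("W").sum()    # weekly cycle absorbed; trend easier to see
monthly = sales.resample("MS").sum()  # coarsest view; short-term patterns are lost

# A simple outlier screen (window and threshold are assumptions): flag days more
# than 3 rolling standard deviations away from a 28-day rolling mean.
rolling_mean = daily.rolling(28, min_periods=7).mean()
rolling_std = daily.rolling(28, min_periods=7).std()
outliers = daily[(daily - rolling_mean).abs() > 3 * rolling_std]

print(weekly.head())
print(monthly.round(0))
print(f"{len(outliers)} candidate outlier days")
```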

Data profiling:

If a data profile does not already exist, build one yourself (see Data Profiling, a prior article I wrote on this topic). In a nutshell, this involves examining each attribute of the data for its least and most common values, the type and distribution of its values, and their quality.
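A minimal sketch of a per-column profile in pandas, covering type, completeness, range and most/least common values. The `profile` helper and the sample transactions are hypothetical, not taken from the Data Profiling article; a real profile would usually add more quality checks.

```python
import pandas as pd

def profile(df: pd.DataFrame, top_n: int = 3) -> pd.DataFrame:
    """Hypothetical helper: one row of summary statistics per column."""
    rows = []
    for col in df.columns:
        s = df[col]
        freq = s.value_counts()  # most to least frequent; missing values excluded
        numeric = pd.api.types.is_numeric_dtype(s)
        rows.append({
            "column": col,
            "dtype": str(s.dtype),
            "non_null": int(s.notna().sum()),
            "missing_pct": round(100 * s.isna().mean(), 1),
            "distinct": int(s.nunique()),
            "min": s.min() if numeric else None,
            "max": s.max() if numeric else None,
            "most_common": list(freq.index[:top_n]),
            "least_common": list(freq.index[-top_n:]),
        })
    return pd.DataFrame(rows).set_index("column")

# Hypothetical transactions with a missing amount and a missing currency.
transactions = pd.DataFrame({
    "amount": [12.5, 80.0, None, 12.5, 2500.0, 43.1],
    "currency": ["AUD", "AUD", "USD", "AUD", None, "NZD"],
})
print(profile(transactions).T)
```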

2. Exploring Attribute Interactions

When looking at structured data, you should be aware of how the values of one numeric attribute change with respect to the values of other attributes. This multi-dimensional approach will give you much greater insight into what the data is actually measuring, and into the relative impact of various attributes on the outcome measure of interest. This is not just about how attributes change over time, but also about how non-time attributes change with respect to each other.

Correlation Tables:

These will give you an idea of the direction and strength of the relationship between each pair of attributes. This is a whole topic in itself, but different correlation coefficients suit different types of data. For example, the Pearson coefficient is appropriate when both attributes are continuous and you want a measure of their linear relationship. Rank-based coefficients such as the Kendall rank correlation or the Spearman correlation measure monotonic relationships and are less sensitive to outliers, while the point-biserial correlation relates a binary attribute to a continuous one; not all of these will be relevant to every analysis of financial data. The main thing to note is that a coefficient close to 1.0 implies a strong positive relationship between two attributes, one close to -1.0 implies a strong inverse relationship, and one close to 0 implies little or no relationship of the kind that coefficient measures.
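A minimal sketch of correlation tables with pandas `DataFrame.corr`, using synthetic, hypothetical savings-account data. It contrasts Pearson (linear) with Spearman (rank-based, monotonic). `method="kendall"` is also accepted by `DataFrame.corr` but may require SciPy, so it is left out here; SciPy also offers `scipy.stats.pointbiserialr` for the binary-versus-continuous case.

```python
import numpy as np
import pandas as pd

# Hypothetical savings-account data (all values synthetic).
rng = np.random.default_rng(seed=2)
n = 500
debit = rng.uniform(10, 1_000, n)
df = pd.DataFrame({
    "debit": debit,
    "balance_after": 5_000 - debit + rng.normal(0, 50, n),  # falls as debits rise
    "fee": 0.01 * debit + rng.normal(0, 1, n),              # rises with debits
    "debit_squared": debit ** 2,                            # monotonic, not linear
})

pearson = df.corr(method="pearson")    # strength of linear association
spearman = df.corr(method="spearman")  # strength of monotonic (rank) association

print(pearson.round(2))
print(spearman.round(2))
```

Here Spearman's coefficient for `debit` against `debit_squared` is 1.0 because the relationship is perfectly monotonic, while Pearson's is somewhat lower because the relationship is not linear.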

Pair Plots:

This is a grid of scatter plots covering every pair of attributes, usually with each attribute's own distribution shown on the diagonal, and it gives you a visual sense of the direction and strength of the relationship within each pair. For example, there may be an inverse relationship between transaction amounts debited from savings accounts and the remaining balance. Although this seems an obvious relationship, the reasons behind it may be the subject of further and deeper analysis.

Often such quick insights will be obvious to the subject matter experts you have been consulting, but they may not have quantified the effect or have evidence for it. They will likely appreciate it when you share your findings with them, especially if you can present your exploratory analysis visually.
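A minimal pair-plot sketch using pandas' `scatter_matrix` (seaborn's `pairplot` is a popular alternative), assuming matplotlib is installed. The account data is synthetic and hypothetical, loosely mirroring the debit-versus-balance example, and the output filename is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical savings-account data (all values synthetic).
rng = np.random.default_rng(seed=3)
n = 300
debit = rng.uniform(10, 1_000, n)
accounts = pd.DataFrame({
    "debit": debit,
    "balance_after": 5_000 - debit + rng.normal(0, 50, n),
    "fee": 0.01 * debit + rng.normal(0, 1, n),
})

# One scatter plot per pair of columns, with histograms on the diagonal.
axes = scatter_matrix(accounts, figsize=(8, 8), diagonal="hist", alpha=0.4)
plt.savefig("pair_plot.png")
```

The `balance_after` versus `debit` panel shows the downward-sloping cloud of an inverse relationship, while the `fee` panels slope upwards.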

3. Text Exploration

You could be forgiven for thinking that there is no way to quickly analyse unstructured, textual data. This is a common misconception: much can be gained from applying a couple of quick techniques to text.

Word Frequencies:

This technique produces a table of the most frequently used words in a dataset. It is really only a grouped word count that excludes stop words, that is, common connecting words such as 'but', 'the' and 'and', but it will give you a good idea of the main topics or sentiments expressed within the text.
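A minimal sketch of a word frequency table using only Python's standard library. The stop-word list and the complaint texts are hypothetical; in practice a fuller stop-word list, such as one shipped with a text-processing library, would normally be used.

```python
import re
from collections import Counter

# Small, hypothetical stop-word list for illustration only.
STOP_WORDS = {"the", "and", "but", "a", "an", "of", "to", "in", "is",
              "was", "my", "i", "for", "on", "it", "with"}

def word_frequencies(texts, top_n=10):
    """Count words across all texts, ignoring case and stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())  # simple tokeniser
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Hypothetical customer complaints.
complaints = [
    "The ATM fee was charged twice on my account.",
    "Charged an overseas transaction fee but the card was never used overseas.",
    "Long wait on the phone and the fee was not refunded.",
]
for word, n in word_frequencies(complaints, top_n=5):
    print(f"{word:<12}{n}")
```

Even on three short complaints, "fee" (3 occurrences), "charged" and "overseas" (2 each) emerge as the dominant themes.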

Word Clouds:

This is another clever way to visualise the words and phrases used most commonly throughout textual data, with each word sized according to its frequency. It often impresses clients, but it is also a quick and simple way to see which topics are of interest in surveys, which customer complaints dominate, or which themes may be of concern in a series of management reports. I generated the example above from an IMF Regional Economic Outlook report covering the Western Hemisphere for 2018. This technique is often used in deeper analyses, but I include it in EDA since it is quick and simple to implement.

4. Writing up an Exploratory Data Analysis Report

Once you have gone through the activities involved in exploring your data, it is important that you record your findings. This should take the form of a short report, which will then inform the future direction of further analysis and application creation.

The report need not be a formal document, but it should be readable by other analysts, some of whom may be non-technical, and so it should be well structured. One idea is to describe your findings using the same topic headings as in this article. Another is simply to list, in point form, the main findings or highlights that lend themselves to further work. Make sure you include any visuals that support your findings.

As well as containing your findings, you should also include proposed directions that follow-up analysis could take. Such proposals need to take into account the goals of the business so that tangential directions are not pursued needlessly. This makes the exploratory analysis not just a technical exercise, but a means of justifying further work, as well as providing the basis for informed decision making.

Hopefully I have outlined a clear way for you to pursue Exploratory Data Analysis that will help you analyse your financial data more effectively, so you can push your enterprise onwards and upwards.

Good luck out there!
