Monkey Business: Predicting Volatility In The Financial Markets (from 22.96% To 22.22% Error Reflections)

This post reflects work in progress and ideas that are being tested or implemented at the moment. Meanwhile, the competition has seen a new best result (a score below 20%).

Monkey Business: Predicting Volatility In The Financial Markets (from 25.9% To 22.96% Error)

The results of the competition are now official and the winners determined, but the game is still on. Moreover, some solutions have been published, so there are more possibilities for improving the first, very basic, solution from the previous post.

There are many things to do with the original script, and ideas to implement, essentially:
- trying different models (other than the OLS method - in published solutions I’ve seen XGBoost, Huber regression, weighted linear regression, LSTM, RNN, etc.)
- feature engineering (which breaks down into three categories - “by ID”, “by product”, “by date”)
- separating volatilities data from returns data and treating them separately
- ensembling / stacking
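
As an illustration of the feature engineering item above, here is a minimal sketch of what the three categories could look like. It rests on assumptions not stated in the post: each row is taken to hold its own series of volatilities, “by ID” is read as summarising that series, and “by product” / “by date” as averaging over rows that share a product or a date. The field names (`vols`, `product`, `date`) and both helper functions are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def add_id_features(rows):
    """'By ID' (assumed meaning): summarise each row's own volatility series."""
    return [
        {**r, "vol_mean": mean(r["vols"]), "vol_std": pstdev(r["vols"])}
        for r in rows
    ]


def add_group_mean(rows, key, value="vol_mean"):
    """'By product' / 'by date' (assumed meaning): average `value` over rows sharing `key`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r[value])
    means = {k: mean(v) for k, v in groups.items()}
    return [{**r, f"{value}_by_{key}": means[r[key]]} for r in rows]


# Toy data, invented for illustration only.
rows = [
    {"id": 1, "product": "A", "date": 1, "vols": [0.1, 0.2, 0.3]},
    {"id": 2, "product": "A", "date": 2, "vols": [0.3, 0.4, 0.5]},
    {"id": 3, "product": "B", "date": 1, "vols": [0.2, 0.2, 0.2]},
]

features = add_id_features(rows)
features = add_group_mean(features, "product")
features = add_group_mean(features, "date")
```

Each row in `features` keeps its original fields and gains `vol_mean`, `vol_std`, `vol_mean_by_product` and `vol_mean_by_date`, which could then be fed to any of the models listed above.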

The models I’ve tried include weighted linear regression (as in this baseline solution: https://github.com/FrancoisPierre/CFM/blob/master/Starting_kit.ipynb), XGBoost and Huber regression (both mentioned in this solution: http://datachallenge.cfm.fr/t/proposed-solution/105).
As for the last two, I am surely not tuning them correctly, as my results are far from those mentioned in the article, and even quite far from those of weighted linear regression, for that matter.
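
To make the difference between OLS and weighted linear regression concrete, here is a minimal single-feature sketch. It is not the baseline notebook's code. The choice of weights (1 / y², so that squared relative errors are minimised) and the reading of the error percentages as a mean absolute percentage error are both assumptions, and the function names and toy data are hypothetical.

```python
def weighted_linear_fit(x, y, w):
    """Closed-form weighted least squares for y ~ a + b * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumed challenge metric)."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data, invented for illustration: the last target is unusually large.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.11, 0.19, 0.32, 0.41, 0.90]

# OLS is the special case where every weight equals 1.
a_ols, b_ols = weighted_linear_fit(x, y, [1.0] * len(x))

# Assumed weighting: 1 / y^2 down-weights large targets (relative-error fit).
a_w, b_w = weighted_linear_fit(x, y, [1.0 / yi ** 2 for yi in y])

mape_ols = mape(y, [a_ols + b_ols * xi for xi in x])
mape_w = mape(y, [a_w + b_w * xi for xi in x])
```

Comparing `mape_ols` and `mape_w` on held-out data, rather than on the training points as here, is what would show whether the weighting actually helps for a percentage-type error.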

For now the returns are dropped from the prediction entirely (with them, this model’s predictions are far worse than with volatilities alone).
Finally, the most important part is the generation of new features based on the “ID” dimension. The “basic” ones are:

Monkey Business: Predicting Volatility In The Financial Markets (from 36.7% To 25.9% Error)

Curiosity killed the cat. Also, it made me register for the challenge of predicting the volatility of selected stocks, organized this year on Challenge Data by CFM. The presentation of the challenge is available here.
This post covers my first steps in analysing this purely financial dataset in a data science setting.

Crisis Time : Corporate Governance And Performance Of Firms

This post is rightfully the first, as it’s about my master’s thesis, a study of corporate governance and how it affected the performance of firms during the world financial crisis of 2007-2008, with a focus on emerging markets.

Product Client Fit

Disclaimer: The purpose of this post is to demonstrate the logic and the methodology, model choice, etc. The data has been partially generated/scrambled by the author. Demonstrated results cannot be perceived or interpreted as real data of any financial organization.

How, where, when and why do clients choose financial products? This is a question managers, analysts and “conseillers d’affaires” ask themselves all the time, as this is what the revenue will ultimately depend on. Equally important is whether a client will leave, at which moment and why, since incentives to make them stay can then be found.

Product Client Fit And Client Behaviour

Disclaimer: The purpose of this post is to demonstrate the logic and the methodology, model choice, etc. The features and predicted products have been totally anonymized and partially scrambled by the author. Demonstrated results cannot be perceived or interpreted as real data of any financial organization.

Santander Lessons

This post is about a case that is well known in the DS community in general - the Santander Product Recommendation Challenge.

Though I used existing code rather than writing it myself (kernel 1 and kernel 2), adding only a few things, this one is notable because it gave me both the tools and the inspiration to keep exploring how and why bank clients choose certain products.

Analyzing Social Media : More Than Words Can Say

This post is a work in progress. First of all, this is an interesting story (and an even more interesting dataset).