By Prasad Pai, Technical Lead at YML | Jan 30

Presently, there is major concern about the likelihood of a global economic crisis.

In spite of the supposedly cautious mood adopted by a few countries, nobody is willing to say clearly whether the next recession is just a week away or a year away. And then there are a few others who indicate that there is no economic slowdown at all.

Above all, the outlooks of these financial gurus change daily from positive to negative and back again. Every opinion is defensible, as predicting the future is not an easy task and each expert has a wealth of experience behind them.

Hence, we wanted to establish a quantifiable assessment of what the world's well-known financial investors are thinking about the future of the economy.

We wanted to solve this problem through machine learning, with minimal information and resources, in the quickest possible way.

Data collection

To collect data for our problem, we cannot have one-on-one discussions with these investors on a recurring (potentially daily) basis, but we can scrape their interviews, discussions and speeches from YouTube and their messages from Twitter.

To start with, we short-listed a few financial behemoths and scraped the transcripts of their YouTube videos with YouTube-Transcript-Api and their Twitter feeds with Tweepy. We split each YouTube transcript into segments of at least 5 seconds each and ordered them serially to preserve the time-series nature of the data.
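The chunking step above can be sketched as follows. The entry format (`text`, `start`, `duration`) matches what youtube-transcript-api returns for a video; the sample transcript below is invented for illustration.

```python
# Merge consecutive transcript snippets into ordered segments that each
# span at least 5 seconds, preserving the time-series ordering.

def chunk_transcript(entries, min_seconds=5.0):
    """Merge consecutive snippets until each chunk spans >= min_seconds."""
    chunks, text, start, elapsed = [], [], None, 0.0
    for e in entries:
        if start is None:
            start = e["start"]
        text.append(e["text"])
        elapsed += e["duration"]
        if elapsed >= min_seconds:
            chunks.append({"start": start, "text": " ".join(text)})
            text, start, elapsed = [], None, 0.0
    if text:  # keep any trailing snippets as a final chunk
        chunks.append({"start": start, "text": " ".join(text)})
    return chunks

sample = [
    {"text": "the economy is", "start": 0.0, "duration": 2.0},
    {"text": "doing reasonably well", "start": 2.0, "duration": 2.0},
    {"text": "but inflation", "start": 4.0, "duration": 2.0},
    {"text": "remains a concern", "start": 6.0, "duration": 3.0},
]
chunks = chunk_transcript(sample)
```

Each chunk keeps its original start time, so the chunks can later be ordered and aligned as a time series.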

This is the summary of the data collected in our experiment:

Data summary

For all the subsequent discussion in this article, let us focus on Warren Buffett's data.

Data Validation

As our data has been collected from YouTube and Twitter, we have to verify that the text is authentic and genuinely concerned with the financial world. This is necessary because we are going to train our models to predict the future of the economy, so our text transcripts have to be related to finance and economics.

While collecting the data, we assumed that these financial investors are dedicated to their field and will mostly speak publicly about finance and economics. Still, we have to validate this heuristic.

We don't wish to painstakingly filter individual text statements in our dataset by hand. Instead, we create a small set of statements that we believe talk about finance and economics and should be representative of our dataset. Here is an example of one such sample set.

Custom_1: How is the economy doing in the United States of America?
Custom_2: The current state of affairs is not doing good.
Custom_3: Life will get difficult when inflation kicks in.
Custom_4: We are in a bull market.

a) Known language model embeddings

We generate sentence embeddings for all the text transcripts in the dataset, along with our artificially generated samples, using TensorFlow Hub's Universal Sentence Encoder.

You can experiment with other language model embeddings as well, but we chose the Universal Sentence Encoder because it has been trained on a wide variety of topics. We plot the generated embeddings using TensorFlow's Embedding Projector website. Upon performing t-SNE, we observe that most of the sentence embeddings quickly converge into one cluster along with our custom-generated examples.
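The projection step can be reproduced programmatically instead of in the Embedding Projector. A minimal sketch, assuming the 512-dimensional Universal Sentence Encoder embeddings have already been computed (random vectors stand in for them here):

```python
# Project sentence embeddings to 2-D with t-SNE; nearby points in the
# projection correspond to semantically similar sentences.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(30, 512))  # stand-in for 30 USE sentence vectors

projected = TSNE(n_components=2, perplexity=5, init="random",
                 random_state=0).fit_transform(embeddings)
```

With real embeddings, on-topic sentences collapse into a single dense cluster in the projected space.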

This indicates that most of our text samples are related to the domain of finance and economics. Here is an example of a cluster we observed in our experiments.

T-SNE convergence of dataset using Universal Sentence Encoders embeddings


b) Using custom-built language model embeddings

We also have to validate the coverage of our dataset: it should touch on as many concepts from the worlds of finance and economics as possible. To check this, we need a language model trained on general finance and economics material.

We weren't able to find a publicly available language model in this domain, so we ended up training our own on freely available finance and economics textbooks.

We generated sentence embeddings for our dataset from this newly created, finance-specialized language model. We plotted the PCA components of these embeddings in the Embedding Projector and were happy to observe that they were spread widely across all three dimensions.

This indicates that our dataset covers a wide range of subjects in our language model and is not restricted to one particular topic within the domain. Here is an example of the PCA projections we observed in our experiment.
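The PCA check can likewise be done in code. A minimal sketch, with random vectors standing in for the embeddings produced by the custom language model (the 300-dimensional size is an illustrative assumption):

```python
# Project sentence embeddings onto their top three principal components;
# a wide spread across all three axes suggests broad topical coverage.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 300))  # stand-in for 100 LM sentence vectors

pca = PCA(n_components=3)
components = pca.fit_transform(embeddings)
spread = pca.explained_variance_ratio_  # variance captured per component
```

If the variance is concentrated in one component, the dataset is likely stuck on a single topic; an even spread is the pattern we observed.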

PCA projections of the dataset using custom trained language model embeddings

We also performed t-SNE on these sentence embeddings and found that they converged into multiple dense clusters. Each cluster indicates a specific concept in our specialized domain of finance and economics, which suggests extensive coverage of topics in our dataset.

On the whole, we are able to validate our heuristic that our financial gurus speak mostly about their area of interest. Here is an example of the cluster projections using t-SNE.

T-SNE convergence of dataset using custom trained language model embeddings

Data Filtering

Though this particular dataset was good enough for our experiment, we will not always encounter such clean datasets. We may have a dataset with text samples from general discussion that are unrelated to our desired subjects of finance and economics.

In such cases, we have to filter out the samples whose sentence embeddings lie far from every one of our artificially generated example embeddings in a standard language model.

To achieve this, we make use of the NMSLIB library. We weed out all those text samples whose cosine similarity to every one of our custom-generated samples is too low.

To attain a proper dataset in this crude but simple way, we may have to repeat the cycle of data validation and data filtering several times with different custom-generated samples.
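The filtering logic can be sketched as follows. The article uses NMSLIB for fast approximate nearest-neighbour search at scale; for illustration, the same cosine-similarity criterion is shown here with brute-force NumPy, and the 0.3 threshold is an illustrative assumption:

```python
# Keep a sample only if it is close (by cosine similarity) to at least
# one of the custom reference examples; drop everything else.
import numpy as np

def filter_by_similarity(samples, references, threshold=0.3):
    s = samples / np.linalg.norm(samples, axis=1, keepdims=True)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)
    sims = s @ r.T                       # (n_samples, n_references)
    return sims.max(axis=1) >= threshold # True = keep this sample

samples = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
references = np.array([[1.0, 0.0]])
mask = filter_by_similarity(samples, references)
```

In practice the threshold is tuned by inspecting what gets dropped, then the validation and filtering cycle is rerun.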

Sentiment analysis

Once we have gathered a good dataset of text samples, it is time to process them. Our problem is to arrive at a quantifiable measurement that forecasts the economic outlook based on the public statements made by financial investors.

Since our dataset comprises only finance and economics subjects, a simple sentiment analysis of these samples gives us a quantified metric for the underlying sentiment in the investors' statements.

We make use of Google Cloud's Sentiment Analysis from the Natural Language API to score each sample in our dataset. It returns sentiment values ranging from -1.0 (negative) to 1.0 (positive), giving a sense of the speaker's inner feelings.

Training models

Now it is time to train the models. We have univariate time-series data consisting of sentiment values. Let's train different types of models on it and compare them against each other. For each type of model, we use the first 95 percent of the data as training data and the trailing 5 percent as testing data.
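The split is chronological, not random. A minimal sketch, with a synthetic series standing in for the real sentiment values:

```python
# Chronological 95/5 split of the univariate sentiment series: the first
# 95% trains the model, the trailing 5% is held out for testing.
# Shuffling is deliberately avoided to preserve the time ordering.
import numpy as np

series = np.linspace(-1.0, 1.0, 200)   # stand-in sentiment values over time
split = int(len(series) * 0.95)
train, test = series[:split], series[split:]
```

Every model below is fit on `train` and evaluated on `test`.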

a) LSTM model

We start with a deep learning solution, using LSTMs in TensorFlow to train our model. After training, we forecast the output one time step at a time. The predicted vs. ground-truth values are shown below. We do not plot a confidence interval in our graphs because, after each time step, we feed in all the previous ground-truth values before predicting the next one.

Here are the graphs obtained in our experiments after training 10 and 25 epochs respectively.

LSTM test predictions at end of 10th and 25th epochs of training
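The LSTM setup can be sketched as below: the series is windowed into fixed-length input sequences, each paired with the next value as the target. The window size, layer width and epoch count here are illustrative assumptions, not the values used in the experiment:

```python
# Windowed one-step-ahead forecasting with a small LSTM in TensorFlow.
import numpy as np
import tensorflow as tf

series = np.sin(np.linspace(0, 6, 120)).astype("float32")  # stand-in data
window = 5
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., np.newaxis]                     # shape: (samples, window, 1)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(window, 1)),
    tf.keras.layers.LSTM(8),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, verbose=0)

pred = model.predict(X[-1:], verbose=0)    # forecast the next time step
```

To walk forward through the test set, the window is refilled with ground-truth values after each prediction, matching the evaluation scheme described above.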

b) ARIMA model

A deep learning solution doesn't work well when you have little data, particularly when forecasting from a univariate dataset. So we also attempt a statistical approach: ARIMA.

ARIMA captures the trends in the dataset internally, but to do so we first have to transform the series into a stationary one. This method gives us a better result, with a much smaller test loss.

ARIMA test predictions

c) TensorFlow Probability model

TensorFlow Probability is a newer framework that can be used to combine domain knowledge from probabilistic models with deep learning. As with the earlier models, we create an elementary model using TensorFlow Probability and fit our univariate dataset to it.

TensorFlow Probability can be trained to capture local, seasonal and many other trends in the dataset, which was either impossible or difficult to instruct the earlier models to do explicitly.

TensorFlow Probability test predictions

Comparison of different models

This is the average test loss we obtained in our experiments. Note, however, that these results are specific to our dataset and do not necessarily generalize.

Loss summary

Understandably, the ARIMA model gives the lowest test loss, as our dataset was small and univariate in nature.

Forecasting economic outlook

Finally, we feed in the entire dataset and use our best model to predict the future economic outlook. This is the result we obtained in our experiment.

Forecasted Output: 0.100

We won't read too much into this result, however, as our experiment had several shortcomings, listed next; the quality of the result can improve once they are solved.

Drawbacks in our experiment

  1. First and foremost is the data. We need data to be as recent as possible. As we had a limited amount of data, we had to scrape quite old videos and tweets from YouTube and Twitter respectively.
  2. Data has to be obtained periodically. We had completely ignored this aspect in our experiment and if it is not possible to obtain regularly spaced data, we have to interpolate the missing values.
  3. We evaluated the sentiments of our dataset using a generically trained sentiment analysis tool. It would have been better to build our own sentiment analysis tool trained specifically on finance and economics statements.
  4. We used only the sentiment of the investor's statements as the training attribute for our model. Though sentiment is the major factor, there may be other minor factors worth exploring, such as the mood in which the statement was made, or whether it came from an interview or a discussion.
  5. We didn’t concentrate much on hyperparameter tuning as the motivation was to just prove our concept and we employed only simple models.

Future work

Apart from the problems listed above, there are a few other directions worth looking into.

  1. Public statements from investors arrive every day, so the dataset keeps evolving continuously. Online learning methods have to be integrated into our work, and the best way to do this is to fit our entire pipeline into TensorFlow Extended (TFX).
  2. Each of the three models used in our experiment may individually do well in certain cases, so it is worth applying ensembling techniques (such as boosting) to combine them and improve the results.
  3. Combine the individual investors' economic outlook forecasts into a single score.

If you would like to take a look at the code used in this experiment, see my GitHub repository.

Y Media Labs is working closely with Google to improve the TensorFlow experience for users across the world, and this work is part of one of our case studies.

--

About the author

Prasad is a Machine Learning Engineer at Y Media Labs. He is currently responsible for developing prototypes that showcase machine learning capabilities to prospective clients, and for full-fledged projects involving experimentation with neural network architectures.