There are many drivers of volatility in financial markets. Among other factors, the valuation of an asset is influenced by public attitudes and opinions toward the relevant company, sector, or individual. Over the last few years, we have witnessed how just a few messages on social media platforms can trigger an avalanche of opinion, sending stocks into a valuation tailspin. Our client, a leading global asset manager with over USD 8 trillion in assets under management (AUM), needed a comprehensive review of attitudes and opinions (often called "sentiment analysis") across social and media platforms. The goal was to increase forecasting accuracy using various information sources, including The New York Times, SEC.gov, Seeking Alpha, and Twitter.
From specific use-case to full-scale platform
Sentiment analysis uses Natural Language Processing (NLP) to sort posts or comments that mention a particular company into categories such as Negative, Positive, and Neutral. These serve as a barometer of the community's sentiment toward the stock, be it bullish, bearish, or anywhere in between. Before NLP can be applied, large volumes of data from various sources must be ingested and transformed. Coming into the ideation and solution phase, our client intended to create a set of data ingestion and transformation pipelines for two specific use cases. After a joint review of alternatives, both companies decided to proceed toward a more universal solution, minimizing the need for manual data-crunching work. The solution had to ingest data in both streaming and bulk modes from various sources and support machine-learning-based transformation pipelines.
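To make the categorization step concrete, here is a deliberately simplified sketch: a toy lexicon-based scorer standing in for the ML models the platform actually uses. The word lists, function name, and thresholds are illustrative assumptions, not the client's implementation.

```python
# Illustrative only: a toy lexicon-based sentiment scorer. The word
# lists and scoring rule are invented for this sketch; the real
# platform uses trained NLP models instead.
POSITIVE = {"beat", "growth", "upgrade", "bullish", "record"}
NEGATIVE = {"miss", "lawsuit", "downgrade", "bearish", "selloff"}

def classify_sentiment(text: str) -> str:
    """Label a post mentioning a company as Positive, Negative, or Neutral."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

print(classify_sentiment("Analysts upgrade the stock after record growth"))  # Positive
```

A lexicon scorer misses negation and sarcasm, which is exactly why the production pipeline relies on trained models; the sketch only shows the shape of the input/output contract.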
To get the experience right, we had to ensure that analysts could build the desired data transformations without prior experience in programming languages. Each component of the interface had to be intuitive, to allow less technically-savvy users to validate the data and be able to interpret the results.
We made our architectural choices with a focus on reliability, scalability, and best-in-class prediction accuracy:
- Kubernetes & Docker: To manage the workload split between pods;
- Apache NiFi: To implement and orchestrate ETL pipelines;
- PyTorch and TensorFlow: To train machine learning models and perform low-level data transformations;
- Kafka: To enable communication between microservices, providing system scalability and supporting circuit-breaker patterns via the Kafka message broker;
- InfluxDB: To implement a time-series database with low latency and support for huge volumes of data.
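As a small illustration of how these pieces connect, the sketch below shows one possible message contract for sentiment events flowing between microservices over Kafka before landing in InfluxDB. The event schema, field names, and source labels are assumptions made for this example, not the client's actual wire format.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical event schema for sentiment scores exchanged between
# microservices over Kafka; all field names are illustrative.
@dataclass
class SentimentEvent:
    ticker: str
    source: str     # e.g. "twitter", "seekingalpha"
    label: str      # "Positive" | "Negative" | "Neutral"
    timestamp: str  # ISO-8601, ready for time-series ingestion

def encode(event: SentimentEvent) -> bytes:
    """Serialize an event into the bytes a Kafka producer would send."""
    return json.dumps(asdict(event)).encode("utf-8")

def decode(raw: bytes) -> SentimentEvent:
    """Deserialize a consumed Kafka message back into an event."""
    return SentimentEvent(**json.loads(raw.decode("utf-8")))

evt = SentimentEvent("ACME", "twitter", "Negative", "2023-01-01T00:00:00Z")
assert decode(encode(evt)) == evt  # round-trip survives the broker
```

Keeping the payload a small, versionable JSON document is one common way to let independently deployed consumers (the ML scorers, the InfluxDB writer) evolve without lockstep releases.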
Key outcomes: volume of data processed per day; increase in forecasting accuracy.