AI and machine learning on social media
data is giving hedge funds a competitive
edge. A Unicom conference on ‘AI, Machine
Learning and Sentiment Analysis Applied
to Finance’ examined the state of the art.
Extracting value from a universe of data,
analysing sentiment around company
names (equities) or about anything else
(macro), is a complex journey and we are
only about 5% down that road.
The parameters by which an ever-expanding
data set is processed are still evolving:
the inputs now include Twitter posts,
pictures, text and video; signals may rely
on experts or on the wisdom of the crowd;
and sentiment may be derived from a
“bag of words” or from structured
linguistic analysis.
Last week’s Unicom conference, AI,
Machine Learning and Sentiment Analysis
Applied to Finance (July 14) brought
together a group of experts in this area.
Professor Gautam Mitra of OptiRisk
Systems introduced Elijah DePalma and
James Cantarella of Thomson Reuters;
Pierce Crosby of StockTwits; Anders Bally
of Sentifi; Peter Hafez of RavenPack; and
Stephen Morse of Twitter.
DePalma differed somewhat from the
others because the Thomson Reuters
sentiment engine uses only accredited
Reuters news data, rather than raw social
media chatter. DePalma explained: “When
we extract features, the simpler approach
that’s often used in the academic literature
is a ‘bag of words’ approach. What we
are doing is a bit more sophisticated; we
are doing linguistic parsing, where you
are looking at the structure of the
language – so you can think object, verb,
subject-type representation.”
An example of this in action could be the
sentence: “IBM surpasses Microsoft”. A
simple bag of words approach would give
IBM and Microsoft the same sentiment
score. DePalma’s news analytics engine
recognises “IBM” as the subject,
“Microsoft” as the object and “surpasses”
as the verb, along with the
positive/negative relationship between
subject and object, which the sentiment
scores reflect: IBM positive, Microsoft
negative.
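
The difference can be illustrated with a toy sketch. This is not the Thomson Reuters engine: the one-entry verb lexicon and the naive three-word "parse" below are assumptions made purely for illustration, and a real system would use a full grammatical parser.

```python
# Hypothetical verb polarities for illustration only.
VERB_POLARITY = {"surpasses": 1, "beats": 1, "trails": -1}

def bag_of_words_scores(sentence, entities):
    """Ignore word order: every entity mentioned gets the sentence total."""
    words = sentence.lower().split()
    total = sum(VERB_POLARITY.get(w, 0) for w in words)
    return {e: total for e in entities if e.lower() in words}

def svo_scores(sentence):
    """Naive subject-verb-object reading of a three-word sentence.

    The subject receives the verb's polarity and the object receives
    the opposite sign.
    """
    subject, verb, obj = sentence.split()
    polarity = VERB_POLARITY.get(verb.lower(), 0)
    return {subject: polarity, obj: -polarity}

sentence = "IBM surpasses Microsoft"
bow = bag_of_words_scores(sentence, ["IBM", "Microsoft"])
svo = svo_scores(sentence)
print(bow)  # both companies share one score
print(svo)  # subject and object get opposite signs
```

The bag of words function cannot tell which company is doing the surpassing, so both names end up with the same score; the structured reading separates them.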
“So you are creating a grammatical parse
tree and one of the benefits of this is,
rather than bag of words where you might
have tens of thousands of features, you
have a low dimensional feature
representation when you create these
grammatical parse trees. This makes the
last step there – classification – much
“But it also makes the sentiment scores
about 20% more accurate; from say 60%
accurate up to 80%. And keep in mind that
among human readers, internal accuracy
consistency is around 85%.”
DePalma pointed out that the parsing
approach also affects how Reuters
approaches foreign languages, as in the
case of its Japanese news analytics service.
“Why not take an auto translation engine like
Google Translate, translate Japanese
language to English and apply your
English? Because we would lose the
language structure and essentially be
reduced to a bag of words type
The unstructured “noisy” character of
data such as Twitter has not stopped big
hedge funds and asset managers
analysing it in an attempt to get an edge
over their competitors.
Stephen Morse, senior manager, data
partnerships and sales at Twitter, said:
“The financial space is a rapidly growing
vertical for us. We serve hedge funds
directly, prop traders, market makers,
banks, fintech partners, etc.
“We are not a news organisation but
events break on Twitter very commonly
now – not only around significant
financial events but ‘act of God’ events. So
this is a big use case in financial markets
and sentiment analysis is a very common
use case and we are seeing that at the
‘cashtag’ level.
“A number of CEOs start to communicate
on Twitter before they do anything else,
like Elon Musk. If you want to know what
he’s doing you have to go to Twitter – it’s
the first place he will go and, often, it’s the
only place he communicates.”
Morse said sentiment derived from
consumers about certain brands, which
can also impact equity prices, is a new
twist on the subject that is currently being
explored and that we can expect to see
a lot more of in future. Twitter data can
also gauge macro and geopolitical factors,
he said, citing a study last year which
showed Twitter data to be predictive of
unemployment levels in the US.
StockTwits, which provides real-time
commentary on individual companies,
invented the cashtag, which Twitter
later adopted.
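
A cashtag is a ticker symbol prefixed with a dollar sign, such as $AAPL. As a minimal sketch of how messages might be grouped by company, the snippet below pulls cashtags out of raw text. The regular expression (one to six letters after the dollar sign) is an assumption for illustration, not the pattern StockTwits or Twitter actually uses, and the sample messages are made up.

```python
import re
from collections import Counter

# Assumed pattern: "$" followed by 1-6 letters, not glued to a preceding
# letter or digit. Amounts such as "$5" are skipped because a digit
# follows the dollar sign.
CASHTAG = re.compile(r"(?<![A-Za-z0-9])\$([A-Za-z]{1,6})\b")

def count_cashtags(messages):
    """Count how often each ticker is mentioned across a list of messages."""
    counts = Counter()
    for text in messages:
        counts.update(t.upper() for t in CASHTAG.findall(text))
    return counts

# Made-up sample messages.
messages = [
    "Long $AAPL, short $TSLA into earnings",
    "Paid $5 for coffee, still bullish on $aapl",
]
counts = count_cashtags(messages)
print(counts.most_common())
```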
Pierce Crosby, business director and data
evangelist at StockTwits, said: “Basically all
of our conversations are structured
around individual companies. But also we
allow users to add binaries, so they add a
bullish or a bearish tag to their messages.
“From a database standpoint, it becomes
a classifier for a large database of data
because you have these binaries that
eliminate a lot of false positives, or things
like people trying to be funny with their
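
One simple way to use such self-applied tags is to tally them per company into a net sentiment ratio. The sketch below assumes a hypothetical message record with `ticker` and `tag` fields and made-up tickers; it is not the StockTwits data schema or the firm's own method.

```python
from collections import defaultdict

def net_sentiment(messages):
    """Return (bullish - bearish) / (bullish + bearish) for each ticker.

    Messages without a bullish or bearish tag are ignored, so a ticker
    that only appears in untagged messages gets no score.
    """
    tally = defaultdict(lambda: {"bullish": 0, "bearish": 0})
    for msg in messages:
        tag = msg.get("tag")
        if tag in ("bullish", "bearish"):
            tally[msg["ticker"]][tag] += 1
    return {
        ticker: (c["bullish"] - c["bearish"]) / (c["bullish"] + c["bearish"])
        for ticker, c in tally.items()
    }

# Made-up records using placeholder tickers.
messages = [
    {"ticker": "ABC", "tag": "bullish"},
    {"ticker": "ABC", "tag": "bullish"},
    {"ticker": "ABC", "tag": "bearish"},
    {"ticker": "XYZ", "tag": "bearish"},
    {"ticker": "QRS", "tag": None},  # untagged, so skipped
]
scores = net_sentiment(messages)
print(scores)
```

A score of +1 means every tagged message was bullish, and -1 means every tagged message was bearish. Because users label the messages themselves, no text classifier is needed for this step.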
Crosby said that while sentiment is the
obvious low-hanging fruit, the data can
also be used to look into volatility of
stocks. “I think on the macro level it’s
really interesting but on a company level
and sector level it’s also very interesting,
where more or less we watch volumes
spike in real time on either different asset
classes or companies or ETFs, and as
that actually translates into realised
volatility.”
“We have run a study that actually looks
at the predictive element of crowds
around, not just events, but just on daily
trading activity. So basically trying to
correlate volatility as it applies to
companies from social data is becoming
an area that people are really interested
in.”
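
The StockTwits study itself is not described in detail, but the basic idea of relating message volume to realised volatility can be sketched as follows. Everything here is an assumption for illustration: realised volatility is taken as the square root of summed squared log returns within a day, the relationship is measured with a plain Pearson correlation, and the numbers are invented.

```python
import math

def realised_volatility(prices):
    """Square root of the sum of squared log returns over a price series."""
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return math.sqrt(sum(r * r for r in returns))

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented data: messages per day and intraday prices for the same four days.
message_volume = [120, 90, 400, 150]
intraday_prices = [
    [100.0, 100.4, 100.1, 100.3],
    [100.3, 100.2, 100.4, 100.3],
    [100.3, 102.0, 99.5, 101.2],
    [101.2, 101.8, 101.0, 101.5],
]
daily_vol = [realised_volatility(day) for day in intraday_prices]
corr = pearson(message_volume, daily_vol)
print(round(corr, 3))
```

A correlation measured on the same day says nothing about prediction; testing the predictive claim would mean comparing today's message volume with a later period's volatility.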
Peter Hafez, chief data scientist at
RavenPack, said an important concept
right now is “democratising data”. Big
hedge funds and asset managers want to
know they can get the data they need,
whether in-house or external, at the time
they need it.
He said: “There’s a lot of new data being
produced out there that we can take
advantage of, and a lot of asset
managers and hedge funds have become
a little bit of data hoarders.”
The data can be anything from emails to
instant messages or legal documents.
These can be fed to people using his
company’s data engine, which is like
access to a private cloud. “In the end
what people are trying to build is almost
like an internal Amazon, where you can
go on a platform and say, I want to get
back what we know as a company about
“Then you’ll get, this from the legal
department, we know that from the Dow
Jones news wires, we know this from
Twitter, we know that from the analyst
reports we get in our inbox. So you can
take all of these different sources and
combine it.”
DePalma added: “In the last five or six
months, I have had a number of client
trials with large fund managers – large
being more than several hundred billion
“One of them spoke transparently with me
that they believe these behavioural finance
tools will be maturing in the next five to
10 years and they, like their portfolio
managers, are already incorporating
these types of signals into discretionary
“So that as these tools become more
reliable and mature, their portfolio
managers will have already incorporated
them, to use them for their large fund
DePalma also said Reuters is considering
a vendor partnership with a San
Francisco-based firm called Now-Casting
Economics, which has a network of
30,000 contributors globally, primarily in
emerging markets, who take pictures on
their phones and upload them on a daily
basis.
“They call it ‘nowcasting’, as opposed to
forecasting; identifying the state of some
economic indicator at the current value.
They are taking pictures of traffic
congestion at times of the day; prices of
staple foods; unemployment lines etc.
“This company has an AI system of visual
object recognition to automatically update
the info where it’s coming from with a
geotag and then build that into their
econometric model.
“They provide estimates of things like
inflation and unemployment, either in
countries where that data is not reliably
produced by the government or where
there are no government sources producing
that data. So that’s a fascinating AI
application of visual recognition.”