I was interested in seeing whether there were identifiable patterns within quarterly earnings call transcripts that could indicate stock movement directly after the call. Call transcripts were scraped from The Motley Fool, and stock prices were gathered using the Quandl API.
Analyzing call transcripts to model stock movement

Scraping The Motley Fool
To obtain the text of earnings call transcripts, I scraped The Motley Fool. They “provide transcripts of the most recent earnings calls for the companies that [they] cover”. Although transcripts were available only for select companies, they covered a variety of companies and industries that was sufficient for my purpose.
I was interested in scraping the following:
Names of the participants and their role on the call
Initially I thought that if a certain role (e.g., the CEO) spoke more, it might indicate something about the company
Prepared remarks text
Question & Answer text
Company Name
Company Ticker
Date of Call
If the call occurred on a Friday, I would need the price for the following Monday
Time of Call
If the call occurred at the end of the day, I would need the stock price at opening the following day
Which quarter the earnings call was for
To scrape the earnings call transcripts, I used Beautiful Soup. On the right, you can see the code used to scrape all transcripts.
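As a rough illustration of the parsing step (this is not the scraper referred to above), the sketch below splits a transcript into prepared remarks and Q&A with Beautiful Soup. The sample markup, the heading text, and the `split_sections` helper are all assumptions made for the example; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page structure may differ.
SAMPLE_HTML = """
<article>
  <h2>Prepared Remarks:</h2>
  <p><strong>Jane Doe</strong> -- <em>Chief Executive Officer</em></p>
  <p>Thank you for joining us today.</p>
  <h2>Questions and Answers:</h2>
  <p><strong>John Roe</strong> -- <em>Analyst</em></p>
  <p>Can you comment on margins?</p>
</article>
"""


def split_sections(html):
    """Group paragraph text under the most recent <h2> heading."""
    soup = BeautifulSoup(html, "html.parser")
    sections = {}
    current = None
    for tag in soup.find_all(["h2", "p"]):
        text = tag.get_text(" ", strip=True)
        if tag.name == "h2":
            # Start a new section named after the heading
            current = text.rstrip(":")
            sections[current] = []
        elif current is not None:
            sections[current].append(text)
    return sections


sections = split_sections(SAMPLE_HTML)
for name, paragraphs in sections.items():
    print(name, len(paragraphs))
```

Keeping the sections separate at this stage makes it straightforward to score remarks and Q&A independently later on.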
Preprocessing Text
In order to make the data usable, preprocessing was required.
Because the corpus was very finance oriented, I used the multi-word expression tokenizer from NLTK to tokenize phrases such as “adjusted earnings per share”, “adjusted net income”, and “fiscal year”.
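A minimal sketch of that step, using NLTK's `MWETokenizer` with the three phrases mentioned above. Lowercasing and splitting on whitespace are simplifying assumptions here, not necessarily how the original text was prepared.

```python
from nltk.tokenize import MWETokenizer

# Each multi-word expression is given as a tuple of its individual words
mwe_tokenizer = MWETokenizer(
    [
        ("adjusted", "earnings", "per", "share"),
        ("adjusted", "net", "income"),
        ("fiscal", "year"),
    ],
    separator="_",
)

# Assumption: text is lowercased and split on whitespace beforehand
words = "adjusted earnings per share rose this fiscal year".split()
tokens = mwe_tokenizer.tokenize(words)
print(tokens)
# ['adjusted_earnings_per_share', 'rose', 'this', 'fiscal_year']
```

Merging each phrase into a single token means a downstream vectorizer counts it as one term rather than as several unrelated words.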
I also used the standard NLTK English stop words, along with some additional stop words that I found to be relevant. A full list of phrases tokenized and stop words can be found here:
Because the Quandl API could only provide daily opening and closing prices rather than the hourly prices I had wanted, the time and day of the week on which the call occurred were important. If a call occurred after market hours, I would compare that day’s closing price to the following day’s opening price to get the % change in stock price. Similarly, if a call occurred at the end of the day on a Friday, I would compare Friday’s closing price to Monday morning’s opening price to see the effect of the earnings call on the company’s stock.
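The date logic above can be sketched as follows. The 4:00 p.m. close, the function names, the decision to ignore market holidays, and the handling of calls held before the close are all assumptions made for the example.

```python
from datetime import datetime, time, timedelta

MARKET_CLOSE = time(16, 0)  # assumption: market closes at 4:00 p.m. local time


def next_trading_day(day):
    """Return the next weekday after `day` (assumption: holidays are ignored)."""
    day += timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day


def price_window(call_dt):
    """Return ((date, field), (date, field)) for the before and after prices."""
    day = call_dt.date()
    if call_dt.time() >= MARKET_CLOSE:
        # After-hours call: that day's close vs. the next trading day's open
        return (day, "close"), (next_trading_day(day), "open")
    # Assumption: for calls before the close, compare same-day open and close
    return (day, "open"), (day, "close")


def pct_change(before, after):
    """Percentage change from the `before` price to the `after` price."""
    return (after - before) / before * 100.0


# A call at 5:00 p.m. on Friday, February 1, 2019
before, after = price_window(datetime(2019, 2, 1, 17, 0))
print(before, after)
```

For the Friday example, the window runs from Friday's close to the open on Monday, February 4.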
Sentiment Analysis for Finance
Using the Loughran-McDonald dictionary for additional finance words, and weighting certain words differently, I used this GitHub repo to analyze the sentiment of every earnings call transcript: https://github.com/jasonyip184/StockSentimentTrading.
The figures on the right show negative sentiment against percentage stock price change for earnings call remarks and Q&A. Both are quite similar, which makes sense: if the remarks of a call are negative, the Q&A will probably also be quite negative.
However, remarks followed by a negative % change in stock price averaged a lower positivity score (0.519) than remarks followed by a positive % change (0.530), and a higher negativity score (0.188 versus 0.186).
Similarly, Q&A sections followed by a negative % change in stock price averaged a lower positivity score (0.551) than Q&A sections followed by a positive % change (0.562), and a higher negativity score (0.185 versus 0.181).
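To give a feel for dictionary-based scoring in general, here is a minimal sketch that counts the share of tokens found in a positive and a negative word list. It is not how the linked repository computes its scores, it applies no word weighting, and the tiny word lists are hypothetical stand-ins for the much larger Loughran-McDonald dictionary.

```python
import re

# Hypothetical mini word lists for illustration only; the real
# Loughran-McDonald dictionary has far more entries.
POSITIVE = {"strong", "improved", "gain"}
NEGATIVE = {"loss", "decline", "impairment", "weak"}


def polarity_scores(text):
    """Return the share of tokens that are positive and negative."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {"pos": 0.0, "neg": 0.0}
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return {"pos": pos / len(tokens), "neg": neg / len(tokens)}


scores = polarity_scores("Strong gain offset a loss")
print(scores)  # {'pos': 0.4, 'neg': 0.2}
```

Scoring the remarks and the Q&A text separately gives two pairs of scores per call.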
Topic Modeling with NMF
I was interested in topic modeling with NMF to see what would come out of these earnings calls. Unsurprisingly, the topics corresponded to the industries of the companies holding the calls.
The graph below shows the number of transcripts (that I had available) that belonged to each industry based on the NMF topic modeling.
Interestingly, when I looked at the industries (based on remarks) and their average negative sentiment, insurance saw the most negativity, followed by finance and manufacturing.
This could simply reflect the time period of the earnings calls that I scraped; perhaps the insurance industry wasn’t doing so well.
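import pandas as pd
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer

# Prepared-remarks text of every transcript (tscript is the scraped DataFrame)
transcripts = list(tscript['remarks'])
# Short label for each transcript: its first 50 characters
label = [e[:50] + "..." for e in transcripts]

# Document-term count matrix
vectorizer = CountVectorizer()
doc_word = vectorizer.fit_transform(transcripts)

# Factor the matrix into 15 topics
nmf_model = NMF(n_components=15)
doc_topic = nmf_model.fit_transform(doc_word)

# Topic-by-word weights (get_feature_names() was removed in newer scikit-learn)
topic_word = pd.DataFrame(nmf_model.components_.round(3),
                          index=['component_' + str(i) for i in range(15)],
                          columns=vectorizer.get_feature_names_out())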
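To read the topics out of a matrix like `topic_word`, one option is to list the highest-weighted terms in each row. The sketch below does this with NumPy; the `top_terms` helper and the toy weights are made up for the example and are not results from the transcripts.

```python
import numpy as np


def top_terms(components, feature_names, n=3):
    """Return the n highest-weighted terms for each topic (row)."""
    names = np.asarray(feature_names)
    # Sort each row ascending, reverse to descending, keep the first n columns
    order = np.argsort(components, axis=1)[:, ::-1][:, :n]
    return [names[row].tolist() for row in order]


# Toy weights (made up): two topics over four terms
components = np.array([[0.9, 0.1, 0.5, 0.0],
                       [0.0, 0.8, 0.1, 0.7]])
terms = ["premium", "loan", "claims", "deposit"]

result = top_terms(components, terms, n=2)
print(result)  # [['premium', 'claims'], ['loan', 'deposit']]
```

With a fitted model, the same helper could be called on `nmf_model.components_` and the vectorizer's feature names, and each topic labeled by inspecting its top terms.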
Using Random Forest Classifier
Features:
Open price
Positive and negative polarity scores for Remarks and Q&A
Difference of positive and negative polarity scores between Remarks and Q&A
Day of week (dummy variables)
Industry (dummy variables, from NMF)
Quarter of earnings call (dummy variables)
NASDAQ, NYSE or NYSE MKT (dummy variables)
Squared difference of positive and negative polarity scores between Remarks and Q&A
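The feature list above could be assembled along the following lines with pandas. The column names, the direction of the differences (Remarks minus Q&A), and the three rows of values are assumptions made up purely to show the shape of the feature matrix; they are not data from the project.

```python
import pandas as pd

# Made-up rows, only to show the shape of the feature matrix
calls = pd.DataFrame({
    "open_price": [25.1, 103.4, 58.0],
    "remarks_pos": [0.52, 0.55, 0.50],
    "remarks_neg": [0.19, 0.17, 0.20],
    "qa_pos": [0.56, 0.57, 0.54],
    "qa_neg": [0.18, 0.18, 0.19],
    "day_of_week": ["Tue", "Thu", "Fri"],
    "industry": ["Insurance", "Finance", "Manufacturing"],
    "quarter": ["Q1", "Q2", "Q1"],
    "exchange": ["NYSE", "NASDAQ", "NYSE MKT"],
})

CATEGORICAL = ["day_of_week", "industry", "quarter", "exchange"]


def build_features(df):
    """Add difference features and one-hot encode the categorical columns."""
    out = df.copy()
    # Assumption: differences are taken as Remarks minus Q&A
    out["pos_diff"] = out["remarks_pos"] - out["qa_pos"]
    out["neg_diff"] = out["remarks_neg"] - out["qa_neg"]
    out["pos_diff_sq"] = out["pos_diff"] ** 2
    out["neg_diff_sq"] = out["neg_diff"] ** 2
    # Dummy variables for each categorical column
    return pd.get_dummies(out, columns=CATEGORICAL)


X = build_features(calls)
print(X.shape)
```

A matrix like `X`, paired with a label for whether the stock rose or fell after the call, is the kind of input a random forest classifier would be trained on.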