I wanted to see whether quarterly earnings call transcripts contain identifiable patterns that could indicate stock movement directly after the call. Call transcripts were scraped from The Motley Fool, and stock prices were gathered using the Quandl API.

Analyzing call transcripts to model stock movement

Scraping The Motley Fool

To obtain the text of earnings call transcripts, I scraped The Motley Fool. They “provide transcripts of the most recent earnings calls for the companies that [they] cover”. Although they only had transcripts for select companies, those transcripts covered a variety of companies and industries, which was sufficient for my purpose.

I was interested in scraping the following:

  1. Names of the participants and their role on the call

    • Initially I thought that if a certain role (e.g., CEO) spoke more, it might indicate something about the company

  2. Prepared remarks text

  3. Question & Answer text

  4. Company Name

  5. Company Ticker

  6. Date of Call

    • If the call occurred on a Friday, I would need the price for the following Monday

  7. Time of Call

    • If the call occurred at the end of the day, I would need the stock price at opening the following day

  8. Which quarter the earnings call was for

To scrape the earnings call transcripts, I used Beautiful Soup. The screenshot below shows the code used to scrape all transcripts.

themotleyfoolscraping_python.png
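As a rough illustration of the approach, the sketch below parses a small, made-up HTML snippet with Beautiful Soup and pulls out the ticker, prepared remarks, and Q&A text. The tag names, the `ticker` class, the section headings, and the `parse_transcript` helper are all assumptions for demonstration; they are not The Motley Fool's actual markup or the scraper shown in the screenshot.

```python
from bs4 import BeautifulSoup

# Hypothetical transcript markup; the real page structure will differ.
SAMPLE_HTML = """
<article>
  <h1>Example Corp (EXM) Q3 2019 Earnings Call Transcript</h1>
  <span class="ticker">NYSE: EXM</span>
  <h2>Prepared Remarks:</h2>
  <p>Good afternoon and welcome.</p>
  <p>Revenue grew this quarter.</p>
  <h2>Questions and Answers:</h2>
  <p>Can you comment on margins?</p>
</article>
"""

def parse_transcript(html):
    """Split a transcript page into title, ticker, remarks and Q&A text."""
    soup = BeautifulSoup(html, "html.parser")
    exchange, ticker = soup.find("span", class_="ticker").get_text(strip=True).split(": ")
    sections = {}
    current = None
    # Walk headings and paragraphs in document order, grouping text by section.
    for tag in soup.find("article").find_all(["h2", "p"]):
        if tag.name == "h2":
            current = tag.get_text(strip=True).rstrip(":")
            sections[current] = []
        elif current is not None:
            sections[current].append(tag.get_text(strip=True))
    return {
        "title": soup.find("h1").get_text(strip=True),
        "exchange": exchange,
        "ticker": ticker,
        "remarks": " ".join(sections.get("Prepared Remarks", [])),
        "qa": " ".join(sections.get("Questions and Answers", [])),
    }

result = parse_transcript(SAMPLE_HTML)
```

In practice the HTML would come from an HTTP request for each transcript URL rather than from an inline string.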

Preprocessing Text

In order to make the data usable, preprocessing was required.

Because the corpus was very finance-oriented, I used the multi-word expression tokenizer from NLTK to tokenize phrases such as “adjusted earnings per share”, “adjusted net income”, and “fiscal year”.
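A minimal sketch of how NLTK's `MWETokenizer` can merge such phrases into single tokens. The lowercasing and whitespace split in the hypothetical `tokenize` helper are simplifying assumptions, not necessarily how the transcripts were tokenized in this project.

```python
from nltk.tokenize import MWETokenizer

# Each multi-word expression is given as a tuple of its individual tokens.
mwe_tokenizer = MWETokenizer(
    [
        ("adjusted", "earnings", "per", "share"),
        ("adjusted", "net", "income"),
        ("fiscal", "year"),
    ],
    separator="_",
)

def tokenize(text):
    """Lowercase, split on whitespace, then merge known phrases."""
    words = text.lower().split()
    return mwe_tokenizer.tokenize(words)

tokens = tokenize("Adjusted earnings per share rose in the fiscal year")
```

Each merged phrase then counts as one vocabulary entry downstream, so “fiscal year” is not diluted into separate counts for “fiscal” and “year”.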

I also used the standard NLTK English stop words along with some additional stop words that I found to be relevant. A full list of phrases tokenized and stop-words can be found here:

Because the Quandl API was only able to provide daily opening and closing prices rather than the hourly prices I had wanted, the time and day of the week on which the call occurred were important. If a call occurred after market hours, I would compare that day’s closing price to the following day’s opening price to get the % change in stock price. Similarly, if a call occurred at the end of the day on a Friday, I would compare Friday’s closing price to Monday morning’s opening price to see the effect of the earnings call on the company’s stock.
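The rule above can be sketched as a small helper that decides which two daily prices to compare. The 4:00 PM cutoff, the same-day open-to-close comparison for calls held earlier in the day, and the function names are assumptions for illustration; market holidays are ignored.

```python
from datetime import date, datetime, time, timedelta

MARKET_CLOSE = time(16, 0)  # assumed 4:00 PM Eastern close

def next_trading_day(day):
    """Return the next weekday after `day` (market holidays are ignored)."""
    nxt = day + timedelta(days=1)
    while nxt.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        nxt += timedelta(days=1)
    return nxt

def price_window(call_dt):
    """Return ((date, field), (date, field)) naming the two prices to compare."""
    day = call_dt.date()
    if call_dt.time() >= MARKET_CLOSE:
        # After-hours call: that day's close vs. the next trading day's open.
        return (day, "close"), (next_trading_day(day), "open")
    # Assumption: for calls held earlier in the day, compare same-day open and close.
    return (day, "open"), (day, "close")

def pct_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0

friday_evening = datetime(2019, 11, 1, 17, 0)  # 1 Nov 2019 was a Friday
window = price_window(friday_evening)
```

For the Friday evening call, `window` points at Friday's close and the following Monday's open, which are the two values to look up in the daily price data.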

Sentiment Analysis for Finance

Using the Loughran-McDonald dictionary for additional finance words, and weighting certain words differently, I used this GitHub repo to analyze the sentiment of every earnings call transcript: https://github.com/jasonyip184/StockSentimentTrading.
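To give a feel for dictionary-based scoring, here is a simplified sketch. The word sets are tiny illustrative stand-ins rather than the actual Loughran-McDonald lists, and the scoring (share of all words that are positive or negative, with no word weighting) is an assumption; it does not reproduce the linked repo's method.

```python
import re

# Tiny illustrative word lists; the real Loughran-McDonald dictionary is far larger.
POSITIVE = {"gain", "improve", "strong", "profitable"}
NEGATIVE = {"loss", "decline", "impairment", "weak"}

def polarity_scores(text):
    """Return the share of all words that are positive and that are negative."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return {"positive": 0.0, "negative": 0.0}
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return {"positive": pos / len(words), "negative": neg / len(words)}

scores = polarity_scores("Strong gain despite one loss")
```

Scoring the prepared remarks and the Q&A separately with a function like this yields one positive and one negative score per section of each call.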

The figures below plot negative sentiment against percentage stock price change for the earnings call remarks and for the Q&A. The two are quite similar, which makes sense: if the remarks of a call are negative, the Q&A will probably also be quite negative.

Overall, remarks that saw a negative % change in stock price averaged a lower positivity score (0.519) than remarks that saw a positive % change (0.530), and a higher negativity score (0.188) than their counterparts (0.186).

Similarly, Q&A sections that saw a negative % change in stock price averaged a lower positivity score (0.551) than Q&A sections that saw a positive % change (0.562), and a higher negativity score (0.185) than their counterparts (0.181).

negative_finance_seaborn_matplotlib_remarks.png
negative_sentiment_nlp_sns_seaborn_matplotlib_qa.png

Topic Modeling with NMF

I was interested in topic modeling with NMF (non-negative matrix factorization) to see what would come out of these earnings calls. Unsurprisingly, the topics corresponded to the industries of the companies holding the calls.

The graph below shows the number of transcripts (that I had available) that belonged to each industry based on the NMF topic modeling.

industry.png
 

Interestingly, when I looked at the industries (based on remarks) and their average negative sentiment, insurance saw the most negativity, followed by finance and manufacturing.

This could simply reflect the time period of the earnings calls that I scraped; perhaps the insurance industry wasn’t doing well at the time.

 
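import pandas as pd
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer

# tscript is assumed to be the DataFrame of scraped calls, with the prepared remarks in 'remarks'
transcripts = tscript['remarks'].tolist()

# Short preview label for each transcript
label = [e[:50]+"..." for e in transcripts]

# Build the document-term matrix of raw word counts
vectorizer = CountVectorizer()
doc_word = vectorizer.fit_transform(transcripts)

# Factorize the counts into 15 topics
nmf_model = NMF(n_components=15)
doc_topic = nmf_model.fit_transform(doc_word)

# Topic-by-word weights, one row per NMF component
# (get_feature_names() was removed in newer scikit-learn releases)
topic_word = pd.DataFrame(nmf_model.components_.round(3),
             index = ['component_' + str(i) for i in range(15)],
             columns = vectorizer.get_feature_names_out())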
 
industry_finance_sentiment_analysis_nlp.png

Using Random Forest Classifier

Features:

  1. Open price

  2. Positive and negative polarity scores for Remarks and Q&A

  3. Difference of positive and negative polarity scores between Remarks and Q&A

  4. Day of week (dummy variables)

  5. Industry (dummy variables, from NMF)

  6. Quarter of earnings call (dummy variables)

  7. NASDAQ, NYSE or NYSE MKT (dummy variables)

  8. Squared difference of positive and negative polarity scores between Remarks and Q&A

 

Predicting whether stock movement is positive (1) or negative (0) following an earnings call

Screen Shot 2020-01-04 at 3.53.20 PM.png
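For reference, a minimal sketch of how the feature list above might be assembled and fed to scikit-learn's `RandomForestClassifier`. The rows are made-up values that only show the shape of the table (they are not real data or results), the column names are hypothetical, and treating the "difference" features as remarks minus Q&A for each polarity is my reading of the list rather than a confirmed detail.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up rows purely to show the shape of the feature table; not real results.
calls = pd.DataFrame({
    "open_price": [25.0, 110.5, 48.2, 73.9, 15.3, 260.0],
    "remarks_pos": [0.53, 0.51, 0.55, 0.50, 0.52, 0.54],
    "remarks_neg": [0.18, 0.20, 0.17, 0.19, 0.19, 0.18],
    "qa_pos": [0.56, 0.54, 0.57, 0.55, 0.55, 0.58],
    "qa_neg": [0.18, 0.19, 0.17, 0.19, 0.18, 0.17],
    "day_of_week": ["Mon", "Thu", "Fri", "Tue", "Wed", "Thu"],
    "industry": ["insurance", "finance", "retail", "finance", "retail", "insurance"],
    "quarter": ["Q1", "Q2", "Q3", "Q4", "Q1", "Q2"],
    "exchange": ["NYSE", "NASDAQ", "NYSE", "NASDAQ", "NYSE MKT", "NYSE"],
    "moved_up": [1, 0, 1, 0, 0, 1],  # 1 = positive movement, 0 = negative
})

# Differences between remarks and Q&A polarity, plus their squares.
calls["pos_diff"] = calls["remarks_pos"] - calls["qa_pos"]
calls["neg_diff"] = calls["remarks_neg"] - calls["qa_neg"]
calls["pos_diff_sq"] = calls["pos_diff"] ** 2
calls["neg_diff_sq"] = calls["neg_diff"] ** 2

# One-hot encode the categorical columns into dummy variables.
X = pd.get_dummies(
    calls.drop(columns="moved_up"),
    columns=["day_of_week", "industry", "quarter", "exchange"],
)
y = calls["moved_up"]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)
predictions = model.predict(X)
```

With real data, the model would be evaluated on a held-out test split rather than on the rows it was trained on.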