Find Word Mentions in Conference Call Transcripts

With thousands of publicly traded companies in North America alone, it is impossible to keep track of every relationship that exists between them. But companies don’t work in isolation: they work with suppliers and service providers, and they have upstream and downstream relationships with other publicly traded companies. This means the success of one enterprise is usually not isolated and spills over into other companies’ profits.

From a practical standpoint, if you know of a great product developed by one company, you can trace who else benefits from its success. Corporate management will usually mention important developments in quarterly and annual earnings calls, so by examining the call transcripts we can find such relationships.

Let’s see who potentially benefited from the success of the ‘Fortnite’ game.

Import Libraries, Set Path, Read index.txt

This is a standard way to start working with the earnings data set downloaded from our website.

import os
import pandas as pd
from datetime import datetime

path_to_index = 'C:/PATH_TO_INDEX.TXT/'
os.chdir(path_to_index)

df = pd.read_csv('index.txt', sep='|', header=0, index_col=0)

Get Dataframe Length and Copy Structure

Before we start reading all the transcripts, we’ll first get the length of our main dataframe ‘df’ for logging purposes. Even though the entire process should not take more than a few minutes (depending on your computer’s processing power), it is always a good idea to track the progress of any long loop.

The second thing we want to do is copy the dataframe’s column structure into a brand new, empty dataframe, so that we can record information about the earnings calls that match our criteria.

total_rows = len(df)
df_final = pd.DataFrame(columns=df.columns)

Finding ‘Fortnite’ in Transcripts and Recording It

for file_index, row in df.iterrows():
    with open('transcripts/'+str(file_index)+'.txt', 'r') as f:
        if 'fortnite' in f.read().lower():
            # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
            df_final = pd.concat([df_final, df.loc[[file_index]]])

    if file_index % 1000 == 0:
        print(datetime.now().strftime("%Y-%m-%d %H:%M:%S") + ' - '+str(file_index)+'/'+str(total_rows))

print('Done!')

The top part of the loop, before the progress check, is really all we need to do the job.

‘df.iterrows()’ allows us to loop through the dataframe one row at a time. Each transcript is read only when its row comes up, so only one transcript is held in memory at any moment and we do not waste RAM while working through the data set.

‘with open(…) as f’ opens the file for reading and automatically closes it when we are done. Because we are looking for a simple word in the transcript, we do not need regular expressions (which tend to be slower for a plain substring search) and can go with the simple ‘string’ in ‘string’ test. Notice how we use the method ‘.lower()’ to convert the entire transcript to lowercase, so when we look for ‘fortnite’ we do not need to worry about capitalization. By default, Python text search is case-sensitive.
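To see the case-insensitive check in isolation, here is a minimal sketch. The helper name ‘mentions’ and the sample sentences are made up for illustration and are not part of the data set.

```python
def mentions(text, word):
    """Return True if `word` occurs anywhere in `text`, ignoring case."""
    return word.lower() in text.lower()

# Hypothetical sample sentences, not taken from real transcripts
sample_upper = "Demand for headsets rose on the back of FORTNITE."
sample_none = "Our retail segment performed in line with guidance."

print(mentions(sample_upper, 'fortnite'))   # match despite the capital letters
print(mentions(sample_none, 'fortnite'))    # no mention at all
print('fortnite' in sample_upper)           # plain `in` is case-sensitive
```

Keep in mind that this is a substring test, so it would also match the word inside a longer token.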

In the following lines:

    if file_index % 1000 == 0:
        print(datetime.now().strftime("%Y-%m-%d %H:%M:%S") + ' - '+str(file_index)+'/'+str(total_rows))

We use the percent sign (the modulo operator) to do a little trick: we check the remainder after division. If ‘file_index’ is evenly divisible by ‘1000’, the remainder is zero. In that case, we print the current time, the current position and the total number of rows. This way we know how far we’ve come and can estimate how much more time it will take to finish the process.

You should see something like this:

...
2019-07-02 21:46:58 - 112000/141755
2019-07-02 21:46:59 - 113000/141755
2019-07-02 21:46:59 - 114000/141755
...

Once the loop is over and there are no more earnings transcripts to read, it’s a good idea to print a simple message, so you know the loop has finished:

Done!
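The remainder check used for progress logging can also be tried on its own. The sketch below is only an illustration: the helper ‘should_log’ and the row count of 3500 are made up, and it counts positions with a plain range instead of reading real index values.

```python
from datetime import datetime

def should_log(position, every=1000):
    """True when `position` is an exact multiple of `every`."""
    return position % every == 0

total = 3500  # made-up number of rows, for illustration only
logged = []
for position in range(1, total + 1):
    if should_log(position):
        logged.append(position)
        print(datetime.now().strftime("%Y-%m-%d %H:%M:%S") + ' - ' + str(position) + '/' + str(total))
```

With these numbers the message is printed at positions 1000, 2000 and 3000 only.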

So, Who Mentioned ‘Fortnite’?

Let’s measure the length of df_final and count the unique stock symbols in it:

print(len(df_final))
print(len(df_final['stock_symbol'].unique()))

In the data as of May 31, 2019, there are 70 different earnings calls with the word ‘Fortnite’ in them, from 34 different companies!

So which companies talk about it most frequently?
We’ll do a little trick: concatenate stock_symbol and stock_name into a single field, and then run value_counts on it.

df_final['long_name'] = df_final['stock_symbol'] + ' - ' + df_final['stock_name']
print(df_final['long_name'].value_counts())

Here are the top 6 results:

HEAR - Turtle Beach Corporation                  6
TCEHY - Tencent Holdings Limited                 6
FNKO - Funko, Inc.                               5
GME - GameStop Corp.                             5
HAS - Hasbro, Inc.                               4
EA - Electronic Arts Inc.                        4
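For readers new to value_counts, the same concatenate-and-count step can be tried on a tiny frame. The symbols and names below are placeholders invented for this sketch; only the column names ‘stock_symbol’ and ‘stock_name’ come from the data set.

```python
import pandas as pd

# Made-up rows standing in for matched earnings calls
toy = pd.DataFrame({
    'stock_symbol': ['AAA', 'BBB', 'AAA', 'CCC', 'AAA', 'BBB'],
    'stock_name': ['Alpha Inc.', 'Beta Corp.', 'Alpha Inc.',
                   'Gamma Ltd.', 'Alpha Inc.', 'Beta Corp.'],
})

# Combine symbol and name into one label, then count calls per label
toy['long_name'] = toy['stock_symbol'] + ' - ' + toy['stock_name']
counts = toy['long_name'].value_counts()
unique_symbols = toy['stock_symbol'].nunique()

print(counts)
print(unique_symbols)
```

value_counts sorts the labels by frequency in descending order, which is why the most talkative companies end up at the top of the list.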

What do we learn?

  • Turtle Beach – creates gaming headsets
  • Tencent – a Chinese internet and gaming conglomerate that holds a large stake in Epic Games, the developer of Fortnite
  • Funko – produces pop culture products
  • GameStop – video game retailer
  • Hasbro – an entertainment company
  • Electronic Arts – a direct competitor to the Fortnite developer with its game ‘Anthem’

Have you checked these companies’ stock performance yet?
And there are 28 more companies talking about ‘Fortnite’ in their earnings calls!

For a deeper dive into the actual earnings calls, we can apply a filter like the following for Electronic Arts:

df_final[df_final['stock_symbol']=='EA'][['reporting_year', 'reporting_quarter']]
       reporting_year reporting_quarter
124011           2018                 4
126309           2019                 1
134552           2019                 3
140204           2019                 4

Now we know exactly which quarters the mentions come from, so we can go ahead and investigate the actual transcripts.
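One possible way to investigate is to print the text surrounding each mention instead of reading a whole transcript. The sketch below is an assumption about how that could be done, not part of the original code: the helper ‘find_snippets’, its ‘width’ parameter and the sample text are all hypothetical.

```python
def find_snippets(text, word, width=60):
    """Return the text around each case-insensitive occurrence of `word`.

    Assumes lowercasing does not change the string length,
    which holds for plain ASCII text.
    """
    snippets = []
    lowered = text.lower()
    needle = word.lower()
    start = lowered.find(needle)
    while start != -1:
        left = max(0, start - width)
        right = min(len(text), start + len(needle) + width)
        snippets.append(text[left:right])
        # Continue searching after the current match
        start = lowered.find(needle, start + len(needle))
    return snippets

# Made-up text standing in for a transcript
sample = ("We saw strong engagement. Fortnite drove headset demand. "
          "Later, fortnite came up again.")
for snippet in find_snippets(sample, 'Fortnite', width=10):
    print(snippet)
```

To use it on real data, read a file such as ‘transcripts/124011.txt’ (the first index value in the Electronic Arts output above) with ‘with open(…) as f’ and pass ‘f.read()’ to the helper.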

Sources

View the complete code on GitHub.

Earnings Conference Call Transcripts available for download now.