10-K filings are rich sources of data about a firm’s financial health, strategy, and risk factors. However, due to their length and complexity, investors may not fully process all the nuanced information they contain. The “Lazy Prices” paper, which this code closely aligns with, posits that changes in the language used in these filings can signal important shifts in a company’s prospects.
“Changes to the language and construction of financial reports also have strong implications for firms’ future returns: a portfolio that shorts “changers” and buys “non-changers” earns up to 188 basis points in monthly alphas (over 22% per year) in the future.”
The authors demonstrate that these textual changes have predictive power for future stock returns, suggesting that the market underreacts to this information initially. Hence, the term “Lazy Prices”.
To investigate this theory empirically, this repo contains code that performs textual analysis on the 10-K filings of S&P 500 firms from 1993 to 2024.
Data Acquisition and Loading: The first step is to gather all the 10-K filings to analyse. There are three ways to go about this:
Now that the data is prepared, we use cosine similarity to measure changes in the filing texts.
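To make the measure concrete, here is a minimal sketch of cosine similarity between two texts. It assumes a plain bag-of-words representation with raw term counts, which may differ from the vectorisation used in the repo; the function names and the two sample excerpts are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lower-case the text and count alphabetic tokens (a simple bag of words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the two term-count vectors."""
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)


# Hypothetical excerpts standing in for two consecutive filings of one firm.
filing_prev = "We face risks related to competition and regulation."
filing_curr = "We face risks related to competition, regulation and litigation."

similarity = cosine_similarity(filing_prev, filing_curr)
distance = 1.0 - similarity  # cosine distance, the complement of similarity
print(f"similarity={similarity:.4f} distance={distance:.4f}")
```

A similarity close to 1 marks a "non-changer"; a lower value means the wording changed more between the two filings.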
Filing dates are incorporated to ensure accurate timing of information and returns.
Symbol | CIK | Filing Date | Filing Year | Cosine Distance | Cosine Similarity | Return Measures | Bin |
High-similarity bins outperform all other bins: The analysis indicates that portfolios composed of firms with high 10-K filing similarity (Bin 5) demonstrated the strongest performance over the long term. This suggests that going long on companies with consistent disclosures, i.e. the “non-changers”, tends to provide better returns.
Mixed results for some bins: Contrary to some expectations, the performance of the lower-similarity portfolios (Bins 1-4) was not uniformly poor. While Bin 3 underperformed, Bins 1, 2, and 4 showed varying degrees of positive returns, with Bins 2 and 4 performing strongly, though not as strongly as Bin 5.
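The bin comparison above can be sketched as follows. This is an illustration only: it assumes the five bins are rank-based quintiles of cosine similarity (Bin 1 lowest, Bin 5 highest) and that each bin's performance is summarised by a simple average return. The function names and the sample numbers are hypothetical and are not results from the repo.

```python
from collections import defaultdict


def assign_bins(similarities: list[float], n_bins: int = 5) -> list[int]:
    """Rank values ascending and split the ranks into n_bins near-equal groups.

    Returns 1-based bin labels aligned with the input order.
    """
    order = sorted(range(len(similarities)), key=lambda i: similarities[i])
    bins = [0] * len(similarities)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(similarities) + 1
    return bins


def mean_return_by_bin(bins: list[int], returns: list[float]) -> dict[int, float]:
    """Average the returns of all observations that fall in each bin."""
    grouped = defaultdict(list)
    for b, r in zip(bins, returns):
        grouped[b].append(r)
    return {b: sum(rs) / len(rs) for b, rs in sorted(grouped.items())}


# Made-up similarities and returns for ten firm-years (not real data).
sims = [0.99, 0.62, 0.85, 0.91, 0.70, 0.97, 0.55, 0.88, 0.78, 0.94]
rets = [0.04, 0.01, -0.02, 0.03, 0.02, 0.05, 0.00, -0.01, 0.02, 0.03]

bins = assign_bins(sims)
print(mean_return_by_bin(bins, rets))
```

In practice the bins would more likely be formed within each filing year, so that firms are compared against their contemporaries; this sketch pools all observations for brevity.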
The correlation matrix revealed that there isn’t a straightforward linear relationship between cosine similarity and short-term returns. This implies that the impact of textual similarity on stock performance might depend on other factors specific to the firm.
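For reference, a correlation matrix of this kind could be computed along the following lines. This is a sketch that assumes Pearson correlation, which only captures linear association; the column names and values are hypothetical placeholders rather than the repo's data.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Pairwise Pearson correlations between all named columns."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


# Hypothetical columns; names and values are placeholders, not real results.
data = {
    "cosine_similarity": [0.99, 0.62, 0.85, 0.91, 0.70, 0.97],
    "return_1m": [0.01, 0.02, -0.01, 0.00, 0.03, -0.02],
    "return_12m": [0.12, 0.03, 0.05, 0.09, 0.04, 0.10],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

A coefficient near zero here would not rule out a non-linear or firm-dependent relationship, which is one reason to read it alongside the bin-level results.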
Our replication of the original paper generally supports the idea that textual similarity in 10-K filings has predictive power for future stock returns.
The results are consistent with our original hypothesis that similar disclosures are associated with stronger long-term stock performance. Significant changes or novelty in financial disclosures might signal increased risk or uncertainty, potentially leading to weaker or more inconsistent returns.
Investors may benefit from paying attention to the consistency of language in 10-K filings, as it can provide insights into a firm’s future prospects.
While high similarity generally correlates with positive returns, it’s crucial to consider other firm-specific characteristics to get a complete picture, as we did not attempt to prove a causal link between the two.
Further research could explore which specific types of textual change, or which sentiment measures, are most predictive of stock performance, and why the market reacts to these changes as it does.