The Challenge

Our client is a leading US-based FinTech company providing financial health scores for public and private companies. This requires scoring based on a set of different metrics extracted from financial reports, typically derived from filings such as those provided by the SEC in the form of 10-Ks.

The client faced a major challenge in scaling their process of analyzing and extracting financial data from the annual 10-K filings submitted to the U.S. Securities and Exchange Commission (SEC). These reports are typically extensive.

Historically, the process of extracting relevant financial information from these reports involved manual effort that required going through the data, validating the details, and identifying the critical metrics required for their scoring algorithms. This manual workflow had several drawbacks:

  • Prone to Human Error: Given the complexity and density of 10-K filings, human oversight could lead to missed data points or calculation errors, undermining the accuracy of the financial scores.
  • Lack of Scalability: Reviewing and extracting data from hundreds of pages in a single 10-K filing took significant time. As the demand for their financial health scores grew, the company needed a more scalable solution that could handle an increasing number of requests. They wanted to automate data extraction to keep pace with the growth of their service.

To address these pain points, the customer sought to integrate artificial intelligence (AI) to automate the extraction of financial data from the 10-K filings and streamline the process of calculating financial health scores.

The Solution

Business Score Solution with AI

We developed a custom AI-driven solution that automates the extraction of critical financial metrics from SEC 10-K filings. The application leverages the power of natural language processing (NLP) and machine learning to efficiently extract relevant information, process it, and provide it to the customer's scoring system.

Here's how the solution was designed and implemented:

1. SEC 10-K Retrieval

The first step in the automated pipeline is the collection of the 10-K filing associated with a specific company. Users of the application can input a company's name or ticker symbol. The system is integrated with the SEC's API, allowing it to:

  • Query the SEC's database to find the latest 10-K filing for the company.
  • Automatically download the financial section filing ensuring the most up-to-date information is used for analysis.

This eliminates the need for manual document retrieval and ensures the application is always working with the most recent and relevant data.

2. Document Parsing and Financial Report Extraction

Once the relevant financial section of the 10-K filing has been retrieved, the system creates vector embeddings for the text. These vector embeddings are dense, numerical representations of the text that capture semantic information and allow the AI system to:

  • Identify relationships between various parts of the document.
  • Cluster similar information for easier identification of key metrics.
  • Improve searchability within the document for specific financial terms or phrases.

As the creation of vector embeddings is an expensive operation and published documents do not change we generate them once and store the the 10-K filings in ChromaDB for future use.

The main reason we chose ChromaDB as a persistence solution is its simplicity. It automatically creates the embeddings, so you do not have to use a separate tool (like OpenAI Embeddings for example). It also allows grouping documents per company in separate collections and adding metadata to different documents.

3. LLM Integration for Metric Extraction

The most critical part of the solution is the integration of a large language model (LLM) to extract specific financial metrics from the parsed and embedded data. The LLM is able to recognize the specific metrics required by the customer's financial health scoring system, including but not limited to:

  • Revenue growth rate
  • Profit margins
  • Debt-to-equity ratio
  • Free cash flow

In this specific solution, we chose the gpt-4o models from OpenAI. They do a pretty good job of extracting financial data. We use two separate prompts:

  1. The first one is responsible for calculating any intermediate data and also provides reasoning for the calculations
  2. The second one is responsible only for the structured output of the intermediate data. This is a nice functionality of the OpenAI API, which allows the desired data to be mapped to a specific object shape denoted by JSON Schema.

The consumer of this service receives both the final output of the prompt as well as the reasoning behind it.

4. User Interaction for Exception Handling

While the automated system is designed to handle the vast majority of 10-K filings independently, there are always edge cases where the AI might be unsure or unable to extract a certain metric with high confidence. For example, if a company's financial statements are highly unusual or if the 10-K filing contains ambiguous language, the system will flag these cases.

In such instances, the application will prompts the user to manually review the document. This ensures that the customer receives accurate results without being burdened by constant oversight of the automated process.

Technical Overview

Below you can see the architectural diagram of the solution.

Architectural Diagram

The flow is the following:

  1. The user provides the company name or ticker symbol
  2. The application checks if the latest 10-K filings to the company in question are already cached
    1. If not:
      1. The fillings are downloaded
      2. They are split into sentences
      3. These sentences are stored in ChromaDB
  3. The application extracts the relevant company information from the DB
  4. This information is passed to the first prompt, which is responsible for calculating the intermediate values as well as providing the reasoning behind these calculations
  5. The intermediate values are passed to the second prompt which outputs the final result
  6. The UI is updated with the final result as well as with the reasoning behind it

Below you can see the output of the application:

Business Score Application Output

Results and Impact

The implementation of this AI-powered solution provided substantial benefits to the customer, including:

1. Dramatic Time Savings

By automating the retrieval, parsing, and extraction of financial data from 10-K filings, the time required to analyze a filing dropped from several hours to just a few minutes. This allowed the customer to:

  • Significantly speed up their financial health score generation process. 15 min manual work is reduced to 30 sec of automated processing.
  • Handle a larger volume of companies, providing more up-to-date insights to their clients.

2. Increased Accuracy and Consistency

Leveraging AI and LLMs ensured that the data extracted from the 10-K filings was accurate and consistent, reducing the risk of human error. The machine learning models were able to consistently extract the correct metrics, even from filings with varying formats and structures.

3. Scalability

With the manual workload dramatically reduced, the customer was able to scale their operations without needing to hire additional staff. The automated system could handle a significantly higher number of filings, enabling the company to serve a broader range of clients and expand its market presence.

4. Generic Solution

The nature of the solutions allows enhancing it in a way that it can analyze other types of documents as well. This allows for further future improvements.

Conclusion

By integrating AI, we developed a solution that addressed the customer's need for faster, more accurate, and scalable data extraction from SEC 10-K filings. This solution enabled the customer to automate a critical part of their workflow, freeing up resources, improving accuracy, and allowing them to deliver high-quality financial health scores at scale.

Read more about our experience with AI.

Do you need an AI expertise at your company?

Check out the AI services we offer and don't hesitate to contact us for a free consultation.