Complicated Data? Smonik Goes Far Beyond Typical Data Extraction


admin | August 25, 2020| 12:27 pm

The ability to extract data from structured and unstructured documents, specifically as it relates to alternative investment data, has become a hot topic in the institutional investor and data aggregation community. Technology for data processing has improved, and a few firms now sell software or services that turn unstructured data, usually delivered as PDF documents, into usable data that can be integrated with downstream systems. These firms are adept at extracting data from simpler document formats, such as capital statements, capital calls, and distribution notices. But what about more complex documents? Could they extract the financial highlights sections from a 300-page financial statement, or Schedule of Investments data so that it lines up properly in a spreadsheet output format? This is where you need a data extraction methodology with far more sophisticated functionality.

Private capital investors rely on accurate information and must be able to produce it efficiently. Endowments, foundations, pension plans, and family offices digest large volumes of complicated material. Manually extracting relevant data costs a significant amount of time and runs the risk of human error, especially when the files are difficult to interpret. Investors must employ an intelligent, automated system able to process a wide range of complex documents or they may inadvertently exclude valuable data.

Typical systems used by private capital investors are not equipped to process complicated material. A few vendors have addressed the issue of automating data extraction but have only succeeded with the most common file types: capital statements, capital calls, and distribution notices. These documents have a relatively simple, predictable format, making them easier to process. Detailed portfolio data, such as the Schedule of Investments (SOI), financial highlights, and per share data, can be significantly more complex to extract. A more robust solution is required.

To take advantage of the valuable data embedded in more complex documents, institutional investors need a system as flexible and sophisticated as Smonik. Smonik’s data management products allow users to process vast amounts of complex documents quickly and accurately. The platform can extract useful data from any Excel, text, CSV, XML, fixed-position, or PDF file, no matter how complex or inconsistent the format. Complex files with unstructured data are handled easily with Smonik.

Two financial statement PDFs recently processed by Smonik demonstrate the accuracy and efficiency of the system. The first document (PDF 1) represented a lengthy report with extensive data. The second (PDF 2) represented a complicated document with an unusual layout. Smonik systems converted both PDFs into useful data in seconds.

PDF 1 was a 111-page financial statement with abundant information. The client wanted to extract the Financial Highlights for each of the eleven funds included. The complexity of extracting the data relates to the varying number of columns, representing the time periods presented, and the format of the row headings – one line versus multiple lines. In seconds, the 111-page document was reduced to 11 rows of extracted data representing the financial highlights for 11 funds.
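
To make the two difficulties named for PDF 1 concrete (a varying number of period columns, and row headings that run over one line or several), here is a minimal Python sketch of how a Financial Highlights table could be collapsed into one row per fund. It is an illustration only and not Smonik's implementation: the function names, the sample fund names, and the sample figures are all hypothetical, and it assumes the text of each table has already been pulled out of the PDF line by line.

```python
import re

# A numeric cell: optional parentheses for negatives, optional $ or %, commas allowed.
NUMBER = re.compile(r"\(?\$?-?\d[\d,]*\.?\d*%?\)?")


def parse_number(token):
    """Turn a cell such as '(1.10%)' or '1,234.5' into a float."""
    negative = token.startswith("(") and token.endswith(")")
    cleaned = token.strip("()").replace("$", "").replace(",", "").replace("%", "")
    value = float(cleaned)
    return -value if negative else value


def flatten_highlights(fund_name, periods, lines):
    """Collapse one fund's highlights table into a single flat row (a dict).

    `periods` lists that fund's column headers, so each fund may have a
    different number of columns. A line with no trailing numbers is treated
    as the first part of a heading that wraps onto the next line.
    """
    row = {"fund": fund_name}
    pending_heading = []  # heading fragments not yet followed by numbers
    for line in lines:
        tokens = line.split()
        values = []
        # Peel numeric cells off the right-hand side of the line.
        while tokens and NUMBER.fullmatch(tokens[-1]):
            values.insert(0, parse_number(tokens.pop()))
        pending_heading.extend(tokens)
        if not values:
            continue  # heading continues on the next line
        heading = " ".join(pending_heading)
        pending_heading = []
        for period, value in zip(periods, values):
            row[f"{heading} [{period}]"] = value
    return row


# Hypothetical sample input: two funds with different column counts.
row_a = flatten_highlights(
    "Fund A",
    ["2019", "2018", "2017"],
    [
        "Net asset value, beginning of period 10.00 9.50 9.00",
        "Total return 5.20% (1.10%) 3.00%",
    ],
)
row_b = flatten_highlights(
    "Fund B",
    ["2019", "2018"],
    [
        "Ratio of net expenses to",   # heading wraps across two lines
        "average net assets 1.25% 1.30%",
    ],
)
rows = [row_a, row_b]  # one row per fund, ready for a spreadsheet export
```

A sketch like this would misread a heading that itself ends in a number (for example a year), which is one reason real financial statements call for more robust layout handling than a simple right-to-left scan.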