Platform





Gridstone uses its proprietary SmartExtract technology to extract data for its clients and to deliver Turnkey Data Extraction solutions. Components of SmartExtract can also be licensed for use in other data extraction applications.

SmartExtract comprises the following key modules and features:

  • Multiple document types: The system handles documents in most popular formats, including HTML, DOC, XLS, ASCII, and PDF.
  • Smart table parsing: A proprietary heuristics library is used for extracting numbers and all relevant attributes from tables and the surrounding text, including complex tables such as cross-tabs, tables with complex columns, etc.
  • Line item identification: A sophisticated accounting-aware semantic rules engine identifies the position in the taxonomy that each line should be mapped to. The rules engine handles nested lines, breakouts, and operational metrics, and it learns on the fly from the way operators resolve exceptions. This rules engine lies at the heart of the “smartness” of the system: a significant proportion of lines are identified correctly without operator intervention.
  • Numbers-in-text processing: Natural Language Processing techniques and proprietary extraction heuristics are used to identify “interesting” numbers (i.e., excluding dates, section numbers, etc.) and relevant attributes for these numbers, including suggesting a relevant line in the taxonomy that a number should be mapped to.
  • SmartCapture, a high-productivity UI: Gridstone has invested considerable effort in designing an intuitive user interface. This interface is optimized for capturing data for multiple periods, and for identifying restatements and mapping errors. This allows a user of SmartCapture to handle the few exceptions efficiently and build history on a company very rapidly.
  • Click-through auditability: SmartCapture stores the location of each captured number using a proprietary algorithm, so every number can be traced back to its source with a click.
  • Data quality verification: An exhaustive set of taxonomy-based rules allows automated data quality verification at multiple stages in the process, ensuring a high level of data quality. By eliminating manual data entry, Gridstone has eliminated a significant source of capture errors. The taxonomy-based data quality rules engine catches the small proportion of errors that could creep in due to mapping issues.

  • Taxonomy-based transformation and standardization: Captured data is transformed (for example, absolute numbers are computed from growth rates or percentage-of-total figures, and fourth-quarter numbers are calculated from full-year data and the first three quarters) and then standardized for comparison. Every transformation and standardization records the rule it used, providing an easy audit trail from the captured number to the presented (derived) number or ratio.
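
The transformations named above can be illustrated with a minimal Python sketch. This is not Gridstone's implementation; the `Derived` record and the function names are hypothetical, chosen only to show how a derived number can carry the rule and inputs that produced it. The fourth-quarter rule assumes a flow item such as revenue, where the four quarters sum to the full year.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Derived:
    """A derived number plus the rule and inputs behind it (hypothetical record)."""
    value: float
    rule: str
    inputs: Tuple[float, ...]


def derive_q4(fy: float, q1: float, q2: float, q3: float) -> Derived:
    # Assumes a flow item (e.g. revenue) whose quarters sum to the full year.
    return Derived(fy - (q1 + q2 + q3), "Q4 = FY - (Q1 + Q2 + Q3)", (fy, q1, q2, q3))


def absolute_from_growth(prior: float, growth_pct: float) -> Derived:
    # Recover the current-period value from the prior value and a growth percentage.
    value = prior * (1 + growth_pct / 100)
    return Derived(value, "current = prior * (1 + growth% / 100)", (prior, growth_pct))


def absolute_from_share(total: float, share_pct: float) -> Derived:
    # Recover an absolute value from a total and a percentage-of-total figure.
    value = total * share_pct / 100
    return Derived(value, "value = total * share% / 100", (total, share_pct))


if __name__ == "__main__":
    q4 = derive_q4(fy=1000.0, q1=200.0, q2=250.0, q3=260.0)
    # The rule and inputs travel with the value, so the derivation can be audited.
    print(q4.value, "via", q4.rule, "from", q4.inputs)
```

Keeping the rule and its inputs alongside each derived value is one simple way to get the audit trail described above: any presented number can be walked back to the captured numbers it came from.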