WCSE 2015
ISBN: 978-981-09-5471-0 DOI: 10.18178/wcse.2015.04.014

From Unstructured to Structured Tabular Data Using a Rule Engine

Alexey O. Shigarov, Igor V. Bychkov

Abstract— Today, a large amount of unstructured tabular data is contained in tables from different sources, e.g. image documents, web pages, and spreadsheets. Sometimes these tables are the only available data source. To use this information in business intelligence, the data must be transformed from these tables into a structured form such as a relational database. We propose an approach to transforming tabular data from unstructured (spreadsheets) to structured (relational databases) form using a rule engine. Our table interpretation rules can use spatial, style (typographical), and natural language information from tables. The experimental evaluation shows that the approach can be applied to a wide range of tables from statistical and financial reports.

Index Terms— unstructured tabular data integration, table understanding, information extraction from tables, table analysis and interpretation.

Institute for System Dynamics and Control Theory of SB RAS, Russia


Cite: Alexey O. Shigarov, Igor V. Bychkov, "From Unstructured to Structured Tabular Data Using a Rule Engine," 2015 The 5th International Workshop on Computer Science and Engineering-Information Processing and Control Engineering (WCSE 2015-IPCE), pp. 85-91, Moscow, Russia, April 15-17, 2015.