ISBN: 978-981-09-5471-0 DOI: 10.18178/wcse.2015.04.014
From Unstructured to Structured Tabular Data Using a Rule Engine
Abstract— Today, a huge amount of unstructured tabular data is contained in tables from different
sources, e.g., image documents, web pages, and spreadsheets. Sometimes these tables are the only
available data source. To use this information in business intelligence, we need to transform the data
from these tables into a structured form such as a relational database. We propose an approach to transforming
tabular data from an unstructured form (spreadsheets) to a structured form (relational databases) using
a rule engine. Our table interpretation rules can use spatial, style (typographical), and natural
language information from tables. The experimental evaluation shows that the approach can be
applied to a wide range of tables from statistical and financial reports.
Index Terms— unstructured tabular data integration, table understanding, information extraction from tables, table analysis and interpretation.
Alexey O. Shigarov, Igor V. Bychkov
Institute for System Dynamics and Control Theory of SB RAS, RUSSIA
Cite: Alexey O. Shigarov, Igor V. Bychkov, "From Unstructured to Structured Tabular Data Using a Rule Engine," 2015 The 5th International Workshop on Computer Science and Engineering-Information Processing and Control Engineering (WCSE 2015-IPCE), pp. 85-91, Moscow, Russia, April 15-17, 2015.
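
The rule-based interpretation described in the abstract can be illustrated with a minimal sketch. This is not the authors' rule engine or rule language, which the abstract does not specify; the names `Cell`, `Rule`, `RULES`, `interpret`, and `SAMPLE`, the three example rules, and the sample figures are all hypothetical. Each rule tests one kind of evidence the abstract mentions (spatial position, typographical style, or the text itself) and assigns a functional role to a cell; the labelled cells are then flattened into relational tuples.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Cell:
    row: int
    col: int
    text: str
    bold: bool = False
    role: Optional[str] = None  # "column_label", "row_label", or "entry"


@dataclass
class Rule:
    name: str
    condition: Callable[[Cell], bool]
    role: str


def is_number(text: str) -> bool:
    """Crude text check: does the cell parse as a number?"""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


# Hypothetical rules, listed in priority order (first match wins).
RULES: List[Rule] = [
    # Spatial + style evidence: a bold cell in the top row.
    Rule("bold top-row cell", lambda c: c.row == 0 and c.bold, "column_label"),
    # Spatial evidence: a cell in the first column below the top row.
    Rule("first-column cell", lambda c: c.col == 0 and c.row > 0, "row_label"),
    # Textual evidence: anything that parses as a number.
    Rule("numeric text", lambda c: is_number(c.text), "entry"),
]


def interpret(cells: List[Cell]) -> List[Tuple[Optional[str], Optional[str], float]]:
    """Assign roles with the rules, then emit (row label, column label, value)."""
    for cell in cells:
        for rule in RULES:
            if cell.role is None and rule.condition(cell):
                cell.role = rule.role
    col_labels = {c.col: c.text for c in cells if c.role == "column_label"}
    row_labels = {c.row: c.text for c in cells if c.role == "row_label"}
    return [
        (row_labels.get(c.row), col_labels.get(c.col), float(c.text.replace(",", "")))
        for c in cells
        if c.role == "entry"
    ]


# Made-up sample table in row-major order; the top-left cell is empty.
SAMPLE = [
    Cell(0, 0, ""), Cell(0, 1, "2013", bold=True), Cell(0, 2, "2014", bold=True),
    Cell(1, 0, "Revenue"), Cell(1, 1, "1,200"), Cell(1, 2, "1,350"),
    Cell(2, 0, "Costs"), Cell(2, 1, "800"), Cell(2, 2, "910"),
]

records = interpret(SAMPLE)
```

Each tuple in `records` corresponds to one row of a relational table with columns for the row label, the column label, and the value. Rule order matters in this sketch: the bold year cells are numeric, but the first rule claims them as column labels before the numeric rule is tried.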