Designing Data Marts from XML and Relational Data Sources
Yasser Hachaichi, Jamel Feki, Hanene Ben-Abdallah
Mir@cl Laboratory, Faculté des Sciences Economiques et de Gestion, Tunisia
Synopsis Abstract:
Data warehousing plays a dominant role in information storage, retrieval, and understanding, but it faces the challenge of managing data with heterogeneous structures. Relational databases and XML documents are two popular storage formats. The chapter describes in detail a method for designing data marts from these two kinds of sources: relational data sources and XML documents that conform to a given DTD. The method is illustrated with a real-world use case.
Synopsis:
In today's competitive market, where think tanks enrich companies with creative and innovative perspectives on business scenarios, data warehousing plays an equally important role by providing fact-based statements drawn from the analysis of large amounts of stored data. Decision support systems (DSS) work alongside the data warehouse (DW) to analyze these large volumes of data. Companies mostly feed the DW from a single relational data source [1], but an effective DW must also be flexible enough to accept other source types, such as XML, one of the most common data formats, mainly for web resources.
In previous related work, [2] proposed a way of building a DW from an XML source, with the restriction that the DTD of the XML document must also be provided and that the document must be well formed and valid against that DTD. In the proposed work, the design method is divided into four steps:
Data Source Pretreatment: In this step, the structural differences between the two kinds of input sources are resolved. For an RDBMS, the table structures are extracted from the database; for XML, the DTD serves as the structural description and is then converted into a relational schema.
Figure 1: Steps of Data Mart designing
Relation Classification: The resulting source schemas are then examined, and each relation is assigned to a conceptual class. This classification later improves the results of fact extraction.
Data Mart Schema Construction: In this step, facts with their values and dimensions with their attributes are extracted according to predefined extraction rules in order to build star models. These rules are traceable and act mostly on syntax, so they can be applied to semantically different sources.
Data Mart Adaptation: In this final step, the designer is presented with the candidate models generated from the sources. These schemas can be adjusted to accommodate last-minute changes until the designer is fully satisfied.
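The classification and extraction steps above can be illustrated with a minimal Python sketch. It is not the chapter's actual rule set, which this synopsis does not spell out. The two rules used here are simplifying assumptions: a relation is treated as a "relationship" when its whole primary key is made of foreign keys pointing to at least two relations, and a relation with numeric non-key attributes is treated as a fact candidate whose dimensions are the relations it references. The `Relation` class, the `classify` and `star_candidate` functions, and the booking schema are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

NUMERIC_TYPES = {"int", "float"}


@dataclass
class Relation:
    """Hypothetical description of one relation of the source schema."""
    name: str
    attributes: Dict[str, str]            # column name -> type name
    primary_key: Tuple[str, ...]
    foreign_keys: Dict[str, str] = field(default_factory=dict)  # column -> referenced relation


def classify(rel: Relation) -> str:
    """Assumed rule: a relation whose primary key consists only of foreign
    keys to at least two relations is a 'relationship', otherwise an 'entity'."""
    pk_is_all_fk = all(col in rel.foreign_keys for col in rel.primary_key)
    targets = set(rel.foreign_keys.values())
    return "relationship" if pk_is_all_fk and len(targets) >= 2 else "entity"


def star_candidate(rel: Relation) -> Optional[dict]:
    """Assumed rule: numeric, non-key columns are the fact's values and the
    referenced relations are its dimensions. Returns None when no value exists."""
    values = [
        col for col, type_name in rel.attributes.items()
        if type_name in NUMERIC_TYPES
        and col not in rel.primary_key
        and col not in rel.foreign_keys
    ]
    if not values:
        return None
    return {
        "fact": rel.name,
        "class": classify(rel),
        "values": values,
        "dimensions": sorted(set(rel.foreign_keys.values())),
    }


# Hypothetical source schema, invented for illustration only.
customer = Relation("Customer", {"customer_id": "int", "name": "str"}, ("customer_id",))
flight = Relation("Flight", {"flight_id": "int", "origin": "str"}, ("flight_id",))
booking = Relation(
    "Booking",
    {"customer_id": "int", "flight_id": "int", "price": "float", "seats": "int"},
    ("customer_id", "flight_id"),
    {"customer_id": "Customer", "flight_id": "Flight"},
)

candidates = [c for c in map(star_candidate, [customer, flight, booking]) if c]
```

Because both rules look only at keys and column types, the same functions apply unchanged whether the schema was extracted from an RDBMS or derived from a DTD, which is one way to read the claim that syntax-driven rules carry across semantically different sources.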
The CAME (“Conception Assistée de Magasins et d’Entrepôts de données”) CASE toolset implements the four steps described above. An e-ticket DTD example is also used to illustrate the construction of a data mart (DM) from an XML document.
Figure 2: Snapshot of the CAME CASE toolset
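As an illustration of the pretreatment step for XML sources, the following Python sketch turns a small DTD into relational schemas. The DTD fragment is a hypothetical stand-in for the chapter's e-ticket example, which is not reproduced in this synopsis, and the mapping rules are assumptions chosen for brevity: elements with child elements become relations, text-only children and attributes become columns, an attribute of type ID becomes the primary key, and a nested complex element is referenced through a foreign key. The function names `parse_dtd` and `to_relations` are likewise illustrative. The regular expressions cover only flat sequence content models and one attribute per ATTLIST declaration.

```python
import re

# Hypothetical e-ticket DTD fragment, invented for illustration only.
DTD = """
<!ELEMENT eticket (passenger, flight, price)>
<!ATTLIST eticket number ID #REQUIRED>
<!ELEMENT passenger (name)>
<!ATTLIST passenger passport ID #REQUIRED>
<!ELEMENT flight (origin, destination)>
<!ATTLIST flight code ID #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT origin (#PCDATA)>
<!ELEMENT destination (#PCDATA)>
<!ELEMENT price (#PCDATA)>
"""

ELEMENT_RE = re.compile(r"<!ELEMENT\s+([\w.-]+)\s+\(([^)]*)\)[?*+]?\s*>")
ATTLIST_RE = re.compile(r"<!ATTLIST\s+([\w.-]+)\s+([\w.-]+)\s+(\w+)\s+#\w+\s*>")


def parse_dtd(dtd_text):
    """Return (children, attributes) dictionaries keyed by element name."""
    children = {}
    for name, content in ELEMENT_RE.findall(dtd_text):
        parts = [p.strip().rstrip("?*+") for p in content.split(",")]
        # Text-only elements end up with an empty child list.
        children[name] = [p for p in parts if p and p != "#PCDATA"]
    attributes = {}
    for element, attr, attr_type in ATTLIST_RE.findall(dtd_text):
        attributes.setdefault(element, []).append((attr, attr_type))
    return children, attributes


def to_relations(children, attributes):
    """Map every element that has child elements to one relation (assumed rules)."""
    def id_attr(element):
        for attr, attr_type in attributes.get(element, []):
            if attr_type == "ID":
                return attr
        return None

    relations = {}
    for element, kids in children.items():
        if not kids:
            continue  # text-only elements become columns, not relations
        columns = [attr for attr, _ in attributes.get(element, [])]
        foreign_keys = {}
        for kid in kids:
            if children.get(kid):           # nested complex element
                key = id_attr(kid)
                if key is None:
                    continue                # no ID attribute: left unresolved here
                column = f"{kid}_{key}"
                foreign_keys[column] = kid
                columns.append(column)
            else:                           # text-only child becomes a column
                columns.append(kid)
        pk = id_attr(element)
        relations[element] = {
            "columns": columns,
            "primary_key": [pk] if pk else [],
            "foreign_keys": foreign_keys,
        }
    return relations


relations = to_relations(*parse_dtd(DTD))
```

Under these assumptions the fragment yields three relations, `eticket`, `passenger`, and `flight`, with `eticket` referencing the other two. Once a DTD is in this form, it can be handled by the same later steps as a schema extracted from an RDBMS.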
Used Abbreviations:
XML: Extensible Markup Language
DTD: Document Type Definition
RDBMS: Relational Database Management System
References:
[1] Bruckner, R., List, B., & Schiefer, J. (2001). Developing requirements for data warehouse systems with use cases. In Proceedings of the 7th Americas Conference on Information Systems (pp. 329-335).
[2] Golfarelli, M., Maio, D., & Rizzi, S. (1998). Conceptual design of data warehouses from E/R schemas. In Proceedings of the Conference on System Sciences, Kona, Hawaii. Washington, DC, USA: IEEE Computer Society.
