Designing Data Marts from XML and Relational Data Sources

November 13, 2010

Yasser Hachaichi, Jamel Feki, Hanene Ben-Abdallah
Mir@cl Laboratory, Faculté des Sciences Economiques et de Gestion, Tunisia

Abstract:
Data warehousing is a dominant approach to information storage, retrieval and understanding, but it faces the challenge of managing data of varied structure. Relational databases and XML documents are both popular storage formats. The chapter describes in detail a method for designing data marts from these two kinds of sources: relational data sources and XML documents that conform to a given DTD. The method is illustrated with a real-world use case.

“Visual Summarization of Web Pages”

November 13, 2010

Binxing Jiao, Linjun Yang, Jizheng, Feng Wu
Microsoft Research Asia, Beijing

Visual summarization is a new way of representing web pages in a brief yet comprehensive manner. Such summaries serve two main purposes. First, they act as an overview in web page retrieval systems, letting users glimpse a page before visiting it. Second, they help with the task of re-finding previously visited web pages. Google Chrome, Mozilla Firefox and Safari each provide a visual list of pages ordered by most recent visit or by number of visits.

Radix sort (by Count Sort), Depth First Search and Breadth First Search

May 18, 2010


Experimentation Details:

The experiment analyzes the Radix sort, depth-first search and breadth-first search algorithms. For Radix sort, 9 data sets are used, varying in size (both the element count 'n' and the number of digits 'd'). The following values were used for experimentation: n = {500, 5000, 15000}, d = {5, 10, 15}.
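As a rough illustration of the Radix sort part of this setup, the following is a minimal Python sketch of least-significant-digit radix sort built on a stable counting sort, run over the nine (n, d) combinations listed above. The function names, the `make_dataset` generator and the use of random decimal integers are assumptions made for illustration; the original post's code and data are not shown in this excerpt.

```python
import random


def counting_sort_by_digit(values, exp, base=10):
    """Stable counting sort of non-negative ints, keyed on the digit at place `exp`."""
    counts = [0] * base
    for v in values:
        counts[(v // exp) % base] += 1
    # Prefix sums turn per-digit counts into the end position of each bucket.
    for i in range(1, base):
        counts[i] += counts[i - 1]
    output = [0] * len(values)
    # Walk backwards so elements with equal digits keep their relative order.
    for v in reversed(values):
        digit = (v // exp) % base
        counts[digit] -= 1
        output[counts[digit]] = v
    return output


def radix_sort(values, base=10):
    """LSD radix sort for non-negative integers; returns a new sorted list."""
    values = list(values)
    if not values:
        return values
    largest = max(values)
    exp = 1
    while largest // exp > 0:
        values = counting_sort_by_digit(values, exp, base)
        exp *= base
    return values


def make_dataset(n, d, seed=0):
    """Hypothetical generator: n random integers with at most d decimal digits."""
    rng = random.Random(seed)
    return [rng.randrange(10 ** d) for _ in range(n)]


if __name__ == "__main__":
    # The nine (n, d) combinations named in the post.
    for n in (500, 5000, 15000):
        for d in (5, 10, 15):
            data = make_dataset(n, d)
            assert radix_sort(data) == sorted(data)
```

Each pass costs O(n + base), and there are at most d passes, so the total work grows as O(d · (n + base)), which is why both n and d are varied in the experiment.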

CiteSeerX - A Survey

April 27, 2010
  • Focused Crawling Using Context Graphs
  • CiteSeerX – A Scalable Autonomous Scientific Digital Library
  • Are Your Citations Clean?


The aim was to identify the problems faced by CiteSeer and to propose solutions by introducing a new architecture that achieves scalability, flexibility and better performance. CiteSeer was first introduced in 1997, and being free of charge and available for non-commercial use increased its popularity on the World Wide Web. Within a few years it grew into a fully functional digital library storing over 730,000 documents and more than 8 million citations. Its regular users were students, teachers and researchers. After rapid growth until 2005, the existing architecture lost efficiency because of its limited framework and poorly scalable structure. As a result, many users experienced access problems caused by the excessive number of documents and an unexpected increase in latency.

Sorting Algorithm Experimentation

April 6, 2010

Experimental Setup:

The experiment was conducted to check the time complexity of the sorting algorithms Quick sort, Merge sort and Heap sort.
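One way such a timing experiment might be set up is sketched below in Python. This is an illustrative harness, not the setup used in the post: the input sizes, the random integer data, the out-of-place quick sort, and the heap sort built on the standard `heapq` module are all assumptions.

```python
import heapq
import random
import time


def quick_sort(a):
    """Simple out-of-place quick sort using the middle element as pivot."""
    if len(a) <= 1:
        return list(a)
    pivot = a[len(a) // 2]
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quick_sort(less) + equal + quick_sort(greater)


def merge_sort(a):
    """Top-down merge sort; returns a new sorted list."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def heap_sort(a):
    """Heap sort via the standard-library binary heap (heapq)."""
    heap = list(a)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


def time_sort(sort_fn, data):
    """Return the wall-clock seconds sort_fn takes on data, checking the result."""
    start = time.perf_counter()
    result = sort_fn(data)
    elapsed = time.perf_counter() - start
    assert result == sorted(data)
    return elapsed


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (1000, 5000, 20000):  # assumed input sizes
        data = [rng.randrange(1_000_000) for _ in range(n)]
        for fn in (quick_sort, merge_sort, heap_sort):
            print(f"n={n:6d} {fn.__name__:10s} {time_sort(fn, data):.4f}s")
```

Plotting the measured times against n · log n for increasing n gives a rough check of the expected O(n log n) behaviour of all three algorithms on random input.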

Task over PIMS..

March 31, 2010

Personal Information Management Systems (PIMS) are widely used; their aim is to provide end users with management techniques that are easy to use and understand, even for non-technical users. PIMS provide an orderly way to store, manage and analyze personal data. Searching is a vital component of PIMS and is used very frequently by end users. Most PIMS use basic query-based searching, which becomes a tedious task when dealing with a large data set. In this document, we examine different aspects of introducing facet-based searching practices for a targeted system.
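To make the contrast with plain query-based search concrete, here is a minimal Python sketch of facet-based filtering over personal items. The item records, the facet names (`type`, `year`, `tag`) and both helper functions are hypothetical and chosen only for illustration; they do not describe the targeted system discussed in the post.

```python
from collections import Counter

# Hypothetical personal items, each described by a few facet values.
ITEMS = [
    {"title": "Budget notes", "type": "document", "year": 2010, "tag": "finance"},
    {"title": "Trip photos", "type": "image", "year": 2009, "tag": "travel"},
    {"title": "Tax return", "type": "document", "year": 2009, "tag": "finance"},
    {"title": "Meeting with Ali", "type": "email", "year": 2010, "tag": "work"},
]


def facet_counts(items, facet):
    """Count how many items fall under each value of the given facet."""
    return Counter(item[facet] for item in items if facet in item)


def filter_by_facets(items, selections):
    """Keep only the items that match every selected facet value."""
    return [
        item
        for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


if __name__ == "__main__":
    # Show the user the available values and their counts for one facet.
    print(facet_counts(ITEMS, "type"))
    # Narrow the result set step by step instead of typing a query.
    narrowed = filter_by_facets(ITEMS, {"type": "document", "year": 2009})
    print([item["title"] for item in narrowed])
```

The idea is that the user narrows a large collection by picking facet values and seeing counts at each step, rather than having to guess the right keywords up front.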

