A data warehouse is a special-purpose data repository that integrates information from multiple operational databases to enable strategic analysis and decision-making. Dimensional modeling, the most prevalent technique for modeling data warehouses, organizes tables into fact tables, which contain the basic quantitative measurements of a subject of analysis, and dimension tables, which describe the facts being stored. The data model produced by this method is known as a star schema (or a star-schema extension such as a snowflake or constellation). A simple star-schema model of a data warehouse for a retail company may consist of a fact table that contains the sales figures for each sale transaction and four dimensions, Product, Customer, Location, and Calendar, to which the fact table is connected via foreign keys. Such fact tables typically provide the data for association-rule mining in data warehouses.
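As a minimal sketch of the retail star schema described above, the following Python example uses the standard-library sqlite3 module to create one fact table and the four dimension tables, and then runs a typical analytical query that joins the fact table to two dimensions. All table names, column names, and sample rows are hypothetical and chosen only for illustration; they are not taken from an actual warehouse.

```python
import sqlite3


def build_star_schema(conn):
    """Create four dimension tables and one fact table (hypothetical columns)."""
    # Note: SQLite only enforces REFERENCES if PRAGMA foreign_keys is enabled;
    # here the clauses simply document how the fact table points at dimensions.
    conn.executescript("""
        CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE location (location_id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE calendar (calendar_id INTEGER PRIMARY KEY, day TEXT, quarter TEXT);
        CREATE TABLE sales (
            sale_id     INTEGER PRIMARY KEY,
            product_id  INTEGER REFERENCES product(product_id),
            customer_id INTEGER REFERENCES customer(customer_id),
            location_id INTEGER REFERENCES location(location_id),
            calendar_id INTEGER REFERENCES calendar(calendar_id),
            amount      REAL
        );
    """)


def load_sample_rows(conn):
    """Insert a few made-up rows so the query below has something to aggregate."""
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(1, "Pen", "Office"), (2, "Tent", "Outdoor")])
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
    conn.executemany("INSERT INTO location VALUES (?, ?)", [(1, "Chicago"), (2, "Boston")])
    conn.executemany("INSERT INTO calendar VALUES (?, ?, ?)",
                     [(1, "day-1", "Q1"), (2, "day-2", "Q2")])
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 1, 1, 1, 1, 2.5), (2, 1, 2, 1, 2, 4.0), (3, 2, 1, 2, 1, 120.0)])


def sales_by_category_and_city(conn):
    """Total sale amount per product category and city, via foreign-key joins."""
    query = """
        SELECT p.category, l.city, SUM(s.amount)
        FROM sales s
        JOIN product  p ON p.product_id  = s.product_id
        JOIN location l ON l.location_id = s.location_id
        GROUP BY p.category, l.city
        ORDER BY p.category, l.city
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    load_sample_rows(connection)
    for row in sales_by_category_and_city(connection):
        print(row)
```

The fact table holds only keys and the numeric measure, while descriptive attributes such as category and city live in the dimensions, so each analytical question becomes a join from the fact table outward.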
We have proposed Qualified Association Rules, which extend the scope of association-rule mining in a way that is useful in a host of real-world situations and especially applicable to the data warehouse environment. We are investigating different aspects of qualified association rules, including cost-based optimizations and dealing with aggregate data.
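For readers unfamiliar with the baseline being extended, the sketch below computes the two standard measures of an ordinary association rule, support and confidence, over a list of transactions. It illustrates only conventional association rules, not the qualified extension, whose definition is not given here. The function name and the sample transactions are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = fraction of transactions containing antecedent and consequent
    confidence = fraction of antecedent-containing transactions that also
                 contain the consequent
    """
    if not transactions:
        raise ValueError("at least one transaction is required")
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


if __name__ == "__main__":
    # Made-up market baskets, e.g. items grouped per sale transaction.
    baskets = [{"pen", "paper"}, {"pen"}, {"pen", "paper", "ink"}, {"ink"}]
    print(rule_metrics(baskets, {"pen"}, {"paper"}))
```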
The first component is an auction environment that can be used for setting up web-based auction experiments (for instructional and research purposes) and running simulations, as well as data-mining the collected data.
For students trying to learn the basics of economic theory, the concepts of demand and supply, and the idea that prices adjust in a market to equate the two, may appear abstract and far removed from reality. In fact, much of economic theory takes as given that prices somehow adjust to equate demand and supply, and spends very little time analyzing the price formation process. Recent advances on the Internet, however, have brought these abstract concepts much closer to us: online auction sites like eBay have allowed millions of households to become active participants as buyers and sellers in the market for second-hand and collectible goods.
Our aim is to create an online auction environment that will help our students, and ourselves, to interact with and investigate the operation of a market. This environment will enable students in our introductory economics courses to become participants in market experiments, in which they take the roles of suppliers and buyers whose actions determine the market price. They will thus experience first-hand the notions of competition, uncertainty, asymmetric information, strategic decision-making, and market efficiency, solidifying their theoretical understanding of these concepts. The environment will also enable more advanced economics and computer science students to evaluate the performance of different market rules and institutions through controlled experiments conducted with human subjects and/or computerized agents.
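To make the idea of buyers and suppliers jointly determining a price concrete, here is a minimal sketch of one possible market rule: a single-round call market in which every participant submits one price for one unit. This rule is an assumption chosen for illustration and is not necessarily one of the rules used in the actual environment. The function name and sample values are hypothetical.

```python
def clear_call_market(bids, asks):
    """Clear a one-shot call market with single-unit bids and asks.

    Returns (units_traded, price). The price is the midpoint of the range of
    prices at which the number of willing buyers matches the number of
    willing sellers (ties at the boundary aside); it is None if no trade occurs.
    """
    buy = sorted(bids, reverse=True)   # most eager buyers first
    sell = sorted(asks)                # cheapest sellers first

    # Match the i-th highest bid with the i-th lowest ask while trade is possible.
    traded = 0
    while traded < min(len(buy), len(sell)) and buy[traded] >= sell[traded]:
        traded += 1
    if traded == 0:
        return 0, None

    # The price must satisfy the last matched pair...
    low, high = sell[traded - 1], buy[traded - 1]
    # ...and must not attract the first excluded buyer or seller.
    if traded < len(buy):
        low = max(low, buy[traded])
    if traded < len(sell):
        high = min(high, sell[traded])
    return traded, (low + high) / 2


if __name__ == "__main__":
    print(clear_call_market([10, 8, 6, 4], [3, 5, 7, 9]))
```

In a classroom experiment, each student's bid or ask would take the place of the hard-coded numbers, and the same function could be run against computerized agents to compare outcomes.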
The second component is devising data mining techniques for mining data from online auction sites such as eBay. Our goal is not only to discover interesting patterns but also to facilitate, interface with, and support economics-oriented research. For example, we are working on detecting fraud such as shill bidding, identifying high-risk, suspicious auction items, and quantifying the effects of feedback on auction prices.
This project is a collaboration with Ali Hortacsu (Economics, University of Chicago) and Anne Rogers (Computer Science, University of Chicago). It is funded in part by a grant from the University of Chicago's Provost's Program for Academic Technology Innovation.
The flexibility of XML is convenient, but it has its price: the lack of a fixed, simple schema makes XML data very difficult to manage efficiently. Thus, mining XML inevitably involves discovering a succinct description of the particular dataset. The goal of this project is to develop a framework and efficient methods for performing such discoveries and for using them to integrate XML data with other data analysis and management methods.
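One very simple kind of succinct description is a summary of the distinct element paths that occur in a document, together with how often each occurs. The sketch below computes such a summary with Python's standard xml.etree.ElementTree module. It is an illustrative assumption about what a description might look like, not the framework or method developed in this project. The function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_paths(xml_text):
    """Count how many elements occur at each root-to-element tag path."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(element, prefix):
        path = f"{prefix}/{element.tag}"
        counts[path] += 1
        for child in element:      # iterate over direct child elements
            walk(child, path)

    walk(root, "")
    return counts


if __name__ == "__main__":
    sample = """
    <catalog>
      <book><title>A</title><author>X</author></book>
      <book><title>B</title></book>
      <note>hi</note>
    </catalog>
    """
    for path, count in sorted(summarize_paths(sample).items()):
        print(count, path)
```

Even this small summary exposes the irregularity typical of schemaless data: in the sample, only one of the two book elements has an author child.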