Centre for Extended Enterprises and Business Intelligence
Curtin University Centre of Excellence
 


Research into Text Mining, Semantic Reasoning and XML Document Management

In this research, we aim to develop a set of advanced methods and tools, known as Document Screening, Text Mining and Semantic Reasoning Technology (Doc-SMART), that can screen a document like an X-ray, extract knowledge like a philosopher and investigate the truth like a forensic expert. It is intended for screening documents of any kind (reports, articles, text, emails, etc.) to help organisations (government, private, etc.) better understand their administration, management, intelligence, fraud detection, workload, productivity and security.

Document Screening is a document X-ray method (X-rays are usually used for body or luggage screening at airports). It uses Natural Language Processing (NLP) techniques to read a document word by word and sentence by sentence, and uses dictionaries, thesauri or grammar rules to automatically produce statistical data according to our pre-defined criteria. The output (number of sentences, words, unknown words, unusual grammar, spelling errors, index ratios, etc.) is used for text mining and target analysis, tasks that humans cannot do efficiently. The key challenge is the speed of screening: the documents are long, there are hundreds of thousands of them, and we want work that normally takes hours or days done in seconds or minutes.
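The kind of statistics described above can be illustrated with a minimal sketch. It is not the Doc-SMART implementation: the function name screen_document, the ScreeningReport fields, the naive sentence splitter and the plain word-set dictionary are all assumptions made for illustration, and grammar checking is left out entirely.

```python
import re
from dataclasses import dataclass


@dataclass
class ScreeningReport:
    """Hypothetical container for the screening statistics."""
    sentences: int
    words: int
    unknown_words: list   # distinct words not found in the dictionary
    unknown_ratio: float  # unknown word occurrences / total words


def screen_document(text, dictionary):
    """Count sentences, words and out-of-dictionary words in one pass.

    `dictionary` is assumed to be a set of lower-case known words.
    """
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words are runs of letters and apostrophes, compared in lower case.
    words = re.findall(r"[a-z']+", text.lower())
    unknown_tokens = [w for w in words if w not in dictionary]
    ratio = len(unknown_tokens) / len(words) if words else 0.0
    return ScreeningReport(
        sentences=len(sentences),
        words=len(words),
        unknown_words=sorted(set(unknown_tokens)),
        unknown_ratio=ratio,
    )
```

Calling screen_document on each document in a collection yields one report per document; a real screener would need a proper tokenizer and a much larger dictionary to reach the speeds discussed above.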

Text Mining is a technique for extracting knowledge from text (like a philosopher). It uses artificial intelligence (AI) principles, machine learning algorithms, pattern matching and model correlation approaches to identify regularities, patterns, concepts (words in the document) and their relationships, and to validate the documents against our expectations, such as pre-defined models, rules or target keywords (a document that does not meet the expectations goes through scientific semantic reasoning). The output is a summary of the document or documents, presented visually (text and pictorial presentation), which a human could not produce without considerable time and effort. The key challenge is the accuracy of the knowledge extraction and abstraction against the text.
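Two of the steps named above, relating concepts to one another and validating a document against target keywords, can be sketched roughly as follows. This is a simple pattern-matching illustration only, with no machine learning; the function names concept_cooccurrence and meets_expectations are hypothetical, and "relationship" is reduced here to two concepts appearing in the same sentence.

```python
import re
from collections import Counter
from itertools import combinations


def concept_cooccurrence(text, concepts):
    """Count how often each pair of concepts shares a sentence.

    `concepts` is assumed to be a set of lower-case words of interest.
    """
    pair_counts = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        present = sorted(tokens & concepts)  # sorted so each pair has one canonical order
        pair_counts.update(combinations(present, 2))
    return pair_counts


def meets_expectations(text, required_keywords):
    """Return (ok, missing): whether every required keyword occurs in the text."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    missing = required_keywords - tokens
    return (not missing, missing)
```

Under this sketch, a document for which meets_expectations reports missing keywords would be the one handed on to the semantic reasoning stage.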

Semantic Reasoning is a scientific semantic reasoner that works like a forensic expert. It carries out detailed and aggressive investigation, interrogation, decryption and de-bedding (the opposite of embedding, that is, uncovering secrets hidden in text by steganographic techniques), coupled with fuzzy analysis or association rule synthesis, to classify, diagnose and predict the real semantics (truth) of the document. The output provides intelligence or evidence on issues such as fraud, secrecy, security and piracy. The key challenge is the quality of the intelligence produced by the reasoning, and its automation.
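Of the techniques listed above, association rules are the easiest to show in a few lines. The sketch below computes the standard support and confidence measures for a rule "antecedent implies consequent" over a collection of documents, each reduced to a set of features. How Doc-SMART derives its features or synthesises its rules is not described here, so the feature names in the usage note and the function names are assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base
```

For example, with four documents described by the hypothetical feature sets {"urgent", "wire"}, {"urgent", "wire", "offshore"}, {"urgent"} and {"wire", "offshore"}, the rule "urgent implies wire" has support 0.5 and confidence 2/3.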

Research into XML and XML Databases
Conventional data warehouse designs, which are primarily centred on relational databases, have their own limitations. The two well-known models/schemas, star and snowflake, are based on the relational model and therefore fail to adequately represent the semantics and operations of multi-dimensional data. There is always the problem of running complex (aggregate) queries on complex data. Efficient execution of SQL queries is also limited when drilling down in a data warehouse based on these models. The relatively new starflake model/schema, a merger of the star and snowflake schemas, addresses some of these issues but not all of them. The proposed research is to explore current and new data warehouse design techniques and to propose a new technique that captures the data semantics, business rules and business cost associated with each piece of information stored in or retrieved from the data warehouse. The new warehouse design will be based on the Object-Relational (O-R) data model because of its ability to (1) capture both the dynamic and static aspects of data warehouses, (2) utilise the growing O-R database market and (3) encapsulate as many business functions as possible to make the data warehouse a stand-alone solution.
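For readers unfamiliar with the conventional design being criticised, the following sketch builds a tiny star schema (one fact table, two dimension tables) in an in-memory SQLite database and runs a roll-up and a drill-down aggregate query. It illustrates the relational baseline only, not the proposed Object-Relational design; the table names, column names and sample rows are invented for the example.

```python
import sqlite3


def build_star_schema():
    """Create a toy star schema: fact_sales joined to two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT, name TEXT);
        CREATE TABLE fact_sales (
            date_id INTEGER REFERENCES dim_date(date_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            amount REAL
        );
        INSERT INTO dim_date VALUES (1, 2005, 1), (2, 2005, 2);
        INSERT INTO dim_product VALUES (1, 'books', 'XML primer'), (2, 'books', 'SQL guide');
        INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 20.0), (2, 1, 5.0);
    """)
    return conn


def roll_up(conn):
    """Aggregate sales at the coarsest level used here: per year."""
    return conn.execute("""
        SELECT d.year, SUM(f.amount)
        FROM fact_sales f JOIN dim_date d ON f.date_id = d.date_id
        GROUP BY d.year
    """).fetchall()


def drill_down(conn):
    """Drill down from year to month and product: one more join, finer groups."""
    return conn.execute("""
        SELECT d.year, d.month, p.name, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d ON f.date_id = d.date_id
        JOIN dim_product p ON f.product_id = p.product_id
        GROUP BY d.year, d.month, p.name
        ORDER BY d.month, p.name
    """).fetchall()
```

Each finer level of drill-down adds a join and a wider GROUP BY, and nothing in the schema itself records business rules or costs, which is the gap the proposed design aims to address.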

With a concerted effort to standardise the format for data representation, eXtensible Markup Language (XML) has gained the spotlight in recent years. Its characteristics have made it a standard data format; XML Schema gives a powerful schematic description of XML data and maintains data integrity across XML elements and attributes. The wide adoption of XML creates a demand for efficient XML data management. The XML database was introduced as a mechanism that provides native support for XML documents and data. This has attracted considerable research interest in the performance of XML databases, and a number of benchmarks crafted specifically for XML databases have been proposed. To supplement the benchmarking process, we have developed a synthetic XML data generator (xGEN) for creating large collections of data sets specifically for testing purposes. It can generate XML Schema based, context-sensitive XML data independently of the application domain. The replication process is automated and requires no additional annotated definitions in the source documents.
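The general idea of template-driven synthetic XML generation can be sketched as below. This is not xGEN: it does not read XML Schema, and the dictionary-based TEMPLATE format, the element names and the functions generate and generate_collection are all hypothetical stand-ins chosen to keep the example short.

```python
import random
import string
import xml.etree.ElementTree as ET

# Hypothetical template: a tag, then either child templates or a leaf type.
# "min"/"max" bound how many times a child is repeated (default: exactly once).
TEMPLATE = {
    "tag": "order",
    "children": [
        {"tag": "id", "type": "int"},
        {"tag": "customer", "type": "string"},
        {"tag": "item", "type": "string", "min": 1, "max": 3},
    ],
}


def generate(template, rng):
    """Recursively build one element with random leaf values."""
    elem = ET.Element(template["tag"])
    if "children" in template:
        for child in template["children"]:
            count = rng.randint(child.get("min", 1), child.get("max", 1))
            for _ in range(count):
                elem.append(generate(child, rng))
    elif template.get("type") == "int":
        elem.text = str(rng.randint(1, 9999))
    else:
        elem.text = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    return elem


def generate_collection(template, n, seed=0):
    """Generate n instances under a <collection> root; same seed, same data."""
    rng = random.Random(seed)
    root = ET.Element("collection")
    for _ in range(n):
        root.append(generate(template, rng))
    return root
```

Seeding the generator makes a test collection reproducible, which matters when the same data set must be loaded into several XML databases for a benchmark comparison.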

 
Curtin University of Technology CRICOS provider code 00301J Copyright and Disclaimer
Copyright 2004 CEEBI All rights reserved.
Director Prof. Elizabeth Chang
To report errors on this web site please e-mail: CEEBI Web Development Team
Last modified 26 February, 2006