This presentation is a very general summary of the work that I did here at ORNL.
This poster is meant to give the viewer a big-picture look at my project. It definitely helps if I'm standing next to it!
Noun Phrase Extraction Presentation
This presentation contains an overview of current noun phrase extraction techniques, with a primary focus on machine learning approaches.
This is the mission statement that was required by the RAMS program. It states in very simple terms what my general goals are for this summer.
This archive contains my daily notes from this internship.
With the growing problem of data overabundance, many techniques for knowledge discovery have been formulated. Document clustering is one such method. It aims to group documents into meaningful categories that allow the user to quickly browse to pertinent information. Current techniques employ a word-by-word comparison of each document to be clustered. Certain “stop words” are excluded from this process, because determiners, conjunctions, prepositions, and pronouns add little information to a sentence. Even so, the comparison can be very computationally expensive. We propose a new method of feature selection for document clustering that allows us to perform fewer comparisons. Proprietary issues require me to be no more descriptive than that at this time. Preliminary results suggest a 9% increase in accuracy over the established technique, as well as an 82% reduction in comparisons. Because only the most significant words are used for clustering, documents are more likely to be clustered correctly. This drastic reduction in comparisons also means that the technique is much less computationally expensive.
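To make the baseline concrete, here is a minimal sketch of stop-word removal followed by a word-overlap comparison between documents. It illustrates only the established approach described above, not the proposed feature selection method, which is not described here. The stop-word list, the helper names `content_words` and `jaccard`, the sample documents, and the choice of Jaccard overlap as the similarity measure are all assumptions made for illustration.

```python
# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "on", "to", "it", "is"}


def content_words(text):
    """Lowercase the text, strip surrounding punctuation, and drop stop words."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}


def jaccard(a, b):
    """Word-overlap similarity between two word sets, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Made-up sample documents, for illustration only.
docs = [
    "The cat sat on the mat.",
    "A cat is on the mat.",
    "Stock prices fell in the market.",
]

# Each document is reduced to its set of non-stop words before comparison.
features = [content_words(d) for d in docs]

sim_01 = jaccard(features[0], features[1])  # two documents about the same topic
sim_02 = jaccard(features[0], features[2])  # two unrelated documents
print(sim_01, sim_02)
```

In this sketch the cost of each comparison grows with the number of words kept per document, which is why selecting fewer, more significant features can reduce the total work.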