August 10th

  • Attended final ceremonies for the RAMS program
  • Was dismissed!

August 9th

  • Made final revisions on the technical report and turned it in for consideration

August 8th

  • Listened to other students give their final presentations
  • Gave my final presentation
  • Continued making the suggested revisions on the technical report
  • Presented my poster at the poster session

August 6th

  • Turned in rough draft of technical report to both Cathy and Robert
  • Worked on the final presentation to be given on Wednesday
  • Continued work on the technical report

August 3rd

  • Worked on the final technical report
  • Met with Debbie McCoy to discuss the problems with both my presentation and poster
  • Rewrote poster for Debbie McCoy

August 2nd

  • Met with both Robert and Cathy to discuss the final days of the project and the plans
  • Continued work on the technical report

August 1st

  • Designed the new cluster decider
  • Implemented this algorithm on the test set
  • Calculated F-Measure for this result
  • Met with Cathy to discuss this result
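
For reference, a minimal sketch of the balanced F-Measure (F1) computed from raw counts. The log does not record which F-Measure variant or which counts were used for the clustering result, so the class name `FMeasureSketch` and the true-positive/false-positive/false-negative framing are assumptions made for illustration.

```java
class FMeasureSketch {
    /**
     * Balanced F-measure (F1): the harmonic mean of precision and recall.
     * tp, fp, fn are hypothetical counts of true positives, false positives,
     * and false negatives; how they were derived in the project is not stated.
     */
    static double fMeasure(int tp, int fp, int fn) {
        if (tp == 0) {
            return 0.0; // no correct items: precision and recall are both zero
        }
        double precision = (double) tp / (tp + fp);
        double recall = (double) tp / (tp + fn);
        return 2.0 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // precision = 0.8, recall = 0.5
        System.out.println(fMeasure(8, 2, 8));
    }
}
```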

July 31st

  • Realized that we had a drastic improvement in the clustering results
  • Investigated possible performance measures for this result
  • Worked on the final technical report for this project

July 30th

  • Continued analyzing our results
  • Came up with the idea to weight the TF-ICF less, in order to pick out more anomalies
  • Implemented this idea and let it run overnight

July 27th

  • New anomalies had the same problems as the old ones

July 26th

  • Added clusterDecider() to UPGMA algorithm - this method splits up the UPGMA tree into however many clusters you like!
  • Saw word clusters for the very first time
  • Realized there was a problem with the IDF equation
  • Set AnomalyFinder to run overnight and calculate the new anomalies based on the new TF-IDF data
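
One way a method like clusterDecider() could split a UPGMA tree into a requested number of clusters is sketched below. This is not the project's actual code: the `Node` class, the `cut` method, and the "split the tallest merge first" strategy are assumptions for illustration. Because merge heights in a UPGMA tree never increase from parent to child, repeatedly splitting the tallest remaining merge is equivalent to cutting the tree at a single height.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class ClusterCutSketch {
    /** Node of a binary UPGMA tree (hypothetical structure). */
    static final class Node {
        final String label;  // non-null only for leaves
        final Node left;
        final Node right;
        final double height; // distance at which the two children were merged

        Node(String label) {
            this.label = label;
            this.left = null;
            this.right = null;
            this.height = 0.0;
        }

        Node(Node left, Node right, double height) {
            this.label = null;
            this.left = left;
            this.right = right;
            this.height = height;
        }

        boolean isLeaf() {
            return left == null;
        }
    }

    /** Splits the tree into at most k clusters, undoing the tallest merges first. */
    static List<List<String>> cut(Node root, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        // Internal nodes sort before leaves; among internal nodes, tallest first.
        Comparator<Node> tallestFirst = (a, b) -> {
            if (a.isLeaf() != b.isLeaf()) {
                return a.isLeaf() ? 1 : -1;
            }
            return Double.compare(b.height, a.height);
        };
        PriorityQueue<Node> open = new PriorityQueue<>(tallestFirst);
        open.add(root);
        // Stop when there are k subtrees or only leaves remain.
        while (open.size() < k && !open.peek().isLeaf()) {
            Node top = open.poll();
            open.add(top.left);
            open.add(top.right);
        }
        List<List<String>> clusters = new ArrayList<>();
        for (Node subtree : open) {
            List<String> leaves = new ArrayList<>();
            collectLeaves(subtree, leaves);
            clusters.add(leaves);
        }
        return clusters;
    }

    private static void collectLeaves(Node n, List<String> out) {
        if (n.isLeaf()) {
            out.add(n.label);
            return;
        }
        collectLeaves(n.left, out);
        collectLeaves(n.right, out);
    }
}
```

Asking for more clusters than there are leaves simply returns one cluster per leaf. The order of the returned clusters is not specified.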

July 25th

  • Spent the majority of the day making the DHS poster
  • Also spent some time working on my own poster
  • Attended the group meeting to discuss progress

July 24th

  • Spent most of the day tracking down little bugs in the UPGMA clustering algorithm
  • Got the UPGMA clustering algorithm working!! That was cool
  • Wrote a new global frequency finder to sift through multiple corpus tables for the corpus global frequencies
  • Started the corpusGFFinder running on the simple noun phrase test results and corpus

July 23rd

  • Finished implementation, and began debugging UPGMA clustering algorithm
  • Met with Robert and Jim about helping out with the DHS poster
  • Ran simple noun phrase chunker on the test data
  • Ran complex noun phrase chunker on the test data
  • Reviewed a paper for submission

July 20th

  • Began implementing the UPGMA clustering algorithm
  • Finished merging tables for simple noun phrases
  • Began gathering TF data for simple noun phrases

July 19th

  • Spent most of the day preparing application to the Richard Tapia conference
  • Researched clustering algorithms to work with our document set

July 18th

  • Wrote my abstract for submission to RAMS
  • Reworked my presentation according to the new template

July 17th

  • Made the rough draft for my final poster
  • Wrote the rough draft of my final presentation

July 16th

  • Spent most of the day tracking down a bug in the Euclidean distance algorithm
  • Generated more Euclidean distance matrices... I think we finally have it right!
  • Met with Cathy and Robert to discuss the project
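
A minimal sketch of how a Euclidean distance matrix can be generated from a set of equal-length feature vectors. The class and method names are hypothetical, and the log does not say how documents were turned into vectors, so plain `double[]` rows are assumed here.

```java
class EuclideanSketch {
    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Symmetric matrix of pairwise distances; the diagonal stays at zero. */
    static double[][] distanceMatrix(double[][] vectors) {
        int n = vectors.length;
        double[][] matrix = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double d = distance(vectors[i], vectors[j]);
                matrix[i][j] = d; // fill both halves so lookups work in either order
                matrix[j][i] = d;
            }
        }
        return matrix;
    }
}
```

Only the upper triangle is computed, which halves the work on a large document set.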

July 13th

  • Spent the ENTIRE day preparing the rough draft for my final presentation

July 12th

  • Attended the optional lecture on "adaptive mesh refinement for the calculation of thermal flux in a nuclear reactor"... or something like that!
  • Continued working on the Euclidean distance measure
  • Generated Euclidean distance matrices
  • Continued assembling noun phrase data into tables
  • Restarted the CNPEs on several machines

July 11th

  • Continued to refine the overlap function/metric to include weighting
  • Began assembling all of the simple noun phrase data into several large tables
  • Began working on the Euclidean distance measure for the data
  • Attended group progress meeting
  • Attended the required lecture "Abstracts, Papers, and Posters"

July 10th

  • Came up with overlap performance metric
  • Met with Cathy to determine additional metrics and to discuss preliminary results
  • Programmed the overlap performance metric
  • Generated similarity matrix using the overlap metric
  • Reread TF-ICF: A New Term Weighting Scheme for Clustering Dynamic Data Streams

July 9th

  • Finished anomaly locator
  • Ran the locator on the test set
  • Graphed the anomaly data
  • Wrote progress report for Cathy and Robert

July 6th

  • Began development of the Anomaly Finder program
  • Started to formulate program structure for the final product
  • Decided on one possible performance metric

July 5th

  • Modified the word parser program to read from and write to the database
  • Wrote the GFfinder program to calculate global frequencies for the test data
  • Ran the test data
  • Graphed the test data vs. the corpus data
  • Formulated a more telling graph for demonstrative purposes

July 4th

  • Holiday!

July 3rd

  • Prepared notes for the morning meeting
  • Met with Cathy and Robert to talk about the direction of the project
  • Decided on the next direction and reviewed the timeline
  • Located test data for the project

July 2nd

  • Finally finished TFfinder and put together an Excel document
  • Put together notebook on project
  • Took notes on various possible evaluation metrics for the project
  • Read and took notes on Experiments in Single and Multi-Document Summarization Using MEAD

June 29th

  • First graph was supposed to be done today, but TFfinder crashed overnight, so it will have to be done by Monday
  • Read and took notes on:
    • Evaluation of Phrase-Representation Summarization based on Information Retrieval Task
    • The Smart/Empire TIPSTER IR System
    • Using Names and Topics for New Event Detection
    • A System for New Event Detection
    • Recent Developments in Text Summarization

June 28th

  • Read the following papers:
    • Using Lines from the Body of Document and Key Words for Improving Display of Search Results in Information Retrieval Systems
    • A Method for Improving Automatic Word Categorization
    • Term-Weighting Approaches in Automatic Text Retrieval
    • Identifying the Subject of Documents in Digital Libraries Automatically Using Frequently-Occurring Words - Study and Findings
    • Stochastic Models for Surface Information Extraction in Texts
    • Text-Learning and Related Intelligent Agents: A Survey

June 27th

  • Reworked my PowerPoint presentation
  • Began gathering papers on the subject of text summarization
  • Did online research in this area
  • Gave PowerPoint presentation to the group to get them up to speed on my project
  • Attended the Big Bang seminar in Wigner

June 26th

  • Optimized term frequency database by applying an index
  • Attended brownbag lunch "Ethics in Computing"
  • Began gathering papers on the subject of term frequency extraction

June 25th

  • Reorganized the data into alphabetical order and recalculated IDF
  • Cleared up some misconceptions with Dr. Jiao
  • Spent a while trying to get the data in the right place for easy manipulation
  • Wrote a program (TFfinder.java) to extract term frequencies for the tokens included in the global frequency table - it also calculates the average term frequency for each token and stores it in the database

June 22nd

  • Met with Dr. Jiao to go over what has been done on the project while she was away
  • Attended website meeting to get criticism/commentary
  • Started to gather data for the first graph
  • Calculated IDF for the 229,023 significant terms collected
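
For reference, a minimal sketch of the textbook inverse document frequency, idf(t) = ln(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. This is not necessarily the equation used in the project (the log does not give it), and the class name `IdfSketch` and the in-memory list-of-tokens input are assumptions; the project stored its data in a database.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

class IdfSketch {
    /** Returns idf(t) = ln(N / df(t)) for every term that occurs in docs. */
    static Map<String, Double> idf(List<List<String>> docs) {
        // Document frequency: count each term once per document.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        int n = docs.size();
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Integer> entry : df.entrySet()) {
            result.put(entry.getKey(), Math.log((double) n / entry.getValue()));
        }
        return result;
    }
}
```

Under this definition a term that appears in every document gets a weight of zero, and rarer terms get larger weights.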

June 21st

  • Read documentation for the Greenwood parser to better understand the rule syntax
  • Worked out several new rules to increase the accuracy of the parser, particularly on cardinal numbers
  • Tested, debugged, and applied new rules
  • Divided the remainder of the 1,000,000 documents into smaller sets for parallelization
  • Fixed a small problem with the parser and reinitialized the crashed instances
  • Began development on the common expressions matcher

June 20th

  • Spent the majority of the day improving accuracy of the Greenwood parser as well as getting it ready to run on other machines
  • Set up the parser on Robert's machine, two Tigershark machines, Paul's, Jim's, and, of course, my own
  • Attended "Fat's where it's at" seminar at 3:30

June 19th

  • Spent the majority of the day getting the Greenwood parser to take input in the correct format
  • Met with Robert to talk about status and direction
  • Successfully got the parser to both read from and write to the correct databases

June 18th

  • Worked to get the output of the Greenwood parser into the appropriate format (90% of the day)
  • Set up all the necessary libraries for communication with our databases
  • Learned a great deal about the Eclipse IDE

June 15th

  • Ran tests with both chunkers to determine which one we should use (90% of the day)
  • Read documentation on Mark Greenwood's chunker

June 14th

  • Finished citing all sources using IEEE standards
  • Found/Downloaded yet another Java-based NP chunker
  • Tested chunker to run with the XML input from our POS tagger

June 13th

  • Wrote side-by-side comparison of all NPE methods
  • Made finishing touches on abridged presentation
  • Began to make reference list using IEEE standards
  • Found/Downloaded a Java-based NP chunker
  • Attended group meeting
  • Attended "Biological Interfacing with Nanostructured Materials"
  • Worked on my webpage for RAMS
    • Completed mentor page

June 12th

  • Met with Dr. Jiao to determine the exact schedule that my project would follow
  • Wrote the project proposal
  • Fixed key problems with the presentation
  • Worked on my webpage for RAMS
    • Completed main page
    • Completed projects page
    • Completed links page
    • Completed contact page

June 11th

  • Added a side-by-side comparison of each NPE method to the presentation
  • Added finishing touches to the presentation
  • Presented NPE PowerPoint to the group (1 hour 15 minutes)
  • Investigated possible complex NPE open source chunkers
  • Met with Dr. Jiao to shave down the presentation for Wednesday's group meeting

June 8th

  • Continued slides on Memory-based machine learning method
  • Added slides on Conditional Random Field machine learning method
  • Added slides on Support Vector Machine machine learning method
  • Met with Chris Symons to discuss Conditional Random Fields
  • Read/took notes on the following papers:
    • (Reread) Chunking with Support Vector Machines
    • (Reread) A Memory-Based Approach to Learning Shallow Natural Language Patterns
    • (Reread) Shallow Parsing with Conditional Random Fields

June 7th

  • Attended the "Mouse House" tour
  • Met with Dr. Jiao to discuss due dates, guidelines, and project revisions
  • Added slides on lexical corpora & noun phrase definition
  • Continued slides on HMM
  • Continued slides on FSAs
  • Added slides on Memory-based machine learning method
  • Read/took notes on the following papers:
    • A Memory-Based Approach to Learning Shallow Natural Language Patterns
    • Memory-Based Shallow Parsing (Daelemans, Buchholz, Veenstra)
    • Memory-Based Shallow Parsing (Sang)

June 6th

  • Attended "Defeating Terrorism with Technology" lecture
  • Added more slides on Hidden Markov Model
  • Met with Dr. Jiao to determine research plan changes
  • Compiled notebook "Collected Works on Noun Phrase Extraction"
  • Attended group research meeting
  • Read/took notes on the following papers:
    • Shallow Parsing with Conditional Random Fields
    • (Reread) Chunking with Support Vector Machines
    • (Reread) Shallow Parsing using Specialized HMMs

June 5th

  • Added slides on Hidden Markov Model
  • Added slides on Transformation-based machine learning
  • Attended HTML workshop with Cindy Latham
  • Discussed Hidden Markov Model application to NPE with Chris Symons
  • Read/took notes on the following papers:
    • A Maximum Entropy Approach to Natural Language Processing
    • Text Chunking using Transformation-Based Learning

June 4th

  • Met with Dr. Potok to discuss questions about clustering algorithms
  • Met with Dr. Jiao to discuss questions about the role of machine learning in NPE
  • Met with Chris Symons to discuss questions about Support Vector Machines
  • Requested the book An Introduction to Support Vector Machines and Other Kernel-based Learning Methods at the library
  • Downloaded TinySVM and libSVM for research
  • Continued work on PowerPoint presentation Noun Phrase Extraction: An Analysis of Modern Techniques
    • Added slides on simple rule-based methods
    • Added slides on LR parsing
  • Read/took notes on the following papers:
    • A Practical Guide to Support Vector Classification
    • Chunking with Support Vector Machines
    • Nymble: a High-Performance Learning Name-finder

June 1st

  • Met with Dr. Potok to discuss various corpora and their role in machine learning
  • Met with Chris Symons to discuss current techniques of noun phrase extraction and to learn the history of the field
  • Researched the Brown and CoNLL-2000 corpora
  • Began work on PowerPoint presentation Noun Phrase Extraction: An Analysis of Modern Techniques
  • Read/took notes on the following papers:
    • Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger
    • Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and Its Automatic Evaluation

May 31st

  • Attended the Piranha demo
  • Consulted with linguist to gain a better understanding of the problem
  • Researched Hidden Markov Model and Maximum Entropy model part-of-speech taggers
  • Read/took notes on the following papers:
    • A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text
    • Retrieving Descriptive Phrases from Large Amounts of Free Text
    • Structural Ambiguity and Lexical Relations
    • A Probabilistic Parser
    • A Machine Learning Approach to Coreference Resolution of Noun Phrases

May 30th

  • Decided on topic and syllabus with Dr. Jiao
  • Attended progress meeting
  • Completed all necessary training
  • Compiled a list of applicable papers
  • Read/took notes on Surface Grammatical Analysis for the Extraction of Terminological Noun Phrases