August 10th
- Attended final ceremonies for the RAMS program
- Was dismissed!
August 9th
- Made final revisions on the technical report and turned it in for consideration
August 8th
- Listened to other students give their final presentations
- Gave my final presentation
- Continued making the suggested revisions on the technical report
- Presented my poster at the poster session
August 6th
- Turned in rough draft of technical report to both Cathy and Robert
- Worked on the final presentation to be given on Wednesday
- Continued work on the technical report
August 3rd
- Worked on the final technical report
- Met with Debbie McCoy to discuss the problems with both my presentation and poster
- Rewrote poster for Debbie McCoy
August 2nd
- Met with both Robert and Cathy to discuss the final days of the project and the plans
- Continued work on the technical report
August 1st
- Designed the new cluster decider
- Implemented this algorithm on the test set
- Calculated F-Measure for this result
- Met with Cathy to discuss this result
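The F-Measure calculation mentioned above can be sketched as follows. The log does not record how the measure was applied to the cluster results, so this assumes the standard definition built from precision and recall over raw counts. The class and method names are hypothetical and are not the project's code.

```java
// Hypothetical sketch: the standard F-measure from raw counts.
// How the project mapped clusters to these counts is not stated in the log.
final class FMeasureSketch {
    /**
     * F_beta = (1 + beta^2) * P * R / (beta^2 * P + R).
     * beta = 1 gives the harmonic mean of precision and recall.
     */
    static double fMeasure(int truePos, int falsePos, int falseNeg, double beta) {
        if (truePos == 0) {
            return 0.0; // nothing correct: precision and recall are both zero
        }
        double precision = (double) truePos / (truePos + falsePos);
        double recall = (double) truePos / (truePos + falseNeg);
        double b2 = beta * beta;
        return (1 + b2) * precision * recall / (b2 * precision + recall);
    }

    public static void main(String[] args) {
        // 8 correct, 2 spurious, 4 missed: precision 0.8, recall about 0.667
        System.out.println(fMeasure(8, 2, 4, 1.0));
    }
}
```

With beta set to 1, precision and recall are weighted equally; a larger beta favors recall.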
July 31st
- Realized that we had a drastic improvement in the clustering results
- Investigated possible performance measures for this result
- Worked on the final technical report for this project
July 30th
- Continued analyzing our results
- Came up with the idea to weight the TF-ICF less, in order to pick out more anomalies
- Implemented this idea and let it run overnight
July 27th
- New anomalies had the same problems as the old ones
July 26th
- Added clusterDecider() to UPGMA algorithm - this method splits up the UPGMA tree into however many clusters you like!
- Saw word clusters for the very first time
- Realized problem with IDF equation.
- Set AnomalyFinder to run overnight and calculate the new anomalies based on the new TF-IDF data
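The idea of splitting a UPGMA tree into a chosen number of clusters can be sketched as below. This is a minimal illustration, not the project's clusterDecider(): it assumes a precomputed symmetric distance matrix and simply stops the average-linkage merging once k clusters remain, which for UPGMA yields the same groups as cutting the finished tree at that level. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of UPGMA (average-linkage) clustering that stops at k clusters.
final class UpgmaSketch {
    /** Average pairwise distance between the members of two clusters (the UPGMA linkage). */
    static double averageDistance(List<Integer> a, List<Integer> b, double[][] dist) {
        double sum = 0.0;
        for (int i : a) {
            for (int j : b) {
                sum += dist[i][j];
            }
        }
        return sum / (a.size() * b.size());
    }

    /** Repeatedly merges the two closest clusters until only k remain. */
    static List<List<Integer>> cluster(double[][] dist, int k) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < dist.length; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i); // every item starts in its own cluster
            clusters.add(single);
        }
        while (clusters.size() > Math.max(k, 1)) {
            int bestA = 0;
            int bestB = 1;
            double best = Double.POSITIVE_INFINITY;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double d = averageDistance(clusters.get(a), clusters.get(b), dist);
                    if (d < best) {
                        best = d;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            // bestB > bestA, so removing bestB leaves the index of bestA valid
            clusters.get(bestA).addAll(clusters.remove(bestB));
        }
        return clusters;
    }

    public static void main(String[] args) {
        double[][] dist = {
            {0, 1, 10, 10},
            {1, 0, 10, 10},
            {10, 10, 0, 2},
            {10, 10, 2, 0}
        };
        System.out.println(cluster(dist, 2));
    }
}
```

Recomputing every average from the original matrix keeps the sketch short at the cost of speed; a production version would update linkage distances incrementally.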
July 25th
- Spent the majority of the day making the DHS poster
- Also spent some time working on my own poster
- Attended the group meeting to discuss progress
July 24th
- Spent most of the day tracking down little bugs in the UPGMA clustering algorithm
- Got the UPGMA clustering algorithm working!! That was cool
- Wrote a new global frequency finder to sift through multiple corpus tables for the corpus global frequencies
- Started the corpusGFFinder running on the simple noun phrase test results and corpus
July 23rd
- Finished implementation, and began debugging UPGMA clustering algorithm
- Met with Robert and Jim about helping out with the DHS poster
- Ran simple noun phrase chunker on the test data
- Ran complex noun phrase chunker on the test data
- Reviewed a paper for submission
July 20th
- Began implementing the UPGMA clustering algorithm
- Finished merging tables for simple noun phrases
- Began gathering TF data for simple noun phrases
July 19th
- Spent most of the day preparing application to the Richard Tapia conference
- Researched clustering algorithms to work with our document set
July 18th
- Wrote my abstract for submission to the RAMS
- Reworked my presentation according to the new template
July 17th
- Made the rough draft for my final poster
- Wrote the rough draft of my final presentation
July 16th
- Spent most of the day tracking down a bug in the Euclidean distance algorithm
- Generated more Euclidean distance matrices... I think we finally have it right!
- Met with Cathy and Robert to discuss the project
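A Euclidean distance matrix of the kind mentioned above can be sketched as follows. The log does not say how documents were represented, so this assumes each document is a fixed-length vector of term weights; the class and method names are hypothetical.

```java
// Hypothetical sketch: pairwise Euclidean distances between equal-length vectors.
final class EuclideanSketch {
    /** Straight-line distance between two vectors of the same length. */
    static double distance(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Symmetric matrix of distances; the diagonal stays at zero. */
    static double[][] distanceMatrix(double[][] vectors) {
        int n = vectors.length;
        double[][] matrix = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double d = distance(vectors[i], vectors[j]);
                matrix[i][j] = d;
                matrix[j][i] = d; // fill both halves so lookups work in either order
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        double[][] docs = {{0, 0}, {3, 4}};
        System.out.println(distanceMatrix(docs)[0][1]);
    }
}
```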
July 13th
- Spent the ENTIRE day preparing the rough draft for my final presentation
July 12th
- Attended the optional lecture on "adaptive mesh refinement for the calculation of thermal flux in a nuclear reactor"... or something like that!
- Continued working on the Euclidean distance measure
- Generated Euclidean distance
- Continued assembling noun phrase data into tables
- Restarted CNPE's on several machines
July 11th
- Continued to refine the overlap function/metric to include weighting
- Began assembling all of the simple noun phrase data into several large tables
- Began working on the Euclidean distance measure for the data
- Attended group progress meeting
- Attended the required lecture "Abstracts, Papers, and Posters"
July 10th
- Came up with overlap performance metric
- Met with Cathy to determine additional metrics and to discuss preliminary results
- Programmed the overlap performance metric
- Generated similarity matrix using the overlap metric
- Reread TF-ICF: A New Term Weighting Scheme for Clustering Dynamic Data Streams
July 9th
- Finished anomaly locator
- Ran the locator on the test set
- Graphed the anomaly data
- Wrote progress report for Cathy and Robert
July 6th
- Began development of the Anomaly Finder program
- Started to formulate program structure for the final product
- Decided on one possible performance metric
July 5th
- Modified the word parser program to read from and write to the database
- Wrote the GFfinder program to calculate global frequencies for the test data
- Ran the test data
- Graphed the test data vs. the corpus data
- Formulated a more telling graph for demonstrative purposes
July 4th
July 3rd
- Prepared notes for the morning meeting
- Met with Cathy and Robert to talk about the direction of the project
- Decided on the next direction and reviewed the timeline
- Located test data for the project
July 2nd
- Finally finished TFfinder and put together an Excel document
- Put together notebook on project
- Took notes on various possible evaluation metrics for the project
- Read and took notes on Experiments in Single and Multi-Document Summarization Using MEAD
June 29th
- First graph was supposed to be done today, but TFfinder crashed overnight, so it will have to be done by Monday
- Read and took notes on:
- Evaluation of Phrase-Representation Summarization based on Information Retrieval Task
- The Smart/Empire TIPSTER IR System
- Using Names and Topics for New Event Detection
- A System for New Event Detection
- Recent Developments in Text Summarization
June 28th
- Read the following papers:
- Using Lines from the Body of Document and Key Words for Improving Display of Search Results in Information Retrieval Systems
- A Method for Improving Automatic Word Categorization
- Term-Weighting Approaches in Automatic Text Retrieval
- Identifying the Subject of Documents in Digital Libraries Automatically Using Frequently-Occurring Words - Study and Findings
- Stochastic Models for Surface Information Extraction in Texts
- Text-Learning and Related Intelligent Agents: A Survey
June 27th
- Reworked my PowerPoint presentation
- Began gathering papers on the subject of text summarization
- Did online research in this area
- Gave PowerPoint presentation to the group to get them up to speed on my project
- Attended the Big Bang seminar in Wigner
June 26th
- Optimized term frequency database by applying an index
- Attended brownbag lunch "Ethics in Computing"
- Began gathering papers on the subject of term frequency extraction
June 25th
- Reorganized the data into alphabetical order and recalculated IDF
- Cleared up some misconceptions with Dr. Jiao
- Spent a while trying to get the data in the right place for easy manipulation
- Wrote a program (TFfinder.java) to extract term frequencies for the tokens included in the global frequency table; it also calculates the average term frequency for each token and stores it in the database
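The term-frequency extraction described above can be sketched in memory as follows. This is not TFfinder.java itself: it skips the database entirely, and it assumes the average is taken over the documents in which a token appears, which the log does not specify. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of per-document and average term frequencies.
final class TermFrequencySketch {
    /** Counts how often each token occurs in one document. */
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : tokens) {
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    /** Average count per token, taken over the documents that contain it (an assumption). */
    static Map<String, Double> averageTermFrequencies(List<List<String>> documents) {
        Map<String, Integer> totals = new HashMap<>();
        Map<String, Integer> docCounts = new HashMap<>();
        for (List<String> doc : documents) {
            for (Map.Entry<String, Integer> e : termFrequencies(doc).entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
                docCounts.merge(e.getKey(), 1, Integer::sum);
            }
        }
        Map<String, Double> averages = new HashMap<>();
        for (String term : totals.keySet()) {
            averages.put(term, totals.get(term).doubleValue() / docCounts.get(term));
        }
        return averages;
    }

    public static void main(String[] args) {
        List<List<String>> docs = java.util.Arrays.asList(
            java.util.Arrays.asList("a", "b", "a"),
            java.util.Arrays.asList("a", "c"));
        System.out.println(averageTermFrequencies(docs));
    }
}
```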
June 22nd
- Met with Dr. Jiao to go over what has been done on the project while she was away
- Attended website meeting to get criticism/commentary
- Started to gather data for the first graph
- Calculated IDF for the 229,023 significant terms collected
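For reference, an IDF calculation can be sketched as below. The exact equation used in the project is not given in the log, so this assumes the common unsmoothed form idf(t) = ln(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. The names are hypothetical.

```java
// Hypothetical sketch of the common unsmoothed IDF formula: ln(N / df).
final class IdfSketch {
    /**
     * @param totalDocs number of documents in the collection (N)
     * @param docFreq   number of documents containing the term (df), 1..N
     */
    static double idf(int totalDocs, int docFreq) {
        if (docFreq <= 0 || docFreq > totalDocs) {
            throw new IllegalArgumentException("docFreq must be between 1 and totalDocs");
        }
        return Math.log((double) totalDocs / docFreq);
    }

    public static void main(String[] args) {
        // A term found in 10 of 1,000 documents scores higher than one found in 500.
        System.out.println(idf(1000, 10));
        System.out.println(idf(1000, 500));
    }
}
```

A term that appears in every document gets an IDF of zero under this form, which is why rare terms dominate the weighting.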
June 21st
- Read documentation for the Greenwood parser to better understand the rule syntax
- Worked out several new rules to increase the accuracy of the parser, particularly on cardinal numbers
- Tested, debugged, and applied new rules
- Divided the remainder of the 1,000,000 documents into smaller sets for parallelization
- Fixed a small problem with the parser and reinitialized the crashed instances
- Began development on the common expressions matcher
June 20th
- Spent the majority of the day improving accuracy of the Greenwood parser as well as getting it ready to run on other machines
- Set up the parser on Robert's machine, two Tigershark machines, Paul's, Jim's, and of course my own.
- Attended "Fat's where it's at" seminar at 3:30
June 19th
- Spent the majority of the day getting the Greenwood parser to take input in the correct format
- Met with Robert to talk about status and direction
- Successfully got the parser to both read from and write to the correct databases
June 18th
- Worked to get the output of the Greenwood parser into the appropriate format (90% of the day)
- Set up all the necessary libraries for communication with our databases
- Learned a great deal about the Eclipse IDE
June 15th
- Ran tests with both chunkers to determine which one we should use (90% of the day)
- Read documentation on Mark Greenwood's chunker
June 14th
- Finished citing all sources using IEEE standards
- Found/Downloaded yet another Java-based NP chunker
- Tested chunker to run with the XML input from our POS tagger
June 13th
- Wrote side-by-side comparison of all NPE methods
- Made finishing touches on abridged presentation
- Began to make reference list using IEEE standards
- Found/Downloaded a Java-based NP chunker
- Attended group meeting
- Attended "Biological Interfacing with Nanostructured Materials"
- Worked on my webpage for RAMS
June 12th
- Met with Dr. Jiao to determine the exact schedule that my project would follow
- Wrote the project proposal
- Fixed key problems with the presentation.
- Worked on my webpage for RAMS
- Completed main page
- Completed projects page
- Completed links page
- Completed contact page
June 11th
- Added a side-by-side comparison of each NPE method to the presentation
- Added finishing touches to the presentation.
- Presented NPE PowerPoint to the group (1 hour 15 minutes)
- Investigated possible complex NPE open source chunkers
- Met with Dr. Jiao to shave down the presentation for Wednesday's group meeting
June 8th
- Continued slides on Memory-based machine learning method
- Added slides on Conditional Random Field machine learning method
- Added slides on Support Vector Machine machine learning method
- Met with Chris Symons to discuss Conditional Random Fields
- Read/took notes on the following papers:
- (Reread) Chunking with Support Vector Machines
- (Reread) A Memory-Based Approach to Learning Shallow Natural Language Patterns
- (Reread) Shallow Parsing with Conditional Random Fields
June 7th
- Attended the "Mouse House" tour
- Met with Dr. Jiao to discuss due dates, guidelines, and project revisions
- Added slides on lexical corpora & noun phrase definition
- Continued slides on HMM
- Continued slides on FSA's
- Added slides on Memory-based machine learning method
- Read/took notes on the following papers:
- A Memory-Based Approach to Learning Shallow Natural Language Patterns
- Memory-Based Shallow Parsing (Daelemans, Buchholz, Veenstra)
- Memory-Based Shallow Parsing (Sang)
June 6th
- Attended "Defeating Terrorism with Technology" lecture
- Added more slides on Hidden Markov Model
- Met with Dr. Jiao to determine research plan changes
- Compiled notebook "Collected Works on Noun Phrase Extraction"
- Attended group research meeting
- Read/took notes on the following papers:
- Shallow Parsing with Conditional Random Fields
- (Reread) Chunking with Support Vector Machines
- (Reread) Shallow Parsing using Specialized HMMs
June 5th
- Added slides on Hidden Markov Model
- Added slides on Transformation-based machine learning
- Attended HTML workshop with Cindy Latham
- Discussed Hidden Markov Model application to NPE with Chris Symons
- Read/took notes on the following papers:
- A Maximum Entropy Approach to Natural Language Processing
- Text Chunking using Transformation-Based Learning
June 4th
- Met with Dr. Potok to discuss questions about clustering algorithms
- Met with Dr. Jiao to discuss questions about the role of machine learning in NPE
- Met with Chris Symons to discuss questions about Support Vector Machines
- Requested the book An Introduction to Support Vector Machines and Other Kernel-based Learning Methods at the library
- Downloaded TinySVM and libSVM for research
- Continued work on PowerPoint presentation Noun Phrase Extraction: An Analysis of Modern Techniques
- Added slides on Simple Rule-based
- Added slides on LR parsing
- Read/took notes on the following papers:
- A Practical Guide to Support Vector Classification
- Chunking with Support Vector Machines
- Nymble: a High-Performance Learning Name-finder
June 1st
- Met with Dr. Potok to discuss various corpora and their role in machine learning
- Met with Chris Symons to discuss current techniques of noun phrase extraction and to gain a knowledge of the history of the field
- Researched the Brown and CoNLL-2000 corpora
- Began work on PowerPoint presentation Noun Phrase Extraction: An Analysis of Modern Techniques
- Read/took notes on the following papers:
- Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger
- Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and Its Automatic Evaluation
May 31st
- Attended the Piranha demo
- Consulted with linguist to gain a better understanding of the problem
- Researched Hidden Markov Model and Maximum Entropy model part-of-speech taggers
- Read/took notes on the following papers:
- A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text
- Retrieving Descriptive Phrases from Large Amounts of Free Text
- Structural Ambiguity and Lexical Relations
- A Probabilistic Parser
- A Machine Learning Approach to Coreference Resolution of Noun Phrases
May 30th
- Decided on topic and syllabus with Dr. Jiao
- Attended progress meeting
- Completed all necessary training
- Compiled a list of applicable papers
- Read/took notes on Surface Grammatical Analysis for the Extraction of Terminological Noun Phrases