edu.stanford.nlp.tagger.maxent
Class Train

java.lang.Object
  extended by edu.stanford.nlp.tagger.maxent.Train

public class Train
extends java.lang.Object

This class is used to train a POS tagger from the command line. Options are specified via a properties file and command line arguments. Simple usage:

java edu.stanford.nlp.tagger.maxent.Train -file <inputfile> -model <model prefix>
This generates a set of files with the prefix <model prefix>, which together make up a model trained on <inputfile>. There are many training options. They can be specified on the command line, but the easiest way to manage them is a properties file, which is passed in with the -props argument. First, generate a default properties file with "-genprops":
java edu.stanford.nlp.tagger.maxent.Train -genprops > <properties file>
Edit the file; the comments in it document each option. Then start the training procedure:
java edu.stanford.nlp.tagger.maxent.Train -props <properties file> -file <inputfile> -model <model prefix>
Any parameter from the properties file can be overridden on the command line. The final configuration is stored as <model prefix>.props.
The training file should be in the following format: one word and one tag per line, separated by a space or a tab. Each sentence should end in an EOS word-tag pair. (Actually, I'm not entirely sure that is still the case, but it probably won't hurt. -wmorgan)
If you need to add a list of closed-class tags for a new language, do so in TTags (and update the documentation in TaggerConfig).
Once trained, you can test your model's performance with Test and tag data with MaxentTagger.
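The training file format described above (one word and one tag per line, each sentence closed by an EOS pair) can be illustrated with a small sketch. This is not part of the tagger: the class name TrainingFileSketch and the method formatSentence are hypothetical, and the literal "EOS" used for both the word and the tag of the end-of-sentence pair is an assumption, since the exact marker is not given here. A tab is used as the separator; a space is also allowed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of edu.stanford.nlp.tagger.maxent.
class TrainingFileSketch {
    // Assumption: the end-of-sentence pair is the literal word "EOS" with tag "EOS".
    static final String EOS_WORD = "EOS";
    static final String EOS_TAG = "EOS";

    /** Returns one "word<TAB>tag" line per token, followed by the EOS pair. */
    static List<String> formatSentence(String[] words, String[] tags) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("each word needs exactly one tag");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            lines.add(words[i] + "\t" + tags[i]);
        }
        // Close the sentence with the (assumed) EOS word-tag pair.
        lines.add(EOS_WORD + "\t" + EOS_TAG);
        return lines;
    }

    public static void main(String[] args) {
        String[] words = {"The", "dog", "barks", "."};
        String[] tags = {"DT", "NN", "VBZ", "."};
        // Print the lines; redirect stdout to a file to use it as <inputfile>.
        for (String line : formatSentence(words, tags)) {
            System.out.println(line);
        }
    }
}
```

Redirecting the output of this sketch to a file gives something that could be passed as <inputfile> to the commands above, provided the EOS assumption matches what the tagger expects.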


Method Summary
static void main(java.lang.String[] args)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws:
java.lang.Exception