Saturday, September 5, 2009

Enhancing translation with the Stanford POS tagger

It helps to use morphological and syntactic information when translating an English sentence. The application now tags input sentences with part-of-speech (POS) tags in order to reduce the number of candidate translations for each English word.

For example:

“I’m trying to console the crying girl”.

Here “console” should be treated as a verb, so the confusion with unrelated noun senses such as “install a console” or “TV console” disappears.

The Stanford POS tagger is used for this purpose. For the input:

“I’m trying to console the crying girl”

it produces

“I'm/NN trying/VBG to/TO console/VB the/DT crying/VBG girl/NN”

So the application can determine that “console” is a verb here!
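To make the idea concrete, here is a minimal sketch in Python of how tagged output in the "word/TAG" form shown above could be used to narrow down translation candidates. It is not the application's actual code: the `CANDIDATES` dictionary, its glosses, and the function names are all hypothetical, and the mapping from tags to coarse classes only assumes that Penn Treebank verb tags start with "VB" and noun tags with "NN".

```python
# Hypothetical candidate list: glosses keyed by coarse part of speech.
# A real application would look these up in its translation dictionary.
CANDIDATES = {
    "console": {
        "verb": ["comfort"],
        "noun": ["control panel", "game console"],
    },
}

def parse_tagged(text):
    """Split 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("/")
        if not sep:  # no slash found: keep the token, leave the tag empty
            word, tag = token, ""
        pairs.append((word, tag))
    return pairs

def coarse_pos(tag):
    """Map a Penn Treebank tag to a coarse class by its prefix."""
    if tag.startswith("VB"):
        return "verb"
    if tag.startswith("NN"):
        return "noun"
    return "other"

def candidates_for(word, tag):
    """Return only the glosses that match the tagged part of speech."""
    by_pos = CANDIDATES.get(word.lower(), {})
    return by_pos.get(coarse_pos(tag), [])

tagged = parse_tagged(
    "I'm/NN trying/VBG to/TO console/VB the/DT crying/VBG girl/NN"
)
for word, tag in tagged:
    if word == "console":
        print(word, tag, candidates_for(word, tag))
```

With the VB tag only the verb gloss survives; had the tagger produced NN, the two noun glosses would be returned instead.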

But nothing is perfect. I compared the Stanford POS tagger with SharpNLP.

Input: “I throw stick and dog barks”

SharpNLP:

I/PRP throw/VBP stick/VB and/CC dog/NN barks/NNS

Stanford POS tagger:

I/PRP throw/VBP stick/NN and/CC dog/NN barks/NNS
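For longer sentences it is easier to let a script point out where the two taggers differ. The following is a rough sketch that works on the plain "word/TAG" strings above; the function names are my own, and it assumes both taggers split the sentence into the same tokens.

```python
def parse_tagged(text):
    """Split 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("/")
        if not sep:  # no slash found: keep the token, leave the tag empty
            word, tag = token, ""
        pairs.append((word, tag))
    return pairs

def disagreements(output_a, output_b):
    """Return (word, tag_a, tag_b) for every token tagged differently."""
    pairs_a = parse_tagged(output_a)
    pairs_b = parse_tagged(output_b)
    # Assumption: identical tokenization; otherwise positions do not line up.
    if [w for w, _ in pairs_a] != [w for w, _ in pairs_b]:
        raise ValueError("the taggers tokenized the sentence differently")
    return [
        (word, tag_a, tag_b)
        for (word, tag_a), (_, tag_b) in zip(pairs_a, pairs_b)
        if tag_a != tag_b
    ]

sharpnlp = "I/PRP throw/VBP stick/VB and/CC dog/NN barks/NNS"
stanford = "I/PRP throw/VBP stick/NN and/CC dog/NN barks/NNS"
print(disagreements(sharpnlp, stanford))  # [('stick', 'VB', 'NN')]
```

Note that such a diff only surfaces disagreements; a token that both taggers get wrong in the same way will not appear in it.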

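Overall, the Stanford tagger gives the more accurate output: it correctly tags “stick” as a noun where SharpNLP calls it a verb, although both tag “barks” as a plural noun when it is a verb in this sentence. I think the difference comes from the models, which are probably trained better. Either way, I still have to let the user specify the part of speech of a word manually.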
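One possible shape for such a manual correction, sketched in Python: the user's choices are kept as a mapping from token position to tag and applied on top of the tagger's output. The function name and the choice of VBZ as the corrected tag are assumptions for illustration, not a finished design.

```python
def apply_overrides(tagged, overrides):
    """Replace tags at the given token positions with user-chosen ones.

    tagged    -- list of (word, tag) pairs from the tagger
    overrides -- dict mapping token index to the tag the user picked
    """
    return [
        (word, overrides.get(index, tag))
        for index, (word, tag) in enumerate(tagged)
    ]

tagged = [
    ("I", "PRP"), ("throw", "VBP"), ("stick", "NN"),
    ("and", "CC"), ("dog", "NN"), ("barks", "NNS"),
]
# Assumption: the user marks token 5 ("barks") as a verb, VBZ.
corrected = apply_overrides(tagged, {5: "VBZ"})
print(corrected[5])  # ('barks', 'VBZ')
```

Overrides are keyed by position rather than by word, because the same word can play different roles within one sentence.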