I’ve always been casually interested in Natural Language Processing (NLP), a field of computer science concerned with extracting information from natural human language. I have no training or education in the field, so I’m not in a position to contribute much to it, but I am definitely interested in seeing where the state of the art is, and in particular how powerful open-source NLP libraries have gotten. (Google and Microsoft certainly have more powerful closed-source systems, but that doesn’t really help me.)
A few years ago I started playing with Apache’s OpenNLP project. I’m a big fan of the Apache foundation and their libraries, but I found myself very frustrated by OpenNLP’s lack of documentation and the hacky-feeling interfaces the library exposed. Recently, however, I took another look at the available NLP libraries and came across Stanford’s CoreNLP project. CoreNLP, as it turns out, is an awesome project, and it took almost zero effort to get their demo working.
As a total NLP beginner, I found the sentence parsing functionality to be the most immediately approachable example. Sentence parsing takes a natural-English sentence:
“I am parsing an example sentence.”
and breaks it down into component tokens and their relations:
(ROOT (S (NP (PRP I)) (VP (VBP am) (VP (VBG parsing) (NP (DT an) (NN example) (NN sentence)))) (. .)))
where each tag identifies a phrase type or part of speech: “NP” means “Noun Phrase”, “VBG” means “Verb, gerund or present participle”, and so forth (I’ve been referencing this as a complete token list.)
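To make the bracketed format a bit more concrete, here is a minimal Python sketch that reads a parse string like the one above into a nested structure. This is not part of CoreNLP or of my visualization; the names `parse_bracketed` and `leaves` are made up for illustration, and the sketch assumes words never contain literal parentheses.

```python
def parse_bracketed(text):
    """Parse a bracketed parse string into nested (label, children) tuples.

    Internal nodes become (label, [children]); words become plain strings.
    """
    # Pad the parentheses with spaces so a simple split() yields tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1                      # consume "("
        label = tokens[pos]           # the tag, e.g. "NP" or "VBG"
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested phrase
            else:
                children.append(tokens[pos])   # a word
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the root node")
    return tree


def leaves(node):
    """Return the words of the sentence in their original order."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [word for child in children for word in leaves(child)]


PARSE = ("(ROOT (S (NP (PRP I)) (VP (VBP am) (VP (VBG parsing) "
         "(NP (DT an) (NN example) (NN sentence)))) (. .)))")

tree = parse_bracketed(PARSE)
print(tree[0])        # label of the root node
print(leaves(tree))   # the words, read back off the tree
```

Reading the leaves back in order recovers the original sentence, which is a handy sanity check that the nesting was parsed correctly.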
I’ve also been looking into JavaScript graph visualization libraries recently (I’ve struggled to find a JS library remotely as powerful and pretty as graphviz), and wanted to test out the dagre library, which re-implements a simplified dot algorithm in JavaScript and can render the results to d3 (the current coolest-kid-on-the-block JS graph library). So I combined the two into a small visualization which uses dagre to show CoreNLP’s sentence parse tree. It’s pretty basic, but you can play with it here.
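For anyone curious what “putting the two together” involves conceptually, a graph layout library generally wants a flat list of nodes and a list of edges rather than a nested tree. The Python sketch below shows that flattening step, assuming the parse has already been read into nested (label, children) tuples. It is an illustration only, not the code behind the demo; `tree_to_graph` and the `id`/`label`/`source`/`target` keys are hypothetical names, not dagre’s API.

```python
from itertools import count


def tree_to_graph(tree):
    """Flatten a nested (label, children) tree into node and edge lists."""
    ids = count()          # tags repeat (e.g. two "NN"s), so nodes need unique ids
    nodes, edges = [], []

    def visit(node, parent_id):
        node_id = next(ids)
        if isinstance(node, str):
            label, children = node, []     # a word: leaf node
        else:
            label, children = node         # a tagged phrase
        nodes.append({"id": node_id, "label": label})
        if parent_id is not None:
            edges.append({"source": parent_id, "target": node_id})
        for child in children:
            visit(child, node_id)

    visit(tree, None)
    return nodes, edges


# Hand-written subtree for the phrase "an example sentence".
EXAMPLE = ("NP", [("DT", ["an"]), ("NN", ["example"]), ("NN", ["sentence"])])

nodes, edges = tree_to_graph(EXAMPLE)
for edge in edges:
    print(edge["source"], "->", edge["target"])
```

Giving each node a numeric id instead of keying on the tag matters here, because the same tag can appear several times in one sentence.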
When I have time to work with the two libraries a bit more I’ll hopefully update with something more interesting.
Great stuff. The link in “I’ve been referencing this as a complete token list.)” isn’t working.
Thanks, I swapped in a link that should work.
Thanks for the post, I’d really like to be able to view the d3 example and see the source for it. Unfortunately the link “here” is not working.
Thanks, fixed. Also, the source is all here https://github.com/bpodgursky/nlpviz even if the link dies in the future.
Hello,
Looks great! Please could you make the test link work again at http://nlpviz.bpodgursky.com/ ?
Thanks
Oops, fixed. Try now.
thanks. working now.
The example link is still not working. Can you please fix it?
I am also getting an error while running the repo: https://github.com/bpodgursky/nlpviz
edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model
Restarted the example, but you should still be able to run it yourself.