Using CoreNLP, d3.js, and dagre.js to visualize sentence parse trees

I’ve always been casually interested in the field of Natural Langauge Processing (NLP), a  field of computer science interested in extracting information from natural human language. I have no training or education whatsoever in the field so I’m not in a position to contribute much to the field, but I am definitely interested in seeing where the state of the art is, and in particular how powerful open-source NLP libraries have gotten (Google and Microsoft certainly have more powerful closed-source systems, but that doesn’t really help me.)

A few years ago I started playing with Apache’s OpenNLP project.  I’m a big fan of the Apache foundation and their libraries, but I found myself very frustrated by OpenNLP’s lack of documentation and the hacky-feeling interfaces the library exposed.  However recently I took another look at the available NLP libraries and came across Stanford’s CoreNLP project.   CoreNLP, as it turns out, is an awesome project, and it took almost zero effort to get their example demo working.

As a total NLP beginnner, the sentence parsing functionality was the most immediately approachable example.  Sentence parsing takes a natural-English sentence:

“I am parsing an example sentence.”

and breaks it down into component tokens and their relations:

(ROOT (S (NP (PRP I)) (VP (VBP am) (VP (VBG parsing) (NP (DT an) (NN example) (NN sentence)))) (. .)))

where each token type corresponds to a particular word type–“NP”  means “Noun Phrase”, VBG means “Verb, gerund or present participle”, and so forth (I’ve been referencing this as a complete token list.)

I’ve also been looking into JavaScript graph visualization libraries recently (I’ve struggled to find a JS library remotely as powerful and pretty as graphviz), and wanted to test out the dagre library, which re-implements a simplified dot algorithm in javascipt and can render the results to d3 (the current coolest-kid-on-the-block JS graph library).  So I put the two together and put together a simple visualization which uses dagre to show CoreNLP’s sentence parse tree.  It’s pretty simple, but you can play with it here.

nlp-screenshot-cropped

When I have time to work with the two libraries a bit more I’ll hopefully update with something more interesting.

Advertisements
This entry was posted in Uncategorized. Bookmark the permalink.

5 Responses to Using CoreNLP, d3.js, and dagre.js to visualize sentence parse trees

  1. Pingback: D3 NLP Visualizer Update | Ben Podgursky

  2. Great stuff. The link in “I’ve been referencing this as a complete token list.)” isn’t working.

  3. Brian says:

    Thanks for the post, I’d really like to be able to view the d3 example and see the source for it. Unfortunately the link “here” is not working.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s