Catalog of Life Taxonomic Tree

A small visualization I’ve wanted to do for a while is a tree of life graph — visualizing all known species and their relationships.

Recently I found that the Catalog of Life project has a very accessible database of all known species and taxonomic groups, along with their relationships, available for download. This let me put together a simple site backed by their database:

http://taxontree.bpodgursky.com/

[Image: uncharted-screenshot]

All the source code is available on GitHub.

Design

I’ve used dagre + d3 on a number of other graph visualization projects, so dagre-d3 was the natural choice for the visualization component. The code required to build and render the graph is pretty trivial.

The data fetching was a bit trickier. Since pre-loading tens of millions of records was obviously unrealistic, I had to implement a graph class (BackedBiGraph) which lazily expands and collapses, using user-provided callbacks to fetch new data. In this case, the callbacks were just AJAX calls back to the server.
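To make the lazy expand/collapse idea concrete, here is a minimal sketch in Java. It is not the actual BackedBiGraph code: the class name LazyTree, its methods, and the in-memory "fake server" map are all hypothetical, and it only tracks children, whereas a real bidirectional graph would presumably handle parents as well. The point is only that the graph holds no data up front and asks a caller-supplied callback for it on demand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a lazily expanding tree; not the real BackedBiGraph. */
class LazyTree {
    // Caller-supplied callback: given a node id, return the ids of its children.
    private final Function<String, List<String>> fetchChildren;
    // Children currently loaded for each expanded node id.
    private final Map<String, List<String>> expanded = new HashMap<>();

    LazyTree(Function<String, List<String>> fetchChildren) {
        this.fetchChildren = fetchChildren;
    }

    /** Loads a node's children through the callback, unless already loaded. */
    List<String> expand(String id) {
        List<String> children = expanded.get(id);
        if (children == null) {
            children = new ArrayList<>(fetchChildren.apply(id));
            expanded.put(id, children);
        }
        return Collections.unmodifiableList(children);
    }

    /** Drops a node's loaded children, and everything loaded beneath them. */
    void collapse(String id) {
        List<String> children = expanded.remove(id);
        if (children == null) {
            return; // nothing was loaded for this node
        }
        for (String child : children) {
            collapse(child);
        }
    }

    boolean isExpanded(String id) {
        return expanded.containsKey(id);
    }

    /** Number of parent-to-child edges currently held in memory. */
    int loadedEdgeCount() {
        int count = 0;
        for (List<String> children : expanded.values()) {
            count += children.size();
        }
        return count;
    }
}

class LazyTreeDemo {
    public static void main(String[] args) {
        // Stand-in for the server: in the real site this would be a remote call.
        Map<String, List<String>> fakeServer = new HashMap<>();
        fakeServer.put("Animalia", Arrays.asList("Chordata", "Arthropoda"));
        fakeServer.put("Chordata", Arrays.asList("Mammalia", "Aves"));

        LazyTree tree = new LazyTree(
            id -> fakeServer.getOrDefault(id, Collections.emptyList()));

        System.out.println(tree.expand("Animalia"));
        System.out.println(tree.expand("Chordata"));
        tree.collapse("Animalia"); // also discards the Chordata subtree
        System.out.println(tree.loadedEdgeCount());
    }
}
```

Keeping the fetch logic behind a callback means the graph class never needs to know where the data comes from, so the same structure works against a test map, a local database, or a remote endpoint.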

The Catalog of Life database did not come with a Java client, so I thought this would be a good opportunity to use jOOQ to generate Java models and query builders corresponding to the COL database, since I am allergic to writing actual SQL queries. This ended up working very well. Configuring the jOOQ Maven plugin was simple, and the generated code made writing the queries trivial:

    private Collection<TaxonNodeInfo> taxonInfo(Condition condition) {
        // Join each taxon to its name elements, then filter by the caller's condition.
        return context.select()
            .from(TAXON)
            .leftOuterJoin(TAXON_NAME_ELEMENT)
            .on(TAXON.ID.equal(TAXON_NAME_ELEMENT.TAXON_ID))
            .leftOuterJoin(SCIENTIFIC_NAME_ELEMENT)
            .on(SCIENTIFIC_NAME_ELEMENT.ID.equal(TAXON_NAME_ELEMENT.SCIENTIFIC_NAME_ELEMENT_ID))
            .where(condition)
            .fetch()
            .stream()
            // Split each joined row back into its three per-table records.
            .map(record -> new TaxonNodeInfo(
                record.into(TAXON),
                record.into(TAXON_NAME_ELEMENT),
                record.into(SCIENTIFIC_NAME_ELEMENT)))
            .collect(Collectors.toList());
    }

All in all, there are still a lot of rough edges, but dagre, d3 and jOOQ made this a much easier project than expected. The code is on GitHub, so suggestions, improvements, or bug fixes are always welcome.

This entry was posted in Github, Open Source, Visualization.

7 Responses to Catalog of Life Taxonomic Tree

  1. Very cool. One potential improvement would be to have each unit have a link to the appropriate wikipedia page

    • bpodgursky says:

      Thanks. Yeah, I had that on my to-do list for a while, but it seemed like it was going to take more scraping / data cleaning than I wanted. I guess I could just link to a wiki search for the term.

      • aaron_in_sf says:

        On a similar note, you could ‘feel lucky’ and grab the top google image search match for the species…🙂

        Full disclosure: I want pictures

  2. Ted Sanders says:

    Great work! For a while I’ve been wanting to do a similar visualization of tree data, but applied to US employment and the North American Industrial Classification system. I’m imagining something similar, but perhaps with the elements being draggable. I’ll take a look at your source and maybe use it as a jumping off point. And if I get anywhere with my project (unlikely, but who knows), maybe I’ll add my planned bells and whistles to your version.

  3. a suggestion says:

    you should add the ability to move with the arrow keys, would make browsing a bit better

  4. Richard Zander says:

    You need some editing by an expert in the groups. Anictangium should be Anoectangium. Stuff like that. Also there are some recently new taxa that are not included. The Catalogue of Life is more than a bit behind times. On the other hand, computer-organized information is amazing to one who started off with a draft with a typewriter, then you cut up the paragraphs and paste them on bigger sheets and write in more stuff, then you type up a final draft, then type it up again after review . . . Research should be moving faster, but it is easy to get distracted by neat applications.

    • bpodgursky says:

      Incomplete data doesn’t shock me. I didn’t see any public datasets which looked more complete than COL, but I’d be happy to look at any if you know of them.
