Getting Up Early in the Morning

Well, sort of. I have some exciting news: in August I will be starting at 5AM Solutions as a data scientist. I’ll be finishing my time at Yale University with Michael Krauthammer, and will soon be wrapping up my computer science Ph.D. at Rensselaer Polytechnic Institute in the Tetherless World Constellation.

Data science as a hub, bringing together domain expertise, data engineering, the scientific method, math, statistics, advanced computing, visualization, and a hacker mindset to solve problems. (From Wikipedia)

You might be thinking, “What is a data scientist, and what do they do?” Recently, data scientists have been winning elections. Data science is made up of a number of disciplines, but essentially it draws on a large toolset from science, math, computing, statistics, and knowledge representation to solve problems. These problems can be in science, engineering, finance, business, politics, social activism, or anywhere else we can apply these strategies.

Much of my work at Yale these past years has been in applying data science to translational research, which tries to take knowledge from basic biological research and apply it to finding treatments and cures for particular diseases. I mostly worked on issues of data and knowledge representation, annotation of data with pre-existing knowledge, data interoperability, provenance, and visualization. 5AM Solutions is a company that focuses on the development of life science analytical software, and I plan to keep working in this area in my new job.

So what’s going to be new for me at 5AM? The big thing is, I’m now available for consulting and collaborations. For instance, do you need help developing and executing a data sharing and publication plan? Data can be shared with the world at large or only within an organization, and research grants funded by the NIH or NSF require such a plan. Much of my research has focused on data sharing, interoperability, and provenance, so I can help research groups or commercial organizations come up with effective strategies that increase the value (including citability) of the data they need to publish.

Another interesting area where I hope to contribute is in using knowledge representation and semantic web technologies to make data easier to understand (for both humans and computers) and easier to exchange. Imagine having, say, a database of sequencing data for genomic changes, patient phenotypes, and RNA sequencing that you don’t have to design from scratch. You can pull in representations of phenotypes from a phenotype vocabulary, representations of genomic variants from a sequencing ontology, and so on, and then lay a statistical vocabulary over them to do multidimensional analysis and visualization. Further, if others are using those same vocabularies and you want to use their data, no work is needed to merge the datasets; it happens automatically. Knowledge representation and semantic web technologies enable this, and they have been making my research far easier to do.
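To make the "merging happens automatically" point a bit more concrete, here is a minimal sketch in plain Python. It treats each dataset as a set of subject-predicate-object statements, which is roughly how semantic web data is modeled. Everything in it is an assumption for illustration: the `example.org` identifiers, the two "labs", and the helper functions are made up, and a real project would use published vocabularies and an RDF library rather than bare tuples.

```python
# Hypothetical shared vocabulary terms (placeholders, not a real ontology).
HAS_PHENOTYPE = "http://example.org/vocab/hasPhenotype"
HAS_VARIANT = "http://example.org/vocab/hasVariant"

PHENOTYPE_P1 = "http://example.org/phenotype/P1"

# Two independently produced datasets that happen to use the same vocabulary.
# Each statement is a (subject, predicate, object) tuple.
lab_a = {
    ("http://example.org/patient/1", HAS_PHENOTYPE, PHENOTYPE_P1),
    ("http://example.org/patient/1", HAS_VARIANT, "http://example.org/variant/V1"),
}
lab_b = {
    ("http://example.org/patient/2", HAS_PHENOTYPE, PHENOTYPE_P1),
    ("http://example.org/patient/2", HAS_VARIANT, "http://example.org/variant/V2"),
    # The same statement lab_a already made; it collapses on merge.
    ("http://example.org/patient/1", HAS_PHENOTYPE, PHENOTYPE_P1),
}


def merge(*graphs):
    """Merging graphs that share identifiers is just a set union."""
    return set().union(*graphs)


def subjects(graph, predicate, obj):
    """Return every subject that has the given predicate and object."""
    return {s for (s, p, o) in graph if p == predicate and o == obj}


def objects(graph, subject, predicate):
    """Return every object attached to the subject by the given predicate."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


merged = merge(lab_a, lab_b)

# One query now spans both labs' data with no schema mapping step.
patients_with_p1 = subjects(merged, HAS_PHENOTYPE, PHENOTYPE_P1)
for patient in sorted(patients_with_p1):
    print(patient, sorted(objects(merged, patient, HAS_VARIANT)))
```

The point of the sketch is the `merge` function: because both datasets name phenotypes and variants with the same identifiers, there is no column matching or schema translation to write, and duplicate statements simply disappear. Real data is rarely this tidy, but the less two groups have to negotiate about vocabulary, the closer merging gets to this.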

So that’s the short of it. If you’re interested in working with me on anything that involves data science, semantics, knowledge representation, visualization, or anything else, you can contact me through 5AM Solutions, Inc.

