SpaceX is Making History

Some quick thoughts on SpaceX’s attempt today:

This is their first launch failure. Whenever something doesn't work as expected, these guys really dig in to figure out what went wrong and make sure it never happens again.

A fully reusable spacecraft (no, the space shuttle wasn’t fully reusable, only the orbiter was) is a big step forward. I’ve heard that a rocket costs as much as a 747 to build. Up until now, we have been in the, to quote Indiana Jones, “Fly? Yes! Land? No!” stage of space flight. This technology means that the same rocket can have years of service, rather than moments, which will drastically reduce the price of space flight.

Make your own π

A lot of folks are making their own pie for Pi day (3/14/15). It’s almost as easy to make your own π! Click on the image below to use this jsFiddle to calculate your very own approximation of π for Pi day:

[Image: Making your own π]

If you would like to see behind the scenes or tinker, the full fiddle is here.
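The fiddle itself is JavaScript, but the same Monte Carlo idea can be sketched in a few lines of Python (a hypothetical reimplementation, not the fiddle's actual code): throw random darts at the unit square, count how many land inside the quarter circle, and multiply the ratio by 4.

```python
import random

def estimate_pi(samples: int, seed: int = 0) -> float:
    """Estimate pi by sampling random points in the unit square and
    counting how many fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # inside/samples approximates the quarter-circle area, pi/4
    return 4.0 * inside / samples

print(estimate_pi(100_000))
```

More samples give a better approximation, but slowly: the error shrinks roughly with the square root of the sample count.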

ICBO 2014 Report, Day 1

ICBO has gotten off to a good start with a tutorial and three workshops. I mostly spent the day in the workshops, but the tutorial (OBO Tutorial: Getting Things Done with Open Biological and Biomedical Ontologies) introduced a number of interesting new tools, including Ontofox, Ontodog, and Ontorat. These complement the linked data ontology browsing tool Ontobee. It also included an introduction to RDF and SPARQL that looked very helpful.

The Drug-Drug Interaction Knowledge Representation Workshop had a really eye-opening keynote on Drug-Drug Interactions (DDIs) by Dan Malone. He showed that there is little agreement among existing DDI authorities, because the standards of evidence are very low. According to him, most major drug interactions are supported by nothing more than case studies. Many of these interactions persist in these resources largely because of liability concerns: if an interaction is pulled and a patient later suffers (or appears to suffer) from it, the authorities might be held liable for withholding that information. He also showed some preliminary results suggesting that clinicians want to be presented with alternatives to, and explanations of, the DDIs they are warned about.

I also attended a morning workshop on developing a biobanking ontology using the Ontology for Biomedical Investigations. I think this sort of common vocabulary could deliver a benefit similar to the shared vocabularies that have made it easier to index structured information from web pages. Frank Manion also presented the start of an Informed Consent Ontology, which currently represents informed consent documents. Some notable thoughts and/or quotes:

  • “In data sharing, you’re sharing data with yourself in two years” – Dave Parrish
  • Ontology theory and development would be a really useful undergraduate elective course. – Penn Medicine Biobank team

I’ll check in again tomorrow!

4 Reasons Why Semantics Help Make Biobanks Better

My first blog post at 5AM is up:

The Semantic Web provides a means to link pieces of information on the web to each other and to things in real life in an interoperable way. Internationalized Resource Identifiers, of which URLs are a type, are used to identify nearly everything, and linked data makes it possible to visit those URLs to get more information about the things they represent. This has some very useful applications, especially in biobanking. Semantics was literally made for biomedical research, and here are 4 ways in which that relationship can help make biobanks better information resources…


Validating RDF

I spent the day listening in on the RDF Validation Workshop, which kind of spilled over into the Cambridge Semantic Web Meetup. Here are my general musings and notes from the day. They may be completely wrong for many reasons, including possible misunderstandings of what the speaker said and information that is now out of date.

Google Says They’re Triplifying the Web

The Google Knowledge Graph creates a need to support users who provide rich snippets for Google search results. They are building a validator that checks these formats against RDF representations of their microdata. Most of their constraints are property paths, and they use SPARQL for the rest. They are mostly concerned with suitability to their own purposes, namely rich snippets and the Knowledge Graph. They are prototyping the SPARQL-based constraints with RDFLib, but will be moving to their own parser, the one used by the Structured Data Testing Tool. Here is an example path constraint, with the resulting SPARQL queries that are generated from it:

PREFIX schema: <http://schema.org/>

SELECT ?context WHERE { ?context schema:flightNumber ?constraint . }
ASK WHERE { ?context schema:flightNumber ?constraint . }

Currently, they are only validating things that are required; they won't check for things that are optional.

Semantic Web Meetup

The general idea seems to be that the RDF community needs to provide a means to say the following things about RDF graphs:

  • The graph must at least contain X.
  • The graph must contain at most Y.
  • The graph can never contain Z.

The approach seems to be to provisionally close any given RDF graph before validation in order to produce the report. That closure can include some fixed set of other graphs (such as the vocabularies used), but ultimately, for the purposes of validation, the Unique Name Assumption and the Closed World Assumption need to be applied to the graph as given. Eric Prud'hommeaux presented an interesting framework based on YACC-style grammars that provides "shapes" of objects to validate. This is similar to OSLC's (Open Services for Lifecycle Collaboration) Resource Shape vocabulary, but with additional capabilities around disjunction and non-declarative validation processes.
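As a toy illustration of those three constraint kinds (my own sketch, not any speaker's actual proposal), a provisionally closed graph can be modeled as a plain set of (subject, predicate, object) triples, with "at least", "at most", and "never" checked directly against that set:

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

def validate(graph: Set[Triple],
             required: Set[Triple],
             allowed: Set[Triple],
             forbidden: Set[Triple]) -> List[Tuple[str, Triple]]:
    """Closed-world check: report every constraint the graph violates."""
    errors = []
    # "must at least contain X": every required triple must be present
    for t in sorted(required - graph):
        errors.append(("missing", t))
    # "must contain at most Y": nothing outside the allowed set
    for t in sorted(graph - allowed):
        errors.append(("unexpected", t))
    # "can never contain Z": forbidden triples must be absent
    for t in sorted(graph & forbidden):
        errors.append(("forbidden", t))
    return errors

g = {("ex:alice", "foaf:name", "Alice"),
     ("ex:alice", "ex:shoeSize", "9")}
print(validate(g,
               required={("ex:alice", "foaf:name", "Alice")},
               allowed={("ex:alice", "foaf:name", "Alice")},
               forbidden={("ex:alice", "ex:shoeSize", "9")}))
```

The point of the toy is the closure step: only because the graph is treated as complete can "unexpected" and "forbidden" be reported at all, which is exactly what the open-world semantics of plain RDF won't let you do.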

Citing Your Sources on the Web

I was involved in the World Wide Web Consortium (W3C) Provenance Working Group, which was an amazing experience, even though I couldn’t put as much time into it as I would have liked. My friend and collaborator, Tim Lebo, edited the Provenance Ontology (PROV-O). PROV-O is, in my narrow perspective of the world, a fantastic foundation for talking about how stuff happens and, most importantly to this post, how to cite people and resources on the web.


Getting Up Early in the Morning

Well, sort of. I have some exciting news: in August I will be starting at 5AM Solutions as a data scientist. I’ll be finishing my time at Yale University with Michael Krauthammer, and will soon be wrapping up my computer science Ph.D. at Rensselaer Polytechnic Institute in the Tetherless World Constellation.

Thanksgiving Science!

I’ve got a little formula that predicts how long it will take for our Thanksgiving turkey to cook. It works really well for our temperatures and preparation, but I’d like to make it a little more general so everyone else can use it, regardless of temperature. As a wise man once said, if it’s worth doing, it’s worth overdoing, unless you’re overcooking turkey.

Towards that end, and because this is a science blog, I would like to perform a hypothesis-generating experiment. If you’re willing to further science, please share some details about how you prepared your turkey, and how it turned out. Humanity will thank you. Turkeys will not. I will post the results when I can, and maybe we can try again next year for a full prediction.

Click here to share your turkey data.

Firetruck: In which I write and record my first song…

Ian asked me for a firetruck song tonight for bedtime songs. I thought there might be some kind of melody hiding in the fire truck siren sound, so I started there and half-assed my way through a melody. After he went down, I thought I might have enough to make a “real” song. It only has one chord progression and melody, since it’s a kid’s song and the first song I’ve ever written, but I found a surprisingly good loop to match against, and it kind of came out as a ballad for firefighters from a kid’s perspective.

Like I said, be gentle, it’s supposed to be a little lame. And I’ve never recorded a song before either.

Here are the lyrics. I’m releasing the recording and the lyrics as Creative Commons Share-Alike.

Data Processing with Python: Part 2

As I’ve said, I’ve been doing tons and tons of tabular data manipulation using Python in the past few years, and I’m sharing some of the patterns I’ve developed. Please look at Part 1 to see some of the more basic stuff, and review the rules of the road. Below the fold, we will be talking about filtering data by column and row and doing processing without loading the whole file into memory.
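As a taste of the streaming theme (a hedged sketch of the general pattern, not the post's actual code; the column names are made up), Python's csv module can filter rows and project columns one record at a time, so the whole file never sits in memory:

```python
import csv
from typing import Callable, Dict, Iterable, Iterator, List

def filter_rows(lines: Iterable[str],
                keep_columns: List[str],
                predicate: Callable[[Dict[str, str]], bool]
                ) -> Iterator[Dict[str, str]]:
    """Stream a CSV: yield only the requested columns of the rows
    that satisfy predicate, one row at a time."""
    for row in csv.DictReader(lines):
        if predicate(row):
            yield {col: row[col] for col in keep_columns}

# Works the same on a file handle or any other iterable of lines.
sample = ["name,age,city", "Ada,36,London", "Grace,45,Arlington"]
over40 = list(filter_rows(sample, ["name"], lambda r: int(r["age"]) > 40))
print(over40)  # [{'name': 'Grace'}]
```

Because the function is a generator, chaining several such filters still reads the input only once, row by row, which is what makes this workable on files far larger than memory.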
