Why Java’s Date isn’t deprecated, even though it might look like it

This shouldn’t be news at this point (JDK 1.5 was released 5 years ago), but in Java 5 Sun decided to deprecate large swaths of java.util.Date’s methods and constructors and introduce the Calendar class, which can handle locale-specific conversions and manipulations of the units of time we humans are used to seeing. This has created some confusion here and there about whether to represent datetimes as Dates or as Calendars, as it looks at a casual glance like Date has been more or less completely deprecated. It remains, however, as the preferred way of representing datetimes in Java. An explanation is after the jump.

Continue reading

Searching for Text in Images

We (my lab, http://krauthammerlab.med.yale.edu) recently published on our new image search engine for biomedical images, called YIF, or Yale Image Finder. From our blog post on our web site:

We have recently released a new biomedical image search engine we call YIF. You can access it at:


You can search the actual image content of over 34,000 Open Access articles from PubMed Central. We use OCR with different levels of image correction (article and corpus) for highly accurate image text extraction.

For more details about our algorithms, we have a paper in Bioinformatics titled “Yale Image Finder (YIF): a new search engine for retrieving biomedical images“.

Running Standard Deviations

Update, 7/13/2013: I’m amazed at the continued staying power of this post, considering that I had originally worked the math out for this 14 years ago. People are still commenting on this and suggesting fixes. I’m also amazed that I’ve peppered enough errors in the math and code for people to still be finding errors 5 years after the fact.

My friend Dan at Invisible Blocks came up with a great way to compute a long-running mean from the count and mean:

count += 1
mean += (x - mean) / count

I remembered that I had come up with a similar thing for standard deviation back when I was developing clustering algorithms that could use that value. It uses a power sum average, where you track the power sum as an average (divide the power sum by n) in a similar way.

Continue reading

Data Mining: An Introduction

Data mining is, in the most general terms, an attempt to extract patterns and knowledge from data using various types of software and techniques. Data mining is used to learn and predict. This is applied to biology, neuroscience, fraud detection, national security, and even sports.

Some of these are more successful than others. For instance, text mining has been very successful at extracting proper nouns (names, places, etc.) from text, and what might be considered the biggest success of data mining comes from text mining: internet search engines. But at the same time, text mining has been less successful at automated text summarization. Continue reading