Searching for Text in Images

We (my lab) recently published a paper on our new image search engine for biomedical images, called YIF, or Yale Image Finder. From the blog post on our web site:

We have recently released a new biomedical image search engine we call YIF. You can access it at:

You can search the actual image content of over 34,000 Open Access articles from PubMed Central. We use OCR with different levels of image correction (article and corpus) for highly accurate image text extraction.

For more details about our algorithms, we have a paper in Bioinformatics titled “Yale Image Finder (YIF): a new search engine for retrieving biomedical images”.


Running Standard Deviations

Update, 7/13/2013: I’m amazed at the continued staying power of this post, considering that I had originally worked the math out for this 14 years ago. People are still commenting on this and suggesting fixes. I’m also amazed that I’ve peppered enough errors in the math and code for people to still be finding errors 5 years after the fact.

My friend Dan at Invisible Blocks came up with a great way to compute a long-running mean from the count and mean:

count += 1
mean += (x - mean) / count

I remembered that I had come up with a similar thing for standard deviation back when I was developing clustering algorithms that could use that value. It uses a power sum average, where you track the power sum as an average (divide the power sum by n) in a similar way.
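
To make the idea concrete, here is a minimal Python sketch of how the running mean and a power sum average might be combined to get a running standard deviation. The class name RunningStats, its method names, and the choice of the sample (n - 1) form are my assumptions for illustration, not necessarily the exact formulation worked out in the full post.

```python
import math


class RunningStats:
    """Hypothetical helper: tracks count, mean, and power sum average."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.pwr_sum_avg = 0.0  # running average of x squared

    def add(self, x):
        self.count += 1
        # Same incremental update as the running mean above.
        self.mean += (x - self.mean) / self.count
        # Apply the same update to x*x to track the power sum as an average.
        self.pwr_sum_avg += (x * x - self.pwr_sum_avg) / self.count

    def variance(self):
        # Sample variance (n - 1 in the denominator); assumed convention.
        n = self.count
        if n < 2:
            return 0.0
        var = (self.pwr_sum_avg - self.mean ** 2) * n / (n - 1)
        # Clamp tiny negative values caused by floating-point rounding.
        return max(var, 0.0)

    def stddev(self):
        return math.sqrt(self.variance())


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.add(value)
print(stats.mean, stats.stddev())
```

One caveat with this approach: it subtracts two numbers that can be very close together (the average of the squares and the square of the average), so it can lose precision when the mean is large relative to the spread. Welford's method is a commonly used alternative in that situation.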
