Subluminal Messages

A science blog for all scientists, even amateurs.

Running Standard Deviations

My friend Dan at Invisible Blocks came up with a great way to keep a running mean up to date using only the count and the previous mean:

count += 1
mean += (x - mean) / count

I remembered that I had come up with something similar for standard deviation back when I was developing clustering algorithms that needed that value. It uses a power sum average: rather than storing the raw power sum, you track it as an average (the power sum divided by n) and update it incrementally, the same way as the mean.
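As a rough illustration of the idea, here is a minimal Python sketch. The full post is not shown in this excerpt, so several details are assumptions: the power sum is taken to be the sum of squares, the result is the sample standard deviation (n - 1 correction), and the names RunningStats, power_sum_avg, add, and stddev are made up for this example.

```python
import math


class RunningStats:
    """Running mean and sample standard deviation in constant memory.

    Hypothetical sketch: the class and attribute names are illustrative.
    """

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.power_sum_avg = 0.0  # running average of x**2 (assumed exponent)

    def add(self, x):
        """Fold one new value into the running statistics."""
        self.count += 1
        # Same update rule as the running mean shown above.
        self.mean += (x - self.mean) / self.count
        # Apply that rule to x squared to track the power sum average.
        self.power_sum_avg += (x * x - self.power_sum_avg) / self.count

    def stddev(self):
        """Sample standard deviation; 0.0 until there are two values."""
        n = self.count
        if n < 2:
            return 0.0
        # Population variance is E[x^2] - mean^2; rescale by n / (n - 1).
        variance = (self.power_sum_avg - self.mean ** 2) * n / (n - 1)
        # Guard against tiny negative values caused by rounding.
        return math.sqrt(max(variance, 0.0))


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.add(value)
print(stats.mean, stats.stddev())
```

Only three numbers are stored no matter how many values arrive. One caveat: subtracting the squared mean from the average of squares can lose precision when the mean is large compared with the spread, in which case Welford's update is the usual more stable alternative.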


July 31, 2008 Posted by jpmccusker | Computer Science, Science, Statistics, bioinformatics | 3 Comments

Data Mining: An Introduction

Data mining is, in the most general terms, an attempt to extract patterns and knowledge from data using various types of software and techniques. It is used both to learn from data and to make predictions, and it is applied in biology, neuroscience, fraud detection, national security, and even sports.

Some of these applications are more successful than others. For instance, text mining has been very successful at extracting proper nouns (names, places, etc.) from text, and what might be considered the biggest success of data mining comes from text mining: internet search engines. At the same time, text mining has been less successful at automated text summarization.

July 27, 2008 Posted by jpmccusker | Computer Science, Science | 2 Comments