Good prospects ahead for data mining

Glossary

algorithm. A logical, step-by-step procedure for solving a problem in mathematics or computer programming. In biometrics, the term refers to a computer program that converts raw data into a code that identification or verification software can use more easily.

computer memory. Computer memory is measured in bytes.

  • 1 byte is equivalent to 8 bits. One byte can store a single character, such as one letter of a word.
  • 1 kilobyte is roughly 1000 (2¹⁰, or 1024) bytes or characters, approximately equal to one page of double-spaced text.
  • 1 megabyte is roughly 1,000,000 (2²⁰, or 1,048,576) bytes, approximately equal to one novel.
  • 1 gigabyte is about 1,000,000,000 (2³⁰, or 1,073,741,824) bytes, approximately equal to 1000 novels.
  • 1 terabyte is about 1,000,000,000,000 (2⁴⁰, or 1,099,511,627,776) bytes, approximately equal to 1,000,000 novels.
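The two figures given for each unit come from two counting conventions: powers of 1000 and powers of 2 (2¹⁰ = 1024). The short Python sketch below computes both so the arithmetic can be checked. The names UNITS and unit_sizes are illustrative only.

```python
# Compare the decimal (powers of 1000) and binary (powers of 2) sizes of each unit.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte"]
BITS_PER_BYTE = 8


def unit_sizes() -> dict:
    """Return {unit: (decimal_bytes, binary_bytes)} for each unit in UNITS."""
    return {
        name: (1000 ** power, 2 ** (10 * power))
        for power, name in enumerate(UNITS, start=1)
    }


if __name__ == "__main__":
    for name, (decimal, binary) in unit_sizes().items():
        gap = (binary / decimal - 1) * 100  # how much larger the binary figure is
        print(f"1 {name}: {decimal:,} or {binary:,} bytes "
              f"({binary * BITS_PER_BYTE:,} bits; binary is {gap:.1f}% larger)")
```

The gap between the two conventions grows with each step: about 2.4% for a kilobyte, but almost 10% for a terabyte.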

For more information see How bytes and bits work (How Stuff Works, USA).

decision tree. A hierarchy of rules within a computer program, represented by a tree-like structure, that enables a set of data to be classified. A series of selection criteria classify the data into smaller and smaller categories.
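To illustrate, here is a minimal Python sketch of a decision tree. Each internal node applies one selection criterion, and each leaf is a category. The tree is hypothetical. It borrows the loan-risk example from the next entry, and its thresholds were picked by hand. In practice, data-mining software usually learns such rules from a set of classified examples.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # the category assigned to records that reach this leaf


@dataclass
class Node:
    feature: str       # the variable this selection criterion tests
    threshold: float   # records with value < threshold go left, others go right
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, record: dict) -> str:
    """Apply one selection criterion per level until a leaf (category) is reached."""
    while isinstance(tree, Node):
        tree = tree.left if record[tree.feature] < tree.threshold else tree.right
    return tree.label


# Hypothetical loan-risk tree; the thresholds (in dollars) are made up for illustration.
LOAN_TREE = Node(
    "salary", 40_000,
    left=Node("commitments", 10_000, left=Leaf("low risk"), right=Leaf("high risk")),
    right=Leaf("low risk"),
)

if __name__ == "__main__":
    print(classify(LOAN_TREE, {"salary": 30_000, "commitments": 15_000}))  # high risk
```

Each step down the tree narrows the data into a smaller category, which is the "smaller and smaller categories" behaviour described above.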

linear discriminant analysis. A method of classification based on a weighted sum. For each object to be classified, linear discriminant analysis computes a weighted sum of the variables that determine the classification, and the value of that sum decides which class the object is placed in. For example, a bank may wish to classify loan customers as at risk or not at risk of defaulting, based on their salary and financial commitments. On a plot of financial commitments against salary, the linear discriminant function appears as a straight line. High-risk customers, with a low salary and high financial commitments, lie above the line; low-risk customers, with a high salary and low financial commitments, lie below it.
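The following is a minimal Python sketch of the bank example. It fits the weights with Fisher's linear discriminant for two classes and two variables, which is one standard way of choosing them. The weighted sum is then compared with a threshold midway between the two class means, a choice that assumes both classes are equally likely. The training data are invented, with both variables in thousands of dollars, and all function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (salary, financial commitments), both in $1000s


def mean(points: List[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fit_lda(low_risk: List[Point], high_risk: List[Point]):
    """Fisher's linear discriminant for two classes and two variables.

    Returns (w1, w2, c): classify as high risk when w1*salary + w2*commitments > c.
    """
    m0, m1 = mean(low_risk), mean(high_risk)
    # Pooled within-class scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx = syy = sxy = 0.0
    for points, m in ((low_risk, m0), (high_risk, m1)):
        for x, y in points:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            syy += dy * dy
            sxy += dx * dy
    det = sxx * syy - sxy * sxy
    # Weights w = S^-1 (m1 - m0), using the closed-form inverse of a 2x2 matrix.
    d0, d1 = m1[0] - m0[0], m1[1] - m0[1]
    w1 = (syy * d0 - sxy * d1) / det
    w2 = (sxx * d1 - sxy * d0) / det
    # Threshold: the weighted sum at the midpoint of the two class means (equal priors).
    c = w1 * (m0[0] + m1[0]) / 2 + w2 * (m0[1] + m1[1]) / 2
    return w1, w2, c


def lda_classify(w1: float, w2: float, c: float, customer: Point) -> str:
    score = w1 * customer[0] + w2 * customer[1]  # the weighted sum
    return "high risk" if score > c else "low risk"


# Made-up training data, not real loan records.
LOW_RISK = [(80, 10), (90, 15), (70, 8), (100, 20)]
HIGH_RISK = [(30, 20), (35, 25), (25, 18), (40, 30)]

if __name__ == "__main__":
    w1, w2, c = fit_lda(LOW_RISK, HIGH_RISK)
    print(lda_classify(w1, w2, c, (28, 22)))  # high risk
    print(lda_classify(w1, w2, c, (95, 12)))  # low risk
```

With this data, salary gets a negative weight and commitments a positive one. The boundary w1·salary + w2·commitments = c is therefore an upward-sloping line, and high-risk customers fall above it.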

neural network. A statistical analysis procedure based on models of how the nervous systems of animals learn. Neural networks can ‘learn’ from a collection of examples to discover patterns and trends, and as data-mining techniques they can be used for forecasting and prediction. For more information see An introduction to neural networks (University of Stirling, UK).
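The smallest possible neural network is a single artificial neuron, or perceptron. The Python sketch below shows what learning from examples means at that scale. Practical networks have many neurons arranged in layers and are usually trained by other methods, such as backpropagation. This sketch therefore shows only the basic idea of adjusting weights from examples. The training set (logical AND) and all names are chosen for illustration.

```python
from typing import List, Tuple

Example = Tuple[Tuple[int, int], int]  # (two inputs, target output 0 or 1)


def step(total: float) -> int:
    """The neuron 'fires' (outputs 1) only when its weighted input exceeds zero."""
    return 1 if total > 0 else 0


def train_perceptron(examples: List[Example], epochs: int = 20, rate: float = 1.0):
    """Learn two weights and a bias for a single artificial neuron (perceptron rule)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in examples:
            error = target - step(w[0] * x1 + w[1] * x2 + b)
            # When the neuron is wrong, nudge each weight towards the right answer.
            w[0] += rate * error * x1
            w[1] += rate * error * x2
            b += rate * error
    return w, b


def predict(w, b, x) -> int:
    return step(w[0] * x[0] + w[1] * x[1] + b)


# Toy examples: logical AND, which is linearly separable, so the perceptron rule converges.
AND_EXAMPLES: List[Example] = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    w, b = train_perceptron(AND_EXAMPLES)
    print([predict(w, b, x) for x, _ in AND_EXAMPLES])  # [0, 0, 0, 1]
```

Nobody tells the neuron the AND rule. It reaches weights that reproduce the rule purely by correcting its own mistakes on the examples.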

regression. A regression relationship allows one variable to be predicted approximately from the values of one or more other variables. For example, we might want to predict the weight of Australian women from their height. Such a relationship is commonly expressed as a mathematical equation, often the equation of a straight line.
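Straight-line regression is most often fitted by ordinary least squares, which chooses the line that minimises the sum of squared prediction errors. The Python sketch below shows this. The five height and weight pairs are invented for illustration and are not measurements of Australian women. The function name fit_line is hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through the means
    return a, b


# Made-up sample: heights in cm and weights in kg (not real survey data).
HEIGHTS = [155, 160, 165, 170, 175]
WEIGHTS = [50, 57, 59, 65, 67]

if __name__ == "__main__":
    a, b = fit_line(HEIGHTS, WEIGHTS)
    print(f"weight ≈ {a:.1f} + {b:.2f} × height")
    print(f"predicted weight at 168 cm: {a + b * 168:.1f} kg")
```

For this sample the fitted line is weight ≈ −79.0 + 0.84 × height, which predicts about 62.1 kg for a height of 168 cm. Because the sample points do not all lie on the line, the prediction is only approximate.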

External sites are not endorsed by the Australian Academy of Science.
Page updated September 2006.