Gaussian Mixture Models

Balkema, W. 2007. Variable-Size Gaussian Mixture Models for Music Similarity Measures. Proc. International Symposium on Music Information Retrieval (ISMIR), pp. 491-494, Vienna, Austria, September 2007. link

The author compares two GMMs: a variable-size model, which attempts to reduce the complexity of a fixed-size Gaussian model, and a fixed 20-Gaussian model. The author suggests the variable-size model (which averaged 15 Gaussians) is both more efficient and more accurate at clustering the musical stimuli according to genre, using timbral dimensions as the determining variables.
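
As a quick way to experiment with the idea of letting the number of Gaussians vary, the sketch below fits candidate models of several sizes to stand-in timbral features and keeps the one with the lowest BIC; the scikit-learn calls, the synthetic data, and the BIC criterion are my own illustrative choices, not necessarily the authors' method.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in for per-song timbral feature frames (e.g. MFCC-like vectors);
# real features would come from an audio front end.
rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 13))

# Fit GMMs of increasing size and keep the one with the lowest BIC,
# one common way of choosing a "variable" number of components.
candidates = [
    GaussianMixture(n_components=k, covariance_type="diag", random_state=0).fit(frames)
    for k in range(1, 21)
]
best = min(candidates, key=lambda g: g.bic(frames))
print("chosen number of Gaussians:", best.n_components)
```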

Bilmes, J. 1998. A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical Report ICSI-TR-97-021, International Computer Science Institute, Berkeley. link

The author provides a clear explanation of the expectation-maximization algorithm and how it is used to compute maximum-likelihood parameter estimates when the data are incomplete or contain hidden variables.
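
To make the updates concrete, here is a minimal EM loop for a one-dimensional, two-component GMM along the lines of what the tutorial derives; the synthetic data and starting values are my own.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic 1-D data drawn from two Gaussians.
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])

# Initial guesses for mixture weights, means, and variances.
w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(100):
    # E-step: responsibilities (posterior probability of each component per point).
    dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate the parameters from the responsibilities.
    n_k = resp.sum(axis=0)
    w = n_k / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / n_k
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / n_k

print("weights:", w, "means:", mu, "variances:", var)
```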

Borman, S. 2004. The Expectation Maximization Algorithm. Retrieved October 3, 2009, from http://www.seanborman.com/publications/EM_algorithm.pdf. Version: 2004. [n.p.]. link

The author provides a brief summary of the EM Algorithm, which I found particularly useful in deciphering Dempster's seminal paper.

Dellaert, F. 2002. The Expectation Maximization Algorithm. Technical Report GIT-GVU-02-20, College of Computing, Georgia Institute of Technology, February 2002. link

The author attempts a more intuitive explanation of EM by discussing the process of lower-bound maximization: the E-step constructs the local lower bound, and the M-step optimizes the bound, thereby improving the estimate of the unknowns. Unlike those of several other authors, his figures are particularly clear and useful.
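
In generic notation (my symbols, not necessarily Dellaert's), the bound he discusses follows from Jensen's inequality:

```latex
% q is any distribution over the hidden variables z; x is the observed data.
\log p(x \mid \theta)
  = \log \sum_{z} p(x, z \mid \theta)
  \;\geq\; \sum_{z} q(z) \log \frac{p(x, z \mid \theta)}{q(z)}
```

The E-step makes the bound tight at the current parameters by choosing q(z) = p(z | x, θ), and the M-step then maximizes the right-hand side over θ.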

Dempster, A. P., Laird, N. M. & Rubin, D. B. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39(1): 1-38. link

Though the topic had been briefly discussed before this article, Dempster and his coauthors were the first to provide a general algorithm for the problem raised by maximum-likelihood estimation: namely, how can one compute ML estimates from incomplete data? This paper is heavily theoretical, and isn't for the mathematically faint of heart.
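
In modern notation (not the paper's exact symbols), the iteration they introduce can be summarized as:

```latex
% E-step: expected complete-data log-likelihood under the current parameters.
Q(\theta \mid \theta^{(t)})
  = \mathbb{E}_{z \sim p(z \mid x,\, \theta^{(t)})}\!\left[\log p(x, z \mid \theta)\right]
% M-step: re-estimate the parameters by maximizing that expectation.
\theta^{(t+1)} = \arg\max_{\theta}\, Q(\theta \mid \theta^{(t)})
```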

Eck, D. Presentations on Machine Learning & Classification. Retrieved October 3, 2009, from http://www.iro.umontreal.ca/%7Epift6080/H09/documents/intro_ml.pdf. [n.p.]. link

The author's presentations are insightful and easy to understand, clearly laying out the theoretical background for machine learning, as well as a few basics for classification, though he doesn't cover GMMs.

Marolt, M. 2004. Gaussian Mixture Models for Extraction of Melodic Lines from Audio Recordings. Proc. 5th Int. Conf. on Music Information Retrieval (ISMIR'04), pp. 80-83. link

The author uses a GMM to extract melodic lines from polyphonic audio recordings, specifically attempting to assign melodic fragments to one of 7 timbre classes using pitch estimates (rather than timbral features, though pitch and timbre certainly interact) in Aretha Franklin's song, "Respect." The lead vocals (accuracy of .93) and the bass voice (accuracy of .97) are classified successfully.

Marques, J. & Moreno, P. 1999. A Study of Musical Instrument Classification Using Gaussian Mixture Models and Support Vector Machines. Tech. Rep., Cambridge Research Laboratory, Cambridge, MA, USA, June 1999. link

The authors use three feature sets--linear prediction coefficients, FFT-based cepstral coefficients, and FFT-based mel-cepstral coefficients--in a GMM to classify different musical instruments. The GMM had an overall error rate of 37% when classifying sounds 0.2 seconds in length.
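
The usual way to turn such GMMs into a classifier (train one model per instrument, then pick the model under which a sound is most likely) can be sketched as below; the scikit-learn API and the random stand-in "cepstral" features are my assumptions, since the original work predates these tools.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
instruments = ["flute", "oboe", "violin"]

# Stand-in training data: one array of cepstral-like feature frames per instrument.
train = {name: rng.normal(loc=i, size=(400, 16)) for i, name in enumerate(instruments)}

# Train one GMM per instrument class.
models = {
    name: GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(feats)
    for name, feats in train.items()
}

# Classify a short excerpt by summing frame log-likelihoods under each model.
excerpt = rng.normal(loc=1, size=(20, 16))  # a short run of frames, hypothetical
scores = {name: m.score_samples(excerpt).sum() for name, m in models.items()}
print("predicted instrument:", max(scores, key=scores.get))
```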

Moon, T. 1996. The Expectation-Maximization Algorithm. IEEE Signal Processing Magazine, 13(6): 47-60. link

The author provides a "reader-friendly" explanation of the EM Algorithm, using a few clear descriptive examples, and also discusses its various applications.