I have been asked by many, many people for some introductory reading on Machine Learning for ecologists. Here are my favourite references!
Hastie (2009) The Elements of Statistical Learning, Springer.
I believe this is the best textbook around for Machine Learning. It is quite math-heavy, but it has good explanations of algorithm convergence and real-life examples of the use of ML. Online chapters may be available through your university.
Duda (2001) Pattern Classification.
Has a good chapter on estimating and comparing classifiers.
Webb (2002) Statistical Pattern Recognition.
Particularly good for performance measures and feature selection.
Ripley (1996) Pattern Recognition and Neural Networks.
I haven’t used it extensively, but it has been recommended to me by neural network users.
Ecological Applications of ML
Recknagel F (2001) Applications of machine learning to ecological modelling. Ecological Modelling 146:303–310.
Olden JD, Lawler JJ, Poff NL (2008) Machine learning methods without tears: a primer for ecologists. The Quarterly Review of Biology 83:171–93.
Cutler DR et al. (2007) Random forests for classification in ecology. Ecology 88:2783–92.
De’ath G (2007) Boosted Trees for Ecological Modeling and Prediction. Ecology 88:243–251.
Elith J, Leathwick JR, Hastie T (2008) A working guide to boosted regression trees. The Journal of Animal Ecology 77:802–13. An excellent guide to boosted regression trees with custom functions.
Prasad AM, Iverson LR, Liaw A (2006) Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9:181–199.
Lek S, Guégan JF (1999) Artificial neural networks as a tool in ecological modelling, an introduction. Ecological Modelling 120:65–73.
Ozesmi S, Tan C, Ozesmi U (2006) Methodological issues in building, training, and testing artificial neural networks in ecological applications. Ecological Modelling 195:83–93.
Warner B, Misra M (1996) Understanding Neural Networks as Statistical Tools. The American Statistician 50:284–293.
Comparison of ML tools
Kampichler C, Wieland R, Calmé S, Weissenberger H, Arriaga-Weiss S (2010) Classification in conservation biology: A comparison of five machine-learning methods. Ecological Informatics 5:441–450.
Keller RP, Kocev D, Džeroski S (2011) Trait-based risk assessment for invasive species: high performance across diverse taxonomic groups, geographic ranges and machine learning/statistical tools. Diversity and Distributions 17:451–461.
Concepts in ML
I generally find it difficult to track down information on the conceptual/philosophical basis of ML, so let me know if you are aware of other references!
Breiman L (2001) Statistical modeling: the two cultures. Statistical Science 16:199–231.
Make sure you download the version with replies from influential statisticians. Might radically change your views on algorithmic modelling, GLMs and statistical inference!
Glymour C, Madigan D, Pregibon D (1997) Statistical Themes and Lessons for Data Mining, in Data Mining and Knowledge Discovery (Kluwer Academic Publishers, Netherlands), pp 11–28.
Has some interesting points on inference from ML.
I use caret in R, which automates a lot of the training and data pre-processing. The vignettes are very helpful. http://caret.r-forge.r-project.org/
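To give a rough idea of what that automation looks like, here is a minimal sketch. It assumes the caret package is installed, and it uses R's built-in iris data and a k-nearest-neighbour classifier purely for illustration (neither comes from the references above). The object names (in_train, train_set, test_set, ctrl, fit) are hypothetical.

```r
# Minimal caret sketch on R's built-in iris data (illustrative only).
library(caret)

set.seed(42)  # make the data split and the resampling reproducible

# Hold out roughly 20% of the rows for testing, stratified by species.
in_train  <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[in_train, ]
test_set  <- iris[-in_train, ]

# Use 5-fold cross-validation to pick the tuning parameter (k for kNN).
ctrl <- trainControl(method = "cv", number = 5)

# train() centres and scales the predictors, then tunes and fits the model.
fit <- train(
  Species ~ .,
  data       = train_set,
  method     = "knn",                  # assumption: kNN chosen as a simple example
  preProcess = c("center", "scale"),
  tuneLength = 5,                      # try 5 candidate values of k
  trControl  = ctrl
)

# Predict on the held-out rows; the same pre-processing is applied automatically.
pred     <- predict(fit, newdata = test_set)
accuracy <- mean(pred == test_set$Species)

print(fit$bestTune)                                          # selected k
print(table(predicted = pred, observed = test_set$Species))  # confusion table
print(accuracy)
```

Swapping method = "knn" for another caret method name (for example "rf", which needs the randomForest package) should leave the rest of the workflow unchanged, which is most of the appeal.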