Mastering Java for Data Science

Machine learning and data mining libraries

There are quite a few machine learning and data mining libraries available for Java and other JVM languages. Some of them are as follows:

  • Weka (http://www.cs.waikato.ac.nz/ml/weka/) is probably the most famous data mining library for Java. It contains a lot of algorithms and has many extensions.
  • JavaML (http://java-ml.sourceforge.net/) is a fairly old and reliable ML library, but unfortunately it is no longer updated.
  • Smile (http://haifengl.github.io/smile/) is a promising ML library that is under active development, and a lot of new methods are being added to it.
  • JSAT (https://github.com/EdwardRaff/JSAT) contains quite an impressive list of machine learning algorithms.
  • H2O (http://www.h2o.ai/) is a framework for distributed ML. It is written in Java but is available for multiple languages, including Scala, R, and Python.
  • Apache Mahout (http://mahout.apache.org/) is used for in-core (single-machine) and distributed machine learning. The Mahout Samsara framework lets you write code in a framework-independent way and then execute it on Spark, Flink, or H2O.

There are several libraries that specialize solely in neural networks:

We will cover some of these libraries throughout the book.