Machine Learning in Go
There are limited options for doing machine learning in the Go ecosystem. A curated list I found here is a good starting point, and it covers most of the main packages I've turned up through search. When looking for Go packages in general, the best places to start are here and here.
Currently, there are only two general-purpose libraries (golearn and mlgo) and several algorithm-specific ones. Of the two general-purpose libraries, golearn is in a much better state than mlgo: it has more algorithms, is easier to use, and has more active contributors. I admit some bias here, as I have a pull request adding an average perceptron classifier to golearn waiting right now. Decide for yourself by following the links.
Go Learn – Machine Learning for Go
Mlgo – General Machine Learning for Go
Golearn currently has the following algorithms implemented (as of this writing, of course), along with support for CSV and ARFF files. It also supports cross validation; a rough sketch of that is shown after the list below. The repo also includes several datasets useful for testing out algorithms, located here.
SVM
Linear Regression
KNN Classification
KNN Regression
Neural Networks
Naive Bayes
Decision Trees (Random and ID3)
RandomForest
Multiple options for pairwise functions
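Since the cross validation support is worth a quick look, here is a minimal sketch of scoring a KNN classifier with 5-fold cross validation. It leans on the GenerateCrossFoldValidationConfusionMatrices, GetCrossValidatedMetric and GetAccuracy helpers I believe live in the evaluation package; exact names and signatures may vary between golearn versions, so treat this as a sketch rather than a recipe.

package main

import (
    "fmt"

    "github.com/sjwhitworth/golearn/base"
    "github.com/sjwhitworth/golearn/evaluation"
    "github.com/sjwhitworth/golearn/knn"
)

func main() {
    data, err := base.ParseCSVToInstances("../datasets/iris_headers.csv", true)
    if err != nil {
        panic(err)
    }
    cls := knn.NewKnnClassifier("euclidean", 2)

    // Build one confusion matrix per fold, training on the other folds each time.
    folds, err := evaluation.GenerateCrossFoldValidationConfusionMatrices(data, cls, 5)
    if err != nil {
        panic(err)
    }

    // Average a metric (accuracy here) across the folds.
    mean, variance := evaluation.GetCrossValidatedMetric(folds, evaluation.GetAccuracy)
    fmt.Printf("accuracy: %.3f (variance %.3f)\n", mean, variance)
}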
The best place to get started with golearn is the README itself, as it walks you through the install process and includes a test script. The next place to look is the examples folder, which has several runnable examples showing how to load a CSV, split it into train/test sets, then train and pull out the results.
Here is an example, pulled out of the examples folder, that uses the KNNClassifier to predict after doing a train/test split on some flower data.
package main

import (
    "fmt"

    "github.com/sjwhitworth/golearn/base"
    "github.com/sjwhitworth/golearn/evaluation"
    "github.com/sjwhitworth/golearn/knn"
)

func main() {
    rawData, err := base.ParseCSVToInstances("../datasets/iris_headers.csv", true)
    if err != nil {
        panic(err)
    }

    // Initialises a new KNN classifier
    cls := knn.NewKnnClassifier("euclidean", 2)

    // Do a training-test split
    trainData, testData := base.InstancesTrainTestSplit(rawData, 0.50)
    cls.Fit(trainData)

    // Calculates the Euclidean distance and returns the most popular label
    predictions := cls.Predict(testData)
    fmt.Println(predictions)

    // Prints precision/recall metrics
    confusionMat, err := evaluation.GetConfusionMatrix(testData, predictions)
    if err != nil {
        panic(fmt.Sprintf("Unable to get confusion matrix: %s", err.Error()))
    }
    fmt.Println(evaluation.GetSummary(confusionMat))
}
Golearn makes it easy to crunch data using Go, but there are some challenges when it comes to putting it into a data-stream analysis pipeline. Part of the issue stems from how the data model is constructed, and from the assumption that classifiers are not long-lived processes that load up pre-trained models. For a talk, I tried to make a simple server out of the above example that would expose the classifier over a REST API, and I ended up quickly writing my own KNNClassifier so that I could manage state more easily. All in all, golearn is turning into a fairly decent package for simple ML tasks and I think it has a bright future ahead of it.
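To show the long-lived server shape I was going for, here is a minimal sketch of serving predictions over a REST API. The hand-rolled nearest-neighbour classifier, the /predict route and the toy training points are all hypothetical stand-ins (golearn's classifiers want a base.FixedDataGrid rather than a bare feature vector, which is exactly the friction I hit); the point is only that the model is trained once and then lives in memory across requests.

package main

import (
    "encoding/json"
    "log"
    "math"
    "net/http"
)

// nearestNeighbour is a toy 1-NN classifier: it memorises the training
// points and returns the label of the closest one.
type nearestNeighbour struct {
    points [][]float64
    labels []string
}

func (n *nearestNeighbour) Predict(features []float64) string {
    best, bestDist := 0, math.Inf(1)
    for i, p := range n.points {
        if len(p) != len(features) {
            continue // skip malformed input rather than panicking
        }
        var d float64
        for j := range p {
            d += (p[j] - features[j]) * (p[j] - features[j])
        }
        if d < bestDist {
            best, bestDist = i, d
        }
    }
    return n.labels[best]
}

func main() {
    // "Train" (memorise) once at startup; the model then lives for the
    // lifetime of the server instead of being rebuilt on every request.
    model := &nearestNeighbour{
        points: [][]float64{{5.1, 3.5}, {6.7, 3.1}},
        labels: []string{"Iris-setosa", "Iris-versicolor"},
    }

    http.HandleFunc("/predict", func(w http.ResponseWriter, r *http.Request) {
        var features []float64
        if err := json.NewDecoder(r.Body).Decode(&features); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        json.NewEncoder(w).Encode(map[string]string{"label": model.Predict(features)})
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}

You can then poke it with something like curl -d "[5.0, 3.4]" localhost:8080/predict.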
That's great, these Go packages. Is there some way to save the created model for later use?
You can use the gob package from the standard library: just serialize the trained model and write it out to disk.
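As a minimal sketch, assuming a hypothetical model struct standing in for whatever trained state your classifier actually carries, saving and restoring with encoding/gob looks like this:

package main

import (
    "encoding/gob"
    "fmt"
    "os"
)

// model is a hypothetical trained-model struct; gob can serialise any
// struct whose exported fields it knows how to encode.
type model struct {
    Weights []float64
    Labels  []string
}

func save(path string, m *model) error {
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()
    return gob.NewEncoder(f).Encode(m)
}

func load(path string) (*model, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close()
    var m model
    if err := gob.NewDecoder(f).Decode(&m); err != nil {
        return nil, err
    }
    return &m, nil
}

func main() {
    trained := &model{Weights: []float64{0.3, 1.2}, Labels: []string{"a", "b"}}
    if err := save("model.gob", trained); err != nil {
        panic(err)
    }
    restored, err := load("model.gob")
    if err != nil {
        panic(err)
    }
    fmt.Println(restored.Weights, restored.Labels)
}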