singa-incubating-0.1.0 Release Notes

SINGA is a general distributed deep learning platform for training big deep learning models over large datasets. It is designed with an intuitive programming model based on the layer abstraction. SINGA supports a wide variety of popular deep learning models.

This release includes following features:

Job management
- SINGA-3 Use Zookeeper to check stopping (finish) time of the system
- SINGA-16 Runtime Process id Management
- SINGA-25 Setup glog output path
- SINGA-26 Run distributed training in a single command
- SINGA-30 Enhance easy-to-use feature and support concurrent jobs
- SINGA-33 Automatically launch a number of processes in the cluster
- SINGA-34 Support external zookeeper service
- SINGA-38 Support concurrent jobs
- SINGA-39 Avoid ssh in scripts for single node environment
- SINGA-43 Remove Job-related output from workspace
- SINGA-56 No automatic launching of zookeeper service
- SINGA-73 Refine the selection of available hosts from host list

Installation with GNU Auto tool
- SINGA-4 Refine thirdparty-dependency installation
- SINGA-13 Separate intermediate files of compilation from source files
- SINGA-17 Add root permission within thirdparty/install.
- SINGA-27 Generate python modules for proto objects
- SINGA-53 Add lmdb compiling options
- SINGA-62 Remove building scrips and auxiliary files
- SINGA-67 Add singatest into build targets

Distributed training
- SINGA-7 Implement shared memory Hogwild algorithm
- SINGA-8 Implement distributed Hogwild
- SINGA-19 Slice large Param objects for load-balance
- SINGA-29 Update NeuralNet class to enable layer partition type customization
- SINGA-24 Implement Downpour training framework
- SINGA-32 Implement AllReduce training framework
- SINGA-57 Improve Distributed Hogwild

Training algorithms for different model categories
- SINGA-9 Add Support for Restricted Boltzman Machine (RBM) model
- SINGA-10 Add Support for Recurrent Neural Networks (RNN)

Checkpoint and restore
- SINGA-12 Support Checkpoint and Restore

Unit test
- SINGA-64 Add the test module for utils/common

Programming model
- SINGA-36 Refactor job configuration, driver program and scripts
- SINGA-37 Enable users to set parameter sharing in model configuration
- SINGA-54 Refactor job configuration to move fields in ModelProto out
- SINGA-55 Refactor main.cc and singa.h
- SINGA-61 Support user defined classes
- SINGA-65 Add an example of writing user-defined layers

Other features
- SINGA-6 Implement thread-safe singleton
- SINGA-18 Update API for displaying performance metric
- SINGA-77 Integrate with Apache RAT

Some bugs are fixed during the development of this release

SINGA-2 Check failed: zsock_connect
SINGA-5 Server early terminate when zookeeper singa folder is not initially empty
SINGA-15 Fixg a bug from ConnectStub function which gets stuck for connecting layerdealer
SINGA-22 Cannot find openblas library when it is installed in default path
SINGA-23 Libtool version mismatch error.
SINGA-28 Fix a bug from topology sort of Graph
SINGA-42 Issue when loading checkpoints
SINGA-44 A bug when reseting metric values
SINGA-46 Fix a bug in updater.cc to scale the gradients
SINGA-47 Fix a bug in data layers that leads to out-of-memory when group size is too large
SINGA-48 Fix a bug in trainer.cc that assigns the same NeuralNet instance to workers from diff groups
SINGA-49 Fix a bug in HandlePutMsg func that sets param fields to invalid values
SINGA-66 Fix bugs in Worker::RunOneBatch function and ClusterProto
SINGA-79 Fix bug in singatool that can not parse -conf flag

Features planned for the next release

SINGA-11 Start SINGA using Mesos
SINGA-31 Extend Blob to support xpu (cpu or gpu)
SINGA-35 Add random number generators
SINGA-40 Support sparse Param update
SINGA-41 Support single node single GPU training