
H2Oai GPU Edition


h2oai/h2o4gpu


Join the chat at https://gitter.im/h2oai/h2o4gpu

H2O4GPU is a collection of GPU solvers by H2Oai with APIs in Python and R. The Python API builds upon the easy-to-use scikit-learn API and its well-tested CPU-based algorithms. It can be used as a drop-in replacement for scikit-learn (i.e. import h2o4gpu as sklearn) with support for GPUs on selected (and ever-growing) algorithms. H2O4GPU inherits all the existing scikit-learn algorithms and falls back to CPU algorithms when the GPU algorithm does not support an important existing scikit-learn class option. The R package is a wrapper around the H2O4GPU Python package, and the interface follows standard R conventions for modeling.
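As a rough illustration of the drop-in idea, the sketch below prefers h2o4gpu when it is importable and otherwise falls back to scikit-learn. The helper names `pick_backend` and `load_backend` and the injectable `is_available` parameter are assumptions made for this example and are not part of either library; only the `import h2o4gpu as sklearn` idiom comes from the text above.

```python
import importlib
import importlib.util


def _installed(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


def pick_backend(is_available=_installed):
    """Return the name of the first usable backend, preferring h2o4gpu.

    `is_available` is injectable so the choice can be exercised without
    either package installed (hypothetical helper, not a library API).
    """
    for name in ("h2o4gpu", "sklearn"):
        if is_available(name):
            return name
    return None


def load_backend():
    """Import and return the chosen module, or None if neither is installed.

    When h2o4gpu is present this has the same effect as
    `import h2o4gpu as sklearn`.
    """
    name = pick_backend()
    return importlib.import_module(name) if name else None
```

A script could then write `sklearn = load_backend()` and use the returned module under the familiar name, whichever package is actually installed.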

The DAAL library has been added for CPU; currently only the x86_64 architecture is supported.

Requirements

  • PC running Linux with glibc 2.17+

  • Install CUDA with bundled display drivers (CUDA 8, CUDA 9, CUDA 9.2, or CUDA 10)

  • Python shared libraries (e.g. On Ubuntu: sudo apt-get install libpython3.6-dev)

When installing, choose to link the CUDA install to /usr/local/cuda. Be sure to reboot after installing the new NVIDIA drivers.

  • Nvidia GPU with Compute Capability >= 3.5 (Capability Lookup).

  • For advanced features, like handling rows/32 > 2^16 (i.e., rows > 2,097,152) in K-means, Compute Capability >= 5.2 is needed.

  • For building the R package, libcurl4-openssl-dev, libssl-dev, and libxml2-dev are needed.
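The K-means row threshold quoted in the requirements follows from simple arithmetic: rows/32 > 2^16 is the same as rows > 32 × 65,536 = 2,097,152. A minimal sketch of that check is shown below; the function name is hypothetical and the returned strings simply echo the two capability levels listed above.

```python
DIVISOR = 32                      # the "/32" in the rows/32 condition
LIMIT = 2 ** 16                   # the 2^16 bound quoted in the requirements
ROW_THRESHOLD = DIVISOR * LIMIT   # 2,097,152 rows


def min_capability_for_kmeans(n_rows):
    """Return the minimum Compute Capability the README lists for K-means.

    Hypothetical helper: it only encodes the rows/32 > 2^16 rule.
    """
    return "5.2" if n_rows > ROW_THRESHOLD else "3.5"
```

For example, a dataset with exactly 2,097,152 rows still falls under the 3.5 requirement, while one more row crosses into the 5.2 case.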

User Installation

Note: Installation steps mentioned below are for users planning to use H2O4GPU. See DEVEL.md for developer installation.

H2O4GPU can be installed using either PIP or Conda.

Prerequisites

Add to ~/.bashrc or environment (set appropriate paths for your OS):

export CUDA_HOME=/usr/local/cuda # or choose /usr/local/cuda9 for cuda9 and /usr/local/cuda8 for cuda8
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64/:$CUDA_HOME/lib/:$CUDA_HOME/extras/CUPTI/lib64
  • Install OpenBlas dev environment:
sudo apt-get install libopenblas-dev pbzip2

If you are building the h2o4gpu R package, it is necessary to install the following dependencies:

sudo apt-get -y install libcurl4-openssl-dev libssl-dev libxml2-dev
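The CUDA-related environment settings from the prerequisites can be sanity-checked before installing. The sketch below is one possible check, not part of H2O4GPU: the function name `check_cuda_env` is an assumption, and it only verifies that the three library directories named in the export line appear in LD_LIBRARY_PATH.

```python
import os


def check_cuda_env(env=None):
    """Return a list of problems found in the CUDA-related variables.

    Hypothetical helper; pass a dict to test it without touching os.environ.
    """
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        return ["CUDA_HOME is not set"]

    # Normalise trailing slashes so "/x/lib64/" and "/x/lib64" compare equal.
    entries = {
        p.rstrip("/") for p in env.get("LD_LIBRARY_PATH", "").split(":") if p
    }
    home = cuda_home.rstrip("/")

    problems = []
    for suffix in ("lib64", "lib", "extras/CUPTI/lib64"):
        expected = f"{home}/{suffix}"
        if expected not in entries:
            problems.append(f"LD_LIBRARY_PATH is missing {expected}")
    return problems
```

Calling `check_cuda_env()` with no arguments inspects the current shell environment; an empty list means all three directories were found.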

PIP install

Download the Python wheel file (For Python 3.6):

Start a fresh pyenv or virtualenv session.

Install the Python wheel file. NOTE: If you don't use a fresh environment, this will overwrite your py3nvml and xgboost installations to use our validated versions.

pip install h2o4gpu-0.3.0-cp36-cp36m-linux_x86_64.whl
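The wheel filename encodes which interpreter it targets: in h2o4gpu-0.3.0-cp36-cp36m-linux_x86_64.whl, the cp36 tag means CPython 3.6. The sketch below splits a wheel filename into its tags following the standard wheel naming convention and compares the Python tag with an interpreter version. The helper names are assumptions for this example, and compound tags such as py2.py3 are deliberately not handled.

```python
import sys


def parse_wheel_tags(filename):
    """Split a wheel filename into name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform gives 5 or 6 fields.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_interpreter(filename, version_info=sys.version_info):
    """Return True if the wheel's CPython tag matches the given version."""
    expected = f"cp{version_info[0]}{version_info[1]}"
    return parse_wheel_tags(filename)["python"] == expected
```

Running `matches_interpreter("h2o4gpu-0.3.0-cp36-cp36m-linux_x86_64.whl")` inside the fresh environment gives a quick hint about whether pip is likely to accept the wheel.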

Conda installation

Ensure you meet the Requirements and have installed the Prerequisites.

If you have not already done so, install the conda package manager, and test your conda installation.

H2O4GPU packages for CUDA 8, CUDA 9 and CUDA 9.2 are available from the h2oai channel in Anaconda Cloud.

Create a new conda environment with H2O4GPU and all its dependencies using the following command. As written, the command installs the CUDA 10 package (h2o4gpu-cuda10); for other CUDA versions, substitute the package name as needed. Note that the h2oai and conda-forge channels are required; the command also pulls from the rapidsai channel.

conda create -n h2o4gpuenv -c h2oai -c conda-forge -c rapidsai h2o4gpu-cuda10

Once the environment is created, activate it with source activate h2o4gpuenv.

To test, start an interactive python session in the environment and follow the steps in the Test Installation section below.

h2o4gpu R package

At this point, you should have installed the H2O4GPU Python package successfully. You can then go ahead and install the h2o4gpu R package via the following:

if (!require(devtools)) install.packages("devtools")
devtools::install_github("h2oai/h2o4gpu", subdir = "src/interface_r")

Detailed instructions can be found here.

Test Installation

To test your installation of the Python package, the following code:

import h2o4gpu
import numpy as np

X = np.array([[1.,1.], [1.,4.], [1.,0.]])
model = h2o4gpu.KMeans(n_clusters=2,random_state=1234).fit(X)
model.cluster_centers_

should give input/output of:

>>> import h2o4gpu
>>> import numpy as np
>>>
>>> X = np.array([[1.,1.], [1.,4.], [1.,0.]])
>>> model = h2o4gpu.KMeans(n_clusters=2,random_state=1234).fit(X)
>>> model.cluster_centers_
array([[ 1.,  1.  ],
       [ 1.,  4.  ]])

To test your installation of the R package, try the following example that builds a simple XGBoost random forest classifier:

library(h2o4gpu)

# Setup dataset
x <- iris[1:4]
y <- as.integer(iris$Species) - 1

# Initialize and train the classifier
model <- h2o4gpu.random_forest_classifier() %>% fit(x, y)

# Make predictions
predictions <- model %>% predict(x)

Next Steps

For more examples using the Python API, please check out our Jupyter notebook demos. To run the demos against a locally installed wheel, at least download src/interface_py/requirements_runtime_demos.txt from the GitHub repo and run:

pip install -r src/interface_py/requirements_runtime_demos.txt

and then run the jupyter notebook demos.

For more examples using the R API, please visit the vignettes.

Running Jupyter Notebooks

You can run Jupyter Notebooks with H2O4GPU in either of the two ways below.

Creating a Conda Environment

Ensure you have a machine that meets the Requirements and Prerequisites mentioned above.

Next, follow the Conda installation instructions above. Once you have activated the environment, you will need to downgrade tornado to version 4.5.3 (see issue #680). Start Jupyter notebook, and navigate in your browser to the URL shown in the log output.

source activate h2o4gpuenv
conda install tornado==4.5.3
jupyter notebook --ip='*' --no-browser

Start a Python 3 kernel, and try the code in the example notebooks.

Using precompiled docker image

Requirements:

Download the Docker file (for linux_x86_64):

  • Bleeding edge (changes with every successful master branch build):

Load and run docker file (e.g. for bleeding-edge of cuda92):

jupyter notebook --generate-config
echo "c.NotebookApp.allow_remote_access = False" >> ~/.jupyter/jupyter_notebook_config.py # Choose True if want to allow remote access
pbzip2 -dc h2o4gpu-0.3.0.10000-cuda92-runtime.tar.bz2 | nvidia-docker load
mkdir -p log ; nvidia-docker run --name localhost --rm -p 8888:8888 -u `id -u`:`id -g` -v `pwd`/log:/log -v /home/$USER/.jupyter:/jupyter --entrypoint=./run.sh opsh2oai/h2o4gpu-0.3.0.10000-cuda92-runtime &
find log -name 'jupyter*' -type f -printf '%T@ %p\n' | sort -k1 -n | awk '{print $2}' | tail -1 | xargs cat | grep token | grep http | grep -v NotebookApp

Copy/paste the http link shown into your browser. If the "find" command doesn't work, look for the latest jupyter.log file and look at contents for the http link and token.

If the link shows no token or shows ... for token, try a token of "h2o" (without quotes). If running on your own host, the weblink will look like http://localhost:8888:token with token replaced by the actual token.
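If the shell pipeline above is not available, the same search can be done in a few lines of Python. The sketch below scans log text for URLs that carry a token. It assumes the log contains links in the usual Jupyter form `http://host:port/?token=...`; the function name and the sample log line in the usage note are hypothetical.

```python
import re

# Matches URLs such as http://localhost:8888/?token=abc123 in log text.
TOKEN_URL = re.compile(r"https?://\S+?token=([0-9A-Za-z]+)")


def find_token_urls(log_text):
    """Return (url, token) pairs for each tokenised URL in the log text."""
    return [(m.group(0), m.group(1)) for m in TOKEN_URL.finditer(log_text)]
```

For instance, `find_token_urls("    http://localhost:8888/?token=abc123")` yields one pair containing the full link and the token `abc123`. A link whose token is shown as `...` does not match, which is the case where the text above suggests trying "h2o".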

This container has a /demos directory which contains Jupyter notebooks and some data.

Plans

The vision is to develop fast GPU algorithms to complement the CPU algorithms in scikit-learn while keeping full scikit-learn API compatibility and scikit-learn CPU algorithm capability. The h2o4gpu Python module is to be used as a drop-in replacement for scikit-learn that has the full functionality of scikit-learn's CPU algorithms.

Functions and classes will be gradually overridden by GPU-enabled algorithms (unless n_gpu=0 is set and we have no CPU algorithm except scikit-learn's). The CPU algorithms and code initially will be sklearn, but gradually those may be replaced by faster open-source codes like those in Intel DAAL.

This vision is currently accomplished by using the open-source scikit-learn and xgboost and overriding scikit-learn calls with our own GPU versions. In cases when our GPU class is currently incapable of an important scikit-learn feature, we revert to the scikit-learn class.

As noted above, there is an R API in development, which will be released as a stand-alone R package. All algorithms supported by H2O4GPU will be exposed in both Python and R in the future.

Another primary goal is to support all operations on the GPU via the GOAI initiative. This involves ensuring the GPU algorithms can take and return GPU pointers to data instead of going back to the host. In scikit-learn API language these are called fit_ptr, predict_ptr, transform_ptr, etc., where ptr stands for memory pointer.

RoadMap

2019 Q2:

  • A new processing engine that allows scaling beyond GPU memory limits
  • k-Nearest Neighbors
  • Matrix Factorization
  • Factorization Machines
  • API Support: GOAI API support
  • Data.table support

More precise information can be found in the milestone's list.

Solver Classes

Among others, the solver can be used for the following classes of problems:

  • GLM: Lasso, Ridge Regression, Logistic Regression, Elastic Net Regularization
  • KMeans
  • Gradient Boosting Machine (GBM) via XGBoost
  • Singular Value Decomposition (SVD) + Truncated Singular Value Decomposition
  • Principal Components Analysis (PCA)

Benchmarks

Our benchmarking plan is to clearly highlight when modeling benefits from the GPU (usually complex models) or does not (e.g. one-shot simple models dominated by data transfer).

We have benchmarked h2o4gpu, scikit-learn, and h2o-3 on a variety of solvers. Some benchmarks have been performed for a few selected cases that highlight the GPU capabilities (i.e. compute or on-GPU memory operations dominate data transfer to GPU from host):

Benchmarks for GLM, KMeans, and XGBoost for CPU vs. GPU.

A suite of benchmarks is computed when doing "make testperf" from a build directory. These take all of our tests and benchmark h2o4gpu against h2o-3. These will soon be presented as live commit-by-commit streaming plots on a website.
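The distinction drawn above between compute-dominated and transfer-dominated workloads can be made concrete with a toy cost model. The sketch below is an illustration only and not how the project measures anything: it assumes GPU time is the host-to-GPU transfer time plus the CPU time divided by a compute speedup, and every number used with it is hypothetical.

```python
def gpu_total_seconds(cpu_seconds, compute_speedup, transfer_seconds):
    """Estimate GPU wall time as transfer cost plus accelerated compute."""
    return transfer_seconds + cpu_seconds / compute_speedup


def gpu_pays_off(cpu_seconds, compute_speedup, transfer_seconds):
    """Return True if the estimated GPU time beats the CPU time."""
    estimate = gpu_total_seconds(cpu_seconds, compute_speedup, transfer_seconds)
    return estimate < cpu_seconds
```

With a hypothetical 10x compute speedup and 2 seconds of transfer, a model that takes 100 seconds on the CPU is estimated at 12 seconds on the GPU, while a one-shot model that takes 1 second on the CPU is estimated at 2.1 seconds and is better left on the CPU.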

Contributing

Please refer to our CONTRIBUTING.md and DEVEL.md for instructions on how to build and test the project and how to contribute. The h2o4gpu Gitter chatroom can be used for discussion related to open source development.

GitHub issues are used for bugs, feature and enhancement discussion/tracking.

Questions

References

  1. Parameter Selection and Pre-Conditioning for a Graph Form Solver -- C. Fougner and S. Boyd
  2. Block Splitting for Distributed Optimization -- N. Parikh and S. Boyd
  3. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers -- S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein
  4. Proximal Algorithms -- N. Parikh and S. Boyd

Copyright

Copyright (c) 2017, H2O.ai, Inc., Mountain View, CA
Apache License Version 2.0 (see LICENSE file)

This software is based on original work under BSD-3 license by:

Copyright (c) 2015, Christopher Fougner, Stephen Boyd, Stanford University
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    * Neither the name of the <organization> nor the
      names of its contributors may be used to endorse or promote products
      derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
