DEV Community

Dr. Carlos Ruiz Viquez

**Multi-Node Distributed Training with Horovod and Keras**


import tensorflow as tf
import horovod.tensorflow.keras as hvd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Initialize Horovod; each worker process is assigned a rank
hvd.init()

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(784,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(10, activation='softmax'))  # output layer for 10 classes

# Wrap the optimizer so gradients are averaged across all workers
opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam())
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt)

model.fit(
    X_train, y_train,
    steps_per_epoch=100,
    epochs=10,
    validation_steps=20,
    validation_data=(X_val, y_val),
    callbacks=[
        # Sync initial weights from rank 0 so all workers start identically
        hvd.callbacks.BroadcastGlobalVariablesCallback(0),
        tf.keras.callbacks.EarlyStopping(patience=5),
    ],
)

This code snippet uses Horovod to distribute the training of a Keras neural network across multiple nodes. Here's what it does:

  • Initializes Horovod on each worker process with hvd.init().
  • Wraps the optimizer in hvd.DistributedOptimizer, which averages gradients across workers via allreduce after each batch.
  • Broadcasts the initial weights from rank 0 with BroadcastGlobalVariablesCallback(0) so every worker starts from the same state.
  • Trains the model on a dataset split into training and validation sets, with early stopping enabled to prevent overfitting.
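One detail the snippet leaves implicit: Horovod does not split the dataset for you. Each worker is typically fed its own slice of the data based on its rank. A minimal sketch of that idea in plain Python (the `shard` helper is illustrative, not part of the Horovod API; in practice you would use hvd.rank() and hvd.size() as the arguments):

```python
def shard(samples, rank, size):
    """Give each of `size` workers every size-th sample, offset by its rank."""
    return samples[rank::size]

# Example: 10 samples split across 4 workers
data = list(range(10))
worker_2 = shard(data, rank=2, size=4)  # rank 2 of 4 sees samples 2 and 6
```

Because the slices are disjoint and together cover the whole dataset, each epoch processes every sample exactly once across the cluster.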

This compact code snippet allows for efficient and scalable distributed training, making it ideal for large-scale machine learning projects.
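To actually run the script across nodes, you launch one process per GPU with the horovodrun CLI. A sketch, assuming two hosts named node1 and node2 with two GPUs each (the hostnames and train.py filename are placeholders):

```shell
# 4 processes total: 2 on node1, 2 on node2
horovodrun -np 4 -H node1:2,node2:2 python train.py
```

Each launched process runs the same script; hvd.init() gives it a unique rank, and Horovod coordinates the gradient allreduce between them.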


Posted automatically
