# Lightning ⚡ Intel Habana

[Intel® Gaudi® AI Processor (HPU)](https://habana.ai/) training processors are built on a heterogeneous architecture with a cluster of fully programmable Tensor Processing Cores (TPC), along with associated development tools and libraries, and a configurable Matrix Math engine.

The TPC core is a VLIW SIMD processor with an instruction set and hardware tailored to serve training workloads efficiently. The Gaudi memory architecture includes on-die SRAM and local memories in each TPC. Gaudi is also the first DL training processor with integrated RDMA over Converged Ethernet (RoCE v2) engines on-chip.

On the software side, the PyTorch Habana bridge interfaces between the framework and the SynapseAI software stack to enable the execution of deep learning models on the Habana Gaudi device.

Gaudi offers a significant cost advantage, letting you run more deep learning training while keeping expenses down.

For more information, check out [Gaudi Architecture](https://docs.habana.ai/en/latest/Gaudi_Overview/Gaudi_Overview.html) and [Gaudi Developer Docs](https://developer.habana.ai).

______________________________________________________________________

## Installing Lightning Habana

To install Lightning Habana, run the following command:

```bash
pip install -U lightning lightning-habana
```

______________________________________________________________________

**NOTE**

Use either `lightning` or `pytorch-lightning` when working with the plugin. Mixing strategies, plugins, etc. from both packages is not yet validated.

______________________________________________________________________

## Using PyTorch Lightning with HPU

To enable PyTorch Lightning to run on the HPU accelerator, pass `accelerator=HPUAccelerator()` to the Trainer class.

```python
from lightning import Trainer
from lightning_habana.pytorch.accelerator import HPUAccelerator

# Run on one HPU.
trainer = Trainer(accelerator=HPUAccelerator(), devices=1)
# Run on multiple HPUs.
trainer = Trainer(accelerator=HPUAccelerator(), devices=8)
# Choose the number of devices automatically.
trainer = Trainer(accelerator=HPUAccelerator(), devices="auto")
```

The `devices=1` parameter with HPUs enables the Habana accelerator for single-card training using `SingleHPUStrategy`.

The `devices>1` parameter with HPUs enables the Habana accelerator for distributed training. It uses `HPUDDPStrategy`, which is based on the DDP strategy with the integration of Habana's collective communication library (HCCL) to support scale-up within a node and scale-out across multiple nodes. A minimal end-to-end `trainer.fit` sketch and an example of selecting these strategies explicitly are included at the end of this page.

## Support Matrix

| **SynapseAI**         | **1.18.0**                                          |
| --------------------- | --------------------------------------------------- |
| PyTorch               | 2.4.0                                               |
| (PyTorch) Lightning\* | 2.4.x                                               |
| **Lightning Habana**  | **1.7.0**                                           |
| DeepSpeed\*\*         | Forked from v0.14.4 of the official DeepSpeed repo. |

\* covers both packages [`lightning`](https://pypi.org/project/lightning/) and [`pytorch-lightning`](https://pypi.org/project/pytorch-lightning/)

For more information, check out [HPU Support Matrix](https://docs.habana.ai/en/latest/Support_Matrix/Support_Matrix.html).
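
## Example: fitting a model on HPU

The sketch below shows how the `Trainer` configuration from the usage section combines with an ordinary `LightningModule`. The `TinyRegressor` module and its random tensors are hypothetical, used only to give the example something to fit; everything else follows the standard Lightning training loop.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from lightning import LightningModule, Trainer
from lightning_habana.pytorch.accelerator import HPUAccelerator


class TinyRegressor(LightningModule):
    """Hypothetical toy model; only illustrates trainer.fit on a Gaudi device."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    # Random tensors stand in for a real dataset.
    data = TensorDataset(torch.randn(256, 32), torch.randn(256, 1))
    loader = DataLoader(data, batch_size=32)

    # Single-card HPU training; increase `devices` for distributed runs.
    trainer = Trainer(accelerator=HPUAccelerator(), devices=1, max_epochs=1)
    trainer.fit(TinyRegressor(), loader)
```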
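
## Selecting the HPU strategy explicitly

When no strategy is specified, the Trainer picks `SingleHPUStrategy` or `HPUDDPStrategy` from the device count as described above. If you prefer to pass the strategy yourself, a minimal sketch follows; it assumes the strategies are importable from `lightning_habana.pytorch.strategies`, mirroring the accelerator import path used earlier on this page.

```python
from lightning import Trainer
from lightning_habana.pytorch.accelerator import HPUAccelerator
from lightning_habana.pytorch.strategies import HPUDDPStrategy, SingleHPUStrategy

# Single-card training: equivalent to letting the Trainer infer the strategy
# from devices=1.
trainer = Trainer(accelerator=HPUAccelerator(), devices=1, strategy=SingleHPUStrategy())

# Multi-card training: HPUDDPStrategy layers Habana's collective communication
# library (HCCL) on top of DDP for scale-up within a node and scale-out across nodes.
trainer = Trainer(accelerator=HPUAccelerator(), devices=8, strategy=HPUDDPStrategy())
```

Passing the strategy explicitly is mainly useful when you want to customize its arguments; otherwise the device count alone is sufficient.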