Nvidia is looking to build a bigger presence outside of GPU sales by bringing its AI-specific software development kit to more applications.
Nvidia announced in a blog post that it is adding support for the TensorRT-LLM SDK to Windows and to models such as Stable Diffusion. The SDK's purpose is to speed up the execution of large language models (LLMs) and related tools.
TensorRT accelerates inference: the process of running a trained model on new inputs, calculating probabilities to produce a result, such as a newly generated Stable Diffusion image. With this software, Nvidia hopes to play a bigger role on the inference side of generative AI.
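For a concrete sense of what "inference" means here, the sketch below runs an already-trained language model on a prompt. It uses Hugging Face's transformers library as a generic illustration of the concept, not Nvidia's own tooling; the model name is just a small stand-in chosen for the example.

```python
# Minimal illustration of inference: running a pre-trained model on new
# input to produce a result. Uses Hugging Face transformers as a generic
# example; this is not TensorRT-LLM itself.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model, chosen only for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("An H100 GPU is", return_tensors="pt")
# generate() repeatedly computes next-token probabilities and picks tokens
# from them; this probability computation is the work inference engines
# like TensorRT aim to speed up.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```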
TensorRT-LLM breaks LLMs down so they run faster on Nvidia's H100 GPUs. It works with LLMs such as Meta's Llama 2 and with other AI models such as Stability AI's Stable Diffusion. The company says that by running an LLM through TensorRT-LLM, "this speedup significantly improves the experience of using more advanced LLMs, such as writing and coding assistants."
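As a rough sketch of what this looks like in practice, more recent TensorRT-LLM releases expose a high-level Python API along the lines below. The class names, arguments, and the Llama 2 model ID are assumptions that vary by version and require model access, so treat this as an approximation of the workflow rather than a verbatim recipe.

```python
# Hedged sketch of TensorRT-LLM's high-level Python API (newer releases).
# Names and arguments are approximate and version-dependent; the Llama 2
# checkpoint is gated and used here purely as an example.
from tensorrt_llm import LLM, SamplingParams

# Builds/loads an optimized inference engine for the model on the local GPU.
llm = LLM(model="meta-llama/Llama-2-7b-hf")

params = SamplingParams(temperature=0.8, max_tokens=64)
for output in llm.generate(["Write a haiku about GPUs."], params):
    print(output.outputs[0].text)
```

The appeal of this kind of API is that the engine compilation and GPU-specific optimization happen behind a single call, which is the "run faster without extra work" pitch the company is making.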
In other words, Nvidia wants to provide not only the GPUs that train and run LLMs but also software that speeds up running and working with models, so users don't have to look elsewhere for ways to make their generative AI more cost-effective.
The company said that TensorRT-LLM is "publicly available to anyone who wants to use or integrate it," and that the SDK can be accessed on Nvidia's site.
Nvidia already has a near monopoly on the powerful chips used to train LLMs like GPT-4, and training and running an LLM typically requires a large number of GPUs. Demand for its H100 GPUs is skyrocketing, with estimated prices reaching $40,000 per chip, and the company has announced that a new version of the GPU, the GH200, will arrive next year. No wonder Nvidia's revenue rose to $13.5 billion in the second quarter.
However, the world of generative AI is advancing rapidly, and new ways to run LLMs without large numbers of expensive GPUs are emerging. Companies such as Microsoft and AMD have announced that they will build their own chips to reduce their dependence on Nvidia.
And companies are turning to the inference side of AI development. AMD plans to acquire the software company Nod.ai specifically to help LLMs run on AMD chips, and SambaNova already offers services that make it easier to run models.
For now, Nvidia remains the hardware leader in generative AI, but it’s already looking toward a future where people won’t need to buy its GPUs in bulk.