Databricks introduces public preview of GPU and LLM optimization support for Databricks Model Serving

by admin
October 1, 2023, in Technical Insights


Databricks introduced a public preview of GPU and LLM optimization support for Databricks Model Serving. This new feature enables the deployment of various AI models, including LLMs and vision models, on the Lakehouse Platform.

Databricks Model Serving offers automatic optimization for LLM Serving, delivering high-performance results without the need for manual configuration. According to Databricks, it’s the first serverless GPU serving product built on a unified data and AI platform, allowing users to create and deploy GenAI applications seamlessly within a single platform, covering everything from data ingestion to model deployment and monitoring.

Databricks Model Serving simplifies the deployment of AI models, making it easy even for users without deep infrastructure knowledge. Users can deploy a wide range of models, including natural language, vision, audio, tabular, or custom models, regardless of how they were trained (from scratch, open-source, or fine-tuned with proprietary data). 

Just log your model with MLflow, and Databricks Model Serving will automatically prepare a production-ready container with GPU libraries like CUDA and deploy it to serverless GPUs. This fully managed service handles everything from managing instances and maintaining version compatibility to patching versions. It also automatically scales instances to match traffic patterns, saving on infrastructure costs while optimizing performance and latency.
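
For illustration, here is a minimal sketch of that logging step, assuming MLflow's transformers flavor and a small open-source pipeline as a stand-in for your own model; the registered model name is hypothetical:

```python
# A minimal sketch: log a Hugging Face pipeline with MLflow so Databricks
# Model Serving can package it into a production-ready container.
# The model and registry name below are placeholders.
import mlflow
from transformers import pipeline

# Any transformers pipeline can stand in for a model of your own.
generator = pipeline("text-generation", model="gpt2")

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model=generator,
        artifact_path="model",
        registered_model_name="my-genai-model",  # hypothetical registry name
    )
```

Once the model is registered, Model Serving takes over containerization and GPU provisioning, so no Dockerfile or CUDA setup appears in user code.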

Databricks Model Serving has introduced optimizations for serving large language models (LLMs) more efficiently, resulting in up to a 3-5x reduction in latency and cost. To use Optimized LLM Serving, you simply provide the model and its weights, and Databricks takes care of the rest, ensuring your model performs optimally.

This streamlines the process, allowing you to concentrate on integrating LLMs into your application rather than dealing with low-level model optimization. Currently, Databricks Model Serving automatically optimizes MPT and Llama2 models, with plans to support additional models in the future.
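
As a rough sketch of what deployment can look like once a model is registered, the following calls the Databricks serving-endpoints REST API from Python. The host, token, endpoint name, model name, and workload sizing are all placeholders, and per the announcement the LLM optimizations apply automatically for supported models rather than via any flag set here:

```python
# A hedged sketch: create a GPU serving endpoint for a registered model via
# the Databricks REST API. All names, sizes, and credentials are placeholders.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder

payload = {
    "name": "llama2-endpoint",  # hypothetical endpoint name
    "config": {
        "served_models": [
            {
                "model_name": "my-llama2-model",  # registered model (placeholder)
                "model_version": "1",
                "workload_type": "GPU_MEDIUM",    # GPU sizes vary by workspace/cloud
                "workload_size": "Small",
                "scale_to_zero_enabled": True,    # release capacity when idle
            }
        ]
    },
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/serving-endpoints",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())
```

Scale-to-zero pairs naturally with the traffic-based autoscaling described above, since idle endpoints stop consuming GPU capacity.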



