Why small language models are the next big thing in AI


« on: April 15, 2024, 02:41:16 PM »

In the AI wars, where tech giants have been racing to build ever-larger language models, a surprising new trend is emerging: small is the new big. As progress in large language models (LLMs) shows some signs of plateauing, researchers and developers are increasingly turning their attention to small language models (SLMs). These compact, efficient and highly adaptable AI models are challenging the notion that bigger is always better, promising to change the way we approach AI development.

Are LLMs starting to plateau?

Recent performance comparisons published by Vellum and HuggingFace suggest that the performance gap between LLMs is quickly narrowing. This trend is particularly evident in specific tasks like multi-choice questions, reasoning and math problems, where the performance differences between the top models are minimal. For instance, in multi-choice questions, Claude 3 Opus, GPT-4 and Gemini Ultra all score above 83%, while in reasoning tasks, Claude 3 Opus, GPT-4, and Gemini 1.5 Pro exceed 92% accuracy.
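The narrowing gap is easy to quantify. As a rough sketch (the scores below are illustrative approximations of published multi-choice benchmark figures, not authoritative numbers), the spread between the top models comes to only a few percentage points:

```python
# Illustrative top-model scores on a multi-choice benchmark (percent).
# These approximate published figures and are for comparison only.
mcq_scores = {
    "Claude 3 Opus": 86.8,
    "GPT-4": 86.4,
    "Gemini Ultra": 83.7,
}

spread = max(mcq_scores.values()) - min(mcq_scores.values())
print(f"Top-model spread: {spread:.1f} percentage points")
```

A spread this small, across models of very different sizes and costs, is what makes the plateau argument compelling.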

Interestingly, even smaller models like Mixtral 8x7B and Llama 2 70B are showing promising results in certain areas, such as reasoning and multi-choice questions, where they outperform some of their larger counterparts. This suggests that model size may not be the sole determining factor in performance, and that other aspects like architecture, training data and fine-tuning techniques could play a significant role.

The latest research papers announcing new LLMs all point in the same direction: “If you just look empirically, the last dozen or so articles that come out, they’re kind of all in the same general territory as GPT-4,” says Gary Marcus, the former head of Uber AI and author of “Rebooting AI,” a book about building trustworthy AI. Marcus spoke with VentureBeat on Thursday.

“Some of them are a little better than GPT-4, but there’s no quantum leap. I think everybody would say that GPT-4 is a quantum step ahead of GPT-3.5. There hasn’t been any [quantum leap] in over a year,” said Marcus.

As the performance gap continues to close and more models demonstrate competitive results, it raises the question of whether LLMs are indeed starting to plateau. If this trend persists, it could have significant implications for the future development and deployment of language models, potentially shifting the focus from simply increasing model size to exploring more efficient and specialized architectures.

Drawbacks of the LLM approach

LLMs, while undeniably powerful, come with significant drawbacks. First, training an LLM requires enormous amounts of data and a model with billions or even trillions of parameters. This makes the training process extremely resource-intensive, and the computational power and energy consumption required to train and run LLMs are staggering. The resulting high costs make it difficult for smaller organizations or individuals to engage in core LLM development. At an MIT event last year, OpenAI CEO Sam Altman stated that the cost of training GPT-4 was at least $100M.
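A back-of-envelope calculation shows why scale drives cost so sharply. Using the common rule of thumb that training takes roughly 6 FLOPs per parameter per training token (the model and token counts below are hypothetical, chosen only to illustrate the gap between an SLM and a frontier-scale LLM):

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training compute estimate: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

# Hypothetical examples: a 7B-parameter SLM trained on 2T tokens vs.
# a 1T-parameter LLM trained on 10T tokens.
slm_flops = training_flops(7e9, 2e12)
llm_flops = training_flops(1e12, 10e12)
print(f"SLM: {slm_flops:.1e} FLOPs, LLM: {llm_flops:.1e} FLOPs "
      f"({llm_flops / slm_flops:.0f}x more)")
```

Even with generous assumptions, the larger run needs hundreds of times more compute, which translates directly into hardware, energy and dollar costs.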

The complexity of the tools and techniques required to work with LLMs also presents a steep learning curve for developers, further limiting accessibility. The long cycle time from training to building and deploying a model slows down development and experimentation. A recent paper from the University of Cambridge shows companies can spend 90 days or longer deploying a single machine learning (ML) model.

SLMs, by contrast, offer the potential for enhanced privacy and security. With a smaller codebase and simpler architecture, SLMs are easier to audit and less likely to have unintended vulnerabilities. This makes them attractive for applications that handle sensitive data, such as in healthcare or finance, where data breaches could have severe consequences. Additionally, the reduced computational requirements of SLMs make them more feasible to run locally on devices or on-premises servers, rather than relying on cloud infrastructure. This local processing can further improve data security and reduce the risk of exposure during data transfer.
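A quick way to see why SLMs fit on everyday devices is to estimate the memory their weights need. A minimal sketch, assuming half-precision (2 bytes per parameter) weights and ignoring activation and KV-cache memory:

```python
def weight_memory_gib(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for model weights alone, in GiB.

    bytes_per_param: 2.0 for fp16/bf16 weights, roughly 0.5 with
    4-bit quantization.
    """
    return n_params * bytes_per_param / 1024**3

# A ~2.5B-parameter SLM vs. a 70B-parameter LLM, both in fp16.
print(f"2.5B model: {weight_memory_gib(2.5e9):.1f} GiB")  # laptop-friendly
print(f"70B model:  {weight_memory_gib(70e9):.1f} GiB")   # server-class GPUs
```

The small model fits comfortably in the RAM of a consumer laptop or high-end phone, while the large one requires multiple datacenter GPUs, which is precisely why on-device deployment favors SLMs.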

SLMs are also less prone to undetected hallucinations within their specific domain compared to LLMs. SLMs are typically trained on a narrower and more targeted dataset that is specific to their intended domain or application, which helps the model learn the patterns, vocabulary and information that are most relevant to its task. This focus reduces the likelihood of generating irrelevant, unexpected or inconsistent outputs. With fewer parameters and a more streamlined architecture, SLMs are less prone to capturing and amplifying noise or errors in the training data.

Clem Delangue, CEO of the AI startup HuggingFace, suggested that up to 99% of use cases could be addressed using SLMs, and predicted 2024 will be the year of the SLM. HuggingFace, whose platform enables developers to build, train and deploy machine learning models, announced a strategic partnership with Google earlier this year. The companies have subsequently integrated HuggingFace into Google’s Vertex AI, allowing developers to quickly deploy thousands of models through the Google Vertex Model Garden.

After initially ceding its advantage in LLMs to OpenAI, Google is aggressively pursuing the SLM opportunity. Back in February, Google introduced Gemma, a new series of small language models designed to be more efficient and user-friendly. Like other SLMs, Gemma models can run on various everyday devices, like smartphones, tablets or laptops, without needing special hardware or extensive optimization.

Since its release, Gemma has seen more than 400,000 downloads on HuggingFace in the last month, and a few exciting projects are already emerging. For example, Cerule is a powerful image and language model that combines Gemma 2B with Google’s SigLIP, trained on a massive dataset of images and text. Cerule leverages highly efficient data selection techniques, which suggests it can achieve high performance without requiring an extensive amount of data or computation. This makes Cerule potentially well-suited for emerging edge computing use cases.

Another example is CodeGemma, a specialized version of Gemma focused on coding and mathematical reasoning. CodeGemma offers three different models tailored for various coding-related activities, making advanced coding tools more accessible and efficient for developers.
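To illustrate the kind of workflow code-focused SLMs target: such models commonly support fill-in-the-middle (FIM) prompting, where the model completes code between a given prefix and suffix. Below is a minimal sketch of assembling such a prompt using the FIM control tokens published for CodeGemma; treat the exact token strings as an assumption to verify against the official model card before use.

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt.

    The token strings follow CodeGemma's documented FIM format;
    verify against the model card before relying on them.
    """
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

# The model would be asked to generate the function body between
# the prefix and the suffix.
prompt = build_fim_prompt("def add(a, b):\n    return ", "\n")
print(prompt)
```

The model's completion is then inserted at the `<|fim_middle|>` position, which is how editor plugins typically drive code-completion models.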

The transformative potential of small language models

As the AI community continues to explore the potential of small language models, the advantages of faster development cycles, improved efficiency, and the ability to tailor models to specific needs become increasingly apparent. SLMs are poised to democratize AI access and drive innovation across industries by enabling cost-effective and targeted solutions. The deployment of SLMs at the edge opens up new possibilities for real-time, personalized, and secure applications in various sectors, such as finance, entertainment, automotive systems, education, e-commerce and healthcare.

By processing data locally and reducing reliance on cloud infrastructure, edge computing with SLMs enables faster response times, improved data privacy, and enhanced user experiences. This decentralized approach to AI has the potential to transform the way businesses and consumers interact with technology, creating more personalized and intuitive experiences in the real world. As LLMs face challenges related to computational resources and potentially hit performance plateaus, the rise of SLMs promises to keep the AI ecosystem evolving at an impressive pace.

Source: https://venturebeat.com/ai/why-small-language-models-are-the-next-big-thing-in-ai/




Imrul Hasan Tusher
Senior Administrative Officer
Office of the Chairman, BoT
01847334718, Ext: 339
cmoffice2@daffodilvarsity.edu.bd
Daffodil International University