Artificial intelligence is the biggest buzzword of 2023, with Google and Microsoft showing off their product lines, plans, and big visions for leveraging AI. Amid all the turmoil surrounding AI, Apple has been noticeably quiet, or at least slow, when it comes to showcasing its AI prowess. Perhaps this is why many are wondering what Apple is doing to compete in the AI arms race. The answer is simple: Apple has been working on AI in various capacities for years. Users just haven't been able to integrate something like ChatGPT into their iPhones.
But things are about to change. In a new research paper, Apple has demonstrated a breakthrough technique that could help run AI on the iPhone by streamlining bulky LLMs with flash storage optimization. It would be a big deal if Apple used it to integrate advanced AI into the iPhone. The Cupertino-based tech giant announced significant developments in AI through two new research papers published this month, which reveal new techniques for 3D avatars and efficient language model inference.
The new study, "LLM in a Flash: Efficient Large Language Model Inference with Limited Memory," published on December 12, has the potential to transform the iPhone and iPad experience: together, the two papers could provide a more immersive visual experience and allow users to access complex AI systems on their devices. The research paper primarily focuses on efficiently running large language models on devices with limited DRAM capacity. DRAM, or dynamic random access memory, is the main working memory used in PCs and mobile devices, known for its high speed, high density, affordability, and low power consumption.
Here are some takeaways from the research that put Apple ahead of its peers.
The paper addresses the challenge of running LLMs that exceed the practically available DRAM by storing model parameters in flash memory and bringing them into DRAM on demand. It describes an inference cost model, developed to optimize data transfers from flash, that accounts for the characteristics of both flash and DRAM.
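The article does not reproduce the paper's exact cost formula, but the intuition lends itself to a back-of-the-envelope model. Below is a minimal sketch in Python (the throughput, latency, and chunk-size numbers are assumptions for illustration, not measurements from the paper) showing why larger sequential reads amortize flash access latency:

```python
# Hypothetical flash-to-DRAM transfer cost model: each read pays a fixed
# latency, so larger sequential chunks amortize that cost, while total time
# also scales with the number of bytes moved.

def transfer_cost_seconds(bytes_to_load: int,
                          chunk_bytes: int = 32 * 1024,      # assumed read size
                          per_read_latency_s: float = 1e-4,  # assumed per-read cost
                          flash_throughput_bps: float = 1e9  # assumed ~1 GB/s
                          ) -> float:
    """Estimate time to page `bytes_to_load` of weights from flash into DRAM."""
    num_reads = -(-bytes_to_load // chunk_bytes)  # ceiling division
    return num_reads * per_read_latency_s + bytes_to_load / flash_throughput_bps


# Example: paging in 100 MB of weights with small vs. large chunks.
small = transfer_cost_seconds(100 * 2**20, chunk_bytes=32 * 1024)
large = transfer_cost_seconds(100 * 2**20, chunk_bytes=2**20)
print(f"32 KB chunks: {small:.3f}s, 1 MB chunks: {large:.3f}s")
```

Under these assumed numbers, the same 100 MB load drops from roughly 0.42 seconds to about 0.11 seconds simply by reading in larger chunks, which is the kind of trade-off such a cost model is built to optimize.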
The techniques described in the paper include windowing, which reduces data transfer by reusing previously activated neurons, and row-column bundling, which increases the size of data chunks so that flash memory can be read more efficiently.
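To make the windowing idea concrete, here is a minimal sketch in Python; the class name and structure are invented for clarity and are not from the paper. It tracks which neurons were active over the last few tokens so that only newly needed neurons have to be fetched from flash:

```python
from collections import deque

class NeuronWindowCache:
    """Sliding-window cache of FFN neurons kept resident in DRAM.

    Illustration of "windowing": neurons activated for any of the last
    `window_size` tokens stay in DRAM; only neurons newly active for the
    current token need to be read from flash.
    """

    def __init__(self, window_size: int = 5):
        self.window: deque[set[int]] = deque(maxlen=window_size)
        self.resident: set[int] = set()  # neuron ids currently in DRAM

    def step(self, active: set[int]) -> set[int]:
        """Process one token; return neuron ids that must be fetched from flash."""
        to_load = active - self.resident
        self.window.append(active)  # the oldest token's set falls out automatically
        self.resident = set().union(*self.window)  # evict neurons outside the window
        return to_load


# Consecutive tokens tend to activate overlapping neurons, so the incremental
# flash traffic shrinks after the first token.
cache = NeuronWindowCache(window_size=3)
print(len(cache.step({1, 2, 3, 4})))  # 4 -- everything is a cold miss
print(len(cache.step({2, 3, 4, 5})))  # 1 -- only neuron 5 is new
```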
The paper also focuses on exploiting sparsity in feed-forward network (FFN) layers, selectively loading only the parameters that are needed in order to increase efficiency. Another important aspect is memory management: the authors propose strategies to efficiently manage the data loaded into DRAM and minimize overhead.
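A rough sketch of what sparsity-aware selective loading looks like for a ReLU feed-forward layer is shown below (NumPy, with an invented function name; in the real system the selected rows and columns would be paged in from flash rather than sliced from an in-memory array):

```python
import numpy as np

def sparse_ffn(x, up_proj, down_proj, active_idx):
    """Compute down_proj @ relu(up_proj @ x) using only predicted-active neurons.

    up_proj:    (d_ffn, d_model) -- row i produces neuron i's pre-activation
    down_proj:  (d_model, d_ffn) -- column i consumes neuron i's activation
    active_idx: neuron indices predicted to survive the ReLU for this token

    Inactive neurons contribute exactly zero after ReLU, so the result matches
    the dense computation whenever the prediction is correct. Storing row i of
    up_proj next to column i of down_proj on flash (the bundling idea) lets
    both be fetched in a single read.
    """
    up_rows = up_proj[active_idx]             # paged in from flash in practice
    down_cols = down_proj[:, active_idx]      # bundled with the matching rows
    hidden = np.maximum(up_rows @ x, 0.0)     # ReLU over the active subset only
    return down_cols @ hidden


# Tiny demo with random weights and an oracle predictor of active neurons.
rng = np.random.default_rng(0)
d_model, d_ffn = 8, 32
x = rng.normal(size=d_model)
up = rng.normal(size=(d_ffn, d_model))
down = rng.normal(size=(d_model, d_ffn))
dense = down @ np.maximum(up @ x, 0.0)
active = np.flatnonzero(up @ x > 0)           # stand-in for a learned predictor
assert np.allclose(sparse_ffn(x, up, down, active), dense)
```

In practice the active set would come from a small learned predictor rather than an oracle, trading a little prediction overhead for loading only a fraction of each layer's weights.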
The researchers demonstrated their approach using models such as OPT 6.7B and Falcon 7B. According to the paper, the results showed speedups of 4-5x on CPU and 20-25x on GPU compared to naive loading approaches.
In terms of practical application, both models demonstrated significant improvements in resource-limited environments.
This new research from Apple presents an innovative approach to running LLMs efficiently in hardware-constrained environments, opening new directions for future research in on-device AI and next-generation user experiences.
What does that mean for iPhone users?
From a user perspective, these findings on efficient LLM inference with limited memory could greatly benefit both Apple and iPhone users. Powerful LLMs running efficiently on devices with limited DRAM, such as iPhones and iPads, would let users experience enhanced AI capabilities directly on their devices. These could include improved language processing, a more sophisticated voice assistant, stronger privacy, reduced internet bandwidth usage and, most importantly, advanced AI that every iPhone user can access and interact with.
Whatever Apple's research ultimately says about its push to lead in AI research and applications, experts appear to be on alert. Some suggest that tech giants must take great care and responsibility when incorporating research results into real-world use cases, emphasizing the need to consider privacy protection, ways to mitigate potential misuse, and the overall impact.
© IE Online Media Services Pvt Ltd
Date first published: December 22, 2023 16:01 IST