{"id":89061,"date":"2026-04-30T08:00:25","date_gmt":"2026-04-30T15:00:25","guid":{"rendered":"https:\/\/phisonblog.com\/?p=89061"},"modified":"2026-04-30T17:30:04","modified_gmt":"2026-05-01T00:30:04","slug":"agentic-ai-is-becoming-practical-local-systems-still-need-more-memory","status":"publish","type":"post","link":"https:\/\/phisonblog.com\/zh-tw\/agentic-ai-is-becoming-practical-local-systems-still-need-more-memory\/","title":{"rendered":"Agentic AI Is Becoming Practical: Local Systems Still Need More Memory"},"content":{"rendered":"<p>[et_pb_section fb_built=&#8221;1&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; custom_margin=&#8221;0px||||false|false&#8221; custom_padding=&#8221;0px||||false|false&#8221; locked=&#8221;off&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_row _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; width=&#8221;100%&#8221; max_width=&#8221;100%&#8221; custom_margin=&#8221;||||false|false&#8221; custom_padding=&#8221;0px||||false|false&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_text _builder_version=&#8221;4.27.6&#8243; _module_preset=&#8221;default&#8221; header_2_line_height=&#8221;1.7em&#8243; header_3_line_height=&#8221;1.7em&#8243; custom_margin=&#8221;||-10px||false|false&#8221; custom_padding=&#8221;||0px||false|false&#8221; locked=&#8221;off&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<blockquote>\n<p>Agentic AI workloads demand more memory than traditional AI, especially when running locally. As models grow and agents maintain long-running state, memory becomes the primary bottleneck. 
This article explains how <a href=\"https:\/\/www.phisonenterprise.com\/pascari-aidaptiv\/\" target=\"_blank\" rel=\"noopener\">aiDAPTIV<\/a> extends effective AI memory to enable larger, more capable models to run reliably on practical systems.<\/p>\n<\/blockquote>\n<p>&nbsp;<\/p>\n<p>Persistent, tool-using AI agents are moving into real workflows. AMD has coined a new device category for this moment, the \u201cAgent Computer.\u201d NVIDIA announced NemoClaw, an open-source security and privacy layer built on top of OpenClaw that adds policy-based guardrails for enterprise deployments. Everyone is talking about what agents can do. Far less attention is going to what it takes to run capable agentic AI well on local systems.<\/p>\n<p>That matters because agentic workloads raise the bar. They do more than answer a single prompt. They plan, use tools, keep state over time, and work across multiple steps. For that kind of work, model quality matters more, which often pushes developers toward larger, more capable models.<\/p>\n<div class=\"banner_wrapper\" style=\"height: 83px;\"><div class=\"banner  banner-88870 bottom vert custom-banners-theme-default_style\" style=\"\"><img decoding=\"async\" width=\"1080\" height=\"150\" src=\"https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/The-AI-Memory-Wall-Why-AI-PCs-Cant-Keep-Up-banner.jpg\" class=\"attachment-full size-full\" alt=\"\" style=\"height: 83px;\" srcset=\"https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/The-AI-Memory-Wall-Why-AI-PCs-Cant-Keep-Up-banner.jpg 1080w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/The-AI-Memory-Wall-Why-AI-PCs-Cant-Keep-Up-banner-980x136.jpg 980w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/The-AI-Memory-Wall-Why-AI-PCs-Cant-Keep-Up-banner-480x67.jpg 480w\" sizes=\"(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) and (max-width: 980px) 980px, (min-width: 981px) 1080px, 100vw\" \/><a class=\"custom_banners_big_link\" 
href=\"https:\/\/phisonblog.com\/phison-rescales-local-ai-inferencing-with-flash-memory-expansion\/?utm_source=chatgpt.com\"><\/a><div class=\"banner_caption\" style=\"\"><div class=\"banner_caption_inner\"><div class=\"banner_caption_text\" style=\"\">Read: Phison Rescales Local AI Inferencing with Flash Memory Expansion<\/div><\/div><\/div><\/div><\/div>\n<p>&nbsp;<\/p>\n<h3>Why local agentic AI matters<\/h3>\n<p>The case for running <a href=\"https:\/\/phisonblog.com\/category\/ai\/?utm_source=chatgpt.com\">AI<\/a> agents locally is straightforward. Local deployment keeps sensitive data on-device, avoids cloud inference costs that scale with usage, reduces latency for interactive workloads, and gives developers more control over the model and its behavior. For enterprises handling proprietary data, and for OEMs building always-on agent experiences, <a href=\"https:\/\/phisonblog.com\/phison-rescales-local-ai-inferencing-with-flash-memory-expansion\/?utm_source=chatgpt.com\">local inference<\/a> is often a requirement, not just a preference.<\/p>\n<p>The problem is that many developers still run capable agentic AI in the cloud for a simple reason: stronger models are easier to run there.<\/p>\n<p>For many local systems, the bottleneck is not compute alone. It is memory. GPU VRAM is limited. <a href=\"https:\/\/phisonblog.com\/dram-or-not-the-difference-between-dram-and-dram-less-ssds-and-why-it-matters\/\">System DRAM<\/a> or unified memory is limited. When memory runs short, local deployments often fall back to smaller models, tighter limits, or more aggressive quantization.<\/p>\n<p>Those tradeoffs may help a model fit, but they can also reduce reliability in multi-step reasoning, tool use, and longer-running tasks. 
In other words, local agentic AI often settles for smaller models not because smaller models are ideal, but because memory limits force the compromise.<\/p>\n<p>&nbsp;<\/p>\n<h3>What makes agentic workloads harder<\/h3>\n<p>An AI agent is more than a chatbot. Instead of answering one prompt and stopping, it can keep state over time, use tools, check external systems, and take actions on your behalf.<\/p>\n<p>That changes the memory problem.<\/p>\n<p>Most chatbot interactions are relatively short-lived compared with agent workflows. Agents are different. They maintain persistent session state, manage long context windows, monitor changing data sources over time, and orchestrate multiple tools simultaneously. That means they often need to keep more state available for longer.<\/p>\n<p>For developers building more complex workflows, it may not be just one agent. A coding agent tied to a repo, a research agent monitoring data sources, and a writing agent managing long-form context each bring their own session state and tools, multiplying memory demand quickly.<\/p>\n<p>&nbsp;<\/p>\n<h3>Local hardware hits a memory wall<\/h3>\n<p>A large local model at 4-bit quantization can require dozens of gigabytes just for weights. A high-end consumer GPU may have 24 GB of VRAM. The math gets difficult before you account for KV cache, runtime overhead, or agent state.<\/p>\n<p>That is why local agentic AI often gets pushed toward smaller models. The issue is not just whether a model can launch. It is whether a more capable model can run reliably enough to support persistent, tool-using, multi-step workloads on practical client hardware.<\/p>\n<p>Without a solution, developers and OEMs face hard tradeoffs: truncate context and lose coherence, accept latency spikes when data spills into slower tiers, or require expensive high-memory GPUs that price many users out of the market. 
None of these are ideal if agent PCs are supposed to reach mainstream adoption.<\/p>\n<p>&nbsp;<\/p>\n<div class=\"banner_wrapper\" style=\"height: 83px;\"><div class=\"banner  banner-88912 bottom vert custom-banners-theme-default_style\" style=\"\"><img decoding=\"async\" width=\"1085\" height=\"150\" src=\"https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/Pascari-Adaptiv-Banner-e1775768160620.png\" class=\"attachment-full size-full\" alt=\"\" style=\"height: 83px;\" srcset=\"https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/Pascari-Adaptiv-Banner-e1775768160620-980x150.png 980w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/Pascari-Adaptiv-Banner-e1775768160620-480x150.png 480w\" sizes=\"(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) and (max-width: 980px) 980px, (min-width: 981px) 1085px, 100vw\" \/><a class=\"custom_banners_big_link\" href=\"https:\/\/www.phisonenterprise.com\/pascari-aidaptiv\" target=\"_blank\" rel=\"noopener\"><\/a><div class=\"banner_caption\" style=\"\"><div class=\"banner_caption_inner\"><div class=\"banner_caption_text\" style=\"\">Accelerate Your AI Deployment with Phison's Pascari aiDAPTIV <\/div><\/div><\/div><\/div><\/div>\n<p>&nbsp;<\/p>\n<h3>How Phison\u2019s Pascari aiDAPTIV\u2122 extends AI memory<\/h3>\n<p>aiDAPTIV addresses these challenges by extending effective AI memory across GPU memory, system DRAM, and <a href=\"https:\/\/www.phison.com\/en\/technology\" target=\"_blank\" rel=\"noopener\">flash memory<\/a>, creating a memory hierarchy that helps larger models run on more practical systems without requiring developers to manage each tier manually.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-89093 size-full\" src=\"https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-Phisons-Pascari-aiDAPTIV\u2122-extends-AI-memory-1-1-scaled.png\" alt=\"\" width=\"2880\" height=\"1620\" 
srcset=\"https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-Phisons-Pascari-aiDAPTIV\u2122-extends-AI-memory-1-1-scaled.png 2880w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-Phisons-Pascari-aiDAPTIV\u2122-extends-AI-memory-1-1-300x169.png 300w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-Phisons-Pascari-aiDAPTIV\u2122-extends-AI-memory-1-1-1024x576.png 1024w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-Phisons-Pascari-aiDAPTIV\u2122-extends-AI-memory-1-1-768x432.png 768w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-Phisons-Pascari-aiDAPTIV\u2122-extends-AI-memory-1-1-1536x864.png 1536w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-Phisons-Pascari-aiDAPTIV\u2122-extends-AI-memory-1-1-2048x1152.png 2048w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-Phisons-Pascari-aiDAPTIV\u2122-extends-AI-memory-1-1-18x10.png 18w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-Phisons-Pascari-aiDAPTIV\u2122-extends-AI-memory-1-1-610x343.png 610w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-Phisons-Pascari-aiDAPTIV\u2122-extends-AI-memory-1-1-1080x608.png 1080w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-Phisons-Pascari-aiDAPTIV\u2122-extends-AI-memory-1-1-1280x720.png 1280w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-Phisons-Pascari-aiDAPTIV\u2122-extends-AI-memory-1-1-980x551.png 980w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-Phisons-Pascari-aiDAPTIV\u2122-extends-AI-memory-1-1-480x270.png 480w\" sizes=\"(max-width: 2880px) 100vw, 2880px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Hot data stays in VRAM. Warm data stays in DRAM, ready for fast reuse. Colder data can tier to an aiDAPTIV Cache Memory SSD instead of being discarded outright. 
This does not eliminate latency tradeoffs, but it can make larger and longer-running agent workloads practical on systems that would otherwise hit <a href=\"https:\/\/phisonblog.com\/the-ai-memory-wall-why-ai-pcs-cant-keep-up\/?utm_source=chatgpt.com\">memory<\/a> limits much sooner.<\/p>\n<p>&nbsp;<\/p>\n<h3>How aiDAPTIV runs larger MoE models locally<\/h3>\n<p>In a <a href=\"https:\/\/www.ibm.com\/think\/topics\/mixture-of-experts\" target=\"_blank\" rel=\"noopener\">Mixture of Experts model<\/a>, only a subset of experts is active for each token. aiDAPTIV helps keep the active or recently used experts closer to compute, while less-active experts can tier to lower-cost memory.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-89095 size-full\" src=\"https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-aiDAPTIV-runs-larger-MoE-models-locally-scaled.png\" alt=\"\" width=\"2880\" height=\"1620\" srcset=\"https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-aiDAPTIV-runs-larger-MoE-models-locally-scaled.png 2880w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-aiDAPTIV-runs-larger-MoE-models-locally-1280x720.png 1280w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-aiDAPTIV-runs-larger-MoE-models-locally-980x551.png 980w, https:\/\/phisonblog.com\/wp-content\/uploads\/2026\/04\/How-aiDAPTIV-runs-larger-MoE-models-locally-480x270.png 480w\" sizes=\"(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) and (max-width: 980px) 980px, (min-width: 981px) and (max-width: 1280px) 1280px, (min-width: 1281px) 2880px, 100vw\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>The router selects the experts needed for the current token. Active experts stay in GPU memory for immediate execution. Recently used experts can remain in system DRAM for fast reuse. Less-active experts can tier to aiDAPTIV Cache Memory instead of forcing the model down to a smaller configuration. 
When one of those experts is needed again, aiDAPTIV helps bring it back into a faster tier.<\/p>\n<p>aiDAPTIV\u2019s dynamic MoE offloading integrates with llama.cpp, making this capability available through a standard inference API endpoint.<\/p>\n<h3>\u00a0<\/h3>\n<h3>GTC 2026: Running a 120B model on a laptop<\/h3>\n<p><a href=\"https:\/\/phisonblog.com\/phison-rescales-local-ai-inferencing-with-flash-memory-expansion\/\">At GTC 2026<\/a>, Phison demonstrated a simple OpenClaw app on an Acer laptop with an NVIDIA\u00ae GeForce RTX\u2122 5090 GPU, 24 GB of VRAM, and 64 GB of system DRAM. With <a href=\"https:\/\/www.phisonenterprise.com\/wp-content\/uploads\/2026\/03\/aiDAPTIVDatasheet_86af5w2g6.pdf\" target=\"_blank\" rel=\"noopener\">aiDAPTIV<\/a>, the system ran gpt-oss-120B locally, using MoE expert offloading to extend effective memory, at approximately 15 tokens per second. According to OpenAI, gpt-oss-120B requires a single 80 GB GPU to run natively, more than three times the VRAM available on the demo system. That is the tradeoff aiDAPTIV changes. Instead of forcing local agentic AI to settle for a smaller model, it helps larger, more capable models run on practical client hardware.<\/p>\n<p>Note: OpenAI\u2019s model card lists 80 GB for single-GPU deployment. Demo throughput was approximately 15 tok\/s, or approximately 5\u20136 tok\/s with KV cache reuse enabled.<\/p>\n<h3>\u00a0<\/h3>\n<h3>What this means for device makers<\/h3>\n<p>OEMs building agent PCs today face a difficult choice: spec a GPU with enough memory to handle larger agent workloads, which raises cost and limits the addressable market, or ship with constrained memory and accept reduced capability. Neither is an ideal product story.<\/p>\n<p>aiDAPTIV changes that equation. 
A mid-range or memory-constrained client system paired with fast DRAM and aiDAPTIV Cache Memory can extend effective memory, making larger agent workloads practical without requiring a far more expensive high-memory GPU.<\/p>\n<p>That matters especially in notebooks and other lower-memory client systems. As integrated GPUs and client GPUs become more capable, memory remains one of the main barriers to running serious agentic workloads locally. Intelligent memory tiering helps make longer-running, more capable agents practical on thinner, lower-cost systems that would otherwise hit those limits much sooner.<\/p>\n<h3>\u00a0<\/h3>\n<h3>The gap aiDAPTIV fills<\/h3>\n<p>The shift to agentic AI exposes a practical problem underneath the excitement: stronger local agents need stronger models, and stronger models need more memory. A memory-elasticity layer that stretches effective capacity beyond fixed VRAM and DRAM remains a real gap in the local AI stack. That is the problem <a href=\"https:\/\/www.phisonenterprise.com\/pascari-aidaptiv\/\" target=\"_blank\" rel=\"noopener\">aiDAPTIV<\/a> is built to solve.<\/p>\n<h3>\u00a0<\/h3>\n<h3>Work with Phison<\/h3>\n<p>AI memory is often the bottleneck for local agentic AI. 
<a href=\"https:\/\/phisonblog.com\/contact-2\/\">Contact us<\/a> to find out how aiDAPTIV can help.<\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; width=&#8221;100%&#8221; max_width=&#8221;100%&#8221; custom_margin=&#8221;||||false|false&#8221; custom_padding=&#8221;0px||||false|false&#8221; saved_tabs=&#8221;all&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_text _builder_version=&#8221;4.27.6&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<h3><strong>Frequently Asked Questions (FAQ):<\/strong><\/h3>\n<p>[\/et_pb_text][et_pb_toggle title=&#8221;What is agentic AI and how is it different from traditional AI models?&#8221; _builder_version=&#8221;4.27.6&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>Agentic AI refers to systems that go beyond single-response interactions. These agents plan tasks, use external tools, maintain memory across sessions, and execute multi-step workflows. Unlike traditional chatbots, they require persistent state and longer context handling, increasing both compute and memory demands.<\/p>\n<p>[\/et_pb_toggle][et_pb_toggle title=&#8221;Why is memory a critical bottleneck for local AI deployment?&#8221; _builder_version=&#8221;4.27.6&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>Local systems are constrained by GPU VRAM and system DRAM capacity. Advanced models require tens of gigabytes for weights alone, not including runtime overhead like KV cache and agent state. When memory is insufficient, developers must reduce model size or performance, limiting real-world usability.<\/p>\n<p>[\/et_pb_toggle][et_pb_toggle title=&#8221;Why do enterprises prefer running AI locally instead of in the cloud?&#8221; _builder_version=&#8221;4.27.6&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>Local AI deployment ensures data sovereignty, reduces inference latency, eliminates recurring cloud costs, and provides full control over model behavior. For enterprise IT environments handling proprietary data, local inference is often a compliance and security requirement.<\/p>\n<p>[\/et_pb_toggle][et_pb_toggle title=&#8221;What challenges do agentic workloads introduce compared to standard inference?&#8221; _builder_version=&#8221;4.27.6&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>Agentic workloads require continuous memory allocation for session state, tool orchestration, and long context retention. Multiple concurrent agents can amplify memory demand, making traditional hardware configurations insufficient without optimization.<\/p>\n<p>[\/et_pb_toggle][et_pb_toggle title=&#8221;Why are larger models necessary for agentic AI?&#8221; _builder_version=&#8221;4.27.6&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>Larger models generally deliver better reasoning, planning, and tool-use capabilities. These attributes are essential for reliable multi-step workflows. However, their memory requirements make them difficult to deploy on standard client hardware.<\/p>\n<p>[\/et_pb_toggle][et_pb_toggle title=&#8221;How does Phison aiDAPTIV\u2122 improve local AI performance?&#8221; _builder_version=&#8221;4.27.6&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>aiDAPTIV introduces a hierarchical memory architecture that dynamically distributes workloads across GPU VRAM, system DRAM, and SSD-based cache. This approach expands effective memory capacity without requiring manual data management, enabling larger models to run efficiently on constrained systems.<\/p>\n<p>[\/et_pb_toggle][et_pb_toggle title=&#8221;What role does aiDAPTIV play in Mixture of Experts (MoE) models?&#8221; _builder_version=&#8221;4.27.6&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>aiDAPTIV optimizes MoE execution by keeping active experts in VRAM, recently used experts in DRAM, and less-active experts in SSD cache. This dynamic tiering ensures that only necessary components remain in high-speed memory, improving efficiency without reducing model size.<\/p>\n<p>[\/et_pb_toggle][et_pb_toggle title=&#8221;How does aiDAPTIV enable large model deployment on consumer hardware?&#8221; _builder_version=&#8221;4.27.6&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>By offloading inactive model components to lower-cost memory tiers, aiDAPTIV allows systems with limited VRAM to run models that would traditionally require high-end GPUs. This significantly lowers hardware barriers for local AI deployment.<\/p>\n<p>[\/et_pb_toggle][et_pb_toggle title=&#8221;What was demonstrated at GTC 2026 using aiDAPTIV?&#8221; _builder_version=&#8221;4.27.6&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>Phison demonstrated a 120B parameter model running locally on a laptop with a 24 GB GPU. The system achieved approximately 15 tokens per second using aiDAPTIV\u2019s memory tiering and MoE offloading, proving that large-scale models can operate on practical hardware.<\/p>\n<p>[\/et_pb_toggle][et_pb_toggle title=&#8221;How does aiDAPTIV benefit OEMs and enterprise system builders?&#8221; _builder_version=&#8221;4.27.6&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<p>aiDAPTIV enables OEMs to design AI-capable systems without over-provisioning expensive GPU memory. It supports scalable, low-latency, and AI-ready architectures, making it possible to deliver enterprise-grade agentic AI performance in cost-efficient devices.<\/p>\n<p>[\/et_pb_toggle][\/et_pb_column][\/et_pb_row][\/et_pb_section]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Agentic AI workloads demand more memory than traditional AI, especially when running locally. As models grow and agents maintain long-running state, memory becomes the primary bottleneck. This article explains how aiDAPTIV extends effective AI memory to enable larger, more capable models to run reliably on practical systems. &nbsp; Persistent, tool-using AI agents are moving into [&hellip;]<\/p>\n","protected":false},"author":79,"featured_media":89115,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_et_pb_use_builder":"on","_et_pb_old_content":"","_et_gb_content_width":"","inline_featured_image":false,"footnotes":""},"categories":[120,23,116],"tags":[22],"class_list":["post-89061","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-all-posts","category-featured","tag-long-content"],"acf":[],"_links":{"self":[{"href":"https:\/\/phisonblog.com\/zh-tw\/wp-json\/wp\/v2\/posts\/89061","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/phisonblog.com\/zh-tw\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/phisonblog.com\/zh-tw\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/phisonblog.com\/zh-tw\/wp-json\/wp\/v2\/users\/79"}],"replies":[{"embeddable":true,"href":"https:\/\/phisonblog.com\/zh-tw\/wp-json\/wp\/v2\/comments?post=89061"}],"version-history":[{"count":14,"href":"https:\/\/phisonblog.com\/zh-tw\/wp-json\/wp\/v2\/posts\/89061\/revisions"}],"predecessor-version":[{"id":89116,"href":"https:\/\/phisonblog.com\/zh-tw\/wp-json\/wp\/v2\/posts\/89061\/revisions\/89116"}],"wp:featuredmedia
":[{"embeddable":true,"href":"https:\/\/phisonblog.com\/zh-tw\/wp-json\/wp\/v2\/media\/89115"}],"wp:attachment":[{"href":"https:\/\/phisonblog.com\/zh-tw\/wp-json\/wp\/v2\/media?parent=89061"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/phisonblog.com\/zh-tw\/wp-json\/wp\/v2\/categories?post=89061"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/phisonblog.com\/zh-tw\/wp-json\/wp\/v2\/tags?post=89061"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}