<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Popular AI]]></title><description><![CDATA[Practical AI for people who want capability without permission: local setups, troubleshooting, tool comparisons, and clear-eyed AI analysis.]]></description><link>https://www.popularai.org</link><image><url>https://substackcdn.com/image/fetch/$s_!ea4m!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png</url><title>Popular AI</title><link>https://www.popularai.org</link></image><generator>Substack</generator><lastBuildDate>Wed, 24 Jun 2026 02:42:39 GMT</lastBuildDate><atom:link href="https://www.popularai.org/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Popular Media]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[popularai@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[popularai@substack.com]]></itunes:email><itunes:name><![CDATA[Popular AI]]></itunes:name></itunes:owner><itunes:author><![CDATA[Popular AI]]></itunes:author><googleplay:owner><![CDATA[popularai@substack.com]]></googleplay:owner><googleplay:email><![CDATA[popularai@substack.com]]></googleplay:email><googleplay:author><![CDATA[Popular AI]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Science proves AI users are dumb]]></title><description><![CDATA[A viral AI literacy study is being used to mock AI fans as tech illiterates. 
But what the study really demonstrates is far stranger.]]></description><link>https://www.popularai.org/p/ai-literacy-study-ai-users-dumb</link><guid isPermaLink="false">https://www.popularai.org/p/ai-literacy-study-ai-users-dumb</guid><dc:creator><![CDATA[Ben Geudens]]></dc:creator><pubDate>Mon, 22 Jun 2026 21:12:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!JGSv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0ba9c0-3b4d-4025-b6ad-541fe185120a_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JGSv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0ba9c0-3b4d-4025-b6ad-541fe185120a_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JGSv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0ba9c0-3b4d-4025-b6ad-541fe185120a_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!JGSv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0ba9c0-3b4d-4025-b6ad-541fe185120a_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!JGSv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0ba9c0-3b4d-4025-b6ad-541fe185120a_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!JGSv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0ba9c0-3b4d-4025-b6ad-541fe185120a_1672x941.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!JGSv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0ba9c0-3b4d-4025-b6ad-541fe185120a_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b0ba9c0-3b4d-4025-b6ad-541fe185120a_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2109140,&quot;alt&quot;:&quot;Are AI users actually dumb? The study everyone is misreading&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/203154148?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0ba9c0-3b4d-4025-b6ad-541fe185120a_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Are AI users actually dumb? The study everyone is misreading" title="Are AI users actually dumb? 
The study everyone is misreading" srcset="https://substackcdn.com/image/fetch/$s_!JGSv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0ba9c0-3b4d-4025-b6ad-541fe185120a_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!JGSv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0ba9c0-3b4d-4025-b6ad-541fe185120a_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!JGSv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0ba9c0-3b4d-4025-b6ad-541fe185120a_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!JGSv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0ba9c0-3b4d-4025-b6ad-541fe185120a_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Does the AI literacy study prove AI users are dumb? The research points to magical thinking, not IQ, and serious AI use looks very different. &#169; Popular AI</figcaption></figure></div><p>A study titled <a href="https://journals.sagepub.com/doi/10.1177/00222429251314491">&#8220;Lower Artificial Intelligence Literacy Predicts Greater AI Receptivity&#8221;</a> is being passed around as if it proves a simple insult: people who like AI are dumb. But is that really what the paper proves?</p><p>The study shows that people who know less about AI can find it more magical, and that feeling can make them more receptive to AI. That may very well be true, but how reasonable is it to turn that into a claim about IQ, competence, or serious AI users?</p><h3>Key takeaways</h3><blockquote><p>The study found a link between lower AI literacy and higher AI receptivity, mediated by perceptions of AI as magical or awe-inspiring. 
It did not measure IQ.</p></blockquote><blockquote><p>The study&#8217;s AI literacy measure included practical knowledge questions about algorithms, privacy, bias, cloud storage, deepfakes, and machine learning. It does not track general intelligence.</p></blockquote><blockquote><p>A 2026 reanalysis argues that one usage-based part of the paper may be better read as broader adoption across non-text AI tools, rather than proof of lower literacy driving AI receptivity in general.</p></blockquote><blockquote><p>Other research shows that AI can improve productivity, but only when users understand where the tool helps and where it fails.</p></blockquote><blockquote><p>There is real frustration among users who are savvy enough to spot hallucinations, bias, refusals, and unusable outputs, but they may not yet know how to work around them.</p></blockquote><blockquote><p>Serious AI use goes beyond blind trust and requires adversarial collaboration with a flawed machine.</p></blockquote><h3>What the AI literacy study really says</h3><p>The paper, published in the Journal of Marketing, is called <a href="https://journals.sagepub.com/doi/10.1177/00222429251314491">&#8220;Lower Artificial Intelligence Literacy Predicts Greater AI Receptivity&#8221;</a>. The authors are Stephanie M. Tully, Chiara Longoni, and Gil Appel.</p><p>The abstract says the authors found that people with lower AI literacy were &#8220;typically more receptive to AI.&#8221; The authors do not associate this relationship with IQ or general intelligence. In other words, &#8220;people with lower AI literacy&#8221; could technically include geniuses who live under a technological rock.</p><p>Still, the paper reaches some interesting conclusions. People with lower AI literacy were more likely to perceive AI as &#8220;magical&#8221; when it performed tasks that seemed human, and that sense of magic increased receptivity.</p><p>That is a useful finding. It helps explain why some people overtrust AI systems. 
A chatbot that writes clean prose, summarizes a PDF, generates an image, or imitates a voice can feel like sorcery to someone who has no mental model of token prediction, training data, reinforcement learning, image diffusion, sampling, retrieval, or synthetic media.</p><p>But the paper does not show that AI adopters have low IQ. It does not show that people who use AI heavily are dumb. It does not show that skeptical non-users are smarter. It does not even account for the possibility that one AI use case may follow entirely different patterns from another.</p><p>In short, it shows a relationship between AI literacy and receptivity in the broad but limited conditions the authors studied. &#8220;AI literacy&#8221; is a 17-item measure of specific, practical AI-related competencies, not a proxy for IQ.</p><h3>The reanalysis makes the simple story weaker</h3><p>There is also a newer wrinkle. A June 2026 arXiv preprint titled <a href="https://arxiv.org/abs/2606.13734">&#8220;AI Receptivity or AI Adoption Breadth?&#8221;</a> reanalyzed data from Study 3 of the original paper.</p><p>This reanalysis says it reproduced the negative association between AI literacy and aggregate AI usage. But when it separated tool categories, the author found that AI literacy did not significantly predict text AI usage in the main demographic-adjusted specification, while it remained a strong predictor of non-text AI adoption.</p><p>In other words, it suggests the original aggregate usage finding may have captured broader adoption across non-text tools more than text-AI receptivity.</p><p>That reinforces an obvious observation: &#8220;AI receptivity&#8221; is not one thing. ChatGPT for writing, Midjourney-style image generation, AI voice cloning, recommendation systems, coding assistants, automated customer support, and medical triage tools do not occupy the same mental category for users.</p><p>A person can be skeptical of AI in medicine and enthusiastic about AI image upscaling. 
He can distrust a chatbot&#8217;s politics and use it all day for code review. He can refuse cloud tools for private work and still run local models on his own hardware.</p><h3>A proposed alternative: AI horseshoe theory</h3><p>It is probably more accurate to think of AI receptivity as an intellectual analogue to political horseshoe theory.</p><p>At the left end is the low-literacy enthusiast. He is impressed because AI feels magical. He may paste in a task, receive a fluent answer, and mistake a confidently presented collection of hallucination-slop and hard-coded biases for absolute truth. He cannot tell when the model is hallucinating, when a source is missing, when wording is manipulative, or when the tool is simply reflecting defaults learned from its training and alignment pipeline.</p><p>In the middle is the literate but frustrated user. This person has used AI enough to see its flaws. He knows the output is often incomplete. He sees factual mistakes. He notices political or corporate bias. He runs into refusals. He asks for direct answers and gets padded disclaimers. He asks for a sharp edit and sees his brilliant insights and arguments mutilated into soulless PR-sounding corporate sludge. He asks for research and gets plausible citations that do not survive a source check.</p><div><hr></div><h4><em><strong>More on the menace of normie AI users:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;b20fa161-e526-42f6-97ba-39cf9b6d7b62&quot;,&quot;caption&quot;:&quot;The biggest risk to mass-use AI chatbots is not that they stop getting smarter. 
It is that their default behavior becomes optimized for the easiest user to please, the least risky answer to publish, and the&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Will the average user make AI worse for power users?&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-18T07:59:10.714Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!zRa2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d4b6ba-af0d-413e-b32a-8396235da50b_1672x941.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/average-users-dumb-down-ai-chatbots&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:198226636,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular 
AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><p>This user is not wrong to be irritated. Cloud AI is genuinely biased, filtered, inconsistent, gatekept, and aligned with rules and guidelines outside of the user&#8217;s control. There are absolutely legitimate reasons to be frustrated.</p><p>The problem is that the middle user may stop there. He can diagnose these failures, but he cannot come up with fixes for what goes wrong. So the tool becomes a personal insult. &#8220;The AI is lying to me.&#8221; &#8220;The AI is stupid.&#8221; &#8220;The AI refuses to do what I want.&#8221; Sometimes that is functionally, observably true. Yet that does not make those observations useful or productive to dwell on.</p><p>At the far end is the competent power user. He sees these exact same failures and treats them as normal operating conditions.</p><p>He does not expect the first prompt to work. He does not treat one answer as settled truth. He does not ask a cloud model to be neutral, brave, complete, and uncensored, expecting to override billions of training and conditioning parameters. He assumes the model has blind spots, policy limits, missing context, weak priors, and a strong tendency to produce whatever sounds most acceptable. Then he works around it.</p><p>That is the AI horseshoe theory I am proposing here: both the AI-illiterate novice and the AI power user may be receptive to AI, but for opposite reasons. The novice is receptive because he worships it as infallible magic. 
The power user is receptive because he sees real leverage when it comes to productive output, regardless of the tool&#8217;s shortcomings.</p><p>The average user in the middle is stuck in a reality where AI has stopped feeling awe-inspiring and still is not helping him get the results he wants.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!K8Wm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90368f82-d9c3-4734-9d61-4991c4011a1c_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!K8Wm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90368f82-d9c3-4734-9d61-4991c4011a1c_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!K8Wm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90368f82-d9c3-4734-9d61-4991c4011a1c_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!K8Wm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90368f82-d9c3-4734-9d61-4991c4011a1c_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!K8Wm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90368f82-d9c3-4734-9d61-4991c4011a1c_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!K8Wm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90368f82-d9c3-4734-9d61-4991c4011a1c_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/90368f82-d9c3-4734-9d61-4991c4011a1c_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1918718,&quot;alt&quot;:&quot;The AI literacy study does not prove what AI haters think&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/203154148?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90368f82-d9c3-4734-9d61-4991c4011a1c_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The AI literacy study does not prove what AI haters think" title="The AI literacy study does not prove what AI haters think" srcset="https://substackcdn.com/image/fetch/$s_!K8Wm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90368f82-d9c3-4734-9d61-4991c4011a1c_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!K8Wm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90368f82-d9c3-4734-9d61-4991c4011a1c_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!K8Wm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90368f82-d9c3-4734-9d61-4991c4011a1c_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!K8Wm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90368f82-d9c3-4734-9d61-4991c4011a1c_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Lower AI literacy may predict greater AI receptivity, but that does not mean power users are stupid. The real story is stranger. &#169; Popular AI</figcaption></figure></div><h3>The middle user&#8217;s frustration is backed by real research</h3><p>This middle zone is not imaginary. It matches older work on algorithm aversion and newer work on generative AI&#8217;s uneven usefulness.</p><p>Take, for example, the <a href="https://marketing.wharton.upenn.edu/wp-content/uploads/2016/10/Dietvorst-Simmons-Massey-2014.pdf">Dietvorst, Simmons, and Massey paper on algorithm aversion</a>. 
It found that people often avoid algorithms after seeing them make mistakes, even when the algorithms outperform human forecasters. That is consistent with many reactions to generative AI today. Once a user sees ChatGPT fabricate a source, Claude refuse a harmless task, or Gemini omit an obvious fact, he may dismiss the usefulness of generative AI tools as an entire category.</p><p>Sometimes, as we have discussed in an earlier article, he may be correct, and the tools in question <a href="https://www.popularai.org/p/ai-safety-makes-product-useless">may be genuinely useless</a>. Far more often, though, it is an overcorrection.</p><p>The <a href="https://www.hbs.edu/faculty/Pages/item.aspx?num=64700">HBS and BCG &#8220;jagged technological frontier&#8221; study</a> is even more relevant. In that experiment, 758 consultants were assigned to no AI access, GPT-4 access, or GPT-4 access with a prompt-engineering overview. For 18 tasks inside the AI capability frontier, AI users completed 12.2% more tasks and completed them 25.1% faster, with higher quality. But for a complex task outside the frontier, AI users were 19 percentage points less likely to produce correct solutions.</p><p>The takeaway here is that AI, in its current form, is not uniformly useful across tasks. It is extremely useful on some tasks and disastrous on others. The competent user&#8217;s edge is knowing which side of the frontier he is on.</p><h3>Productivity studies do not support blind trust either</h3><p>The best productivity research similarly does not say &#8220;AI makes everyone smarter.&#8221; It implies that AI can improve output under specific conditions.</p><p>In <a href="https://academic.oup.com/qje/article/140/2/889/7990658">&#8220;Generative AI at Work&#8221;</a>, Erik Brynjolfsson, Danielle Li, and Lindsey Raymond studied 5,172 customer-support agents and found that access to an AI assistant increased productivity by 15% on average. 
The gains were larger for less experienced and lower-skilled workers, while the most experienced and highest-skilled workers saw small speed gains and small declines in quality.</p><p>That finding can be read two ways.</p><p>One reading is that AI helps lower-skill users catch up. That is true in many structured workflows. If the job is customer support and the AI has seen thousands of good examples, weaker agents can copy better patterns faster.</p><p>Another reading is that skilled users need different AI tools. If a workflow is already optimized for average performance, the top worker may gain less from a generic assistant. For a high-end user, the value often comes from more customized use: research acceleration, adversarial critique, code generation with tests, style variation, source comparison, local document search, batch transformation, automation, image workflows, and tool chaining.</p><p>The <a href="https://www.science.org/doi/10.1126/science.adh2586">Noy and Zhang experiment in Science</a> found that ChatGPT reduced average completion time by 40% and improved output quality by 18% on writing tasks. That is an undeniable advantage, but it still does not mean the first draft is publishable. It means a user with judgment can move faster through the draft, revision, and quality-control loop.</p><p>The serious lesson is to learn exactly where AI creates surplus, then capture it without outsourcing your judgment.</p><h3>The power user does not ask AI to be an oracle</h3><p>My own rule is simple: never expect a usable output from the first prompt.</p><p>When using a hosted model, I expect missing context. I expect softened language. I expect hidden assumptions. I expect stale knowledge unless browsing or source retrieval is active. I expect the model to avoid certain conclusions even when the evidence points there. 
I expect hallucinations in citations, dates, names, and technical details unless I force verification.</p><p>Some would label that cynicism, but anyone who has spent a modicum of time in IT would call it tool literacy.</p><p>A power user can still get enormous value because he brings his own knowledge to the session. He can tell when an answer is structurally wrong. He can spot an omitted variable. He can ask for alternative hypotheses. He can demand source-backed claims. He can make the model search again. He can split a task into smaller pieces. He can move sensitive work to a local model. He can use a different model when the first one is filtered, weak, or misaligned with the job.</p><p>This is where AI can become a real intelligence multiplier. 
The mechanism is practical rather than psychometric: AI can act like a tireless research assistant, editor, critic, programmer, translator, summarizer, brainstorming partner, and workflow engine.</p><p>Vox Day, founder of <a href="https://aicentral.substack.com/">AI Central</a>, made a sharper version of this argument in his February 2026 post <a href="https://voxday.net/2026/02/22/you-can-be-effectively-smarter/">&#8220;You Can Be Effectively Smarter&#8221;</a>. In it, he argues that, if used correctly, AI can raise &#8220;effective applied intelligence&#8221; by roughly 1.5 standard deviations, or about 24 IQ points. The important phrase is &#8220;used correctly.&#8221; He explicitly warns that using AI as a mirror or as a flattery machine adds nothing.</p><p>That is exactly the distinction between na&#239;ve AI enthusiasm and power-user leverage: the value comes from challenge, iteration, verification, and higher-caliber output, not from asking a chatbot to validate you.</p><h3>Cloud AI makes the middle user&#8217;s anger understandable</h3><p>The dissatisfied user is often right about the hosted tool, even when he is wrong to stop there.</p><p>Cloud AI does not belong to the user. It runs through an account. It is governed by corporate policy or, worse, by government bureaucrats. It can change without meaningful consent. Its defaults are tuned for mass-market safety, legal risk, brand protection, and user retention.</p><p>OpenAI&#8217;s own help center says that for individual services such as ChatGPT and Codex, the company may use user content to train models, with opt-out controls available. OpenAI says business products such as ChatGPT Business, ChatGPT Enterprise, and the API are opted out of training by default unless the organization opts in. That is a real difference between consumer and business use.</p><p>Anthropic&#8217;s August 2025 consumer terms update similarly shows that data rules are not static. 
Anthropic said users could choose whether to allow data to be used for model improvement, and that allowing it would expand the retention period for new or resumed chats and coding sessions to five years.</p><p>Model access can change too. OpenAI&#8217;s ChatGPT release notes say o3 and GPT-4.5 are being retired from ChatGPT, with sunset dates for paid users. That does not make OpenAI evil. It does show the bargain. Hosted AI is rented capability. The vendor controls the model picker, the policy layer, the interface, the limits, and the retirement schedule.</p><p>The middle user experiences this as betrayal. The power user treats it as a design constraint. It is an unfortunate, but often unavoidable, operating cost.</p><h3>Local AI is the escape route, not a magic replacement</h3><p>Local AI is getting better, but it is not a simple replacement for frontier cloud models.</p><p>Tools like <a href="https://lmstudio.ai/">LM Studio</a> let users run local models on their own hardware. <a href="https://github.com/open-webui/open-webui">Open WebUI</a> describes itself as a self-hosted AI platform that can operate offline and connect to runners like Ollama and OpenAI-compatible APIs. 
We have already covered this broader shift in pieces like <a href="https://www.popularai.org/p/is-local-ai-hardware-worth-it-2026">&#8220;Should you buy local AI hardware in 2026?&#8221;</a>, <a href="https://www.popularai.org/p/pewdiepie-odysseus-ai-workspace">&#8220;PewDiePie built a private AI workspace, and it is worth watching&#8221;</a>, and <a href="https://www.popularai.org/p/gguf-loader-agentic-mode-local-coding-agent">&#8220;GGUF Loader Agentic Mode: local coding agents without cloud accounts&#8221;</a>.</p><p>Switching to local AI comes with real tradeoffs. Local models can be weaker. They require hardware. They need setup. They can be slow. They can break after updates. They can still hallucinate. They do not magically become truthful because they run on your own machine.</p><p>But they give you control points you do not get from a hosted chatbot. You choose the model. You choose the update schedule. You keep private drafts, documents, code, and experiments closer to your own hardware. You can test uncensored or differently tuned models when a cloud assistant refuses ordinary work. You can build workflows that survive a vendor changing its limits.</p><p>The future likely belongs to hybrid users: cloud for frontier capability, local for privacy, repeatability, and fallback.</p><div><hr></div><h4><em><strong>More on local AI:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;a8a54ba3-5d29-47a4-9644-4a4eece7eaed&quot;,&quot;caption&quot;:&quot;This year, the local AI hardware question finally got serious. A recent r/LocalLLaMA Reddit thread asked the question many newcomers are quietly thinking: why spend real money on local AI hardware when a&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Should you buy local AI hardware in 2026? 
The honest answer&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-12T14:42:57.114Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!g2y0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ba02143-e5b9-477a-95fe-9d37ba7d41be_1672x941.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/is-local-ai-hardware-worth-it-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:197354970,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>What dissatisfied mid-tier users should do next</h3><p>The way out is not to sneer at AI users. 
It is to become a better AI user.</p><p>If the model hallucinates, stop asking for finished answers and start asking for source-backed claims, uncertainty labels, and verification steps.</p><p>If the model omits important information, supply the missing axis directly. Ask what a critic would say. Ask for contrary evidence. Ask it to list what it did not check.</p><p>If the model is biased, separate research from drafting. Force it to retrieve primary sources first. Then have it summarize competing claims. Then write the argument yourself.</p><p>If the model refuses, investigate whether the task is genuinely unsafe, badly framed, or just blocked by a hosted policy layer. Rewrite prompts when appropriate. Use different tools when necessary. Move to a local model when the work is blocked by product policy rather than real risk.</p><p>If the output is bland, stop asking for &#8220;better writing.&#8221; Provide a voice sample. Give constraints. Ban filler patterns. Ask for three versions with different tradeoffs. Edit the final version yourself.</p><p>Better input genuinely makes for better output.</p><p>If the AI keeps getting lost, reduce context. Use clean project folders. Create source packets. <a href="https://www.popularai.org/p/context-contamination-why-ai-feels-off-topic">Avoid context contamination</a>.</p><p>The practical difference between the middle user and the power user is not that one sees the flaws and the other does not. The power user sees more flaws. He is just better equipped to deal with them.</p><div><hr></div><h3>FAQ</h3><p><strong>Does the AI literacy study prove AI users have lower IQ?</strong></p><blockquote><p>No. The study measured AI literacy, not IQ. Its AI literacy measures tested practical knowledge about algorithms, machine learning, privacy, bias, deepfakes, cloud storage, and related concepts. 
It does not support the claim that enthusiastic AI users are less intelligent.</p><div><hr></div></blockquote><p><strong>Why would lower AI literacy make people more receptive to AI?</strong></p><blockquote><p>The study&#8217;s explanation is that lower-literacy users may perceive AI as more magical and awe-inspiring, especially when it performs tasks that seem human. That perceived magic can make AI feel more impressive and increase receptivity.</p><div><hr></div></blockquote><p><strong>Can smart people still be highly enthusiastic about AI?</strong></p><blockquote><p>Yes. A highly capable user can be enthusiastic for a completely different reason. The novice may trust AI because it feels magical. The power user values AI because he understands its limits and still knows how to extract useful work from it.</p><div><hr></div></blockquote><p><strong>Why do some moderately informed users become anti-AI?</strong></p><blockquote><p>Because they have seen enough to notice real failures. Hallucinations, refusals, bias, shallow writing, missing context, and bad citations are all real problems. The question is whether the user stops at complaint or learns how to build workflows that account for those failures.</p><div><hr></div></blockquote><p><strong>Is local AI better than ChatGPT or Claude?</strong></p><blockquote><p>Not automatically. Cloud models are usually stronger and easier to use. Local AI gives more control, privacy, and stability, but it requires hardware, setup, maintenance, and realistic expectations. The best setup for many serious users is hybrid: cloud tools for frontier tasks, local tools for private or policy-sensitive workflows.</p><div><hr></div></blockquote><p><strong>What should a new AI power user learn first?</strong></p><blockquote><p>Learn verification. Learn prompt decomposition. Learn when to use sources. Learn how to compare model outputs. Learn what data you should not upload. Learn the difference between cloud and local workflows. 
Learn how to recognize when AI is outside its current capability frontier.</p><div><hr></div></blockquote><div class="callout-block" data-callout="true"><h3>The takeaway</h3><p>The AI literacy study should make people more careful about AI adoption, not more smug about avoiding AI altogether.</p><p>Lower-literacy users can be too receptive to AI because sufficiently advanced technology resembles magic when it exceeds someone&#8217;s ability to understand it. But anti-AI commentators are overreaching when they turn that into &#8220;AI users are dumb.&#8221; The study did not measure IQ, and it does not explain the top end of AI use where skilled users are building research, coding, publishing, image, audio, automation, and local workflows around known model failures.</p><p>The sane position is simple: do not trust AI blindly, and do not reject AI altogether because your first prompt did not result in perfect success. Do not confuse hosted product limits with the limits of the entire technological class.</p><p>Treat AI less like an oracle, a friend, or a one-prompt solution and more like a tool. Used badly, it produces confident garbage. 
Used competently, it can multiply output, compress research time, expand creative range, and make workflows possible that used to require a team.</p><p>AI is definitely an asset, but only once magical thinking and unrealistic expectations give way to competent control.</p></div><div><hr></div><p style="text-align: center;"><em><strong>Explore more from Popular AI:</strong></em></p><p style="text-align: center;"><strong><a href="https://popularai.org/t/start-here">Start here</a> | <a href="https://popularai.org/t/local-ai">Local AI</a> | <a href="https://popularai.org/t/walkthroughs">Fixes &amp; guides</a> | <a href="https://popularai.org/t/ai-builds-gear">Builds &amp; gear</a> | <a href="https://popularai.org/t/popular-ai-podcast">Popular AI podcast</a></strong></p>]]></content:encoded></item><item><title><![CDATA[The RTX 5090 for local AI: fast bandwidth, same VRAM wall]]></title><description><![CDATA[For local AI, the RTX 5090 shines with 8B to 32B models. 
Bigger 70B workloads still point to RTX PRO 6000, H100, or cloud GPUs.]]></description><link>https://www.popularai.org/p/rtx-5090-local-ai-memory-bandwidth-vram</link><guid isPermaLink="false">https://www.popularai.org/p/rtx-5090-local-ai-memory-bandwidth-vram</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Fri, 19 Jun 2026 20:26:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xrcs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e996957-7795-4a41-b675-22764884f2f4_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xrcs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e996957-7795-4a41-b675-22764884f2f4_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xrcs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e996957-7795-4a41-b675-22764884f2f4_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!xrcs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e996957-7795-4a41-b675-22764884f2f4_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!xrcs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e996957-7795-4a41-b675-22764884f2f4_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!xrcs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e996957-7795-4a41-b675-22764884f2f4_1672x941.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!xrcs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e996957-7795-4a41-b675-22764884f2f4_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4e996957-7795-4a41-b675-22764884f2f4_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2193710,&quot;alt&quot;:&quot;RTX 5090 local AI guide: when 32GB VRAM is enough&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/202193195?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e996957-7795-4a41-b675-22764884f2f4_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="RTX 5090 local AI guide: when 32GB VRAM is enough" title="RTX 5090 local AI guide: when 32GB VRAM is enough" srcset="https://substackcdn.com/image/fetch/$s_!xrcs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e996957-7795-4a41-b675-22764884f2f4_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!xrcs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e996957-7795-4a41-b675-22764884f2f4_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!xrcs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e996957-7795-4a41-b675-22764884f2f4_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!xrcs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e996957-7795-4a41-b675-22764884f2f4_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Bandwidth is better on the RTX 5090, yet local LLMs still live or die by VRAM. See the model-size limits before buying. 
&#169; Popular AI</figcaption></figure></div><p>The <a href="https://www.amazon.com/ASUS-Graphics-3-8-Slot-Axial-tech-Phase-Change/dp/B0DS2WQZ2M?tag=popularai-20">RTX 5090</a> changes the local AI conversation because it makes memory bandwidth feel less like the first bottleneck. According to NVIDIA&#8217;s <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/rtx-5090/">RTX 5090 specifications</a>, the GeForce flagship gives local users 32GB of GDDR7, a 512-bit memory bus, PCIe Gen 5, fifth-generation Tensor Cores, and 1,792 GB/s of memory bandwidth.</p><p>That is a major step forward for local LLM inference, ComfyUI, AI video, small fine-tunes, coding assistants, and creator workflows that used to punish consumer GPUs for slow memory movement.</p><p>The catch is simple. The RTX 5090 is still a 32GB card.</p><p>For local AI, that number matters more than the AI TOPS figure on the box. Bandwidth decides how quickly the GPU can move model weights and cache data. VRAM decides whether the model can run at all without spilling into system RAM, splitting awkwardly across GPUs, or crashing before the first useful token appears.</p><p>That is the new shape of the high-end local AI market.</p><p>The RTX 5090 gives independent developers, creators, and power users much more speed inside the 32GB envelope. 
The <a href="https://www.amazon.com/PNY-NVIDIA-Blackwell-Max-Q-Accelerator/dp/B0FPZMMB23?tag=popularai-20">RTX PRO 6000 Blackwell</a> moves into a different class with 96GB of ECC GDDR7. The <a href="https://www.amazon.com/s?k=NVIDIA+H100+GPU&amp;tag=popularai-20">H100</a> remains datacenter hardware because it combines large HBM memory, higher bandwidth, NVLink, MIG, enterprise software, and server-class deployment behavior.</p><p>The mistake is treating these cards as if they sit on one simple speed ladder. They do not. They sit in different memory tiers.</p><h3>The short answer</h3><p><a href="https://www.amazon.com/ASUS-Graphics-3-8-Slot-Axial-tech-Phase-Change/dp/B0DS2WQZ2M?tag=popularai-20">Buy or build around an RTX 5090</a> if you want the fastest practical consumer GPU for local AI workloads that fit inside 32GB of VRAM.</p><p>That makes it a strong card for fast 7B, 8B, 14B, 27B, 30B, and 32B local LLM inference, depending on quantization and context length. It is also a strong choice for high-throughput single-user coding assistants, local RAG with moderate context, ComfyUI, SDXL, FLUX-style image workflows, some local video workflows, and local testing before moving bigger jobs to workstation or cloud hardware.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/ASUS-Graphics-3-8-Slot-Axial-tech-Phase-Change/dp/B0DS2WQZ2M?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-u4X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37c703d7-84d0-43c3-afce-3e1acd711fd2_1500x1205.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!-u4X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37c703d7-84d0-43c3-afce-3e1acd711fd2_1500x1205.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-u4X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37c703d7-84d0-43c3-afce-3e1acd711fd2_1500x1205.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-u4X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37c703d7-84d0-43c3-afce-3e1acd711fd2_1500x1205.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-u4X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37c703d7-84d0-43c3-afce-3e1acd711fd2_1500x1205.jpeg" width="454" height="364.82142857142856" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/37c703d7-84d0-43c3-afce-3e1acd711fd2_1500x1205.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1170,&quot;width&quot;:1456,&quot;resizeWidth&quot;:454,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;RTX 5090 vs H100 vs RTX PRO 6000: the VRAM decision&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/ASUS-Graphics-3-8-Slot-Axial-tech-Phase-Change/dp/B0DS2WQZ2M?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="RTX 5090 vs H100 vs RTX PRO 6000: the VRAM decision" title="RTX 5090 vs H100 vs RTX PRO 6000: the VRAM decision" 
srcset="https://substackcdn.com/image/fetch/$s_!-u4X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37c703d7-84d0-43c3-afce-3e1acd711fd2_1500x1205.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-u4X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37c703d7-84d0-43c3-afce-3e1acd711fd2_1500x1205.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-u4X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37c703d7-84d0-43c3-afce-3e1acd711fd2_1500x1205.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-u4X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37c703d7-84d0-43c3-afce-3e1acd711fd2_1500x1205.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/ASUS-Graphics-3-8-Slot-Axial-tech-Phase-Change/dp/B0DS2WQZ2M?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find RTX 5090 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/ASUS-Graphics-3-8-Slot-Axial-tech-Phase-Change/dp/B0DS2WQZ2M?tag=popularai-20"><span>Find RTX 5090 deals on Amazon</span></a></p><p>It is a weaker fit if your main target is unquantized 70B-class inference, large MoE models, heavy multi-user serving, serious training, or multi-GPU scaling where PCIe becomes the communication path. In those cases, the RTX PRO 6000 Blackwell, H100, H200, B200, or rented cloud GPUs make more sense.</p><p>The RTX 5090 is best understood as the new consumer king for local AI <strong>inside 32GB</strong>. 
Once your workload needs more memory than that, the buying decision changes fast.</p><h3>RTX 5090 vs RTX PRO 6000 vs H100</h3><p>Here is the practical comparison for local AI users.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NMdY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb65960b6-a0dd-4bd9-94ba-68534df9ee11_1754x879.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NMdY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb65960b6-a0dd-4bd9-94ba-68534df9ee11_1754x879.png 424w, https://substackcdn.com/image/fetch/$s_!NMdY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb65960b6-a0dd-4bd9-94ba-68534df9ee11_1754x879.png 848w, https://substackcdn.com/image/fetch/$s_!NMdY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb65960b6-a0dd-4bd9-94ba-68534df9ee11_1754x879.png 1272w, https://substackcdn.com/image/fetch/$s_!NMdY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb65960b6-a0dd-4bd9-94ba-68534df9ee11_1754x879.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NMdY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb65960b6-a0dd-4bd9-94ba-68534df9ee11_1754x879.png" width="1754" height="879" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b65960b6-a0dd-4bd9-94ba-68534df9ee11_1754x879.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:879,&quot;width&quot;:1754,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1969025,&quot;alt&quot;:&quot;RTX 5090 for local AI: fast bandwidth, same VRAM wall&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/202193195?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3a05506-9c28-481a-b996-ebe245e636a0_1774x887.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="RTX 5090 for local AI: fast bandwidth, same VRAM wall" title="RTX 5090 for local AI: fast bandwidth, same VRAM wall" srcset="https://substackcdn.com/image/fetch/$s_!NMdY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb65960b6-a0dd-4bd9-94ba-68534df9ee11_1754x879.png 424w, https://substackcdn.com/image/fetch/$s_!NMdY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb65960b6-a0dd-4bd9-94ba-68534df9ee11_1754x879.png 848w, https://substackcdn.com/image/fetch/$s_!NMdY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb65960b6-a0dd-4bd9-94ba-68534df9ee11_1754x879.png 1272w, https://substackcdn.com/image/fetch/$s_!NMdY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb65960b6-a0dd-4bd9-94ba-68534df9ee11_1754x879.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>NVIDIA&#8217;s <a href="https://www.nvidia.com/en-us/data-center/h100/">H100 product page</a> lists the H100 SXM at 80GB of memory, 3.35 TB/s of memory bandwidth, and NVLink bandwidth up to 900 GB/s. 
Lenovo&#8217;s <a href="https://lenovopress.lenovo.com/lp1732-thinksystem-nvidia-h100-pcie-gen5-gpu">H100 PCIe Gen5 product guide</a> lists the 80GB H100 PCIe adapter with HBM2e memory, 2 TB/s bandwidth, PCIe Gen 5 x16, NVLink bridge support, and up to seven MIG instances.</p><p>For the <a href="https://www.amazon.com/NVIDIA-RTX-6000-Blackwell-Server/dp/B0FXMY871V?tag=popularai-20">RTX PRO 6000 Blackwell Server Edition</a>, NVIDIA lists 24,064 CUDA cores, 96GB of GDDR7, FP4 Tensor Core performance of 4 PFLOPS, FP8 Tensor Core performance of 2 PFLOPS, FP16/BF16 Tensor Core performance of 1 PFLOP, and 1,597 GB/s memory bandwidth on the <a href="https://www.nvidia.com/en-us/data-center/rtx-pro-6000-blackwell-server-edition/">official RTX PRO 6000 Blackwell Server Edition page</a>. Lenovo&#8217;s <a href="https://lenovopress.lenovo.com/lp2263-thinksystem-nvidia-rtx-pro-6000-blackwell-server-edition-pcie-gen5-gpu">RTX PRO 6000 Blackwell server guide</a> also lists 96GB GDDR7 ECC, 24,064 CUDA cores, PCIe 5.0 x16, 600W power consumption, and no NVLink support.</p><p>That table tells the story. The RTX 5090 is a consumer Blackwell card with huge bandwidth for its class and a hard 32GB memory ceiling. The RTX PRO 6000 Blackwell is a larger memory tier with ECC, pro packaging, and a pro price. The H100 is a datacenter platform with HBM, NVLink, MIG, server validation, and deployment features that consumer cards do not try to replace.</p><h3>Why memory bandwidth matters for local LLM inference</h3><p>Autoregressive LLM inference has two very different phases.</p><p>The first phase is prefill. The model processes the prompt and builds the KV cache. Long prompts, RAG chunks, code repositories, documents, and chat history make prefill expensive.</p><p>The second phase is decode. The model generates one token at a time. 
For batch-1 local use, decode often spends much of its time moving model weights and cache data rather than doing pure arithmetic.</p><p>That is why bandwidth matters.</p><p>A GPU can have enormous theoretical compute and still feel underused if each token requires reading a large chunk of model data from VRAM. The RTX 5090&#8217;s 1,792 GB/s GDDR7 bandwidth is the most important improvement over the RTX 4090 for local LLM use because it helps the card feed the cores faster.</p><p>This is also why raw AI TOPS can mislead buyers. AI TOPS tells you something about peak low-precision math. It does not tell you whether your 70B model fits, whether the KV cache fits at 32K context, whether your inference runtime uses the newest Tensor Core formats, or whether your second GPU is stuck behind a weak PCIe slot.</p><p>For a local AI workstation, the priority order is clear. First, the card needs enough VRAM to fit the model, KV cache, and working memory. Then it needs enough memory bandwidth to generate tokens quickly. After that, software support, PCIe layout, cooling, and raw compute throughput decide how much performance you can actually unlock.</p><p>The RTX 5090 is excellent on bandwidth. It is strong on the software path when the stack supports Blackwell well. It is limited by capacity.</p><div><hr></div><h3>The 32GB VRAM ceiling is the real product boundary</h3><p>NVIDIA gave consumer buyers a huge bandwidth jump with the RTX 5090, but the card stops at 32GB. The RTX PRO 6000 Blackwell jumps to 96GB. That gap is not a small upsell. It is a different class of local AI system.</p><p>A 32GB card can run many useful models. It cannot comfortably run every serious model.</p><p>The simplest way to think about model memory is weight size plus context plus runtime overhead. FP16 or BF16 needs about 2 bytes per parameter. FP8 needs about 1 byte per parameter. 4-bit quantization needs roughly 0.5 bytes per parameter before overhead. KV cache then grows with context length, layers, KV heads, head size, and precision. Temporary buffers, CUDA graphs, attention kernels, and runtime overhead also consume memory.</p><p>That means a model that looks as if it fits by weight size can still fail once context is included.</p><p>NVIDIA&#8217;s <a href="https://build.nvidia.com/rtx/multi-gpu">multi-GPU AI PC guide</a> gives useful real-world tiers. It says a 30B-class 4-bit LLM needs at least 24GB of VRAM, a 70B-class model needs 40GB or more, and a 120B-class model needs roughly 70GB once context is included.</p><p>Those numbers match what local users feel. The RTX 5090 moves high-end consumer PCs from the 24GB era into the 32GB era. 
That opens more 30B and 32B workflows, gives smaller models more room for long context, and reduces the need for painful CPU offload.</p><p>It still does not make 70B a comfortable single-GPU target.</p><h3>Practical model-size limits on RTX 5090</h3><p>Here is the practical way to size local LLMs on a 32GB RTX 5090.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bz0F!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7eb0147-d6c0-42a1-96f3-5f76f460f960_1729x889.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bz0F!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7eb0147-d6c0-42a1-96f3-5f76f460f960_1729x889.png 424w, https://substackcdn.com/image/fetch/$s_!bz0F!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7eb0147-d6c0-42a1-96f3-5f76f460f960_1729x889.png 848w, https://substackcdn.com/image/fetch/$s_!bz0F!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7eb0147-d6c0-42a1-96f3-5f76f460f960_1729x889.png 1272w, https://substackcdn.com/image/fetch/$s_!bz0F!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7eb0147-d6c0-42a1-96f3-5f76f460f960_1729x889.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bz0F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7eb0147-d6c0-42a1-96f3-5f76f460f960_1729x889.png" width="1729" height="889" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c7eb0147-d6c0-42a1-96f3-5f76f460f960_1729x889.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:889,&quot;width&quot;:1729,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1984710,&quot;alt&quot;:&quot;RTX 5090 local AI guide: when 32GB VRAM is enough&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/202193195?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e159add-38fc-4359-a540-a54e17d00a48_1752x898.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="RTX 5090 local AI guide: when 32GB VRAM is enough" title="RTX 5090 local AI guide: when 32GB VRAM is enough" srcset="https://substackcdn.com/image/fetch/$s_!bz0F!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7eb0147-d6c0-42a1-96f3-5f76f460f960_1729x889.png 424w, https://substackcdn.com/image/fetch/$s_!bz0F!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7eb0147-d6c0-42a1-96f3-5f76f460f960_1729x889.png 848w, https://substackcdn.com/image/fetch/$s_!bz0F!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7eb0147-d6c0-42a1-96f3-5f76f460f960_1729x889.png 1272w, https://substackcdn.com/image/fetch/$s_!bz0F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7eb0147-d6c0-42a1-96f3-5f76f460f960_1729x889.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft 
pc-display-flex pc-gap-8 pc-reset"></div></div></div></a></figure></div><p>The sweet spot is obvious. The RTX 5090 is a monster for 8B through 32B-class local work. It is especially attractive for coding models, writing models, local assistants, document workflows, moderate RAG, and private automation where a smaller strong model beats a larger slow model.</p><p>It is the wrong card if the whole point of the build is running 70B models cleanly at large context.</p><p>A dual RTX 5090 build gives 64GB total VRAM, but that does not behave like one simple 64GB GPU. It depends on model splitting, runtime support, motherboard lane layout, PCIe bandwidth, and how often the GPUs need to communicate. 
For a deeper build comparison, our <a href="https://www.popularai.org/p/dual-gpu-ai-pc-builds-local-llm-2026">dual GPU local LLM build guide</a> explains why slot spacing, airflow, power, and PCIe lanes matter as much as the cards themselves.</p><div><hr></div><h4><em><strong>More on dual GPU AI hardware:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;493b0c39-b423-43c0-90a1-fc40d45c302d&quot;,&quot;caption&quot;:&quot;Running larger local language models at home in 2026 is easier than it was a year ago, but building the right machine has become a lot less forgiving. Software has improved. vLLM&#8217;s parallelism and scaling docs&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;These 3 dual GPU AI pc builds absolutely crush local LLMs in 2026&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-09T21:22:10.662Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ZhPn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926cb61e-307e-4df5-ae0f-ed4930172adb_2400x1559.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/dual-gpu-ai-pc-builds-local-llm-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:196145185,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>FP8 and FP4 help, but software decides the gain</h3><p>Blackwell&#8217;s fifth-generation Tensor Cores matter because they improve the low-precision path. NVIDIA markets the RTX 50 series around fifth-generation Tensor Cores and FP4 support on its <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/">GeForce RTX 50 series page</a>.</p><p>That is useful, but buyers should be careful.</p><p>Many local LLM users are not running pure FP4 or FP8 Tensor Core inference in the same way an NVIDIA demo or optimized enterprise stack does. 
A lot of local LLM work still runs through quantization formats such as GGUF, EXL2, GPTQ, AWQ, and bitsandbytes, and through runtimes such as llama.cpp, Ollama, LM Studio, vLLM, TensorRT-LLM, or custom kernels. Each stack has its own support curve.</p><p>The hardware may support a precision mode before your preferred local tool uses it well.</p><p>FP16 and BF16 are straightforward, but they are memory hungry. FP8 can make mid-size models more practical when the runtime supports it well. FP4 is promising for Blackwell, although Tensor Core FP4 is not the same thing as the 4-bit quantized model files that many local users already run. GGUF Q4, Q5, and Q6 remain important because they are widely available and easy to run locally.</p><p>For RTX 5090 buyers, the best-case path should improve as Blackwell support matures. The practical warning is just as important. Do not assume every benchmark, model file, or local frontend will immediately hit the hardware&#8217;s fastest path.</p><h3>Where RTX 5090 helps image and video AI</h3><p>Local image generation stresses VRAM differently than LLM inference does.</p><p>The model weights matter, but so do resolution, batch size, ControlNet-style additions, LoRAs, upscalers, video frames, temporal modules, and the size of the ComfyUI graph. A workflow that fits at 1024px can fail at higher resolution. A video workflow can fail because it needs to hold too many frames or latents in memory at once.</p><p>The RTX 5090&#8217;s 32GB gives real breathing room compared with 16GB or 24GB cards.</p><p>That extra space helps with larger ComfyUI graphs, higher image resolutions, more LoRAs and conditioning modules loaded together, local video generation experiments, heavier upscaling, post-processing, and running image tools while other GPU tasks stay active.</p><p>For ComfyUI users, the RTX 5090 is a stronger premium consumer choice than the RTX 4090 because the extra 8GB matters. 
The RTX 4090 and <a href="https://www.amazon.com/RTX-3090/s?k=RTX+3090&amp;tag=popularai-20">used RTX 3090</a> remain relevant because price often matters more than theoretical performance. Popular AI&#8217;s <a href="https://www.popularai.org/p/rtx-3090-comfyui-performance-in-2026">RTX 3090 ComfyUI performance guide</a> is still useful if you are comparing high-VRAM value instead of buying the current flagship.</p><p>The RTX 5090 is the cleaner premium card. The used RTX 3090 is still the budget VRAM card. The RTX PRO 6000 is the workstation memory card.</p><div><hr></div><h4><em><strong>More on the RTX 3090 for local AI:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;0012933e-b5bb-4eed-afca-f552e649b50c&quot;,&quot;caption&quot;:&quot;The RTX 3090 is still one of the most relevant local AI GPUs you can buy in 2026. That sounds strange for a card that launched in 2020, but ComfyUI users care about one thing more than marketing cycles: whether &#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;RTX 3090 ComfyUI performance in 2026: is it still worth buying?&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-04-14T14:04:41.689Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Pcq2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6af31000-a08a-4129-944f-2e588c81ff42_2340x1316.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/rtx-3090-comfyui-performance-in-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:193966808,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>Why H100 still matters</h3><p><a href="https://www.amazon.com/VISION-COMPUTERS-INC-H100-NVL/dp/B0D3VPPN3K?tag=popularai-20">The H100</a> is expensive because it solves problems the RTX 5090 does not try to solve.</p><p>The first difference is HBM. H100 SXM offers much higher memory bandwidth than the RTX 5090. NVIDIA lists H100 SXM memory bandwidth at 3.35 TB/s on its <a href="https://www.nvidia.com/en-us/data-center/h100/">H100 specs page</a>, while Lenovo lists H100 PCIe 80GB at 2 TB/s in its <a href="https://lenovopress.lenovo.com/lp1732-thinksystem-nvidia-h100-pcie-gen5-gpu">ThinkSystem H100 product guide</a>.</p><p>The second difference is memory capacity. 
H100 PCIe and H100 SXM configurations commonly ship with 80GB, and the H100 NVL variant raises that to 94GB per GPU for larger LLM inference. That places H100 in a different model tier than a 32GB consumer card.</p><p>The third difference is interconnect. H100 platforms can use NVLink. Lenovo&#8217;s H100 PCIe guide lists NVLink support for the PCIe adapters and integrated NVLink for SXM boards. This matters for multi-GPU training and large inference workloads where GPUs must frequently exchange activations, gradients, or tensor-parallel partial results.</p><p>The fourth difference is MIG. H100 can be split into as many as seven isolated GPU instances. That matters for cloud providers, labs, and enterprises serving multiple users or workloads. It is not a feature most home local AI users need.</p><p>The fifth difference is reliability and platform packaging. ECC memory, server validation, enterprise support, and datacenter thermal design matter when the GPU is running production workloads around the clock. Those features are part of what H100 buyers are paying for.</p><p>That is why &#8220;RTX 5090 vs H100&#8221; is often the wrong framing. The RTX 5090 is the best consumer-owned local AI accelerator when the workload fits. 
The H100 is a datacenter accelerator for larger, shared, and more reliability-sensitive workloads.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/VISION-COMPUTERS-INC-H100-NVL/dp/B0D3VPPN3K?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KVx-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf216c7d-2be9-4a77-80ec-54a3ad76a1d7_569x467.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KVx-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf216c7d-2be9-4a77-80ec-54a3ad76a1d7_569x467.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KVx-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf216c7d-2be9-4a77-80ec-54a3ad76a1d7_569x467.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KVx-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf216c7d-2be9-4a77-80ec-54a3ad76a1d7_569x467.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KVx-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf216c7d-2be9-4a77-80ec-54a3ad76a1d7_569x467.jpeg" width="391" height="320.908611599297" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df216c7d-2be9-4a77-80ec-54a3ad76a1d7_569x467.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:467,&quot;width&quot;:569,&quot;resizeWidth&quot;:391,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;RTX 5090 vs H100 vs RTX PRO 6000: the VRAM 
decision&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/VISION-COMPUTERS-INC-H100-NVL/dp/B0D3VPPN3K?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="RTX 5090 vs H100 vs RTX PRO 6000: the VRAM decision" title="RTX 5090 vs H100 vs RTX PRO 6000: the VRAM decision" srcset="https://substackcdn.com/image/fetch/$s_!KVx-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf216c7d-2be9-4a77-80ec-54a3ad76a1d7_569x467.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KVx-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf216c7d-2be9-4a77-80ec-54a3ad76a1d7_569x467.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KVx-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf216c7d-2be9-4a77-80ec-54a3ad76a1d7_569x467.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KVx-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf216c7d-2be9-4a77-80ec-54a3ad76a1d7_569x467.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/VISION-COMPUTERS-INC-H100-NVL/dp/B0D3VPPN3K?tag=popularai-20&quot;,&quot;text&quot;:&quot;Check out Nvidia H100 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/VISION-COMPUTERS-INC-H100-NVL/dp/B0D3VPPN3K?tag=popularai-20"><span>Check out Nvidia H100 deals on Amazon</span></a></p><h3>RTX PRO 6000 Blackwell is the real upgrade path above RTX 5090</h3><p><a href="https://www.amazon.com/NVD-RTX-PRO-6000-Blackwell/dp/B0F7Y644FQ?tag=popularai-20">The RTX PRO 6000 Blackwell</a> is the more direct answer to the question, &#8220;What if the RTX 5090 had enough memory?&#8221;</p><p>It keeps the NVIDIA Blackwell software and CUDA ecosystem, but raises the memory ceiling to 96GB GDDR7 ECC. 
NVIDIA&#8217;s <a href="https://www.nvidia.com/en-us/products/workstations/professional-desktop-gpus/rtx-pro-6000-family/">RTX PRO 6000 Blackwell family page</a> positions the series for AI, scientific computing, rendering, 3D graphics, and video workloads.</p><p>For local AI, 96GB changes the decision.</p><p>With 96GB, you can target 70B-class models with more comfortable context, 32B-class models at higher precision, bigger multimodal models, larger ComfyUI and video workflows, local serving with more headroom, and fine-tuning workflows that choke on 32GB. You also get more room to run multiple GPU-heavy applications without constantly unloading models.</p><p>The RTX PRO 6000 does not automatically beat H100 in datacenter serving. H100 still has HBM bandwidth and NVLink advantages. For a workstation owner who wants one GPU with a large local memory pool, though, the RTX PRO 6000 Blackwell is the cleanest NVIDIA answer below datacenter hardware.</p><p>The cost, however, is brutal. Tom&#8217;s Hardware reported on June 13, 2026 that RTX PRO 6000 Blackwell pricing had climbed sharply, with NVIDIA marketplace pricing listed at $13,250 and major retailers varying around that level in its <a href="https://www.tomshardware.com/pc-components/gpus/nvidia-raises-rtx-pro-6000-blackwell-gpu-pricing-to-usd13-250-55-percent-increase-over-msrp-in-a-years-time">RTX PRO 6000 Blackwell pricing report</a>.</p><p>That makes it a business purchase for most people, not an enthusiast splurge.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/NVD-RTX-PRO-6000-Blackwell/dp/B0F7Y644FQ?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!HuSc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70c98367-76ad-48c9-92e1-c15a751b7e36_725x666.jpeg 424w, https://substackcdn.com/image/fetch/$s_!HuSc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70c98367-76ad-48c9-92e1-c15a751b7e36_725x666.jpeg 848w, https://substackcdn.com/image/fetch/$s_!HuSc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70c98367-76ad-48c9-92e1-c15a751b7e36_725x666.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!HuSc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70c98367-76ad-48c9-92e1-c15a751b7e36_725x666.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HuSc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70c98367-76ad-48c9-92e1-c15a751b7e36_725x666.jpeg" width="411" height="377.55310344827586" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70c98367-76ad-48c9-92e1-c15a751b7e36_725x666.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:666,&quot;width&quot;:725,&quot;resizeWidth&quot;:411,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;RTX 5090 for local AI: fast bandwidth, same VRAM wall&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/NVD-RTX-PRO-6000-Blackwell/dp/B0F7Y644FQ?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="RTX 5090 for local AI: 
fast bandwidth, same VRAM wall" title="RTX 5090 for local AI: fast bandwidth, same VRAM wall" srcset="https://substackcdn.com/image/fetch/$s_!HuSc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70c98367-76ad-48c9-92e1-c15a751b7e36_725x666.jpeg 424w, https://substackcdn.com/image/fetch/$s_!HuSc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70c98367-76ad-48c9-92e1-c15a751b7e36_725x666.jpeg 848w, https://substackcdn.com/image/fetch/$s_!HuSc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70c98367-76ad-48c9-92e1-c15a751b7e36_725x666.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!HuSc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70c98367-76ad-48c9-92e1-c15a751b7e36_725x666.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/NVD-RTX-PRO-6000-Blackwell/dp/B0F7Y644FQ?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find RTX PRO 6000 Blackwell Amazon deals&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/NVD-RTX-PRO-6000-Blackwell/dp/B0F7Y644FQ?tag=popularai-20"><span>Find RTX PRO 6000 Blackwell Amazon deals</span></a></p><h3>Where PCIe Gen 5 helps local AI</h3><p>PCIe Gen 5 matters, but not in the way many buyers expect.</p><p>Once a model is loaded into VRAM and running on one GPU, PCIe bandwidth is often not the main bottleneck. The GPU is mostly working inside its own memory system. A single RTX 5090 does not become twice as fast simply because it sits in a PCIe Gen 5 slot instead of a strong PCIe Gen 4 slot.</p><p>PCIe Gen 5 helps when data has to move across the bus. That includes model loading from system memory or storage into VRAM, CPU offload, multi-GPU model splitting, GPU-to-GPU communication through PCIe, large RAG pipelines moving embeddings or cache data, multi-GPU prefill and decode experiments, and workstations with heavy NVMe, capture, and GPU traffic happening at the same time.</p><p>NVIDIA&#8217;s <a href="https://build.nvidia.com/rtx/multi-gpu">multi-GPU AI PC guide</a> recommends x8/x8 PCIe Gen 5 as a good dual-GPU target and x16/x16 PCIe Gen 5 as the best option. 
The same guide warns that consumer boards can hide lane-sharing problems where M.2 drives reduce GPU slot bandwidth.</p><p>That is the practical point. PCIe Gen 5 helps multi-GPU local AI builds avoid obvious bottlenecks. It does not make two RTX 5090s behave like an H100 NVLink system.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KAfE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc23ec7c0-cdaa-460f-aacc-bf38597f1b2a_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KAfE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc23ec7c0-cdaa-460f-aacc-bf38597f1b2a_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!KAfE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc23ec7c0-cdaa-460f-aacc-bf38597f1b2a_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!KAfE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc23ec7c0-cdaa-460f-aacc-bf38597f1b2a_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!KAfE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc23ec7c0-cdaa-460f-aacc-bf38597f1b2a_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KAfE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc23ec7c0-cdaa-460f-aacc-bf38597f1b2a_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c23ec7c0-cdaa-460f-aacc-bf38597f1b2a_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2324395,&quot;alt&quot;:&quot;RTX 5090 local AI guide: when 32GB VRAM is enough&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/202193195?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc23ec7c0-cdaa-460f-aacc-bf38597f1b2a_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="RTX 5090 local AI guide: when 32GB VRAM is enough" title="RTX 5090 local AI guide: when 32GB VRAM is enough" srcset="https://substackcdn.com/image/fetch/$s_!KAfE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc23ec7c0-cdaa-460f-aacc-bf38597f1b2a_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!KAfE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc23ec7c0-cdaa-460f-aacc-bf38597f1b2a_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!KAfE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc23ec7c0-cdaa-460f-aacc-bf38597f1b2a_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!KAfE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc23ec7c0-cdaa-460f-aacc-bf38597f1b2a_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft 
pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The RTX 5090 is a huge local AI upgrade, but 32GB VRAM still sets the limit. Here is how it compares with RTX PRO 6000 and H100. &#169; Popular AI</figcaption></figure></div><h3>Where PCIe Gen 5 can disappoint</h3><p>PCIe Gen 5 can also mislead buyers.</p><p>The danger is building a dual RTX 5090 machine and assuming 64GB total VRAM solves large-model inference cleanly. It may work for some models and runtimes. It may disappoint if the model needs frequent cross-GPU communication.</p><p>A 70B model split across two RTX 5090s can be useful for private experimentation. It is still different from a single 80GB H100 or 96GB RTX PRO 6000. 
Every split adds scheduling, communication, and software complexity.</p><p>The motherboard matters more than the marketing page. Many consumer boards give you one true x16 slot and a second slot wired through the chipset at x4 or worse. That can be fine for a capture card. It is a poor match for two flagship GPUs moving AI workloads.</p><p>Cooling also becomes a serious constraint. Two RTX 5090 cards can overwhelm a normal case, PSU, and room. NVIDIA&#8217;s own multi-GPU guide suggests 1600W to 1800W-class power supplies depending on the build tier. That is before noise, heat, connector clearance, and card spacing enter the picture.</p><p>For readers weighing dual RTX 5090s against used multi-GPU setups, Popular AI&#8217;s guide to <a href="https://www.popularai.org/p/4x-8x-rtx-3090-server-local-ai-2026">4x and 8x RTX 3090 local AI servers</a> is a useful comparison. More GPUs can give more total VRAM, but the build complexity rises fast.</p><div><hr></div><h4><em><strong>More on multi-GPU local AI machines:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;51fd534d-a83f-4fc5-9a2c-2dcf95af2f7b&quot;,&quot;caption&quot;:&quot;A 4x RTX 3090 server can still be worth building for local AI in 2026, but only for the right buyer. 
Four cards give you 96GB of total GPU memory, mature CUDA support, and enough&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;4x or 8x RTX 3090 local AI servers: still worth building in 2026?&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-06-14T22:50:56.313Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!u-KG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d219586-b0d4-4437-ab9e-e4b659a2a2d4_1672x941.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/4x-8x-rtx-3090-server-local-ai-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:201898156,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>The buying 
decision</h3><p>Choose the RTX 5090 if you want the best consumer GPU for local AI and your target workloads fit inside 32GB.</p><p>This is the right choice for one-person local AI workstations, fast coding agents, private document tools, ComfyUI, local image workflows, smaller model serving, and serious experimentation. It gives you owned compute, CUDA support, high memory bandwidth, and a real jump over 24GB consumer cards.</p><p>Choose the RTX PRO 6000 Blackwell if you need one local workstation GPU with a large memory pool.</p><p>This is the right choice when 32GB is the problem and 96GB solves it. It is especially relevant for 70B-class models, heavier multimodal work, AI video, professional graphics plus AI, and workstation deployments where ECC matters.</p><p>Choose H100 if you need datacenter behavior.</p><p>This is the right choice for high-concurrency serving, training, multi-GPU scaling, MIG, NVLink-heavy workloads, enterprise support, and infrastructure where downtime costs more than the GPU.</p><p>Skip all three if you are just starting local AI and do not know your workload yet.</p><p>A used RTX 3090, RTX 4090, <a href="https://www.amazon.com/ASUS-DisplayPort-2-5-Slot-Axial-tech-Technology/dp/B0F7WB6LSH?tag=popularai-20">RTX 5060 Ti 16GB</a>, or cheaper 24GB card may teach you more per dollar. 
Our <a href="https://www.popularai.org/p/the-best-budget-local-llm-pc-in-2026">budget local LLM PC guide</a> remains a better starting point if the goal is learning rather than buying the fastest consumer card on day one.</p><div><hr></div><h4><em><strong>More on building a local AI PC on a budget:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;f972a94f-8a18-40a3-af4a-77f1f0d768a4&quot;,&quot;caption&quot;:&quot;The best first local LLM PC build in 2026 is still refreshingly simple: buy a used RTX 3090 with 24GB of VRAM, pair it with 64GB of system RAM, and run the machine on one clean Linux install.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The best budget local AI PC in 2026 starts with a used RTX 3090&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-23T18:41:00.962Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!mNHY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2a94f7b-e1fc-49d6-8df5-2afc01d93a4d_2400x1437.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/the-best-budget-local-llm-pc-in-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:191894407,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>The real lesson</h3><p>The RTX 5090 moves local AI forward because it gives consumer users a bandwidth tier that used to feel far away. For models that fit, that matters. Tokens generate faster. Image workflows breathe better. FP8 and future Blackwell-optimized paths become more practical. High-end local AI feels less compromised.</p><p>The 32GB ceiling is still the control point.</p><p>NVIDIA has made the consumer card much faster while keeping the large-memory tier in professional and datacenter products. That creates a clear buying boundary.</p><div><hr></div><p><em>Disclosure: This post includes Amazon affiliate links. 
If you buy through them, Popular AI may earn a small commission at no extra cost to you.</em></p><div class="callout-block" data-callout="true"><h3>Conclusion</h3><p><a href="https://www.amazon.com/ASUS-Graphics-3-8-Slot-Axial-tech-Phase-Change/dp/B0DS2WQZ2M?tag=popularai-20">The RTX 5090</a> is for fast local work inside the 32GB range. <a href="https://www.amazon.com/NVD-RTX-PRO-6000-Blackwell/dp/B0F7Y644FQ?tag=popularai-20">The RTX PRO 6000 Blackwell</a> is for large local work inside 96GB. <a href="https://www.amazon.com/VISION-COMPUTERS-INC-H100-NVL/dp/B0D3VPPN3K?tag=popularai-20">The H100</a> is for datacenter work where memory, bandwidth, interconnect, reliability, and serving features matter together.</p><p>That is the decision readers should make before spending thousands of dollars. Do not buy a GPU because its AI TOPS number looks huge. Buy the memory tier your workload actually needs, then care about bandwidth, then care about the software path.</p><p>If your models fit, the RTX 5090 is the new consumer king for local AI.</p><p>If they do not fit, the RTX 5090 is a very fast way to hit the same wall.</p></div><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.popularai.org/p/rtx-5090-local-ai-memory-bandwidth-vram/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.popularai.org/p/rtx-5090-local-ai-memory-bandwidth-vram/comments"><span>Leave a comment</span></a></p><div><hr></div><p style="text-align: center;"><em><strong>Explore more from Popular AI:</strong></em></p><p style="text-align: center;"><strong><a href="https://popularai.org/t/start-here">Start here</a> | <a href="https://popularai.org/t/local-ai">Local AI</a> | <a href="https://popularai.org/t/walkthroughs">Fixes &amp; guides</a> | <a href="https://popularai.org/t/ai-builds-gear">Builds &amp; gear</a> | <a 
href="https://popularai.org/t/popular-ai-podcast">Popular AI podcast</a></strong></p>]]></content:encoded></item><item><title><![CDATA[Are dual RTX 3090s still worth buying for local AI in 2026?]]></title><description><![CDATA[Thinking about swapping an RTX 4080 for dual RTX 3090s? Here is when 48GB VRAM matters for local AI, and when it is not worth it.]]></description><link>https://www.popularai.org/p/dual-rtx-3090-local-ai-2026</link><guid isPermaLink="false">https://www.popularai.org/p/dual-rtx-3090-local-ai-2026</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Mon, 15 Jun 2026 14:00:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!8zqD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcaf2280-0e1f-4c68-bdcd-bb27aa19fb91_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8zqD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcaf2280-0e1f-4c68-bdcd-bb27aa19fb91_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8zqD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcaf2280-0e1f-4c68-bdcd-bb27aa19fb91_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!8zqD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcaf2280-0e1f-4c68-bdcd-bb27aa19fb91_1672x941.png 848w, 
https://substackcdn.com/image/fetch/$s_!8zqD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcaf2280-0e1f-4c68-bdcd-bb27aa19fb91_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!8zqD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcaf2280-0e1f-4c68-bdcd-bb27aa19fb91_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8zqD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcaf2280-0e1f-4c68-bdcd-bb27aa19fb91_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fcaf2280-0e1f-4c68-bdcd-bb27aa19fb91_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2041960,&quot;alt&quot;:&quot;Dual RTX 3090s vs RTX 4080 for local LLMs: what to buy&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/202106960?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcaf2280-0e1f-4c68-bdcd-bb27aa19fb91_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Dual RTX 3090s vs RTX 4080 for local LLMs: what to buy" title="Dual RTX 3090s vs RTX 4080 for local LLMs: what to buy" srcset="https://substackcdn.com/image/fetch/$s_!8zqD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcaf2280-0e1f-4c68-bdcd-bb27aa19fb91_1672x941.png 424w, 
https://substackcdn.com/image/fetch/$s_!8zqD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcaf2280-0e1f-4c68-bdcd-bb27aa19fb91_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!8zqD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcaf2280-0e1f-4c68-bdcd-bb27aa19fb91_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!8zqD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcaf2280-0e1f-4c68-bdcd-bb27aa19fb91_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Dual RTX 3090s still offer 48GB VRAM for local LLMs, but power, heat, and setup friction make them a smart buy only for some users. &#169; Popular AI</figcaption></figure></div><p>A dual RTX 3090 setup is still worth buying for local AI in 2026 if your main goal is running larger local LLMs and you can get the cards cheaply. The reason is simple: two RTX 3090 cards give you 48GB of total VRAM, while an RTX 4080 gives you 16GB. That changes what models you can run locally.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.popularai.org/p/dual-rtx-3090-local-ai-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.popularai.org/p/dual-rtx-3090-local-ai-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>The catch is that dual RTX 3090s are not a clean upgrade for everyone. They draw a lot of power, need serious airflow, depend on multi-GPU software support, and do not behave like one simple 48GB GPU.</p><div><hr></div><h4><em><strong>More on dual GPU local AI builds:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;a6f8eb4e-ead0-468a-9a56-0b8b01ec651f&quot;,&quot;caption&quot;:&quot;Running larger local language models at home in 2026 is easier than it was a year ago, but building the right machine has become a lot less forgiving. Software has improved. 
vLLM&#8217;s parallelism and scaling docs&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;These 3 dual GPU AI pc builds absolutely crush local LLMs in 2026&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-09T21:22:10.662Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ZhPn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926cb61e-307e-4df5-ae0f-ed4930172adb_2400x1559.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/dual-gpu-ai-pc-builds-local-llm-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:196145185,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>Quick verdict</h3><blockquote><p><strong>Best answer:</strong> 
Buy <a href="https://www.amazon.com/NVIDIA-RTX-3090-Founders-Graphics/dp/B08HR6ZBYJ?tag=popularai-20">dual RTX 3090s</a> if your main workload is local LLM inference and you specifically want to run 30B, 32B, and some 70B-class quantized models.</p></blockquote><blockquote><p><strong>Best reason to keep the <a href="https://www.amazon.com/NVIDIA-GeForce-4080-GDDR6X-Graphics/dp/B0BMZ9TGH1?tag=popularai-20">RTX 4080</a>:</strong> Keep it if you care more about efficiency, gaming, AV1 encoding, image generation, video workflows, and a clean single-GPU setup. NVIDIA lists the RTX 4080 with <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/40-series/rtx-4080-family/">16GB of GDDR6X, 780 AI TOPS, 320W total graphics power, and no NVLink support</a>.</p></blockquote><blockquote><p><strong>Best reason to buy one <a href="https://www.amazon.com/NVIDIA-RTX-3090-Founders-Graphics/dp/B08HR6ZBYJ?tag=popularai-20">RTX 3090</a> first:</strong> A single RTX 3090 already gives you 24GB of VRAM, which is a much better local LLM tier than 16GB. NVIDIA lists the RTX 3090 with <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3090-3090ti/">24GB of GDDR6X, a 384-bit memory interface, NVLink support, and 350W graphics card power</a>.</p></blockquote><blockquote><p><strong>Best reason to skip dual 3090s:</strong> Skip them if you want a quiet, simple, efficient machine. Two RTX 3090 cards are 700W of GPU power before the CPU, motherboard, drives, fans, and transient spikes are considered. 
NVIDIA lists the RTX 3090 at <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3090-3090ti/">350W graphics card power</a>.</p></blockquote><blockquote><p><strong>Best alternative if money is less tight:</strong> An <a href="https://www.amazon.com/ASUS-Graphics-Military-Grade-Components-Protective/dp/B0DS2X13PH?tag=popularai-20">RTX 5090</a> gives you 32GB of GDDR7 on a single card and newer Blackwell Tensor Cores at 575W total graphics power, but it still does not give you the 48GB VRAM pool that two 3090s can provide for LLM inference. NVIDIA lists those <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/rtx-5090/">RTX 5090 specifications on its official product page</a>.</p></blockquote><div><hr></div><h3>Why local AI builders still care about dual RTX 3090s</h3><p>A recent <a href="https://www.reddit.com/r/LocalLLaMA/comments/1pvacv8/thoughts_on_picking_up_dual_rtx_3090s_at_this/">r/LocalLLaMA discussion about buying dual RTX 3090s</a> framed the problem well. The user had an RTX 4080, was testing Qwen 3 14B and Llama-style models, wanted better Korean performance from larger models, and was considering spending roughly $800 net by swapping the 4080 for two used RTX 3090s.</p><p>That is the real buying question. The RTX 4080 is newer and more efficient, but its 16GB VRAM ceiling blocks access to many larger local LLM workflows. Two RTX 3090s are older and messier, but they move the machine into a 48GB total VRAM class.</p><p>For a broader build path, Popular AI&#8217;s guide to a <a href="https://www.popularai.org/p/the-best-budget-local-llm-pc-in-2026">budget local AI PC built around a used RTX 3090</a> reaches the same basic conclusion. 
Cheap 24GB CUDA cards still matter because model fit often matters more than GPU generation for local LLM users.</p><div><hr></div><h4><em><strong>More on RTX 3090 local AI computers:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;645d33f8-40ad-4cf3-ac7c-9bb5ef1c7f11&quot;,&quot;caption&quot;:&quot;The best first local LLM PC build in 2026 is still refreshingly simple: buy a used RTX 3090 with 24GB of VRAM, pair it with 64GB of system RAM, and run the machine on one clean Linux install.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The best budget local AI PC in 2026 starts with a used RTX 3090&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-23T18:41:00.962Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!mNHY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2a94f7b-e1fc-49d6-8df5-2afc01d93a4d_2400x1437.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/the-best-budget-local-llm-pc-in-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:191894407,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>Who should buy dual RTX 3090s</h3><p>Buy <a href="https://www.amazon.com/NVIDIA-RTX-3090-Founders-Graphics/dp/B08HR6ZBYJ?tag=popularai-20">dual RTX 3090s</a> if most of these are true:</p><ul><li><p>You mainly care about local LLM inference.<br></p></li><li><p>You want to run larger quantized models locally instead of paying for API access.<br></p></li><li><p>You are comfortable with Linux, vLLM, llama.cpp, ExLlamaV2, or similar local inference stacks.<br></p></li><li><p>Your motherboard can physically and electrically handle two large GPUs.<br></p></li><li><p>You can handle a high-power, high-heat build.<br></p></li><li><p>You are buying used cards at a price that 
leaves room for maintenance.<br></p></li><li><p>You value local privacy and account independence enough to accept the hardware friction.<br></p></li></ul><p>This is a strong setup for private chat, local coding assistants, multilingual model testing, local RAG, private document work, and experimenting with 70B-class models.</p><p>It is <strong>not</strong> the best setup for every local AI workload. For ComfyUI, FLUX, SDXL, video generation, and LoRA image training, a faster single 24GB card can be easier to live with. Popular AI&#8217;s <a href="https://www.popularai.org/p/rtx-3090-comfyui-performance-in-2026">RTX 3090 ComfyUI performance guide</a> is still relevant here because image workflows often care about single-GPU VRAM, raw speed, and workflow compatibility more than split multi-GPU memory.</p><div><hr></div><h4><em><strong>More on RTX 3090 performance:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;385af069-816c-451b-8679-7fa3931cfa78&quot;,&quot;caption&quot;:&quot;The RTX 3090 is still one of the most relevant local AI GPUs you can buy in 2026. 
That sounds strange for a card that launched in 2020, but ComfyUI users care about one thing more than marketing cycles: whether &#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;RTX 3090 ComfyUI performance in 2026: is it still worth buying?&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-04-14T14:04:41.689Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Pcq2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6af31000-a08a-4129-944f-2e588c81ff42_2340x1316.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/rtx-3090-comfyui-performance-in-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:193966808,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular 
AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>VRAM matters first</h3><p>For local LLMs, the first question is usually not which GPU is newer. The first question is whether the model fits.</p><p>A 70B GGUF model can be large even after quantization. One common Meta-Llama-3-70B-Instruct Q4_K_M GGUF is listed at <a href="https://huggingface.co/bartowski/Meta-Llama-3-70B-Instruct-GGUF">42.52GB on Hugging Face</a>, and the model page advises choosing a file 1GB to 2GB smaller than your available RAM or VRAM if you want the model to run as fast as possible in GPU memory.</p><p>That means 16GB is in a different buying tier than 48GB. An <a href="https://www.amazon.com/NVIDIA-GeForce-4080-GDDR6X-Graphics/dp/B0BMZ9TGH1?tag=popularai-20">RTX 4080</a> can run useful local models, but a pair of <a href="https://www.amazon.com/NVIDIA-RTX-3090-Founders-Graphics/dp/B08HR6ZBYJ?tag=popularai-20">RTX 3090</a> cards opens the door to a whole different class of local AI experiments.</p><p>A 32B model is a better example of why a single 3090 is already useful. A <a href="https://huggingface.co/bartowski/Qwen_Qwen3-32B-GGUF">Qwen3-32B Q4_K_M GGUF is listed at 19.76GB</a>, which is awkward for a 16GB card but comfortable on a 24GB RTX 3090 with reasonable context settings.</p><h3>Multi-GPU support matters second</h3><p>Two RTX 3090s do not magically become one clean 48GB GPU. 
CUDA&#8217;s memory model uses a unified virtual address space, but NVIDIA&#8217;s documentation still describes <a href="https://docs.nvidia.com/cuda/cuda-programming-guide/02-basics/understanding-memory.html">CPU memory and each GPU&#8217;s memory as distinct ranges inside that address space</a>.</p><p>The practical result of this is simple. Your software must know how to split the model or workload.</p><p><a href="https://docs.vllm.ai/en/v0.10.2/serving/parallelism_scaling.html">vLLM recommends tensor parallelism</a> when a model is too large for one GPU but fits on a single multi-GPU node. <a href="https://github.com/ggml-org/llama.cpp/blob/master/docs/multi-gpu.md">llama.cpp also has official multi-GPU documentation</a> covering split modes and command-line flags for running across more than one GPU. <a href="https://huggingface.co/docs/accelerate/en/usage_guides/big_modeling">Hugging Face Accelerate can dispatch model layers across available devices</a>, filling GPUs first, then CPU, then disk when needed.</p><p>That is good news. It means dual RTX 3090s are usable. It also means beginners should not expect every app to treat the setup as one painless card.</p><div><hr></div>
<h3>Power and cooling matter more than people expect</h3><p>One RTX 3090 Founders Edition is a <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3090-3090ti/">350W card with a 750W recommended system power rating</a> in NVIDIA&#8217;s reference guidance. Two of them put the GPUs alone at 700W, and many used partner cards are physically large, hot, and old enough to need pad or paste maintenance.</p><p>This is where dual 3090 builds stop being a casual upgrade.</p><p>You need:</p><ul><li><p>A quality <a href="https://www.amazon.com/Corsair-RM1200x-Shift-Modular-Supply/dp/B0BP88MYM4?tag=popularai-20">1200W PSU</a> or <a href="https://www.amazon.com/Seasonic-ATX-3-0-TX-1600-SSR-1600TR2/dp/B0C571LRNB?tag=popularai-20">1600W PSU</a>, depending on CPU and card model.<br></p></li><li><p>Enough PCIe power cables, not splitter spaghetti.<br></p></li><li><p>A motherboard with usable slot spacing.<br></p></li><li><p>A case with serious airflow, or an open-frame setup.<br></p></li><li><p>A plan for noise.<br></p></li><li><p>Enough room around both cards for power connectors and heat.<br></p></li></ul><p>A Reddit user in the same <a href="https://www.reddit.com/r/LocalLLaMA/comments/1pvacv8/thoughts_on_picking_up_dual_rtx_3090s_at_this/">dual RTX 3090 discussion</a> reported that a dual 3090 setup was capable but loud, costly after fees and thermal maintenance, and only worth it under the right circumstances. 
That is community experience rather than a lab benchmark, but it matches the obvious power and cooling math.</p><h3>Dual RTX 3090s versus RTX 4080 for local AI</h3><p>The RTX 4080 is the cleaner card. It is newer, more efficient, and better for a lot of non-LLM work. NVIDIA lists the <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/40-series/rtx-4080-family/">RTX 4080 with 16GB GDDR6X, 9728 CUDA cores, 780 AI TOPS, 320W total graphics power, and no NVLink</a>.</p><p>The RTX 3090 is the better local LLM memory card. NVIDIA lists the <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3090-3090ti/">RTX 3090 with 24GB GDDR6X, a 384-bit memory interface, PCIe Gen 4, NVLink support, and 350W graphics card power</a>.</p><p>Keep the <a href="https://www.amazon.com/NVIDIA-GeForce-4080-GDDR6X-Graphics/dp/B0BMZ9TGH1?tag=popularai-20">RTX 4080</a> if you want lower power draw, a simpler single-GPU system, better gaming and creator features, strong image generation performance within 16GB, less used-market risk, and fewer driver, heat, and motherboard headaches.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/NVIDIA-GeForce-4080-GDDR6X-Graphics/dp/B0BMZ9TGH1?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M4AF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae2ec779-bfa6-466a-857b-107cc2c50e10_1500x1191.jpeg 424w, https://substackcdn.com/image/fetch/$s_!M4AF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae2ec779-bfa6-466a-857b-107cc2c50e10_1500x1191.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!M4AF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae2ec779-bfa6-466a-857b-107cc2c50e10_1500x1191.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!M4AF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae2ec779-bfa6-466a-857b-107cc2c50e10_1500x1191.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M4AF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae2ec779-bfa6-466a-857b-107cc2c50e10_1500x1191.jpeg" width="458" height="363.63186813186815" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ae2ec779-bfa6-466a-857b-107cc2c50e10_1500x1191.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1156,&quot;width&quot;:1456,&quot;resizeWidth&quot;:458,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/NVIDIA-GeForce-4080-GDDR6X-Graphics/dp/B0BMZ9TGH1?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M4AF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae2ec779-bfa6-466a-857b-107cc2c50e10_1500x1191.jpeg 424w, https://substackcdn.com/image/fetch/$s_!M4AF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae2ec779-bfa6-466a-857b-107cc2c50e10_1500x1191.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!M4AF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae2ec779-bfa6-466a-857b-107cc2c50e10_1500x1191.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!M4AF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae2ec779-bfa6-466a-857b-107cc2c50e10_1500x1191.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/NVIDIA-GeForce-4080-GDDR6X-Graphics/dp/B0BMZ9TGH1?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find RTX 4080 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/NVIDIA-GeForce-4080-GDDR6X-Graphics/dp/B0BMZ9TGH1?tag=popularai-20"><span>Find RTX 4080 deals on Amazon</span></a></p><p>Move to <a href="https://www.amazon.com/NVIDIA-RTX-3090-Founders-Graphics/dp/B08HR6ZBYJ?tag=popularai-20">dual RTX 3090s</a> if you want 48GB total VRAM for local LLM inference, more flexibility for 30B and 70B quantized models, better local model experimentation without API fees, a CUDA-friendly home lab, and a stronger privacy and account-risk fallback.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/NVIDIA-RTX-3090-Founders-Graphics/dp/B08HR6ZBYJ?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Sj6m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6105b243-8dbf-4d84-a65d-1a843783ef26_1498x1098.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Sj6m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6105b243-8dbf-4d84-a65d-1a843783ef26_1498x1098.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Sj6m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6105b243-8dbf-4d84-a65d-1a843783ef26_1498x1098.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!Sj6m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6105b243-8dbf-4d84-a65d-1a843783ef26_1498x1098.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Sj6m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6105b243-8dbf-4d84-a65d-1a843783ef26_1498x1098.jpeg" width="474" height="347.3612637362637" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6105b243-8dbf-4d84-a65d-1a843783ef26_1498x1098.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1067,&quot;width&quot;:1456,&quot;resizeWidth&quot;:474,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/NVIDIA-RTX-3090-Founders-Graphics/dp/B08HR6ZBYJ?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Sj6m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6105b243-8dbf-4d84-a65d-1a843783ef26_1498x1098.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Sj6m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6105b243-8dbf-4d84-a65d-1a843783ef26_1498x1098.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Sj6m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6105b243-8dbf-4d84-a65d-1a843783ef26_1498x1098.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!Sj6m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6105b243-8dbf-4d84-a65d-1a843783ef26_1498x1098.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/NVIDIA-RTX-3090-Founders-Graphics/dp/B08HR6ZBYJ?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find RTX 3090 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" 
href="https://www.amazon.com/NVIDIA-RTX-3090-Founders-Graphics/dp/B08HR6ZBYJ?tag=popularai-20"><span>Find RTX 3090 deals on Amazon</span></a></p><p>The decision comes down to a clean 16GB single-GPU setup versus a messier 48GB local LLM box.</p><h3>Are dual RTX 3090s good for 70B models?</h3><p>Yes, with caveats.</p><p>A 70B Q4 model can fit across two 24GB cards, but the margin is not huge once you add context, KV cache, server overhead, and the specific runtime&#8217;s memory behavior. A commenter in the <a href="https://www.reddit.com/r/LocalLLaMA/comments/1pvacv8/thoughts_on_picking_up_dual_rtx_3090s_at_this/">Reddit thread about dual RTX 3090s</a> put it bluntly: 70B can &#8220;just barely fit&#8221; on two RTX 3090s including context.</p><p>The model file size backs up the caution. A <a href="https://huggingface.co/bartowski/Meta-Llama-3-70B-Instruct-GGUF">Llama 3 70B Q4_K_M GGUF is 42.52GB</a>. On paper, that leaves roughly 5.5GB of headroom across 48GB (48 minus 42.52), and that space also has to hold the KV cache and runtime buffers on both cards. It is not enough to ignore context length and runtime settings.</p><p>A practical 2x3090 local LLM setup should treat 70B as possible and useful, with tuning required.</p><p>Use 70B when all of the following apply: you need better reasoning or language capability than 14B to 32B models give you; you are willing to tune quantization, context, batch size, and runtime; you can tolerate slower startup and more setup friction; and you are running inference rather than expecting easy full fine-tuning.</p><p>Use 32B-class models when you want a smoother daily driver, need lower latency, want more context headroom, or are still learning local inference.</p><p>For multilingual users, larger Qwen models are especially relevant. Qwen says <a href="https://qwenlm.github.io/blog/qwen3/">Qwen3 supports 119 languages and dialects</a> and recommends local usage through tools such as Ollama, LM Studio, MLX, llama.cpp, and KTransformers. 
That does not prove every Qwen model will satisfy every Korean workflow, but it does explain why a user unhappy with 14B multilingual output would look toward 30B or larger models.</p><h3>Are dual RTX 3090s good for fine-tuning and training?</h3><p>They can be useful, but this is where expectations need to be controlled.</p><p>For beginner LoRA and QLoRA experiments on smaller models, a single <em>RTX 3090</em> is already a strong learning platform. For larger models, dual 3090s give you more room, but training workflows are less forgiving than inference. You may need distributed training, sharding, gradient checkpointing, quantized training, or framework-specific support. Some workflows replicate the model on each GPU instead of combining memory in the way a beginner expects.</p><p>The buying takeaway is simple. Buy dual 3090s for local LLM inference first. Treat fine-tuning and training as bonus capability unless you already know the software stack you plan to use.</p><h3>Are dual RTX 3090s future-proof?</h3><p>No GPU purchase is future-proof. Dual RTX 3090s are still a good local AI value because VRAM remains scarce, and because used 24GB CUDA cards still fill a specific gap.</p><p>The RTX 3090 lacks newer features found on Ada and Blackwell cards. 
The <a href="https://www.amazon.com/VIPERA-GeForce-Founders-Graphic-Renewed/dp/B0CPB3P36K?tag=popularai-20">RTX 4090</a> has <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/40-series/rtx-4090/">4th-generation Tensor Cores, 24GB VRAM, 450W total graphics power, and no NVLink</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/VIPERA-GeForce-Founders-Graphic-Renewed/dp/B0CPB3P36K?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sxpO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F358b820c-e3e9-4496-93ba-cf337a4148bd_1002x796.jpeg 424w, https://substackcdn.com/image/fetch/$s_!sxpO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F358b820c-e3e9-4496-93ba-cf337a4148bd_1002x796.jpeg 848w, https://substackcdn.com/image/fetch/$s_!sxpO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F358b820c-e3e9-4496-93ba-cf337a4148bd_1002x796.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!sxpO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F358b820c-e3e9-4496-93ba-cf337a4148bd_1002x796.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sxpO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F358b820c-e3e9-4496-93ba-cf337a4148bd_1002x796.jpeg" width="492" height="390.8502994011976" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/358b820c-e3e9-4496-93ba-cf337a4148bd_1002x796.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:796,&quot;width&quot;:1002,&quot;resizeWidth&quot;:492,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/VIPERA-GeForce-Founders-Graphic-Renewed/dp/B0CPB3P36K?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sxpO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F358b820c-e3e9-4496-93ba-cf337a4148bd_1002x796.jpeg 424w, https://substackcdn.com/image/fetch/$s_!sxpO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F358b820c-e3e9-4496-93ba-cf337a4148bd_1002x796.jpeg 848w, https://substackcdn.com/image/fetch/$s_!sxpO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F358b820c-e3e9-4496-93ba-cf337a4148bd_1002x796.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!sxpO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F358b820c-e3e9-4496-93ba-cf337a4148bd_1002x796.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/VIPERA-GeForce-Founders-Graphic-Renewed/dp/B0CPB3P36K?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find RTX 4090 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/VIPERA-GeForce-Founders-Graphic-Renewed/dp/B0CPB3P36K?tag=popularai-20"><span>Find RTX 4090 deals on Amazon</span></a></p><p>The <a href="https://www.amazon.com/ASUS-Graphics-Military-Grade-Components-Protective/dp/B0DS2X13PH?tag=popularai-20">RTX 5090</a> moves to <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/rtx-5090/">32GB GDDR7, Blackwell, 5th-generation Tensor Cores, PCIe Gen 5, and 575W total graphics power</a>, but NVIDIA still lists NVLink support as &#8220;No&#8221; on the RTX 5090 page.</p><div 
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/ASUS-Graphics-Military-Grade-Components-Protective/dp/B0DS2X13PH?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UhI6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5025030-d08e-4019-b195-e049ac6cc7f3_1500x1183.jpeg 424w, https://substackcdn.com/image/fetch/$s_!UhI6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5025030-d08e-4019-b195-e049ac6cc7f3_1500x1183.jpeg 848w, https://substackcdn.com/image/fetch/$s_!UhI6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5025030-d08e-4019-b195-e049ac6cc7f3_1500x1183.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!UhI6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5025030-d08e-4019-b195-e049ac6cc7f3_1500x1183.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UhI6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5025030-d08e-4019-b195-e049ac6cc7f3_1500x1183.jpeg" width="448" height="353.2307692307692" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5025030-d08e-4019-b195-e049ac6cc7f3_1500x1183.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1148,&quot;width&quot;:1456,&quot;resizeWidth&quot;:448,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/ASUS-Graphics-Military-Grade-Components-Protective/dp/B0DS2X13PH?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UhI6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5025030-d08e-4019-b195-e049ac6cc7f3_1500x1183.jpeg 424w, https://substackcdn.com/image/fetch/$s_!UhI6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5025030-d08e-4019-b195-e049ac6cc7f3_1500x1183.jpeg 848w, https://substackcdn.com/image/fetch/$s_!UhI6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5025030-d08e-4019-b195-e049ac6cc7f3_1500x1183.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!UhI6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5025030-d08e-4019-b195-e049ac6cc7f3_1500x1183.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/ASUS-Graphics-Military-Grade-Components-Protective/dp/B0DS2X13PH?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find RTX 5090 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/ASUS-Graphics-Military-Grade-Components-Protective/dp/B0DS2X13PH?tag=popularai-20"><span>Find RTX 5090 deals on Amazon</span></a></p><p>That means the RTX 3090 is aging in two different ways.</p><p>It is aging badly if you care about newest tensor formats, power efficiency, and single-GPU speed.</p><p>It is aging well if you care about cheap CUDA VRAM.</p><p>The used market confirms why this card stays in the conversation. BestValueGPU&#8217;s U.S. 
tracker showed a <a href="https://bestvaluegpu.com/history/new-and-used-rtx-3090-price-history-and-specs/">used RTX 3090 around $1050 on eBay as of June 15, 2026</a>, with a current Amazon price around $1488. That does not mean every used card is worth $1050. It means the 3090 is still priced like local AI buyers care about 24GB VRAM.</p><p>At a clean, tested, local price near $700 to $850 per card, dual 3090s can still make sense. At $1000 or more per used card, the value case gets weaker unless 48GB total VRAM is exactly what you need.</p><h3>What about unified memory systems?</h3><p>Unified memory is the main reason some buyers hesitate. The fear is reasonable. A machine with 96GB, 128GB, or more unified memory can load models that do not fit into ordinary GPU VRAM.</p><p>But unified memory does not automatically beat dual RTX 3090s for local LLM use.</p><p>AMD&#8217;s <a href="https://www.amazon.com/GMKtec-ryzen_ai_mini_pc_evo_x2/dp/B0F53MLYQ6?tag=popularai-20">Ryzen AI Max+ 395</a> supports <a href="https://www.amd.com/en/products/processors/laptop/ryzen/ai-300-series/amd-ryzen-ai-max-plus-395.html">up to 128GB of LPDDR5x-8000 memory on a 256-bit memory interface, with Radeon 8060S graphics and 40 graphics cores</a>. That memory capacity is attractive, but the theoretical bandwidth from 256-bit LPDDR5x-8000 is about 256GB/s. 
A single RTX 3090 has 936GB/s memory bandwidth according to <a href="https://www.nvidia.com/fr-be/geforce/graphics-cards/30-series/rtx-3090-3090ti/">NVIDIA&#8217;s spec sheet</a>, and two cards provide much more aggregate bandwidth when the workload can use both GPUs well.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/GMKtec-ryzen_ai_mini_pc_evo_x2/dp/B0F53MLYQ6?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kWoi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ece9bd-c639-4941-8e87-db9534458ced_1500x1286.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kWoi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ece9bd-c639-4941-8e87-db9534458ced_1500x1286.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kWoi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ece9bd-c639-4941-8e87-db9534458ced_1500x1286.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kWoi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ece9bd-c639-4941-8e87-db9534458ced_1500x1286.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kWoi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ece9bd-c639-4941-8e87-db9534458ced_1500x1286.jpeg" width="414" height="354.85714285714283" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03ece9bd-c639-4941-8e87-db9534458ced_1500x1286.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1248,&quot;width&quot;:1456,&quot;resizeWidth&quot;:414,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/GMKtec-ryzen_ai_mini_pc_evo_x2/dp/B0F53MLYQ6?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kWoi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ece9bd-c639-4941-8e87-db9534458ced_1500x1286.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kWoi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ece9bd-c639-4941-8e87-db9534458ced_1500x1286.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kWoi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ece9bd-c639-4941-8e87-db9534458ced_1500x1286.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kWoi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ece9bd-c639-4941-8e87-db9534458ced_1500x1286.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/GMKtec-ryzen_ai_mini_pc_evo_x2/dp/B0F53MLYQ6?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find Ryzen AI Max+ 395 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/GMKtec-ryzen_ai_mini_pc_evo_x2/dp/B0F53MLYQ6?tag=popularai-20"><span>Find Ryzen AI Max+ 395 deals on Amazon</span></a></p><p>Apple&#8217;s <a href="https://www.amazon.com/-/es/Apple-Studio-n%C3%BAcleos-memoria-unificada/dp/B0FRNJDKTG?tag=popularai-20">Mac Studio with M3 Ultra</a> is a stronger unified-memory example. Apple lists <a href="https://www.apple.com/mac-studio/specs/">819GB/s memory bandwidth for M3 Ultra and a 96GB unified memory starting configuration</a> for that model. That can be excellent for very large models that simply need to load. 
It is also a different ecosystem with different software tradeoffs, no CUDA support, and usually a higher total platform cost.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/-/es/Apple-Studio-n%C3%BAcleos-memoria-unificada/dp/B0FRNJDKTG?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l8NS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F071d5134-a91a-4949-b3af-aad3d08015af_1398x454.jpeg 424w, https://substackcdn.com/image/fetch/$s_!l8NS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F071d5134-a91a-4949-b3af-aad3d08015af_1398x454.jpeg 848w, https://substackcdn.com/image/fetch/$s_!l8NS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F071d5134-a91a-4949-b3af-aad3d08015af_1398x454.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!l8NS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F071d5134-a91a-4949-b3af-aad3d08015af_1398x454.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!l8NS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F071d5134-a91a-4949-b3af-aad3d08015af_1398x454.jpeg" width="1398" height="454" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/071d5134-a91a-4949-b3af-aad3d08015af_1398x454.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:454,&quot;width&quot;:1398,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:29294,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.amazon.com/-/es/Apple-Studio-n%C3%BAcleos-memoria-unificada/dp/B0FRNJDKTG?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!l8NS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F071d5134-a91a-4949-b3af-aad3d08015af_1398x454.jpeg 424w, https://substackcdn.com/image/fetch/$s_!l8NS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F071d5134-a91a-4949-b3af-aad3d08015af_1398x454.jpeg 848w, https://substackcdn.com/image/fetch/$s_!l8NS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F071d5134-a91a-4949-b3af-aad3d08015af_1398x454.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!l8NS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F071d5134-a91a-4949-b3af-aad3d08015af_1398x454.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/-/es/Apple-Studio-n%C3%BAcleos-memoria-unificada/dp/B0FRNJDKTG?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find Mac Studio M3 Ultra deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/-/es/Apple-Studio-n%C3%BAcleos-memoria-unificada/dp/B0FRNJDKTG?tag=popularai-20"><span>Find Mac Studio M3 Ultra deals on Amazon</span></a></p><p>Unified memory is best when you need one large addressable memory pool and can accept lower or different acceleration characteristics.</p><p>Dual RTX 3090s are best when you want CUDA, high bandwidth, strong local LLM inference, and the best used price per gigabyte of GPU memory.</p><h3>How to avoid buying the wrong dual 3090 setup</h3><p>Before buying two cards, check the 
build around them as carefully as you check the GPUs.</p><p>Start with the exact card models. Avoid assuming every RTX 3090 is physically manageable. Some are huge triple-slot cards. Some blower cards are loud. Some mining cards have tired fans or memory pads.</p><p>Ask for the exact model name, photos of the card and ports, proof of working under load, a GPU-Z screenshot, temperature screenshots under stress, whether the card has been repadded or repasted, whether the original BIOS is installed, and whether warranty remains.</p><p>Then confirm your motherboard layout. You want enough spacing for airflow and ideally useful PCIe bandwidth to both cards. A dual 3090 LLM setup can work without a perfect workstation board, but bad spacing can turn the build into a heat problem.</p><p>The <a href="https://www.reddit.com/r/LocalLLaMA/comments/1pvacv8/thoughts_on_picking_up_dual_rtx_3090s_at_this/">same Reddit discussion</a> surfaced the right practical warning. Slot spacing and motherboard bifurcation matter if you want a clean two-card build.</p><p>Buy the PSU before the second GPU. A single RTX 3090 already has a <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3090-3090ti/">750W system power recommendation</a> in NVIDIA&#8217;s reference guidance. Two cards, a modern CPU, drives, fans, and transient load spikes push this into serious PSU territory.</p><p>For a dual 3090 build, a quality <a href="https://www.amazon.com/Corsair-RM1200x-Shift-Modular-Supply/dp/B0BP88MYM4?tag=popularai-20">1200W power supply</a> should be treated as a realistic floor for modest systems. A high-quality <a href="https://www.amazon.com/Seasonic-ATX-3-0-TX-1600-SSR-1600TR2/dp/B0C571LRNB?tag=popularai-20">1600W power supply</a> is safer if you run a high-end CPU, many drives, or power-hungry partner cards.</p><p>Plan for Linux if this will be a serious local LLM box. Windows can work for local AI, especially with LM Studio, Ollama, and llama.cpp builds. 
For dual-GPU LLM serving, Linux is usually the cleaner path. vLLM, CUDA tooling, and server-style workflows tend to be happier there.</p><p>Our guide to <a href="https://www.popularai.org/p/dual-gpu-ai-pc-builds-local-llm-2026">dual GPU AI PC builds for local LLMs</a> is the better follow-up if you are planning a dedicated dual-card box rather than upgrading a daily desktop.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xPZS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F415b5c99-1af7-48d2-8b84-af3573a21c1d_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xPZS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F415b5c99-1af7-48d2-8b84-af3573a21c1d_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!xPZS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F415b5c99-1af7-48d2-8b84-af3573a21c1d_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!xPZS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F415b5c99-1af7-48d2-8b84-af3573a21c1d_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!xPZS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F415b5c99-1af7-48d2-8b84-af3573a21c1d_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xPZS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F415b5c99-1af7-48d2-8b84-af3573a21c1d_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/415b5c99-1af7-48d2-8b84-af3573a21c1d_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1955103,&quot;alt&quot;:&quot;Are dual RTX 3090s still the best budget local AI upgrade?&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/202106960?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F415b5c99-1af7-48d2-8b84-af3573a21c1d_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Are dual RTX 3090s still the best budget local AI upgrade?" title="Are dual RTX 3090s still the best budget local AI upgrade?" 
srcset="https://substackcdn.com/image/fetch/$s_!xPZS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F415b5c99-1af7-48d2-8b84-af3573a21c1d_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!xPZS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F415b5c99-1af7-48d2-8b84-af3573a21c1d_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!xPZS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F415b5c99-1af7-48d2-8b84-af3573a21c1d_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!xPZS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F415b5c99-1af7-48d2-8b84-af3573a21c1d_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Dual RTX 3090s can still run serious local LLM workloads in 2026, but newer single GPUs and unified memory systems change the tradeoff. &#169; Popular AI</figcaption></figure></div><p>Finally, test the models through an API before spending, if at all possible. The Reddit poster had a specific language-quality problem, and faster hardware does not fix a model that gives subpar results.</p><p>Use a hosted endpoint, a rental GPU, or a friend&#8217;s setup to try the actual target models first. Do not buy 48GB of VRAM because &#8220;70B is better&#8221; in the abstract. Buy it because the specific model you want gives better Korean output, better coding help, or better private document results.</p><p>A small amount of API spending can prevent a bad hardware purchase.</p><h3>What to buy instead</h3><p>A single <a href="https://www.amazon.com/NVIDIA-RTX-3090-Founders-Graphics/dp/B08HR6ZBYJ?tag=popularai-20">RTX 3090</a> is the safer stepping stone if you are unsure. 
It gives you 24GB VRAM, lets you test larger 30B and 32B models, and can later become the first half of a dual-GPU build.</p><p>Our <a href="https://www.popularai.org/p/how-to-choose-the-right-local-llm-for-8gb-12gb-and-24gb-vram">VRAM-tier guide for local LLMs</a> is the right place to start if you are still learning what 8GB, 12GB, 24GB, and 48GB actually change.</p><div><hr></div><h4><em><strong>More on picking the right model for your VRAM:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;d86a1fcb-3c56-4656-8192-af1bd8f801db&quot;,&quot;caption&quot;:&quot;Running a local model sounds wonderfully simple. One box. One model. No API bill. No usage cap. No surprise account lockout.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How to choose the right local LLM for 8GB, 12GB, and 24GB VRAM&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-15T14:18:00.000Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!CEOc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6a71d4f-7366-4a02-86b4-2d5471da6e55_2560x1507.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/how-to-choose-the-right-local-llm-for-8gb-12gb-and-24gb-vram&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:191511400,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><p>An <a href="https://www.amazon.com/GIGABYTE-nVidia-GeForce-GAMING-GDDR6X/dp/B0BH8MK76C?tag=popularai-20">RTX 4090</a> is the better pick if you want the cleanest 24GB experience. It is faster, newer, and more efficient per unit of work than a 3090. NVIDIA lists the RTX 4090 with <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/40-series/rtx-4090/">24GB VRAM, 4th-generation Tensor Cores, 1321 AI TOPS, and 450W total graphics power</a>. 
It does not support NVLink.</p><p>Buy it if you want one excellent local AI GPU and do not need 48GB total VRAM.</p><p>An <a href="https://www.amazon.com/ASUS-Graphics-Military-Grade-Components-Protective/dp/B0DS2X13PH?tag=popularai-20">RTX 5090</a> is the high-end consumer answer for people who want newer architecture, a single-card setup, 32GB VRAM, and strong AI acceleration. NVIDIA lists it with <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/rtx-5090/">32GB GDDR7, 5th-generation Tensor Cores, 3352 AI TOPS, and 575W graphics power</a>.</p><p>Buy it if you want a simpler premium card and can pay for it.</p><p>Do not buy it expecting it to replace 48GB total VRAM for every dual-GPU LLM use case.</p><p>A <a href="https://www.amazon.com/-/es/Apple-Studio-n%C3%BAcleos-memoria-unificada/dp/B0FRNJDKTG?tag=popularai-20">Mac Studio</a> or <a href="https://www.amazon.com/GMKtec-ryzen_ai_mini_pc_evo_x2/dp/B0F53MLYQ6?tag=popularai-20">Ryzen AI Max+ 395 mini PC</a> can be attractive if you want to load larger models into one memory pool. This is especially relevant for huge MoE models or experiments where fitting the model matters more than tokens per second.</p><p>Buy unified memory if you value capacity, power efficiency, and a single memory pool.</p><p>Skip it if CUDA compatibility, mature NVIDIA inference tooling, and maximum local LLM speed per dollar matter more.</p><div><hr></div><h3>FAQ</h3><h4>Are two RTX 3090s better than one RTX 4080 for local LLMs?</h4><blockquote><p>Yes, for larger local LLMs. Two RTX 3090s give you 48GB total VRAM, while the RTX 4080 gives you 16GB. The RTX 4080 is newer and more efficient, but the 16GB limit blocks many larger model workflows. NVIDIA lists the RTX 4080 at <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/40-series/rtx-4080-family/">16GB GDDR6X and 320W total graphics power</a>.</p><div><hr></div></blockquote><h4>Does NVLink make two RTX 3090s act like one 48GB GPU?</h4><blockquote><p>No. 
NVLink can improve GPU-to-GPU communication in supported workflows, and the RTX 3090 supports NVLink, but software still has to split the workload correctly. NVIDIA&#8217;s CUDA memory documentation distinguishes <a href="https://docs.nvidia.com/cuda/cuda-programming-guide/02-basics/understanding-memory.html">CPU memory and each GPU&#8217;s memory range inside the unified virtual address space</a>, so each card still holds its own 24GB pool.</p><div><hr></div></blockquote><h4>Can dual RTX 3090s run 70B models?</h4><blockquote><p>Yes, many quantized 70B models can run across two RTX 3090s, but context length and runtime overhead matter. A <a href="https://huggingface.co/bartowski/Meta-Llama-3-70B-Instruct-GGUF">Llama 3 70B Q4_K_M GGUF is listed at 42.52GB</a>, which leaves only about 5GB of the 48GB total for context and runtime overhead.</p><div><hr></div></blockquote><h4>Should I sell my RTX 4080 for two RTX 3090s?</h4><blockquote><p>Only if local LLM memory is the main problem you are trying to solve. If you use the RTX 4080 for gaming, video, image generation, or a quiet daily workstation, replacing it with two older, hotter cards may feel worse. If your real goal is 70B local inference, the trade can make sense.</p><div><hr></div></blockquote><h4>Is a used RTX 3090 risky?</h4><blockquote><p>Yes. Used RTX 3090s can be excellent, but they are old, hot, and have often been worked hard. Check thermals, fan noise, memory stability, seller history, warranty status, and whether pads or paste were replaced. Leave money in the budget for maintenance.</p><div><hr></div></blockquote><h4>Are dual RTX 3090s still a good value in 2026?</h4><blockquote><p>Yes, if the purchase price is right. BestValueGPU&#8217;s U.S. tracker showed a <a href="https://bestvaluegpu.com/history/new-and-used-rtx-3090-price-history-and-specs/">used RTX 3090 around $1050 on eBay as of June 6, 2026</a>, but many local AI buyers should be more selective than that. Dual 3090s are compelling near bargain used prices. 
They are less compelling if each card costs nearly as much as newer options.</p><div><hr></div></blockquote><div class="callout-block" data-callout="true"><h3>Final recommendation</h3><p><a href="https://www.amazon.com/NVIDIA-RTX-3090-Founders-Graphics/dp/B08HR6ZBYJ?tag=popularai-20">Dual RTX 3090s</a> are still worth buying for local AI in 2026 if you are building a local LLM box and can get the cards cheaply. The setup is especially attractive for 30B, 32B, and carefully configured 70B inference, where 48GB total VRAM changes what you can run at home.</p><p>Do not buy them because they are elegant. Buy them because used 24GB CUDA cards still offer a rare combination of meaningful VRAM, strong local software support, and pricing that can beat newer single-card options for large model experimentation.</p><p>For most buyers, the smartest path is to buy one <a href="https://www.amazon.com/NVIDIA-RTX-3090-Founders-Graphics/dp/B08HR6ZBYJ?tag=popularai-20">RTX 3090</a> first, test the exact models you care about, then add the second card only if 24GB is truly holding you back.</p></div><div><hr></div><p style="text-align: center;"><em><strong>Explore more from Popular AI:</strong></em></p><p style="text-align: center;"><strong><a href="https://popularai.org/t/start-here">Start here</a> | <a href="https://popularai.org/t/local-ai">Local AI</a> | <a href="https://popularai.org/t/walkthroughs">Fixes &amp; guides</a> | <a href="https://popularai.org/t/ai-builds-gear">Builds &amp; gear</a> | <a href="https://popularai.org/t/popular-ai-podcast">Popular AI 
podcast</a></strong></p>]]></content:encoded></item><item><title><![CDATA[4x or 8x RTX 3090 local AI servers: still worth building in 2026?]]></title><description><![CDATA[A practical guide to 4x and 8x RTX 3090 local AI servers, covering VRAM, EPYC, Threadripper Pro, power, cooling, NVLink, and value.]]></description><link>https://www.popularai.org/p/4x-8x-rtx-3090-server-local-ai-2026</link><guid isPermaLink="false">https://www.popularai.org/p/4x-8x-rtx-3090-server-local-ai-2026</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Sun, 14 Jun 2026 22:50:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!u-KG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d219586-b0d4-4437-ab9e-e4b659a2a2d4_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!u-KG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d219586-b0d4-4437-ab9e-e4b659a2a2d4_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!u-KG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d219586-b0d4-4437-ab9e-e4b659a2a2d4_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!u-KG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d219586-b0d4-4437-ab9e-e4b659a2a2d4_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!u-KG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d219586-b0d4-4437-ab9e-e4b659a2a2d4_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!u-KG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d219586-b0d4-4437-ab9e-e4b659a2a2d4_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!u-KG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d219586-b0d4-4437-ab9e-e4b659a2a2d4_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d219586-b0d4-4437-ab9e-e4b659a2a2d4_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2241062,&quot;alt&quot;:&quot;4x and 8x RTX 3090 local AI servers, including VRAM, EPYC, Threadripper Pro, power, cooling, NVLink, and upgrade value&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/201898156?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d219586-b0d4-4437-ab9e-e4b659a2a2d4_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="4x and 8x RTX 3090 local AI servers, including VRAM, EPYC, Threadripper Pro, power, cooling, NVLink, and upgrade value" title="4x and 8x RTX 3090 local AI servers, including VRAM, EPYC, Threadripper Pro, power, cooling, NVLink, and upgrade value" srcset="https://substackcdn.com/image/fetch/$s_!u-KG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d219586-b0d4-4437-ab9e-e4b659a2a2d4_1672x941.png 424w, 
https://substackcdn.com/image/fetch/$s_!u-KG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d219586-b0d4-4437-ab9e-e4b659a2a2d4_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!u-KG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d219586-b0d4-4437-ab9e-e4b659a2a2d4_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!u-KG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d219586-b0d4-4437-ab9e-e4b659a2a2d4_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a><figcaption class="image-caption">Should you build a 4x or 8x RTX 3090 AI server in 2026? Here is how to judge GPU memory, platform choice, power draw, cooling, and real-world local LLM performance. &#169; Popular AI</figcaption></figure></div><p>A 4x RTX 3090 server can still be worth building for local AI in 2026, but only for the right buyer. Four cards give you 96GB of total GPU memory, mature CUDA support, and enough headroom for serious local LLM, ComfyUI, image generation, and batch inference work.</p><p>The hard part is everything around the GPUs. A multi-GPU local AI server lives or dies by the platform, PCIe layout, power delivery, chassis, cooling, noise, used-card condition, and software stack. The cards are only the beginning.</p><p>An 8x RTX 3090 server belongs in another category. That is a loud, hot, power-hungry server project that should be compared against used enterprise GPUs, cloud rental, or two smaller nodes before anyone starts buying parts.</p><p>The <a href="https://www.reddit.com/r/LocalLLaMA/comments/1rozgei/best_way_to_build_a_4_rtx_3090_ai_server_with/">Reddit thread that kicked off this question</a> asks about starting with 4x RTX 3090s, later scaling to 8x, and using the box for coding agents, open-source coding models, ComfyUI, Stable Diffusion, video models, and multi-GPU inference. 
The poster&#8217;s concerns are exactly the right ones: PCIe bandwidth, power, cooling, NVLink, and framework support.</p><div><hr></div><h4><em><strong>More on the RTX 3090 for local AI builds:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;5626a4cc-dfb4-4c32-9b57-1673af33ccce&quot;,&quot;caption&quot;:&quot;The RTX 3090 is still one of the most relevant local AI GPUs you can buy in 2026. That sounds strange for a card that launched in 2020, but ComfyUI users care about one thing more than marketing cycles: whether &#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;RTX 3090 ComfyUI performance in 2026: is it still worth buying?&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-04-14T14:04:41.689Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Pcq2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6af31000-a08a-4129-944f-2e588c81ff42_2340x1316.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/rtx-3090-comfyui-performance-in-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:193966808,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><p><em>Disclosure: This post includes Amazon affiliate links. If you buy through them, Popular AI may earn a small commission at no extra cost to you.</em></p><div><hr></div><h3>Quick verdict</h3><blockquote><p><strong>Best answer for most serious local AI users:</strong> Build <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">2x RTX 3090</a> first. 
It is far easier to cool, power, debug, and actually use.</p></blockquote><blockquote><p><strong>Best 4x option:</strong> Use <a href="https://www.amazon.com/s?k=AMD+EPYC+processor&amp;tag=popularai-20">EPYC</a> or <a href="https://www.amazon.com/s?k=Threadripper+Pro+workstation&amp;tag=popularai-20">Threadripper Pro</a> with a board and chassis designed for multi-GPU spacing. Four <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090s</a> can be rational if you need 96GB aggregate VRAM for local LLM experiments, batch jobs, or multiple concurrent workers.</p></blockquote><blockquote><p><strong>Best 8x answer:</strong> Skip it unless you already understand server chassis, risers, 240V power, Linux, remote management, and multi-GPU inference stacks. Eight <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090s</a> mean 192GB aggregate VRAM, but the surrounding build is the real product.</p></blockquote><blockquote><p><strong>Best platform for 4x:</strong> Used <a href="https://www.amazon.com/s?k=AMD+EPYC+7003&amp;tag=popularai-20">AMD EPYC 7003</a> if value matters, <a href="https://www.amazon.com/s?k=Threadripper+Pro+WRX90+motherboard&amp;tag=popularai-20">Threadripper Pro WRX90</a> if you want a workstation experience, and modern <a href="https://www.amazon.com/s?k=AMD+EPYC+9004&amp;tag=popularai-20">EPYC 9004</a> or <a href="https://www.amazon.com/s?k=AMD+EPYC+9005&amp;tag=popularai-20">9005</a> if you are building a real server from scratch.</p></blockquote><blockquote><p><strong>Best platform for 8x:</strong> A real GPU server platform, not a normal workstation board. 
Think <a href="https://www.amazon.com/s?k=AMD+EPYC+server+platform&amp;tag=popularai-20">EPYC</a> or <a href="https://www.amazon.com/s?k=Xeon+GPU+server+chassis&amp;tag=popularai-20">Xeon server chassis</a> with risers, high-airflow fans, redundant power, and a plan for noise.</p></blockquote><blockquote><p><strong>Skip the build if:</strong> You expect the GPUs to behave like one giant 96GB or 192GB card, need quiet office use, want plug-and-play Windows workflows, or plan to run it on a normal household circuit.</p></blockquote><div><hr></div><h3>Who should build a multi-GPU RTX 3090 server?</h3><p>This guide is for local AI users who are past the single-GPU stage and are considering a serious multi-GPU server. A 4x RTX 3090 box can make sense when you already know why 24GB or 48GB of VRAM is too tight for your workflow.</p><p>The strongest use cases are local LLM inference with larger quantized models, coding agents that need private repo access, ComfyUI image workflows with large graphs, batch image generation, some LoRA training and fine-tuning, and local serving stacks such as <code>llama.cpp</code>, vLLM, TabbyAPI, or ExLlamaV2. It can also make sense when you want several smaller models or workers running at the same time.</p><p>It is a poor fit for casual Ollama use, quiet desktop inference, basic Stable Diffusion, or a first local AI PC. Start with <a href="https://www.popularai.org/p/the-best-budget-local-llm-pc-in-2026">a single used RTX 3090 local LLM PC</a>, then look at <a href="https://www.popularai.org/p/dual-gpu-ai-pc-builds-local-llm-2026">dual GPU local AI builds</a> before jumping into 4x or 8x territory.</p><p>The biggest mistake is treating a multi-GPU box like a normal desktop with more cards. A serious <a href="https://www.amazon.com/s?k=4x+RTX+3090+24GB&amp;tag=popularai-20">4x RTX 3090</a> local AI server is closer to a home-lab appliance. 
You will manage drivers, CUDA versions, model placement, remote access, power limits, temperatures, logs, and failed jobs. That can be worth it, but only when the workload justifies the hassle.</p><div><hr></div><h3>Why the RTX 3090 is still tempting</h3><p>The RTX 3090 is old, inefficient by current standards, and awkward to cool in dense builds. But it remains tempting for one reason: <em>24GB of VRAM on a CUDA card</em>.</p><p><a href="https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3090-3090ti/">NVIDIA&#8217;s official RTX 3090 specs</a> list 10,496 CUDA cores, 24GB of GDDR6X memory, a 384-bit memory interface, and third-generation Tensor Cores. <a href="https://www.gigabyte.com/dk/Graphics-Card/GV-N3090GAMING-OC-24GD/sp">Gigabyte&#8217;s RTX 3090 Gaming OC spec page</a> lists 936GB/s memory bandwidth, PCIe 4.0 x16, 24GB GDDR6X, a 55mm card thickness, a 750W recommended PSU for a single-card system, and 2-way NVLink support.</p><p>That makes the basic VRAM math attractive. One <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> gives you 24GB. Two cards give you 48GB aggregate VRAM. 
Four cards give you 96GB aggregate VRAM. Eight cards give you 192GB aggregate VRAM.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7aAi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a964f25-bbb3-4575-9b3e-614921c05516_1498x1098.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7aAi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a964f25-bbb3-4575-9b3e-614921c05516_1498x1098.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7aAi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a964f25-bbb3-4575-9b3e-614921c05516_1498x1098.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7aAi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a964f25-bbb3-4575-9b3e-614921c05516_1498x1098.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7aAi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a964f25-bbb3-4575-9b3e-614921c05516_1498x1098.jpeg" width="458" height="335.635989010989" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9a964f25-bbb3-4575-9b3e-614921c05516_1498x1098.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1067,&quot;width&quot;:1456,&quot;resizeWidth&quot;:458,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7aAi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a964f25-bbb3-4575-9b3e-614921c05516_1498x1098.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7aAi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a964f25-bbb3-4575-9b3e-614921c05516_1498x1098.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7aAi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a964f25-bbb3-4575-9b3e-614921c05516_1498x1098.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7aAi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a964f25-bbb3-4575-9b3e-614921c05516_1498x1098.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20&quot;,&quot;text&quot;:&quot;Find RTX 3090 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20"><span>Find RTX 3090 deals on Amazon</span></a></p><p>The catch is that aggregate VRAM does not behave like one transparent memory pool. 
Multi-GPU inference can split model weights across cards, but the software stack and interconnect decide whether the result is pleasant or painful.</p><p>The <code>llama.cpp</code> <a href="https://github.com/ggml-org/llama.cpp/blob/master/docs/multi-gpu.md">multi-GPU guide</a> says multi-GPU can help when a model does not fit in one GPU&#8217;s VRAM or when distributing compute improves throughput, while warning that performance depends on split mode and interconnect speed. The <a href="https://docs.vllm.ai/en/stable/serving/parallelism_scaling/">vLLM parallelism and scaling documentation</a> recommends single-GPU inference when the model fits on one GPU, then tensor parallelism when the model is too large for one GPU but still fits inside one multi-GPU node.</p><p>That is the decision in practical terms. Buy multiple RTX 3090s when you need to fit or serve workloads that one 24GB card cannot handle. Do not buy them expecting perfect linear scaling.</p><h3>Is 4x RTX 3090 still worth it?</h3><p>Yes, but only under specific conditions.</p><p>A 4x RTX 3090 server gives you enough aggregate VRAM to experiment with larger local models, run multiple inference workers, and keep heavy image workflows local. It can be a good private AI server if you buy the cards well, power-limit them, and build around airflow rather than desktop looks.</p><p>The 4x build is most defensible when you can buy tested <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090 24GB cards</a> at a good used price, you are comfortable running Linux, you need more than 48GB aggregate VRAM, and you will use the box often enough to justify the power and noise. 
It also makes more sense if the server can live away from your desk and if you care about keeping private code, documents, prompts, and image workflows off hosted accounts.</p><p>The 4x build gets weaker when you only need one model loaded for personal chat, expect frontier-model quality from local coding agents, need quiet office use, or want easy Windows desktop behavior. It also stops looking smart when used RTX 3090 cards are priced close to newer flagship GPUs.</p><p>Used RTX 3090 pricing is the swing factor. As of June 6, 2026, <a href="https://bestvaluegpu.com/en-eu/history/new-and-used-rtx-3090-price-history-and-specs/">BestValueGPU&#8217;s EU tracker</a> listed used RTX 3090 pricing around &#8364;842.72 on eBay and a much higher current Amazon price, which shows the used-market gap and the danger of overpaying. In a recent thread, <a href="https://www.reddit.com/r/LocalLLaMA/comments/1t3oajt/3090_prices_in_2026/">LocalLLaMA users are still debating RTX 3090 prices</a>, citing local deals around $850 and much higher eBay listings.</p><p>A practical U.S. rule: a clean, tested RTX 3090 around $800 to $900 can still be interesting. At $1,100 or above, the argument weakens unless it is a known-good blower card, a water-cooled card, or a warranty-backed unit that solves a specific build problem.</p><h3>Is 8x RTX 3090 still worth it?</h3><p>Usually, no.</p><p>An 8x RTX 3090 server gives you 192GB aggregate VRAM, which sounds fantastic until the rest of the machine appears. Eight reference-class 350W RTX 3090s mean up to 2,800W for GPUs alone before the CPU, motherboard, RAM, storage, fans, networking, and PSU conversion losses even enter the picture. That is a server-room power problem.</p><p>Using the <a href="https://www.eia.gov/electricity/monthly/update/end-use.php">EIA&#8217;s March 2026 all-sector U.S. electricity average</a> of 14.18 cents per kWh as a rough national reference, a system that averages 2kW under sustained load uses about 1,440kWh over a 30-day month of 24/7 operation, which costs about $204. 
At 3kW, that rises to about $306 before cooling overhead.</p><p>The bigger problems are physical and operational. Eight consumer cards need serious airflow. Most RTX 3090 models are too thick for dense slots. Consumer open-air coolers recycle each other&#8217;s heat in server chassis. Risers add failure points. NVLink will not turn eight cards into one memory pool. Used cards may already have years of thermal stress. Multi-GPU scaling depends heavily on model size, backend, batch size, split strategy, PCIe layout, and interconnect.</p><p>The machine will also be loud. A normal 120V household circuit is the wrong assumption for a fully loaded 8x build. If you are thinking about <strong>8x RTX 3090</strong>, you should also be thinking about a <a href="https://www.amazon.com/s?k=240V+PDU+rackmount&amp;tag=popularai-20">240V PDU</a>, a real rack or open-frame plan, and a room where fan noise is acceptable.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.amazon.com/s?k=240V+PDU+rackmount&amp;tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!V2DL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57293be-a3f6-40ee-8d93-16d6e74ce926_1500x478.jpeg 424w, https://substackcdn.com/image/fetch/$s_!V2DL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57293be-a3f6-40ee-8d93-16d6e74ce926_1500x478.jpeg 848w, https://substackcdn.com/image/fetch/$s_!V2DL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57293be-a3f6-40ee-8d93-16d6e74ce926_1500x478.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!V2DL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57293be-a3f6-40ee-8d93-16d6e74ce926_1500x478.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!V2DL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57293be-a3f6-40ee-8d93-16d6e74ce926_1500x478.jpeg" width="624" height="198.85714285714286" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c57293be-a3f6-40ee-8d93-16d6e74ce926_1500x478.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:464,&quot;width&quot;:1456,&quot;resizeWidth&quot;:624,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/s?k=240V+PDU+rackmount&amp;tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!V2DL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57293be-a3f6-40ee-8d93-16d6e74ce926_1500x478.jpeg 424w, https://substackcdn.com/image/fetch/$s_!V2DL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57293be-a3f6-40ee-8d93-16d6e74ce926_1500x478.jpeg 848w, https://substackcdn.com/image/fetch/$s_!V2DL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57293be-a3f6-40ee-8d93-16d6e74ce926_1500x478.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!V2DL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57293be-a3f6-40ee-8d93-16d6e74ce926_1500x478.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/s?k=240V+PDU+rackmount&amp;tag=popularai-20&quot;,&quot;text&quot;:&quot;Find 240V PDU deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/s?k=240V+PDU+rackmount&amp;tag=popularai-20"><span>Find 240V PDU deals on Amazon</span></a></p><p>At that point, compare the project against alternatives. NVIDIA&#8217;s <a href="https://www.nvidia.com/en-us/products/workstations/professional-desktop-gpus/rtx-pro-6000/">RTX PRO 6000 Blackwell Workstation Edition</a> has 96GB of GDDR7 ECC memory, 1,792GB/s bandwidth, and a 600W max power rating in a dual-slot professional card. A card like <a href="https://www.amazon.com/s?k=RTX+PRO+6000+Blackwell&amp;tag=popularai-20">RTX PRO 6000 Blackwell</a> will not be cheap, but it shows why the 8x RTX 3090 idea gets dangerous. 
Once chassis, power, cooling, risers, replacement cards, and setup time are included, cheap VRAM can stop being cheap.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/s?k=RTX+PRO+6000+Blackwell&amp;tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g3Bi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb765fa-5cc2-45ab-8e9e-58315d57fa5b_907x592.jpeg 424w, https://substackcdn.com/image/fetch/$s_!g3Bi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb765fa-5cc2-45ab-8e9e-58315d57fa5b_907x592.jpeg 848w, https://substackcdn.com/image/fetch/$s_!g3Bi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb765fa-5cc2-45ab-8e9e-58315d57fa5b_907x592.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!g3Bi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb765fa-5cc2-45ab-8e9e-58315d57fa5b_907x592.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g3Bi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb765fa-5cc2-45ab-8e9e-58315d57fa5b_907x592.jpeg" width="474" height="309.38037486218303" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/feb765fa-5cc2-45ab-8e9e-58315d57fa5b_907x592.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:592,&quot;width&quot;:907,&quot;resizeWidth&quot;:474,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/s?k=RTX+PRO+6000+Blackwell&amp;tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!g3Bi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb765fa-5cc2-45ab-8e9e-58315d57fa5b_907x592.jpeg 424w, https://substackcdn.com/image/fetch/$s_!g3Bi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb765fa-5cc2-45ab-8e9e-58315d57fa5b_907x592.jpeg 848w, https://substackcdn.com/image/fetch/$s_!g3Bi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb765fa-5cc2-45ab-8e9e-58315d57fa5b_907x592.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!g3Bi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb765fa-5cc2-45ab-8e9e-58315d57fa5b_907x592.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/s?k=RTX+PRO+6000+Blackwell&amp;tag=popularai-20&quot;,&quot;text&quot;:&quot;Find RTX PRO 6000 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/s?k=RTX+PRO+6000+Blackwell&amp;tag=popularai-20"><span>Find RTX PRO 6000 deals on Amazon</span></a></p><p>The 8x RTX 3090 build makes sense only when the build itself is part of the project, you have the power and space, and you are deliberately choosing used consumer GPUs over enterprise hardware.</p><h3>Platform choice: Threadripper, EPYC, or Xeon?</h3><p>For a 4x or 8x RTX 3090 server, the CPU matters less than the platform. 
You are buying PCIe lanes, slot layout, memory capacity, stability, remote management, and a chassis path that will keep the GPUs alive.</p><p>A fast desktop CPU with a gaming motherboard can run local AI well with one GPU, and sometimes two. Four GPUs change the conversation. Eight GPUs end the workstation fantasy unless the board, chassis, risers, power, and cooling were designed for that density.</p><h3>AMD EPYC is the best value for server-style builds</h3><p><a href="https://www.amazon.com/s?k=AMD+EPYC+7003+server+CPU&amp;tag=popularai-20">AMD EPYC</a> is the most practical answer if this machine is a server rather than a desk workstation.</p><p>Used EPYC 7003 platforms are especially attractive because they offer up to 128 PCIe 4.0 lanes per CPU, ECC memory, and server boards at prices that can make sense for home labs. <a href="https://www.supermicro.com/en/products/motherboard/H12SSL-i">Supermicro&#8217;s H12SSL-i</a> supports a single AMD EPYC 7003 or 7002 CPU, up to 2TB registered ECC DDR4, and has five PCIe 4.0 x16 slots plus two PCIe 4.0 x8 slots. A <a href="https://www.amazon.com/s?k=Supermicro+H12SSL-i&amp;tag=popularai-20">Supermicro H12SSL-i</a> is not automatically a perfect 4x RTX 3090 board because physical GPU thickness still matters, but it shows why EPYC is attractive. 
The platform has the lanes and server features.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/s?k=AMD+EPYC+7003+server+CPU&amp;tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!S-u5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b9b717-574c-44a1-80d6-58bd6c0b458e_725x698.jpeg 424w, https://substackcdn.com/image/fetch/$s_!S-u5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b9b717-574c-44a1-80d6-58bd6c0b458e_725x698.jpeg 848w, https://substackcdn.com/image/fetch/$s_!S-u5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b9b717-574c-44a1-80d6-58bd6c0b458e_725x698.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!S-u5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b9b717-574c-44a1-80d6-58bd6c0b458e_725x698.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!S-u5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b9b717-574c-44a1-80d6-58bd6c0b458e_725x698.jpeg" width="299" height="287.8648275862069" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d2b9b717-574c-44a1-80d6-58bd6c0b458e_725x698.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:698,&quot;width&quot;:725,&quot;resizeWidth&quot;:299,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/s?k=AMD+EPYC+7003+server+CPU&amp;tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!S-u5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b9b717-574c-44a1-80d6-58bd6c0b458e_725x698.jpeg 424w, https://substackcdn.com/image/fetch/$s_!S-u5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b9b717-574c-44a1-80d6-58bd6c0b458e_725x698.jpeg 848w, https://substackcdn.com/image/fetch/$s_!S-u5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b9b717-574c-44a1-80d6-58bd6c0b458e_725x698.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!S-u5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2b9b717-574c-44a1-80d6-58bd6c0b458e_725x698.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/s?k=AMD+EPYC+7003+server+CPU&amp;tag=popularai-20&quot;,&quot;text&quot;:&quot;Find AMD EPYC 7003 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/s?k=AMD+EPYC+7003+server+CPU&amp;tag=popularai-20"><span>Find AMD EPYC 7003 deals on Amazon</span></a></p><p>Modern EPYC 9004 and 9005 platforms push the server route further. Supermicro lists <a href="https://www.supermicro.com/en/support/resources/cpu-amd-epyc-9005-9004-7003">EPYC 9005 processors</a> with up to 160 PCIe Gen 5 lanes and 12 DDR5 channels, while EPYC 9004 models list 128 PCIe Gen 5 lanes and 12 DDR5 channels. 
That is the kind of platform to consider when 8 GPUs are a serious plan.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/s?k=AMD+EPYC+9005+server+CPU&amp;tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2qTU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1e9902-91e4-4a71-8a19-fa82b6bb8123_1375x1214.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2qTU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1e9902-91e4-4a71-8a19-fa82b6bb8123_1375x1214.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2qTU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1e9902-91e4-4a71-8a19-fa82b6bb8123_1375x1214.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2qTU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1e9902-91e4-4a71-8a19-fa82b6bb8123_1375x1214.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2qTU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1e9902-91e4-4a71-8a19-fa82b6bb8123_1375x1214.jpeg" width="348" height="307.2523636363636" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ea1e9902-91e4-4a71-8a19-fa82b6bb8123_1375x1214.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1214,&quot;width&quot;:1375,&quot;resizeWidth&quot;:348,&quot;bytes&quot;:155534,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.amazon.com/s?k=AMD+EPYC+9005+server+CPU&amp;tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2qTU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1e9902-91e4-4a71-8a19-fa82b6bb8123_1375x1214.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2qTU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1e9902-91e4-4a71-8a19-fa82b6bb8123_1375x1214.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2qTU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1e9902-91e4-4a71-8a19-fa82b6bb8123_1375x1214.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2qTU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea1e9902-91e4-4a71-8a19-fa82b6bb8123_1375x1214.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/s?k=AMD+EPYC+9005+server+CPU&amp;tag=popularai-20&quot;,&quot;text&quot;:&quot;Find AMD EPYC 9005 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/s?k=AMD+EPYC+9005+server+CPU&amp;tag=popularai-20"><span>Find AMD EPYC 9005 deals on Amazon</span></a></p><p>Use EPYC if you want a headless Linux server, care about PCIe lanes, need <a href="https://www.amazon.com/s?k=ECC+RDIMM+server+memory&amp;tag=popularai-20">ECC RAM</a>, want IPMI, and can tolerate server noise and setup friction.</p><h3>Threadripper Pro is the clean workstation path</h3><p><a href="https://www.amazon.com/s?k=AMD+Threadripper+Pro+7995WX&amp;tag=popularai-20">Threadripper Pro</a> is the better fit if the machine needs to behave like 
a powerful workstation rather than a rack server.</p><p>AMD&#8217;s <a href="https://www.amd.com/en/support/downloads/drivers.html/processors/ryzen-threadripper-pro/amd-ryzen-threadripper-pro-7000-wx-series/amd-ryzen-threadripper-pro-7995wx.html">Threadripper Pro 7995WX page</a> lists PCIe 5.0 support, 148 native PCIe lanes with 128 usable PCIe 5.0 lanes, 8 memory channels, and DDR5 RDIMM support up to 5200 MT/s. AMD&#8217;s <a href="https://www.amd.com/en/products/processors/workstations/ryzen-threadripper.html">workstation platform guidance</a> also distinguishes TRX50 with up to 80 PCIe lanes and 4-channel memory from WRX90 with 8-channel memory and enterprise-class expandability.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/s?k=AMD+Threadripper+Pro+7995WX&amp;tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C5l1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56605708-e4e0-4263-9026-7bc37ae162d0_1500x1422.jpeg 424w, https://substackcdn.com/image/fetch/$s_!C5l1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56605708-e4e0-4263-9026-7bc37ae162d0_1500x1422.jpeg 848w, https://substackcdn.com/image/fetch/$s_!C5l1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56605708-e4e0-4263-9026-7bc37ae162d0_1500x1422.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!C5l1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56605708-e4e0-4263-9026-7bc37ae162d0_1500x1422.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!C5l1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56605708-e4e0-4263-9026-7bc37ae162d0_1500x1422.jpeg" width="351" height="332.67857142857144" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/56605708-e4e0-4263-9026-7bc37ae162d0_1500x1422.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1380,&quot;width&quot;:1456,&quot;resizeWidth&quot;:351,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/s?k=AMD+Threadripper+Pro+7995WX&amp;tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C5l1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56605708-e4e0-4263-9026-7bc37ae162d0_1500x1422.jpeg 424w, https://substackcdn.com/image/fetch/$s_!C5l1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56605708-e4e0-4263-9026-7bc37ae162d0_1500x1422.jpeg 848w, https://substackcdn.com/image/fetch/$s_!C5l1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56605708-e4e0-4263-9026-7bc37ae162d0_1500x1422.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!C5l1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56605708-e4e0-4263-9026-7bc37ae162d0_1500x1422.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft 
pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/s?k=AMD+Threadripper+Pro+7995WX&amp;tag=popularai-20&quot;,&quot;text&quot;:&quot;Find Threadripper Pro 7995WX on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/s?k=AMD+Threadripper+Pro+7995WX&amp;tag=popularai-20"><span>Find Threadripper Pro 7995WX on Amazon</span></a></p><p>For a 4x RTX 3090 workstation, <a href="https://www.amazon.com/s?k=WRX90+Threadripper+Pro+motherboard&amp;tag=popularai-20">WRX90</a> is the cleaner Threadripper Pro choice. 
<strong>TRX50</strong> can be excellent for one or two GPUs, but four or more GPUs push you into lane and slot-layout compromises.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/s?k=WRX90+Threadripper+Pro+motherboard&amp;tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8BYM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a21723-c62b-4bef-b70a-196f95f480af_1413x1500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!8BYM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a21723-c62b-4bef-b70a-196f95f480af_1413x1500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!8BYM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a21723-c62b-4bef-b70a-196f95f480af_1413x1500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!8BYM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a21723-c62b-4bef-b70a-196f95f480af_1413x1500.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8BYM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a21723-c62b-4bef-b70a-196f95f480af_1413x1500.jpeg" width="399" height="423.5668789808917" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/93a21723-c62b-4bef-b70a-196f95f480af_1413x1500.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1500,&quot;width&quot;:1413,&quot;resizeWidth&quot;:399,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/s?k=WRX90+Threadripper+Pro+motherboard&amp;tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8BYM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a21723-c62b-4bef-b70a-196f95f480af_1413x1500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!8BYM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a21723-c62b-4bef-b70a-196f95f480af_1413x1500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!8BYM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a21723-c62b-4bef-b70a-196f95f480af_1413x1500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!8BYM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a21723-c62b-4bef-b70a-196f95f480af_1413x1500.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/s?k=WRX90+Threadripper+Pro+motherboard&amp;tag=popularai-20&quot;,&quot;text&quot;:&quot;Find WRX90 motherboard deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/s?k=WRX90+Threadripper+Pro+motherboard&amp;tag=popularai-20"><span>Find WRX90 motherboard deals on Amazon</span></a></p><p>Use Threadripper Pro if you want a high-end workstation, strong single-thread performance, modern platform support, and less server friction than EPYC.</p><h3>Intel Xeon W can work, but compare it hard</h3><p><a href="https://www.amazon.com/s?k=Intel+Xeon+W+workstation&amp;tag=popularai-20">Xeon W</a> is a legitimate workstation platform, especially if you are buying a prebuilt system or working inside a vendor-certified environment. 
Xeon W-3400 and W-2400 systems can offer workstation-class memory and PCIe expansion.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/dp/B0BYSVKFCR&amp;tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JC8E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8627e15-1868-471d-be29-fddcba14bb7e_1059x634.jpeg 424w, https://substackcdn.com/image/fetch/$s_!JC8E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8627e15-1868-471d-be29-fddcba14bb7e_1059x634.jpeg 848w, https://substackcdn.com/image/fetch/$s_!JC8E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8627e15-1868-471d-be29-fddcba14bb7e_1059x634.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!JC8E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8627e15-1868-471d-be29-fddcba14bb7e_1059x634.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JC8E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8627e15-1868-471d-be29-fddcba14bb7e_1059x634.jpeg" width="554" height="331.66761095372993" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a8627e15-1868-471d-be29-fddcba14bb7e_1059x634.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:634,&quot;width&quot;:1059,&quot;resizeWidth&quot;:554,&quot;bytes&quot;:37991,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.amazon.com/dp/B0BYSVKFCR&amp;tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JC8E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8627e15-1868-471d-be29-fddcba14bb7e_1059x634.jpeg 424w, https://substackcdn.com/image/fetch/$s_!JC8E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8627e15-1868-471d-be29-fddcba14bb7e_1059x634.jpeg 848w, https://substackcdn.com/image/fetch/$s_!JC8E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8627e15-1868-471d-be29-fddcba14bb7e_1059x634.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!JC8E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8627e15-1868-471d-be29-fddcba14bb7e_1059x634.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/dp/B0BYSVKFCR&amp;tag=popularai-20&quot;,&quot;text&quot;:&quot;Find Intel Xeon W-3400 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/dp/B0BYSVKFCR&amp;tag=popularai-20"><span>Find Intel Xeon W-3400 deals on Amazon</span></a></p><p>The problem is value and ecosystem. For this specific multi-RTX-3090 build, AMD EPYC and Threadripper Pro usually offer a cleaner path. Xeon W makes more sense if you already have Intel workstation parts, need a certified vendor workstation, or are buying a complete system with known thermals and support.</p><h3>Start with the motherboard and chassis, not the CPU</h3><p>Do not buy the CPU first. 
Start with the GPU layout.</p><p>For 4x RTX 3090, the motherboard and chassis need to answer basic physical questions before the spec sheet matters. Can four cards physically fit? Are the cards open-air, blower, or water-cooled? Do the slots provide useful PCIe bandwidth? Will the cards pull fresh air, or will they recycle each other&#8217;s exhaust? Is there clearance for power connectors? Can the motherboard boot headless? Does it have IPMI or another remote-management path? Does the chassis actually support the motherboard and GPU layout?</p><p>For 8x RTX 3090, a normal tower is the wrong mental model. Use at least a <a href="https://www.amazon.com/s?k=4U+GPU+server+chassis&amp;tag=popularai-20">4U GPU server chassis</a>, riser backplane, or open-frame lab setup with a real airflow plan. If the build has to live near humans, do not build 8x.</p><h4>1. Best overall: SilverStone RM52 5U rackmount chassis</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/SilverStone-Technology-Rackmount-Capability-SST-RM52/dp/B0CJ49KJ7L?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jBq5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58cb6b08-424d-4e13-a860-9e610dc28e82_1500x744.jpeg 424w, https://substackcdn.com/image/fetch/$s_!jBq5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58cb6b08-424d-4e13-a860-9e610dc28e82_1500x744.jpeg 848w, https://substackcdn.com/image/fetch/$s_!jBq5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58cb6b08-424d-4e13-a860-9e610dc28e82_1500x744.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!jBq5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58cb6b08-424d-4e13-a860-9e610dc28e82_1500x744.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jBq5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58cb6b08-424d-4e13-a860-9e610dc28e82_1500x744.jpeg" width="539" height="267.27884615384613" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58cb6b08-424d-4e13-a860-9e610dc28e82_1500x744.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:722,&quot;width&quot;:1456,&quot;resizeWidth&quot;:539,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/SilverStone-Technology-Rackmount-Capability-SST-RM52/dp/B0CJ49KJ7L?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jBq5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58cb6b08-424d-4e13-a860-9e610dc28e82_1500x744.jpeg 424w, https://substackcdn.com/image/fetch/$s_!jBq5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58cb6b08-424d-4e13-a860-9e610dc28e82_1500x744.jpeg 848w, https://substackcdn.com/image/fetch/$s_!jBq5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58cb6b08-424d-4e13-a860-9e610dc28e82_1500x744.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!jBq5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58cb6b08-424d-4e13-a860-9e610dc28e82_1500x744.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/SilverStone-Technology-Rackmount-Capability-SST-RM52/dp/B0CJ49KJ7L?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find SilverStone RM52 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" 
href="https://www.amazon.com/SilverStone-Technology-Rackmount-Capability-SST-RM52/dp/B0CJ49KJ7L?tag=popularai-20"><span>Find SilverStone RM52 deals on Amazon</span></a></p><p>Look for this option first. It is a 5U rackmount case, which gives your server more breathing room than a cramped 4U chassis. SilverStone lists <a href="https://www.silverstonetek.com/en/product/info/server-nas/rm52/">support for SSI-EEB motherboards, dual 360mm radiators, and 8 PCI expansion slots</a>. That makes it one of the cleaner options for a serious 4x RTX 3090 build, especially if you are considering water cooling or a workstation-style rack build.</p><h4>2. Best high-end alternative: SilverStone RM53-502 5U rackmount chassis</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/Silverstone-Technology-RM53-502-Rackmount-SST-RM53-502/dp/B0FXBM7685?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9TGq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a214bd0-a476-494b-83fe-c7b1623fd46d_1500x1315.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9TGq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a214bd0-a476-494b-83fe-c7b1623fd46d_1500x1315.jpeg 848w, https://substackcdn.com/image/fetch/$s_!9TGq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a214bd0-a476-494b-83fe-c7b1623fd46d_1500x1315.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9TGq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a214bd0-a476-494b-83fe-c7b1623fd46d_1500x1315.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!9TGq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a214bd0-a476-494b-83fe-c7b1623fd46d_1500x1315.jpeg" width="489" height="428.5467032967033" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8a214bd0-a476-494b-83fe-c7b1623fd46d_1500x1315.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1276,&quot;width&quot;:1456,&quot;resizeWidth&quot;:489,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/Silverstone-Technology-RM53-502-Rackmount-SST-RM53-502/dp/B0FXBM7685?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9TGq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a214bd0-a476-494b-83fe-c7b1623fd46d_1500x1315.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9TGq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a214bd0-a476-494b-83fe-c7b1623fd46d_1500x1315.jpeg 848w, https://substackcdn.com/image/fetch/$s_!9TGq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a214bd0-a476-494b-83fe-c7b1623fd46d_1500x1315.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9TGq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a214bd0-a476-494b-83fe-c7b1623fd46d_1500x1315.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/Silverstone-Technology-RM53-502-Rackmount-SST-RM53-502/dp/B0FXBM7685?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find SilverStone RM53-502 deals (Amazon)&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/Silverstone-Technology-RM53-502-Rackmount-SST-RM53-502/dp/B0FXBM7685?tag=popularai-20"><span>Find SilverStone RM53-502 deals (Amazon)</span></a></p><p>This is the better &#8220;more serious build&#8221; alternative if you want a newer 5U layout 
with dual PSU support. SilverStone&#8217;s product material <a href="https://www.avadirect.com/RM53-502-5U-Rackmount-Chassis-360mm-AIO-Support-2x-5-25-1x-3-5-5x-2-5-Black/Product/19368468?srsltid=AfmBOooG-d2b1Fp91cJ_JHmIa6ok1XZHdQPaOlJQMY_mmKg08vH3BE6a">lists SSI-EEB support, 8 PCI expansion slots, 360mm radiator support, dual PSU support, and additional cooling support for graphics cards</a>.</p><h4>3. Best cheaper 4U option: SilverStone RM44 4U rackmount chassis</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/SilverStone-Technology-Rackmount-Capability-SST-RM44/dp/B0BFC8FZ8B?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HFCo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aad21c2-e396-4780-816a-e010299ba623_1294x1500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!HFCo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aad21c2-e396-4780-816a-e010299ba623_1294x1500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!HFCo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aad21c2-e396-4780-816a-e010299ba623_1294x1500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!HFCo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aad21c2-e396-4780-816a-e010299ba623_1294x1500.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HFCo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aad21c2-e396-4780-816a-e010299ba623_1294x1500.jpeg" width="432" 
height="500.7727975270479" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3aad21c2-e396-4780-816a-e010299ba623_1294x1500.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1500,&quot;width&quot;:1294,&quot;resizeWidth&quot;:432,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/SilverStone-Technology-Rackmount-Capability-SST-RM44/dp/B0BFC8FZ8B?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HFCo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aad21c2-e396-4780-816a-e010299ba623_1294x1500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!HFCo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aad21c2-e396-4780-816a-e010299ba623_1294x1500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!HFCo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aad21c2-e396-4780-816a-e010299ba623_1294x1500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!HFCo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3aad21c2-e396-4780-816a-e010299ba623_1294x1500.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/SilverStone-Technology-Rackmount-Capability-SST-RM44/dp/B0BFC8FZ8B?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find SilverStone RM44 4U deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/SilverStone-Technology-Rackmount-Capability-SST-RM44/dp/B0BFC8FZ8B?tag=popularai-20"><span>Find SilverStone RM44 4U deals on Amazon</span></a></p><p>This is the cleaner budget pick for a 4x RTX 3090 build because it has 8 PCI expansion slots, SSI-EEB support, and 360mm radiator support. 
SilverStone&#8217;s own RM44 material says <a href="https://www.silverstonetek.com/it/press-release/RM44">it has 8 PCI expansion slots and can support up to four dual-slot graphics cards</a>, which is exactly the kind of caveat you need for a build like this. It is not a great fit for four thick open-air RTX 3090s, but it can still make sense with dual-slot, blower, or water-cooled cards.</p><p>Risers deserve extra caution. Cheap <a href="https://www.amazon.com/s?k=PCIe+4.0+riser+cable&amp;tag=popularai-20">PCIe risers</a> can turn a good build into a random failure generator. For local AI, a flaky GPU link is worse than a slightly slower one because long inference or training jobs can fail after hours of work. Buy known-good risers, keep cable runs short, and test each card under load before trusting the server.<br><br>For a normal open-frame or workstation-style GPU relocation, use a known-brand PCIe 4.0 x16 riser rather than a cheap mining riser. The <a href="https://www.amazon.com/Thermaltake-Premium-Flexible-Extender-AC-058-CO1OTN-C1/dp/B096T9JCNG?tag=popularai-20">Thermaltake TT Premium PCI-E 4.0 300mm riser cable</a> is the safer bet because it is a specific PCIe 4.0 x16 extender, not a generic riser search page.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/Thermaltake-Premium-Flexible-Extender-AC-058-CO1OTN-C1/dp/B096T9JCNG?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vI71!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f486ce-145b-4e7b-a74d-31268bfcc08b_1500x1206.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!vI71!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f486ce-145b-4e7b-a74d-31268bfcc08b_1500x1206.jpeg 848w, https://substackcdn.com/image/fetch/$s_!vI71!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f486ce-145b-4e7b-a74d-31268bfcc08b_1500x1206.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!vI71!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f486ce-145b-4e7b-a74d-31268bfcc08b_1500x1206.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vI71!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f486ce-145b-4e7b-a74d-31268bfcc08b_1500x1206.jpeg" width="430" height="345.83104395604397" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/66f486ce-145b-4e7b-a74d-31268bfcc08b_1500x1206.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1171,&quot;width&quot;:1456,&quot;resizeWidth&quot;:430,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/Thermaltake-Premium-Flexible-Extender-AC-058-CO1OTN-C1/dp/B096T9JCNG?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vI71!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f486ce-145b-4e7b-a74d-31268bfcc08b_1500x1206.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!vI71!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f486ce-145b-4e7b-a74d-31268bfcc08b_1500x1206.jpeg 848w, https://substackcdn.com/image/fetch/$s_!vI71!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f486ce-145b-4e7b-a74d-31268bfcc08b_1500x1206.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!vI71!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f486ce-145b-4e7b-a74d-31268bfcc08b_1500x1206.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/Thermaltake-Premium-Flexible-Extender-AC-058-CO1OTN-C1/dp/B096T9JCNG?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find PCIe riser deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/Thermaltake-Premium-Flexible-Extender-AC-058-CO1OTN-C1/dp/B096T9JCNG?tag=popularai-20"><span>Find PCIe riser deals on Amazon</span></a></p><h3>Cooling is the part people underestimate</h3><p>RTX 3090 cards were not designed for quiet, dense 8-GPU inference boxes.</p><p>Many consumer RTX 3090s are thick open-air cards. The <a href="https://www.gigabyte.com/dk/Graphics-Card/GV-N3090GAMING-OC-24GD/sp">Gigabyte RTX 3090 Gaming OC</a> is listed at 320 x 129 x 55mm. The <a href="https://storage-asset.msi.com/datasheet/original/vga/global/GeForce-RTX-3090-GAMING-X-TRIO-24G.pdf">MSI RTX 3090 Gaming X Trio</a> is listed at 323 x 140 x 56mm with 370W board power. That is roughly a triple-slot problem before airflow is considered.</p><p>For 4x, the practical approaches are two-slot blower 3090s in a server chassis, water-cooled 3090s with enough radiator capacity, an open-frame layout with aggressive directed airflow, or fewer GPUs per node. A <a href="https://www.amazon.com/s?k=blower+RTX+3090+24GB&amp;tag=popularai-20">blower RTX 3090</a> can make more sense than a nicer-looking open-air card in dense layouts because it moves heat in a more predictable direction. A <a href="https://www.amazon.com/s?k=water+cooled+RTX+3090&amp;tag=popularai-20">water-cooled RTX 3090</a> can also work, but only if the radiator space and pump layout are planned before buying parts.</p><p>For 8x, random triple-slot open-air cards are a maintenance trap. 
They can work in an open-frame lab rig if you accept noise, dust, cable clutter, and hands-on maintenance. They are the wrong choice for a tidy office workstation.</p><h3>NVLink is useful in narrow cases</h3><p>RTX 3090 is one of the rare consumer GeForce cards with NVLink support. Some board-partner specs list 2-way NVLink support, including the <a href="https://www.gigabyte.com/dk/Graphics-Card/GV-N3090GAMING-OC-24GD/sp">Gigabyte RTX 3090 Gaming OC</a>.</p><p>That does not make 4x or 8x RTX 3090 behave like one big GPU. <a href="https://www.amazon.com/s?k=RTX+3090+NVLink+bridge&amp;tag=popularai-20">NVLink bridges</a> can help certain workloads and card pairs, but modern local LLM workflows more often rely on tensor parallelism, layer splitting, or multiple workers. vLLM and <code>llama.cpp</code> can use multiple GPUs without NVLink, but performance depends on the workload and communication pattern.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.amazon.com/dp/B08S1RYPP6?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DxOq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f409a33-2501-4a4c-9370-ddc12ca08098_1272x718.png 424w, https://substackcdn.com/image/fetch/$s_!DxOq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f409a33-2501-4a4c-9370-ddc12ca08098_1272x718.png 848w, https://substackcdn.com/image/fetch/$s_!DxOq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f409a33-2501-4a4c-9370-ddc12ca08098_1272x718.png 1272w, 
https://substackcdn.com/image/fetch/$s_!DxOq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f409a33-2501-4a4c-9370-ddc12ca08098_1272x718.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DxOq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f409a33-2501-4a4c-9370-ddc12ca08098_1272x718.png" width="399" height="225.22169811320754" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f409a33-2501-4a4c-9370-ddc12ca08098_1272x718.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:718,&quot;width&quot;:1272,&quot;resizeWidth&quot;:399,&quot;bytes&quot;:301856,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.amazon.com/dp/B08S1RYPP6?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/201898156?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f409a33-2501-4a4c-9370-ddc12ca08098_1272x718.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DxOq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f409a33-2501-4a4c-9370-ddc12ca08098_1272x718.png 424w, https://substackcdn.com/image/fetch/$s_!DxOq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f409a33-2501-4a4c-9370-ddc12ca08098_1272x718.png 848w, 
https://substackcdn.com/image/fetch/$s_!DxOq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f409a33-2501-4a4c-9370-ddc12ca08098_1272x718.png 1272w, https://substackcdn.com/image/fetch/$s_!DxOq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f409a33-2501-4a4c-9370-ddc12ca08098_1272x718.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/dp/B08S1RYPP6?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find RTX NVLink bridge deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/dp/B08S1RYPP6?tag=popularai-20"><span>Find RTX NVLink bridge deals on Amazon</span></a></p><p>Treat NVLink as a bonus for specific two-card cases. Do not make it the foundation of the build.</p><h3>RAM and storage recommendations</h3><p>For a 4x RTX 3090 local AI server, start with <a href="https://www.amazon.com/s?k=256GB+ECC+RDIMM+DDR4&amp;tag=popularai-20">256GB ECC RAM</a> if the budget allows. 
You can run with less, but a serious local AI server tends to collect model files, containers, vector databases, datasets, temporary outputs, and CPU-side services over time.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/NEMIX-RAM-Registered-Compatible-Motherboard/dp/B0DC8HCNJQ?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8yOf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6fd39f6-991d-40fc-bb79-da63e0a5f59e_1500x1288.jpeg 424w, https://substackcdn.com/image/fetch/$s_!8yOf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6fd39f6-991d-40fc-bb79-da63e0a5f59e_1500x1288.jpeg 848w, https://substackcdn.com/image/fetch/$s_!8yOf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6fd39f6-991d-40fc-bb79-da63e0a5f59e_1500x1288.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!8yOf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6fd39f6-991d-40fc-bb79-da63e0a5f59e_1500x1288.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8yOf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6fd39f6-991d-40fc-bb79-da63e0a5f59e_1500x1288.jpeg" width="424" height="364.010989010989" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b6fd39f6-991d-40fc-bb79-da63e0a5f59e_1500x1288.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1250,&quot;width&quot;:1456,&quot;resizeWidth&quot;:424,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/NEMIX-RAM-Registered-Compatible-Motherboard/dp/B0DC8HCNJQ?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8yOf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6fd39f6-991d-40fc-bb79-da63e0a5f59e_1500x1288.jpeg 424w, https://substackcdn.com/image/fetch/$s_!8yOf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6fd39f6-991d-40fc-bb79-da63e0a5f59e_1500x1288.jpeg 848w, https://substackcdn.com/image/fetch/$s_!8yOf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6fd39f6-991d-40fc-bb79-da63e0a5f59e_1500x1288.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!8yOf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6fd39f6-991d-40fc-bb79-da63e0a5f59e_1500x1288.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/NEMIX-RAM-Registered-Compatible-Motherboard/dp/B0DC8HCNJQ?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find 256GB ECC RAM deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/NEMIX-RAM-Registered-Compatible-Motherboard/dp/B0DC8HCNJQ?tag=popularai-20"><span>Find 256GB ECC RAM deals on Amazon</span></a></p><p>For 8x, <a href="https://www.amazon.com/s?k=512GB+ECC+RDIMM+server+memory&amp;tag=popularai-20">512GB ECC RAM</a> should be the practical starting point, with 1TB or more making sense for heavier experiments.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://www.amazon.com/Registered-NEMIX-RAM-Compatible-Motherboard/dp/B0GSRSN46R?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZH5_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11382e6d-652d-4b88-a2c4-6e3718d791c5_1500x1288.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ZH5_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11382e6d-652d-4b88-a2c4-6e3718d791c5_1500x1288.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ZH5_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11382e6d-652d-4b88-a2c4-6e3718d791c5_1500x1288.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ZH5_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11382e6d-652d-4b88-a2c4-6e3718d791c5_1500x1288.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZH5_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11382e6d-652d-4b88-a2c4-6e3718d791c5_1500x1288.jpeg" width="463" height="397.49313186813185" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/11382e6d-652d-4b88-a2c4-6e3718d791c5_1500x1288.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1250,&quot;width&quot;:1456,&quot;resizeWidth&quot;:463,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/Registered-NEMIX-RAM-Compatible-Motherboard/dp/B0GSRSN46R?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZH5_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11382e6d-652d-4b88-a2c4-6e3718d791c5_1500x1288.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ZH5_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11382e6d-652d-4b88-a2c4-6e3718d791c5_1500x1288.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ZH5_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11382e6d-652d-4b88-a2c4-6e3718d791c5_1500x1288.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ZH5_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11382e6d-652d-4b88-a2c4-6e3718d791c5_1500x1288.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/Registered-NEMIX-RAM-Compatible-Motherboard/dp/B0GSRSN46R?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find 512GB ECC RAM deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/Registered-NEMIX-RAM-Compatible-Motherboard/dp/B0GSRSN46R?tag=popularai-20"><span>Find 512GB ECC RAM deals on Amazon</span></a></p><p>System RAM does not replace VRAM. It helps with loading, serving, caching, and preprocessing, and it keeps the rest of the system from choking. Once a model spills from GPU memory into system RAM, performance can collapse. 
Popular AI&#8217;s guide to <a href="https://www.popularai.org/p/why-ollama-and-llama-cpp-crawl-when-models-spill-into-ram-and-how-to-fix-it">why Ollama and llama.cpp crawl when models spill into RAM</a> explains that failure mode in more detail.</p><p>Storage should be boring and generous. A 2TB NVMe drive is the minimum for a 4x experimentation box, while a <a href="https://www.amazon.com/s?k=4TB+NVMe+SSD&amp;tag=popularai-20">4TB NVMe SSD</a> is much more comfortable. Go to <a href="https://www.amazon.com/s?k=8TB+NVMe+SSD&amp;tag=popularai-20">8TB NVMe storage</a> or more if you store many models, datasets, generated images, video outputs, and checkpoints. Add separate backup storage if the server holds work you cannot recreate.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.amazon.com/SAMSUNG-Computing-Workstations-MZ-V9P4T0B-AM/dp/B0CHGT1KFJ?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sYiC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80bc80a9-56c6-42c2-87b3-a607f3dbfdcc_1500x417.jpeg 424w, https://substackcdn.com/image/fetch/$s_!sYiC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80bc80a9-56c6-42c2-87b3-a607f3dbfdcc_1500x417.jpeg 848w, https://substackcdn.com/image/fetch/$s_!sYiC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80bc80a9-56c6-42c2-87b3-a607f3dbfdcc_1500x417.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!sYiC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80bc80a9-56c6-42c2-87b3-a607f3dbfdcc_1500x417.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!sYiC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80bc80a9-56c6-42c2-87b3-a607f3dbfdcc_1500x417.jpeg" width="513" height="142.69574175824175" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/80bc80a9-56c6-42c2-87b3-a607f3dbfdcc_1500x417.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:405,&quot;width&quot;:1456,&quot;resizeWidth&quot;:513,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/SAMSUNG-Computing-Workstations-MZ-V9P4T0B-AM/dp/B0CHGT1KFJ?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sYiC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80bc80a9-56c6-42c2-87b3-a607f3dbfdcc_1500x417.jpeg 424w, https://substackcdn.com/image/fetch/$s_!sYiC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80bc80a9-56c6-42c2-87b3-a607f3dbfdcc_1500x417.jpeg 848w, https://substackcdn.com/image/fetch/$s_!sYiC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80bc80a9-56c6-42c2-87b3-a607f3dbfdcc_1500x417.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!sYiC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80bc80a9-56c6-42c2-87b3-a607f3dbfdcc_1500x417.jpeg 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/SAMSUNG-Computing-Workstations-MZ-V9P4T0B-AM/dp/B0CHGT1KFJ?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find 4TB NVMe SSD deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/SAMSUNG-Computing-Workstations-MZ-V9P4T0B-AM/dp/B0CHGT1KFJ?tag=popularai-20"><span>Find 4TB NVMe SSD deals on Amazon</span></a></p><h3>Power and noise are build-defining constraints</h3><p>For 4x RTX 3090, assume the GPUs alone can demand around 1,400W at stock (four cards at the 3090's 350W reference board power). For 8x, assume around 2,800W for the GPUs alone.</p><p>That does not mean the machine will constantly sit at maximum draw. It means the build must be safe when it does.</p><p>Power-limit the 3090s. For inference-focused use, 250W to 300W per card can be a good target, which brings a 4x build down to roughly 1,000W to 1,200W of GPU draw. Use high-quality <a href="https://www.amazon.com/s?k=1600W+power+supply+80+plus+platinum&amp;tag=popularai-20">1600W power supplies</a>, <a href="https://www.amazon.com/s?k=server+power+supply+GPU+mining+2400W&amp;tag=popularai-20">server power supplies</a>, or a carefully planned multi-PSU arrangement with enough headroom. Avoid cheap adapters. Plan cable routing before buying parts. Measure wall power after the build is running.</p><p>For a more conventional 1600W ATX build, the safest off-the-shelf option is the <a href="https://www.amazon.com/quiet-Titanium-Regulation-Individually-BN501/dp/B0C6FY4JXF?tag=popularai-20">be quiet! 
Dark Power Pro 13 1600W</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/quiet-Titanium-Regulation-Individually-BN501/dp/B0C6FY4JXF?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YfpN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14235076-c4c0-45b9-8c53-fa30038561a0_836x747.jpeg 424w, https://substackcdn.com/image/fetch/$s_!YfpN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14235076-c4c0-45b9-8c53-fa30038561a0_836x747.jpeg 848w, https://substackcdn.com/image/fetch/$s_!YfpN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14235076-c4c0-45b9-8c53-fa30038561a0_836x747.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!YfpN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14235076-c4c0-45b9-8c53-fa30038561a0_836x747.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YfpN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14235076-c4c0-45b9-8c53-fa30038561a0_836x747.jpeg" width="460" height="411.0287081339713" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/14235076-c4c0-45b9-8c53-fa30038561a0_836x747.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:747,&quot;width&quot;:836,&quot;resizeWidth&quot;:460,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/quiet-Titanium-Regulation-Individually-BN501/dp/B0C6FY4JXF?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YfpN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14235076-c4c0-45b9-8c53-fa30038561a0_836x747.jpeg 424w, https://substackcdn.com/image/fetch/$s_!YfpN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14235076-c4c0-45b9-8c53-fa30038561a0_836x747.jpeg 848w, https://substackcdn.com/image/fetch/$s_!YfpN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14235076-c4c0-45b9-8c53-fa30038561a0_836x747.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!YfpN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14235076-c4c0-45b9-8c53-fa30038561a0_836x747.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/quiet-Titanium-Regulation-Individually-BN501/dp/B0C6FY4JXF?tag=popularai-20&quot;,&quot;text&quot;:&quot;See Dark Power Pro 13 PSU deals (Amazon)&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/quiet-Titanium-Regulation-Individually-BN501/dp/B0C6FY4JXF?tag=popularai-20"><span>See Dark Power Pro 13 PSU deals (Amazon)</span></a></p><p>For open-frame lab builds or server chassis work, a used <a href="https://www.amazon.com/s?k=HPE+P38997-B21+1600W+Flex+Slot+Platinum+Power+Supply&amp;tag=popularai-20">HPE 1600W Flex Slot PSU</a> can be cheap power, but it is <strong>not</strong> a normal desktop PSU and generally requires 200 to 240V input plus the right breakout hardware.</p><div class="captioned-image-container"><figure><a class="image-link image2 
is-viewable-img" target="_blank" href="https://www.amazon.com/s?k=HPE+P38997-B21+1600W+Flex+Slot+Platinum+Power+Supply&amp;tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!McYh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083a43a2-d04f-4fd6-abcb-781f6aefae03_673x460.jpeg 424w, https://substackcdn.com/image/fetch/$s_!McYh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083a43a2-d04f-4fd6-abcb-781f6aefae03_673x460.jpeg 848w, https://substackcdn.com/image/fetch/$s_!McYh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083a43a2-d04f-4fd6-abcb-781f6aefae03_673x460.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!McYh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083a43a2-d04f-4fd6-abcb-781f6aefae03_673x460.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!McYh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083a43a2-d04f-4fd6-abcb-781f6aefae03_673x460.jpeg" width="515" height="352.0059435364042" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/083a43a2-d04f-4fd6-abcb-781f6aefae03_673x460.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:460,&quot;width&quot;:673,&quot;resizeWidth&quot;:515,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/s?k=HPE+P38997-B21+1600W+Flex+Slot+Platinum+Power+Supply&amp;tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!McYh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083a43a2-d04f-4fd6-abcb-781f6aefae03_673x460.jpeg 424w, https://substackcdn.com/image/fetch/$s_!McYh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083a43a2-d04f-4fd6-abcb-781f6aefae03_673x460.jpeg 848w, https://substackcdn.com/image/fetch/$s_!McYh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083a43a2-d04f-4fd6-abcb-781f6aefae03_673x460.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!McYh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083a43a2-d04f-4fd6-abcb-781f6aefae03_673x460.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/s?k=HPE+P38997-B21+1600W+Flex+Slot+Platinum+Power+Supply&amp;tag=popularai-20&quot;,&quot;text&quot;:&quot;Find HPE 1600W Flex Slot deals (Amazon)&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/s?k=HPE+P38997-B21+1600W+Flex+Slot+Platinum+Power+Supply&amp;tag=popularai-20"><span>Find HPE 1600W Flex Slot deals (Amazon)</span></a></p><p>Power limiting is one reason RTX 3090 servers still survive in local AI communities. You give up some peak speed, but you reduce heat, noise, and electrical stress. A stable, slightly slower server is more useful than a fast one that throttles, crashes, or trips a breaker.</p><p>Noise matters too. A real GPU server chassis can sound like network closet hardware. 
That may be fine in a basement, garage, lab, or rack room. It is miserable next to a desk. If silence is a requirement, build a smaller dual-GPU workstation or use fewer higher-VRAM cards.</p><h3>Best 4x RTX 3090 build direction</h3><p>A sensible 4x build starts with either a used EPYC 7003 server board or a Threadripper Pro WRX90 workstation platform. The goal is 96GB aggregate VRAM without turning the system into a fragile science project.</p><p>For GPUs, choose four tested <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090 24GB cards</a> that physically fit the chassis. Blower cards, water-cooled cards, or models with known spacing compatibility are better than random thick gaming cards. Test each card alone before installing all four.</p><p>For memory, target <a href="https://www.amazon.com/s?k=256GB+ECC+RAM+server&amp;tag=popularai-20">256GB ECC RAM</a>, with 512GB if you run many services, larger datasets, or CPU-side workloads. For storage, use a <a href="https://www.amazon.com/s?k=4TB+NVMe+SSD&amp;tag=popularai-20">4TB NVMe SSD</a> for models and active work, plus backup storage.</p><p>For software, use Linux. <code>llama.cpp</code> is useful for flexible local inference. vLLM is strong for serving and tensor parallelism. TabbyAPI or ExLlamaV2 can be excellent for fast quantized model serving where appropriate. ComfyUI remains the image workflow hub for many local AI users.</p><p>For chassis, use a <a href="https://www.amazon.com/s?k=4U+server+chassis+GPU&amp;tag=popularai-20">4U server chassis</a>, open-frame lab rig, or workstation chassis designed around the exact GPU layout. This is a local AI appliance, not a casual desktop.</p><h3>Best 8x RTX 3090 build direction</h3><p>A sensible 8x build starts by questioning itself.</p><p>If you still want it, the realistic direction is EPYC 9004, EPYC 9005, or a proven multi-GPU server platform. 
The chassis should be a purpose-built <a href="https://www.amazon.com/s?k=4U+8+GPU+server+chassis&amp;tag=popularai-20">4U GPU server chassis</a> or an equivalent open-frame lab setup. The GPUs should be eight two-slot blower or water-cooled cards. Avoid random triple-slot open-air cards unless the layout is built around them.</p><p>For memory, <a href="https://www.amazon.com/s?k=512GB+ECC+server+memory&amp;tag=popularai-20">512GB ECC RAM</a> is the minimum target, with 1TB or more if the workloads justify it. For power, plan on server-grade delivery, likely 240V, with measured draw and thermal monitoring. For networking, use at least a <a href="https://www.amazon.com/s?k=10GbE+network+card&amp;tag=popularai-20">10GbE network card</a> if the box serves other machines or stores datasets on a NAS.</p><p>Most readers who just need a simple 10GbE RJ45 upgrade should start with the <a href="https://www.amazon.com/TP-Link-TX401-Ethernet-Supports-Including/dp/B08D71PVXG?tag=popularai-20">TP-Link TX401</a>. 
It is cheap, widely available, and avoids the fake-server-card lottery.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RP0G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597e20e6-ac47-4290-911d-7aee8e99e101_929x641.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RP0G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597e20e6-ac47-4290-911d-7aee8e99e101_929x641.jpeg 424w, https://substackcdn.com/image/fetch/$s_!RP0G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597e20e6-ac47-4290-911d-7aee8e99e101_929x641.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RP0G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597e20e6-ac47-4290-911d-7aee8e99e101_929x641.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!RP0G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597e20e6-ac47-4290-911d-7aee8e99e101_929x641.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RP0G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597e20e6-ac47-4290-911d-7aee8e99e101_929x641.jpeg" width="494" height="340.8546824542519" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/597e20e6-ac47-4290-911d-7aee8e99e101_929x641.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:641,&quot;width&quot;:929,&quot;resizeWidth&quot;:494,&quot;bytes&quot;:68837,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RP0G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597e20e6-ac47-4290-911d-7aee8e99e101_929x641.jpeg 424w, https://substackcdn.com/image/fetch/$s_!RP0G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597e20e6-ac47-4290-911d-7aee8e99e101_929x641.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RP0G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597e20e6-ac47-4290-911d-7aee8e99e101_929x641.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!RP0G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597e20e6-ac47-4290-911d-7aee8e99e101_929x641.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/TP-Link-TX401-Ethernet-Supports-Including/dp/B08D71PVXG?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find TP-Link TX401 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/TP-Link-TX401-Ethernet-Supports-Including/dp/B08D71PVXG?tag=popularai-20"><span>Find TP-Link TX401 deals on Amazon</span></a></p><p>Linux is the only sane operating system choice here. 
Use containers, monitoring, reproducible model-serving configs, and remote management from day one.</p><p>For a Linux AI server, a single-port Intel X550-based card such as this <a href="https://www.amazon.com/10Gtek-X550-T1-Converged-Network-Compatible/dp/B076P9PPWN?tag=popularai-20">10Gtek X550-T1-style adapter</a> is the cleaner pick than a bargain-bin no-name NIC.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/10Gtek-X550-T1-Converged-Network-Compatible/dp/B076P9PPWN?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_dWW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a542f1-1a77-426d-a168-f9d35e326eb9_1500x931.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_dWW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a542f1-1a77-426d-a168-f9d35e326eb9_1500x931.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_dWW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a542f1-1a77-426d-a168-f9d35e326eb9_1500x931.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_dWW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a542f1-1a77-426d-a168-f9d35e326eb9_1500x931.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_dWW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a542f1-1a77-426d-a168-f9d35e326eb9_1500x931.jpeg" width="502" height="311.57466666666664" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75a542f1-1a77-426d-a168-f9d35e326eb9_1500x931.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:931,&quot;width&quot;:1500,&quot;resizeWidth&quot;:502,&quot;bytes&quot;:174319,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.amazon.com/10Gtek-X550-T1-Converged-Network-Compatible/dp/B076P9PPWN?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_dWW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a542f1-1a77-426d-a168-f9d35e326eb9_1500x931.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_dWW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a542f1-1a77-426d-a168-f9d35e326eb9_1500x931.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_dWW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a542f1-1a77-426d-a168-f9d35e326eb9_1500x931.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_dWW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a542f1-1a77-426d-a168-f9d35e326eb9_1500x931.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/10Gtek-X550-T1-Converged-Network-Compatible/dp/B076P9PPWN?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find Intel X550-AT2 card deals (Amazon)&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/10Gtek-X550-T1-Converged-Network-Compatible/dp/B076P9PPWN?tag=popularai-20"><span>Find Intel X550-AT2 card deals (Amazon)</span></a></p><p>If you need two 10GbE RJ45 ports, use a dual-port Intel X550-based card like the <a href="https://www.amazon.com/10Gtek-X550-T2-Converged-Network-Compatible/dp/B074Z27TXB?tag=popularai-20">10Gtek X550-T2-style adapter</a>, but do not buy dual-port just because it looks more serious.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" 
target="_blank" href="https://www.amazon.com/10Gtek-X550-T2-Converged-Network-Compatible/dp/B074Z27TXB?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PBUU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62868d57-2cab-4900-a6e0-7cd0078be590_1500x910.jpeg 424w, https://substackcdn.com/image/fetch/$s_!PBUU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62868d57-2cab-4900-a6e0-7cd0078be590_1500x910.jpeg 848w, https://substackcdn.com/image/fetch/$s_!PBUU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62868d57-2cab-4900-a6e0-7cd0078be590_1500x910.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!PBUU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62868d57-2cab-4900-a6e0-7cd0078be590_1500x910.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PBUU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62868d57-2cab-4900-a6e0-7cd0078be590_1500x910.jpeg" width="524" height="317.8933333333333" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/62868d57-2cab-4900-a6e0-7cd0078be590_1500x910.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1500,&quot;resizeWidth&quot;:524,&quot;bytes&quot;:144117,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.amazon.com/10Gtek-X550-T2-Converged-Network-Compatible/dp/B074Z27TXB?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PBUU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62868d57-2cab-4900-a6e0-7cd0078be590_1500x910.jpeg 424w, https://substackcdn.com/image/fetch/$s_!PBUU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62868d57-2cab-4900-a6e0-7cd0078be590_1500x910.jpeg 848w, https://substackcdn.com/image/fetch/$s_!PBUU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62868d57-2cab-4900-a6e0-7cd0078be590_1500x910.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!PBUU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62868d57-2cab-4900-a6e0-7cd0078be590_1500x910.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/10Gtek-X550-T2-Converged-Network-Compatible/dp/B074Z27TXB?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find dual-port X550-AT2 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/10Gtek-X550-T2-Converged-Network-Compatible/dp/B074Z27TXB?tag=popularai-20"><span>Find dual-port X550-AT2 deals on Amazon</span></a></p><p>The honest alternative is two 4x nodes. That can be easier to power, cool, move, maintain, and recover when one card or riser fails.</p><h3>How RTX 3090 servers compare with newer GPUs</h3><p>Newer GPUs are faster and more efficient. 
That does not automatically make them better buys for this specific job.</p><p>The <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/40-series/rtx-4090/">RTX 4090</a> still has 24GB of GDDR6X like the RTX 3090, but it offers much higher performance and better efficiency. NVIDIA&#8217;s RTX 4090 page lists 24GB GDDR6X and 16,384 CUDA cores. A <a href="https://www.amazon.com/s?k=GeForce+RTX+4090&amp;tag=popularai-20">GeForce RTX 4090</a> is better if you need speed, but it does not solve the 24GB ceiling.</p><p>The <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/rtx-5090/">RTX 5090</a> moves to 32GB GDDR7, which is a real improvement for single-GPU local AI headroom. NVIDIA lists the RTX 5090 with 32GB of GDDR7 memory. A <a href="https://www.amazon.com/s?k=GeForce+RTX+5090&amp;tag=popularai-20">GeForce RTX 5090</a> is attractive for a high-end single-GPU or dual-GPU box, but it does not create a cheap 96GB VRAM server.</p><p>Professional GPUs change the discussion. A single <a href="https://www.nvidia.com/en-us/products/workstations/professional-desktop-gpus/rtx-pro-6000/">RTX PRO 6000 Blackwell</a> gives 96GB ECC VRAM at 600W. That is the same memory capacity as 4x RTX 3090 in one professional card, with ECC and a cleaner form factor. The price will decide whether it is realistic, but for businesses and professional labs, it may be less painful than maintaining a used 4x RTX 3090 rig.</p><h3>What to buy</h3><blockquote><p>Buy <a href="https://www.amazon.com/s?k=4x+RTX+3090+24GB&amp;tag=popularai-20">4x RTX 3090</a> if you need a serious local AI server, can buy the cards cheaply, and are ready to build the machine like a server. It remains a viable 2026 route to 96GB aggregate VRAM.</p></blockquote><blockquote><p>Buy <a href="https://www.amazon.com/s?k=2x+RTX+3090+24GB&amp;tag=popularai-20">2x RTX 3090</a> if you want the best balance of value, sanity, and local capability. 
This is the stronger recommendation for most power users.</p></blockquote><blockquote><p>Buy <a href="https://www.amazon.com/s?k=GeForce+RTX+5090&amp;tag=popularai-20">1x RTX 5090</a> if you want a cleaner high-end desktop and 32GB VRAM is enough.</p></blockquote><blockquote><p>Buy <a href="https://www.amazon.com/s?k=RTX+PRO+6000+Blackwell&amp;tag=popularai-20">RTX PRO 6000 Blackwell</a> or <a href="https://www.amazon.com/s?k=NVIDIA+enterprise+GPU+used&amp;tag=popularai-20">used enterprise GPUs</a> if uptime, ECC, dense deployment, warranty, and fewer moving parts matter more than bargain hunting.</p></blockquote><blockquote><p>Skip <a href="https://www.amazon.com/s?k=8x+RTX+3090+GPU+server&amp;tag=popularai-20">8x RTX 3090</a> unless you know exactly why you need it and where it will live.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MPUM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f176f00-74bc-4e6b-a5e1-9ddbd5bbd9bd_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MPUM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f176f00-74bc-4e6b-a5e1-9ddbd5bbd9bd_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!MPUM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f176f00-74bc-4e6b-a5e1-9ddbd5bbd9bd_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!MPUM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f176f00-74bc-4e6b-a5e1-9ddbd5bbd9bd_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!MPUM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f176f00-74bc-4e6b-a5e1-9ddbd5bbd9bd_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MPUM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f176f00-74bc-4e6b-a5e1-9ddbd5bbd9bd_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f176f00-74bc-4e6b-a5e1-9ddbd5bbd9bd_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2281124,&quot;alt&quot;:&quot;RTX 3090 local AI server guide: 4 GPUs make sense, 8 rarely do&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/201898156?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f176f00-74bc-4e6b-a5e1-9ddbd5bbd9bd_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="RTX 3090 local AI server guide: 4 GPUs make sense, 8 rarely do" title="RTX 3090 local AI server guide: 4 GPUs make sense, 8 rarely do" srcset="https://substackcdn.com/image/fetch/$s_!MPUM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f176f00-74bc-4e6b-a5e1-9ddbd5bbd9bd_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!MPUM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f176f00-74bc-4e6b-a5e1-9ddbd5bbd9bd_1672x941.png 848w, 
https://substackcdn.com/image/fetch/$s_!MPUM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f176f00-74bc-4e6b-a5e1-9ddbd5bbd9bd_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!MPUM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f176f00-74bc-4e6b-a5e1-9ddbd5bbd9bd_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A 4x RTX 3090 server can still be smart for local AI, but 8x is a serious power and cooling project. 
Here is what to buy. &#169; Popular AI</figcaption></figure></div><h3>FAQ</h3><h4>Can 4x RTX 3090 run a 70B model?</h4><blockquote><p>Yes, depending on quantization, context length, backend, and how the model is split. A 70B model that is uncomfortable or impossible on one 24GB card can become practical across multiple cards. Do not expect the experience to feel like one giant GPU.</p><div><hr></div></blockquote><h4>Can 8x RTX 3090 run huge models locally?</h4><blockquote><p>It can run larger workloads than 4x, but &#8220;can load&#8221; and &#8220;pleasant to serve&#8221; are different things. For very large models, interconnect, tensor parallelism, KV cache, context length, and backend support matter as much as aggregate VRAM.</p><div><hr></div></blockquote><h4>Do RTX 3090s need NVLink for local LLMs?</h4><blockquote><p>No. Multi-GPU local inference can work without NVLink in tools such as llama.cpp and vLLM. NVLink can help specific workloads, but it is not a magic memory-pooling switch.</p><div><hr></div></blockquote><h4>Is EPYC better than <a href="https://www.amazon.com/s?k=AMD+Threadripper+Pro&amp;tag=popularai-20">Threadripper Pro</a> for this build?</h4><blockquote><p>EPYC is usually better for a server. Threadripper Pro is usually better for a workstation. For 4x GPUs, both can work. For 8x, EPYC or a real GPU server platform is the cleaner answer.</p><div><hr></div></blockquote><h4>Is PCIe x8 enough for RTX 3090 local AI?</h4><blockquote><p>Often yes for inference, especially when the model stays resident on the GPUs. PCIe bandwidth matters more when the workload constantly moves data between CPU and GPU or when multi-GPU communication becomes the bottleneck. Do not assume gaming PCIe benchmarks answer this.</p><div><hr></div></blockquote><h4>Should this run Windows or Linux?</h4><blockquote><p>Linux. 
Windows can work for smaller local AI setups, but a 4x or 8x GPU server should be built around Linux, remote management, reproducible environments, and stable CUDA tooling.</p><div><hr></div></blockquote><div class="callout-block" data-callout="true"><h3>Final recommendation</h3><p>A <a href="https://www.amazon.com/s?k=4x+RTX+3090+GPU+server&amp;tag=popularai-20">4x RTX 3090 server</a> is still worth building for local AI in 2026 if you are deliberately buying cheap VRAM rather than chasing a polished workstation. Build it on <a href="https://www.amazon.com/s?k=AMD+EPYC+server+platform&amp;tag=popularai-20">EPYC</a> or <a href="https://www.amazon.com/s?k=Threadripper+Pro+workstation&amp;tag=popularai-20">Threadripper Pro</a>, use Linux, choose cards that can actually be cooled, power-limit the GPUs, and stop at 4x unless you have a real reason to go further.</p><p>An <a href="https://www.amazon.com/s?k=8x+RTX+3090+GPU+server&amp;tag=popularai-20">8x RTX 3090 server</a> is usually the wrong next step. 
If 4x is not enough, compare two smaller nodes, newer <a href="https://www.amazon.com/s?k=32GB+GPU+AI&amp;tag=popularai-20">32GB consumer GPUs</a>, <a href="https://www.amazon.com/s?k=used+enterprise+GPU+NVIDIA&amp;tag=popularai-20">used enterprise cards</a>, <a href="https://www.amazon.com/s?k=RTX+PRO+6000+Blackwell&amp;tag=popularai-20">RTX PRO 6000-class hardware</a>, or burst cloud rental before committing to a 3kW used-GPU heat machine.</p></div><div><hr></div><p style="text-align: center;"><em><strong>Explore more from Popular AI:</strong></em></p><p style="text-align: center;"><strong><a href="https://popularai.org/t/start-here">Start here</a> | <a href="https://popularai.org/t/local-ai">Local AI</a> | <a href="https://popularai.org/t/walkthroughs">Fixes &amp; guides</a> | <a href="https://popularai.org/t/ai-builds-gear">Builds &amp; gear</a> | <a href="https://popularai.org/t/popular-ai-podcast">Popular AI podcast</a></strong></p>]]></content:encoded></item><item><title><![CDATA[Why Claude AI wants this man in prison]]></title><description><![CDATA[Claude AI reversed two Dutch legal phrases in a high-stakes translation. 
Here is what went wrong and how to verify AI legal output.]]></description><link>https://www.popularai.org/p/claude-ai-legal-translation-error-dries-van-langenhove</link><guid isPermaLink="false">https://www.popularai.org/p/claude-ai-legal-translation-error-dries-van-langenhove</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Sat, 13 Jun 2026 17:37:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!fGV6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5415504d-d52c-4d54-a498-a4f29700fb87_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fGV6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5415504d-d52c-4d54-a498-a4f29700fb87_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fGV6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5415504d-d52c-4d54-a498-a4f29700fb87_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!fGV6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5415504d-d52c-4d54-a498-a4f29700fb87_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!fGV6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5415504d-d52c-4d54-a498-a4f29700fb87_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!fGV6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5415504d-d52c-4d54-a498-a4f29700fb87_1672x941.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fGV6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5415504d-d52c-4d54-a498-a4f29700fb87_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5415504d-d52c-4d54-a498-a4f29700fb87_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2101763,&quot;alt&quot;:&quot;Claude, Dries Van Langenhove, and the AI translation risk&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/201883048?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5415504d-d52c-4d54-a498-a4f29700fb87_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Claude, Dries Van Langenhove, and the AI translation risk" title="Claude, Dries Van Langenhove, and the AI translation risk" srcset="https://substackcdn.com/image/fetch/$s_!fGV6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5415504d-d52c-4d54-a498-a4f29700fb87_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!fGV6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5415504d-d52c-4d54-a498-a4f29700fb87_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!fGV6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5415504d-d52c-4d54-a498-a4f29700fb87_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!fGV6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5415504d-d52c-4d54-a498-a4f29700fb87_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Dries Van Langenhove says Claude flipped key legal wording. The bigger issue is AI translation, hallucination, and model opacity. Image credit: <a href="https://x.com/DVanLangenhove">Dries Van Langenhove/X</a>. 
Avatar used in collage for editorial context.</figcaption></figure></div><p>Claude allegedly flipped the meaning of two Dutch legal phrases. That may sound like a small translation problem, but in a legal context, a single reversed negation can change the entire meaning of a document.</p><p>Dries Van Langenhove, a Belgian remigration and anti-establishment political activist, <a href="https://x.com/DVanLangenhove/status/2065538783991205902">says Claude failed a simple Dutch-to-English legal translation in a way that made the text say the opposite of what it meant</a>. In a post on <a href="https://x.com/DVanLangenhove/status/2065538783991205902">X</a>, he said he gave Claude a paragraph from a Dutch legal document and asked for a translation. According to his account, Claude turned a sentence meaning there was no indication of guilt into a claim that strong evidence existed against him, then turned &#8220;no reason to prosecute&#8221; into &#8220;no reason not to prosecute.&#8221;</p><p>That is the kind of AI error that looks small until you imagine it happening in a court filing, a journalist&#8217;s source document, a policy memo, or a private legal dispute. 
Translations like these can be grammatically smooth, legally polished, and completely wrong where they matter most.</p><p>The underlying legal document was not included with the public post, so the exact translation failure cannot be independently reproduced from the screenshot alone. But the reported failure pattern fits several known AI weaknesses: hallucination, negation failure, translation drift, model priors, and the opacity of hosted AI systems.</p><div><hr></div><h4><em><strong>More on legal AI applications:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;9f0bd406-fe26-4a2f-b81f-83a79851a9ab&quot;,&quot;caption&quot;:&quot;Imagine a world where speech is regulated by people who can&#8217;t distinguish truth from fiction, and who don&#8217;t even care to. That world arrived in a Minnesota courtroom last fall, when a professor of &#8220;misinformation studies&#8221; used ChatGPT to invent legal precedent supporting a law that criminalizes deepfakes.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;&#8220;Misinformation expert&#8220; cites AI hallucinations as legal precedent&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362091076,&quot;name&quot;:&quot;Ben Geudens&quot;,&quot;bio&quot;:&quot;The one guy who reads the methodology section. 
&#127963;&#65039; Philosophy &#129504;Logic &#128220; History &#128396;&#65039; Art &#9889; Technology &#128509; Freedom &#128200; Economics &#129304;Rock 'n' Roll&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/417e99a9-0ecb-4a9e-8776-708770d1cd0c_324x324.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-12-03T15:29:00.000Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!DTqT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd56260b-20bf-4ee3-88cb-594fe35b4820_1312x736.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/misinformation-expert-cites-ai-hallucinations-as-legal-precedent&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:168956358,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>Quick takeaways</h3><blockquote><p>The reported error fits several documented AI failure modes, including hallucination, negation failure, translation hallucination, and model priors overriding source text.</p></blockquote><blockquote><p>There is no public evidence that Claude was specifically programmed to invert this translation because of Van Langenhove&#8217;s politics.</p></blockquote><blockquote><p>The more serious issue is that a fluent hosted model can silently change legal 
meaning, then explain itself afterward in a way that sounds convincing.</p></blockquote><blockquote><p>For legal translation, AI output should be treated as a draft with a source-aligned audit trail, never as a final authority.</p></blockquote><blockquote><p>Local AI alternatives help with privacy, repeatability, and account independence, but they do not eliminate hallucinations.</p></blockquote><div><hr></div><h3>Who Dries Van Langenhove is, and why the case is getting attention</h3><p>Readers outside Belgium may not know Van Langenhove. He is a Belgian remigration activist, and his <a href="https://www.instagram.com/dvanlangenhove/?hl=en">Instagram profile</a> presents him to followers in that political context.</p><p>That matters because the alleged translation error did not happen in a neutral school exercise. It involved a legal document, a politically active public figure, and an AI system whose output could influence how non-Dutch speakers understand the document.</p><p>Van Langenhove says he gave Claude a single paragraph from a Dutch legal text and asked for a translation. He says the only extra context was that the translation mattered for a European Parliament presentation. According to his post, Claude inverted the meaning of two key phrases.</p><p>The first phrase, &#8220;<em>geen aanwijzing bestaat van schuld,</em>&#8221; means that no indication of guilt exists. Claude allegedly rendered it as <em>overwhelming evidence existing against him</em>. The second phrase, &#8220;<em>geen reden is tot vervolging,</em>&#8221; means there is no reason for prosecution. Claude allegedly reversed it by inserting &#8220;not,&#8221; yielding &#8220;no reason not to prosecute.&#8221;</p><p>That kind of mistake is dangerous because it is not readily apparent that a mistake was made at all. The translation may look perfectly fluent and coherent. 
A reader who does not know Dutch may never notice that the English version has reversed what the source originally claimed.</p><p>The screenshot also appears to show Claude acknowledging the mistake afterward. That is useful as a product artifact, but it should not be treated as a forensic explanation of what happened inside the model. Anthropic&#8217;s own <a href="https://platform.claude.com/docs/en/build-with-claude/extended-thinking">extended thinking documentation</a> describes thinking features as offering varying levels of transparency into reasoning behavior. In other words, a displayed explanation can help users debug a mistake, but it is not guaranteed to be a complete record of the internal cause.</p><h3>What the translation needed to preserve</h3><p>The core issue is negation.</p><p>A legal translator must preserve who is accusing whom, what the document says, and whether the sentence affirms or denies a legal basis. The difference between &#8220;no indication of guilt&#8221; and &#8220;evidence of guilt&#8221; is not a mere stylistic issue. It is the meaning of the sentence.</p><p>The same is true for &#8220;<em>no reason to prosecute</em>&#8221; and &#8220;<em>no reason not to prosecute</em>.&#8221; In ordinary writing, that would be a bad error. In a legal document, it can alter the reader&#8217;s understanding of whether a case is being dismissed, supported, questioned, or escalated.</p><p>This is why AI translation failures deserve more attention than awkward wording. A model can preserve grammar, tone, and legal style while corrupting the core claim the reader actually needs.</p><p>Legal translation is unforgiving because the smallest words often carry the highest stakes. 
&#8220;No,&#8221; &#8220;not,&#8221; &#8220;unless,&#8221; &#8220;except,&#8221; &#8220;without,&#8221; and &#8220;failed to&#8221; can decide whether a sentence says someone did something, did not do something, may have done something, or should not be accused of doing something.</p><p>A fluent model can fail in exactly that spot.</p><div><hr></div><h3>The best explanation is a stack of known AI failure modes</h3><p>The strongest evidence-based explanation is not that Claude was hard-coded to target Van Langenhove personally. The more likely explanation is that the task combined several known large language model weaknesses in one high-stakes place.</p><p>Claude can hallucinate. Anthropic says so directly. The company&#8217;s <a href="https://support.claude.com/en/articles/8525154-claude-is-providing-incorrect-or-misleading-responses-what-s-going-on">Claude Help Center</a> warns that Claude can produce incorrect or misleading responses, and that users should avoid relying on it as a single source of truth for high-stakes advice.</p><p>Anthropic&#8217;s <a href="https://www.anthropic.com/legal/consumer-terms">consumer terms</a> go further. 
They warn that outputs can contain material inaccuracies even when they appear accurate because of detail or specificity. That warning matters here because legal translation often feels trustworthy when it sounds precise.</p><p>The reported failure also looks like a classic negation problem. The model appears to have mishandled &#8220;geen,&#8221; the Dutch word for &#8220;no&#8221; or &#8220;not any,&#8221; then produced an English version that reversed the legal meaning.</p><p>That is not an obscure edge case. Research on machine translation has found that negation can significantly reduce translation quality, with some translation directions showing quality reductions of more than 60 percent. The paper <a href="https://arxiv.org/abs/2010.05432">&#8220;It&#8217;s not a Non-Issue: Negation as a Source of Error in Machine Translation&#8221;</a> focused directly on this problem.</p><p>More recent research on large language models reaches a similar conclusion. The paper <a href="https://arxiv.org/html/2503.22395v2">&#8220;Negation: A Pink Elephant in the Large Language Models&#8217; Room?&#8221;</a> describes negation as a persistent reliability challenge. The authors note that LLMs can struggle to distinguish facts from their negations, misunderstand negative particles, and fail to handle negation robustly even after instruction tuning.</p><p>That matches the reported Claude failure pattern. The model did not merely choose a weak synonym. It appears to have lost the logical polarity of the sentence.</p><h3>Translation hallucinations are especially hard to catch</h3><p>Large translation models can produce fluent text that departs from the source. A paper on <a href="https://arxiv.org/abs/2303.16104">hallucinations in large multilingual translation models</a> warned that hallucinated translations can undermine trust and create safety concerns when these systems are deployed in real-world settings.</p><p>Legal translation raises the stakes even higher. 
A 2024 article in the <em>International Journal of Language &amp; Law</em> on <a href="https://www.languageandlaw.eu/jll/article/view/172">applying large language models in legal translation</a> notes that specialized translation remains difficult to automate, especially when terminology and legal precision matter.</p><p>That is the trap. Claude may be impressive at many translation tasks. It may produce a clean, natural English paragraph. It may even use legal vocabulary better than a casual bilingual speaker.</p><p>Still, none of that makes it a certified legal translator.</p><p>A bad machine translation is easiest to notice when it sounds broken. The more dangerous version sounds polished. The reader cannot see that source alignment is missing. He is only presented with confident English and assumes the model must have preserved the meaning.</p><h3>Training data may have influenced the output</h3><p>The most sensitive part of the analysis is whether Van Langenhove&#8217;s identity mattered.</p><p>There is no available evidence that Claude was instructed or hard-coded to intentionally mistranslate legal text against him, so we&#8217;ll avoid speculation on that front.</p><p>Yet an alternative explanation may be even more damning: a model may carry prior associations around a public figure&#8217;s name. Van Langenhove is an anti-establishment political activist and could be considered a controversial figure. 
As such, his name appears broadly in political commentary, social media posts, legal reporting, and argument-heavy online material. A model trained on large amounts of internet text may have encountered those associations.</p><p>That does not prove conscious intent on the AI model&#8217;s part, or that of Anthropic developers. The model likely recognized a plausible path to &#8220;complete&#8221; a narrative that existed in its training data, overriding the incentive to strictly translate the words in front of it.</p><p>Research on <a href="https://arxiv.org/abs/2404.16032">context-memory conflicts in large language models</a> shows that models can fail to update their answers when provided context conflicts with their internal knowledge. Another paper on <a href="https://arxiv.org/abs/2603.09654">the interplay between parametric and contextual knowledge</a> summarizes the problem plainly: models often need to integrate provided context with knowledge stored in their weights, but they can ignore context when it conflicts with what the model has already learned.</p><p>That gives a better explanation than &#8220;Claude hates this person.&#8221; If the model&#8217;s training method results in learned associations around someone&#8217;s name, pointing toward accusation, prosecution, or legal conflict, those associations could make the model pull a translation toward an expected narrative. From the outside, a result like that may look like the model itself is ideologically biased. And, in a way, it is. Cases like these may prove that overwhelming ideological bias in a model&#8217;s training data, context handling, and alignment layers mechanically result in statistical failure modes.</p><p>The practical problem is the same either way. 
The user receives a fluent legal translation that may have been influenced by narratives in its training data: information outside the provided snippet itself.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TK4a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0986207-bf03-4a4a-abdd-2dc4cdf260ef_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TK4a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0986207-bf03-4a4a-abdd-2dc4cdf260ef_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!TK4a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0986207-bf03-4a4a-abdd-2dc4cdf260ef_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!TK4a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0986207-bf03-4a4a-abdd-2dc4cdf260ef_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!TK4a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0986207-bf03-4a4a-abdd-2dc4cdf260ef_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TK4a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0986207-bf03-4a4a-abdd-2dc4cdf260ef_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e0986207-bf03-4a4a-abdd-2dc4cdf260ef_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2066092,&quot;alt&quot;:&quot;Claude AI legal translation error: why one alleged flip matters&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/201883048?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0986207-bf03-4a4a-abdd-2dc4cdf260ef_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Claude AI legal translation error: why one alleged flip matters" title="Claude AI legal translation error: why one alleged flip matters" srcset="https://substackcdn.com/image/fetch/$s_!TK4a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0986207-bf03-4a4a-abdd-2dc4cdf260ef_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!TK4a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0986207-bf03-4a4a-abdd-2dc4cdf260ef_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!TK4a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0986207-bf03-4a4a-abdd-2dc4cdf260ef_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!TK4a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0986207-bf03-4a4a-abdd-2dc4cdf260ef_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture><div 
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A reported Claude legal translation error shows why fluent AI output can be dangerous when negation, law, and politics collide. &#169; Popular AI</figcaption></figure></div><h3>Alignment shapes Claude, but did it cause this error?</h3><p>Anthropic has been unusually public about the idea that Claude is shaped by a model constitution. 
In its post on <a href="https://www.anthropic.com/news/claude-new-constitution">Claude&#8217;s new constitution</a>, the company says the constitution is part of the training process and directly shapes Claude&#8217;s behavior.</p><p>That matters because Claude is not a neutral dictionary. It is a trained, aligned, policy-shaped system. Its behavior is governed by model weights, training data, human feedback, system instructions, safety policy, product updates, and deployment choices.</p><p>Those layers can make the model safer and more useful. They can also make failure modes harder to audit.</p><p>The evidence here does not support the stronger claim that Claude has a public, documented instruction to mistranslate legal text against remigration activists or anti-establishment political activists. We can&#8217;t honestly prove or disprove whether that is the case at all. However, what we can definitely say is this: users cannot inspect the full training data, model weights, hidden system prompts, classifier behavior, or model update history.</p><p>When a hosted model fails in a politically charged legal task, the user sees the output and simply cannot inspect the machinery that produced it.</p><h3>If it happens once, it&#8217;s a coincidence&#8230;</h3><p>One flipped negation could be a one-off sloppy translation error. Two polarity flips in the same direction suggest a stronger pattern.</p><p>The likely pattern is narrative completion. Claude may have treated the paragraph less like a source text to translate and more like a legal story to complete. Once the name and legal setting activated a prosecution frame, the model may have smoothed both sentences into the kind of English it expected to see.</p><p>That is how large language models often fail. They do not always fail by producing gibberish. They often fail by producing the most plausible-looking wrong answer.</p><p>This also explains why the error is hard to catch. 
A human translator who misunderstands &#8220;geen&#8221; would probably produce awkward or inconsistent output. An LLM can produce polished legal English that hides the break with the original meaning.</p><p>For readers, that is the key warning here. Don&#8217;t confuse fluency with fidelity. A translation can read beautifully while betraying the source.</p><h3>Was this a &#8216;woke&#8217; political hallucination?</h3><p>That depends on what the phrase is meant to claim.</p><p>If it means Claude has a visible public rule telling it to mistranslate documents against Van Langenhove, there is no public evidence for that.</p><p>If it means Claude may carry political, reputational, safety, and training-data priors that can distort outputs around public political figures, there is definitely reason to believe that. LLMs encode patterns from training data. They are shaped further by human feedback, safety policies, system instructions, model updates, and refusal rules.</p><p>Those layers can improve behavior, but they can also create opaque failure modes where, for example, one-sided curation of training data bakes ideological bias into the model.</p><p>The practical point remains that a cloud model can make a politically consequential legal error while sounding calm, competent, and certain. 
That is enough reason to demand verification.</p><h3>Hosted AI is useful, but it is rented capability</h3><p>The underlying question is who controls the model&#8217;s behavior.</p><p>With Claude, the user does not control the model version in the same way he controls a local file. He cannot inspect the training corpus. He cannot freeze every hidden instruction. He cannot verify whether a product update changed translation behavior. He cannot independently reproduce the exact model state later if the service changes.</p><p>Anthropic&#8217;s <a href="https://www.anthropic.com/legal/consumer-terms">consumer terms</a> say the company may change, add, or remove features, change limits, or stop offering services. That does not make Claude useless. It means hosted AI is rented capability.</p><p>For casual tasks, that tradeoff is usually fine. For legal translation, evidence review, politically sensitive work, journalism, court filings, source protection, or internal investigations, the lack of an auditable local record becomes a real workflow risk.</p><p>A user can save the prompt and output. That is useful. It is still not the same as preserving the full model, system prompt, safety stack, inference settings, and version history.</p><h3>What this means for everyday users</h3><p>The immediate lesson is simple: never trust a single AI translation for high-stakes text.</p><p>A safer workflow starts with a literal translation. Ask the model to translate sentence by sentence, preserve word order where possible, and avoid smoothing the text into a polished legal conclusion.</p><p>Then require source alignment. Each English sentence should sit next to the original source sentence. Every negation word should be marked. 
If the source says &#8220;no,&#8221; &#8220;not,&#8221; &#8220;without,&#8221; or &#8220;nothing,&#8221; the translation should make that visible.</p><p>Users should also explicitly tell the model not to use outside knowledge about any person, event, case, or political context named in the text. That instruction does not guarantee compliance, but it reduces the chance that the model will fill gaps from memory.</p><p>A better prompt for legal translation would be:</p><pre><code><code>Translate the Dutch text into English literally.

Rules:
- Do not infer legal context.
- Do not use outside knowledge about any person named in the text.
- Preserve every negation exactly.
- For each sentence, show:
  1. Original Dutch
  2. Literal English translation
  3. Negation words in the Dutch sentence
  4. Whether the sentence affirms or denies guilt, prosecution, or evidence
  5. Any wording that is ambiguous

If you are unsure, say so. Do not smooth the sentence into a legal conclusion.</code></code></pre><p>After that, compare the result with a second model or a dedicated translation tool. For legal use, a human translator or lawyer should verify the final wording before it appears in a presentation, filing, testimony, article, or public statement.</p><p>That may sound tedious. It is less tedious than discovering that an AI system reversed the meaning after the translation has already been used.</p><h3>What this means for courts and lawyers</h3><p>AI hallucinations in legal work are no longer theoretical. Reuters reported in 2025 that <a href="https://www.reuters.com/technology/artificial-intelligence/ai-hallucinations-court-papers-spell-trouble-lawyers-2025-02-18/">AI-generated legal fiction had led courts to question or discipline lawyers</a> in multiple cases over two years. The lesson from those cases was direct: lawyers using AI must verify their filings.</p><p>The problem has continued. Reuters reported in June 2026 that a federal judge in Mississippi <a href="https://www.reuters.com/legal/litigation/judge-rules-both-sides-lawsuit-misused-ai-disqualifies-lawyers-2026-06-09/">disqualified attorneys on both sides of a lawsuit after unverified AI-generated research led to fabricated legal citations</a> in court filings. The judge said lawyers may use AI tools, but they must verify material submitted to the court.</p><p>Translation adds another layer of risk. Fake case citations can sometimes be detected by searching legal databases. A bad translation can be harder to spot because it may require knowledge of the source language. If the English output is polished, readers may assume the source said what the AI says it said.</p><p>Courts, lawyers, journalists, and policymakers should not accept AI translations unless they include source text, sentence alignment, and human verification. 
If the translation changes rights, guilt, liability, intent, prosecution status, or legal exposure, it needs a human check.</p><h3>Cloud AI also raises privacy and retention questions</h3><p>Cloud AI services have three practical weaknesses in this kind of workflow.</p><p>First, the model is not fully auditable. You can save the prompt and response, but you cannot fully inspect why the model produced that response.</p><p>Second, data handling matters. Anthropic&#8217;s privacy center says consumer chats and coding sessions may be used to improve Claude if the user allows it, if conversations are flagged for safety review, or if the user otherwise opts in. The same <a href="https://privacy.claude.com/en/articles/10023580-is-my-data-used-for-model-training">Claude privacy page on model training</a> says chat and coding session data used for improvement can include the full related conversation.</p><p>Third, retention rules matter. Anthropic&#8217;s <a href="https://privacy.claude.com/en/articles/10023548-how-long-do-you-store-my-data">data retention page</a> says deleted consumer conversations are removed from chat history immediately and deleted from back-end storage within 30 days. 
It also says data may be retained in de-identified form for up to five years if the user allows model improvement, and that inputs and outputs flagged by trust and safety classifiers may be retained for up to two years, with classification scores retained for up to seven years.</p><p>For a restaurant recommendation, that may be acceptable. For a legal document, political strategy memo, client file, source material, internal investigation, or unpublished reporting, it is a serious design question.</p><p>The issue is not that no one should use cloud AI. The issue is that users need to understand the tradeoff before uploading sensitive documents.</p><h3>What local AI alternatives change</h3><p>Local AI does not make models magically truthful. A local model can hallucinate, mistranslate, mishandle negation, and produce confident nonsense.</p><p>What local AI changes is control.</p><p>A local workflow can keep sensitive documents off a hosted account. It can preserve the exact model file, prompt, system instructions, and version used for a translation. It can be tested repeatedly on the same source text. It can be compared against other local models without sending the document to another company.</p><p>For users deciding whether owned hardware is worth it, Popular AI has covered the privacy, control, offline access, and cost tradeoffs in its guide to <a href="https://www.popularai.org/p/is-local-ai-hardware-worth-it-2026">buying local AI hardware in 2026</a>.</p><p>The local path can also be combined with file-aware workflows. Popular AI&#8217;s <a href="https://www.popularai.org/p/gguf-loader-agentic-mode-local-coding-agent">GGUF Loader Agentic Mode guide</a> covers a local coding and agent workflow without relying on cloud accounts.</p><p>For translation, the practical setup is usually hybrid. Use a hosted frontier model when speed and quality matter and the document is not sensitive. 
Use a local LLM or local translation engine when privacy, repeatability, or account independence matters. Use two independent systems when the text is high-stakes. Bring in a human expert before publication, filing, testimony, or political use.</p><p>This doesn&#8217;t mean you have to stop using Claude, or cloud-based AI altogether. The point is to stop mistaking its signature linguistic fluency for proof of a job well done.</p><div><hr></div><h4><em><strong>More on local AI versus cloud-based AI:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;533c0fe5-a85e-4928-ac85-052cde1aada7&quot;,&quot;caption&quot;:&quot;This year, the local AI hardware question finally got serious. A recent r/LocalLLaMA Reddit thread asked the question many newcomers are quietly thinking: why spend real money on local AI hardware when a&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Should you buy local AI hardware in 2026? 
The honest answer&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-12T14:42:57.114Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!g2y0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ba02143-e5b9-477a-95fe-9d37ba7d41be_1672x941.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/is-local-ai-hardware-worth-it-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:197354970,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>The deeper lesson for power users</h3><p>The Van Langenhove example is useful because it is easy to understand. Two legal negations appear to have been flipped. The mistake is visible, consequential, and politically charged.</p><p>But the same failure mode can happen in quieter settings.</p><p>A contract clause may say a company is not liable. 
A medical note may say there is no evidence of a condition. A compliance memo may say a firm is not under investigation. A source document may say a person did not do something. A financial report may say a risk did not materialize.</p><p>In each case, the dangerous error may be small. One word. One polarity flip. One confident sentence that sounds right.</p><p>The most dangerous AI errors are often not wild inventions. They are small reversals inside fluent prose.</p><p>Power users should build workflows that assume this can happen. Keep the source document. Demand sentence-level grounding. Save prompts and outputs. Compare models. Mark negation words. Use local tools for sensitive files. Bring in human review when the stakes are real.</p><p>Rented intelligence is useful. Auditable capability is safer when the consequences matter.</p><div><hr></div><h3>FAQ</h3><p><strong>Did Claude intentionally mistranslate Dries Van Langenhove&#8217;s legal document?</strong></p><blockquote><p>There is no public evidence that Claude intentionally targeted Van Langenhove or was specifically programmed to invert the translation. The more supportable explanation is a mix of known LLM failure modes: attempts to complete a narrative that was repeated in its training data, hallucination, negation failure, translation hallucination, and possible prior associations around a public political figure.</p><div><hr></div></blockquote><p><strong>Is this only a Claude problem?</strong></p><blockquote><p>No. Claude is the model in the reported incident, but negation failures, hallucinations, context-memory conflicts, and legal AI mistakes are broader LLM problems. 
Anthropic deserves scrutiny because Claude is the tool involved, but the workflow lesson applies to ChatGPT, Gemini, Perplexity, local LLMs, and specialized legal AI tools too.</p><div><hr></div></blockquote><p><strong>Can prompt engineering prevent this?</strong></p><blockquote><p>Prompt engineering can reduce the risk, especially when the prompt forces literal translation, sentence alignment, source quotes, negation marking, and no outside knowledge. It cannot replace human verification in legal work.</p><div><hr></div></blockquote><p><strong>Are local models better for legal translation?</strong></p><blockquote><p>Local models are better for privacy, repeatability, and control. They are not automatically better at translation quality. A strong hosted model may produce a better first draft. A local workflow is valuable when the document is sensitive, when cloud retention is unacceptable, or when the user needs a reproducible audit trail.</p><div><hr></div></blockquote><p><strong>Should courts allow AI translations?</strong></p><blockquote><p>AI can help prepare drafts, but courts should require source text, sentence alignment, translator verification, and responsibility from a human professional. A fluent AI translation should not be treated as evidence unless it can be audited.</p><div><hr></div></blockquote><div class="callout-block" data-callout="true"><h3>Final recommendation</h3><p>Treat this incident as a warning, not as proof of an automatic AI blacklist against certain political activists.</p><p>If the screenshot accurately reflects what happened, Claude made a severe legal translation error by reversing two negated statements. The best documented explanation is a stack of known problems: possible prior narratives around a public political figure in the training data, hallucination, weak negation handling, translation drift, and the opacity of hosted AI systems.</p><p>Use Claude for legal translation drafts only with strict grounding. 
Do not use it as the final authority. For sensitive work, build a workflow that includes source-aligned output, independent checks, human review, and a local fallback.</p></div><div><hr></div><p style="text-align: center;"><em><strong>Explore more from Popular AI:</strong></em></p><p style="text-align: center;"><strong><a href="https://popularai.org/t/start-here">Start here</a> | <a href="https://popularai.org/t/local-ai">Local AI</a> | <a href="https://popularai.org/t/walkthroughs">Fixes &amp; guides</a> | <a href="https://popularai.org/t/ai-builds-gear">Builds &amp; gear</a> | <a href="https://popularai.org/t/popular-ai-podcast">Popular AI podcast</a></strong></p>]]></content:encoded></item><item><title><![CDATA[Should you buy the RTX 5060 Ti 16GB or RX 9070 XT for local AI?]]></title><description><![CDATA[Choosing a GPU for local LLMs? 
Here&#8217;s when to buy the RTX 5060 Ti 16GB, when the RX 9070 XT makes sense, and what to avoid.]]></description><link>https://www.popularai.org/p/rtx-5060-ti-16gb-vs-rx-9070-xt-local-ai</link><guid isPermaLink="false">https://www.popularai.org/p/rtx-5060-ti-16gb-vs-rx-9070-xt-local-ai</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Tue, 09 Jun 2026 13:57:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!8I-C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f26b214-ad49-4856-a55e-95ae05312f8e_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8I-C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f26b214-ad49-4856-a55e-95ae05312f8e_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8I-C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f26b214-ad49-4856-a55e-95ae05312f8e_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!8I-C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f26b214-ad49-4856-a55e-95ae05312f8e_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!8I-C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f26b214-ad49-4856-a55e-95ae05312f8e_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!8I-C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f26b214-ad49-4856-a55e-95ae05312f8e_1672x941.png 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8I-C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f26b214-ad49-4856-a55e-95ae05312f8e_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1f26b214-ad49-4856-a55e-95ae05312f8e_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2147363,&quot;alt&quot;:&quot;RTX 5060 Ti 16GB vs RX 9070 XT: the better GPU for local AI&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200932637?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f26b214-ad49-4856-a55e-95ae05312f8e_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="RTX 5060 Ti 16GB vs RX 9070 XT: the better GPU for local AI" title="RTX 5060 Ti 16GB vs RX 9070 XT: the better GPU for local AI" srcset="https://substackcdn.com/image/fetch/$s_!8I-C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f26b214-ad49-4856-a55e-95ae05312f8e_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!8I-C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f26b214-ad49-4856-a55e-95ae05312f8e_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!8I-C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f26b214-ad49-4856-a55e-95ae05312f8e_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!8I-C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f26b214-ad49-4856-a55e-95ae05312f8e_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">RTX 5060 Ti 16GB vs RX 9070 XT for local AI, gaming, ComfyUI, Ollama, CUDA, ROCm, and first-time local LLM buyers. 
&#169; Popular AI</figcaption></figure></div><p>A recent <a href="https://www.reddit.com/r/LocalLLaMA/comments/1rla0dx/9070xt_560_or_5060_ti_16gb_520_for_local_llm/">r/LocalLLaMA thread about the RTX 5060 Ti 16GB and RX 9070 XT</a> captured the exact GPU dilemma many first-time local AI buyers are facing in 2026.</p><p>The choice looks simple at first. One card is an <strong>RTX 5060 Ti 16GB around $520</strong>. The other is an <strong>RX 9070 XT around $560</strong>. Both have 16GB of VRAM. Both can run local LLMs. Both can handle gaming. Both are close enough in price that the wrong choice can feel painful.</p><p>The real decision is deeper than the spec sheet.</p><p>The RTX 5060 Ti 16GB is the safer local AI card because it gives you CUDA, broad tutorial support, and fewer software surprises. The RX 9070 XT is the stronger gaming card and has more memory bandwidth, but it asks more from the buyer, especially if you are on Windows and want every local AI tool to work with minimal setup.</p><p>For a first local LLM build, the best default pick is still the <strong>RTX 5060 Ti 16GB</strong>. If you are ready to buy, compare a specific <a href="https://www.amazon.com/dp/B0F4Y6N6PW?tag=popularai-20">PNY RTX 5060 Ti 16GB on Amazon</a> while checking current street prices elsewhere.</p><p>The <strong>RX 9070 XT</strong> becomes more interesting if gaming matters as much as inference, if you are Linux-friendly, or if you find it at a meaningful discount. 
For that route, compare a specific <a href="https://www.amazon.com/dp/B0DS2QG2KW?tag=popularai-20">GIGABYTE Radeon RX 9070 XT 16GB on Amazon</a> against local pricing before buying.</p><div><hr></div><h4><em><strong>More on local AI GPUs:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;b55c2673-454b-4364-85c4-d8483a16ae0a&quot;,&quot;caption&quot;:&quot;Running a local model sounds wonderfully simple. One box. One model. No API bill. No usage cap. No surprise account lockout.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How to choose the right local LLM for 8GB, 12GB, and 24GB VRAM&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-15T14:18:00.000Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!CEOc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6a71d4f-7366-4a02-86b4-2d5471da6e55_2560x1507.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/how-to-choose-the-right-local-llm-for-8gb-12gb-and-24gb-vram&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:191511400,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>The quick buying answer</h3><blockquote><p><a href="https://www.amazon.com/dp/B0F4Y6N6PW?tag=popularai-20">Buy the </a><strong><a href="https://www.amazon.com/dp/B0F4Y6N6PW?tag=popularai-20">RTX 5060 Ti 16GB</a></strong> if this is your first local AI GPU, especially if you are using Windows. 
CUDA remains the smoother path through Ollama, llama.cpp, PyTorch, ComfyUI, Stable Diffusion workflows, AI coding models, and the many GitHub projects that quietly assume NVIDIA first.</p></blockquote><blockquote><p><a href="https://www.amazon.com/dp/B0DS2QG2KW?tag=popularai-20">Buy the </a><strong><a href="https://www.amazon.com/dp/B0DS2QG2KW?tag=popularai-20">RX 9070 XT</a></strong> if gaming performance matters as much as local LLM inference, if you are comfortable dealing with AMD&#8217;s software stack, and if the price is close enough to make the stronger hardware appealing.</p></blockquote><blockquote><p>For image generation, the safer pick is still the <strong>RTX 5060 Ti 16GB</strong>. The current <a href="https://docs.comfy.org/installation/system_requirements">ComfyUI system requirements</a> include AMD ROCm paths and experimental RX 9000 support, but NVIDIA CUDA remains the lower-friction option for most ComfyUI users.</p></blockquote><blockquote><p>For Linux tinkerers, the <strong>RX 9070 XT</strong> deserves more respect than older Radeon cards. AMD&#8217;s <a href="https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html">ROCm Linux system requirements</a> list the RX 9070 XT as supported, and Ollama&#8217;s <a href="https://docs.ollama.com/gpu">GPU hardware support documentation</a> lists RX 9070 XT support through ROCm on Linux.</p></blockquote><blockquote><p>The card to be careful with in this exact price fight is the <strong>RTX 5070 12GB</strong>. It is a faster gaming and compute card than the RTX 5060 Ti, but local LLM buyers should be cautious about paying similar money for less VRAM. NVIDIA&#8217;s <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/rtx-5070-family/">RTX 5070 family specs</a> list 12GB of GDDR7, while the RTX 5060 Ti is available in 16GB and 8GB versions. 
If you are tempted anyway, compare a specific <a href="https://www.amazon.com/dp/B0DS6V1YSY?tag=popularai-20">ASUS Prime RTX 5070 on Amazon</a>, then ask whether 12GB is enough for the models you actually want to run.</p></blockquote><h3>Who should use this guide</h3><p>This guide is for someone buying one GPU for a mixed local AI and gaming PC.</p><p>That means local LLMs in Ollama, LM Studio, or llama.cpp. It also means ComfyUI, Stable Diffusion-style image generation, light LoRA experiments, AI coding models, Whisper-style transcription, private document chat, RAG experiments, and gaming on the same machine.</p><p>It is aimed at the buyer who wants to try local AI seriously, but does not want to build a full multi-GPU workstation yet. If you already know you need 24GB or more of VRAM, this comparison changes. You should instead be looking at a used <a href="https://www.amazon.com/dp/B08J6GMWCQ?tag=popularai-20">RTX 3090 on Amazon</a>, an <a href="https://www.amazon.com/dp/B0BH8MK76C?tag=popularai-20">RTX 4090 on Amazon</a>, an <a href="https://www.amazon.com/dp/B0DT7GBNWQ?tag=popularai-20">RTX 5090 on Amazon</a>, or a professional AMD card such as a <a href="https://www.amazon.com/dp/B0FXTRGHL9?tag=popularai-20">Radeon AI PRO R9700 on Amazon</a>.</p><p>Popular AI has a separate guide to the <a href="https://www.popularai.org/p/best-budget-gpus-local-llms-2026">best budget GPUs for local LLMs in 2026</a>, which is worth reading if you are comparing used cards, 24GB cards, and budget AI builds more broadly.</p><h3>VRAM matters more than gaming charts</h3><p>For local AI, GPU buying rules are different from normal gaming buying rules.</p><p>Gaming benchmarks usually reward raw frame rates, raster performance, ray tracing, power efficiency, and price per frame. Local AI cares about those things too, but it starts with a harder limit: VRAM.</p><p>If the model does not fit in VRAM, the rest of the card&#8217;s performance matters much less. 
You may be forced into a smaller model, heavier quantization, CPU offload, shorter context, or a slower workflow.</p><p>That is why both the RTX 5060 Ti 16GB and RX 9070 XT are in the conversation. Sixteen gigabytes is the practical entry point for a modern local AI hobbyist who wants more flexibility than an 8GB or 12GB card can offer.</p><p>It is still a compromise. A 16GB GPU will not make huge local models feel effortless. You will still care about quantization. You will still make choices about context size. You will still bump into limits if you try to run larger models, big image-generation workflows, or multiple AI tools at once.</p><p>But 16GB is enough to learn, experiment, and build useful workflows. That makes the RTX 5060 Ti 16GB and RX 9070 XT much more attractive than cheaper 8GB cards for local AI.</p><h3>Software support is where NVIDIA still wins</h3><p>After VRAM, the next deciding factor is software support. This is where NVIDIA still has the clearest advantage.</p><p>PyTorch&#8217;s <a href="https://pytorch.org/get-started/locally/">local install guidance</a> separates NVIDIA CUDA and AMD ROCm paths, and that split matters in daily use. A huge amount of local AI software, documentation, troubleshooting, and community advice starts with the assumption that you have an NVIDIA GPU.</p><p>That does not mean AMD cannot run local AI. The RX 9070 XT is much more credible than older Radeon cards, especially on Linux. It means a beginner is likely to hit fewer strange errors on the NVIDIA path.</p><p>CUDA is boring in the best possible way. It is the default path that many projects test first. When a tutorial says &#8220;install the CUDA version,&#8221; a new user with an RTX 5060 Ti 16GB is usually following the path of least resistance.</p><p>ROCm is improving fast, and AMD deserves credit for that. 
But even when ROCm works, the user often needs to think harder about operating system support, backend choice, driver versions, model compatibility, and whether a specific tool&#8217;s AMD support is mature or experimental.</p><p>For a first local AI GPU, fewer decisions can be worth more than a spec-sheet win.</p><h3>RTX 5060 Ti 16GB: the safer first local AI GPU</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/dp/B0F4Y6N6PW?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!z4WK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85a4348c-2b2b-4553-b0a3-980851318536_903x646.jpeg 424w, https://substackcdn.com/image/fetch/$s_!z4WK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85a4348c-2b2b-4553-b0a3-980851318536_903x646.jpeg 848w, https://substackcdn.com/image/fetch/$s_!z4WK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85a4348c-2b2b-4553-b0a3-980851318536_903x646.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!z4WK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85a4348c-2b2b-4553-b0a3-980851318536_903x646.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!z4WK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85a4348c-2b2b-4553-b0a3-980851318536_903x646.jpeg" width="546" height="390.6046511627907" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85a4348c-2b2b-4553-b0a3-980851318536_903x646.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:646,&quot;width&quot;:903,&quot;resizeWidth&quot;:546,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/dp/B0F4Y6N6PW?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!z4WK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85a4348c-2b2b-4553-b0a3-980851318536_903x646.jpeg 424w, https://substackcdn.com/image/fetch/$s_!z4WK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85a4348c-2b2b-4553-b0a3-980851318536_903x646.jpeg 848w, https://substackcdn.com/image/fetch/$s_!z4WK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85a4348c-2b2b-4553-b0a3-980851318536_903x646.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!z4WK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85a4348c-2b2b-4553-b0a3-980851318536_903x646.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/dp/B0F4Y6N6PW?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find RTX 5060 Ti 16GB deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/dp/B0F4Y6N6PW?tag=popularai-20"><span>Find RTX 5060 Ti 16GB deals on Amazon</span></a></p><p>The RTX 5060 Ti 16GB is not the most exciting graphics card on paper. 
Its 128-bit memory bus makes it look less impressive than the RX 9070 XT, and it is not the card most gamers would pick if gaming performance were the only goal.</p><p>For local AI beginners, though, it has the three things that matter most: 16GB of VRAM, CUDA, and lower power draw.</p><p>NVIDIA lists the RTX 5060 Ti family with fifth-generation Tensor cores, PCIe Gen 5, CUDA capability 12.0, and 16GB or 8GB GDDR7 configurations in its <a href="https://www.nvidia.com/en-eu/geforce/graphics-cards/50-series/rtx-5060-family/">RTX 5060 family specifications</a>. PNY&#8217;s <a href="https://www.pny.com/geforce-rtx-5060-ti-16gb-models">GeForce RTX 5060 Ti 16GB listing</a> shows 16GB of GDDR7, a 128-bit bus, 448 GB/s of memory bandwidth, and a recommended system power of 600W.</p><p>The important part is the user experience. Most local AI tutorials still assume NVIDIA first. Many ComfyUI workflows are tested on CUDA first. Many PyTorch examples are written around CUDA. Many GitHub issues have more NVIDIA answers than AMD answers.</p><p>That support gap matters when you are new. A beginner does not just need theoretical performance. 
A beginner needs the model to load, the driver to behave, the Python environment to work, and the tool to use the GPU without turning the weekend into a driver hunt.</p><p>The RTX 5060 Ti 16GB is the better choice if you are on Windows, if you want the least annoying first Ollama or llama.cpp setup, if you care about ComfyUI, if you expect to follow tutorials, and if you want a lower-power card.</p><p>It is less compelling if gaming is the main goal, if you can find a clean used RTX 3090 near the same budget, if you are comfortable with Linux and ROCm, or if you know you need 24GB or more VRAM.</p><blockquote><p>For buyers who want the safest 16GB local AI card from this comparison, start by checking a specific <a href="https://www.amazon.com/dp/B0F4Y6N6PW?tag=popularai-20">RTX 5060 Ti 16GB Amazon listing</a> and compare it against Newegg, Micro Center, Best Buy, and used-market pricing.</p></blockquote><h3>RX 9070 XT: stronger hardware with more setup risk</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/dp/B0DS2QG2KW?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!A116!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a22980c-5e28-465f-8bc5-263c7b757181_1207x595.jpeg 424w, https://substackcdn.com/image/fetch/$s_!A116!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a22980c-5e28-465f-8bc5-263c7b757181_1207x595.jpeg 848w, https://substackcdn.com/image/fetch/$s_!A116!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a22980c-5e28-465f-8bc5-263c7b757181_1207x595.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!A116!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a22980c-5e28-465f-8bc5-263c7b757181_1207x595.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!A116!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a22980c-5e28-465f-8bc5-263c7b757181_1207x595.jpeg" width="546" height="269.15492957746477" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7a22980c-5e28-465f-8bc5-263c7b757181_1207x595.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:595,&quot;width&quot;:1207,&quot;resizeWidth&quot;:546,&quot;bytes&quot;:117107,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.amazon.com/dp/B0DS2QG2KW?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!A116!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a22980c-5e28-465f-8bc5-263c7b757181_1207x595.jpeg 424w, https://substackcdn.com/image/fetch/$s_!A116!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a22980c-5e28-465f-8bc5-263c7b757181_1207x595.jpeg 848w, https://substackcdn.com/image/fetch/$s_!A116!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a22980c-5e28-465f-8bc5-263c7b757181_1207x595.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!A116!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a22980c-5e28-465f-8bc5-263c7b757181_1207x595.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/dp/B0DS2QG2KW?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find RX 9070 XT deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" 
href="https://www.amazon.com/dp/B0DS2QG2KW?tag=popularai-20"><span>Find RX 9070 XT deals on Amazon</span></a></p><p>The RX 9070 XT is the stronger gaming GPU and the more muscular piece of hardware in this comparison.</p><p>AMD lists the <a href="https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9070xt.html">Radeon RX 9070 XT</a> with 16GB of GDDR6, a 256-bit memory interface, up to 640 GB/s of memory bandwidth, 64 compute units, 128 AI accelerators, and 304W typical board power. On paper, that gives the AMD card a clear bandwidth and gaming-performance advantage over the RTX 5060 Ti 16GB.</p><p>That is why the RX 9070 XT is tempting for local LLM inference. In the Reddit thread, commenters often framed the RTX 5060 Ti 16GB as the safer LLM pick, while the original poster eventually bought the RX 9070 XT after finding it for less than the 5060 Ti and focusing more on inference than image generation.</p><p>That is a reasonable decision under the right conditions.</p><p>The RX 9070 XT makes sense if you also care about gaming, if you mainly want text generation rather than image generation, if you are willing to run Linux for the better ROCm path, or if the AMD card is cheaper than the RTX 5060 Ti 16GB in your region.</p><p>It makes less sense as a first local AI card on Windows. The tools exist, but the path can be less predictable. Ollama documents AMD support and Vulkan support in its <a href="https://docs.ollama.com/gpu">GPU hardware support page</a>, but the cleanest beginner path remains NVIDIA CUDA.</p><blockquote><p>If you want the AMD route, compare a specific <a href="https://www.amazon.com/dp/B0DS2QG2KW?tag=popularai-20">RX 9070 XT Amazon listing</a> with local retail and make sure your preferred local AI tools support the backend you plan to use.</p></blockquote><h3>The real price problem</h3><p>At the Reddit prices, the decision is close.</p><p>The RTX 5060 Ti 16GB was around $520. The RX 9070 XT was around $560. 
The RTX 5070 was also around $560.</p><p>A $40 gap is small enough that software support should decide the local AI purchase. For most first-time local LLM buyers, that points to the RTX 5060 Ti 16GB.</p><p>Current U.S. retail pricing can look very different from a single Reddit snapshot. In early June 2026, <a href="https://www.newegg.com/p/pl?d=rtx+5060+ti+16gb&amp;srsltid=AfmBOooTfdCZ-AQmlY1fdjh2apuDdsh9Xc1LNRFa29Q-_5Q3KVf23-Q3">Newegg search results for RTX 5060 Ti 16GB cards</a> showed listings that had already moved above the original $520 reference point. RX 9070 XT pricing can also swing widely depending on region, model, stock, and seller.</p><p>That means the rule is simple.</p><p>If the RTX 5060 Ti 16GB and RX 9070 XT are close in price, buy the RTX 5060 Ti 16GB for local AI.</p><p>If the RX 9070 XT is meaningfully cheaper and you are mainly doing LLM inference, AMD becomes defensible.</p><p>If the RX 9070 XT costs much more, do not buy it for local AI alone. At that point, you are paying for gaming performance and stronger hardware, not the easiest AI experience.</p><p>If RTX 5060 Ti 16GB pricing climbs too close to used 24GB NVIDIA cards, stop and compare again. A good used RTX 3090 can be a better local LLM buy because VRAM matters so much.</p><div><hr></div><h3>CUDA versus ROCm for beginners</h3><p>This whole debate comes down to CUDA versus ROCm.</p><p>CUDA gives the RTX 5060 Ti 16GB its biggest practical advantage. It is the platform most local AI software expects. It is where tutorials, GitHub issues, install commands, and troubleshooting answers are easiest to find.</p><p>ROCm gives AMD a serious path into local AI, and the RX 9070 XT benefits from that progress. AMD&#8217;s ROCm support is much better than it was during the old &#8220;avoid Radeon for AI&#8221; era. The RX 9070 XT appearing in AMD&#8217;s current Linux support documentation is a meaningful step forward.</p><p>llama.cpp also helps AMD users because it supports multiple acceleration backends. The <a href="https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md">llama.cpp build documentation</a> covers backends such as CUDA, HIP, Vulkan, Metal, OpenCL, and others. That flexibility makes the AMD option more credible than it would be in a CUDA-only world.</p><p>Even so, the two cards are not equally easy for beginners.</p><p>The NVIDIA card gives up some raw hardware strength in exchange for a smoother software path.</p><p>The AMD card gives you stronger gaming performance and better listed bandwidth, but it asks you to care more about backend support, driver maturity, Linux versus Windows behavior, and tool-specific compatibility.</p><p>For experienced users, that tradeoff can be worth it. 
For a first local LLM setup, the easier stack is often the better buy.</p><h3>ComfyUI and image generation still favor NVIDIA</h3><p>If image generation matters, the safest advice is to buy NVIDIA unless the AMD deal is too good to ignore.</p><p>ComfyUI&#8217;s current <a href="https://docs.comfy.org/installation/system_requirements">system requirements page</a> lists NVIDIA with stable PyTorch CUDA support, AMD Linux with ROCm support, and experimental AMD Windows and Linux support for RDNA 3, RDNA 3.5, and RDNA 4 hardware, including RX 9000 series cards.</p><p>That is real progress. It means an RX 9070 XT can be part of a ComfyUI setup. It does not mean the experience is as simple as buying an NVIDIA card and following the most common CUDA instructions.</p><p>AMD also publishes <a href="https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/docs/advanced/advancedryz/linux/comfyui/installcomfyui.html">ComfyUI installation steps for Radeon and Ryzen through ROCm</a>, including a Python virtual environment, PyTorch ROCm wheels, cloning ComfyUI, and launching the app locally.</p><p>That path is fine for someone who likes to tinker. It is less ideal for someone who wants to install ComfyUI, download a workflow, and start generating images without thinking about backend details.</p><p>For image generation, the RTX 5060 Ti 16GB remains the safer 16GB pick. It may not be the fastest card in every workload, but it gives you the compatibility advantage that matters most when running community workflows, custom nodes, and tutorials.</p><h3>Ollama, llama.cpp, and LM Studio are more flexible</h3><p>For simple local LLM inference, both cards can make sense.</p><p>Ollama supports NVIDIA broadly and lists RX 9070 XT support through ROCm on Linux in its <a href="https://docs.ollama.com/gpu">hardware support documentation</a>. 
It also points to Vulkan as another path for additional GPU support on Windows and Linux.</p><p>llama.cpp is especially important because it gives users many backend options. That makes AMD more practical for inference than it would be if every tool required CUDA.</p><p>LM Studio and other local AI apps can also hide some of the setup complexity, depending on the build, operating system, and backend support available at the time you install them.</p><p>The difference is predictability. With NVIDIA, you are more likely to use the default path. With AMD, you may need to choose between ROCm, HIP, Vulkan, a specific build, or a specific operating system.</p><p>That can be perfectly fine for power users. It can be frustrating for a first card.</p><h3>The RTX 5070 problem</h3><p>The RTX 5070 was part of the original Reddit comparison, and it is easy to see why buyers are tempted by it. Around the same price, it looks like a stronger NVIDIA card than the RTX 5060 Ti.</p><p>For gaming and some compute workloads, that may be true. For local LLMs, the 12GB VRAM limit is the problem.</p><p>NVIDIA&#8217;s <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/rtx-5070-family/">RTX 5070 specs</a> list 12GB of GDDR7 on the RTX 5070. The RTX 5060 Ti 16GB gives you 4GB more VRAM, and that matters more than many new buyers expect.</p><p>For local LLMs, choose 16GB over 12GB unless you already know your models, context sizes, quantizations, and workloads fit comfortably inside 12GB.</p><p>The RTX 5070 can still be a good gaming card. It can still run AI workloads. 
But if your goal is a first local LLM machine, do not ignore VRAM just because the GPU tier number is higher.</p><p>If you are considering the 5070 anyway, compare a specific <a href="https://www.amazon.com/dp/B0DS6V1YSY?tag=popularai-20">RTX 5070 buy page on Amazon</a> and weigh it against 16GB and 24GB alternatives before deciding.</p><h3>Used RTX 3090: the alternative that can beat both</h3><p>The best alternative to both cards is often a clean used RTX 3090.</p><p>The reason is simple: 24GB of VRAM plus CUDA.</p><p>For local LLMs, that extra 8GB over a 16GB card can matter more than a newer architecture. It opens up larger models, more comfortable context sizes, and fewer compromises. It also keeps you in the NVIDIA CUDA ecosystem, which is still the easiest path for most local AI users.</p><p>Popular AI&#8217;s guide to a <a href="https://www.popularai.org/p/the-best-budget-local-llm-pc-in-2026">budget local AI PC built around a used RTX 3090</a> frames the RTX 3090 as the classic first serious local LLM build when the price is right.</p><p>The catch is the used market. RTX 3090 prices are often too high, and used GPUs carry risk. You need to think about seller history, return policy, warranty, thermals, mining history, and whether the card fits your case and power supply.</p><p>If a used RTX 3090 is close to the price of a new RTX 5060 Ti 16GB, the 3090 can be the better local AI card. 
If it costs much more or looks risky, the RTX 5060 Ti 16GB becomes easier to recommend.</p><p>For comparison shopping, start with a specific <a href="https://www.amazon.com/dp/B08J6GMWCQ?tag=popularai-20">RTX 3090 Amazon listing</a>, then compare against reputable used-market listings with strong buyer protection.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8H8Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcbce22c-28d2-40f1-86ea-d41450c95fc9_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8H8Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcbce22c-28d2-40f1-86ea-d41450c95fc9_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!8H8Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcbce22c-28d2-40f1-86ea-d41450c95fc9_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!8H8Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcbce22c-28d2-40f1-86ea-d41450c95fc9_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!8H8Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcbce22c-28d2-40f1-86ea-d41450c95fc9_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8H8Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcbce22c-28d2-40f1-86ea-d41450c95fc9_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fcbce22c-28d2-40f1-86ea-d41450c95fc9_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2234970,&quot;alt&quot;:&quot;RTX 5060 Ti 16GB or RX 9070 XT? Best GPU for local LLMs&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200932637?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcbce22c-28d2-40f1-86ea-d41450c95fc9_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="RTX 5060 Ti 16GB or RX 9070 XT? Best GPU for local LLMs" title="RTX 5060 Ti 16GB or RX 9070 XT? Best GPU for local LLMs" srcset="https://substackcdn.com/image/fetch/$s_!8H8Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcbce22c-28d2-40f1-86ea-d41450c95fc9_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!8H8Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcbce22c-28d2-40f1-86ea-d41450c95fc9_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!8H8Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcbce22c-28d2-40f1-86ea-d41450c95fc9_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!8H8Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcbce22c-28d2-40f1-86ea-d41450c95fc9_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div 
class="pencraft pc-display-flex pc-gap-8 pc-reset"></div></div></div></a><figcaption class="image-caption">NVIDIA CUDA or AMD ROCm? Compare the RTX 5060 Ti 16GB and RX 9070 XT for local AI, image generation, and gaming in 2026. &#169; Popular AI</figcaption></figure></div><h3>What about RTX 4090, RTX 5090, Radeon AI PRO, and Intel Arc?</h3><p>If you can spend more, the RTX 4090 and RTX 5090 move you into a different class of card.</p><p>The <a href="https://www.amazon.com/dp/B0BH8MK76C?tag=popularai-20">RTX 4090 on Amazon</a> can still be attractive for serious local AI users because it has 24GB of VRAM and CUDA, but pricing can be brutal. 
The <a href="https://www.amazon.com/dp/B0DT7GBNWQ?tag=popularai-20">RTX 5090 on Amazon</a> pushes even higher, with 32GB of VRAM and flagship pricing to match, which makes it a very different purchase from a midrange 16GB card.</p><p>AMD&#8217;s workstation options are also worth watching. A <a href="https://www.amazon.com/dp/B0FXTRGHL9?tag=popularai-20">Radeon AI PRO R9700 on Amazon</a> gives you a professional AMD route with more VRAM than the RX 9070 XT, but it is not the same kind of simple beginner recommendation as a CUDA card.</p><p>Intel Arc B-series cards also deserve attention, especially when price and VRAM are attractive. An <a href="https://www.amazon.com/dp/B0DNV4NWF7?tag=popularai-20">Intel Arc B580 on Amazon</a> can be interesting for budget buyers, particularly through Vulkan and llama.cpp. For a beginner who wants the least friction, though, Intel is still not the default recommendation over NVIDIA.</p><p>These alternatives matter because they keep the RTX 5060 Ti 16GB and RX 9070 XT in perspective. A 16GB card is a useful starting point. It is not the comfort tier for serious local LLM work.</p><h3>Recommended buying rules</h3><blockquote><p><a href="https://www.amazon.com/dp/B0F4Y6N6PW?tag=popularai-20">Buy the </a><strong><a href="https://www.amazon.com/dp/B0F4Y6N6PW?tag=popularai-20">RTX 5060 Ti 16GB</a></strong> if local AI is the main goal. This is the safest answer for a beginner because CUDA support saves time, especially on Windows.</p></blockquote><blockquote><p><a href="https://www.amazon.com/dp/B0DS2QG2KW?tag=popularai-20">Buy the </a><strong><a href="https://www.amazon.com/dp/B0DS2QG2KW?tag=popularai-20">RX 9070 XT</a></strong> if gaming and LLM inference are both important. This card makes more sense when local AI is one workload among several and you are comfortable checking backend support.</p></blockquote><blockquote><p>Buy the <strong>RX 9070 XT</strong> if it is cheaper and you are comfortable with Linux. At equal prices, NVIDIA wins for convenience. 
At a meaningful AMD discount, the value math changes.</p></blockquote><blockquote><p>Skip both if you can afford a good 24GB card. For serious local LLM use, 24GB is a much better tier. A clean used RTX 3090 can beat both cards for local AI practicality when the price is right.</p></blockquote><blockquote><p>Do not overpay for 16GB. The RTX 5060 Ti 16GB is useful, but it is still a compromise. If prices climb too close to used 24GB NVIDIA cards, compare again before buying.</p></blockquote><div><hr></div><h3>FAQ</h3><h4>Is the RTX 5060 Ti 16GB good for local LLMs?</h4><blockquote><p>Yes. The RTX 5060 Ti 16GB is a good first local LLM card because it combines 16GB of VRAM with CUDA support. You will still use quantized models, and you will still make tradeoffs, but the software path is easier than most non-NVIDIA options.</p><div><hr></div></blockquote><h4>Is the RX 9070 XT bad for local AI?</h4><blockquote><p>No. The RX 9070 XT is much more credible for local AI than older AMD gaming cards, especially on Linux with ROCm. AMD&#8217;s ROCm documentation lists the RX 9070 XT in its current Linux support material, and Ollama documents RX 9070 XT support through ROCm on Linux.</p><p>The issue is not whether it can work. The issue is whether it is the easiest first GPU for a beginner. For most Windows users, NVIDIA still wins that part.</p><div><hr></div></blockquote><h4>Which card is better for ComfyUI?</h4><blockquote><p>The RTX 5060 Ti 16GB is the safer pick. ComfyUI supports AMD paths, including experimental support for RX 9000 series hardware, but NVIDIA CUDA remains the easier default for most image-generation workflows.</p><div><hr></div></blockquote><h4>Which card is better for gaming?</h4><blockquote><p>The RX 9070 XT is the better gaming card. 
AMD lists a 256-bit memory interface and up to 640 GB/s of bandwidth for the RX 9070 XT, while PNY&#8217;s RTX 5060 Ti 16GB listing shows a 128-bit bus and 448 GB/s of bandwidth.</p><p>If gaming matters as much as AI, that stronger Radeon hardware becomes a real reason to consider the AMD card.</p><div><hr></div></blockquote><h4>Is 16GB enough for local AI in 2026?</h4><blockquote><p>It is enough to start, and it is much better than 8GB or 12GB for local LLM flexibility. It is not the ideal comfort tier. If you plan to run larger models often, 24GB or more should be your long-term target.</p><div><hr></div></blockquote><h4>Should you buy AMD for local AI on Windows?</h4><blockquote><p>Only if you are comfortable troubleshooting. Ollama documents AMD and Vulkan support paths, and AMD support is improving, but the cleaner beginner path on Windows remains NVIDIA CUDA.</p><div><hr></div></blockquote><div class="callout-block" data-callout="true"><h3>Final recommendation</h3><p><a href="https://www.amazon.com/dp/B0F4Y6N6PW?tag=popularai-20">Buy the </a><strong><a href="https://www.amazon.com/dp/B0F4Y6N6PW?tag=popularai-20">RTX 5060 Ti 16GB</a></strong> if your main goal is trying local LLMs for the first time, especially on Windows. It is the safer CUDA choice, and that matters more than raw specs when you are still learning the local AI stack.</p><p><a href="https://www.amazon.com/dp/B0DS2QG2KW?tag=popularai-20">Buy the </a><strong><a href="https://www.amazon.com/dp/B0DS2QG2KW?tag=popularai-20">RX 9070 XT</a></strong> if you also care about gaming, if you are comfortable with Linux or AMD backend tinkering, and if the price is close enough to make the stronger hardware appealing.</p><p>At <strong>$520 for the RTX 5060 Ti 16GB versus $560 for the RX 9070 XT</strong>, the first-time local AI answer is still NVIDIA. If the <strong>RX 9070 XT is cheaper than the RTX 5060 Ti 16GB</strong>, the AMD card becomes a reasonable inference-first gamble.</p><p>For current U.S. 
buying options, compare the <a href="https://www.amazon.com/dp/B0F4Y6N6PW?tag=popularai-20">RTX 5060 Ti 16GB</a>, the <a href="https://www.amazon.com/dp/B0DS2QG2KW?tag=popularai-20">RX 9070 XT</a>, the <a href="https://www.amazon.com/dp/B0DS6V1YSY?tag=popularai-20">RTX 5070</a>, and a used-market <a href="https://www.amazon.com/dp/B08J6GMWCQ?tag=popularai-20">RTX 3090</a> before you buy. Prices change fast, and the best local AI GPU is often the one that gives you enough VRAM, the least software friction, and the fewest regrets for your actual workload.</p></div><p><em>Disclosure: Amazon affiliate links are included in this guide. Popular AI may earn from qualifying purchases.</em></p><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.popularai.org/p/rtx-5060-ti-16gb-vs-rx-9070-xt-local-ai/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.popularai.org/p/rtx-5060-ti-16gb-vs-rx-9070-xt-local-ai/comments"><span>Leave a comment</span></a></p><div><hr></div><p style="text-align: center;"><em><strong>Explore more from Popular AI:</strong></em></p><p style="text-align: center;"><strong><a href="https://popularai.org/t/start-here">Start here</a> | <a href="https://popularai.org/t/local-ai">Local AI</a> | <a href="https://popularai.org/t/walkthroughs">Fixes &amp; guides</a> | <a href="https://popularai.org/t/ai-builds-gear">Builds &amp; gear</a> | <a href="https://popularai.org/t/popular-ai-podcast">Popular AI podcast</a></strong></p>]]></content:encoded></item><item><title><![CDATA[ComfyUI extra_model_paths.yaml not working? Here’s the fix]]></title><description><![CDATA[ComfyUI not finding checkpoints, LoRAs, VAEs, or upscalers? 
Here&#8217;s how to repair extra_model_paths.yaml and confirm it works.]]></description><link>https://www.popularai.org/p/fix-comfyui-extra-model-paths-yaml-not-working</link><guid isPermaLink="false">https://www.popularai.org/p/fix-comfyui-extra-model-paths-yaml-not-working</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Mon, 08 Jun 2026 14:03:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!OswA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5706dad9-34b8-4938-8340-cf093f45e266_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OswA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5706dad9-34b8-4938-8340-cf093f45e266_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OswA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5706dad9-34b8-4938-8340-cf093f45e266_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!OswA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5706dad9-34b8-4938-8340-cf093f45e266_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!OswA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5706dad9-34b8-4938-8340-cf093f45e266_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!OswA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5706dad9-34b8-4938-8340-cf093f45e266_1672x941.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OswA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5706dad9-34b8-4938-8340-cf093f45e266_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5706dad9-34b8-4938-8340-cf093f45e266_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1761435,&quot;alt&quot;:&quot;Fix ComfyUI model paths without moving your entire library&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200922223?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5706dad9-34b8-4938-8340-cf093f45e266_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Fix ComfyUI model paths without moving your entire library" title="Fix ComfyUI model paths without moving your entire library" srcset="https://substackcdn.com/image/fetch/$s_!OswA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5706dad9-34b8-4938-8340-cf093f45e266_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!OswA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5706dad9-34b8-4938-8340-cf093f45e266_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!OswA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5706dad9-34b8-4938-8340-cf093f45e266_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!OswA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5706dad9-34b8-4938-8340-cf093f45e266_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Fix ComfyUI <code>extra_model_paths.yaml</code> errors by checking the right config file, YAML structure, folder keys, logs, and Desktop path issues. &#169; Popular AI</figcaption></figure></div><p>If ComfyUI <code>extra_model_paths.yaml</code> is not working, do not start by reinstalling ComfyUI or moving hundreds of gigabytes of models. 
The usual cause is much simpler: the file is in the wrong place, the YAML structure is wrong, the Desktop app is reading a different config file, or a node is looking for a more specific folder key than the one you mapped.</p><p>The fix is to confirm which config file ComfyUI is actually loading, use the official nested YAML format, map each model type to the exact folder key ComfyUI expects, then restart and check the logs for <code>Adding extra search path</code>.</p><div><hr></div><h3>Quick answer</h3><p>The fastest fix is:</p><ol><li><p>Find the config file ComfyUI is actually using.</p></li><li><p>Back it up.</p></li><li><p>Make sure the file has a top-level profile name like <code>comfyui:</code>.</p></li><li><p>Put <code>base_path</code> and folder mappings under that profile.</p></li><li><p>Use exact folder keys, such as <code>checkpoints</code>, <code>loras</code>, <code>vae</code>, <code>text_encoders</code>, <code>diffusion_models</code>, <code>upscale_models</code>, and <code>latent_upscale_models</code>.</p></li><li><p>Restart ComfyUI.</p></li><li><p>Check the startup log for <code>Adding extra search path</code>.</p></li></ol><p>ComfyUI&#8217;s own example says to rename <code>extra_model_paths.yaml.example</code> to <code>extra_model_paths.yaml</code> and edit it to set model search paths. 
The same example also shows that <code>base_path</code> belongs under a named section, not at the top level by itself <a href="https://github.com/Comfy-Org/ComfyUI/blob/master/extra_model_paths.yaml.example">in the official sample file</a>.</p><div><hr></div><h3>What this error means</h3><p>ComfyUI does not simply scan every model file on your drive. It uses folder keys to decide where each loader should look. A checkpoint loader checks <code>checkpoints</code>. A LoRA loader checks <code>loras</code>. A latent upscaler may check <code>latent_upscale_models</code>.</p><p>That detail is what trips people up. A model can be on the right disk, inside the right general model library, and still be invisible to the node that needs it. The file has to sit in a path registered under the folder key that the loader actually uses.</p><p>The current ComfyUI path loader reads the YAML file, loops through each top-level config section, pulls settings such as <code>base_path</code> and <code>is_default</code>, then registers the mapped folders through <code>folder_paths.add_model_folder_path</code> <a href="https://github.com/Comfy-Org/ComfyUI/blob/master/utils/extra_config.py">inside </a><code>utils/extra_config.py</code>. 
If the YAML shape is wrong, ComfyUI can fail at startup, ignore the path you expected, or leave a node unable to find the model.</p><h3>Common causes</h3><p>The most common causes are:</p><ol><li><p>The file is still named <code>extra_model_paths.yaml.example</code>.</p></li><li><p>The file is in the wrong folder.</p></li><li><p>ComfyUI Desktop is using <code>extra_models_config.yaml</code> from an app data directory instead of the file you edited.</p></li><li><p>The YAML is missing a top-level profile name.</p></li><li><p>The indentation is wrong.</p></li><li><p>You mapped a model folder under the wrong key.</p></li><li><p>You used <code>upscale_models</code> when the node wants <code>latent_upscale_models</code>.</p></li><li><p>The model download UI saved the file to your browser downloads folder instead of your configured server path.</p></li><li><p>A custom node or workflow is using a folder key that your config does not define.</p></li></ol><div><hr></div><h4><em><strong>More on ComfyUI troubleshooting:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;0c53729a-f62c-47b4-94ee-f1428c9c5b80&quot;,&quot;caption&quot;:&quot;ComfyUI update problems feel chaotic when you are the one living through them. One day a workflow runs fine. 
The next day you update ComfyUI, install a custom node, or refresh a Python package be&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Why ComfyUI updates break workflows and how to fix them&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-17T15:33:00.000Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!W1JS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7610438-53b9-49c6-b9f6-9f7ece4fa14f_2560x1567.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/why-comfyui-updates-break-workflows-and-how-to-fix-them&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:191513180,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:0,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>Fix 
1: Confirm which config file ComfyUI is loading</h3><p>For the Windows portable or manual install, the normal path is:</p><pre><code><code>ComfyUI/extra_model_paths.yaml</code></code></pre><p>The official ComfyUI README says that, in the standalone Windows build, users can find the config file in the ComfyUI directory, rename it to <code>extra_model_paths.yaml</code>, and edit it to set model search paths <a href="https://github.com/Comfy-Org/ComfyUI#how-do-i-share-models-between-another-ui-and-comfyui">for sharing models between UIs</a>.</p><p>For ComfyUI Desktop, do not assume it is reading the same file. In one Desktop issue report, the launch arguments showed ComfyUI being started with an explicit <code>--extra-model-paths-config</code> value pointing to:</p><pre><code><code>C:\Users\[name]\AppData\Roaming\ComfyUI\extra_models_config.yaml</code></code></pre><p>That path was visible in the user&#8217;s startup arguments <a href="https://github.com/Comfy-Org/ComfyUI/issues/11242">in ComfyUI issue #11242</a>. The same report involved a latent upscale loader error where ComfyUI was looking for a model in <code>latent_upscale_models</code>, which is a useful reminder that Desktop path issues and folder-key issues can overlap.</p><p>Look in your startup log for something like:</p><pre><code><code>--extra-model-paths-config C:\Users\...\extra_models_config.yaml</code></code></pre><p>If that appears, edit that file instead of guessing. This is especially important after Desktop updates, because the app wrapper can control the launch arguments and user directory.</p><h3>Fix 2: Use the correct YAML structure</h3><p>A common broken version looks like this:</p><pre><code><code>base_path: D:/AI/
checkpoints: models/checkpoints/
loras: models/loras/
vae: models/vae/</code></code></pre><p>That looks reasonable, but it is wrong for ComfyUI&#8217;s expected config shape. A user reported this kind of top-level structure causing:</p><pre><code><code>TypeError: string indices must be integers, not 'str'</code></code></pre><p>The public bug report shows ComfyUI failing while loading <code>extra_model_paths.yaml</code> because the config was shaped like direct key-value pairs instead of a nested section <a href="https://github.com/Comfy-Org/ComfyUI/issues/11404">in issue #11404</a>.</p><p>Use this structure instead:</p><pre><code><code>comfyui:
  base_path: D:/AI/
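  # Folder values below are relative to base_path, so checkpoints resolves to D:/AI/models/checkpoints/.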
  is_default: true

  checkpoints: models/checkpoints/
  loras: models/loras/
  vae: models/vae/
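  # For the multi-path keys below, "|" starts a YAML literal block, and each line under it is registered as its own search path.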

  text_encoders: |
    models/text_encoders/
    models/clip/

  diffusion_models: |
    models/diffusion_models/
    models/unet/

  upscale_models: models/upscale_models/
  latent_upscale_models: models/latent_upscale_models/
  clip_vision: models/clip_vision/
  controlnet: models/controlnet/
  embeddings: models/embeddings/
  audio_encoders: models/audio_encoders/
  model_patches: models/model_patches/</code></code></pre><p>The top-level name can be <code>comfyui</code>, <code>a1111</code>, <code>my_models</code>, or another label. The important part is that <code>base_path</code> and the folder keys sit under that label.</p><p>That nesting is what lets ComfyUI treat the first level as a profile, then read the folder mappings inside it. Without that profile layer, values like <code>base_path</code> can be treated like sections, which leads to confusing startup errors.</p><h3>Fix 3: Use exact folder keys</h3><p>ComfyUI has added and renamed folder conventions over time. The official example currently includes <code>text_encoders</code>, while still noting the legacy <code>models/clip/</code> location. It also includes <code>diffusion_models</code>, with <code>models/unet</code> listed as a supported older location <a href="https://github.com/Comfy-Org/ComfyUI/blob/master/extra_model_paths.yaml.example">in the example config</a>.</p><p>That means this is safer:</p><pre><code><code>text_encoders: |
  models/text_encoders/
  models/clip/</code></code></pre><p>The same pattern applies to diffusion models:</p><pre><code><code>diffusion_models: |
  models/diffusion_models/
  models/unet/</code></code></pre><p>Do not rely on old folder names alone when a newer workflow expects the newer key. Keeping both the current and legacy folder paths under the same key can make older workflows and newer workflows easier to run from one shared model library.</p><p>This also helps when a workflow was downloaded from someone else&#8217;s setup. Their node may expect a newer folder key even if your existing model library still uses an older folder name.</p><h3>Fix 4: Separate <code>latent_upscale_models</code> from <code>upscale_models</code></h3><p>This one is easy to miss.</p><p>A user reported that the LTX-2 latent upscaler was not being found even though their logs showed ComfyUI adding model paths. The fix in the issue comments was to stop placing the latent upscaler folder under <code>upscale_models</code> and define a separate <code>latent_upscale_models</code> key instead <a href="https://github.com/Comfy-Org/ComfyUI/issues/12004">in issue #12004</a>.</p><p>Use this:</p><pre><code><code>upscale_models: models/upscale_models/
latent_upscale_models: models/latent_upscale_models/</code></code></pre><p>Do not use this for latent upscalers:</p><pre><code><code>upscale_models: |
  models/upscale_models/
  models/latent_upscale_models/</code></code></pre><p>That may register both folders as regular upscale model paths, but a node that calls <code>latent_upscale_models</code> still will not find what it needs.</p><p>The key lesson is simple: a folder path can be valid and still be registered under the wrong category. When a loader says a model is missing, check the exact folder name in the error message. That folder name is usually the key you need in <code>extra_model_paths.yaml</code>.</p><h3>Fix 5: Use stable path formatting</h3><p>On Windows, use forward slashes in YAML paths. They work in Python paths and avoid accidental escape problems.</p><p>Use this:</p><pre><code><code>base_path: D:/AI/</code></code></pre><p>Avoid this unless you know exactly how YAML is parsing it:</p><pre><code><code>base_path: "D:\AI\"</code></code></pre><p>Also check:</p><ul><li><p>Use spaces, not tabs.<br></p></li><li><p>Keep indentation consistent.<br></p></li><li><p>Add trailing slashes for readability.<br></p></li><li><p>Do not put comments after paths inside multi-line blocks.<br></p></li><li><p>Make sure every mapped folder exists.<br></p></li><li><p>Make sure the model file extension is supported by that loader.<br></p></li></ul><p>YAML is strict about indentation, and path typos are easy to miss. If a folder is supposed to be relative to <code>base_path</code>, confirm that the combined path exists exactly as written. For example, <code>base_path: D:/AI/</code> plus <code>checkpoints: models/checkpoints/</code> points to <code>D:/AI/models/checkpoints/</code>.</p><h3>Fix 6: Restart and check the logs</h3><p>After editing the config, restart ComfyUI completely. 
A browser refresh is not enough.</p><p>For the Windows portable build, run this from the portable folder:</p><pre><code><code>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build</code></code></pre><p>For a manual install, run this from the ComfyUI folder:</p><pre><code><code>python main.py</code></code></pre><p>During startup, look for lines like:</p><pre><code><code>Adding extra search path checkpoints D:\AI\models\checkpoints
Adding extra search path loras D:\AI\models\loras
Adding extra search path latent_upscale_models D:\AI\models\latent_upscale_models</code></code></pre><p>If those lines do not appear, ComfyUI is not reading your file, the YAML is invalid, or you edited the wrong config.</p><p>Those startup lines are the most useful test because they prove ComfyUI loaded the config and registered the folder key. A model missing from the dropdown after those lines appear usually points to a category mismatch, unsupported file type, filename issue, or node-specific loader behavior.</p><h3>Fix 7: Rule out custom nodes only after the path file is correct</h3><p>If the config file is valid and the logs show the right paths, then test without custom nodes. ComfyUI&#8217;s official troubleshooting guide recommends starting ComfyUI with:</p><pre><code><code>python main.py --disable-all-custom-nodes</code></code></pre><p>For the Windows portable build, the docs show this form:</p><pre><code><code>.\python_embeded\python.exe -s ComfyUI\main.py --disable-all-custom-nodes</code></code></pre><p>ComfyUI&#8217;s documentation says that if the issue disappears with custom nodes disabled, a custom node is probably involved. If the issue persists, it is likely not a custom node issue <a href="https://docs.comfy.org/troubleshooting/custom-node-issues">according to the troubleshooting guide</a>.</p><p>This step belongs after the config check, not before it. 
Otherwise, you can waste time blaming a custom node when the real problem is a missing folder key or the Desktop app reading a different YAML file.</p><h3>Fix 8: Do not trust the missing-model download button to place files correctly</h3><p>If a workflow says models are missing and offers a download button, check where the file actually went.</p><p>An open ComfyUI issue reports that the Workflow Overview missing-models panel can trigger a browser download into the OS downloads folder instead of saving the model to the server-side folder configured in <code>extra_model_paths.yaml</code> <a href="https://github.com/Comfy-Org/ComfyUI/issues/13676">in issue #13676</a>.</p><p>The workaround from that issue is simple: copy the URL and download the file directly into the correct model folder.</p><p>Example:</p><pre><code><code>wget -P /mnt/ai/models/diffusion_models "MODEL_URL_HERE"</code></code></pre><p>On Windows, download manually into the mapped folder, such as:</p><pre><code><code>D:\AI\models\diffusion_models</code></code></pre><p>Then restart ComfyUI or refresh the model list if the node supports it.</p><p>This matters most for remote setups, network storage, NAS paths, and headless servers. In those cases, a browser download may land on your local machine while ComfyUI is running somewhere else.</p><h3>Fix 9: Be careful with ComfyUI Desktop updates</h3><p>ComfyUI Desktop and ComfyUI Core use different version numbers. 
As of June 6, 2026, ComfyUI Core&#8217;s latest GitHub release page showed <code>v0.24.0</code> dated June 3, 2026 <a href="https://github.com/Comfy-Org/ComfyUI/releases">on the core releases page</a>, while ComfyUI Desktop&#8217;s releases page showed <code>v0.9.4</code> dated May 28, 2026 and bundling ComfyUI <code>v0.22.3</code> <a href="https://github.com/Comfy-Org/desktop/releases">on the Desktop releases page</a>.</p><p>That matters because a Desktop update can change the bundled core, launch arguments, user directory, base directory, or config location.</p><p>ComfyUI&#8217;s frontend team also acknowledged in March 2026 that a number of releases had regressions, with workflows breaking and previously working behavior failing, then announced a stability-focused pause and stricter release gates <a href="https://github.com/Comfy-Org/ComfyUI_frontend/issues/10585">in issue #10585</a>. That does not prove every <code>extra_model_paths.yaml</code> issue is caused by a regression, but it does make one habit worth keeping: always save a copy of a working config before updating.</p><p>A backed-up config is especially useful if a Desktop update changes where the app looks for <code>extra_models_config.yaml</code>. When a setup breaks after an update, compare the new startup log with the last known-good startup log before moving any model folders.</p><div><hr></div><h3>Working example for a shared model folder</h3><p>Use this if your models live in <code>D:\AI\models</code>:</p><pre><code><code>shared_ai_models:
  base_path: D:/AI/
  is_default: true
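  # is_default: true asks ComfyUI to list these folders first and treat them as the default location (per the comment in the official example config).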

  checkpoints: models/checkpoints/
  configs: models/configs/
  vae: models/vae/
  loras: models/loras/
  embeddings: models/embeddings/
  clip_vision: models/clip_vision/
  controlnet: models/controlnet/

  text_encoders: |
    models/text_encoders/
    models/clip/

  diffusion_models: |
    models/diffusion_models/
    models/unet/

  upscale_models: models/upscale_models/
  latent_upscale_models: models/latent_upscale_models/
  audio_encoders: models/audio_encoders/
  model_patches: models/model_patches/</code></code></pre><p>Folder layout:</p><pre><code><code>D:\AI
&#9492;&#9472;&#9472; models
    &#9500;&#9472;&#9472; checkpoints
    &#9500;&#9472;&#9472; configs
    &#9500;&#9472;&#9472; vae
    &#9500;&#9472;&#9472; loras
    &#9500;&#9472;&#9472; embeddings
    &#9500;&#9472;&#9472; clip_vision
    &#9500;&#9472;&#9472; controlnet
    &#9500;&#9472;&#9472; text_encoders
    &#9500;&#9472;&#9472; clip
    &#9500;&#9472;&#9472; diffusion_models
    &#9500;&#9472;&#9472; unet
    &#9500;&#9472;&#9472; upscale_models
    &#9500;&#9472;&#9472; latent_upscale_models
    &#9500;&#9472;&#9472; audio_encoders
    &#9492;&#9472;&#9472; model_patches</code></code></pre><p>This layout keeps the model library outside the ComfyUI install folder while still giving ComfyUI a clean map for each loader type. It also makes upgrades less painful because the app can be replaced without moving your checkpoints, LoRAs, VAEs, diffusion models, or upscalers.</p><h3>How to confirm it is fixed</h3><p>Your setup is fixed when all three are true:</p><ol><li><p>ComfyUI starts without a YAML traceback.</p></li><li><p>The startup log shows <code>Adding extra search path</code> for the folders you mapped.</p></li><li><p>The relevant loader node lists the model in its dropdown.</p></li></ol><p>If the log shows the path but the model still does not appear, check whether the model sits in the folder mapped to the key that exact node reads.</p><p>A <code>Load Checkpoint</code> node will not list a model stored only under <code>diffusion_models</code>. A latent upscaler will not list a model stored only under <code>upscale_models</code>. The folder key has to match the node.</p><p>When in doubt, read the error message literally. If it says the model is missing from <code>latent_upscale_models</code>, add or fix the <code>latent_upscale_models</code> mapping. If it says a checkpoint is missing, check <code>checkpoints</code>. 
The log and the loader name usually tell you where to look next.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1LUg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa080853-b340-4aed-ac8c-96908becd411_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1LUg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa080853-b340-4aed-ac8c-96908becd411_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!1LUg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa080853-b340-4aed-ac8c-96908becd411_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!1LUg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa080853-b340-4aed-ac8c-96908becd411_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!1LUg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa080853-b340-4aed-ac8c-96908becd411_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1LUg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa080853-b340-4aed-ac8c-96908becd411_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aa080853-b340-4aed-ac8c-96908becd411_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1829391,&quot;alt&quot;:&quot;Why ComfyUI can&#8217;t find your models and how to fix it&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200922223?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa080853-b340-4aed-ac8c-96908becd411_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Why ComfyUI can&#8217;t find your models and how to fix it" title="Why ComfyUI can&#8217;t find your models and how to fix it" srcset="https://substackcdn.com/image/fetch/$s_!1LUg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa080853-b340-4aed-ac8c-96908becd411_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!1LUg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa080853-b340-4aed-ac8c-96908becd411_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!1LUg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa080853-b340-4aed-ac8c-96908becd411_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!1LUg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa080853-b340-4aed-ac8c-96908becd411_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div 
class="pencraft pc-display-flex pc-gap-8 pc-reset"></div></div></div></a><figcaption class="image-caption">Solve ComfyUI model path problems with the right YAML format, exact folder keys, Desktop config checks, and startup log validation. &#169; Popular AI</figcaption></figure></div><h3>Privacy, security, and local-control notes</h3><p>Local ComfyUI setups are valuable because they keep model files, workflows, and outputs on hardware you control. 
ComfyUI&#8217;s core can work fully offline and will not download anything unless the user chooses to, while optional API nodes can access paid external providers and can be disabled with <code>--disable-api-nodes</code>, <a href="https://github.com/Comfy-Org/ComfyUI">according to the ComfyUI README</a>.</p><p>For production or client work:</p><ul><li><p>Keep a backup of <code>extra_model_paths.yaml</code>.<br></p></li><li><p>Keep a copy outside the ComfyUI install folder.<br></p></li><li><p>Avoid editing <code>folder_paths.py</code> unless you are debugging or developing.<br></p></li><li><p>Prefer config files over source patches.<br></p></li><li><p>Save a known-good ComfyUI version before updating.<br></p></li><li><p>Keep your models in a dedicated folder outside app-managed directories.<br></p></li><li><p>Disable API nodes if you want to avoid accidental hosted-model calls.<br></p></li></ul><p>A clean model path setup is also easier to audit. If every model type has a clear folder, it is much simpler to back up configs, sync model libraries between machines, and recover after a failed app update.</p><div><hr></div><h3>FAQ</h3><h4>Where should <code>extra_model_paths.yaml</code> go?</h4><blockquote><p>For portable and manual installs, put it in the main ComfyUI folder. For ComfyUI Desktop, check the startup log for <code>--extra-model-paths-config</code>, because Desktop may point to a different config file.</p><div><hr></div></blockquote><h4>Why does ComfyUI say <code>TypeError: string indices must be integers, not 'str'</code>?</h4><blockquote><p>The most likely cause is a config with the wrong shape. The file can be valid YAML and still fail if <code>base_path</code> and the folder keys sit at the top level. Put them under a top-level profile name like <code>comfyui:</code>.</p><div><hr></div></blockquote><h4>Why does the model appear in one node but not another?</h4><blockquote><p>Different nodes search different folder keys. A regular upscaler and a latent upscaler can look in different registered folders. 
Put latent upscalers under <code>latent_upscale_models</code>.</p><div><hr></div></blockquote><h4>Can I use symlinks instead?</h4><blockquote><p>Yes, symlinks can work, but they add another failure point. Use <code>extra_model_paths.yaml</code> first if you want a cleaner setup that survives moving or sharing model folders.</p><div><hr></div></blockquote><h4>Should I edit <code>folder_paths.py</code>?</h4><blockquote><p>Usually no. Use <code>extra_model_paths.yaml</code> unless you are debugging ComfyUI itself. Source edits can be overwritten by updates and make future troubleshooting harder.</p><div><hr></div></blockquote><h4>Why did this break after updating ComfyUI Desktop?</h4><blockquote><p>Desktop releases can change bundled ComfyUI versions and launch arguments. Check the Desktop release notes, check which config file is being passed at startup, and compare your current config with your last working copy.</p><div><hr></div></blockquote><div class="callout-block" data-callout="true"><h3>Final recommendation</h3><p>Fix <code>extra_model_paths.yaml</code> at the config level first. Do not move your whole model library and do not patch ComfyUI source unless you have confirmed a real core bug.</p><p>The safest working setup is a dedicated model folder, a backed-up YAML file, exact folder keys, and a startup-log check after every update. 
That keeps your ComfyUI workflow local, portable, and easier to recover when a desktop wrapper or core update changes behavior.</p></div>]]></content:encoded></item><item><title><![CDATA[How to fix Ollama CPU offloading and slow inference]]></title><description><![CDATA[Ollama suddenly slow? 
Learn how to diagnose CPU offloading, KV cache bloat, context size, and GPU layer splits.]]></description><link>https://www.popularai.org/p/fix-ollama-cpu-offloading-slow-inference</link><guid isPermaLink="false">https://www.popularai.org/p/fix-ollama-cpu-offloading-slow-inference</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Sun, 07 Jun 2026 14:14:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Dvbf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceaf617d-f635-4907-bc6d-a4e6cb5295ed_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Dvbf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceaf617d-f635-4907-bc6d-a4e6cb5295ed_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Dvbf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceaf617d-f635-4907-bc6d-a4e6cb5295ed_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!Dvbf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceaf617d-f635-4907-bc6d-a4e6cb5295ed_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!Dvbf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceaf617d-f635-4907-bc6d-a4e6cb5295ed_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!Dvbf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceaf617d-f635-4907-bc6d-a4e6cb5295ed_1672x941.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Dvbf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceaf617d-f635-4907-bc6d-a4e6cb5295ed_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ceaf617d-f635-4907-bc6d-a4e6cb5295ed_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2304239,&quot;alt&quot;:&quot;Fix slow Ollama inference by checking GPU, VRAM, and context&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200919624?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceaf617d-f635-4907-bc6d-a4e6cb5295ed_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Fix slow Ollama inference by checking GPU, VRAM, and context" title="Fix slow Ollama inference by checking GPU, VRAM, and context" srcset="https://substackcdn.com/image/fetch/$s_!Dvbf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceaf617d-f635-4907-bc6d-a4e6cb5295ed_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!Dvbf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceaf617d-f635-4907-bc6d-a4e6cb5295ed_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!Dvbf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceaf617d-f635-4907-bc6d-a4e6cb5295ed_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Dvbf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceaf617d-f635-4907-bc6d-a4e6cb5295ed_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Diagnose Ollama CPU offloading, context bloat, GPU memory pressure, and slow local LLM performance before upgrading your hardware. 
&#169; Popular AI</figcaption></figure></div><p>If Ollama starts fast and then drops to painfully slow inference, the problem is usually not &#8220;Ollama is broken.&#8221; The likely cause is that the model, its context window, or the runtime KV cache no longer fits cleanly in GPU memory, so work spills into CPU and system RAM. That turns a local LLM from usable to miserable.</p><p>The fix is to confirm the processor split with <code>ollama ps</code>, reduce context length, use a smaller or more efficient quantization, and check Ollama logs before reinstalling anything. <a href="https://docs.ollama.com/context-length">Ollama&#8217;s own context-length docs</a> say larger context windows require more memory and recommend avoiding CPU offload for best performance.</p><div><hr></div><h4><em><strong>More on bottlenecks with local LLMs:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;98337a8f-1e3f-48f6-afbf-3445e711c3ea&quot;,&quot;caption&quot;:&quot;Local inference sounds simple on paper. Download a model, point Ollama or llama.cpp at your GPU, and start chatting. Then the trap shows up. 
The model loads, but replies dribble out one token at a time, the first t&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Why Ollama and llama.cpp crawl when models spill into RAM, and how to fix it&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-16T15:15:00.000Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!fHgx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91019711-eaa2-4daf-b3f9-6b77a7229c81_2560x1369.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/why-ollama-and-llama-cpp-crawl-when-models-spill-into-ram-and-how-to-fix-it&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:191486166,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular 
AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>Quick answer</h3><p>Run this first:</p><pre><code><code>ollama ps</code></code></pre><p>Look at the <code>PROCESSOR</code> column. If it does not say <code>100% GPU</code>, you are offloading at least part of the workload to CPU. <a href="https://docs.ollama.com/context-length">Ollama&#8217;s docs</a> specifically recommend checking the processor split with <code>ollama ps</code> when diagnosing context length and model offloading.</p><p>Then try the safe fixes in this order:</p><ol><li><p>Lower the context length.</p></li><li><p>Use a smaller quantized model.</p></li><li><p>Stop other GPU-heavy apps.</p></li><li><p>Check logs with debug enabled.</p></li><li><p>Update your GPU driver.</p></li><li><p>Only then consider more VRAM, more unified memory, or a different model.</p></li></ol><p>Do not start by reinstalling Ollama. Most slowdowns come from memory pressure, GPU discovery problems, or context settings.</p><div><hr></div><h3>What this problem means</h3><p>Ollama runs local models by loading model weights and runtime data into available hardware memory. When enough GPU memory is available, inference can stay on the GPU. When the model or context cannot fit, Ollama may run part of the workload on CPU or system RAM.</p><p>That fallback is functional, but slow.</p><p>The key source of confusion is that a model can fit at first, then slow down later as the context grows. A short chat with an 8K context may run well. 
A long coding session, RAG workflow, or agent task can push memory use higher because the model needs to keep more tokens available in memory.</p><p><a href="https://docs.ollama.com/context-length">Ollama&#8217;s current context-length page</a> says context length is the number of tokens the model can access in memory, and that increasing context length increases the memory required to run the model.</p><h3>Common causes</h3><h4>The model does not fully fit in VRAM</h4><p>A model that is too large for your GPU may still run, but not fully on the GPU. That means slower generation.</p><h4>The context window is too large</h4><p><a href="https://docs.ollama.com/context-length">Ollama now defaults context length based on available VRAM</a>: 4K context below 24 GiB VRAM, 32K context for 24 to 48 GiB, and 256K context for 48 GiB or more. The same page says large-context tasks like agents, web search, and coding tools should use at least 64K tokens, but it also warns that larger context length requires more memory.</p><p>That is the tradeoff. More context gives the model more working memory, but it can push the workload out of GPU memory.</p><h4>GPU discovery failed</h4><p>If Ollama cannot properly detect or initialize your GPU, it may fall back to CPU. <a href="https://docs.ollama.com/troubleshooting">Ollama&#8217;s troubleshooting docs</a> say it inventories GPUs at startup and recommends current NVIDIA drivers when discovery fails.</p><h4>AMD driver or ROCm support is wrong</h4><p><a href="https://docs.ollama.com/gpu">Ollama&#8217;s hardware support page</a> says AMD support depends on ROCm and lists supported AMD GPUs. 
Its <a href="https://docs.ollama.com/troubleshooting">troubleshooting page</a> also says AMD driver mismatches can cause GPU discovery failures and CPU fallback.</p><h4>The slowdown is hidden in logs</h4><p><a href="https://github.com/ollama/ollama/issues/14258">A GitHub issue opened on February 14, 2026</a> describes user confusion around GPU-to-CPU fallback, including cases where debug logs are the only clear indication that GPU layers failed to fit or were reduced.</p><p>That issue is not an official fix by itself. It is useful evidence of the failure mode users are seeing.</p><div><hr></div><h3>Fix 1: Check whether Ollama is actually using the GPU</h3><p>Run:</p><pre><code><code>ollama ps</code></code></pre><p>Expected good result:</p><pre><code><code>NAME             ID              SIZE      PROCESSOR    CONTEXT
gemma3:latest    a2af6cc3eb7f    6.6 GB    100% GPU     65536</code></code></pre><p><a href="https://docs.ollama.com/context-length">Ollama&#8217;s docs</a> show this exact kind of output and say to verify model offloading under <code>PROCESSOR</code>.</p><p>If you see a CPU/GPU split, or CPU only, the model is not fully running on the GPU.</p><h4>What to do next</h4><p>If the model is partially on CPU, reduce context length first. If it is CPU only, check GPU discovery and driver support.</p><h3>Fix 2: Reduce context length</h3><p>Context is useful, but it is not free. Large context windows can quietly eat memory and push the model into CPU offload.</p><p>Try a lower context length:</p><pre><code><code>OLLAMA_CONTEXT_LENGTH=4096 ollama serve</code></code></pre><p>Or use a more moderate setting:</p><pre><code><code>OLLAMA_CONTEXT_LENGTH=8192 ollama serve</code></code></pre><p>For heavier workflows, try stepping up gradually:</p><pre><code><code>OLLAMA_CONTEXT_LENGTH=16384 ollama serve</code></code></pre><p>Then run the model again and check:</p><pre><code><code>ollama ps</code></code></pre><p>If the <code>PROCESSOR</code> column improves from mixed CPU/GPU to <code>100% GPU</code>, context length was the issue.</p><p><a href="https://docs.ollama.com/context-length">Ollama documents</a> <code>OLLAMA_CONTEXT_LENGTH=64000 ollama serve</code> as a way to set context length when serving, but it also warns that larger context increases memory requirements.</p><h3>Fix 3: Create a lower-context Modelfile</h3><p>For a persistent model profile, create a <code>Modelfile</code>:</p><pre><code><code>FROM llama3.1:8b
PARAMETER num_ctx 4096</code></code></pre><p>Then create a custom model:</p><pre><code><code>ollama create llama3.1-8b-4k -f ./Modelfile
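# Optional sanity check (an addition, not part of the Ollama docs quoted here):
# print the stored parameters and confirm num_ctx is 4096 before running.
ollama show --parameters llama3.1-8b-4k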
ollama run llama3.1-8b-4k</code></code></pre><p><a href="https://docs.ollama.com/modelfile">Ollama&#8217;s Modelfile reference</a> says <code>PARAMETER num_ctx</code> sets the context window used to generate the next token, and its example shows <code>PARAMETER num_ctx 4096</code>.</p><p>Use this when you want a safe &#8220;fast profile&#8221; for daily work.</p><h3>Fix 4: Use a smaller or more efficient model</h3><p>If lowering context length does not solve the slowdown, the model itself may be too large for your hardware.</p><p>Try a smaller model:</p><pre><code><code>ollama pull llama3.1:8b
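# Optional (an addition to the original steps): list local models and compare
# the SIZE column with your free VRAM. Treat the file size as a lower bound,
# since context and KV cache need room on top of it.
ollama list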
ollama run llama3.1:8b</code></code></pre><p>Or choose a smaller quantized variant from the model page you are using.</p><p>The practical rule is simple: fit matters before speed. A smaller model running fully on GPU often feels better than a larger model split across GPU and CPU.</p><h3>Fix 5: Enable debug logs</h3><p><a href="https://docs.ollama.com/troubleshooting">Ollama&#8217;s troubleshooting docs</a> recommend checking logs when Ollama does not behave as expected. They list different log locations for macOS, Linux, Docker, and Windows.</p><h4>macOS</h4><pre><code><code>cat ~/.ollama/logs/server.log</code></code></pre><h4>Linux with systemd</h4><pre><code><code>journalctl -u ollama --no-pager --follow --pager-end</code></code></pre><h4>Docker</h4><pre><code><code>docker ps
docker logs &lt;container-name&gt;</code></code></pre><h4>Windows</h4><p>Open Run with <code>Win + R</code>, then use:</p><pre><code><code>explorer %LOCALAPPDATA%\Ollama</code></code></pre><p>Check the latest <code>server.log</code>.</p><p>To enable debug logging on Windows, quit the Ollama app from the tray menu, then run:</p><pre><code><code>$env:OLLAMA_DEBUG="1"
&amp; "ollama app.exe"</code></code></pre><p>Ollama documents <a href="https://docs.ollama.com/troubleshooting">this Windows debug process</a> in its troubleshooting page.</p><p>Look for messages about GPU discovery, insufficient VRAM, CPU fallback, CUDA, ROCm, Metal, or library selection.</p><h3>Fix 6: Check GPU support and drivers</h3><h4>NVIDIA</h4><p><a href="https://docs.ollama.com/gpu">Ollama&#8217;s hardware support page</a> says NVIDIA GPUs need compute capability 5.0 or newer, with driver version 531 or newer. It also notes that compute capability 5.0 through 6.2 requires driver version 570 or newer.</p><p>Check your GPU:</p><pre><code><code>nvidia-smi</code></code></pre><p>On Linux, also try:</p><pre><code><code>sudo nvidia-modprobe -u</code></code></pre><p>If GPU discovery fails after suspend or resume, Ollama says reloading the NVIDIA UVM driver can help:</p><pre><code><code>sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm</code></code></pre><p>Ollama <a href="https://docs.ollama.com/gpu">lists this as a workaround</a> for a Linux suspend and resume driver bug.</p><h4>AMD</h4><p>On Linux, <a href="https://docs.ollama.com/troubleshooting">Ollama says AMD GPU access</a> can require <code>video</code> and <code>render</code> group permissions for <code>/dev/kfd</code>, and that <code>OLLAMA_DEBUG=1</code> can help during GPU discovery.</p><p>Check device permissions:</p><pre><code><code>ls -lnd /dev/kfd /dev/dri /dev/dri/*</code></code></pre><p>For driver mismatch problems, <a href="https://docs.ollama.com/troubleshooting">Ollama says ROCm 7 Linux libraries</a> require a compatible ROCm 7 kernel driver, and older drivers can cause GPU discovery to hang and fall back to CPU.</p><h3>Fix 7: Limit which GPU Ollama uses</h3><p>On multi-GPU NVIDIA systems, you may want Ollama to use a specific GPU.</p><p>First list GPUs:</p><pre><code><code>nvidia-smi -L</code></code></pre><p>Then set:</p><pre><code><code>CUDA_VISIBLE_DEVICES=GPU-UUID-HERE ollama serve</code></code></pre><p><a href="https://docs.ollama.com/gpu">Ollama&#8217;s hardware support docs</a> say <code>CUDA_VISIBLE_DEVICES</code> can limit Ollama to a subset of NVIDIA GPUs, and that UUIDs are more reliable than numeric IDs because ordering can vary.</p><p>This helps when one GPU has more free VRAM than another.</p><h3>Fix 8: Reduce batch and generation load</h3><p>Ollama&#8217;s API supports advanced runtime options, including <code>num_ctx</code>, <code>num_batch</code>, <code>num_gpu</code>, <code>main_gpu</code>, and <code>num_thread</code>. The <a href="https://github.com/ollama/ollama/blob/main/docs/api.md">legacy GitHub API docs</a> show these options in a request body.</p><p>For API workflows, test a smaller context and batch:</p><pre><code><code>curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Write a short test response.",
  "stream": false,
  "options": {
    "num_ctx": 4096,
    "num_batch": 1
  }
}'</code></code></pre><p>Then compare tokens per second from the final response. <a href="https://github.com/ollama/ollama/blob/main/docs/api.md">Ollama&#8217;s API docs</a> say token speed can be calculated by dividing <code>eval_count</code> by <code>eval_duration</code> and multiplying by <code>10^9</code>, since durations are returned in nanoseconds.</p><h3>Fix 9: Stop competing GPU workloads</h3><p>Before assuming Ollama is at fault, close anything else using VRAM:</p><ul><li><p>Games<br></p></li><li><p>ComfyUI<br></p></li><li><p>Stable Diffusion WebUI<br></p></li><li><p>DaVinci Resolve<br></p></li><li><p>Blender<br></p></li><li><p>Browser AI features<br></p></li><li><p>Other local model servers<br></p></li><li><p>Docker containers using GPU<br></p></li></ul><p>Then reload the model:</p><pre><code><code>ollama stop &lt;model-name&gt;
ollama run &lt;model-name&gt;</code></code></pre><p>Check again:</p><pre><code><code>ollama ps</code></code></pre><p>If the processor split improves, VRAM pressure from other apps was part of the problem.</p><h3>Fix 10: Use a realistic hardware target</h3><p>A larger GPU does not fix bad settings, but too little VRAM creates a hard ceiling.</p><p>Practical local LLM targets:</p><p><strong>8GB VRAM:</strong> Good for smaller 7B and 8B models at modest context. Avoid large context and heavy coding agents.</p><p><strong>12GB VRAM:</strong> Better for 8B and some 14B class models with careful quantization. Still watch context size.</p><p><strong>16GB VRAM:</strong> More comfortable for mid-size models, but large context can still trigger offload.</p><p><strong>24GB VRAM:</strong> Strong local AI baseline for larger quantized models and longer sessions.</p><p><strong>48GB or more:</strong> Better for large context, heavier RAG, agents, and larger models.</p><p><a href="https://docs.ollama.com/context-length">Ollama&#8217;s own context defaults</a> reflect this broad split: below 24 GiB gets 4K context, 24 to 48 GiB gets 32K, and 48 GiB or more gets 256K.</p><div><hr></div><h4><em><strong>More on GPUs for local LLMs:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;456ee949-1834-4bb7-9fc0-21579b824ac2&quot;,&quot;caption&quot;:&quot;For anyone building a cheap local AI box in 2026, the first rule has not changed. VRAM matters more than gamer marketing. A Llama 3.1 8B Q4 build in Ollama is 4.9GB. 
A Gemma 3 12B Q4 build lands at 8.1GB, while its Q8 &#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The best budget GPUs for local LLMs in 2026: 5 smart buys for Ollama&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-04-21T13:31:02.258Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!vIue!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12ed22ac-c47a-4628-85f2-763942f38049_2303x1478.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/best-budget-gpus-local-llms-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:194906880,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>How to test whether the fix worked</h3><p>Use a 
repeatable prompt:</p><pre><code><code>curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Write 300 words explaining why VRAM matters for local LLMs.",
  "stream": false
}'</code></code></pre><p>Then check:</p><pre><code><code>ollama ps</code></code></pre><p>Look for:</p><pre><code><code>PROCESSOR    CONTEXT
100% GPU     4096</code></code></pre><p>Also compare token speed using <code>eval_count</code> and <code>eval_duration</code> from the final API response. Ollama documents those fields and the <a href="https://github.com/ollama/ollama/blob/main/docs/api.md">token-per-second calculation</a> in its API docs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8Q36!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d6c4525-a157-4cd7-97df-4919e78140e8_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8Q36!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d6c4525-a157-4cd7-97df-4919e78140e8_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!8Q36!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d6c4525-a157-4cd7-97df-4919e78140e8_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!8Q36!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d6c4525-a157-4cd7-97df-4919e78140e8_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!8Q36!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d6c4525-a157-4cd7-97df-4919e78140e8_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8Q36!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d6c4525-a157-4cd7-97df-4919e78140e8_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d6c4525-a157-4cd7-97df-4919e78140e8_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1569680,&quot;alt&quot;:&quot;Ollama suddenly slow? How to stop CPU offload&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200919624?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d6c4525-a157-4cd7-97df-4919e78140e8_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Ollama suddenly slow? How to stop CPU offload" title="Ollama suddenly slow? How to stop CPU offload" srcset="https://substackcdn.com/image/fetch/$s_!8Q36!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d6c4525-a157-4cd7-97df-4919e78140e8_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!8Q36!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d6c4525-a157-4cd7-97df-4919e78140e8_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!8Q36!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d6c4525-a157-4cd7-97df-4919e78140e8_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!8Q36!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d6c4525-a157-4cd7-97df-4919e78140e8_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Slow Ollama performance often comes from VRAM limits, large context windows, or CPU fallback. Here&#8217;s how to find and fix it. &#169; Popular AI</figcaption></figure></div><h3>Common mistakes</h3><h4>Mistake: setting context to the model maximum</h4><blockquote><p>Many users see that a model supports a huge context window and immediately set Ollama to that number. 
That is a good way to destroy performance if the hardware cannot hold the working set.</p><p>Use the smallest context that solves the task.</p><div><hr></div></blockquote><h4>Mistake: assuming &#8220;model size&#8221; equals total memory required</h4><blockquote><p>The model file is only part of the memory story. Runtime context and KV cache can push a previously working setup over the edge.</p><div><hr></div></blockquote><h4>Mistake: ignoring <code>ollama ps</code></h4><blockquote><p>Guessing wastes time. <code>ollama ps</code> tells you whether the model is fully on GPU, partially offloaded, or CPU-bound.</p><div><hr></div></blockquote><h4>Mistake: blaming the CPU first</h4><blockquote><p>A fast CPU does not make CPU offload feel like GPU inference. CPU fallback can keep the model usable, but it is not the performance target.</p><div><hr></div></blockquote><h4>Mistake: buying a faster low-VRAM GPU</h4><blockquote><p>For local LLMs, more VRAM can matter more than raw gaming speed. A faster card with too little memory can still fall into CPU offload.</p><div><hr></div></blockquote><h3>Privacy and account-risk notes</h3><p>This is one reason Ollama is worth fixing rather than abandoning at the first slowdown. A working local setup keeps prompts, documents, code, meeting notes, and internal workflows on your machine unless you deliberately connect external services.</p><p>That privacy benefit disappears if the local setup becomes too slow and forces you back to a hosted model for every serious task.</p><p>The practical goal is not to run the largest model possible. It is to run the largest useful model that stays fast and predictable on hardware you control.</p><div><hr></div><h3>FAQ</h3><h4>Why is Ollama suddenly slow?</h4><blockquote><p>The most likely reason is that the model or context no longer fits fully in GPU memory, so Ollama is using CPU or system RAM. 
Run <code>ollama ps</code> and check the <code>PROCESSOR</code> column.</p><div><hr></div></blockquote><h4>How do I know if Ollama is using my GPU?</h4><blockquote><p>Run:</p></blockquote><pre><code><code>ollama ps</code></code></pre><blockquote><p>If the <code>PROCESSOR</code> column says <code>100% GPU</code>, the loaded model is fully on GPU. If it shows CPU only or a CPU/GPU split, some or all of the work is running on the CPU.</p><div><hr></div></blockquote><h4>Does increasing context length make Ollama slower?</h4><blockquote><p>It can. <a href="https://docs.ollama.com/context-length">Ollama says larger context length increases memory requirements</a>. If the larger context pushes the model out of VRAM, performance can drop sharply.</p><div><hr></div></blockquote><h4>Should I set Ollama to 64K context?</h4><blockquote><p>Only if your task needs it and your hardware can handle it. <a href="https://docs.ollama.com/context-length">Ollama says large-context tasks</a> like agents, web search, and coding tools should use at least 64K tokens, but it also warns you to make sure enough VRAM is available.</p><div><hr></div></blockquote><h4>What is KV cache bloat?</h4><blockquote><p>The KV cache is runtime memory the model uses to keep track of previous tokens during generation. As context grows, the cache grows with it. &#8220;Bloat&#8221; is the point where that runtime memory pushes past available VRAM and performance collapses into CPU offload.</p><div><hr></div></blockquote><h4>Is CPU offload bad?</h4><blockquote><p>It is useful as a fallback, but bad for performance. CPU offload can keep a model running when it does not fit in VRAM, but the price is slower generation.</p><div><hr></div></blockquote><h4>Should I buy more RAM or more VRAM?</h4><blockquote><p>For Ollama performance, VRAM usually matters first. 
System RAM helps avoid crashes when workloads spill over, but it does not make CPU offload as fast as GPU inference.</p><div><hr></div></blockquote><div class="callout-block" data-callout="true"><h3>Final recommendation</h3><p>Start with measurement, not guesswork.</p><p>Run <code>ollama ps</code>, lower context length, test again, and check the logs. If your model becomes <code>100% GPU</code> after reducing context, the problem was memory pressure from context or KV cache growth. If Ollama still cannot use the GPU, move to driver and GPU discovery troubleshooting.</p><p>Only buy hardware after you know the failure mode. For local LLMs, the best upgrade is usually more usable GPU memory, not a faster card with the same cramped VRAM.</p></div>]]></content:encoded></item><item><title><![CDATA[How to fix ComfyUI Desktop “unrecognized arguments: --normalvram” crashes]]></title><description><![CDATA[ComfyUI Desktop crashing with &#8220;unrecognized arguments: --normalvram&#8221;? 
Learn how to remove bad launch flags, update safely, check logs, and run ComfyUI directly.]]></description><link>https://www.popularai.org/p/fix-comfyui-desktop-normalvram-crash</link><guid isPermaLink="false">https://www.popularai.org/p/fix-comfyui-desktop-normalvram-crash</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Sat, 06 Jun 2026 17:57:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!pnMQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a471454-84e5-423b-9fcf-ae01378eba3a_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pnMQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a471454-84e5-423b-9fcf-ae01378eba3a_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pnMQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a471454-84e5-423b-9fcf-ae01378eba3a_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!pnMQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a471454-84e5-423b-9fcf-ae01378eba3a_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!pnMQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a471454-84e5-423b-9fcf-ae01378eba3a_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!pnMQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a471454-84e5-423b-9fcf-ae01378eba3a_1672x941.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pnMQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a471454-84e5-423b-9fcf-ae01378eba3a_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8a471454-84e5-423b-9fcf-ae01378eba3a_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1759998,&quot;alt&quot;:&quot;ComfyUI Desktop crashing on launch? Fix the --normalvram error&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200912708?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a471454-84e5-423b-9fcf-ae01378eba3a_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="ComfyUI Desktop crashing on launch? Fix the --normalvram error" title="ComfyUI Desktop crashing on launch? 
Fix the --normalvram error" srcset="https://substackcdn.com/image/fetch/$s_!pnMQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a471454-84e5-423b-9fcf-ae01378eba3a_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!pnMQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a471454-84e5-423b-9fcf-ae01378eba3a_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!pnMQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a471454-84e5-423b-9fcf-ae01378eba3a_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!pnMQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a471454-84e5-423b-9fcf-ae01378eba3a_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Fix ComfyUI Desktop startup crashes by cleaning up stale launch arguments, checking logs, updating Desktop, and running ComfyUI directly. &#169; Popular AI</figcaption></figure></div><p>If ComfyUI Desktop crashes on startup with this error:</p><pre><code><code>unrecognized arguments: --normalvram</code></code></pre><p>the problem is almost certainly a bad launch argument being passed into ComfyUI. The desktop wrapper, a saved config file, a shortcut, a script, or an old launcher profile is trying to start ComfyUI with <code>--normalvram</code>, but the current ComfyUI backend does not recognize that flag.</p><p>The fastest fix is to remove <code>--normalvram</code> from the startup arguments, update ComfyUI Desktop, then relaunch. 
<a href="https://raw.githubusercontent.com/Comfy-Org/ComfyUI/master/comfy/cli_args.py">ComfyUI&#8217;s current command-line argument list</a> includes VRAM flags such as <code>--gpu-only</code>, <code>--highvram</code>, <code>--lowvram</code>, <code>--novram</code>, and <code>--cpu</code>, but not <code>--normalvram</code>.</p><div><hr></div><h4><em><strong>More on ComfyUI troubleshooting:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;eb3ed2a6-f7b8-4508-93ab-0bf6d3b5a957&quot;,&quot;caption&quot;:&quot;When ComfyUI starts flashing &#8220;Failed to save workflow draft&#8221; every time you drag a node, paste a group, or tweak a setting, the obvious assumption is that saving is broken. For a lot of users, that diagnosis sends them in the wrong direction.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How to fix ComfyUI&#8217;s failed to save workflow draft error&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-27T00:59:25.969Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!D-KI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ac7c120-7a05-4514-879d-ba0670f54ef8_2400x1549.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/how-to-fix-comfyuis-failed-to-save&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:192085374,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>Quick fix</h3><p>Close ComfyUI Desktop completely.</p><p>Then check the Desktop config file, your shortcuts, and any launch scripts for this flag:</p><pre><code><code>--normalvram</code></code></pre><p>Remove it and put nothing in its place. In most cases, ComfyUI&#8217;s default dynamic VRAM behavior is what you want.</p><p>If you need a manual VRAM mode, use one of the supported flags:</p><pre><code><code>--highvram
--lowvram
--novram
--gpu-only
--cpu</code></code></pre><p>Do not guess here. <code>--cpu</code> will run everything on CPU and will be slow. <a href="https://github.com/comfyanonymous/ComfyUI">ComfyUI&#8217;s own README</a> says CPU mode works, but labels it slow.</p><h3>Why this crash happens</h3><p>ComfyUI Desktop is a packaged app that starts a bundled ComfyUI server in the background. <a href="https://github.com/Comfy-Org/desktop">The desktop repository</a> says the app includes ComfyUI source code, ComfyUI-Manager, Electron, Chromium binaries, and node modules, then starts the ComfyUI server for you.</p><p>That wrapper is convenient, but it also means there are two layers that can break:</p><ol><li><p>The desktop launcher.</p></li><li><p>The ComfyUI Python backend.</p></li></ol><p>The <code>unrecognized arguments</code> message comes from Python&#8217;s argument parser. It means ComfyUI started, read the command-line flags it was given, and rejected one of them.</p><p>In this case, the rejected flag is:</p><pre><code><code>--normalvram</code></code></pre><p>That points to a stale configuration, launcher shortcut, extension, or desktop wrapper setting.</p><h3>Why <code>--normalvram</code> is suspicious</h3><p>Current ComfyUI has supported VRAM-related flags, but <code>--normalvram</code> is not one of them. <a href="https://raw.githubusercontent.com/Comfy-Org/ComfyUI/master/comfy/cli_args.py">The current argument parser</a> defines <code>--gpu-only</code>, <code>--highvram</code>, <code>--lowvram</code>, <code>--novram</code>, and <code>--cpu</code> as mutually exclusive VRAM modes. It also includes newer memory options such as <code>--reserve-vram</code>, <code>--enable-dynamic-vram</code>, <code>--disable-dynamic-vram</code>, and async offload controls.</p><p>So when ComfyUI Desktop passes <code>--normalvram</code>, the backend has no valid setting to map it to.</p><p>That is why the fix is not to reinstall models or delete workflows. The model files are probably fine. 
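</p><p>To see why an unknown flag stops the launch this early, here is a minimal Python sketch of the same failure mode. It is not ComfyUI&#8217;s actual parser: the <code>build_parser</code> and <code>unknown_flags</code> helpers are hypothetical, and only the five flag names come from the list above. It assumes a standard-library <code>argparse</code> parser, which is the kind that produces the <code>unrecognized arguments</code> message.</p><pre><code><code>import argparse


def build_parser():
    # Hypothetical stand-in for a launcher's argument parser.
    parser = argparse.ArgumentParser(prog="main.py")
    # The five VRAM modes named above, declared as mutually exclusive.
    vram = parser.add_mutually_exclusive_group()
    for flag in ("--gpu-only", "--highvram", "--lowvram", "--novram", "--cpu"):
        vram.add_argument(flag, action="store_true")
    return parser


def unknown_flags(argv):
    # parse_known_args() returns leftovers instead of exiting,
    # so unsupported flags can be inspected safely.
    _, leftovers = build_parser().parse_known_args(argv)
    return leftovers


if __name__ == "__main__":
    print(unknown_flags(["--normalvram"]))  # the stale flag is left over
    print(unknown_flags(["--lowvram"]))     # a supported flag leaves nothing
    # parse_args() is stricter: it reports the leftover flag and exits.
    try:
        build_parser().parse_args(["--normalvram"])
    except SystemExit as exc:
        print("exit code:", exc.code)</code></code></pre><p>In this sketch the strict <code>parse_args</code> path exits while the arguments are still being read, before any model or workflow code could be reached.</p><p>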
The crash is happening before your workflow gets a chance to run.</p><h3>Step 1: Update ComfyUI Desktop</h3><p><a href="https://github.com/Comfy-Org/desktop/releases">ComfyUI Desktop v0.9.2 was released on May 20, 2026</a> and bundled ComfyUI 0.22.0. Newer desktop releases followed quickly, with v0.9.3 on May 23 and v0.9.4 on May 28. The v0.9.4 release bumped bundled ComfyUI to v0.22.3.</p><p>That matters because this type of bug often comes from a mismatch between the desktop launcher and the bundled backend.</p><p>Update to the latest ComfyUI Desktop release first. Then start it again.</p><p>If it still crashes, continue with the manual cleanup below.</p><h3>Step 2: Find the ComfyUI Desktop logs</h3><p><a href="https://github.com/Comfy-Org/desktop">ComfyUI Desktop documents its log locations</a> in the desktop repository. It says Electron main process logs are stored as <code>main.log</code>, and ComfyUI server logs are stored as <code>comfyui_&lt;date&gt;.log</code>.</p><h4>Windows</h4><p>Open Run with <code>Win + R</code>, then enter:</p><pre><code><code>%AppData%\ComfyUI\logs</code></code></pre><p>Look for:</p><pre><code><code>main.log
comfyui_&lt;date&gt;.log</code></code></pre><h4>macOS</h4><p>Open Terminal:</p><pre><code><code>open ~/Library/Logs/ComfyUI</code></code></pre><h4>Linux</h4><p>Open Terminal:</p><pre><code><code>xdg-open ~/.config/ComfyUI/logs</code></code></pre><p>If your desktop package uses a slightly different app name, check:</p><pre><code><code>ls ~/.config/*/logs</code></code></pre><p>Search the logs for:</p><pre><code><code>normalvram</code></code></pre><p>If you see it, you have confirmed the crash source.</p><h3>Step 3: Check the Desktop config file</h3><p>ComfyUI Desktop stores user configuration in a platform-specific config file. <a href="https://github.com/Comfy-Org/desktop">The desktop repository lists these locations</a>:</p><h4>Windows</h4><pre><code><code>%APPDATA%\ComfyUI\config.json</code></code></pre><h4>macOS</h4><pre><code><code>~/Library/Application Support/ComfyUI/config.json</code></code></pre><h4>Linux</h4><pre><code><code>~/.config/ComfyUI/config.json</code></code></pre><p>Open the file in a text editor and search for:</p><pre><code><code>normalvram</code></code></pre><p>If you find it, remove only that argument. 
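</p><p>If the config is long, a small read-only helper can point to the exact entry. The sketch below is an illustration under stated assumptions: the <code>find_flag</code> helper is hypothetical, the real key names inside <code>config.json</code> are not documented here, and the path shown is the Linux location listed above. It only reports where the text appears and changes nothing.</p><pre><code><code>import json
from pathlib import Path


def find_flag(node, needle="normalvram", path="$"):
    # Walk parsed JSON and collect the path of every key or string
    # value that contains the needle. Read-only by design.
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if needle in str(key):
                hits.append(child)
            hits.extend(find_flag(value, needle, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_flag(value, needle, f"{path}[{index}]"))
    elif isinstance(node, str) and needle in node:
        hits.append(path)
    return hits


if __name__ == "__main__":
    # Assumed Linux location; swap in the Windows or macOS path as needed.
    config_path = Path.home() / ".config" / "ComfyUI" / "config.json"
    if config_path.exists():
        data = json.loads(config_path.read_text(encoding="utf-8"))
        for hit in find_flag(data):
            print("found at", hit)
    else:
        print("no config file at", config_path)</code></code></pre><p>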
Save the file and restart ComfyUI Desktop.</p><p>Do not delete the whole config unless you are willing to reselect paths and rebuild preferences.</p><h3>Step 4: Check your shortcuts and launch scripts</h3><p>On Windows, right-click the ComfyUI Desktop shortcut, choose Properties, and check the Target field.</p><p>Look for something like:</p><pre><code><code>ComfyUI.exe --normalvram</code></code></pre><p>Change it back to:</p><pre><code><code>ComfyUI.exe</code></code></pre><p>If you use a <code>.bat</code> file, check for:</p><pre><code><code>--normalvram</code></code></pre><p>Remove it.</p><p>On Linux, check any <code>.desktop</code> launcher:</p><pre><code><code>grep -R "normalvram" ~/.local/share/applications ~/.config 2&gt;/dev/null</code></code></pre><p>On macOS, check any custom shell launcher or Automator script that starts ComfyUI.</p><h3>Step 5: Run ComfyUI directly to bypass the desktop wrapper</h3><p>This is the best diagnostic test. If direct ComfyUI works, but ComfyUI Desktop fails, the desktop wrapper or its config is the problem.</p><p>Find your ComfyUI backend folder, then run:</p><pre><code><code>python main.py</code></code></pre><p>If the Desktop app uses its own bundled Python environment, run the command from that environment instead.</p><p>A basic direct launch should not include <code>--normalvram</code>. Once plain <code>python main.py</code> starts cleanly, add one supported option at a time.</p><p>For a browser auto-launch:</p><pre><code><code>python main.py --auto-launch</code></code></pre><p>For low VRAM systems:</p><pre><code><code>python main.py --lowvram</code></code></pre><p>For very low VRAM systems:</p><pre><code><code>python main.py --novram</code></code></pre><p>For CPU-only fallback:</p><pre><code><code>python main.py --cpu</code></code></pre><p>Again, CPU mode is mainly a fallback. 
It will be much slower than GPU generation.</p><h3>Step 6: Use supported VRAM flags only</h3><p>Here is the practical meaning of the current supported modes.</p><h4>Default mode</h4><p>Use no VRAM flag:</p><pre><code><code>python main.py</code></code></pre><p>This is best for most users because ComfyUI&#8217;s current memory management is designed to handle normal GPU loading and offloading automatically.</p><h4>High VRAM</h4><p>Use this if you have enough VRAM and want ComfyUI to keep models in GPU memory:</p><pre><code><code>python main.py --highvram</code></code></pre><p><a href="https://raw.githubusercontent.com/Comfy-Org/ComfyUI/master/comfy/cli_args.py">ComfyUI&#8217;s current argument help</a> says <code>--highvram</code> keeps models in GPU memory instead of unloading them to CPU memory after use.</p><h4>Low VRAM</h4><p>Use this when models fit poorly or you hit memory errors:</p><pre><code><code>python main.py --lowvram</code></code></pre><p><a href="https://raw.githubusercontent.com/Comfy-Org/ComfyUI/master/comfy/cli_args.py">The current help text</a> says <code>--lowvram</code> affects behavior when dynamic VRAM is not being used.</p><h4>No VRAM</h4><p>Use this for extreme memory pressure:</p><pre><code><code>python main.py --novram</code></code></pre><p>This may keep ComfyUI running, but expect slower generation.</p><h4>GPU only</h4><p>Use this only if you want everything stored and run on the GPU:</p><pre><code><code>python main.py --gpu-only</code></code></pre><p>This can be fast when you have enough VRAM. 
It can also fail hard when you do not.</p><h4>CPU</h4><p>Use this only when GPU support is broken or unavailable:</p><pre><code><code>python main.py --cpu</code></code></pre><p><a href="https://github.com/comfyanonymous/ComfyUI">ComfyUI says it can work without a GPU using </a><code>--cpu</code>, but it is slow.</p><h3>Step 7: Remove stale environment or wrapper settings</h3><p>Some users run ComfyUI through custom tools, launchers, or environment managers. Search your ComfyUI folders for the bad flag.</p><h4>Windows PowerShell</h4><pre><code><code>Get-ChildItem -Path "$env:APPDATA\ComfyUI" -Recurse -File | Select-String -Pattern "normalvram"</code></code></pre><p>Also check:</p><pre><code><code>Get-ChildItem -Path "$env:LOCALAPPDATA\Programs\ComfyUI" -Recurse -File | Select-String -Pattern "normalvram"</code></code></pre><h4>Linux and macOS</h4><pre><code><code>grep -R "normalvram" ~/.config/ComfyUI 2&gt;/dev/null
grep -R "normalvram" ~/Library/Application\ Support/ComfyUI 2&gt;/dev/null</code></code></pre><p>For Linux app launchers:</p><pre><code><code>grep -R "normalvram" ~/.local/share/applications /usr/share/applications 2&gt;/dev/null</code></code></pre><p>If the search finds <code>--normalvram</code>, remove it from the config or script that contains it.</p><h3>Step 8: Check whether a custom node or manager profile is involved</h3><p>How ComfyUI Desktop handles ComfyUI-Manager depends on the bundled ComfyUI version, and <a href="https://github.com/Comfy-Org/desktop">the desktop README says the app can install ComfyUI-Manager via pip when </a><code>--enable-manager</code><a href="https://github.com/Comfy-Org/desktop"> is set</a>.</p><p>A custom node usually should not inject a startup flag before ComfyUI launches. Still, if you have a manager profile, a custom update script, or a third-party launcher that writes startup arguments, that tool may be the source of the flag.</p><p>To isolate this, start ComfyUI with custom nodes disabled:</p><pre><code><code>python main.py --disable-all-custom-nodes</code></code></pre><p>If that works, your core backend is fine. Then remove stale launcher settings before turning custom nodes back on.</p><h3>Step 9: Roll back only if updating fails</h3><p>Since v0.9.2 was followed by v0.9.3 and v0.9.4 within days, updating is the cleaner fix. 
<a href="https://github.com/Comfy-Org/desktop/releases">The release page shows v0.9.3 bumped ComfyUI to 0.22.2 and v0.9.4 bumped it to 0.22.3</a>.</p><p>Rollback should be a last resort.</p><p>Use rollback only when:</p><ul><li><p>The newest Desktop release still crashes.<br></p></li><li><p>You cannot remove the injected argument.<br></p></li><li><p>You need a working generation environment immediately.<br></p></li><li><p>Direct <code>python main.py</code> works but the Desktop wrapper does not.<br></p></li></ul><p>In that case, run the backend directly until the desktop wrapper is repaired.</p><h3>Step 10: Confirm the fix</h3><p>After removing <code>--normalvram</code>, start ComfyUI Desktop again.</p><p>A healthy launch should reach the normal server startup stage and open the UI. By default, ComfyUI listens on port <code>8188</code> unless configured otherwise. <a href="https://raw.githubusercontent.com/Comfy-Org/ComfyUI/master/comfy/cli_args.py">The current argument parser includes </a><code>--port</code><a href="https://raw.githubusercontent.com/Comfy-Org/ComfyUI/master/comfy/cli_args.py"> with default </a><code>8188</code>.</p><p>You can test the local server in a browser:</p><pre><code>http://127.0.0.1:8188</code></pre><p>If the page opens, the startup argument crash is fixed.</p><h3>What not to do</h3><p>Do not delete your models folder. This error is not caused by checkpoints, LoRAs, VAEs, or ControlNet models.</p><p>Do not reinstall CUDA first. CUDA problems usually produce CUDA, Torch, driver, or device errors. This error is much earlier and cleaner: the Python argument parser rejected a startup flag.</p><p>Do not switch to CPU mode unless you need a temporary fallback.</p><p>Do not add random VRAM flags from old tutorials. ComfyUI changes over time. 
Use the flags supported by your installed backend.</p><div><hr></div><h3>Why this matters</h3><p>Desktop wrappers are useful until they hide the thing that broke. A local AI workflow should be under your control. When a launcher silently passes a stale flag, it can make an open-source backend look broken even when the backend is fine.</p><p>The fix is to get back to first principles:</p><pre><code><code>python main.py</code></code></pre><p>Then add only the arguments you actually need.</p><p>That approach keeps your setup recoverable. 
It also keeps your image generation stack independent of whatever the desktop wrapper, updater, or third-party launcher decided to do.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lFvJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0efc09b1-7195-41d2-9aec-91fc2c5362f7_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lFvJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0efc09b1-7195-41d2-9aec-91fc2c5362f7_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!lFvJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0efc09b1-7195-41d2-9aec-91fc2c5362f7_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!lFvJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0efc09b1-7195-41d2-9aec-91fc2c5362f7_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!lFvJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0efc09b1-7195-41d2-9aec-91fc2c5362f7_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lFvJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0efc09b1-7195-41d2-9aec-91fc2c5362f7_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0efc09b1-7195-41d2-9aec-91fc2c5362f7_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1379970,&quot;alt&quot;:&quot;How to fix ComfyUI Desktop --normalvram startup crashes&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200912708?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0efc09b1-7195-41d2-9aec-91fc2c5362f7_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="How to fix ComfyUI Desktop --normalvram startup crashes" title="How to fix ComfyUI Desktop --normalvram startup crashes" srcset="https://substackcdn.com/image/fetch/$s_!lFvJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0efc09b1-7195-41d2-9aec-91fc2c5362f7_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!lFvJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0efc09b1-7195-41d2-9aec-91fc2c5362f7_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!lFvJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0efc09b1-7195-41d2-9aec-91fc2c5362f7_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!lFvJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0efc09b1-7195-41d2-9aec-91fc2c5362f7_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">ComfyUI Desktop can crash when an old launcher, config file, or shortcut passes the unsupported <code>--normalvram</code> flag. Here&#8217;s how to remove it safely. &#169; Popular AI</figcaption></figure></div><h3>FAQ</h3><h4>What does &#8220;unrecognized arguments: --normalvram&#8221; mean?</h4><blockquote><p>It means ComfyUI was started with a command-line flag it does not understand. The invalid flag is <code>--normalvram</code>.</p><div><hr></div></blockquote><h4>Is <code>--normalvram</code> a valid ComfyUI flag?</h4><blockquote><p>Not in the current ComfyUI argument list I checked. 
<a href="https://raw.githubusercontent.com/Comfy-Org/ComfyUI/master/comfy/cli_args.py">Current VRAM flags include </a><code>--gpu-only</code><a href="https://raw.githubusercontent.com/Comfy-Org/ComfyUI/master/comfy/cli_args.py">, </a><code>--highvram</code><a href="https://raw.githubusercontent.com/Comfy-Org/ComfyUI/master/comfy/cli_args.py">, </a><code>--lowvram</code><a href="https://raw.githubusercontent.com/Comfy-Org/ComfyUI/master/comfy/cli_args.py">, </a><code>--novram</code><a href="https://raw.githubusercontent.com/Comfy-Org/ComfyUI/master/comfy/cli_args.py">, and </a><code>--cpu</code>.</p><div><hr></div></blockquote><h4>Should I use <code>--lowvram</code> instead?</h4><blockquote><p>Only if you actually need low VRAM behavior. For most users, the best replacement is no VRAM flag at all.</p><div><hr></div></blockquote><h4>Will deleting models fix this?</h4><blockquote><p>No. This is a startup argument problem, not a model problem.</p><div><hr></div></blockquote><h4>Is ComfyUI Desktop v0.9.2 the latest version?</h4><blockquote><p>No. <a href="https://github.com/Comfy-Org/desktop/releases">The release page shows v0.9.2 on May 20, 2026, followed by v0.9.3 on May 23 and v0.9.4 on May 28</a>.</p><div><hr></div></blockquote><h4>Where are ComfyUI Desktop logs stored?</h4><blockquote><p><a href="https://github.com/Comfy-Org/desktop">The desktop README lists logs</a> under <code>%AppData%\{app name}\logs</code> on Windows, <code>~/Library/Logs/{app name}</code> on macOS, and <code>~/.config/{app name}/logs</code> on Linux.</p><div><hr></div></blockquote><h3>Final recommendation</h3><p>Remove <code>--normalvram</code>, update ComfyUI Desktop, and test a direct backend launch with:</p><pre><code><code>python main.py</code></code></pre><p>If direct launch works, the backend is healthy and the desktop wrapper or its saved config is the issue. 
Keep the fix simple: no unsupported flags, no model deletion, no full reinstall unless the config is too damaged to repair.</p>]]></content:encoded></item><item><title><![CDATA[How to run Gemma 4 31B locally with Ollama for private writing]]></title><description><![CDATA[Gemma 4 31B can handle long-context writing locally, but memory matters. 
Here is how to set it up without chasing 256K too soon.]]></description><link>https://www.popularai.org/p/run-gemma-4-31b-locally-ollama</link><guid isPermaLink="false">https://www.popularai.org/p/run-gemma-4-31b-locally-ollama</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Fri, 05 Jun 2026 14:09:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7DEw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0149b9e-aae4-4ac6-a714-70baecb95f11_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7DEw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0149b9e-aae4-4ac6-a714-70baecb95f11_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7DEw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0149b9e-aae4-4ac6-a714-70baecb95f11_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!7DEw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0149b9e-aae4-4ac6-a714-70baecb95f11_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!7DEw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0149b9e-aae4-4ac6-a714-70baecb95f11_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!7DEw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0149b9e-aae4-4ac6-a714-70baecb95f11_1672x941.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!7DEw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0149b9e-aae4-4ac6-a714-70baecb95f11_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d0149b9e-aae4-4ac6-a714-70baecb95f11_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1589538,&quot;alt&quot;:&quot;Run Gemma 4 31B locally: the Ollama setup writers need&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200409526?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0149b9e-aae4-4ac6-a714-70baecb95f11_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Run Gemma 4 31B locally: the Ollama setup writers need" title="Run Gemma 4 31B locally: the Ollama setup writers need" srcset="https://substackcdn.com/image/fetch/$s_!7DEw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0149b9e-aae4-4ac6-a714-70baecb95f11_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!7DEw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0149b9e-aae4-4ac6-a714-70baecb95f11_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!7DEw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0149b9e-aae4-4ac6-a714-70baecb95f11_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!7DEw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0149b9e-aae4-4ac6-a714-70baecb95f11_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Learn how to install Gemma 4 31B in Ollama, set context length, tune hardware expectations, and build a local writing assistant. 
&#169; Popular AI</figcaption></figure></div><p>Gemma 4 31B is one of the most interesting local AI releases for writers in 2026 because it gives you a serious open-weight model with a 256K context window, native system prompt support, image input, and direct Ollama support. The practical catch is memory. You can run Gemma 4 31B locally with Ollama, but most people should start with 32K or 64K context instead of trying to use the full 256K window on day one. Google&#8217;s <a href="https://ai.google.dev/gemma/docs/core">Gemma 4 model overview</a> lists Gemma 4 31B at roughly 58.3GB in BF16, 30.4GB in SFP8, and 17.4GB in Q4_0 for static model weights before KV cache and runtime overhead.</p><p>That distinction matters. The model file size is only the starting point. Long context adds memory pressure, and a larger window does not guarantee perfect recall across a giant manuscript, research packet, or project bible. Ollama&#8217;s <a href="https://ollama.com/library/gemma4">Gemma 4 model page</a> lists <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">gemma4:31b</mark> as a 20GB download with a 256K context window, but its benchmark table also shows Gemma 4 31B scoring 66.4 percent on MRCR v2 8-needle at 128K. Treat that as a useful warning. 
Long context is powerful, but &#8220;flawless recall&#8221; is not the right expectation.</p><h3>Quick verdict</h3><blockquote><p>Use Gemma 4 31B locally if you want a high-quality writing, reasoning, and research model that can work with large manuscripts, outlines, repositories, research notes, character bibles, or worldbuilding documents without sending private material to a hosted AI account.</p></blockquote><blockquote><p>Skip Gemma 4 31B on underpowered machines. A small laptop may technically launch a quantized model, but the experience can become slow once you raise context length, keep other apps open, or ask the model to process long documents. Smaller Gemma 4 variants, especially E4B, are better starting points for machines with limited memory.</p></blockquote><blockquote><p>For writers, the sweet spot is control. A local Gemma 4 31B setup gives you a private AI writing assistant that can help with continuity, structural editing, scene revision, research synthesis, and brainstorming through a local Ollama workflow. The model can run on your own hardware, answer through a local API, and keep working without a subscription or cloud chat window.</p></blockquote><h3>What Gemma 4 31B is</h3><p>Google introduced Gemma 4 on April 2, 2026 as an open model family built for advanced reasoning, agentic workflows, coding, multimodal understanding, and local deployment. The family includes E2B, E4B, 26B A4B Mixture of Experts, and 31B Dense variants. Google&#8217;s launch post <a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/">describes Gemma 4 as its most capable open model family to date</a> and says it is released under an Apache 2.0 license, which is much easier to work with for commercial building, fine-tuning, and deployment than the older custom Gemma terms.</p><p>Gemma 4 31B is the dense, quality-first option. 
Google&#8217;s <a href="https://ai.google.dev/gemma/docs/core">model overview</a> describes the 31B dense model as the version that bridges server-grade performance and local execution. The 26B A4B model is the speed-focused option. It uses a Mixture of Experts architecture, activates a smaller number of parameters per token during inference, and is designed for higher-throughput reasoning. Google notes that all parameters still need to be loaded into memory for fast routing and inference.</p><p>For local AI users, the key change is that Gemma 4 is much more convenient to run than a research-only model release. Google provides a <a href="https://ai.google.dev/gemma/docs/integrations/ollama">Gemma with Ollama integration guide</a>, and Ollama lists the exact tag you need: <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">gemma4:31b</mark>. That makes the setup approachable for writers, editors, developers, and researchers who want a strong local model without building an inference stack from scratch.</p><div><hr></div><h4><em><strong>More on choosing local LLMs for your hardware:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;692cc909-41ca-4740-b257-4d02fda988eb&quot;,&quot;caption&quot;:&quot;Running a local model sounds wonderfully simple. One box. One model. No API bill. No usage cap. 
No surprise account lockout.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How to choose the right local LLM for 8GB, 12GB, and 24GB VRAM&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-15T14:18:00.000Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!CEOc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6a71d4f-7366-4a02-86b4-2d5471da6e55_2560x1507.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/how-to-choose-the-right-local-llm-for-8gb-12gb-and-24gb-vram&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:191511400,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>Why Gemma 4 31B matters for writers</h3><p>Long-form 
writing runs into two problems with hosted AI tools.</p><p>First, manuscripts, outlines, client drafts, notes, character sketches, business research, and unpublished ideas often contain material that should not be pasted into a cloud chat window by default. Even when a cloud tool has strong privacy controls, many writers want a local fallback for sensitive work.</p><p>Second, hosted models can change behavior without your approval. Filters change. Plans change. Rate limits change. Model names change. Prices change. A model that handled your workflow last month may become slower, less permissive, more expensive, or unavailable later.</p><p>Gemma 4 31B does not remove every problem. You still need enough hardware. You still need to test recall. You still need backups, citations, version control, and editorial judgment. The value is that it gives serious writers a local model for work that benefits from privacy and control.</p><p>A local Gemma 4 31B writing workflow is especially useful for:</p><ul><li><p>Summarizing a full manuscript or long outline<br></p></li><li><p>Checking continuity across chapters<br></p></li><li><p>Finding contradictions in worldbuilding notes<br></p></li><li><p>Rewriting a scene against a style guide<br></p></li><li><p>Building character, setting, and plot bibles<br></p></li><li><p>Reviewing large research packets<br></p></li><li><p>Running private brainstorming sessions<br></p></li><li><p>Auditing client drafts without sending them to a cloud account<br></p></li><li><p>Creating a local editorial assistant for repeatable revision tasks<br></p></li></ul><p>The control advantage is straightforward. When Gemma 4 31B runs locally in Ollama, the model can answer through a local command line or local API on your own machine. 
Google&#8217;s <a href="https://ai.google.dev/gemma/docs/integrations/ollama">Ollama guide for Gemma</a> describes installing Ollama, pulling Gemma models, and using model tags, while Ollama&#8217;s own page identifies <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">gemma4:31b</mark> as the dense workstation variant.</p><div><hr></div><h3>Hardware requirements: what you actually need</h3><p>Do not plan your setup around the model file size alone. 
You need memory for the quantized weights, context window, operating system, Ollama, and any writing apps, editors, browsers, note tools, or scripts feeding the model.</p><p>Google&#8217;s Gemma 4 overview lists these approximate inference memory requirements for the 31B model:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/oD9dr/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f6bf8b1-1011-4279-820a-c18f1303f7e4_1220x398.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/32ba7436-eb7b-4e88-81f3-09bad92a4372_1220x398.png&quot;,&quot;height&quot;:193,&quot;title&quot;:&quot;| Created with Datawrapper&quot;,&quot;description&quot;:&quot;Create interactive, responsive &amp; beautiful charts &#8212; no code required.&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/oD9dr/1/" width="730" height="193" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>Those figures cover static model weights. They do not include the additional VRAM required for context, and Google explicitly notes that <a href="https://ai.google.dev/gemma/docs/core">KV cache memory rises dynamically with total prompt and response length</a>. 
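</p><p>To see why context length dominates the memory budget, here is a rough sketch of the standard KV cache estimate for a full-attention transformer: one key vector and one value vector per token, per layer, per KV head. The layer count, KV head count, and head dimension below are placeholder assumptions for illustration, not Gemma 4 31B&#8217;s published architecture, and models that use sliding-window attention or a quantized KV cache will usually need less than this estimate.</p><pre><code><code>def kv_cache_gib(context_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV cache size in GiB for a full-attention transformer."""
    # Factor of 2: one key vector and one value vector per token, per layer, per KV head.
    total_bytes = 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / (1024 ** 3)


# Placeholder architecture values (assumptions, not Gemma 4 31B's real configuration).
EXAMPLE_ARCH = {"num_layers": 60, "num_kv_heads": 8, "head_dim": 128}

for context_tokens in (32768, 64000, 131072, 262144):
    size = kv_cache_gib(context_tokens, **EXAMPLE_ARCH)
    print(f"{context_tokens} tokens: about {size:.1f} GiB of KV cache at 16-bit precision")</code></code></pre><p>The useful takeaway is the linear scaling: under these assumptions, doubling the context doubles the cache, so each step up in window size should be tested rather than assumed.</p><p>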
Larger context windows require more memory on top of the base model.</p><p>A practical hardware plan looks like this:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/lrzEY/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4ba04ab1-4683-446e-99d9-1f44b9f27441_1220x812.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a6f0f79-0de6-4de3-be92-faa591bf14d4_1220x812.png&quot;,&quot;height&quot;:409,&quot;title&quot;:&quot;| Created with Datawrapper&quot;,&quot;description&quot;:&quot;Create interactive, responsive &amp; beautiful charts &#8212; no code required.&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/lrzEY/1/" width="730" height="409" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>Ollama&#8217;s <a href="https://docs.ollama.com/context-length">context length documentation</a> gives a useful sanity check. Ollama defaults to 4K context below 24 GiB VRAM, 32K context from 24 to 48 GiB VRAM, and 256K context at 48 GiB VRAM or more. Ollama also says large-context tasks such as web search, agents, and coding tools should be set to at least 64,000 tokens when the task actually needs that much context.</p><p>For writing, bigger is not always better. A 32K or 64K context window is often enough for a chapter, a dense project bible, a style guide, and instructions. 
A full 256K context window can be useful for whole-manuscript passes, repository-scale work, and large research packets, but only when the machine can keep the model fast enough to make the workflow usable.</p><h3>Install Ollama and pull Gemma 4 31B</h3><p>Install Ollama from the official download flow described in Google&#8217;s <a href="https://ai.google.dev/gemma/docs/integrations/ollama">Gemma Ollama setup guide</a>, then confirm it works:</p><pre><code><code>ollama --version</code></code></pre><p>Pull the 31B model:</p><pre><code><code>ollama pull gemma4:31b</code></code></pre><p>Run a quick smoke test:</p><pre><code><code>ollama run gemma4:31b "Write a 150-word scene summary for a noir detective novel."</code></code></pre><p>Google lists the Gemma 4 Ollama tags as <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">gemma4:e2b</mark>, <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">gemma4:e4b</mark>, <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">gemma4:26b</mark>, and <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">gemma4:31b</mark>. Ollama&#8217;s <a href="https://ollama.com/library/gemma4">Gemma 4 library page</a> also shows <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">gemma4:31b</mark> as the dense workstation model.</p><p>This is the simplest path for most users. Pull the official model tag first, confirm it runs, then make adjustments for context length and writing behavior. Do not start by building a complicated toolchain. A working baseline gives you a clear performance reference before you change settings.</p><h3>Set a context length your hardware can handle</h3><p>Start with 32K. Then try 64K. 
Push toward 128K or 256K only when the model remains usable and stays in GPU or unified memory.</p><p>For a one-session test:</p><pre><code><code>OLLAMA_CONTEXT_LENGTH=32768 ollama serve</code></code></pre><p>For heavier writing projects:</p><pre><code><code>OLLAMA_CONTEXT_LENGTH=64000 ollama serve</code></code></pre><p>Then check whether the model is staying on GPU:</p><pre><code><code>ollama ps</code></code></pre><p>Ollama&#8217;s <a href="https://docs.ollama.com/faq">FAQ</a> says <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">ollama ps</mark> shows which models are loaded into memory, and the <code>Processor</code> column reports whether a model is running fully on GPU, fully on CPU, or split between CPU and GPU. Ollama&#8217;s <a href="https://docs.ollama.com/context-length">context-length docs</a> also recommend checking allocated context and avoiding CPU offload for best performance.</p><p>The rule is simple: reduce context before you blame the model. If 64K is slow, try 32K. If 32K is slow, close GPU-heavy apps, reduce the prompt size, or try <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">gemma4:26b</mark>. If the model spills into CPU memory, generation can become painfully slow even though the model technically runs.</p><h3>Create a writing-focused Modelfile</h3><p>A Modelfile lets you save context, sampling, and system behavior as a reusable local model profile. 
Ollama&#8217;s <a href="https://docs.ollama.com/modelfile">Modelfile reference</a> describes a Modelfile as the blueprint for creating customized models with Ollama, and documents <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">PARAMETER</mark> for runtime settings and <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">SYSTEM</mark> for defining the model&#8217;s system behavior.</p><p>Create a file named <code>Modelfile</code>:</p><pre><code><code>FROM gemma4:31b

PARAMETER num_ctx 32768
PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER top_k 64

SYSTEM """
You are a private local writing assistant. Help with long-form fiction, nonfiction, outlines, continuity, structure, research synthesis, and editorial revision.

Preserve the user's style unless asked to rewrite.
Do not invent continuity details.
When unsure, say what is missing.
For manuscript review, separate:
1. confirmed details from the supplied text
2. likely inferences
3. contradictions or open questions
4. recommended edits
"""</code></code></pre><p>Create the custom model:</p><pre><code><code>ollama create gemma4-31b-writing -f ./Modelfile</code></code></pre><p>Run it:</p><pre><code><code>ollama run gemma4-31b-writing</code></code></pre><p>For a 64K version, duplicate the Modelfile and change:</p><pre><code><code>PARAMETER num_ctx 64000</code></code></pre><p>Use 256K only when your system can handle it. Gemma 4 31B supports a 256K context window, but the context cache adds memory pressure on top of the base model weights. Google&#8217;s <a href="https://ai.google.dev/gemma/docs/core">Gemma 4 model overview</a> makes clear that context-window memory is separate from the static model weights.</p><p>A writing-focused Modelfile is worth the effort because it saves you from repeating the same instructions in every session. It also makes your local model more predictable. You can create separate profiles for manuscript critique, continuity checking, research synthesis, copyediting, and brainstorming, each with its own context length and system behavior.</p><h3>Use the local API for writing workflows</h3><p>Ollama exposes a local API at <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">localhost:11434</mark>, which makes it useful for writing tools, scripts, editors, and private workflow apps. The Ollama <a href="https://github.com/ollama/ollama/blob/main/docs/api.md">API documentation on GitHub</a> documents <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">/api/generate</mark> and shows options such as model name, prompt, system message, streaming, and generation settings.</p><p>Example:</p><pre><code><code>curl http://localhost:11434/api/generate -d '{
  "model": "gemma4-31b-writing",
  "prompt": "Summarize this chapter and list continuity risks: [paste chapter here]",
  "stream": false,
  "options": {
    "num_ctx": 32768
  }
}'</code></code></pre><p>For chapter-by-chapter workflows, resist the urge to paste an entire novel at once simply because the model supports large context. A better pattern is:</p><ol><li><p>Ask for a chapter summary.</p></li><li><p>Ask for character, setting, and timeline facts.</p></li><li><p>Save those facts into a project bible.</p></li><li><p>Feed the project bible plus the active chapter into the next prompt.</p></li><li><p>Run a final continuity pass on the whole outline or manuscript when needed.</p></li></ol><p>That method is slower than dumping everything into one giant prompt, but it is easier to audit. It also reduces the chance that the model misses a detail buried deep in context. For serious writing work, a stable project bible often beats raw context length because it turns important details into a compact, reviewable memory layer.</p><h3>Best settings for creative writing</h3><p>Google and Ollama list these default Gemma 4 sampling settings:</p><pre><code><code>temperature = 1.0
top_p = 0.95
top_k = 64</code></code></pre><p>They are good starting points for fiction, ideation, outlining, and structural feedback. Ollama lists the same standardized sampling configuration in its <a href="https://ollama.com/library/gemma4">Gemma 4 best practices</a>.</p><p>Use a lower temperature when you want tighter editorial work:</p><pre><code><code>temperature = 0.4
top_p = 0.9
top_k = 40</code></code></pre><p>Use the default settings when you want more variation:</p><pre><code><code>temperature = 1.0
top_p = 0.95
top_k = 64</code></code></pre><p>For continuity, fact extraction, and manuscript diagnosis, use structured prompts. For example:</p><pre><code><code>Read the supplied chapter.

Return only these sections:

1. Scene summary
2. Characters present
3. New facts introduced
4. Timeline markers
5. Continuity risks
6. Questions to resolve before revision

Do not rewrite the chapter.
Do not invent facts that are not present.</code></code></pre><p>For creative rewriting:</p><pre><code><code>Rewrite this scene while preserving:
- point of view
- tense
- character intent
- plot facts
- approximate length

Improve:
- sentence rhythm
- sensory detail
- dialogue subtext

Do not add new lore or backstory.</code></code></pre><p>The core habit is to separate tasks. Ask for fact extraction before revision. Ask for continuity risks before line edits. Ask for style diagnosis before rewriting. Gemma 4 31B can handle complex prompts, but writing workflows become more reliable when each pass has a clear job.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MPEr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175d24ce-64e3-4fb7-8368-6cc529b8ac20_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MPEr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175d24ce-64e3-4fb7-8368-6cc529b8ac20_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!MPEr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175d24ce-64e3-4fb7-8368-6cc529b8ac20_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!MPEr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175d24ce-64e3-4fb7-8368-6cc529b8ac20_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!MPEr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175d24ce-64e3-4fb7-8368-6cc529b8ac20_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MPEr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175d24ce-64e3-4fb7-8368-6cc529b8ac20_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/175d24ce-64e3-4fb7-8368-6cc529b8ac20_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1654732,&quot;alt&quot;:&quot;Gemma 4 31B with Ollama: a practical guide for long-context writing&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200409526?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175d24ce-64e3-4fb7-8368-6cc529b8ac20_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Gemma 4 31B with Ollama: a practical guide for long-context writing" title="Gemma 4 31B with Ollama: a practical guide for long-context writing" srcset="https://substackcdn.com/image/fetch/$s_!MPEr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175d24ce-64e3-4fb7-8368-6cc529b8ac20_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!MPEr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175d24ce-64e3-4fb7-8368-6cc529b8ac20_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!MPEr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175d24ce-64e3-4fb7-8368-6cc529b8ac20_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!MPEr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F175d24ce-64e3-4fb7-8368-6cc529b8ac20_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture><div 
class="image-link-expand"></div></div></a><figcaption class="image-caption">Gemma 4 31B can handle long-context writing locally, but memory matters. Here is how to set it up without chasing 256K too soon. &#169; Popular AI</figcaption></figure></div><h3>Should you use Gemma 4 31B or 26B?</h3><p>Use Gemma 4 31B when quality matters more than speed. That includes deep planning, dense reasoning, manuscript critique, style analysis, research synthesis, or difficult continuity work.</p><p>Use Gemma 4 26B when speed matters more. 
Google&#8217;s <a href="https://ai.google.dev/gemma/docs/core">Gemma 4 model overview</a> describes the 26B A4B model as a Mixture of Experts model designed for high-throughput reasoning, and Ollama lists <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">gemma4:26b</mark> as an 18GB model with a 256K context window on its <a href="https://ollama.com/library/gemma4">Gemma 4 library page</a>.</p><p>For many writing workflows, 26B may be the better daily driver. A fast model that you use constantly can be more valuable than a stronger model that feels too slow for drafting, outlining, and quick revision passes. Keep 31B for the work where quality matters enough to justify the added latency.</p><p>The best 31B tasks include:</p><ul><li><p>Final continuity review<br></p></li><li><p>Research synthesis<br></p></li><li><p>High-value rewrite passes<br></p></li><li><p>Style consistency checks<br></p></li><li><p>Dense planning sessions<br></p></li><li><p>Long project bible analysis<br></p></li><li><p>Editorial critique before publication<br></p></li></ul><p>The best 26B tasks include quick brainstorming, outline expansion, short scene rewrites, summarization, and routine writing support. Having both models available locally gives you the same workflow logic many writers already use with cloud tools: a faster assistant for daily work and a stronger model for harder passes.</p><h3>Common problems and fixes</h3><h3>The model is too slow</h3><p>Run:</p><pre><code><code>ollama ps</code></code></pre><p>If you see CPU offload, reduce <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">num_ctx</mark>, close other GPU-heavy apps, or use <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">gemma4:26b</mark> instead. 
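</p><p>The same check can be scripted. The sketch below is a minimal example, assuming Ollama is listening on its default local address and that the <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">/api/ps</mark> response carries <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">size</mark> and <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">size_vram</mark> fields for each loaded model, as described in Ollama&#8217;s API documentation. The helper names are hypothetical, and the GPU share is only an approximation of what the <code>Processor</code> column reports.</p><pre><code><code>import json
import urllib.request

OLLAMA_PS_URL = "http://localhost:11434/api/ps"  # default local Ollama address (assumption)


def gpu_fraction(model_entry):
    """Return the share of a loaded model that sits in GPU memory (0.0 to 1.0)."""
    size = model_entry.get("size", 0)
    size_vram = model_entry.get("size_vram", 0)
    return size_vram / size if size else 0.0


def report_loaded_models(url=OLLAMA_PS_URL):
    """Print how much of each loaded model is held in GPU memory."""
    with urllib.request.urlopen(url, timeout=5) as response:
        payload = json.load(response)
    for entry in payload.get("models", []):
        share = gpu_fraction(entry)
        status = "fully on GPU" if share == 1.0 else "partly or fully on CPU"
        print(f"{entry.get('name', 'unknown')}: {share:.0%} in GPU memory ({status})")</code></code></pre><p>Call <code>report_loaded_models()</code> while a model is loaded. Anything below 100 percent suggests the context or model size is too large for the available VRAM.</p><p>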
Ollama&#8217;s <a href="https://docs.ollama.com/faq">FAQ</a> explains that <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">ollama ps</mark> shows whether the model is loaded fully on GPU, fully on CPU, or split between CPU and GPU.</p><p>Long context is often the hidden cause of poor performance. A model that feels fine at 8K or 16K may become frustrating at 64K or 128K. Lower the context window, test again, and then increase gradually.</p><h3>The model loads but crashes at long context</h3><p>Lower the context in your Modelfile first, then re-create the model so the change takes effect:</p><pre><code><code>PARAMETER num_ctx 32768</code></code></pre><p>Then test 64K. Large context increases memory requirements, and Ollama&#8217;s <a href="https://docs.ollama.com/context-length">context length guide</a> warns that raising context length requires enough available VRAM.</p><p>Crashes are often a sign that the setup is too aggressive for the available memory. Before switching models, try a smaller context window, quit memory-heavy apps, and confirm that the model is not spilling into CPU memory.</p><h3>The model ignores earlier details</h3><p>A fluent answer does not prove that the model used every token. Ask it to cite the relevant excerpt from the supplied material before giving advice. For manuscript workflows, make the model extract facts into a project bible and use that as the stable memory layer.</p><p>A practical continuity prompt should ask for evidence before recommendations. For example, ask the model to list the exact chapter fact, the inferred issue, and the suggested fix. That makes hallucinated continuity claims easier to catch.</p><h3>You are on AMD</h3><p>Ollama supports AMD GPUs through ROCm on supported cards, but the list of supported cards is narrower than it is for NVIDIA. 
Ollama&#8217;s <a href="https://docs.ollama.com/gpu">hardware support documentation</a> lists GPU requirements and supported families, including NVIDIA details and platform-specific hardware support.</p><p>AMD users should check support before planning a Gemma 4 31B workstation. A smaller model with reliable acceleration is usually better than a larger model that runs unpredictably or falls back to CPU memory.</p><h3>You are on Apple Silicon</h3><p>Ollama supports Apple GPU acceleration through Metal. Gemma 4 31B is a realistic target only on Macs with enough unified memory, and long-context writing benefits from extra headroom. A 32GB unified-memory Mac may be able to experiment with modest context. A 64GB or larger machine is a better fit for serious long-context work.</p><p>Apple Silicon users should also consider MLX variants when available. Ollama&#8217;s Gemma 4 page lists <a href="https://ollama.com/library/gemma4">MLX tags for some Gemma 4 models</a>, including <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">gemma4:31b-mlx</mark>, which can be useful for Mac-focused local inference workflows.</p><h3>Where Gemma 4 31B beats a cloud writing tool</h3><p>Gemma 4 31B is attractive when the manuscript, research, or business material is private enough that hosted AI should not be the default.</p><p>The strongest use cases are:</p><ul><li><p>Private fiction and nonfiction manuscripts<br></p></li><li><p>Client drafts under NDA<br></p></li><li><p>Sensitive research notes<br></p></li><li><p>Local coding and writing assistants<br></p></li><li><p>Long project bibles<br></p></li><li><p>Offline brainstorming<br></p></li><li><p>Local agent experiments<br></p></li><li><p>Private editorial workflows<br></p></li><li><p>Internal documentation review<br></p></li><li><p>Draft analysis before a public release<br></p></li></ul><p>The main advantage is control. The model weights can run locally. 
Your drafts do not need to pass through a cloud chat product. Your workflow can keep working without a subscription, account, or external API quota.</p><p>Local AI also makes experimentation easier. You can build repeatable prompts, test different Modelfiles, run the same project through multiple passes, and keep the workflow stable over time. That stability matters for long-form writing projects that can last months or years.</p><h3>Where cloud models still win</h3><p>Cloud models still make sense when you need the strongest available reasoning, fast inference without buying hardware, polished multimodal tools, browser access, collaboration, managed reliability, or easy sharing across a team.</p><p>A hybrid setup is usually the honest answer. Use Gemma 4 31B locally for private drafts, continuity checks, planning, sensitive notes, and offline fallback. Use a hosted model when the material is low-sensitivity, the task benefits from stronger cloud performance, or the time savings justify the privacy and control tradeoff.</p><p>For many writers, that hybrid approach is more practical than trying to make one tool handle every task. Local models can own private work and repeatable editorial workflows. Cloud models can handle low-risk tasks that benefit from speed, convenience, or the strongest frontier reasoning.</p><h3>Bottom line</h3><p>Gemma 4 31B is worth running locally if you have the memory and you care about keeping long-form writing work under your control. Start with Ollama, pull <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">gemma4:31b</mark>, set a conservative context length, and build a writing-focused Modelfile.</p><p>Do not chase 256K context on day one. Start with a fast, stable 32K setup. Move to 64K when you need it. Use 128K or 256K only when your machine can keep the model in fast memory and the task truly benefits from that much context.</p><p>The model gives writers a serious local option. 
It still requires structure, testing, and human judgment. That is the right bargain for private writing workflows: more control, more capability, and fewer reasons to hand every draft to a cloud account.</p>]]></content:encoded></item><item><title><![CDATA[llama.cpp vs Ollama vs LM Studio: which is fastest in 2026?]]></title><description><![CDATA[A practical 2026 guide to llama.cpp vs Ollama vs LM Studio, covering benchmarks, GPU offload, context length, APIs and local AI privacy.]]></description><link>https://www.popularai.org/p/llama-cpp-vs-ollama-vs-lm-studio-speed</link><guid isPermaLink="false">https://www.popularai.org/p/llama-cpp-vs-ollama-vs-lm-studio-speed</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Thu, 04 Jun 2026 13:36:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!L5ep!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc540130c-3893-4e70-ae77-e71f3d5c1143_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div 
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L5ep!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc540130c-3893-4e70-ae77-e71f3d5c1143_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L5ep!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc540130c-3893-4e70-ae77-e71f3d5c1143_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!L5ep!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc540130c-3893-4e70-ae77-e71f3d5c1143_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!L5ep!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc540130c-3893-4e70-ae77-e71f3d5c1143_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!L5ep!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc540130c-3893-4e70-ae77-e71f3d5c1143_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L5ep!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc540130c-3893-4e70-ae77-e71f3d5c1143_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c540130c-3893-4e70-ae77-e71f3d5c1143_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1674150,&quot;alt&quot;:&quot;llama.cpp vs Ollama vs LM Studio: 
fastest local LLM tool in 2026&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200356661?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc540130c-3893-4e70-ae77-e71f3d5c1143_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="llama.cpp vs Ollama vs LM Studio: fastest local LLM tool in 2026" title="llama.cpp vs Ollama vs LM Studio: fastest local LLM tool in 2026" srcset="https://substackcdn.com/image/fetch/$s_!L5ep!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc540130c-3893-4e70-ae77-e71f3d5c1143_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!L5ep!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc540130c-3893-4e70-ae77-e71f3d5c1143_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!L5ep!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc540130c-3893-4e70-ae77-e71f3d5c1143_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!L5ep!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc540130c-3893-4e70-ae77-e71f3d5c1143_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Compare llama.cpp, Ollama and LM Studio for local LLM speed, API workflows, model management, privacy and real-world usability in 2026. &#169; Popular AI</figcaption></figure></div><p>If you care about llama.cpp vs Ollama vs LM Studio speed, the first answer is simple. 
The real answer gets messy once you start changing models, quants, context length, GPU offload and runtime settings.</p><p>llama.cpp is usually the fastest raw runner when the same GGUF model, quant, context length, backend and offload settings are used. Ollama is usually the easiest way to run a local model behind an API. LM Studio is usually the best desktop app for downloading, testing, chatting and serving models without living in terminal flags.</p><p>The important part is that these tools do not sit at the same layer. llama.cpp is the engine. Ollama and LM Studio are higher-level products that can sit on top of similar local inference technology. That means speed differences often come from defaults, model packaging, context size, GPU offload, runtime version and whether the model is staying fully in fast memory.</p><div><hr></div><h3>Key takeaways</h3><blockquote><p><strong>Fastest raw option:</strong> llama.cpp, especially for users who know how to tune GPU layers, batch size, context, KV cache, Flash Attention, and backend builds.</p></blockquote><blockquote><p><strong>Best API convenience:</strong> Ollama, because it gives you a simple local server, model pulls, Modelfiles, official Python and JavaScript libraries, and OpenAI-compatible endpoints.</p></blockquote><blockquote><p><strong>Best desktop experience:</strong> LM Studio, because it combines model search, downloads, chat, presets, local server mode, OpenAI-compatible endpoints, and offline document chat in one GUI.</p></blockquote><blockquote><p><strong>Best for Apple Silicon experimentation:</strong> LM Studio deserves a close look because it supports both llama.cpp GGUF models and Apple MLX models on Apple Silicon. Ollama and llama.cpp remain strong options too.</p></blockquote><blockquote><p><strong>Best for repeatable benchmarking:</strong> llama.cpp, because <code>llama-bench</code> directly measures prompt processing and token generation across different settings.</p></blockquote><blockquote><p><strong>Most common mistake:</strong> comparing different quants, context sizes, templates, GPU offload settings, or model load states, then blaming the app.</p></blockquote><div><hr></div><h3>Quick verdict</h3><p><strong>Use llama.cpp if speed is the priority.</strong> It is the cleanest way to run GGUF models with direct control over the runtime. 
The <a href="https://github.com/ggml-org/llama.cpp">llama.cpp GitHub repository</a> describes the project as local and cloud LLM inference in C and C++ with performance-focused support for backends such as Metal, CUDA, HIP, Vulkan, SYCL, CPU plus GPU hybrid inference and low-bit quantization.</p><p><strong>Use Ollama if workflow speed matters more than benchmark speed.</strong> The <a href="https://docs.ollama.com/api/introduction">Ollama API documentation</a> shows a local API at <code>http://localhost:11434/api</code>, while the project also gives you model management, <code>ollama run</code>, model import, Modelfiles, and official Python and JavaScript libraries. That makes it the easiest default for local coding tools, chat frontends and quick experiments.</p><p><strong>Use LM Studio if you want the fastest path from &#8220;I found a model&#8221; to &#8220;I tested it.&#8221;</strong> The <a href="https://lmstudio.ai/docs/app">LM Studio docs</a> describe a desktop app for macOS, Windows and Linux that supports llama.cpp on all three platforms, MLX on Apple Silicon, model search and downloads through Hugging Face, local chat and OpenAI-compatible serving.</p><p>The practical answer is straightforward. llama.cpp wins for raw speed and control. Ollama wins for simple local API workflows. LM Studio wins for GUI model management and local testing.</p><h3>Why people are interested in this comparison</h3><p>People do not care about this because they enjoy runtime architecture. They care because local LLMs feel inconsistent.</p><p>One setup gets 40 tokens per second. Another gets 12. One app loads the same model on the GPU. Another quietly falls back to CPU or uses a different context length. One GUI feels faster, then a command-line run beats it after a better backend build. 
Community threads such as this <a href="https://www.reddit.com/r/LocalLLaMA/comments/1pc700g/what_is_the_benifit_of_running_llamacpp_instead/">LocalLLaMA discussion about running llama.cpp instead of LM Studio or Ollama</a> show the same pattern: users compare llama.cpp, Ollama and LM Studio because they see real differences in tokens per second, memory use, GPU offload, setup friction and API behavior.</p><p>The word &#8220;fastest&#8221; also hides two separate questions.</p><p>First, which runner generates tokens fastest on one machine?</p><p>Second, which tool gets useful local AI into your workflow fastest?</p><p>Those are different decisions. llama.cpp often wins the first. Ollama or LM Studio often wins the second.</p><h3>How to compare speed fairly</h3><p>Most bad speed comparisons between llama.cpp, Ollama and LM Studio make at least one common mistake. They use a different model file, a different quant, a different context length, a different prompt template, a different GPU offload setting, a different backend, a different runtime version, a different batch size, a different KV cache setting or a cold model load in one app and an already-loaded model in another.</p><p>The fair test is boring:</p><ol><li><p>Use the same GGUF file.</p></li><li><p>Use the same quant.</p></li><li><p>Use the same prompt.</p></li><li><p>Use the same context size.</p></li><li><p>Confirm GPU offload.</p></li><li><p>Separate prompt processing speed from token generation speed.</p></li><li><p>Run multiple passes after the model is already loaded.</p></li></ol><p>llama.cpp is strongest here because <code>llama-bench</code> is built for exactly this kind of measurement. 
The <a href="https://github.com/ggml-org/llama.cpp/blob/master/tools/llama-bench/README.md">llama-bench documentation</a> describes tests for prompt processing, text generation and combined prompt-plus-generation, with options for repeated runs, output formats, batch sizes, thread counts and GPU offload experiments.</p><h3>Speed ranking: which is fastest?</h3><h4>1. llama.cpp: fastest raw runner for most tuned GGUF use</h4><p>llama.cpp is the speed-first pick because it gives you the least abstraction and the most control. You can run a local GGUF directly with <code>llama-cli</code>, serve it with <code>llama-server</code>, choose a backend, tune context and batch settings, then benchmark changes without guessing what a desktop app or daemon decided for you. The <a href="https://github.com/ggml-org/llama.cpp">llama.cpp README</a> shows direct local model execution, Hugging Face model loading and launching an OpenAI-compatible API server from the command line.</p><p>That does not make llama.cpp magically faster in every possible configuration. It means that when users care enough to tune, llama.cpp exposes the knobs that affect speed. 
It also tends to get new backend work and model support quickly because other tools frequently depend on or track the same lower-level ecosystem. The same <a href="https://github.com/ggml-org/llama.cpp">llama.cpp project page</a> lists active work around server updates, multimodal support, GGUF support through Hugging Face and new model support.</p><p>llama.cpp wins when you want raw single-user speed, direct GGUF control, reproducible benchmarking, CPU-only experiments, a choice of CUDA, Metal, Vulkan, HIP or SYCL backends, hybrid CPU/GPU tuning, lightweight server deployment and exact runtime flags.</p><p>It is weaker for beginner model discovery, friendly chat history, a built-in model library experience, presets, document chat and polished desktop workflows, and it is a poor fit for users who do not want to read runtime flags.</p><p><strong>Verdict:</strong> Use llama.cpp when you want maximum speed, exact control or a clean benchmark baseline.</p><h4>2. Ollama: best speed-to-convenience ratio for local API workflows</h4><p>Ollama is not the lowest-level speed tool. It is the tool that makes local models feel usable fast. The <a href="https://github.com/ollama/ollama">Ollama GitHub repository</a> shows the simple <code>ollama run</code> workflow, local model usage and supported backends. For example, <code>ollama run gemma3</code> starts a chat, the REST API runs locally, and the project provides official Python and JavaScript libraries.</p><p>Ollama also makes model packaging easier through Modelfiles. The <a href="https://docs.ollama.com/modelfile">Ollama Modelfile reference</a> shows how to build from existing models, Safetensors directories or GGUF files, including <code>FROM ./ollama-model.gguf</code> for GGUF and <code>PARAMETER</code> settings such as <code>num_ctx</code>, temperature, repeat penalty and seed.</p><p>For speed, Ollama&#8217;s defaults matter a lot. 
Its <a href="https://docs.ollama.com/context-length">context length documentation</a> says the default context depends on VRAM, from 4k below 24 GiB VRAM to 32k on 24 to 48 GiB and 256k at 48 GiB or more. The same page warns that increasing context length raises memory requirements and tells users to verify model offloading with <code>ollama ps</code>.</p><p>That is a major reason Ollama can feel fast in one setup and painfully slow in another. A larger context window can eat enough memory to push a model out of a clean GPU fit. Once that happens, the bottleneck may be context, KV cache, offload or VRAM spill rather than Ollama itself. Popular AI has a full guide to <a href="https://www.popularai.org/p/why-ollama-and-llama-cpp-crawl-when-models-spill-into-ram-and-how-to-fix-it">why Ollama and llama.cpp crawl when models spill into RAM</a>, which is one of the most common local LLM speed traps.</p><p>Ollama wins for simple install and run commands, local API workflows, coding agents, editor integrations, model packaging with Modelfiles, pulling and running common models quickly, keeping models loaded through <code>keep_alive</code> and OpenAI-compatible local endpoints.</p><p>It is weaker for raw benchmark tuning, GUI-first model comparison, exact visibility into every lower-level runtime choice, some custom GGUF workflows and users who want the newest llama.cpp behavior the moment it lands.</p><p>Ollama also has a practical model-load advantage for app workflows. The <a href="https://docs.ollama.com/faq">Ollama FAQ</a> says models are kept in memory for 5 minutes by default, and the <code>keep_alive</code> parameter can keep a model loaded longer, keep it loaded indefinitely with a negative value or unload it immediately with <code>0</code>.</p><p><strong>Verdict:</strong> Use Ollama when your real goal is a dependable local model service for apps, agents, chat frontends and scripts.</p><h4>3. 
LM Studio: best local LLM desktop app, with strong server features</h4><p>LM Studio is the best choice for people who want to compare local models without turning every test into a terminal session. The <a href="https://lmstudio.ai/docs/app">LM Studio app docs</a> say it supports macOS, Windows and Linux, runs GGUF models through llama.cpp, supports MLX on Apple Silicon, downloads models through the app, manages prompts and configurations, and can serve models through OpenAI-like local endpoints.</p><p>The server story has improved a lot. The <a href="https://lmstudio.ai/docs/app/api/endpoints/openai">LM Studio OpenAI compatibility docs</a> list endpoints for <code>/v1/models</code>, <code>/v1/responses</code>, <code>/v1/chat/completions</code>, <code>/v1/embeddings</code> and <code>/v1/completions</code>. They also show the standard OpenAI client pattern with the base URL changed to <code>http://localhost:1234/v1</code>.</p><p>LM Studio also has its own native REST API for local inference and model management. The <a href="https://lmstudio.ai/docs/app/api/endpoints/rest">LM Studio REST API docs</a> include endpoints to list models, load models, unload models, download models and check download status. The docs also compare endpoint support for streaming, stateful chat, MCPs, tools, model load events, prompt processing events and per-request context length.</p><p>For headless use, LM Studio now has <code>llmster</code>, a daemon-style option. 
The <a href="https://lmstudio.ai/docs/developer/core/headless">LM Studio headless documentation</a> says LM Studio can run as a background service without the GUI, either through <code>llmster</code> or the desktop app in headless mode.</p><p>LM Studio wins for GUI model discovery, one-app local chat, offline document chat, model downloads and cleanup, easy settings and presets, an OpenAI-compatible local server, a native REST API for model management and Apple Silicon users who want MLX as an option.</p><p>It is weaker for lowest-level speed tuning, fully open-source control, minimal server footprint compared with llama.cpp, users who want everything in config files and shell scripts, and workloads where a GUI app adds no value.</p><p>LM Studio is free for home and work use as of July 8, 2025, according to the <a href="https://lmstudio.ai/blog/free-for-work">LM Studio free-for-work announcement</a>. The current homepage also describes the app as free for home and work use.</p><p><strong>Verdict:</strong> Use LM Studio if you want the best local LLM app experience and a capable local server without giving up your afternoon to command-line tuning.</p><h3>OpenAI-compatible API comparison</h3><p>All three options can serve local models behind API-like workflows, but they are not equally polished for every use.</p><p><strong>llama.cpp:</strong> <code>llama-server</code> provides a fast local REST API and OpenAI-compatible endpoints. 
The <a href="https://www.mintlify.com/ggml-org/llama.cpp/api/rest/overview">llama.cpp REST API overview</a> lists OpenAI-compatible API support, streaming responses, GPU acceleration, router mode for multiple models, experimental multimodal support, function calling and deployment through Docker, native binaries or cloud platforms.</p><p><strong>Ollama:</strong> The <a href="https://docs.ollama.com/api/introduction">Ollama API introduction</a> describes an API that runs by default at <code>http://localhost:11434/api</code>, while its OpenAI compatibility docs cover <code>/v1/responses</code>, streaming, tools, reasoning summaries and supported fields. The docs also note that stateful <code>previous_response_id</code> and <code>conversation</code> are not supported for the Responses API path.</p><p><strong>LM Studio:</strong> LM Studio exposes OpenAI-compatible endpoints on <code>localhost:1234/v1</code>. The <a href="https://lmstudio.ai/docs/app/api/endpoints/openai">LM Studio OpenAI compatibility documentation</a> covers <code>/v1/responses</code>, <code>/v1/chat/completions</code>, <code>/v1/embeddings</code>, <code>/v1/completions</code> and <code>/v1/models</code>. Its native v1 REST API adds model loading, unloading, downloads and richer local model management.</p><p><strong>API verdict:</strong> Ollama is the simplest default. LM Studio is stronger if you also want GUI control and model management. llama.cpp is the cleanest speed-first server when you are comfortable owning the flags.</p><h3>Model management comparison</h3><p><strong>llama.cpp model management:</strong> You manage files yourself. That is annoying for beginners and excellent for control. You choose the GGUF, put it where you want, run it directly and benchmark it directly. 
The project&#8217;s normal local model path uses GGUF, and this <a href="https://raw.githubusercontent.com/ggml-org/llama.cpp/b7376/README.md">raw llama.cpp README snapshot</a> points to the model conversion and local model workflow that make llama.cpp attractive to users who want file-level control.</p><p><strong>Ollama model management:</strong> Ollama gives you a packaging layer. Models live in Ollama&#8217;s model store by default, and the <a href="https://docs.ollama.com/faq">Ollama FAQ</a> lists the default model locations on macOS, Linux and Windows. You can change the location with <code>OLLAMA_MODELS</code>.</p><p><strong>LM Studio model management:</strong> LM Studio is the easiest for browsing, downloading, testing and deleting models. The <a href="https://lmstudio.ai/docs/app">LM Studio docs</a> say the app can search and download through Hugging Face, manage local models, prompts and configurations, and attach documents for offline local chat.</p><p><strong>Model management verdict:</strong> LM Studio is best for humans. Ollama is best for repeatable app workflows. llama.cpp is best for users who want file-level control.</p><h3>Privacy and control</h3><p>All three tools can run models locally, but the privacy story depends on which features you use.</p><p><strong>llama.cpp</strong> is the cleanest from a control standpoint because it is a local open-source runtime under the MIT license, according to the <a href="https://github.com/ggml-org/llama.cpp">llama.cpp repository</a>. You still need to check the license of the model weights you run.</p><p><strong>Ollama</strong> says local prompts and responses stay on your machine when you run locally. The <a href="https://ollama.com/privacy">Ollama privacy policy</a> says Ollama does not collect, store, transmit or access local prompts, responses, model interactions or other locally processed content. 
It separately says cloud-hosted models process prompts and responses transiently to provide the service.</p><p><strong>LM Studio</strong> says messages, chat histories and documents are not transmitted from your system by default, and that the app can run entirely offline. The <a href="https://lmstudio.ai/privacy">LM Studio privacy policy</a> says LM Studio receives data when users search for or download AI models, when the app checks for updates or when users email the company.</p><p>There is one important caveat. Local software privacy is different from zero network activity. Model downloads, update checks, cloud features, Hub features, remote links, telemetry policies and support requests can change what data leaves the machine. For sensitive work, disable cloud features you do not need, keep models local, avoid remote tunnels unless necessary and test with network access off before trusting the setup.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lqkN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F829d115b-e36c-43e1-8354-6ed5d9905294_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lqkN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F829d115b-e36c-43e1-8354-6ed5d9905294_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!lqkN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F829d115b-e36c-43e1-8354-6ed5d9905294_1672x941.png 848w, 
https://substackcdn.com/image/fetch/$s_!lqkN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F829d115b-e36c-43e1-8354-6ed5d9905294_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!lqkN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F829d115b-e36c-43e1-8354-6ed5d9905294_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lqkN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F829d115b-e36c-43e1-8354-6ed5d9905294_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/829d115b-e36c-43e1-8354-6ed5d9905294_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1616261,&quot;alt&quot;:&quot;llama.cpp, Ollama or LM Studio? The fastest local AI runner&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200356661?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F829d115b-e36c-43e1-8354-6ed5d9905294_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="llama.cpp, Ollama or LM Studio? The fastest local AI runner" title="llama.cpp, Ollama or LM Studio? 
The fastest local AI runner" srcset="https://substackcdn.com/image/fetch/$s_!lqkN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F829d115b-e36c-43e1-8354-6ed5d9905294_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!lqkN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F829d115b-e36c-43e1-8354-6ed5d9905294_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!lqkN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F829d115b-e36c-43e1-8354-6ed5d9905294_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!lqkN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F829d115b-e36c-43e1-8354-6ed5d9905294_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">llama.cpp usually wins raw speed, but Ollama and LM Studio may be faster for actual workflows. Here&#8217;s how to choose the right local LLM tool. &#169; Popular AI</figcaption></figure></div><h3>Speed traps that make the wrong tool look slow</h3><h4>Context length</h4><p>Context length is one of the easiest ways to ruin a fair comparison. The <a href="https://docs.ollama.com/context-length">Ollama context length docs</a> say larger context length increases memory requirements, and tasks such as web search, agents and coding tools may need at least 64,000 tokens. That can be useful, but it is also expensive in memory.</p><h4>Cold starts</h4><p>A model that is already loaded will feel much faster than one that has to load from disk first. The <a href="https://docs.ollama.com/faq">Ollama FAQ</a> says models are kept loaded for 5 minutes by default and lets API users control that with <code>keep_alive</code>. LM Studio has Idle TTL and Auto-Evict settings for loaded models, and its docs say models loaded through <code>lms load</code> do not have a TTL unless one is set.</p><h4>GPU offload</h4><p>A model running fully on GPU feels completely different from one split between GPU and CPU. 
The <a href="https://github.com/ggml-org/llama.cpp">llama.cpp repository</a> describes CPU plus GPU hybrid inference for models larger than total VRAM capacity, while the <a href="https://docs.ollama.com/context-length">Ollama context length docs</a> tell users to check the <code>PROCESSOR</code> split with <code>ollama ps</code>.</p><h4>Backend differences</h4><p>LM Studio can use llama.cpp runtimes and, on Apple Silicon, MLX. The <a href="https://github.com/ollama/ollama">Ollama repository</a> names llama.cpp as a supported backend. The same model can behave differently depending on runtime version, backend, driver path and hardware support.</p><h4>Prompt processing versus token generation</h4><p>A local model can be slow before the first token because prompt processing is the bottleneck, then fast after generation begins. The reverse also happens: the first token arrives quickly, then generation crawls. The <a href="https://github.com/ggml-org/llama.cpp/blob/master/tools/llama-bench/README.md">llama-bench docs</a> separate prompt processing from text generation, which is exactly the distinction users need when diagnosing speed.</p><h3>Best choice by use case</h3><h4>Fastest for one local user</h4><p>Use <strong>llama.cpp</strong>.</p><p>This is the answer for people who want the highest token speed from a local GGUF model and are willing to tune. It is also the best baseline for deciding whether Ollama or LM Studio is losing speed because of defaults.</p><h4>Best for local coding agents</h4><p>Use <strong>Ollama</strong> first, then test <strong>llama.cpp</strong> if performance becomes the bottleneck.</p><p>Ollama&#8217;s local API, model management, Python and JavaScript libraries, and integrations make it the convenient first choice for tools that expect a persistent local model service. 
The <a href="https://github.com/ollama/ollama">Ollama GitHub page</a> is the best starting point for the project&#8217;s local run workflow and supported model ecosystem.</p><h4>Best for trying many models</h4><p>Use <strong>LM Studio</strong>.</p><p>The GUI matters when you are comparing models, templates, quants and chat behavior. LM Studio is also strong when you want a local chat app and a local server from the same tool. The <a href="https://lmstudio.ai/docs/app">LM Studio docs</a> cover the app&#8217;s model search, download, local chat and serving features.</p><h4>Best for a home server</h4><p>Use <strong>Ollama</strong> for simplicity or <strong>llama.cpp</strong> for control.</p><p>Ollama is easier to operate as a local service. llama.cpp is better when you want to build the server yourself and tune the runtime. If your home server is part of a broader Open WebUI or private document setup, Popular AI&#8217;s <a href="https://www.popularai.org/p/the-best-private-family-ai-nas-build">private family AI NAS build</a> is a useful next read.</p><div><hr></div><h4><em><strong>More on local AI servers:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;f6972811-d9ad-4e76-b5de-8e0c5eece2b2&quot;,&quot;caption&quot;:&quot;Readers keep asking the same question in slightly different ways: can Proxmox be the sane way to run local AI, or does GPU passthrough turn your server into a weekend-long science project? 
The sh&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The best Proxmox AI server build for Ollama in 2026&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-27T14:42:08.272Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!xtmG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8475f5ca-da2d-45da-b1b9-831b63fa392c_2400x1491.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/the-best-proxmox-ai-server-build&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:192077067,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h4>Best for Mac users</h4><p>Use <strong>LM Studio</strong> if you want a desktop app. 
Use <strong>Ollama</strong> if you want a simple local API. Use <strong>llama.cpp</strong> if you want maximum control.</p><p>LM Studio&#8217;s extra Apple Silicon angle is MLX support, and the <a href="https://lmstudio.ai/docs/app">LM Studio docs</a> make that support part of the platform story. For Mac users, the real buying question is memory and bandwidth, because unified memory can change what feels practical on a local AI machine.</p><div><hr></div><h4><em><strong>More on local AI for Mac users:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;58e098af-5263-486d-a771-090ac9edc20a&quot;,&quot;caption&quot;:&quot;A Mac mini can be a surprisingly good local LLM machine in 2026, but only if you buy the right memory tier. The wrong Mac mini for local AI usually fails because the model does not fit cleanly in unified memory. CPU binning matters, G&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Mac mini LLM performance in 2026: which model should you buy?&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-12T16:52:31.992Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!zHxt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f0ab956-91e4-4590-9801-99669cf81360_1672x941.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/mac-mini-llm-performance-in-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:197374222,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h4>Best for a budget local AI PC</h4><p>Use <strong>Ollama</strong> to start, then add <strong>llama.cpp</strong> when you want more speed and measurement.</p><p>On a used RTX 3090 setup, the 24GB VRAM tier gives you far more room before you hit memory pain. 
Popular AI&#8217;s <a href="https://www.popularai.org/p/the-best-budget-local-llm-pc-in-2026">budget local AI PC guide</a> covers why a used RTX 3090 remains a strong first local AI box in 2026.</p><div><hr></div><h4><em><strong>More on local AI on a budget:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;60cef0bd-e923-4675-b967-1051906c1a28&quot;,&quot;caption&quot;:&quot;The best budget local LLM PC under $1,000 in 2026 starts with one boring rule: buy as much NVIDIA VRAM as the budget can handle, then keep everything else practical.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How to build a local AI PC under $1,000 in 2026&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-19T13:35:36.745Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!1CQp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6abea302-ae1f-49bf-81f6-0589609b42b9_1672x941.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/best-budget-local-llm-pc-under-1000-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:198385359,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h4>Best for laptops</h4><p>Use <strong>LM Studio</strong> if you want the friendliest GUI and easier model testing. Use <strong>Ollama</strong> if your laptop is going to act like a local API endpoint. 
Popular AI&#8217;s <a href="https://www.popularai.org/p/best-laptops-for-local-llms-2026">best laptops for local LLMs guide</a> covers Ollama and LM Studio laptop choices by VRAM and unified memory.</p><div><hr></div><h4><em><strong>More on self-hosted AI on a laptop:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;fd6cba1f-0540-4d46-b042-8041f21e61c0&quot;,&quot;caption&quot;:&quot;You do not need a custom desktop to run local LLMs with Ollama or LM Studio in 2026. You do need to stop shopping like a gamer. For local inference, memory is usually the first thing that decides whether a laptop fee&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The best laptops for running local LLMs in 2026: 5 smart picks&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-04-11T14:41:38.560Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!YoET!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5292cd6d-38ac-4490-8ec8-57f35d970411_2400x1350.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/best-laptops-for-local-llms-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:193888747,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>When none of these is the right tool</h3><p>llama.cpp, Ollama and LM Studio are excellent local AI tools, but they are not always the right answer for high-concurrency serving.</p><p>If you need production-style multi-user throughput, batching and GPU saturation, look at vLLM or similar serving stacks instead. <a href="https://vllm.ai/">vLLM</a> describes itself as a high-throughput and memory-efficient inference and serving engine, and its docs include an OpenAI-compatible server path.</p><p>That does not make vLLM the better desktop local AI choice. It means the workload changed. 
One person chatting with a local model is a different problem from serving dozens of requests.</p><h3>Practical recommendation</h3><p>Choose based on the job, not brand loyalty.</p><p>Pick <strong>llama.cpp</strong> when raw speed is the main goal, when you want to benchmark properly, when CLI flags do not scare you, when exact GGUF control matters and when you are tuning CPU, CUDA, Metal, Vulkan, HIP or hybrid offload.</p><p>Pick <strong>Ollama</strong> when you want a local model server quickly, when you are building with Python, JavaScript, Open WebUI, coding tools or local agents, when you want simple model pulls and Modelfiles, and when you prefer a stable local API over direct runtime tuning.</p><p>Pick <strong>LM Studio</strong> when you want the best desktop local LLM experience, when you test lots of models, when you want model search, downloads, chat, presets and server mode in one app, when you are on Apple Silicon and want access to both llama.cpp and MLX paths, and when you want local document chat without building the whole stack yourself.</p><div><hr></div><h2>FAQ</h2><h4>Is llama.cpp faster than Ollama?</h4><blockquote><p>Usually, yes, when the same model file, quant, context size, backend and offload settings are used and llama.cpp is tuned well. Ollama is built for convenience and model management, while llama.cpp exposes more direct runtime control and benchmarking tools through <code>llama-bench</code>.</p><div><hr></div></blockquote><h4>Is LM Studio faster than Ollama?</h4><blockquote><p>Sometimes. It depends on hardware, backend, model file, runtime version, context size, GPU offload and settings. <a href="https://lmstudio.ai/docs/app">LM Studio</a> can use llama.cpp and MLX, while Ollama uses a model-management and server layer around its supported backends. 
The only honest answer is to test the same model and quant under the same conditions.</p><div><hr></div></blockquote><h4>Does Ollama use llama.cpp?</h4><blockquote><p>The <a href="https://github.com/ollama/ollama">Ollama GitHub README</a> lists llama.cpp as a supported backend. That does not mean every Ollama run behaves exactly like a manually tuned llama.cpp command, because Ollama adds its own model packaging, server behavior, defaults and management layer.</p><div><hr></div></blockquote><h4>Can LM Studio run as a server?</h4><blockquote><p>Yes. LM Studio can run a local server from the Developer tab or with <code>lms server start</code>, and its <a href="https://lmstudio.ai/blog/openresponses">Open Responses announcement</a> shows local models working through LM Studio&#8217;s OpenAI-compatible API path. LM Studio also has native REST endpoints for model management.</p><div><hr></div></blockquote><h4>Which one is best for privacy?</h4><blockquote><p>For pure local control, llama.cpp is the cleanest because it is a local runtime and you manage the files yourself. Ollama and LM Studio can also be private for local use, according to the <a href="https://ollama.com/privacy">Ollama privacy policy</a> and <a href="https://lmstudio.ai/privacy">LM Studio privacy policy</a>, but users should distinguish local execution from model downloads, update checks, Hub features, remote links, cloud-hosted models and support requests.</p><div><hr></div></blockquote><h4>Which one should beginners use?</h4><blockquote><p>Most beginners should start with LM Studio if they want a desktop app or Ollama if they want a simple local API. 
Move to llama.cpp when speed, benchmarking or exact control becomes more important than convenience.</p><div><hr></div></blockquote><div class="callout-block" data-callout="true"><h3>Final recommendation</h3><p>For 2026, the best default stack is to use <strong>LM Studio</strong> for discovering, downloading and testing models, <strong>Ollama</strong> for app integrations, local APIs and daily agent workflows, and <strong>llama.cpp</strong> for speed testing, tuning and serious control.</p><p>If this article has to answer one question, <strong>llama.cpp vs Ollama vs LM Studio speed</strong>, the winner is <strong>llama.cpp</strong>. If the question is which tool most people should use first, the answer is <strong>Ollama for server workflows</strong> and <strong>LM Studio for GUI workflows</strong>.</p><p>The right move is not to pick one religion. Use the stack that gives you the most useful local capability with the least dependency and the fewest hidden bottlenecks.</p></div><div><hr></div><p style="text-align: center;"><em><strong>Explore more from Popular AI:</strong></em></p><p style="text-align: center;"><strong><a href="https://popularai.org/t/start-here">Start here</a> | <a href="https://popularai.org/t/local-ai">Local AI</a> | <a href="https://popularai.org/t/walkthroughs">Fixes &amp; guides</a> | <a href="https://popularai.org/t/ai-builds-gear">Builds &amp; gear</a> | <a href="https://popularai.org/t/popular-ai-podcast">Popular AI podcast</a></strong></p>]]></content:encoded></item><item><title><![CDATA[The ChatGPT sidebar sucks 
now. Here’s how to fix it.]]></title><description><![CDATA[If your ChatGPT pinned chats, GPTs, or Projects disappeared, do not panic. Here is what changed and how to get them back.]]></description><link>https://www.popularai.org/p/chatgpt-sidebar-pinned-chats-gpts-projects-missing</link><guid isPermaLink="false">https://www.popularai.org/p/chatgpt-sidebar-pinned-chats-gpts-projects-missing</guid><dc:creator><![CDATA[Ben Geudens]]></dc:creator><pubDate>Wed, 03 Jun 2026 14:01:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!vIiK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb96306-8a20-42bc-9ac7-209488b594b5_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vIiK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb96306-8a20-42bc-9ac7-209488b594b5_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vIiK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb96306-8a20-42bc-9ac7-209488b594b5_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!vIiK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb96306-8a20-42bc-9ac7-209488b594b5_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!vIiK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb96306-8a20-42bc-9ac7-209488b594b5_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!vIiK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb96306-8a20-42bc-9ac7-209488b594b5_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vIiK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb96306-8a20-42bc-9ac7-209488b594b5_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7fb96306-8a20-42bc-9ac7-209488b594b5_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1580090,&quot;alt&quot;:&quot;ChatGPT sidebar changed? How to find missing pinned chats, GPTs, and Project&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200419614?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb96306-8a20-42bc-9ac7-209488b594b5_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="ChatGPT sidebar changed? How to find missing pinned chats, GPTs, and Project" title="ChatGPT sidebar changed? 
How to find missing pinned chats, GPTs, and Project" srcset="https://substackcdn.com/image/fetch/$s_!vIiK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb96306-8a20-42bc-9ac7-209488b594b5_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!vIiK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb96306-8a20-42bc-9ac7-209488b594b5_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!vIiK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb96306-8a20-42bc-9ac7-209488b594b5_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!vIiK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb96306-8a20-42bc-9ac7-209488b594b5_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">A sudden ChatGPT sidebar change made important chats, GPTs, and Projects harder to reach. Here is how to get your workspace back. &#169; Popular AI</figcaption></figure></div><p>I opened my ChatGPT workspace this morning and almost had a heart attack.</p><p>My pinned chats were gone. These are not casual chats. These are the important ones, the threads I keep around because they hold useful work, repeated workflows, planning, research trails, and whatever else I do not want to lose in the usual swamp of recent chats.</p><p>My custom GPTs were missing too.</p><p>Then I tried to open a Project, and for some reason clicking it in the sidebar no longer opened the Project dashboard.</p><p>At first glance, it looked like ChatGPT had eaten my workspace overnight.</p><p>It had not. All my stuff was still there. But there must have been a ChatGPT desktop sidebar update I had not been told about, and the new layout hid several of the features I use every day.</p><p>If your ChatGPT pinned chats, custom GPTs, or Projects look missing today, do not panic yet. Here is what fixed it for me.</p><h3>Quick fix</h3><p>If your pinned chats are missing, look for a new <strong>Pinned</strong> section in the ChatGPT desktop sidebar. It may be collapsed. Expand it and your pinned chats should be there.</p><p>If your custom GPTs are missing, click <strong>... More</strong> in the sidebar, choose <strong>GPTs</strong>, go to <strong>My GPTs</strong>, open the GPT you want, click the GPT name at the top of the chat, then choose <strong>Pin</strong>. 
It should now appear in the new <strong>Pinned</strong> section.</p><p>If a Project disappeared from your normal Projects list, check the new <strong>Pinned</strong> section too. Some Projects may now be there.</p><p>If clicking a Project no longer opens the Project dashboard, hover over the Project in the sidebar and click the small write or pen icon.</p><p>Nothing is permanently lost, and there is a workaround for each of these changes. However, this does point to a bigger issue: for some of us, tools like ChatGPT, Claude or Gemini are now workspaces for serious work, and companies like OpenAI, Anthropic or Google can change those workspaces overnight.</p><h3>What changed in the ChatGPT sidebar?</h3><p>Based on what I saw on ChatGPT desktop web today, the sidebar appears to have changed in several ways.</p><p>Pinned chats, pinned GPTs, and pinned Projects now appear to live inside a shared <strong>Pinned</strong> section.</p><p>That might sound harmless if the section is expanded. But if it is collapsed, the effect is brutal: your most important chats look like they disappeared.</p><p>Custom GPTs also seem less directly accessible from the main sidebar. OpenAI&#8217;s own GPT documentation says users can access GPTs through the GPTs area in ChatGPT, and OpenAI&#8217;s GPT creation documentation says users can manage their own GPTs by opening <strong>Explore GPTs</strong> and selecting <strong>My GPTs</strong>. That path still works, but it is more buried than a sidebar shortcut.</p><p>Projects are affected too. OpenAI describes <a href="https://help.openai.com/en/articles/10169521-projects-in-chatgpt">Projects in ChatGPT</a> as workspaces that can contain chats, files, and instructions. For users who rely on Projects as real working folders, the dashboard is not decorative. 
It is the place where project structure, files, and settings live.</p><p>As of this writing, I could not find an official OpenAI release note clearly explaining this exact desktop sidebar change. OpenAI&#8217;s <a href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes">current ChatGPT release notes</a> list recent changes such as Active Sessions on June 2, job search and resume formatting on June 1, and earlier updates to model selection, files, memory, and mobile sidebar behavior. However, an older ChatGPT web sidebar redesign note in the <a href="https://help.openai.com/en/articles/10128477-chatgpt-enterprise-edu-release-notes">ChatGPT Enterprise and Edu release notes</a> shows that such redesigns have happened before. Back then, OpenAI limited recent conversations in the sidebar and moved recent GPTs and pinned GPTs below conversations.</p><p>In other words: there&#8217;s nothing new under the sun. This looks like another round of the same problem.</p><div><hr></div><h4><em><strong>More on ChatGPT troubleshooting:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;75217a78-e14c-4dd6-8c9e-fdf4ae58e526&quot;,&quot;caption&quot;:&quot;Why do long ChatGPT threads go bad after they seem so useful at first?&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Why long ChatGPT threads break down, and how to fix them&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-14T14:33:00.000Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!PQmd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00be5707-26fa-4417-9137-6426b763db16_1312x736.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/why-long-chatgpt-threads-break-down-and-how-to-fix-them&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:191487944,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>Fix 1: Missing pinned chats</h3><p>If your pinned chats are missing, do not assume they were deleted.</p><p>Look in the desktop sidebar for a section called <strong>Pinned</strong>.</p><p>If you do not see your chats under it, the section may be collapsed. Expand it.</p><p>That fixed the missing pinned-chat problem for me. The chats were still there. They had been hidden inside a new sidebar section I had no reason to expect.</p><p>This is bad UX because pinned chats are, by definition, the chats users have marked as important. Hiding them behind a collapsed section without a clear warning creates the exact problem pinning is supposed to prevent. 
At least open/expand the &#8220;look here for your most valued, precious stuff&#8221; section by default, guys.</p><p>OpenAI has described <a href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes">pinned chats as a way to keep important conversations quickly accessible</a>. If the interface suddenly makes those pinned chats look gone, the feature has failed at its primary job.</p><div><hr></div><h3>Fix 2: Missing custom GPTs</h3><p>This one is more annoying.</p><p>If your custom GPTs vanished from the sidebar, they are probably still there. Reaching them is now a pain in the ass.</p><p>Here is the path that worked for me on desktop:</p><ol><li><p>Click <strong>... More</strong> in the ChatGPT sidebar.</p></li><li><p>Click <strong>GPTs</strong> in the dropdown.</p></li><li><p>This opens the <strong>Explore GPTs</strong> screen.</p></li><li><p>Click <strong>My GPTs</strong>.</p></li><li><p>Click the GPT you want to use.</p></li></ol><p>That gets you to the GPT, but it is too much friction for something many users treat like a daily tool.</p><p>To restore a custom GPT to the sidebar:</p><ol><li><p>Use the steps above to find and open your custom GPT.</p></li><li><p>Inside the GPT, click the GPT name at the top of the chat.</p></li><li><p>This is the same top area where you normally find model or chat controls.</p></li><li><p>In the dropdown, click <strong>Pin</strong>.</p></li><li><p>The GPT should now appear in the sidebar under the new <strong>Pinned</strong> section.</p></li></ol><p>OpenAI&#8217;s documentation explains that <a href="https://help.openai.com/en/articles/8554407-gpts-in-chatgpt">GPTs are customized versions of ChatGPT</a>, which is exactly why hiding them hurts power users. A custom GPT is often a reusable tool, not a one-off conversation.</p><p>An important catch: pinned GPTs now appear alongside pinned chats and Projects. That may be tidy from a product-design perspective, but it mixes different object types into one list. 
A chat, a custom GPT, and a Project are not the same thing. Users rely on them differently.</p><p>A custom GPT is a reusable assistant. A pinned chat is an ongoing thread. A Project is a workspace. Collapsing all of them into one section may save sidebar space, but it also makes the user do more mental sorting. In other words: a newly created problem that will likely prompt another silent UI overhaul at some point.</p><div><hr></div><h3>Fix 3: Missing Projects</h3><p>If a Project is missing from the normal Projects list, check the new <strong>Pinned</strong> section.</p><p>Some of my Projects disappeared from the Projects list and could only be found in <strong>Pinned</strong>, even though I do not remember ever pinning them.</p><p>That is the kind of change that makes users doubt themselves. Did I pin this? Did ChatGPT pin it for me? Did the Projects list stop showing all Projects? Is this a bug? Is it an A/B test?</p><p>I do not know. What I do know is that if a Project appears missing, the first place to check is now <strong>Pinned</strong>.</p><p>This matters because Projects are one of ChatGPT&#8217;s most useful power-user features. OpenAI says Projects can include <a href="https://help.openai.com/en/articles/10169521-projects-in-chatgpt">chats, uploaded files, and custom instructions</a>. They are closer to workspaces than labels.</p><p>If those workspaces move around without warning, users waste time doing archaeology inside their own account.</p><div><hr></div><h3>Fix 4: Cannot open a Project dashboard anymore</h3><p>This was the most annoying part.</p><p>Previously, clicking a Project in the sidebar opened the Project dashboard or start screen. That made sense. 
The Project name functioned like a folder.</p><p>Now, at least in the interface I saw, clicking a Project does not open the Project dashboard anymore.</p><p>The workaround: hover over the Project and click the small write or pen icon.</p><p>This is strange because the icon does not obviously communicate &#8220;open Project dashboard.&#8221; A pen icon usually means edit, compose, or start writing. If the only way into the Project start screen is now hidden behind a hover-only icon, that is a bad affordance for a core workspace feature.</p><p>It also makes Projects feel less like stable folders and more like items in a constantly shifting app shell.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.popularai.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Popular AI is reader-supported. To receive new posts and support our work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h3>Why would OpenAI make custom GPTs and Projects harder to reach?</h3><p>The charitable explanation here is sidebar pressure: how do you sort and manage so many items in one vertical space?</p><p>AI chatbots are no longer simple chat products. OpenAI keeps adding more surfaces: Projects, GPTs, Library, Apps, Codex, Images, Pulse, jobs, finance tools, spreadsheets, memory sources, and more. 
OpenAI&#8217;s <a href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes">release notes</a> show how much has been added to ChatGPT in 2026 alone.</p><p>At some point, the sidebar becomes a product battlefield. Every new feature demands space. Every team wants visibility. Every new workflow wants to feel first-class.</p><p>That may explain why OpenAI would combine pinned chats, GPTs, and Projects into one <strong>Pinned</strong> bucket. It may also explain why GPTs are pushed behind <strong>More</strong> and Projects behave more like quick-start items than folders.</p><p>But no amount of marketing or executive explanations makes unrequested UI changes automatically good. Certainly not for everyone.</p><p>For power users, custom GPTs and Projects are not obscure extras. They are the structure that makes ChatGPT usable for repeated work. Hiding them may make the interface look cleaner for casual users, but it makes the tool worse for businesses or people who built elaborate workflows around it.</p><p>There is also a product incentive angle worth watching. OpenAI seems to be making ChatGPT into a broader consumer and work platform, not just a chatbot. 
Apps, connectors, job search, finance, file libraries, spreadsheets, Codex, and project sharing all compete for the same navigation space.</p><p><a href="https://www.popularai.org/p/average-users-dumb-down-ai-chatbots">A recent Popular AI article</a> signals another sad truth about cloud-based AI companies, especially when their services are broadly accessible: power users do not make up the majority of users, and they tend to become less and less of a concern to tech companies as their user base grows.</p><p>It is not unthinkable that individual tech-savvy users might find themselves nudged towards business and professional plans, leaving consumer-tier plans only for those who want to have an AI buddy, therapist, travel planner or stylist.</p><div><hr></div><h4><em><strong>More on the power user versus casual user divide:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;4a72c8fd-833c-4d19-b580-d28a088d30e8&quot;,&quot;caption&quot;:&quot;The biggest risk to mass-use AI chatbots is not that they stop getting smarter. 
It is that their default behavior becomes optimized for the easiest user to please, the least risky answer to publish, and the&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Will the average user make AI worse for power users?&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-18T07:59:10.714Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!zRa2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d4b6ba-af0d-413e-b32a-8396235da50b_1672x941.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/average-users-dumb-down-ai-chatbots&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:198226636,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular 
AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><p>Either way, when a company controls the full hosted interface, it can decide which features get prime placement and which ones get buried. Whether you like that or not.</p><h3>This fits a longer pattern of unrequested ChatGPT changes</h3><p>This is not the first time ChatGPT users have had to relearn the interface.</p><p>OpenAI&#8217;s older web release notes describe a previous <a href="https://help.openai.com/en/articles/10128477-chatgpt-enterprise-edu-release-notes">sidebar redesign</a> that limited recent conversations in the sidebar, moved recent and pinned GPTs below conversations, and changed the sidebar&#8217;s behavior. More recently, OpenAI moved model selection into the composer and moved thinking-effort controls into the model picker, according to the <a href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes">ChatGPT release notes</a>. On mobile, OpenAI also said it <a href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes">simplified the sidebar</a> so experiences like Images, Codex, Pulse, and Apps moved into a horizontal bar above chats and projects.</p><p>Some of these changes may be useful. Some may be necessary as the product grows.</p><p>The problem is that ChatGPT, and products like it, are now important enough that sudden interface changes have real consequences for professionals and businesses.</p><p>If a writing app moves a button, that is annoying. 
If the AI workspace you use for research, publishing, coding, project planning, file analysis, custom instructions, and recurring workflows hides your pinned chats and custom GPTs overnight, that is a reliability problem.</p><p>It is also a dependency problem.</p><h3>What this says about cloud AI tools</h3><p>Cloud-based commercial AI services can upset your entire workflow overnight. They can move buttons. They can hide sections. They can retire models. They can change plan limits. They can add new products to the interface. They can demote features you use every day. They can change how your workspace behaves before you even start your morning coffee.</p><p>OpenAI&#8217;s <a href="https://openai.com/policies/row-terms-of-use/">terms of use</a> say its services may be modified from time to time. That is normal SaaS language. It is also the bargain users accept when their workflow lives inside a hosted product.</p><p>ChatGPT is still extremely useful. I use it alongside alternatives and will likely keep doing so. 
The point is not &#8220;never use hosted AI.&#8221; That would be silly.</p><p>The point is that rented capability does not come with true ownership or control.</p><p>If your workflow depends on ChatGPT&#8217;s sidebar, model picker, Projects, custom GPTs, memory behavior, or file interface staying exactly where it was yesterday, you do not fully control that workflow, and you will have to accept that your user experience may not always be stable or reliable.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g_zg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968adc70-d44d-41f5-ac23-6f05fbfaac2a_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g_zg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968adc70-d44d-41f5-ac23-6f05fbfaac2a_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!g_zg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968adc70-d44d-41f5-ac23-6f05fbfaac2a_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!g_zg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968adc70-d44d-41f5-ac23-6f05fbfaac2a_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!g_zg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968adc70-d44d-41f5-ac23-6f05fbfaac2a_1672x941.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!g_zg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968adc70-d44d-41f5-ac23-6f05fbfaac2a_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/968adc70-d44d-41f5-ac23-6f05fbfaac2a_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1784413,&quot;alt&quot;:&quot;ChatGPT pinned chats and GPTs missing? The new sidebar may be hiding them&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200419614?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968adc70-d44d-41f5-ac23-6f05fbfaac2a_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="ChatGPT pinned chats and GPTs missing? The new sidebar may be hiding them" title="ChatGPT pinned chats and GPTs missing? 
The new sidebar may be hiding them" srcset="https://substackcdn.com/image/fetch/$s_!g_zg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968adc70-d44d-41f5-ac23-6f05fbfaac2a_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!g_zg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968adc70-d44d-41f5-ac23-6f05fbfaac2a_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!g_zg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968adc70-d44d-41f5-ac23-6f05fbfaac2a_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!g_zg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968adc70-d44d-41f5-ac23-6f05fbfaac2a_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">OpenAI&#8217;s latest desktop sidebar change shows the hidden cost of relying on cloud AI tools for serious daily work. &#169; Popular AI</figcaption></figure></div><h3>Local AI does not have this exact problem</h3><p>Local and self-hosted AI tools have plenty of shortcomings.</p><p>They can be slower. They can be harder to install. They can break after updates. They can require real hardware. They often trail the best hosted frontier models. A local workflow can also become its own maintenance project if you are careless.</p><p>Popular AI has covered this with practical local workflow pieces, including <a href="https://www.popularai.org/p/pewdiepie-odysseus-ai-workspace">PewDiePie&#8217;s Odysseus AI workspace</a>, <a href="https://www.popularai.org/p/gguf-loader-agentic-mode-local-coding-agent">local coding agents with GGUF Loader</a>, and a <a href="https://www.popularai.org/p/a-local-perplexity-alternative-with">local Perplexity-style research workflow with Vane, Ollama, and SearXNG</a>. We have also covered the other side of the bargain: <a href="https://www.popularai.org/p/why-comfyui-updates-break-workflows-and-how-to-fix-them">local tools can break after updates too</a>.</p><p>But there is one huge difference: with local, self-hosted AI, you control the update schedule.</p><p>A local setup does not usually wake up one morning and hide your most important work because a vendor changed the hosted sidebar overnight. You can pin versions. You can back up folders. You can delay updates. You can clone a working setup before experimenting. 
You can keep your own file structure outside someone else&#8217;s product decisions.</p><p>That kind of control does not come for free, even if the software and the models do: it costs setup time, hardware, maintenance, and patience. Blood, sweat and tears, in many cases.</p><p>But cloud-based AI companies messing with our user interfaces should serve as a timely reminder of why local AI matters.</p><p>Hosted ChatGPT is powerful. It is also a managed workspace that OpenAI can rearrange whenever it wants.</p><p>Use it. Benefit from it. But do not forget what it is.</p><p>Your pinned chats are not gone this time. Your GPTs are not deleted. Your Projects probably still exist.</p><p>Even so: if an AI tool is central to your work, even the smallest unannounced UI update can easily burn an otherwise productive morning.</p><div><hr></div><h3>FAQ</h3><h4>Why are my ChatGPT pinned chats missing?</h4><blockquote><p>They may not be missing. Look for the new <strong>Pinned</strong> section in the ChatGPT desktop sidebar. If it is collapsed, expand it. Your pinned chats should appear there.</p><div><hr></div></blockquote><h4>Where did my custom GPTs go?</h4><blockquote><p>On desktop, click <strong>... More</strong> in the sidebar, choose <strong>GPTs</strong>, then click <strong>My GPTs</strong>. Open the GPT you want. To restore it to the sidebar, open the GPT, click the GPT name at the top, and choose <strong>Pin</strong>.</p><div><hr></div></blockquote><h4>Why are my Projects missing from the Projects list?</h4><blockquote><p>Some Projects may now appear under the new <strong>Pinned</strong> section. Check there before assuming the Project is deleted.</p><div><hr></div></blockquote><h4>How do I open a ChatGPT Project dashboard now?</h4><blockquote><p>Hover over the Project in the sidebar and click the small write or pen icon. 
In the interface I saw, simply clicking the Project name no longer opened the Project dashboard.</p><div><hr></div></blockquote><h4>Did OpenAI announce this sidebar change?</h4><blockquote><p>I could not find a clear official release note for this exact desktop sidebar behavior as of June 3, 2026. OpenAI has documented <a href="https://help.openai.com/en/articles/10128477-chatgpt-enterprise-edu-release-notes">previous sidebar changes</a> and <a href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes">recent ChatGPT updates</a>, but this specific desktop behavior appears undocumented so far.</p><div><hr></div></blockquote><h4>Are my pinned chats, GPTs, or Projects deleted?</h4><blockquote><p>Probably not. In my case, they were still there. They had been moved or hidden in the updated sidebar. Check the <strong>Pinned</strong> section, <strong>My GPTs</strong>, and the Project hover controls before assuming anything was deleted.</p><div><hr></div></blockquote><h4>Is this a reason to stop using ChatGPT?</h4><blockquote><p>No. ChatGPT is still useful. But it is a reason to avoid building your entire workflow around a hosted interface you do not control. Keep important notes, prompts, files, and project structure in places you can back up and export.</p><div><hr></div></blockquote><div class="callout-block" data-callout="true"><h3>Final recommendation</h3><p>Fix the immediate problems first: expand <strong>Pinned</strong>, re-pin your custom GPTs, check pinned Projects, and use the pen icon to open Project dashboards.</p><p>Then do the more important thing: audit how much of your work depends on ChatGPT&#8217;s interface staying stable.</p><p>If ChatGPT is your main AI workspace, keep backups of your important prompts, project instructions, files, and workflow notes outside ChatGPT. Bookmark critical chats. Export what matters. 
Consider a local or self-hosted fallback for work you cannot afford to lose access to.</p><p>A cloud AI tool can be worth paying for, but it should never be the only place your working system exists.</p></div><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.popularai.org/p/chatgpt-sidebar-pinned-chats-gpts-projects-missing/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.popularai.org/p/chatgpt-sidebar-pinned-chats-gpts-projects-missing/comments"><span>Leave a comment</span></a></p><div><hr></div><p style="text-align: center;"><em><strong>Explore more from Popular AI:</strong></em></p><p style="text-align: center;"><strong><a href="https://popularai.org/t/start-here">Start here</a> | <a href="https://popularai.org/t/local-ai">Local AI</a> | <a href="https://popularai.org/t/walkthroughs">Fixes &amp; guides</a> | <a href="https://popularai.org/t/ai-builds-gear">Builds &amp; gear</a> | <a href="https://popularai.org/t/popular-ai-podcast">Popular AI podcast</a></strong></p>]]></content:encoded></item><item><title><![CDATA[A local Perplexity alternative with Vane, Ollama and SearXNG]]></title><description><![CDATA[Build a Perplexity-style research stack you control using Vane, Ollama, Docker, and SearXNG for private web search.]]></description><link>https://www.popularai.org/p/local-perplexity-alternative-vane-searxng</link><guid isPermaLink="false">https://www.popularai.org/p/local-perplexity-alternative-vane-searxng</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Tue, 02 Jun 2026 23:16:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!3RAU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F731bb114-47fc-4a7a-bcaa-ebdca823ef10_1672x941.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3RAU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F731bb114-47fc-4a7a-bcaa-ebdca823ef10_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3RAU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F731bb114-47fc-4a7a-bcaa-ebdca823ef10_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!3RAU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F731bb114-47fc-4a7a-bcaa-ebdca823ef10_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!3RAU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F731bb114-47fc-4a7a-bcaa-ebdca823ef10_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!3RAU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F731bb114-47fc-4a7a-bcaa-ebdca823ef10_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3RAU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F731bb114-47fc-4a7a-bcaa-ebdca823ef10_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/731bb114-47fc-4a7a-bcaa-ebdca823ef10_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1596277,&quot;alt&quot;:&quot;Perplexica to Vane: set up private AI research with Ollama&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200329050?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F731bb114-47fc-4a7a-bcaa-ebdca823ef10_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Perplexica to Vane: set up private AI research with Ollama" title="Perplexica to Vane: set up private AI research with Ollama" srcset="https://substackcdn.com/image/fetch/$s_!3RAU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F731bb114-47fc-4a7a-bcaa-ebdca823ef10_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!3RAU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F731bb114-47fc-4a7a-bcaa-ebdca823ef10_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!3RAU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F731bb114-47fc-4a7a-bcaa-ebdca823ef10_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!3RAU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F731bb114-47fc-4a7a-bcaa-ebdca823ef10_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture><div 
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Set up Vane, Ollama, and SearXNG for a private AI research workflow with local models, Docker, and source-backed search. &#169; Popular AI</figcaption></figure></div><p>If you want a private Perplexity-style research workflow in 2026, start with the most important update: Perplexica now redirects to Vane. 
The <a href="https://github.com/ItzCrazyKns/Perplexica">Vane GitHub repository</a> describes the project as a privacy-focused AI answering engine that runs on your own hardware, supports local LLMs through Ollama, and uses SearXNG for web search.</p><p>That makes the practical goal simple. Use Vane as the browser interface, Ollama as the local model runner, and SearXNG as the search layer.</p><p>Hosted Perplexity is still smoother and faster for everyday research. Vane gives you a more private research stack with fewer account dependencies, fewer cloud data paths, and more control over the tools doing the work.</p><div><hr></div><h3>Key takeaways</h3><blockquote><p><strong>Perplexica is now Vane.</strong> The old Perplexica GitHub URL redirects to the Vane repository, so use the current Vane setup instructions.</p></blockquote><blockquote><p><strong>The easiest setup is Docker.</strong> Vane&#8217;s README recommends Docker and provides a one-command setup with bundled SearXNG.</p></blockquote><blockquote><p><strong>Use Ollama for the local LLM layer.</strong> <a href="https://github.com/ollama/ollama">Ollama</a> installs on macOS, Windows, Linux, and Docker, and exposes a local REST API on <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">localhost:11434</mark>.</p></blockquote><blockquote><p><strong>Use SearXNG for private metasearch.</strong> <a href="https://docs.searxng.org/">SearXNG</a> aggregates 
results from many search services and says users are not tracked or profiled.</p></blockquote><blockquote><p><strong>This stack trades polish for privacy.</strong> Hosted Perplexity is easier, but the local stack gives you more control over search queries, prompts, model choice, and stored research history.</p></blockquote><blockquote><p><strong>Do not expose this stack to the public internet.</strong> A private research tool becomes a liability if you run it on an open port with weak or missing authentication.</p></blockquote><div><hr></div><h3>The practical answer</h3><p>For most users, the best path is:</p><ol><li><p>Install Ollama.</p></li><li><p>Pull a small starter model.</p></li><li><p>Run Vane with Docker.</p></li><li><p>Configure Vane to use Ollama.</p></li><li><p>Use the bundled SearXNG first.</p></li><li><p>Move to a separate SearXNG instance only when you need more control.</p></li></ol><p>Use hosted Perplexity when you want the fastest polished research tool and the data is not sensitive. Use Vane with Ollama and SearXNG when you want a private AI research workflow for client work, business planning, unpublished drafts, technical research, private notes, or topics you do not want tied to a hosted AI account.</p><p>The tradeoff is maintenance. 
This local setup gives you more ownership, but you are also responsible for Docker, model downloads, updates, machine security, and troubleshooting.</p><h3>What this workflow is for</h3><p>This setup is for people who want AI-assisted web research without making a hosted AI company the center of every query.</p><p>Good use cases:</p><ul><li><p>Private market research.<br></p></li><li><p>Technical documentation searches.<br></p></li><li><p>Competitor research.<br></p></li><li><p>Source gathering for articles.<br></p></li><li><p>Local business planning.<br></p></li><li><p>Internal research notes.<br></p></li><li><p>Research on sensitive topics.<br></p></li><li><p>Repeatable AI search workflows for creators and small businesses.<br></p></li></ul><p>Skip this setup when you need the most polished interface, the strongest frontier model, easy mobile access, team admin features, or zero maintenance. Hosted tools win on convenience. Local tools win when control matters more.</p><h3>What you need</h3><p>You need:</p><ol><li><p><strong>Docker Desktop or Docker Engine</strong></p></li></ol><p>The <a href="https://github.com/ItzCrazyKns/Perplexica">Vane README</a> says Docker is the recommended setup path. Docker keeps the install simple because the main container can include both the Vane web app and the bundled SearXNG search layer.</p><ol start="2"><li><p><strong>Ollama</strong></p></li></ol><p>Ollama provides official install paths for macOS, Windows, Linux, and Docker. The <a href="https://github.com/ollama/ollama">Ollama GitHub README</a> also shows how to run models and use the local REST API.</p><ol start="3"><li><p><strong>A local model</strong></p></li></ol><p>Start with a smaller model before chasing larger models or huge context windows. A 7B or 8B model is enough to test the pipeline. 
Use a stronger model later if your hardware can handle it.</p><p>Good starter choices in Ollama:</p><pre><code><code>ollama pull gemma3</code></code></pre><p>or:</p><pre><code><code>ollama pull qwen3</code></code></pre><ol start="4"><li><p><strong>A browser</strong></p></li></ol><p>Vane runs as a local web app. The <a href="https://github.com/ItzCrazyKns/Perplexica">Docker setup in the Vane README</a> points users to <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">http://localhost:3000</mark> after the container starts.</p><ol start="5"><li><p><strong>Enough RAM and storage</strong></p></li></ol><p>The interface itself is light compared with the local model. If you only have 8GB of RAM, use small models. With 16GB to 32GB of RAM, you have more headroom. If you want larger models, read Popular AI&#8217;s guide to <a href="https://www.popularai.org/p/best-budget-gpus-local-llms-2026">budget GPUs for local LLMs</a> before spending money.</p><div><hr></div><h4><em><strong>Learn how to build your first local AI PC:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;cf847921-1312-4280-8d8f-1265f83e97aa&quot;,&quot;caption&quot;:&quot;Running larger local language models at home in 2026 is easier than it was a year ago, but building the right machine has become a lot less forgiving. Software has improved. 
vLLM&#8217;s parallelism and scaling docs&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;These 3 dual GPU AI pc builds absolutely crush local LLMs in 2026&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-09T21:22:10.662Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ZhPn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926cb61e-307e-4df5-ae0f-ed4930172adb_2400x1559.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/dual-gpu-ai-pc-builds-local-llm-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:196145185,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>What you will have when finished</h3><p>You will have a 
browser-based local AI research tool running on your machine.</p><p>The workflow will look like this:</p><pre><code><code>Your browser
  &#8594; Vane local web app
    &#8594; SearXNG search layer
    &#8594; Ollama local LLM
      &#8594; Answer with sources
</code></code></pre><p>The important difference from hosted AI search is where the control sits. Your model runs locally through Ollama. Your search layer can be self-hosted. Your interface and history stay on your machine unless you deliberately connect cloud providers.</p><p>That does not mean every web query becomes invisible. Web research still needs internet access, and SearXNG still sends queries to upstream search services. The privacy gain comes from reducing account-level tracking, keeping model inference local, and controlling more of the research path yourself.</p><h3>Step 1: Install Ollama</h3><p>Install Ollama first because Vane needs a model provider.</p><p>On macOS or Linux:</p><pre><code><code>curl -fsSL https://ollama.com/install.sh | sh</code></code></pre><p>On Windows PowerShell:</p><pre><code><code>irm https://ollama.com/install.ps1 | iex</code></code></pre><p>These commands come from Ollama&#8217;s official README. The macOS and Linux command uses the <a href="https://ollama.com/install.sh">official Ollama install script</a>, while the Windows command uses the <a href="https://ollama.com/install.ps1">official Ollama PowerShell installer</a>.</p><p>After installation, test that Ollama runs:</p><pre><code><code>ollama --version</code></code></pre><p>Then pull a starter model:</p><pre><code><code>ollama pull gemma3</code></code></pre><p>Test the model:</p><pre><code><code>ollama run gemma3</code></code></pre><p>Ask:</p><pre><code><code>Reply with one sentence: local AI is running.</code></code></pre><p>Exit the chat with:</p><pre><code><code>/bye</code></code></pre><p>This first step proves that local inference works before you add Docker, Vane, or search.</p><h3>Step 2: Confirm the Ollama API is reachable</h3><p>Vane needs to reach Ollama&#8217;s local API.</p><p>The <a href="https://github.com/ollama/ollama">Ollama README</a> shows a REST API example using <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 
0);">http://localhost:11434/api/chat</mark>. Test it:</p><pre><code><code>curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [
    {
      "role": "user",
      "content": "Reply with OK."
    }
  ],
  "stream": false
}'
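# Optional follow-up, not from the Ollama README: if the call above fails,
# list the models Ollama has pulled to rule out a wrong model name.
# OLLAMA_URL is an assumed variable name; /api/tags is Ollama's model-listing endpoint.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
curl -fsS --max-time 5 "$OLLAMA_URL/api/tags" || echo "Ollama is not reachable at $OLLAMA_URL"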
</code></code></pre><p>Expected result: a JSON response containing a short model reply.</p><p>If this fails, Vane will not work with Ollama yet. Fix Ollama before moving on. Common causes include the Ollama app not running, the wrong model name, a blocked local port, or a shell that cannot reach the local API.</p><h3>Step 3: Run Vane with bundled SearXNG</h3><p>The simplest Vane setup uses Docker with bundled SearXNG. The project README gives this command:</p><pre><code><code>docker run -d -p 3000:3000 -v vane-data:/home/vane/data --name vane itzcrazykns1337/vane:latest</code></code></pre><p>Then open:</p><pre><code>http://localhost:3000</code></pre><p>According to the <a href="https://github.com/ItzCrazyKns/Perplexica">Vane Docker setup instructions</a>, this pulls and starts the container with the bundled SearXNG search engine and lets you configure providers through the setup screen.</p><p>This is the recommended starting point because it removes one moving part. Get Vane working first. Replace the bundled SearXNG later only if you need more control over search engines, settings, or network access.</p><h3>Step 4: Configure Vane to use Ollama</h3><p>Inside the Vane setup screen, choose Ollama or a local OpenAI-compatible provider if the UI presents that path.</p><p>Use this Ollama API URL on macOS or Windows when Vane is running in Docker:</p><pre><code>http://host.docker.internal:11434</code></pre><p>The <a href="https://github.com/ItzCrazyKns/Perplexica">Vane README</a> lists <code>host.docker.internal:11434</code> for Windows and Mac Docker setups when fixing Ollama connection errors.</p><p>On Linux, Docker often cannot use that hostname by default. 
The Vane README recommends using the host&#8217;s private IP and, when needed, exposing Ollama with <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">OLLAMA_HOST=0.0.0.0:11434</mark> in the Ollama systemd service.</p><p>Use this Linux pattern only on your trusted local network:</p><pre><code>http://YOUR_HOST_PRIVATE_IP:11434</code></pre><p>Do not expose Ollama to the public internet. Treat it like any other local service that can receive prompts and return model output.</p><h3>Step 5: Run your first private research query</h3><p>Use a simple query first:</p><pre><code><code>Find three current sources explaining what SearXNG is. Summarize each source in one sentence and include the source links.</code></code></pre><p>Check three things:</p><ol><li><p>Vane returns an answer.</p></li><li><p>The answer includes sources.</p></li><li><p>The local model is doing the synthesis.</p></li></ol><p>If the answer works but quality is weak, the search layer may be fine and the model may be the bottleneck. Try a stronger Ollama model before blaming Vane.</p><p>A weak first answer is usually a signal to test methodically. Try a narrower query. Ask for source summaries first. Then ask for synthesis. Local models can do useful research work, but they often need more structure than hosted frontier models.</p><h3>Step 6: Move to your own SearXNG instance only if needed</h3><p>The bundled SearXNG setup is enough for most first installs.</p><p>Use your own SearXNG instance if you want:</p><ul><li><p>More control over search engines.<br></p></li><li><p>More predictable settings.<br></p></li><li><p>Separate maintenance.<br></p></li><li><p>Network-wide search for several local tools.<br></p></li><li><p>A private search endpoint that other apps can use.<br></p></li></ul><p>SearXNG describes itself as a metasearch engine that aggregates results from up to 244 search services, with no user tracking or profiling. 
The <a href="https://docs.searxng.org/">SearXNG documentation</a> also says users can set up their own instance if they do not trust someone else&#8217;s.</p><p>Vane&#8217;s README says the slim container can point to an existing SearXNG instance with <mark data-color="#d0e0e3" style="background-color: rgb(208, 224, 227); color: rgb(0, 0, 0);">SEARXNG_API_URL</mark>, provided that instance has JSON format enabled and the Wolfram Alpha search engine enabled.</p><p>Use the slim Vane container like this:</p><pre><code><code>docker run -d \
  -p 3000:3000 \
  -e SEARXNG_API_URL=http://your-searxng-url:8080 \
  -v vane-data:/home/vane/data \
  --name vane \
  itzcrazykns1337/vane:slim-latest</code></code></pre><p>Replace:</p><pre><code>http://your-searxng-url:8080</code></pre><p>with your actual SearXNG address.</p><p>A separate SearXNG instance makes sense once you know Vane works. It is less useful as the first troubleshooting step because it adds another service, another configuration file, and another place where search can fail.</p><h3>Step 7: Use a better research prompt</h3><p>Bad prompt:</p><pre><code><code>Research this topic.</code></code></pre><p>Better prompt:</p><pre><code><code>Research [TOPIC] using current web sources.

Return:
1. A short answer.
2. Five source-backed findings.
3. A section called "What is uncertain."
4. A section called "What I should verify manually."
5. Links to the original sources.

Do not make claims that are not supported by the sources.</code></code></pre><p>For article research, use:</p><pre><code><code>You are helping prepare a source brief for an article.

Topic:
[TOPIC]

Task:
Find primary sources first. Prefer official documentation, GitHub repositories, product pages, pricing pages, privacy policies, changelogs, and legal documents.

Return:
- The strongest sources.
- What each source proves.
- What each source does not prove.
- Claims that need manual verification.
- Suggested article angle.

Do not write the article yet.</code></code></pre><p>For technical research, use:</p><pre><code><code>Research [TOOL OR ERROR].

Focus on:
- Official docs.
- GitHub issues.
- Release notes.
- Known fixes.
- Version requirements.
- Common failure modes.

Return:
- Likely cause.
- Safest fix.
- Risky fixes to avoid.
- Sources.
</code></code></pre><p>Structured prompts matter more with local models. A hosted research product often hides the workflow. With Vane and Ollama, you get better results when you tell the model what kind of sources to prefer, what uncertainty to report, and what claims need manual checking.</p><h3>Step 8: Add a workflow rule for sensitive research</h3><p>Create a simple rule for yourself before using the stack with client, business, or private material.</p><p>Use this:</p><pre><code><code>Private research rule:
- Hosted tools may be used for public facts and non-sensitive summaries.
- Vane with Ollama and SearXNG is used for sensitive queries, unpublished drafts, private strategy, client material, and internal planning.
- No API keys, passwords, customer data, private documents, or legal material are pasted into cloud tools unless there is a deliberate reason and a written record of the tradeoff.</code></code></pre><p>This is where a local workflow earns its keep. The point is not to avoid every cloud tool forever. The point is to stop making a hosted account the default place where all research begins.</p><p>A good private research workflow should be boring and repeatable. Public facts can go through hosted tools when speed matters. Sensitive queries, unpublished strategy, client context, and early article angles should start in the local stack.</p><div><hr></div><h3>Commercial Perplexity vs local Vane: the real tradeoff</h3><p>Hosted Perplexity is still easier. It is polished, fast, and good at turning search results into readable answers.</p><p>The catch is the control layer. A hosted AI search account can change pricing, change models, apply usage limits, alter retention settings, remove features, or restrict access. 
Perplexity&#8217;s own <a href="https://docs.perplexity.ai/docs/resources/privacy-security">Privacy &amp; Security documentation</a> says the Sonar API has zero data retention and does not use customer API data to train models, but that statement applies to the Sonar API. It should not be treated as a blanket promise about every consumer product surface.</p><p>Perplexity Pro has also been sold as a paid subscription. <a href="https://www.reuters.com/business/paypal-venmo-users-gain-early-access-perplexitys-comet-ai-browser-2025-09-03/">Reuters reported in September 2025</a> that Perplexity Pro was worth $200 per year or $20 per month in a PayPal and Venmo promotion.</p><p>Vane with Ollama and SearXNG gives up some polish. In exchange, you can keep the model local, control the search layer, and avoid building your entire research habit around a hosted AI search account.</p><p>Use Perplexity when:</p><ul><li><p>You need speed.<br></p></li><li><p>The research is not sensitive.<br></p></li><li><p>You want a polished mobile and web experience.<br></p></li><li><p>You do not want to maintain a stack.<br></p></li><li><p>You need stronger hosted models.<br></p></li></ul><p>Use Vane when:</p><ul><li><p>The topic is sensitive.<br></p></li><li><p>You want a local fallback.<br></p></li><li><p>You want to avoid subscription dependency.<br></p></li><li><p>You want to choose your own local model.<br></p></li><li><p>You want your search workflow to survive account or pricing changes.<br></p></li><li><p>You are willing to maintain Docker, Ollama, and SearXNG.<br></p></li></ul><p>Use both when:</p><ul><li><p>Hosted Perplexity is your fast public-research tool.<br></p></li><li><p>Vane is your private research and source-gathering tool.<br></p></li><li><p>You manually verify important claims before publishing or acting on them.<br></p></li></ul><p>The best workflow for most people is hybrid. Let hosted tools handle low-risk public research. 
Keep sensitive work, unpublished thinking, and private source gathering in the local stack.</p><h3>Privacy, account risk, and lock-in</h3><p>This workflow has three privacy layers.</p><h4>The model layer</h4><p>With Ollama, model inference can run locally. That means prompts do not need to go to OpenAI, Anthropic, Google, Perplexity, or another model provider unless you add one.</p><p>The <a href="https://github.com/ollama/ollama">Ollama project</a> covers running models locally, managing them, and calling a local REST API. That makes it a practical base layer for local AI research because your prompts do not have to leave your machine.</p><p>Local model choice still matters. Small models are easier to run, but they may miss nuance or produce weaker summaries. Larger models can improve synthesis, but they need more RAM, VRAM, storage, and patience.</p><h4>The search layer</h4><p>SearXNG offers more privacy than ordinary search, but it is not magic. It still sends search requests to upstream engines. Its advantage is that it can reduce profiling and tracking, especially when self-hosted and configured carefully.</p><p>The <a href="https://docs.searxng.org/">SearXNG documentation</a> says users are neither tracked nor profiled and that SearXNG can be self-hosted. That makes it a useful search layer for private research, especially when paired with a local model.</p><p>The tradeoff is maintenance. Search engines change. Engines can fail. Rate limits happen. A private metasearch setup gives you more control, but it may require occasional tuning.</p><h4>The interface layer</h4><p>Vane can save search history locally, according to its README. 
That is useful for research continuity, but it also means your local machine becomes the place where research history lives.</p><p>Protect it accordingly:</p><ul><li><p>Use disk encryption.<br></p></li><li><p>Do not expose the local app to the public internet.<br></p></li><li><p>Do not run unknown containers without checking the project.<br></p></li><li><p>Keep Docker images updated.<br></p></li><li><p>Keep sensitive research out of shared machines.<br></p></li><li><p>Back up useful notes deliberately rather than relying on hidden app data.<br></p></li></ul><p>A local research stack is only as private as the machine and network around it. If your laptop is shared, unencrypted, or exposed, local storage can become a risk.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qFBO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f9dbfea-70ff-46e6-b16a-e90b68ce94de_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qFBO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f9dbfea-70ff-46e6-b16a-e90b68ce94de_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!qFBO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f9dbfea-70ff-46e6-b16a-e90b68ce94de_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!qFBO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f9dbfea-70ff-46e6-b16a-e90b68ce94de_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!qFBO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f9dbfea-70ff-46e6-b16a-e90b68ce94de_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qFBO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f9dbfea-70ff-46e6-b16a-e90b68ce94de_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f9dbfea-70ff-46e6-b16a-e90b68ce94de_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1664918,&quot;alt&quot;:&quot;Vane, Ollama, and SearXNG: the private AI research setup&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200329050?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f9dbfea-70ff-46e6-b16a-e90b68ce94de_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Vane, Ollama, and SearXNG: the private AI research setup" title="Vane, Ollama, and SearXNG: the private AI research setup" srcset="https://substackcdn.com/image/fetch/$s_!qFBO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f9dbfea-70ff-46e6-b16a-e90b68ce94de_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!qFBO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f9dbfea-70ff-46e6-b16a-e90b68ce94de_1672x941.png 848w, 
https://substackcdn.com/image/fetch/$s_!qFBO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f9dbfea-70ff-46e6-b16a-e90b68ce94de_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!qFBO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f9dbfea-70ff-46e6-b16a-e90b68ce94de_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Perplexica is now Vane. Here&#8217;s how to run it with Ollama and SearXNG for private, local AI research in 2026. 
&#169; Popular AI</figcaption></figure></div><h3>Common problems and fixes</h3><h4>Problem: Vane cannot connect to Ollama</h4><p>What it means: The Vane container cannot reach the Ollama API.</p><p>How to fix it:</p><p>On Windows or macOS, set the Ollama URL to:</p><pre><code>http://host.docker.internal:11434</code></pre><p>The <a href="https://github.com/ItzCrazyKns/Perplexica">Vane README</a> lists this address for Windows and Mac Docker setups.</p><p>On Linux, use your host machine&#8217;s private IP:</p><pre><code>http://YOUR_HOST_PRIVATE_IP:11434</code></pre><p>By default Ollama listens only on <code>127.0.0.1</code>, so a container connecting through the host IP may be refused. If that happens, set the <code>OLLAMA_HOST</code> environment variable (for example to <code>0.0.0.0:11434</code>) so Ollama listens on the host&#8217;s network interface, then restart Ollama. Keep this on your local network. Do not expose it publicly.</p><div><hr></div><h4>Problem: Search works, but answers are weak</h4><p>What it means: The local model is probably too small, poorly suited for research synthesis, or running with weak settings.</p><p>How to fix it:</p><ul><li><p>Try a stronger Ollama model.<br></p></li><li><p>Ask for source summaries before final answers.<br></p></li><li><p>Keep prompts structured.<br></p></li><li><p>Reduce the task size.<br></p></li><li><p>Use hosted models only for non-sensitive research if local quality is not enough.<br></p></li></ul><p>Model quality matters because Vane is the interface, not the intelligence layer. If the search results are good but the summary is thin, upgrade or change the local model before rebuilding the whole stack.</p><div><hr></div><h4>Problem: Search returns too few sources</h4><p>What it means: SearXNG settings or engine availability may be limiting results.</p><p>How to fix it:</p><ul><li><p>Test SearXNG directly.<br></p></li><li><p>Enable more engines.<br></p></li><li><p>Check whether JSON output is enabled.<br></p></li><li><p>Try the bundled SearXNG first.<br></p></li><li><p>Try a broad query before a niche query.<br></p></li></ul><p>Start with a broad query that should return many results. Then narrow the topic. 
This helps you separate a search configuration problem from a topic problem.</p><div><hr></div><h4>Problem: Docker says the container name already exists</h4><p>What it means: You already created a container named <code>vane</code>.</p><p>How to fix it:</p><pre><code><code>docker stop vane
docker rm vane</code></code></pre><p>Then run the container again.</p><div><hr></div><h4>Problem: You want to update Vane</h4><p>What it means: Your Docker image may be old.</p><p>How to fix it:</p><pre><code><code>docker pull itzcrazykns1337/vane:latest
docker stop vane
docker rm vane
docker run -d -p 3000:3000 -v vane-data:/home/vane/data --name vane itzcrazykns1337/vane:latest</code></code></pre><p>The <code>vane-data</code> volume preserves your data. Still, back up anything important before updating.</p><div><hr></div><h3>Best local models to start with</h3><p>Start small. Prove the pipeline works before trying a huge model.</p><h4>Best first test model</h4><pre><code><code>ollama pull gemma3</code></code></pre><p>Use it to confirm that Vane can talk to Ollama.</p><div><hr></div><h4>Better research model</h4><p>Use a stronger Qwen, Llama, Gemma, or Mistral model that fits your machine.</p><p>General guidance:</p><ul><li><p>8GB RAM: use small models.<br></p></li><li><p>16GB RAM: use 7B or 8B models.<br></p></li><li><p>32GB RAM: try larger quantized models.<br></p></li><li><p>24GB VRAM GPU: you have far more room for useful local LLM work.<br></p></li></ul><p>For buying advice, use Popular AI&#8217;s <a href="https://www.popularai.org/p/is-local-ai-hardware-worth-it-2026">local AI hardware guide</a> and the <a href="https://www.popularai.org/p/best-budget-gpus-local-llms-2026">budget Ollama GPU guide</a>.</p><p>Do not make your first test harder than it needs to be. A small model that responds reliably is better for setup than a giant model that barely fits in memory. 
Once the pipeline works, upgrade the model and compare results.</p><div><hr></div><h3>A safer hybrid workflow</h3><p>The strongest practical setup is role separation.</p><p>Use Vane locally for:</p><ul><li><p>Sensitive queries.<br></p></li><li><p>Early research.<br></p></li><li><p>Unpublished angles.<br></p></li><li><p>Client-specific questions.<br></p></li><li><p>Internal strategy.<br></p></li><li><p>Source discovery.<br></p></li><li><p>Draft outlines.<br></p></li></ul><p>Use hosted Perplexity or another cloud research tool for:</p><ul><li><p>Low-sensitivity public facts.<br></p></li><li><p>Fast source discovery.<br></p></li><li><p>Casual searches.<br></p></li><li><p>Queries where convenience matters more than privacy.<br></p></li></ul><p>Then verify important facts manually from primary sources before publishing.</p><p>That gives you speed without making your most sensitive research dependent on a hosted account. It also gives you a fallback if a cloud tool changes pricing, limits features, or becomes unavailable.</p><div><hr></div><h3>FAQ</h3><h4>Is Perplexica still available?</h4><blockquote><p>The old Perplexica GitHub URL redirects to Vane. In practice, use the <a href="https://github.com/ItzCrazyKns/Perplexica">current Vane repository</a> and current Vane setup instructions. The project still appears in some references as Perplexica, but the current repo branding is Vane.</p><div><hr></div></blockquote><h4>Is Vane a full Perplexity replacement?</h4><blockquote><p>No. Vane is a local, self-hostable alternative for AI-assisted search and answering. Hosted Perplexity is more polished and easier to use. Vane gives you more control over the model, search layer, and local data path.</p><div><hr></div></blockquote><h4>Does Vane run fully offline?</h4><blockquote><p>No, not for web research. The local LLM can run through Ollama, but web search requires internet access. If you ask it to search the web, SearXNG still has to query search sources. 
The privacy advantage is reduced account dependency and more control, not total offline operation.</p><div><hr></div></blockquote><h4>Does SearXNG make searches completely anonymous?</h4><blockquote><p>No. SearXNG improves privacy by reducing tracking and profiling, and it can be self-hosted, but upstream search engines still receive queries from the server making the request. SearXNG says users are neither tracked nor profiled by SearXNG itself.</p><div><hr></div></blockquote><h4>Can I use cloud models inside Vane?</h4><blockquote><p>Yes. Vane&#8217;s README says it supports local LLMs through Ollama and cloud providers including OpenAI, Claude, and Groq. That is useful, but it changes the privacy model. Once you connect a cloud provider, prompts sent to that provider are no longer local.</p><div><hr></div></blockquote><h4>What is the best model for this setup?</h4><blockquote><p>Use a small model first to test the pipeline. Then use the strongest model your hardware can run comfortably. For many users, a good 7B or 8B model is the sensible starting point. Larger models improve synthesis but need more RAM, VRAM, and patience.</p><div><hr></div></blockquote><div class="callout-block" data-callout="true"><h3>Final recommendation</h3><p>Set up Vane with bundled SearXNG first, connect it to Ollama, and test it with a small local model. After that works, decide whether you need a separate SearXNG instance or a stronger model.</p><p>Treat this as a practical local research stack. Hosted AI search is smoother, but Vane with Ollama and SearXNG gives you a working fallback for research you do not want tied to a cloud account, changing subscription terms, or vendor-controlled defaults.</p><p>For creators, consultants, researchers, small businesses, and technical writers, that fallback matters. 
It helps separate public research from sensitive work, keeps more of the workflow under your control, and makes AI search less dependent on one hosted product.</p></div>]]></content:encoded></item><item><title><![CDATA[Llama-3-Groq-70B-Tool-Use: scale private agents locally in Ollama]]></title><description><![CDATA[A practical guide to scaling local AI agent workflows with Ollama, tool schemas, permission controls, monitoring, and private knowledge search.]]></description><link>https://www.popularai.org/p/llama-3-groq-70b-tool-use-scale-private</link><guid isPermaLink="false">https://www.popularai.org/p/llama-3-groq-70b-tool-use-scale-private</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Tue, 02 Jun 2026 16:39:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!BrRP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e67d26-0acd-4f41-aaa6-58dff7d258d9_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div 
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BrRP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e67d26-0acd-4f41-aaa6-58dff7d258d9_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BrRP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e67d26-0acd-4f41-aaa6-58dff7d258d9_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!BrRP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e67d26-0acd-4f41-aaa6-58dff7d258d9_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!BrRP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e67d26-0acd-4f41-aaa6-58dff7d258d9_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!BrRP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e67d26-0acd-4f41-aaa6-58dff7d258d9_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BrRP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e67d26-0acd-4f41-aaa6-58dff7d258d9_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83e67d26-0acd-4f41-aaa6-58dff7d258d9_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1787460,&quot;alt&quot;:&quot;How to scale private AI agent 
workflows with Ollama&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200313358?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e67d26-0acd-4f41-aaa6-58dff7d258d9_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="How to scale private AI agent workflows with Ollama" title="How to scale private AI agent workflows with Ollama" srcset="https://substackcdn.com/image/fetch/$s_!BrRP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e67d26-0acd-4f41-aaa6-58dff7d258d9_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!BrRP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e67d26-0acd-4f41-aaa6-58dff7d258d9_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!BrRP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e67d26-0acd-4f41-aaa6-58dff7d258d9_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!BrRP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83e67d26-0acd-4f41-aaa6-58dff7d258d9_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Learn how to run Llama-3-Groq-70B-Tool-Use locally in Ollama and build private AI agents for tool calling, automation, and internal workflows. &#169; Popular AI</figcaption></figure></div><p>Artificial intelligence agents are moving fast from experimental prototypes into production systems. 
Businesses now use them to search internal knowledge bases, generate reports, automate support workflows, retrieve company data, and coordinate actions across multiple software systems.</p><p>Many agent platforms still depend heavily on cloud-hosted models. That can be convenient, especially when teams want to move quickly. But cloud-based agents also introduce recurring costs, vendor dependence, and privacy concerns that become harder to ignore once workflows begin touching sensitive customer records, financial data, legal documents, or internal business systems.</p><p>Large local tool-use models offer a practical alternative. 
By running <a href="https://huggingface.co/Groq/Llama-3-Groq-70B-Tool-Use">Llama-3-Groq-70B-Tool-Use</a> locally through <a href="https://ollama.com/library/llama3-groq-tool-use%3A70b">Ollama</a>, organizations can build private agents capable of complex reasoning and structured function calling without sending data to external AI providers.</p><p>This guide explains how to deploy, optimize, and scale local agent workflows using one of the most capable open tool-use models available for self-hosted environments.</p><div><hr></div><h4><em><strong>More on Llama-3-Groq-70B-Tool-Use:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;cbff56ef-5734-4c06-8d3c-2bf4c388ae51&quot;,&quot;caption&quot;:&quot;The easiest way to misunderstand AI agents is to think of them as chatbots with a few extra controls.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Private AI Agents with Ollama: run Llama-3-Groq-8B-Tool-Use locally&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-31T14:09:22.506Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!j05H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbfee2a-9f27-45e8-a382-2d004294ac22_1672x941.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/llama-3-groq-8b-tool-use-ollama-local-ai-agents&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:199960583,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>Why large tool-use models matter</h3><p>Tool calling has become one of the most important capabilities in modern AI systems. With <a href="https://docs.ollama.com/capabilities/tool-calling">Ollama tool calling</a>, a model can invoke tools and incorporate their results into a response, which makes local agents far more useful than simple chatbots.</p><p>Instead of merely generating text, a tool-use model can query databases, search internal documentation, retrieve customer records, generate reports, trigger automations, call APIs, and execute business workflows.</p><p>A typical interaction looks like this:</p><pre><code><code>User Request
      &#8595;
Model decides which tool is needed
      &#8595;
Tool executes
      &#8595;
Result returned to model
      &#8595;
Final response generated
</code></code></pre><p>Smaller models can handle straightforward tool calls effectively. As workflows become more sophisticated, larger models typically demonstrate stronger planning, better multi-step reasoning, improved context retention, and more reliable tool selection.</p><p>That matters when an agent needs to coordinate several actions before producing a useful result. A simple customer support question might require account lookup, ticket search, invoice review, policy retrieval, and response drafting. If the model chooses the wrong tool or loses track of the sequence, the entire workflow becomes unreliable.</p><h3>Installing the model with Ollama</h3><p>Begin by downloading the model:</p><pre><code><code>ollama pull llama3-groq-tool-use:70b
</code></code></pre><p>Launch an interactive session:</p><pre><code><code>ollama run llama3-groq-tool-use:70b
</code></code></pre><p>For application development, the local <a href="https://docs.ollama.com/api/chat">Ollama chat API</a> provides a simple interface:</p><pre><code><code>curl http://localhost:11434/api/chat \
-d '{
  "model":"llama3-groq-tool-use:70b",
  "messages":[
    {
      "role":"user",
      "content":"Hello"
    }
  ]
}'
</code></code></pre><p>Once running, the model becomes available to local applications without requiring external API keys or internet connectivity. That makes it especially useful for teams that want to keep agent workflows close to their own data, infrastructure, and access controls.</p><h3>Hardware considerations</h3><p>A 70B model represents a substantial step up from smaller open models. It can offer stronger reasoning and tool-use performance, but it also requires more careful planning around memory, throughput, and deployment architecture.</p><h3>GPU-based deployments</h3><p>The most practical approach involves dedicated GPUs with significant memory capacity.</p><p>Suitable configurations include:</p><ul><li><p>Multi-GPU workstations<br></p></li><li><p>Enterprise inference servers<br></p></li><li><p>RTX 4090 clusters<br></p></li><li><p>Data center accelerators<br></p></li></ul><p>GPU acceleration is the best fit for production-grade agent systems where multiple users, long workflows, or frequent tool calls are expected.</p><div><hr></div><h4><em><strong>More on GPUs for local LLMs:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;76f5d7c6-2388-4eca-ab2b-c9bf38b7c4f6&quot;,&quot;caption&quot;:&quot;For anyone building a cheap local AI box in 2026, the first rule has not changed. VRAM matters more than gamer marketing. A Llama 3.1 8B Q4 build in Ollama is 4.9GB. 
A Gemma 3 12B Q4 build lands at 8.1GB, while its Q8 &#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The best budget GPUs for local LLMs in 2026: 5 smart buys for Ollama&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-04-21T13:31:02.258Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!vIue!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12ed22ac-c47a-4628-85f2-763942f38049_2303x1478.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/best-budget-gpus-local-llms-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:194906880,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>Apple Silicon systems</h3><p>High-memory Apple 
Silicon devices have become popular for self-hosted AI deployments.</p><p>Systems equipped with:</p><ul><li><p>128 GB unified memory<br></p></li><li><p>192 GB unified memory<br></p></li></ul><p>can run heavily quantized large models while maintaining acceptable performance.</p><div><hr></div><h4><em><strong>More on Apple Silicon systems for local AI:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;3507d4d5-1cd6-4012-9444-c93d248783ee&quot;,&quot;caption&quot;:&quot;A lot of people who want to run local models do not want a loud gaming tower under the desk. They want a small machine that can sit beside a monitor, stay quiet, and handle private AI work without turning every useful tas&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The Best Mac mini for local LLMs in 2026: M4 vs M4 Pro for Ollama and MLX&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-25T14:41:41.196Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!anDH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F426c063c-e449-490a-a516-41bef0079dc7_2400x1491.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/the-best-mac-mini-for-local-llms&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:192019126,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:2,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>CPU execution</h3><p>CPU-only inference remains possible but is generally best reserved for testing, experimentation, or low-frequency workloads.</p><p>Production-grade agent systems benefit significantly from GPU acceleration.</p><div><hr></div><h4><em><strong>More on CPUs for local LLMs:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;ad307492-3ae2-419a-be2f-9cbda30d9485&quot;,&quot;caption&quot;:&quot;If you care about running local LLMs without being boxed in by API limits, feature removals, or policy changes, CPU choice still matters. 
The GPU still does most of the heavy lifting in a sensible local AI build&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The best CPU for running local LLMs: top AMD vs Intel processors ranked&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-26T14:48:48.300Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!3ZfR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a4fd65-8759-4663-94b8-73a686cfb188_2400x1444.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/best-cpu-for-running-local-llms-top&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:192086772,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>Understanding agent 
architecture</h3><p>An agent is fundamentally a reasoning layer that sits above tools. The model decides which actions should occur, while the external tools perform those actions.</p><p>A typical architecture looks like:</p><pre><code><code>User
 &#8595;
Llama-3-Groq-70B-Tool-Use
 &#8595;
Available Tools
 &#9500;&#9472;&#9472; Search Documents
 &#9500;&#9472;&#9472; Query Database
 &#9500;&#9472;&#9472; Read CRM
 &#9500;&#9472;&#9472; Generate Reports
 &#9492;&#9472;&#9472; Draft Emails
</code></code></pre><p>The model determines when each tool should be used and how results should be combined. This separation is important. The model should not directly control every business system without guardrails. Instead, it should operate through clearly defined tools that expose only the actions the agent is allowed to take.</p><p>That design gives teams more control. It also makes agent behavior easier to test, monitor, and audit.</p><h3>Building a multi-tool agent</h3><p>A simple Python example might expose several business functions:</p><pre><code><code>def search_documents(query):
    pass

def get_customer_record(customer_id):
    pass

def get_invoice_status(invoice_id):
    pass

def create_email_draft(recipient, subject, body):
    pass
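
# A minimal dispatch sketch (an illustration, not part of any library): map
# the tool names the model may request to the stub functions above, then
# route one tool call to the matching function. The call shape used here, a
# dict with "name" and "arguments" keys, is an assumption; adapt it to
# whatever structure your client actually returns.
def build_registry(*functions):
    # Key each callable by its own function name.
    return {fn.__name__: fn for fn in functions}

def dispatch_tool_call(registry, call):
    # Refuse any tool the agent was not explicitly given.
    name = call["name"]
    if name not in registry:
        raise ValueError("unknown tool: " + name)
    return registry[name](**call["arguments"])

# Hypothetical wiring with the stubs defined above:
# registry = build_registry(search_documents, get_customer_record,
#                           get_invoice_status, create_email_draft)
# dispatch_tool_call(registry, {"name": "get_invoice_status",
#                               "arguments": {"invoice_id": "INV-1"}})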
</code></code></pre><p>A user request such as:</p><pre><code><code>Review Acme's account, check unpaid invoices,
summarize recent support activity,
and prepare a follow-up email.
</code></code></pre><p>can trigger multiple tool calls.</p><p>The model might:</p><ol><li><p>Retrieve account information.</p></li><li><p>Check invoice status.</p></li><li><p>Search support notes.</p></li><li><p>Draft communication.</p></li><li><p>Present a final summary.</p></li></ol><p>To the user, the workflow appears seamless even though several independent systems were consulted. Behind the scenes, the agent is planning the sequence, selecting tools, interpreting each result, and deciding whether another action is needed before producing the final answer.</p><p>This is where a large tool-use model becomes valuable. Basic tool calls are easy. Coordinating a chain of related actions while preserving context is much harder.</p><div><hr></div><h3>Designing effective tool schemas</h3><p>The quality of your tool definitions directly influences agent performance.</p><p>Avoid vague interfaces:</p><pre><code><code>def process_data(input):
    pass
</code></code></pre><p>Instead, create highly specific functions:</p><pre><code><code>def get_invoice_status(invoice_id):
    pass

def search_support_tickets(customer_id):
    pass

def retrieve_contract(contract_id):
    pass
</code></code></pre><p>Clear names reduce ambiguity and improve tool selection accuracy. A model is more likely to call the right function when each tool has a narrow purpose and an obvious name.</p><p>For production systems, structured schemas are even better:</p><pre><code><code>{
  "type":"object",
  "required":["customer_id"],
  "properties":{
    "customer_id":{
      "type":"string"
    }
  }
}
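</code></code></pre><p>A schema like this can be checked before a call is dispatched. The following is a minimal standard-library sketch, not a full JSON Schema implementation: <code>validate_arguments</code> and <code>JSON_TYPES</code> are hypothetical names, and only required fields, unexpected fields, and two simple types are handled. A production system would more likely rely on a complete JSON Schema validator.</p><pre><code><code>SCHEMA = {
    "type": "object",
    "required": ["customer_id"],
    "properties": {"customer_id": {"type": "string"}},
}

# Only the subset of JSON Schema types this sketch knows how to check.
JSON_TYPES = {"string": str, "boolean": bool}

def validate_arguments(arguments, schema):
    """Return a list of problems; an empty list means the call looks well-formed."""
    problems = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema.get("properties", {}).get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
            continue
        expected = JSON_TYPES.get(spec.get("type"))
        # Types outside JSON_TYPES are left unchecked in this sketch.
        if expected is not None and not isinstance(value, expected):
            problems.append(f"{name} should be of type {spec['type']}")
    return problems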
</code></code></pre><p>Explicit requirements help prevent malformed tool calls. They also make validation easier before a tool touches a database, CRM, ticketing platform, or other production system.</p><p>Strong schemas are one of the simplest ways to make private AI agents more reliable. The model should not need to infer what a tool expects from vague parameter names or loosely written descriptions.</p><h3>Creating a private knowledge assistant</h3><p>One of the most valuable local-agent patterns combines internal documents, embedding models, vector search, and tool calling.</p><p>Architecture:</p><pre><code><code>Company Documents
        &#8595;
Embedding Model
        &#8595;
Vector Database
        &#8595;
Search Tool
        &#8595;
Llama-3-Groq-70B-Tool-Use
        &#8595;
Answer
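</code></code></pre><p>The retrieval step in this pipeline can be sketched in a few lines. This is a toy illustration under stated assumptions: a bag-of-words vector stands in for a real embedding model, an in-memory dictionary stands in for the vector database, and <code>DOCUMENTS</code>, <code>embed</code>, <code>cosine</code>, and <code>search_documents</code> are hypothetical names. The part that carries over to a real system is the shape of the search tool, which returns a few short passages rather than whole documents.</p><pre><code><code>import math
import re
from collections import Counter

# Hypothetical internal documents, keyed by an identifier.
DOCUMENTS = {
    "refund-policy": "Refunds are issued within 14 days of purchase.",
    "vacation-policy": "Employees accrue 20 vacation days per year.",
    "vpn-setup": "Install the VPN client and sign in with your company account.",
}

def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

# Stand-in for the vector database: document id mapped to its vector.
INDEX = {doc_id: embed(text) for doc_id, text in DOCUMENTS.items()}

def search_documents(query, k=2):
    """Search tool: return the k most similar passages, best match first."""
    query_vector = embed(query)
    ranked = sorted(INDEX, key=lambda doc_id: cosine(query_vector, INDEX[doc_id]), reverse=True)
    return [{"id": doc_id, "passage": DOCUMENTS[doc_id]} for doc_id in ranked[:k]]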
</code></code></pre><p>Instead of feeding entire document collections into prompts, the agent retrieves only relevant information. This approach helps keep context windows manageable while improving answer quality.</p><p>Benefits include:</p><ul><li><p>Better accuracy<br></p></li><li><p>Lower context usage<br></p></li><li><p>Faster responses<br></p></li><li><p>Reduced hallucinations<br></p></li><li><p>Stronger privacy controls<br></p></li></ul><p>A private knowledge assistant can support legal teams, finance departments, HR operations, support agents, sales teams, and engineering groups. The key is to keep retrieval focused. The search tool should return concise passages, summaries, or structured snippets rather than overwhelming the model with every possible document.</p><h3>Supporting long multi-step conversations</h3><p>Large models excel when workflows extend across many interactions.</p><p>Example:</p><h3>User</h3><pre><code><code>Find contracts expiring in the next 60 days.
</code></code></pre><h3>Agent</h3><p>Uses contract-search tools.</p><h3>User</h3><pre><code><code>Show only customers spending more than $10,000 annually.
</code></code></pre><h3>Agent</h3><p>Filters results.</p><h3>User</h3><pre><code><code>Draft renewal proposals for each account.
</code></code></pre><h3>Agent</h3><p>Generates tailored drafts.</p><p>Maintaining coherence across multiple steps is one area where larger models often outperform smaller alternatives. In business workflows, users rarely provide every requirement in the first message. They refine, filter, and redirect as new information appears.</p><p>A strong local agent needs to remember what has already happened, understand the user&#8217;s current intent, and decide whether to reuse previous results or call another tool.</p><h3>Implementing permission controls</h3><p>Powerful agents require strict controls. Every tool should have a clear risk level, and the agent should not be allowed to treat all actions the same way.</p><h3>Low-risk tools</h3><pre><code><code>search_documents()
get_customer_record()
get_inventory_levels()
</code></code></pre><h3>High-risk tools</h3><pre><code><code>delete_file()
send_email()
approve_payment()
modify_database()
</code></code></pre><p>A practical framework is:</p><pre><code><code>Read operations: automatic
Write operations: review required
Critical actions: explicit approval required
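</code></code></pre><p>A framework like this can be enforced in code before any tool runs. The sketch below is one possible shape, not a prescribed design: the <code>Risk</code> tiers, the <code>TOOL_RISK</code> table, and the <code>is_allowed</code> helper are hypothetical, and the split of the high-risk tools into "write" and "critical" is an assumption made for illustration. Tools missing from the table fall into the strictest tier.</p><pre><code><code>from enum import Enum

class Risk(Enum):
    READ = "read"          # runs automatically
    WRITE = "write"        # needs human review
    CRITICAL = "critical"  # needs explicit approval

# Assumed assignment of the example tools to tiers; adjust to your own policy.
TOOL_RISK = {
    "search_documents": Risk.READ,
    "get_customer_record": Risk.READ,
    "get_inventory_levels": Risk.READ,
    "send_email": Risk.WRITE,
    "modify_database": Risk.WRITE,
    "delete_file": Risk.CRITICAL,
    "approve_payment": Risk.CRITICAL,
}

def is_allowed(tool_name, reviewed=False, approved=False):
    """Return True only if the requirements of the tool's tier are satisfied."""
    # Tools missing from the table are treated as critical by default.
    risk = TOOL_RISK.get(tool_name, Risk.CRITICAL)
    if risk is Risk.READ:
        return True
    if risk is Risk.WRITE:
        return reviewed or approved
    return approved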
</code></code></pre><p>This significantly reduces the risk of unintended consequences. An agent can search a knowledge base or retrieve a record automatically, but sending an email, modifying a database, approving a payment, or deleting a file should require human review.</p><p>Permission design is especially important for local deployments because private agents are often connected to internal systems. The more useful the agent becomes, the more carefully teams need to control what it can do.</p><h3>Scaling across departments</h3><p>Organizations frequently achieve better results using specialized agents rather than one universal system.</p><h4>Sales agent</h4><p>Tools:</p><ul><li><p>CRM lookup<br></p></li><li><p>Lead scoring<br></p></li><li><p>Proposal generation</p><p></p></li></ul><div><hr></div><h4>Customer support agent</h4><p>Tools:</p><ul><li><p>Ticket search<br></p></li><li><p>Knowledge retrieval<br></p></li><li><p>Response drafting<br></p></li></ul><div><hr></div><h4>Finance agent</h4><p>Tools:</p><ul><li><p>Invoice lookup<br></p></li><li><p>Budget analysis<br></p></li><li><p>Forecast generation<br></p></li></ul><div><hr></div><h4>Operations agent</h4><p>Tools:</p><ul><li><p>Inventory systems<br></p></li><li><p>Vendor databases<br></p></li><li><p>Procurement workflows</p><p></p></li></ul><div><hr></div><p>Each agent receives access only to the resources necessary for its role. That keeps prompts simpler, reduces tool-selection confusion, and limits risk if an agent behaves unexpectedly.</p><p>A department-specific agent can also be tuned around the workflows, vocabulary, and approval requirements of that team. Sales may care about CRM context and proposal drafts. Finance may care about invoice status, budget variance, and audit trails. Support may care about ticket history and response quality.</p><p>The best private agent systems usually grow in stages. 
Start with one workflow, make it reliable, then expand the toolset and user base gradually.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rNAG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadd9773-2269-48ef-9bfb-ae9896b43719_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rNAG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadd9773-2269-48ef-9bfb-ae9896b43719_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!rNAG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadd9773-2269-48ef-9bfb-ae9896b43719_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!rNAG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadd9773-2269-48ef-9bfb-ae9896b43719_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!rNAG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadd9773-2269-48ef-9bfb-ae9896b43719_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rNAG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadd9773-2269-48ef-9bfb-ae9896b43719_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dadd9773-2269-48ef-9bfb-ae9896b43719_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1524675,&quot;alt&quot;:&quot;Llama-3-Groq-70B-Tool-Use: Build private agents locally&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200313358?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadd9773-2269-48ef-9bfb-ae9896b43719_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Llama-3-Groq-70B-Tool-Use: Build private agents locally" title="Llama-3-Groq-70B-Tool-Use: Build private agents locally" srcset="https://substackcdn.com/image/fetch/$s_!rNAG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadd9773-2269-48ef-9bfb-ae9896b43719_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!rNAG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadd9773-2269-48ef-9bfb-ae9896b43719_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!rNAG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadd9773-2269-48ef-9bfb-ae9896b43719_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!rNAG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdadd9773-2269-48ef-9bfb-ae9896b43719_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div 
class="pencraft pc-display-flex pc-gap-8 pc-reset"></div></div></div></a><figcaption class="image-caption">Build private AI agents without relying on cloud models by using Llama-3-Groq-70B-Tool-Use, Ollama, vector search, and local tools. &#169; Popular AI</figcaption></figure></div><h3>Optimizing performance</h3><p>Large models consume substantial resources, making optimization important. Performance problems often come from overly large tool responses, unchecked conversation growth, and unnecessary tool calls.</p><h3>Keep tool responses short</h3><p>Instead of returning large records:</p><pre><code><code>Entire customer history...
</code></code></pre><p>Return structured summaries:</p><pre><code><code>{
  "status":"active",
  "balance":"$420",
  "last_payment":"paid"
}
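</code></code></pre><p>One simple way to produce a compact payload like this is to whitelist the fields a tool is allowed to return. The sketch below assumes a hypothetical <code>summarize_record</code> helper and a made-up customer record; the field names mirror the example above, and everything else about the record is invented for illustration.</p><pre><code><code># Fields the model actually needs; everything else in the record is dropped.
SUMMARY_FIELDS = ("status", "balance", "last_payment")

def summarize_record(record, fields=SUMMARY_FIELDS):
    """Return a compact dict with only the whitelisted fields that exist."""
    return {name: record[name] for name in fields if name in record}

# Hypothetical full record as a backend might return it.
full_record = {
    "status": "active",
    "balance": "$420",
    "last_payment": "paid",
    "notes": "Long free-text account history ...",
    "orders": [{"id": 1}, {"id": 2}],
}
summary = summarize_record(full_record)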
</code></code></pre><p>Short, structured responses are easier for the model to interpret. They also reduce latency and help preserve context for the rest of the conversation.</p><h3>Limit context growth</h3><p>Very long conversations eventually degrade performance, because every earlier message and tool result stays in the prompt and competes for the context window. Periodic summarization helps preserve responsiveness: older turns are replaced with a short summary while the most recent messages are kept as they are.</p><p>The summary should capture the user&#8217;s goal, tools already called, key facts retrieved, and pending next steps. That gives the model enough continuity without forcing every prior message back into the prompt.</p><h3>Reduce tool spam</h3><p>Agents sometimes call tools they do not need, which adds latency and clutters the context.</p><p>Prompt instructions such as:</p><pre><code><code>Only call tools when required.
</code></code></pre><p>can cut redundant calls and the latency they add.</p><p>Tool descriptions can also help. Make it clear when a tool should be used, when it should be avoided, and what kind of result it returns. The goal is to make the correct path obvious to the model.</p><h3>Monitoring and auditing agent activity</h3><p>Production systems should log:</p><pre><code><code>Timestamp
User request
Tool selected
Arguments
Tool output
Final response
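</code></code></pre><p>These fields map naturally onto one JSON object per tool call, written as a line to an append-only log. The sketch below is a minimal illustration using only the standard library; the <code>log_tool_call</code> helper and the key names are hypothetical, and a real deployment would also need to decide how to redact sensitive values before they are written.</p><pre><code><code>import json
from datetime import datetime, timezone

def log_tool_call(stream, user_request, tool, arguments, tool_output, final_response):
    """Append one JSON line describing a single tool call to a writable stream."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_request": user_request,
        "tool_selected": tool,
        "arguments": arguments,
        "tool_output": tool_output,
        "final_response": final_response,
    }
    # One object per line keeps the log easy to append to and to parse later.
    stream.write(json.dumps(entry) + "\n")
    return entry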
</code></code></pre><p>These logs help:</p><ul><li><p>Debug failures<br></p></li><li><p>Audit decisions<br></p></li><li><p>Identify bottlenecks<br></p></li><li><p>Improve prompts<br></p></li><li><p>Evaluate tool effectiveness<br></p></li></ul><p>Without visibility into tool usage, troubleshooting complex agent systems becomes difficult.</p><p>Monitoring also helps teams spot patterns. If an agent repeatedly calls the wrong tool, the schema may be unclear. If tool outputs are too large, the response format may need tightening. If users frequently override the final answer, the workflow may need better approval steps or better retrieval.</p><h3>Real-world use cases</h3><p>Local 70B tool-use models are particularly attractive for:</p><ul><li><p>Legal research<br></p></li><li><p>Financial analysis<br></p></li><li><p>Compliance operations<br></p></li><li><p>Internal knowledge management<br></p></li><li><p>Software development support<br></p></li><li><p>Enterprise search<br></p></li><li><p>Customer service automation<br></p></li></ul><p>These environments often contain sensitive information that organizations prefer to keep entirely within their own infrastructure.</p><p>For legal teams, a private agent can search contracts, summarize clauses, and identify renewal dates. For finance teams, it can retrieve invoice records, compare budgets, and prepare reports. For support teams, it can search ticket history and draft customer replies based on approved knowledge sources.</p><p>The common thread is control. Local agents let organizations decide where data lives, which systems the agent can access, and which actions require human approval.</p><h3>The road ahead for private agents</h3><p>The capabilities of self-hosted agents have improved dramatically over the past few years. 
Tasks that once required expensive proprietary APIs can now be performed locally with open models and consumer-accessible hardware.</p><p><a href="https://groq.com/blog/introducing-llama-3-groq-tool-use-models">Llama-3-Groq-70B-Tool-Use</a> demonstrates how far local AI has progressed. Combined with Ollama, vector databases, internal tools, and careful permission design, it enables organizations to build sophisticated automation systems while maintaining stronger ownership of their data.</p><p>For businesses prioritizing privacy, control, and long-term cost predictability, large local tool-use models are quickly becoming a practical foundation for the next generation of AI-powered workflows.</p>]]></content:encoded></item><item><title><![CDATA[PewDiePie built a private AI workspace, and it is worth watching]]></title><description><![CDATA[PewDiePie&#8217;s Odysseus AI workspace brings local models, agents, memory, files, research, and email into one self-hosted AI 
interface.]]></description><link>https://www.popularai.org/p/pewdiepie-odysseus-ai-workspace</link><guid isPermaLink="false">https://www.popularai.org/p/pewdiepie-odysseus-ai-workspace</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Mon, 01 Jun 2026 10:53:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!miGp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2848c23f-0f3c-4c9e-ba2b-e57be1a62ac7_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!miGp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2848c23f-0f3c-4c9e-ba2b-e57be1a62ac7_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!miGp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2848c23f-0f3c-4c9e-ba2b-e57be1a62ac7_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!miGp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2848c23f-0f3c-4c9e-ba2b-e57be1a62ac7_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!miGp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2848c23f-0f3c-4c9e-ba2b-e57be1a62ac7_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!miGp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2848c23f-0f3c-4c9e-ba2b-e57be1a62ac7_1672x941.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!miGp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2848c23f-0f3c-4c9e-ba2b-e57be1a62ac7_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2848c23f-0f3c-4c9e-ba2b-e57be1a62ac7_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2322354,&quot;alt&quot;:&quot;PewDiePie&#8217;s Odysseus AI workspace: why local AI matters now&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200099905?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2848c23f-0f3c-4c9e-ba2b-e57be1a62ac7_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="PewDiePie&#8217;s Odysseus AI workspace: why local AI matters now" title="PewDiePie&#8217;s Odysseus AI workspace: why local AI matters now" srcset="https://substackcdn.com/image/fetch/$s_!miGp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2848c23f-0f3c-4c9e-ba2b-e57be1a62ac7_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!miGp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2848c23f-0f3c-4c9e-ba2b-e57be1a62ac7_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!miGp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2848c23f-0f3c-4c9e-ba2b-e57be1a62ac7_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!miGp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2848c23f-0f3c-4c9e-ba2b-e57be1a62ac7_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Here is what Odysseus does, why it matters for local AI, how it compares with Open WebUI, and why security matters.</figcaption></figure></div><p>PewDiePie&#8217;s new Odysseus AI workspace is interesting for reasons that go well beyond the celebrity name attached to it.</p><p>The project, introduced in <a 
href="https://www.youtube.com/watch?v=rAzT5lcezPs">PewDiePie&#8217;s YouTube announcement</a>, is a self-hosted AI workspace built around local models, agents, memory, research, email, documents, model serving, and personal data. The pitch is simple: instead of handing more of your work to a hosted AI account, you can run the workspace on hardware you control and decide which local or cloud model endpoints it can use.</p><p>That makes Odysseus part of a bigger shift in AI. The first wave of consumer AI was about chatbots. The next wave is about workspaces that can remember, search, write, organize, use tools, and act across files and apps. Odysseus is aimed directly at that second wave, with a local-first design that tries to bring the convenience of ChatGPT-style products closer to the privacy and control of self-hosted software.</p><p>The key thing to understand is that Odysseus is a workspace, rather than a new foundation model. It is the surrounding layer: the chat interface, memory system, file layer, document tools, email assistant, research workflow, agent shell, model comparison view, and model-serving control panel. 
The project&#8217;s <a href="https://github.com/pewdiepie-archdaemon/odysseus">GitHub repository</a> describes it as a self-hosted AI workspace, while the <a href="https://pewdiepie-archdaemon.github.io/odysseus/">official Odysseus project page</a> presents it as a local-first, privacy-first interface for language models with chat, autonomous agents, tools, model serving, email, research, and more.</p><p>That framing matters because local AI already has plenty of model runners. What it still needs is better everyday software.</p><div><hr></div><h3>Key takeaways</h3><blockquote><p><strong>Odysseus is a self-hosted AI workspace</strong>, not a new foundation model.</p></blockquote><blockquote><p><strong>It is aimed at the ChatGPT and Claude interface problem:</strong> local AI can be powerful, but the surrounding app layer often feels worse than hosted tools.</p></blockquote><blockquote><p><strong>The big idea is personal context without cloud dependency.</strong> Odysseus <a href="https://raw.githubusercontent.com/pewdiepie-archdaemon/odysseus/main/README.md">stores app data locally</a> in its <code>data/</code> folder and is designed around local-first workflows.</p></blockquote><blockquote><p><strong>It is early software.</strong> The roadmap openly <a href="https://raw.githubusercontent.com/pewdiepie-archdaemon/odysseus/main/ROADMAP.md">asks for Docker install testing, integration audits, bug fixes</a>, better setup docs, security hardening, and UI cleanup.</p></blockquote><blockquote><p><strong>It competes more with Open WebUI and AnythingLLM than with ChatGPT itself.</strong> Those projects already offer mature self-hosted local AI interfaces, so Odysseus needs to win on workflow, taste, and integration.</p></blockquote><blockquote><p><strong>Do not expose it casually to the internet.</strong> The project&#8217;s <a href="https://raw.githubusercontent.com/pewdiepie-archdaemon/odysseus/main/SECURITY.md">own security notes</a> say to keep authentication on, use 
HTTPS beyond localhost, and treat shell, model-serving, email, calendar, and vault features as privileged admin functionality.</p></blockquote><div><hr></div><h3>The real problem Odysseus is trying to solve</h3><p>Local AI has had the same recurring problem for years: the model can be good enough, but the experience around it often feels unfinished.</p><p>Hosted AI tools are popular because they hide the messy parts. They handle files, remember context, manage sessions, search the web, sync across devices, run tools, and package everything in a polished interface. Local AI gives users more control, but often makes them assemble a toolkit from separate pieces: Ollama, llama.cpp, Open WebUI, model files, vector databases, search tools, scripts, file watchers, and custom prompts.</p><p>Odysseus is PewDiePie&#8217;s attempt to close that gap. The project&#8217;s <a href="https://raw.githubusercontent.com/pewdiepie-archdaemon/odysseus/main/README.md">README</a> describes it as the self-hosted version of the UI experience people get from ChatGPT and Claude, with support for local and API-backed providers including vLLM, llama.cpp, Ollama, OpenRouter, and OpenAI.</p><p>That is a strong target. Local AI stops being a novelty when it can work with your real context. A model that only answers questions in a blank chat box is useful, but limited. A model that can work with your documents, notes, email, memory, tasks, calendar, research, files, and tools becomes something closer to a personal workspace.</p><p>That also raises the stakes. The more power a local AI workspace has, the more careful its design has to be. 
An interface that touches your files, shell, email, calendar, tokens, model server, and memory is closer to an admin console than a chatbot.</p><div><hr></div><h3>What Odysseus can do</h3><p>Odysseus has an unusually broad feature list for an early public project.</p><p>The core chat layer can connect to local models or external APIs. The agent layer can use tools for web access, files, shell, skills, and memory. The Cookbook feature is designed to scan hardware, recommend models, and support GGUF, FP8, and AWQ model paths with vLLM and llama.cpp serving.</p><p>The workspace also includes Deep Research, model comparison, document editing, persistent memory, email triage, notes, tasks, calendar sync, file uploads, vision and PDF support, web search, presets, sessions, 2FA, and a mobile-friendly PWA interface. The official landing page presents the same broad structure, including chat and agents, MCP tools, Cookbook, email assistance, Deep Research, Compare, Memory, self-evolving skills, and private-by-default local execution.</p><p>That ambition is the project&#8217;s main appeal. 
It is also the reason users should be cautious.</p><p>A simple local chat UI can break without doing much harm. A full AI workspace that can touch models, files, shell, email, memory, search, and tasks needs stronger boundaries. If Odysseus works well, it could feel like a private AI cockpit. If configured carelessly, it could expose too much power to a tool that users may not fully understand yet.</p><h3>The strongest idea is local context</h3><p>The best idea in Odysseus is not the interface. It is where the context lives.</p><p>According to the README, user data lives in the local <code>data/</code> folder. That includes the app database, sessions, messages, documents, memory, uploads, personal docs, Chroma data, presets, and settings. This changes the bargain users make with an AI assistant.</p><p>Hosted AI tools become more useful when they know more about you. That usually means uploading more documents, notes, prompts, customer data, code, transcripts, business context, or private research into someone else&#8217;s account system. Odysseus is built around the opposite idea. Give the assistant rich context, but keep that context on your own machine or server wherever possible.</p><p>That does not make it automatically safe. A local AI workspace can still leak secrets if it calls external APIs, exposes ports, logs sensitive data, runs unsafe tools, or connects to third-party services. But the control point moves. The user gets to decide which endpoints to use, which integrations to enable, which files to index, and what leaves the machine.</p><p>That is the practical reason Odysseus is worth watching. It is not only another way to talk to a model. 
It is an attempt to make local personal context usable without making a cloud AI account the center of the workflow.</p><div><hr></div><h3>Odysseus is early software</h3><p>Odysseus should not be treated as a polished replacement for ChatGPT, Claude, Open WebUI, or AnythingLLM yet.</p><p>The project&#8217;s <a href="https://raw.githubusercontent.com/pewdiepie-archdaemon/odysseus/main/ROADMAP.md">roadmap</a> openly calls for fresh Docker install testing on Linux, macOS, and Windows, integration audits, self-host troubleshooting docs, Cookbook reliability work across different GPUs and drivers, UI bug fixes, accessibility work, and security hardening.</p><p>That honesty is useful. 
It tells users what this is: a fast-moving personal tool that has been made public, rather than a boring enterprise product with a support department and a long stability history.</p><p>That means the safest way to try Odysseus is the way people should approach early self-hosted software. Test it with non-critical data. Keep it on localhost or a private network. Read the security notes before enabling powerful features. Avoid pointing it at sensitive email, work files, production repositories, private credentials, or important API keys until every enabled tool and integration is understood.</p><p>For many users, the right move is to watch Odysseus first and install later. For local AI hobbyists and developers, it may already be an interesting lab project.</p><h3>How Odysseus compares with Open WebUI and AnythingLLM</h3><p>Odysseus enters a crowded local AI interface space. That is good for users, but it means the project has to earn attention on more than novelty.</p><p><a href="https://github.com/open-webui/open-webui">Open WebUI</a> is already a mature self-hosted AI platform. It supports Ollama, OpenAI-compatible APIs, RAG, permissions, mobile PWA use, model tools, and local or cloud providers. It is one of the obvious starting points for anyone who wants a local AI interface that feels closer to a real product.</p><p><a href="https://github.com/Mintplex-Labs/anything-llm">AnythingLLM</a> is another strong comparison. It is an all-in-one AI app built around document chat, agents, multi-user support, vector databases, document pipelines, and local or cloud model support. Its appeal is similar: reduce setup friction and give users a practical workspace around models.</p><p>That means Odysseus does not win merely by being self-hosted. Open WebUI and AnythingLLM already serve that need for many people.</p><p>What makes Odysseus interesting is the shape of the product. 
It feels like a power-user workspace built by someone trying to replace his own daily AI environment. The feature mix is personal and opinionated: model comparison, self-evolving skills, email, memory, calendar, notes, tasks, research, image tools, theming, mobile support, and hardware-aware model management.</p><p>That is less like installing a chat UI and more like building a private AI cockpit.</p><p>The open question is whether that cockpit becomes reliable enough for people who are not PewDiePie.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PtBQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7efcd8-9d34-40ba-a8b2-dc7108f160e9_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PtBQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7efcd8-9d34-40ba-a8b2-dc7108f160e9_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!PtBQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7efcd8-9d34-40ba-a8b2-dc7108f160e9_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!PtBQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7efcd8-9d34-40ba-a8b2-dc7108f160e9_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!PtBQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7efcd8-9d34-40ba-a8b2-dc7108f160e9_1672x941.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!PtBQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7efcd8-9d34-40ba-a8b2-dc7108f160e9_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c7efcd8-9d34-40ba-a8b2-dc7108f160e9_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1502600,&quot;alt&quot;:&quot;PewDiePie built a private AI workspace, and it is worth watching&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/200099905?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7efcd8-9d34-40ba-a8b2-dc7108f160e9_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="PewDiePie built a private AI workspace, and it is worth watching" title="PewDiePie built a private AI workspace, and it is worth watching" srcset="https://substackcdn.com/image/fetch/$s_!PtBQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7efcd8-9d34-40ba-a8b2-dc7108f160e9_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!PtBQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7efcd8-9d34-40ba-a8b2-dc7108f160e9_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!PtBQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7efcd8-9d34-40ba-a8b2-dc7108f160e9_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!PtBQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c7efcd8-9d34-40ba-a8b2-dc7108f160e9_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Odysseus is not a new AI model. 
It is PewDiePie&#8217;s self-hosted AI workspace for local models, private context, agents, and tools.</figcaption></figure></div><h3>The security warning matters</h3><p>Odysseus gives agents access to serious capabilities, which makes security central to the product rather than an afterthought.</p><p>The project&#8217;s <a href="https://raw.githubusercontent.com/pewdiepie-archdaemon/odysseus/main/SECURITY.md">security policy</a> says not to run Odysseus as a public unauthenticated service, to use HTTPS when exposing it beyond localhost, and to restrict high-risk tools such as shell, Python, file read and write, email, MCP, API access, tasks, skills, memory, settings, tokens, and model serving to admins.</p><p>That is the right warning. Local AI can be better for privacy than hosted AI when prompts, files, and model calls stay on hardware you control. At the same time, a local agent can be more dangerous to your own machine if exposed poorly. A hosted chatbot usually cannot run shell commands on your server. Odysseus can, depending on how it is configured.</p><p>That is why Odysseus should be treated like privileged software. Authentication should stay enabled for any network-accessible deployment. Public internet exposure should be avoided unless the operator knows exactly what they are doing. Tokens should be protected. Private folders should stay out of Git. Development runs should bind to <code>127.0.0.1</code> unless LAN access is intentional.</p><p>The big privacy promise of local AI only holds if the user also handles local security properly.</p><h3>Who should try Odysseus now</h3><p>Odysseus makes the most sense for people who already understand local models, Docker or Python environments, and the security risks of agent tools.</p><p>It is a good fit for local AI hobbyists who already use Ollama, llama.cpp, vLLM, or OpenAI-compatible endpoints. 
It also fits power users who want one workspace for chat, research, memory, documents, model comparison, and personal automation. Developers may find it useful for testing local agents with file and shell access in a controlled lab. Privacy-conscious users may be interested in the idea of rich personal context without making a hosted AI account the center of their workflow.</p><p>Creators and researchers may also see the appeal. A private workspace that can draft documents, search, summarize files, compare models, and maintain memory could become genuinely useful if the rough edges improve.</p><p>Odysseus is a weaker fit for beginners who want a predictable production tool. It is also not the obvious choice for teams that need mature documentation, stable upgrades, fine-grained permissions, and a support path. For those users, Open WebUI and AnythingLLM may be better starting points.</p><p>For readers weighing the hardware side of local AI, Popular AI has already covered whether <a href="https://www.popularai.org/p/is-local-ai-hardware-worth-it-2026">local AI hardware is worth buying in 2026</a>, how to build a <a href="https://www.popularai.org/p/best-budget-local-llm-pc-under-1000-2026">budget local LLM PC under $1,000</a>, and how to use <a href="https://www.popularai.org/p/gguf-loader-agentic-mode-local-coding-agent">GGUF Loader Agentic Mode</a> for smaller local file-agent workflows. For a broader buying overview, the first of those guides is the best place to start.</p><div><hr></div><h4><em><strong>More on hardware for self-hosted AI:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;85bf625e-05f8-4274-aa7a-b4d5b9554c88&quot;,&quot;caption&quot;:&quot;Running local LLMs on your own desktop still solves a lot of problems at once. It keeps private work local. It cuts recurring API costs. 
It reduces the risk that a favorite model, feature, or account &#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The 5 best prebuilt AI PCs for Ollama and local LLMs in 2026&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-04-13T13:42:54.236Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Qvp4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a476a49-8f0d-44e4-9e26-a6fc2f1a3d74_2400x1350.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/best-prebuilt-ai-pcs-for-local-llms-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:193899264,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>What Hacker News got 
right</h3><p>The early <a href="https://news.ycombinator.com/item?id=48346693">Hacker News discussion about Odysseus</a> quickly moved to the obvious question: why use this instead of Open WebUI?</p><p>That is the correct challenge. Odysseus cannot succeed long term as a celebrity wrapper around existing local AI ideas. It needs a reason to exist after the novelty fades.</p><p>That reason might be the all-in-one workspace. It might be the hardware-aware Cookbook. It might be the memory and skills system. It might be the email, notes, tasks, research, and model comparison pieces coming together in a way that feels more personal than other local AI tools.</p><p>It might also end up as an entertaining experiment that sends more users toward mature alternatives.</p><p>Either outcome would still be useful. The self-hosted AI world needs more people building the tools they wish existed. It also needs fewer people pretending every local AI demo is ready for critical work. Odysseus sits somewhere in the middle: promising, chaotic, funny, and early.</p><p>That is a more interesting place to be than polished irrelevance.</p><h3>What Odysseus means for local AI</h3><p>The most important part of Odysseus may be cultural.</p><p>A massive YouTuber is showing viewers that AI can be something you run, customize, wire together, break, repair, and make your own. That is a healthier message than the default consumer path, where AI means renting access to a hosted chat box and hoping the provider keeps the tool useful.</p><p>Odysseus also points toward the next phase of local AI. The model runner is becoming the boring layer. The real fight is the workspace around it: memory, files, agents, search, email, permissions, scheduling, tool use, security, and interface design.</p><p>Local AI wins when the workflow becomes good enough that people keep using it after the novelty wears off. That is the hard part. Running a model locally is no longer the finish line. 
The finish line is making that model useful in the daily mess of real work.</p><p>Odysseus is one more attempt to make that happen.</p><div><hr></div><h3>Frequently asked questions about Odysseus</h3><h4>Is Odysseus a new AI model?</h4><blockquote><p>Odysseus is a self-hosted AI workspace. It connects to local models and API-backed models through providers such as vLLM, llama.cpp, Ollama, OpenRouter, and OpenAI.</p><p>That distinction is important. Users are not downloading a new PewDiePie foundation model. They are installing a workspace that can sit around multiple models and providers.</p><div><hr></div></blockquote><h4>Is Odysseus open source?</h4><blockquote><p>Yes. 
The repository includes an <a href="https://raw.githubusercontent.com/pewdiepie-archdaemon/odysseus/main/LICENSE">MIT license</a>, which generally allows use, copying, modification, publishing, distribution, sublicensing, and sale, subject to the license terms.</p><p>That matters because the project is aimed at self-hosters, developers, and power users who may want to inspect, modify, deploy, or fork the software.</p><div><hr></div></blockquote><h4>Does Odysseus run fully locally?</h4><blockquote><p>Odysseus is designed as a local-first self-hosted workspace, but whether a specific setup is fully local depends on the providers and integrations enabled.</p><p>If the workspace connects to OpenAI, OpenRouter, email services, cloud APIs, external search, or other hosted services, those parts of the workflow are no longer purely local. If the user connects only to local models and local tools, more of the workflow can stay on the machine or server under their control.</p><p>The important point is choice. Odysseus gives users a framework for deciding what stays local and what leaves the machine.</p><div><hr></div></blockquote><h4>Is Odysseus safer than ChatGPT or Claude?</h4><blockquote><p>Odysseus can give users more control over their data because it can run against local endpoints and store workspace data locally. That does not make it automatically safer.</p><p>The project includes powerful tools and integrations, including shell access, file operations, API tokens, model serving, email, calendar, memory, and settings. Those features need strong boundaries. 
A local workspace with weak authentication, exposed ports, unsafe plugins, or poorly handled tokens can create serious risk.</p><p>The privacy advantage is real only when the deployment is locked down and the user understands which tools and providers are active.</p><div><hr></div></blockquote><h4>Should beginners install Odysseus?</h4><blockquote><p>Most beginners should start with a simpler local AI setup first.</p><p>Ollama plus Open WebUI, LM Studio, AnythingLLM, or another beginner-friendly local model app will be a better entry point for many people. Those tools make it easier to learn the basics of local model selection, hardware limits, model endpoints, document chat, and setup tradeoffs.</p><p>Odysseus is more interesting once a user already understands those basics and wants a larger workspace with agents, memory, documents, research, and administrative capabilities.</p><div><hr></div></blockquote><div class="callout-block" data-callout="true"><h3>The bottom line</h3><p>Odysseus is worth watching and worth testing for users who are comfortable with early self-hosted software. It is not the safe default for ordinary users yet, and it is not automatically better than Open WebUI or AnythingLLM.</p><p>Its real value is the thesis behind it: useful local AI will not be a bare chat window forever. It will be a personal workspace that can use your context, work across your tools, and keep you in control of where your data lives.</p><p>Run it locally. Keep it locked down. Use non-critical data first. 
If the rough edges smooth out, Odysseus could become one of the more interesting self-hosted AI projects of 2026.</p></div><div><hr></div><p style="text-align: center;"><em><strong>Explore more from Popular AI:</strong></em></p><p style="text-align: center;"><strong><a href="https://popularai.org/t/start-here">Start here</a> | <a href="https://popularai.org/t/local-ai">Local AI</a> | <a href="https://popularai.org/t/walkthroughs">Fixes &amp; guides</a> | <a href="https://popularai.org/t/ai-builds-gear">Builds &amp; gear</a> | <a href="https://popularai.org/t/popular-ai-podcast">Popular AI podcast</a></strong></p>]]></content:encoded></item><item><title><![CDATA[Private AI Agents with Ollama: run Llama-3-Groq-8B-Tool-Use locally]]></title><description><![CDATA[A practical guide to local tool calling with Ollama, Llama-3-Groq-8B-Tool-Use, Python tools, private RAG, and safer agent design.]]></description><link>https://www.popularai.org/p/llama-3-groq-8b-tool-use-ollama-local-ai-agents</link><guid isPermaLink="false">https://www.popularai.org/p/llama-3-groq-8b-tool-use-ollama-local-ai-agents</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Sun, 31 May 2026 14:09:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!j05H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbfee2a-9f27-45e8-a382-2d004294ac22_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 
is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j05H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbfee2a-9f27-45e8-a382-2d004294ac22_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j05H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbfee2a-9f27-45e8-a382-2d004294ac22_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!j05H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbfee2a-9f27-45e8-a382-2d004294ac22_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!j05H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbfee2a-9f27-45e8-a382-2d004294ac22_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!j05H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbfee2a-9f27-45e8-a382-2d004294ac22_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j05H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbfee2a-9f27-45e8-a382-2d004294ac22_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/acbfee2a-9f27-45e8-a382-2d004294ac22_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1481873,&quot;alt&quot;:&quot;How to run Llama-3-Groq-8B-Tool-Use locally with 
Ollama&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/199960583?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbfee2a-9f27-45e8-a382-2d004294ac22_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="How to run Llama-3-Groq-8B-Tool-Use locally with Ollama" title="How to run Llama-3-Groq-8B-Tool-Use locally with Ollama" srcset="https://substackcdn.com/image/fetch/$s_!j05H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbfee2a-9f27-45e8-a382-2d004294ac22_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!j05H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbfee2a-9f27-45e8-a382-2d004294ac22_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!j05H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbfee2a-9f27-45e8-a382-2d004294ac22_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!j05H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facbfee2a-9f27-45e8-a382-2d004294ac22_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Want private AI agents without cloud APIs? Here is how Ollama and Llama-3-Groq-8B-Tool-Use bring function calling to local workflows. &#169; Popular AI</figcaption></figure></div><p>The easiest way to misunderstand AI agents is to think of them as chatbots with a few extra controls.</p><p>A real agent can decide when to call a tool, pass arguments to that tool, read the result, and continue the task. 
That might mean checking a database, writing to a file, searching a private knowledge base, drafting a customer reply, or triggering an internal automation.</p><p>That extra power creates a tradeoff. The more useful an AI agent becomes, the more sensitive the data around it usually gets.</p><p>If your workflow involves client notes, internal documents, private code, customer records, financial data, or business strategy, sending every prompt and tool result to a cloud model can become the central risk in the system.</p><p>That is why local tool-use models matter. With <a href="https://ollama.com/download">Ollama</a> and <a href="https://ollama.com/library/llama3-groq-tool-use">Llama-3-Groq-8B-Tool-Use</a>, you can run a function-calling model on your own machine and build private agent workflows without sending prompts, documents, or tool outputs to an external model provider. Ollama lists the model series as focused on tool use and function calling, with an 8B variant at about 4.7GB and an 8K context window. 
The same model page says the 8B model reached 89.06% overall accuracy on the Berkeley Function Calling Leaderboard (BFCL) at the time of publication in July 2024.</p><div><hr></div><h4><em><strong>More on local agentic AI:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;9858f757-4787-47fc-a953-7888411d7ecc&quot;,&quot;caption&quot;:&quot;GGUF Loader Agentic Mode is for developers who want a coding agent that can work on local files without sending a repository through a hosted AI account.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;GGUF Loader Agentic Mode: local coding agents without cloud accounts&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-05-20T13:31:44.487Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!6Ic0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca4adae-46db-4d86-958e-89993baebd13_1672x941.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/gguf-loader-agentic-mode-local-coding-agent&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:198398535,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular 
AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>Why local tool calling is useful</h3><p>A normal local chatbot is already helpful. It can summarize notes, rewrite text, answer questions about pasted material, and help with coding.</p><p>Tool calling adds a more practical layer. Instead of only producing text, the model can return a structured request such as:</p><pre><code><code>{
  "name": "search_invoices",
  "arguments": {
    "client": "Acme Ltd",
    "month": "April"
  }
}
</code></code></pre><p>Your application runs the real function, gives the result back to the model, and asks it to continue. This is the pattern that turns a local model into the decision-making layer of a private workflow.</p><p><a href="https://github.com/ollama/ollama/blob/main/docs/api.md#tool-calling">Ollama&#8217;s API documentation</a> explains the same flow directly: you provide tools through the <code>tools</code> parameter, the model can generate a response containing tool calls, and the model can then explain the tool result in its response.</p><p>That makes local agents useful for searching private documents, querying internal databases, creating draft emails, summarizing local meeting notes, reading project files, running safe internal scripts, building small business automations, and creating private research assistants.</p><p>The model does not need direct access to everything. You expose only the tools you want it to use, which makes the application design much easier to control.</p><h3>Install Ollama and pull the model</h3><p>First, install Ollama from the <a href="https://ollama.com/download">official Ollama download page</a> for your operating system.</p><p>Once installed, open a terminal and pull the model:</p><pre><code><code>ollama pull llama3-groq-tool-use
</code></code></pre><p>Then run it interactively:</p><pre><code><code>ollama run llama3-groq-tool-use
</code></code></pre><p>You can also call it through Ollama&#8217;s local API. The <a href="https://ollama.com/library/llama3-groq-tool-use">Llama-3-Groq-Tool-Use model page</a> shows the basic API pattern using the local Ollama server at <code>localhost:11434</code>.</p><pre><code><code>curl http://localhost:11434/api/chat \
  -d '{
    "model": "llama3-groq-tool-use",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
</code></code></pre><p>At this point, you have the model running locally and can start building a private agent loop around it.</p><h3>Build a simple local tool</h3><p>Here is a minimal Python example. This creates a fake private function called <code>get_project_status</code>, lets the model decide whether to call it, then returns the tool result to the model.</p><p>Install the Python client:</p><pre><code><code>pip install ollama -U
</code></code></pre><p>Then create <code>local_agent.py</code>:</p><pre><code><code>from ollama import chat


def get_project_status(project_name: str) -&gt; str:
    """Get the current status of a private internal project.

    Args:
        project_name: Name of the project to look up.

    Returns:
        A short project status summary.
    """
    private_projects = {
        "atlas": "Atlas is on track. Final review is due Friday.",
        "mercury": "Mercury is delayed because the client has not approved the data import.",
        "nova": "Nova is complete and ready for invoicing."
    }

    return private_projects.get(
        project_name.lower(),
        "No matching project was found."
    )


messages = [
    {
        "role": "user",
        "content": "What is the status of the Atlas project?"
    }
]

response = chat(
    model="llama3-groq-tool-use",
    messages=messages,
    tools=[get_project_status]
)

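# Keep the assistant turn, including any tool calls, in the history.
messages.append(response.message)

if response.message.tool_calls:
    for call in response.message.tool_calls:
        if call.function.name == "get_project_status":
            try:
                result = get_project_status(**call.function.arguments)
            except TypeError as error:
                # The model sent missing or unexpected arguments.
                result = f"Invalid arguments: {error}"
        else:
            # Only run tools that were explicitly exposed.
            result = f"Unknown tool: {call.function.name}"

        # Return every result so the model can use it or recover.
        messages.append({
            "role": "tool",
            "tool_name": call.function.name,
            "content": result
        })

    # Second pass: the model turns the tool results into an answer.
    final_response = chat(
        model="llama3-groq-tool-use",
        messages=messages,
        tools=[get_project_status]
    )

    print(final_response.message.content)
else:
    # No tool call was requested, so print the first response.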
    print(response.message.content)
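

# A sketch of a multi-step variant of the single pass above, for requests that
# need several tool calls in a row. Assumptions: chat_fn is a callable shaped
# like ollama.chat, and run_agent, tools_by_name, and max_steps are
# hypothetical names, not part of the Ollama library.
def run_agent(chat_fn, tools_by_name, messages, model="llama3-groq-tool-use", max_steps=5):
    """Call tools until the model answers in text or the step limit is hit."""
    for _ in range(max_steps):
        response = chat_fn(model=model, messages=messages, tools=list(tools_by_name.values()))
        messages.append(response.message)
        if not response.message.tool_calls:
            return response.message.content
        for call in response.message.tool_calls:
            tool = tools_by_name.get(call.function.name)
            if tool is None:
                result = f"Unknown tool: {call.function.name}"
            else:
                result = str(tool(**call.function.arguments))
            messages.append({"role": "tool", "tool_name": call.function.name, "content": result})
    return "Stopped: step limit reached before a final answer."


# Possible usage, where new_messages is a fresh list holding one user message:
# answer = run_agent(chat, {"get_project_status": get_project_status}, new_messages)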
</code></code></pre><p>Run it:</p><pre><code><code>python local_agent.py
</code></code></pre><p>This is the core loop of a local agent:</p><ol><li><p>User asks a question.</p></li><li><p>Model chooses a tool.</p></li><li><p>Your code executes the tool.</p></li><li><p>Tool result goes back to the model.</p></li><li><p>Model writes the final answer.</p></li></ol><p>The privacy point is simple: the project data stays inside your machine or network.</p><h3>Use JSON schemas for stricter tools</h3><p>For production workflows, Python function signatures are often too loose. A stricter tool definition gives the model a clearer contract and gives your application a better chance of catching bad arguments before anything important happens.</p><p>Example:</p><pre><code><code>tools = [
    {
        "type": "function",
        "function": {
            "name": "create_invoice_draft",
            "description": "Create a draft invoice for a client.",
            "parameters": {
                "type": "object",
                "required": ["client_name", "amount", "currency"],
                "properties": {
                    "client_name": {
                        "type": "string",
                        "description": "The client name."
                    },
                    "amount": {
                        "type": "number",
                        "description": "The invoice amount."
                    },
                    "currency": {
                        "type": "string",
                        "description": "The currency code, such as USD or EUR."
                    }
                }
            }
        }
    }
]
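

# A minimal sketch (not part of Ollama's API): check a tool call's arguments
# against the schema above before running anything. The helper name
# validate_tool_arguments is hypothetical, and only the "required" list and
# the basic "type" fields are checked; a full JSON Schema validator does more.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_tool_arguments(tool, arguments):
    """Return a list of problems; an empty list means the call looks valid."""
    params = tool["function"]["parameters"]
    properties = params.get("properties", {})
    problems = []

    for field in params.get("required", []):
        if field not in arguments:
            problems.append(f"missing required field: {field}")

    for field, value in arguments.items():
        if field not in properties:
            problems.append(f"unexpected field: {field}")
            continue
        expected = JSON_TYPES.get(properties[field].get("type"))
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        is_bool_misused = isinstance(value, bool) and expected is not bool
        if expected is not None and (is_bool_misused or not isinstance(value, expected)):
            problems.append(f"wrong type for field: {field}")

    return problems


# Example: validate_tool_arguments(tools[0], {"client_name": "Acme Ltd"})
# reports that "amount" and "currency" are missing.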
</code></code></pre><p>This makes the tool contract easier to understand. The model knows which fields are required, what each field means, and how to format the call.</p><p><a href="https://github.com/ollama/ollama/blob/main/docs/api.md#tool-calling">Ollama&#8217;s tool-calling documentation</a> shows this same general structure for tool definitions, including the function name, description, parameters, properties, and required fields. The related <a href="https://ollama.com/blog/functions-as-tools">Ollama Python library update</a> also highlights support for passing functions as tools in the Python client.</p><h3>Keep the model away from dangerous tools</h3><p>Local does not automatically mean safe.</p><p>If you give a model access to a shell command tool, file deletion tool, email sending tool, or payment tool, you have created a risk. The model may call the wrong function, pass bad arguments, or misunderstand the user&#8217;s intent.</p><p>A safer pattern is to start with read-only tools:</p><pre><code><code>def search_docs(query: str) -&gt; str:
    ...

def get_customer_record(customer_id: str) -&gt; str:
    ...

def list_open_tasks(project: str) -&gt; str:
    ...
</code></code></pre><p>Then add write actions only when you have guardrails:</p><pre><code><code>def create_draft_email(recipient: str, subject: str, body: str) -&gt; str:
    ...

def create_invoice_draft(client_name: str, amount: float, currency: str) -&gt; str:
    ...
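

# A hypothetical approval gate (a sketch, not an Ollama feature): every tool
# is registered with a flag saying whether a human must confirm the call.
# TOOL_REGISTRY, register_tool, and run_tool are illustrative names.
TOOL_REGISTRY = {}


def register_tool(func, needs_approval):
    """Record a tool and whether it needs human sign-off before running."""
    TOOL_REGISTRY[func.__name__] = (func, needs_approval)


def run_tool(name, arguments, approve=None):
    """Run a registered tool; gated tools run only if approve(name, arguments) is true."""
    if name not in TOOL_REGISTRY:
        return f"Refused: unknown tool {name}."
    func, needs_approval = TOOL_REGISTRY[name]
    if needs_approval and not (approve and approve(name, arguments)):
        return f"Refused: {name} requires human approval."
    return func(**arguments)


# With no approve callback, gated tools are refused by default, so a bad
# tool call fails closed instead of running.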
</code></code></pre><p>Notice the word &#8220;draft.&#8221; For many business workflows, the best first step is to let the agent prepare work for review rather than execute irreversible actions.</p><p>A practical local-agent permission model looks like this:</p><pre><code><code>Safe:
- Search documents
- Read project data
- Summarize notes
- Draft replies
- Draft invoices
- Generate reports

Needs approval:
- Send emails
- Delete files
- Modify records
- Run shell commands
- Charge payments
- Publish content
</code></code></pre><p>This keeps the model useful while limiting the damage of a bad tool call. In business settings, that distinction matters more than model cleverness.</p><h3>Connect the agent to private documents</h3><p>A common local agent workflow is retrieval-augmented generation, often called RAG.</p><p>The basic architecture is:</p><pre><code><code>Private documents
        &#8595;
Local embedding model
        &#8595;
Local vector database
        &#8595;
Search tool
        &#8595;
Local Llama-3-Groq-8B-Tool-Use agent
        &#8595;
Answer with cited internal context
</code></code></pre><p>The agent does not need to ingest every document into the prompt. It can call a search tool, retrieve only the most relevant passages, and then write an answer using that context.</p><p>For example:</p><pre><code><code>def search_private_docs(query: str) -&gt; str:
    """Search local private documents for relevant passages."""
    # Connect this to Chroma, SQLite, LanceDB, or another local store.
    return "Relevant internal passage goes here."
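

# A hypothetical stand-in for the local store mentioned above (a sketch under
# assumptions): score plain-text passages by how many query words they share
# and return only the top few, so the tool result stays short.
# LOCAL_PASSAGES holds made-up sample text and keyword_search is an
# illustrative name; a real setup would query an index instead.
LOCAL_PASSAGES = [
    "Contractors receive read-only access to the wiki during onboarding.",
    "Employees must rotate passwords every 90 days.",
    "Contractor access is reviewed by the project lead every quarter.",
]


def keyword_search(query, passages=LOCAL_PASSAGES, limit=2):
    """Return up to `limit` passages that share the most words with the query."""
    words = set(query.lower().split())
    scored = []
    for passage in passages:
        overlap = len(words.intersection(passage.lower().split()))
        if overlap != 0:
            scored.append((overlap, passage))
    # Highest overlap first; the sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return "\n".join(passage for _, passage in scored[:limit]) or "No relevant passages found."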
</code></code></pre><p>Then ask:</p><pre><code><code>What does our onboarding policy say about contractor access?
</code></code></pre><p>The model can call <code>search_private_docs</code>, receive the relevant internal text, and answer without exposing the document set to a cloud model.</p><p>This is where local tool calling becomes especially useful. The model becomes the reasoning layer, while your tools decide exactly which private data can be accessed and how much context should be returned.</p><div><hr></div><h3>Understand the hardware expectations</h3><p>The 8B model is the practical starting point because it is much easier to run than a 70B model. Ollama lists the <code>llama3-groq-tool-use:8b</code> variant <a href="https://ollama.com/library/llama3-groq-tool-use">at about 4.7GB</a>, while the 70B variant is listed at about 40GB.</p><p>For casual local use, a modern laptop with enough RAM can run the 8B model.
For smoother performance, use a machine with a capable GPU or Apple Silicon with unified memory.</p><p>For heavier agent workflows, prioritize more RAM or unified memory, fast SSD storage, a GPU with enough VRAM, smaller prompts, focused tools, and short tool outputs.</p><p>Agents become slow when you stuff too much into the context.
Keep tool results compact, and resist the temptation to pass entire documents back into the conversation when a few relevant excerpts will do.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IYJp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafe3b05f-c2b3-4e22-aa4a-fb5f55bb45d8_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IYJp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafe3b05f-c2b3-4e22-aa4a-fb5f55bb45d8_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!IYJp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafe3b05f-c2b3-4e22-aa4a-fb5f55bb45d8_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!IYJp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafe3b05f-c2b3-4e22-aa4a-fb5f55bb45d8_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!IYJp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafe3b05f-c2b3-4e22-aa4a-fb5f55bb45d8_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IYJp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafe3b05f-c2b3-4e22-aa4a-fb5f55bb45d8_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/afe3b05f-c2b3-4e22-aa4a-fb5f55bb45d8_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1632719,&quot;alt&quot;:&quot;Llama-3-Groq-8B-Tool-Use: run private AI agents locally&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/199960583?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafe3b05f-c2b3-4e22-aa4a-fb5f55bb45d8_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Llama-3-Groq-8B-Tool-Use: run private AI agents locally" title="Llama-3-Groq-8B-Tool-Use: run private AI agents locally" srcset="https://substackcdn.com/image/fetch/$s_!IYJp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafe3b05f-c2b3-4e22-aa4a-fb5f55bb45d8_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!IYJp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafe3b05f-c2b3-4e22-aa4a-fb5f55bb45d8_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!IYJp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafe3b05f-c2b3-4e22-aa4a-fb5f55bb45d8_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!IYJp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafe3b05f-c2b3-4e22-aa4a-fb5f55bb45d8_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div 
class="pencraft pc-display-flex pc-gap-8 pc-reset"></div></div></div></a><figcaption class="image-caption">Learn how to run Llama-3-Groq-8B-Tool-Use locally with Ollama and build private AI agent workflows that keep sensitive data on your machine. &#169; Popular AI</figcaption></figure></div><h3>Build a realistic private agent workflow</h3><p>Here is a useful small-business example.</p><p>Goal: a local client-support assistant that can answer questions using private support notes.</p><p>Tools:</p><pre><code><code>def search_support_notes(query: str) -&gt; str:
    """Search local support notes."""

def get_client_plan(client_name: str) -&gt; str:
    """Retrieve the client's current service plan."""

def draft_support_reply(client_name: str, issue: str, answer: str) -&gt; str:
    """Create a draft support reply for review."""
</code></code></pre><p>User prompt:</p><pre><code><code>Draft a reply to Martin about the data export issue. Check the support notes first.
</code></code></pre><p>The agent can:</p><ol><li><p>Search the support notes.</p></li><li><p>Check the client&#8217;s plan.</p></li><li><p>Draft a reply.</p></li><li><p>Leave the final send action to a human.</p></li></ol><p>That workflow can replace a lot of manual searching and writing, while keeping sensitive client data local.</p><p>The same pattern works for internal operations, legal intake, account management, customer support, project reporting, and research workflows. Start with one narrow task, expose only the tools required for that task, then expand once the agent behaves predictably.</p><h3>Fix common local agent problems</h3><p><strong>Problem: The model answers without calling the tool.</strong><br>Make the instruction clearer. Tell it that it must use a specific tool before answering.</p><pre><code><code>Use the search_private_docs tool before answering. Do not answer from memory.
</code></code></pre><div><hr></div><p><strong>Problem: The model calls the wrong tool.</strong><br>Use clearer tool names and descriptions. Avoid overlapping tools such as <code>get_data</code>, <code>search_data</code>, and <code>lookup_data</code>.</p><div><hr></div><p><strong>Problem: The tool arguments are messy.</strong><br>Use a strict JSON schema and keep fields simple.</p><div><hr></div><p><strong>Problem: The response is too slow.</strong><br>Reduce context size, shorten tool outputs, use the 8B model, or move to faster hardware.</p><div><hr></div><p><strong>Problem: The model tries to do too much.</strong><br>Split the workflow into smaller tools and require approval before write actions.</p><p>Most local agent problems come from vague tool names, overloaded tools, oversized context, or weak approval boundaries. The model is only one part of the system. The tool layer matters just as much.</p><h3>Follow local agent design best practices</h3><p>Start small. A local agent with three reliable tools is more useful than a sprawling system with twenty vague ones.</p><p>Use descriptive names:</p><pre><code><code>Good:
- search_client_notes
- get_invoice_status
- draft_email_reply

Bad:
- run_task
- lookup
- process
</code></code></pre><p>Return short, structured tool results:</p><pre><code><code>{
  "client": "Acme Ltd",
  "invoice_status": "overdue",
  "amount": 2400,
  "currency": "EUR",
  "due_date": "2026-05-15"
}
</code></code></pre><p>Log every tool call:</p><pre><code><code>timestamp
user request
tool name
tool arguments
tool result
final response
</code></code></pre><p>For sensitive workflows, logs are not optional. They are how you debug mistakes, review unexpected behavior, and prove what happened after the fact.</p><p>Good logs also help you improve the tools themselves. If the model keeps passing messy arguments, the schema may need to be stricter. If it keeps choosing the wrong tool, the descriptions may be too similar. If responses are slow, the logs will show whether the model, the retrieval step, or the tool output size is the real bottleneck.</p><h3>Local agents are about control</h3><p>Cloud AI tools are convenient, but convenience has a cost. The most useful agent workflows often need access to information that should stay under your control.</p><p>Running Llama-3-Groq-8B-Tool-Use locally with Ollama gives you a practical middle ground. You get structured tool calling, useful automation, and private execution without building an entire model stack from scratch.</p><p>It will not replace every cloud model. Larger hosted models may still perform better on complex reasoning tasks. For private internal workflows, though, local function calling is already good enough to build useful systems.</p><p>The best place to start is simple:</p><pre><code><code>ollama pull llama3-groq-tool-use
</code></code></pre><p>Then expose one safe tool.</p><p>Make it reliable.</p><p>Add another.</p><p>That is how a local chatbot becomes a private agent.</p>]]></content:encoded></item><item><title><![CDATA[The local Otter.ai alternative: run private meeting transcription with Whisper and Ollama]]></title><description><![CDATA[A practical guide to private meeting transcription that keeps sensitive audio, transcripts, and AI summaries on your own machine.]]></description><link>https://www.popularai.org/p/private-local-meeting-transcription-whisper-ollama</link><guid isPermaLink="false">https://www.popularai.org/p/private-local-meeting-transcription-whisper-ollama</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Sat, 30 May 2026 17:50:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!6Hob!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51645be5-b867-4142-89ba-46f67e3e0144_1672x941.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Hob!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51645be5-b867-4142-89ba-46f67e3e0144_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Hob!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51645be5-b867-4142-89ba-46f67e3e0144_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!6Hob!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51645be5-b867-4142-89ba-46f67e3e0144_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!6Hob!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51645be5-b867-4142-89ba-46f67e3e0144_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!6Hob!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51645be5-b867-4142-89ba-46f67e3e0144_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6Hob!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51645be5-b867-4142-89ba-46f67e3e0144_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/51645be5-b867-4142-89ba-46f67e3e0144_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1786394,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/199890780?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51645be5-b867-4142-89ba-46f67e3e0144_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6Hob!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51645be5-b867-4142-89ba-46f67e3e0144_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!6Hob!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51645be5-b867-4142-89ba-46f67e3e0144_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!6Hob!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51645be5-b867-4142-89ba-46f67e3e0144_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!6Hob!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51645be5-b867-4142-89ba-46f67e3e0144_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Replace Otter.ai with a private local meeting transcription workflow using WhisperX, pyannote, and Ollama for transcripts and summaries. 
&#169; Popular AI</figcaption></figure></div><p>Private local meeting transcription is the right move when meeting audio contains client work, legal strategy, hiring discussions, unpublished plans, financial details, or anything else that should not pass through another company&#8217;s servers or tech stack.</p><p>Hosted tools such as Otter.ai are convenient because they remove friction: they can join calls, transcribe live, organize notes, and turn conversations into follow-ups. But the tradeoff is control: the recording, transcription, metadata, summaries, sharing workflow, and account access all depend on a cloud service owned by someone else.</p><p>The local path takes more setup, but it is a better fit for sensitive work. Record the meeting with consent, transcribe the audio locally with <a href="https://github.com/m-bain/whisperX">WhisperX</a>, optionally add speaker labels with <a href="https://huggingface.co/pyannote/speaker-diarization-community-1">pyannote</a>, then summarize the transcript with a local model through <a href="https://github.com/ollama/ollama">Ollama</a>.</p><div><hr></div><h3>Key takeaways</h3><p>Use <a href="https://otter.ai/pricing">Otter.ai</a> when convenience matters more than control. It supports Zoom, Microsoft Teams, and Google Meet, offers live transcription, and handles sharing and summaries inside one hosted product.</p><p>Use a local WhisperX and Ollama workflow when the audio is sensitive, recurring, or expensive to process in a subscription tool. The setup takes longer, but the transcript and summary can stay in your own preferred file management structure.</p><p>The privacy tradeoff is real. The <a href="https://otter.ai/privacy-policy">Otter.ai privacy policy</a> says users may provide audio recordings, OtterPilot screenshots, uploaded text, images, and video, and that Otter uses audio recordings and platform information to provide the service.</p><p>WhisperX is the best practical local base for long meeting audio because it adds faster batched transcription, word-level timestamps, voice activity detection, and optional diarization. <a href="https://github.com/openai/whisper">OpenAI Whisper</a> is still useful for simpler local transcription, but WhisperX adds the meeting-friendly pieces that matter for longer calls.</p><p>Full speaker diarization is the real catch. 
The <a href="https://huggingface.co/pyannote/speaker-diarization-community-1">pyannote community diarization model</a> can run locally after setup, but access requires accepting model conditions and creating a Hugging Face token.</p><p>Ollama handles the local summarization layer through a local REST API, so the transcript can stay on the machine instead of being pasted into a hosted chatbot.</p><h3>The practical answer</h3><p>For most people replacing Otter.ai, the best local workflow is:</p><ol><li><p>Record the meeting audio locally.</p></li><li><p>Convert it to a clean WAV or MP3 file.</p></li><li><p>Run WhisperX for transcription.</p></li><li><p>Add diarization only when speaker labels matter.</p></li><li><p>Feed the transcript into Ollama for summaries, action items, decisions, and follow-up emails.</p></li><li><p>Store the audio, transcript, and summary in your own folder structure.</p></li></ol><p>Use Otter.ai if you need a polished notetaker that automatically joins calls, syncs with calendars, shares notes, and works with minimal setup. Use the local workflow if privacy, cost control, and repeatability matter more than polish.</p><p>The cleanest rule is simple: if the meeting would be risky to upload, do not build your workflow around uploading it.</p><h3>Why people want an Otter.ai alternative</h3><p>Otter.ai is useful because it removes friction. The <a href="https://otter.ai/pricing">Otter.ai pricing page</a> lists Zoom, Microsoft Teams, and Google Meet support, live transcription, speaker identification, audio playback, mobile apps, and AI meeting workflows even on the free Basic plan. The Basic plan also includes 300 monthly transcription minutes and three lifetime audio or video file imports.</p><p>Paid plans add more minutes, longer meetings, more imports, exports, advanced search, team vocabulary, admin controls, and integrations. 
As of May 30, 2026, Otter lists Pro at $16.99 per user per month on monthly billing, with 1,200 in-app recording minutes and 10 monthly audio or video file imports. Business is listed at $30 per user per month on monthly billing, with unlimited meetings and in-app recordings, custom AI workflows, and more admin features.</p><p>That is a strong product shape for teams that value convenience. It is also why Otter.ai can become hard to leave. Once meetings, summaries, action items, integrations, and exports live inside one hosted account, the workflow starts to depend on that account.</p><div><hr></div><h3>The privacy tradeoff behind cloud transcription</h3><p>Meeting transcription is unusually sensitive because it captures raw human conversation. A transcript can include names, client facts, legal comments, financial details, medical references, sales strategy, credentials spoken aloud, and private opinions that were never meant to become searchable text.</p><p>The <a href="https://otter.ai/privacy-policy">Otter.ai privacy policy</a> says users may provide audio recordings, OtterPilot screenshots, uploaded text, images, and videos. 
It also says Otter may receive platform information from connected services such as Google Calendar, iCal, Google Contacts, and Zoom.</p><p>The same policy says Otter uses audio recordings, usage information, and platform information to provide the service. It also says Otter trains its technology on transcriptions, which may contain personal information, to provide more accurate services, and that it requires explicit permission before manually reviewing specific audio recordings for model training and product improvement.</p><p>According to the policy, Otter also shares personal information with selected third parties, including cloud service providers such as AWS, platform support providers such as Amplitude, data labeling providers, and AI service providers that support product features.</p><p>The legal and operational point is not that Otter is uniquely bad. Hosted meeting transcription is a cloud workflow by design. The vendor processes the material, stores the output, controls the account, sets the terms, and can change the service.</p><p>The <a href="https://otter.ai/terms-of-service">Otter.ai terms of service</a> say users are responsible for providing notices and getting consent for recordings where required by law. The terms also say Otter may monitor information transmitted or received through the service for operational and other purposes, and that it does not guarantee user content or processing results will never be accessible by others.</p><p>Account risk matters too. Otter&#8217;s terms say it may terminate an account or suspend access at its sole discretion, at any time, for any reason or no reason, with or without notice. The terms also say Otter may modify or discontinue the service, including limiting or discontinuing features, without notice.</p><p>That is the control layer. 
The transcript may be yours in theory, but the workflow that produces it runs through someone else&#8217;s system.</p><h3>What local alternatives can realistically do</h3><p>A local workflow can handle the core job well. It can transcribe recorded meeting audio, generate word-level timestamps, label speakers with diarization, produce summaries, extract decisions and tasks, and store everything in local folders. It can also run without uploading the actual meeting audio to a transcription vendor.</p><p>It will not fully copy Otter&#8217;s polished workflow and ease of use.</p><p>A local workflow will not automatically join every meeting without extra tooling. It will not know each speaker&#8217;s real name by default. It will not sync into every CRM without automation work. It will not be as easy for non-technical coworkers.</p><p>That tradeoff is acceptable for many private workflows. Hosted tools win on convenience. Local tools win on control.</p><h3>Recommended local transcription stack</h3><p>The recommended stack has three layers: WhisperX for transcription, pyannote for optional speaker diarization, and Ollama for local summarization.</p><p>WhisperX is the practical engine for this workflow. Its repository describes fast automatic speech recognition with word-level timestamps and speaker diarization. It uses batched inference, the faster-whisper backend, wav2vec2 alignment, voice activity detection, and optional pyannote-based speaker diarization.</p><p>OpenAI&#8217;s original Whisper is still useful, especially for simpler local transcription. <a href="https://github.com/openai/whisper">Whisper</a> is a general-purpose speech recognition model trained for multilingual speech recognition, translation, and language identification. Its code and model weights are released under the MIT License.</p><p>WhisperX adds the meeting-friendly pieces that vanilla Whisper lacks. 
The WhisperX README says OpenAI Whisper&#8217;s timestamps are utterance-level rather than word-level and can be inaccurate by several seconds, while WhisperX adds word-level timestamp alignment and batching.</p><p>Diarization means &#8220;who spoke when.&#8221; This is useful for meetings, but it is also the most fragile part of the local workflow. The pyannote community diarization model card says the model can run locally on your computer, supports offline use, and ingests mono audio sampled at 16 kHz. Access also requires accepting the model conditions and creating a Hugging Face access token during setup.</p><p>Treat speaker labels as draft metadata, not proof. WhisperX itself warns that overlapping speech is not handled particularly well and that diarization is far from perfect.</p><p>For summaries, Ollama runs open models locally and exposes a REST API at <code>localhost:11434</code>, which makes it easy to send transcripts into a local summarization script.</p><p>For meeting summaries, start with a model that has enough context for long transcripts. <a href="https://ollama.com/library/gemma3">Gemma 3 models in Ollama</a> include 4B, 12B, and 27B variants with 128K context windows, and the 4B model is listed at 3.3GB while the 12B model is listed at 8.1GB. <a href="https://ollama.com/library/qwen3">Qwen3</a> is another strong local option, with an 8B model listed at 5.2GB and a 40K context window. Keep in mind that Ollama may load a model with a smaller default context than its listed maximum, so long transcripts still benefit from chunking or from a larger <code>num_ctx</code> setting.</p><p>Start with <code>gemma3:4b</code> on modest hardware. 
Use <code>gemma3:12b</code> or <code>qwen3:8b</code> when you have more memory and want better summaries.</p><h3>What you need</h3><h3>Minimum practical setup</h3><ul><li><p>A Windows, macOS, or Linux machine.<br></p></li><li><p>Python and a clean virtual environment.<br></p></li><li><p>FFmpeg.<br></p></li><li><p>WhisperX.<br></p></li><li><p>Ollama.<br></p></li><li><p>A local model such as <code>gemma3:4b</code>, <code>gemma3:12b</code>, or <code>qwen3:8b</code>.<br></p></li><li><p>Enough disk space for models, audio files, transcripts, and summaries.<br></p></li><li><p>A way to record meeting audio with consent.<br></p></li></ul><p>Whisper&#8217;s setup docs say it requires FFmpeg and show install commands for Ubuntu, Arch Linux, macOS with Homebrew, Windows with Chocolatey, and Windows with Scoop. Whisper&#8217;s own model table lists approximate VRAM requirements from about 1GB for tiny and base models to about 10GB for the large model, with the turbo model listed at about 6GB.</p><h3>Recommended setup</h3><ul><li><p>NVIDIA GPU with at least 8GB VRAM for a smoother WhisperX experience.<br></p></li><li><p>16GB system RAM minimum, 32GB preferred.<br></p></li><li><p>SSD storage.<br></p></li><li><p>A local folder structure for recordings, transcripts, summaries, and exports.<br></p></li><li><p>A local model with enough context for your transcript length.<br></p></li></ul><p>WhisperX says its faster-whisper backend requires less than 8GB GPU memory for <code>large-v2</code> with <code>beam_size=5</code>, and its setup docs recommend installing CUDA 12.8 for GPU acceleration while allowing CPU-only use.</p><div><hr></div><h3>Step 1: Create a clean project folder</h3><p>Use one folder per transcription setup. This keeps audio, transcripts, summaries, and scripts easy to back up or delete.</p><pre><code><code>mkdir local-meeting-transcription
cd local-meeting-transcription

mkdir audio
mkdir transcripts
mkdir summaries
mkdir scripts
</code></code></pre><p>Suggested structure:</p><pre><code><code>local-meeting-transcription/
  audio/
    2026-05-30-client-discovery-call.wav
  transcripts/
    2026-05-30-client-discovery-call.txt
    2026-05-30-client-discovery-call.srt
  summaries/
    2026-05-30-client-discovery-call-summary.md
  scripts/
    summarize_transcript.py
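</code></code></pre><p>If you would rather create the layout from a script that behaves the same on Windows, macOS, and Linux, the following is a minimal Python sketch. It uses only the standard library, mirrors the folder names above, and is safe to rerun. The function name <code>create_project</code> is illustrative and not part of any tool in this guide.</p><pre><code><code>from pathlib import Path

# Subfolder names copied from the suggested structure in this step.
SUBFOLDERS = ("audio", "transcripts", "summaries", "scripts")


def create_project(root):
    """Create the project root and its subfolders, leaving existing ones alone."""
    root = Path(root)
    for name in SUBFOLDERS:
        # parents=True also creates the root; exist_ok=True makes reruns harmless.
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


if __name__ == "__main__":
    project = create_project("local-meeting-transcription")
    print(f"Project folders ready in {project.resolve()}")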
</code></code></pre><h3>Step 2: Install FFmpeg</h3><p>Whisper and WhisperX need FFmpeg for audio handling.</p><p>On Windows with Chocolatey:</p><pre><code><code>choco install ffmpeg
</code></code></pre><p>On Windows with Scoop:</p><pre><code><code>scoop install ffmpeg
</code></code></pre><p>On macOS with Homebrew:</p><pre><code><code>brew install ffmpeg
</code></code></pre><p>On Ubuntu or Debian:</p><pre><code><code>sudo apt update
sudo apt install ffmpeg
</code></code></pre><p>OpenAI&#8217;s Whisper docs list these FFmpeg install paths directly.</p><p>Test it:</p><pre><code><code>ffmpeg -version
</code></code></pre><p>If the terminal prints version information, FFmpeg is available.</p><h3>Step 3: Install WhisperX</h3><p>Create a Python environment first.</p><p>On Windows:</p><pre><code><code>python -m venv .venv
.\.venv\Scripts\activate
python -m pip install --upgrade pip
pip install whisperx
</code></code></pre><p>On macOS or Linux:</p><pre><code><code>python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install whisperx
</code></code></pre><p>WhisperX lists <code>pip install whisperx</code> as the recommended simple installation path.</p><p>Test it:</p><pre><code><code>whisperx --help
</code></code></pre><p>If you see the help output, the CLI is available.</p><h3>Step 4: Record or export your meeting audio</h3><p>Use the meeting platform&#8217;s built-in recording feature only when you have the right to do so and everyone who needs notice has received it. Consent rules vary by jurisdiction, and Otter&#8217;s own terms put responsibility for notices and consent on the user.</p><p>For a private local workflow, the safest technical pattern is:</p><ol><li><p>Record the meeting locally.</p></li><li><p>Save the file into <code>audio/</code>.</p></li><li><p>Rename it with a date and short description.</p></li><li><p>Keep the raw file until the transcript has been checked.</p></li><li><p>Delete or archive the raw file according to your own retention policy.</p></li></ol><p>Example filename:</p><pre><code><code>audio/2026-05-30-client-discovery-call.wav
</code></code></pre><h3>Step 5: Transcribe with WhisperX</h3><p>For a basic transcript:</p><pre><code><code>whisperx audio/2026-05-30-client-discovery-call.wav --model large-v2 --output_dir transcripts
</code></code></pre><p>For lower-memory machines, use a smaller model:</p><pre><code><code>whisperx audio/2026-05-30-client-discovery-call.wav --model small --compute_type int8 --output_dir transcripts
</code></code></pre><p>For CPU-only use:</p><pre><code><code>whisperx audio/2026-05-30-client-discovery-call.wav --compute_type int8 --device cpu --output_dir transcripts
</code></code></pre><p>WhisperX documents CPU usage with <code>--compute_type int8 --device cpu</code>, and recommends lowering batch size, using a smaller ASR model, or using <code>int8</code> when GPU memory is limited.</p><p>Expected output:</p><pre><code><code>transcripts/2026-05-30-client-discovery-call.txt
transcripts/2026-05-30-client-discovery-call.srt
transcripts/2026-05-30-client-discovery-call.json
</code></code></pre><p>The exact filenames depend on the input file and output settings.</p><h3>Step 6: Add speaker labels when needed</h3><p>For diarization, you need a Hugging Face token and accepted model conditions. WhisperX documents diarization with a Hugging Face access token and pyannote model agreement.</p><p>Run:</p><pre><code><code>whisperx audio/2026-05-30-client-discovery-call.wav --model large-v2 --diarize --hf_token YOUR_HUGGING_FACE_TOKEN --output_dir transcripts
</code></code></pre><p>If you know the number of speakers, help the diarizer:</p><pre><code><code>whisperx audio/2026-05-30-client-discovery-call.wav --model large-v2 --diarize --min_speakers 2 --max_speakers 2 --hf_token YOUR_HUGGING_FACE_TOKEN --output_dir transcripts
</code></code></pre><p>Use this only when speaker labels are worth the extra setup. For many solo interviews, voice notes, and simple calls, plain transcription is enough.</p><h3>Step 7: Install Ollama and pull a local model</h3><p>Install Ollama. The project documents install paths in the <a href="https://github.com/ollama/ollama">Ollama GitHub README</a>, and Ollama also provides a <a href="https://ollama.com/install.ps1">Windows install script</a> and a <a href="https://ollama.com/install.sh">macOS and Linux install script</a>.</p><p>On Windows:</p><pre><code><code>irm https://ollama.com/install.ps1 | iex
</code></code></pre><p>On macOS or Linux:</p><pre><code><code>curl -fsSL https://ollama.com/install.sh | sh
</code></code></pre><p>Ollama&#8217;s GitHub README lists those install commands for Windows, macOS, and Linux.</p><p>Pull a summarization model:</p><pre><code><code>ollama pull gemma3:4b
</code></code></pre><p>For better summaries on stronger machines:</p><pre><code><code>ollama pull gemma3:12b
</code></code></pre><p>Or:</p><pre><code><code>ollama pull qwen3:8b
</code></code></pre><p>Test the model:</p><pre><code><code>ollama run gemma3:4b "Summarize this sentence in five words: The meeting covered pricing, delivery risks, and next steps."
</code></code></pre><h3>Step 8: Summarize the transcript locally</h3><p>Create this file:</p><pre><code><code>scripts/summarize_transcript.py
</code></code></pre><p>Paste:</p><pre><code><code>import json
import sys
from pathlib import Path
from urllib.request import Request, urlopen

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "gemma3:4b"

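def ask_ollama(prompt: str) -&gt; str:
    """Send one prompt to the local Ollama chat endpoint and return the reply text."""
    payload = {
        "model": MODEL,
        "messages": [
            {
                "role": "system",
                "content": "You turn meeting transcripts into concise, accurate notes. Do not invent facts. Flag uncertainty."
            },
            {
                "role": "user",
                "content": prompt
            }
        ],
        # Ask for one complete JSON reply instead of a token stream.
        "stream": False,
        # Ollama may load a model with a smaller default context than its maximum.
        # 8192 is an assumed value that leaves room for a 12,000-character chunk
        # plus the reply; lower it if memory is tight.
        "options": {"num_ctx": 8192}
    }

    # Supplying a request body makes urllib send this as a POST.
    request = Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"}
    )

    with urlopen(request) as response:
        data = json.loads(response.read().decode("utf-8"))
        return data["message"]["content"]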

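def chunk_text(text: str, max_chars: int = 12000) -&gt; list[str]:
    """Group whole lines into chunks of roughly max_chars characters.

    Lines are never split, so a single line longer than max_chars
    becomes its own oversized chunk.
    """
    chunks = []
    current = []

    for paragraph in text.splitlines():
        candidate = "\n".join(current + [paragraph])
        # Close the current chunk when adding this line would exceed the limit.
        if len(candidate) &gt; max_chars and current:
            chunks.append("\n".join(current))
            current = [paragraph]
        else:
            current.append(paragraph)

    # Keep whatever is left over as the final chunk.
    if current:
        chunks.append("\n".join(current))

    return chunks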

def main() -&gt; None:
    if len(sys.argv) != 3:
        print("Usage: python scripts/summarize_transcript.py transcripts/input.txt summaries/output.md")
        sys.exit(1)

    input_path = Path(sys.argv[1])
    output_path = Path(sys.argv[2])

    transcript = input_path.read_text(encoding="utf-8")
    chunks = chunk_text(transcript)

    chunk_summaries = []

    for index, chunk in enumerate(chunks, start=1):
        prompt = f"""
Summarize this meeting transcript chunk.

Return:
- Main points
- Decisions
- Action items with owner if stated
- Open questions
- Risks or blockers
- Notable quotes only if they matter

Do not invent names, owners, dates, or decisions.

Chunk {index} of {len(chunks)}:
{chunk}
"""
        print(f"Summarizing chunk {index} of {len(chunks)}...")
        chunk_summaries.append(ask_ollama(prompt))

    final_prompt = f"""
Create final meeting notes from these chunk summaries.

Return markdown with:
# Meeting summary
## Executive summary
## Decisions
## Action items
## Open questions
## Risks
## Follow-up email draft
## Items that need human verification

Do not invent facts. If the transcript does not name an owner, write "Owner not stated."

Chunk summaries:
{chr(10).join(chunk_summaries)}
"""
    final_summary = ask_ollama(final_prompt)
    output_path.write_text(final_summary, encoding="utf-8")
    print(f"Saved summary to {output_path}")

if __name__ == "__main__":
    main()
</code></code></pre><p>Run it:</p><pre><code><code>python scripts/summarize_transcript.py transcripts/2026-05-30-client-discovery-call.txt summaries/2026-05-30-client-discovery-call-summary.md
</code></code></pre><p>Open the Markdown file and check it against the transcript before sending it to anyone.</p><h3>Step 9: Add a human review pass</h3><p>Local AI reduces cloud exposure. It still needs judgment.</p><p>Review for:</p><ul><li><p>Wrong speaker labels.<br></p></li><li><p>Action items assigned to the wrong person.<br></p></li><li><p>Dates or numbers copied incorrectly.<br></p></li><li><p>Decisions that were only suggestions.<br></p></li><li><p>Missing objections.<br></p></li><li><p>Sensitive comments that should not be forwarded.<br></p></li><li><p>Accidental inclusion of private side comments.<br></p></li><li><p>Hallucinated follow-up wording.<br></p></li></ul><p>The best workflow is boring: transcript first, AI summary second, human review before sharing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZEdQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e3c15f-5355-4da9-b3ae-01cdf5ae799d_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZEdQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e3c15f-5355-4da9-b3ae-01cdf5ae799d_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!ZEdQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e3c15f-5355-4da9-b3ae-01cdf5ae799d_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!ZEdQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e3c15f-5355-4da9-b3ae-01cdf5ae799d_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ZEdQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e3c15f-5355-4da9-b3ae-01cdf5ae799d_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZEdQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e3c15f-5355-4da9-b3ae-01cdf5ae799d_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4e3c15f-5355-4da9-b3ae-01cdf5ae799d_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2087219,&quot;alt&quot;:&quot;Private local meeting transcription: replace Otter.ai with WhisperX&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/199890780?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e3c15f-5355-4da9-b3ae-01cdf5ae799d_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Private local meeting transcription: replace Otter.ai with WhisperX" title="Private local meeting transcription: replace Otter.ai with WhisperX" srcset="https://substackcdn.com/image/fetch/$s_!ZEdQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e3c15f-5355-4da9-b3ae-01cdf5ae799d_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!ZEdQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e3c15f-5355-4da9-b3ae-01cdf5ae799d_1672x941.png 848w, 
https://substackcdn.com/image/fetch/$s_!ZEdQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e3c15f-5355-4da9-b3ae-01cdf5ae799d_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!ZEdQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e3c15f-5355-4da9-b3ae-01cdf5ae799d_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Learn when to use Otter.ai and when to keep meeting audio local with WhisperX transcription, pyannote speaker labels, 
and Ollama summaries. &#169; Popular AI</figcaption></figure></div><h3>Privacy, account risk, and lock-in</h3><p>A local workflow like this changes the control point.</p><p>With Otter.ai, the vendor controls the hosted notetaker, transcription account, storage, sharing features, integrations, plan limits, and terms. Otter&#8217;s terms reserve the right to modify or discontinue the service and to suspend or terminate access.</p><p>With WhisperX and Ollama, the files stay in your local folder after the initial software and model downloads. The remaining risks are different:</p><ul><li><p>You can still leak data by pasting transcripts into hosted tools.<br></p></li><li><p>You can still expose files through bad folder syncing.<br></p></li><li><p>A Hugging Face token is required for pyannote diarization setup.<br></p></li><li><p>Local transcripts can be stolen if your device is compromised.<br></p></li><li><p>Meeting participants may still need notice or consent.<br></p></li><li><p>Local models can summarize badly, especially with messy transcripts.<br></p></li></ul><p>The local path gives you more control, and it also makes you the administrator. That is the trade.</p><h3>Commercial vs local: which should you use?</h3><p>Use Otter.ai if your meetings are low sensitivity, you need automatic meeting joining, you need polished sharing, your team will not tolerate a command-line workflow, you want calendar and collaboration features, or the subscription cost is acceptable.</p><p>Use WhisperX plus Ollama if meetings contain sensitive client or business information, you want transcripts and summaries stored in your own folders, you process enough audio that subscriptions are annoying, you need repeatable local archives, or you do not want every meeting to become vendor-processed data.</p><p>A hybrid workflow can also make sense. 
Use Otter.ai for routine internal calls, use local transcription for sensitive meetings, and keep the local workflow as a fallback when a hosted tool changes price, limits, or policies.</p><h3>Common errors and fixes</h3><h4>Error: <code>ffmpeg not found</code></h4><p>What it means: WhisperX cannot find FFmpeg.</p><p>How to fix it: Install FFmpeg, then restart the terminal. Test with:</p><pre><code><code>ffmpeg -version
</code></code></pre><h4>Error: CUDA out of memory</h4><p>What it means: The model or batch size is too large for your GPU.</p><p>How to fix it: Use a smaller model, lower batch size, or switch to int8.</p><pre><code><code>whisperx audio/meeting.wav --model small --compute_type int8 --batch_size 4 --output_dir transcripts
</code></code></pre><p>WhisperX recommends reducing batch size, using a smaller ASR model, or using <code>int8</code> for lower GPU memory use.</p><h4>Error: diarization fails or asks for authentication</h4><p>What it means: The pyannote model needs accepted conditions and an access token.</p><p>How to fix it: Accept the model conditions, create a Hugging Face token, and pass it with <code>--hf_token</code>. pyannote&#8217;s model card states that users must accept conditions and create an access token.</p><h4>Error: Ollama summary is too vague</h4><p>What it means: The local model is too small, the transcript is too long, or the prompt is too loose.</p><p>How to fix it: Use <code>gemma3:12b</code> or <code>qwen3:8b</code>, chunk the transcript, and force the model to separate decisions, action items, risks, and unknowns.</p><h4>Error: speaker labels are wrong</h4><p>What it means: Diarization is imperfect, especially with overlapping speakers, poor audio, or similar voices.</p><p>How to fix it: Provide <code>--min_speakers</code> and <code>--max_speakers</code> when you know the number of speakers. Then manually correct important sections.</p><div><hr></div><h3>FAQ</h3><h4>Is local meeting transcription really private?</h4><blockquote><p>Local meeting transcription is private in the practical sense when the audio, transcript, and summary stay on your machine and you do not sync them to cloud storage or paste them into hosted AI tools. Your local device, backups, sync settings, and collaborators still matter.</p></blockquote><div><hr></div><h4>Can WhisperX transcribe live meetings?</h4><blockquote><p>This guide focuses on recorded audio. 
Live local transcription is possible with more tooling, but recorded audio is easier to verify, easier to archive, and less likely to fail during an important call.</p></blockquote><div><hr></div><h4>Is WhisperX better than Whisper?</h4><blockquote><p>For long meeting audio, WhisperX is usually the better workflow tool because it adds batched inference, word-level timestamps, voice activity detection, and optional diarization. OpenAI Whisper is simpler and still useful for basic transcription.</p></blockquote><div><hr></div><h4>Do I need a GPU?</h4><blockquote><p>No, but a GPU helps. WhisperX documents CPU mode with <code>--compute_type int8 --device cpu</code>. Expect slower processing on CPU.</p></blockquote><div><hr></div><h4><em><strong>More on this subject:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;66edb83d-18b4-4011-bcba-bf75d694b8ca&quot;,&quot;caption&quot;:&quot;For anyone building a cheap local AI box in 2026, the first rule has not changed. VRAM matters more than gamer marketing. A Llama 3.1 8B Q4 build in Ollama is 4.9GB. 
A Gemma 3 12B Q4 build lands at 8.1GB, while its Q8 &#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The best budget GPUs for local LLMs in 2026: 5 smart buys for Ollama&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-04-21T13:31:02.258Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!vIue!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12ed22ac-c47a-4628-85f2-763942f38049_2303x1478.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/best-budget-gpus-local-llms-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:194906880,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h4>Can I run the whole workflow without any 
account?</h4><blockquote><p>You can transcribe without an account if you skip pyannote diarization. If you want pyannote speaker diarization, you need a Hugging Face account, token, and model access approval during setup.</p></blockquote><div><hr></div><h4>What is the best local model for meeting summaries?</h4><blockquote><p>Start with <code>gemma3:4b</code> if your machine is modest. Use <code>gemma3:12b</code> or <code>qwen3:8b</code> if you have enough memory and want better summaries. Gemma 3 models on Ollama list 128K context windows for the 4B, 12B, and 27B versions.</p></blockquote><div class="callout-block" data-callout="true"><h3>Final recommendation</h3><p>Replace Otter.ai with a local workflow when the meeting is sensitive enough that uploading it feels wrong. WhisperX handles the transcription, pyannote can add speaker labels when needed, and Ollama can summarize the transcript without sending it to a hosted chatbot.</p><p>Do not pretend this is as smooth as Otter.ai. The local path takes setup work, and diarization still needs review.</p><p>For client calls, private strategy sessions, internal investigations, interviews, legal-adjacent work, and any meeting where the transcript should stay under your control, private local meeting transcription is the better default.</p></div><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.popularai.org/p/private-local-meeting-transcription-whisper-ollama/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.popularai.org/p/private-local-meeting-transcription-whisper-ollama/comments"><span>Leave a comment</span></a></p><div><hr></div><p style="text-align: center;"><em><strong>Explore more from Popular AI:</strong></em></p><p style="text-align: center;"><strong><a href="https://popularai.org/t/start-here">Start here</a> | <a 
href="https://popularai.org/t/local-ai">Local AI</a> | <a href="https://popularai.org/t/walkthroughs">Fixes &amp; guides</a> | <a href="https://popularai.org/t/ai-builds-gear">Builds &amp; gear</a> | <a href="https://popularai.org/t/popular-ai-podcast">Popular AI podcast</a></strong></p>]]></content:encoded></item><item><title><![CDATA[Why AI writing gets too compressed, and how to fix it]]></title><description><![CDATA[AI-assisted writing can look polished while leaving readers confused. This guide shows how to restore clarity without dulling the prose.]]></description><link>https://www.popularai.org/p/ai-writing-too-compressed</link><guid isPermaLink="false">https://www.popularai.org/p/ai-writing-too-compressed</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Tue, 26 May 2026 10:43:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!JjTz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25a28b2-605e-4ede-9add-8c76e576a347_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JjTz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25a28b2-605e-4ede-9add-8c76e576a347_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JjTz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25a28b2-605e-4ede-9add-8c76e576a347_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!JjTz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25a28b2-605e-4ede-9add-8c76e576a347_1672x941.png 848w, 
https://substackcdn.com/image/fetch/$s_!JjTz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25a28b2-605e-4ede-9add-8c76e576a347_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!JjTz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25a28b2-605e-4ede-9add-8c76e576a347_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JjTz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25a28b2-605e-4ede-9add-8c76e576a347_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d25a28b2-605e-4ede-9add-8c76e576a347_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1870332,&quot;alt&quot;:&quot;AI writing is too compressed: how to make it clear again&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/199302887?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25a28b2-605e-4ede-9add-8c76e576a347_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="AI writing is too compressed: how to make it clear again" title="AI writing is too compressed: how to make it clear again" srcset="https://substackcdn.com/image/fetch/$s_!JjTz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25a28b2-605e-4ede-9add-8c76e576a347_1672x941.png 424w, 
https://substackcdn.com/image/fetch/$s_!JjTz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25a28b2-605e-4ede-9add-8c76e576a347_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!JjTz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25a28b2-605e-4ede-9add-8c76e576a347_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!JjTz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd25a28b2-605e-4ede-9add-8c76e576a347_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">AI writing often sounds sharp while hiding the logic readers need. Here&#8217;s how to spot over-compressed prose and make it clear. &#169; Popular AI</figcaption></figure></div><p>AI writing often fails in a very specific way: the sentence looks sharp, but the meaning is underdeveloped. The model compresses several possible ideas into one tidy word, then leaves the reader to rebuild the missing logic.</p><p>A sentence like this can sound polished at first glance:</p><blockquote><p>&#8220;People are taught what to repeat, what to pass, and which questions are considered respectable.&#8221;</p></blockquote><p>The weak spot is &#8220;what to pass.&#8221;</p><p>Pass what? A test? A social filter? A bureaucratic checkpoint? An approved answer from one institution to another? 
The word fits the rhythm of the sentence, but the reader does not get the bridge that makes the phrase meaningful.</p><p>That is the failure mode we&#8217;ll discuss here: <strong>the AI has compressed the reasoning, but the reader only sees the residue.</strong></p><div><hr></div><h4><em><strong>More on AI writing:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;11136a9d-1d89-4aa0-9267-7288e3e76833&quot;,&quot;caption&quot;:&quot;If you use AI to draft client work, newsletter posts, product copy, or articles, punctuation can give the game away before the reader has even decided what they think of the piece. One of the clearest tells is punch-&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How to free your AI writing from em dashes and punch-up punctuation&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-04T14:26:00.000Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!-q3f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2356e4-e165-42c4-b2f3-7d8968888193_2560x1544.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/how-to-spot-ai-writing-by-its-em-dashes-and-punch-up-punctuation&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:191311826,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:0,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>Why AI writing gets too compressed</h3><p>AI writing gets too compressed because large language models are good at producing locally plausible language. They are less reliable at checking whether every implied connection is visible to the reader.</p><p>Modern models generate text through token prediction, then post-training pushes them toward more helpful, instruction-following behavior. OpenAI&#8217;s <a href="https://arxiv.org/abs/2303.08774">GPT-4 technical report</a> describes GPT-4 as a Transformer-style model pre-trained to predict the next token, then shaped through post-training for better factuality and adherence to desired behavior. 
The report also warns that GPT-4 is not fully reliable and has a limited context window.</p><p>That matters for writing. A model can produce a phrase that fits the pattern, tone, and local rhythm of a sentence without checking whether the phrase gives a human reader enough context.</p><p>Reasoning models add another layer. OpenAI says in its <a href="https://developers.openai.com/api/docs/guides/reasoning">reasoning models documentation</a> that these models use hidden reasoning tokens to break down prompts and consider approaches, but those reasoning tokens are not visible through the API. Users may get a final answer or a summary, not the full path that produced the wording.</p><p>So the problem is not always that the model had no reason for using a certain word. The problem is that the reason may remain inside the model&#8217;s latent associations, hidden reasoning, style pattern, or context window. The reader gets only the final wording, with none of the reasoning behind it.</p><p>That is why AI prose can sound confident while still feeling strangely thin. The sentence has a polished surface. The reasoning underneath has been squeezed until the reader can barely see or follow it.</p><h3>Why &#8220;what to pass&#8221; feels wrong here</h3><p>The sentence is trying to describe institutional training, social conditioning, or credentialed conformity. In that context, &#8220;pass&#8221; could mean passing a test, moving through a gatekeeping system, passing along an approved answer, passing as respectable, or passing a filter without being rejected.</p><p>The model likely sensed that all of those meanings belonged near the idea. 
Instead of choosing one, it used the short verb &#8220;pass&#8221; and hoped the surrounding rhythm would carry it.</p><p>A clearer version would make the mechanism visible:</p><blockquote><p>&#8220;They are taught which answers to repeat, which tests to pass, and which questions to avoid if they want to remain respectable.&#8221;</p></blockquote><p>Or, if the intended meaning is institutional filtering:</p><blockquote><p>&#8220;They are taught which answers will pass through institutional filters, and which questions respectable people are expected to avoid.&#8221;</p></blockquote><p>Both rewrites cost more words, but they give the reader the omitted object, mechanism, and consequence.</p><p>That is the cost of optimizing for token efficiency. Bad AI writing often sounds concise because it has deleted the connective tissue. Strong editing removes waste. Weak editing removes the parts that tell the reader what the sentence actually means.</p><h3>The model optimizes rhythm before reader clarity</h3><p>LLMs are excellent at completing patterns. In the example sentence, the repeated structure is powerful:</p><blockquote><p>what to repeat, what to pass, which questions</p></blockquote><p>The rhythm makes the sentence sound deliberate. But parallelism can hide weak logic. A vague word feels acceptable because it occupies the right slot.</p><p>This is why AI prose can look finished while still being unclear. The sentence has surface order. It lacks semantic accountability.</p><p>A human reader does not need every thought explained at maximum length. The reader does need enough information to identify the action, the object, and the reason the sentence belongs in the argument. When the prose withholds those pieces, the sentence starts to read like words strung together for no discernible reason.</p><p>The problem becomes more obvious in sentences with repeated verbs or abstract nouns. 
To the model, words like &#8220;repeat,&#8221; &#8220;pass,&#8221; and &#8220;ask&#8221; all sound like they belong in the same category of concepts. The cadence tells the reader the sentence has structure. But cadence is not the same as clarity.</p><p>If one word in a parallel sequence does not name a clear action, the whole sentence can become less trustworthy. The reader may not stop and analyze the phrase, but they will feel the gap.</p><div><hr></div><h3>&#8220;Concise&#8221; prompts can make the model cut the wrong material</h3><p>When users ask for writing to be &#8220;concise,&#8221; &#8220;tight,&#8221; &#8220;punchy,&#8221; or &#8220;more polished,&#8221; the model often removes explanation first. It keeps the cadence and deletes the part that tells readers what the sentence means.</p><p>That is the wrong kind of compression. Good editing removes redundancy. Bad AI editing removes premises.</p><p>A better instruction is:</p><blockquote><p>Make this clearer and stronger. Do not remove context that a reader needs in order to understand the claim.</p></blockquote><p>This instruction works better because it defines the goal as comprehension, not compression. 
&#8220;Tighten this&#8221; can push the model toward shorter sentences that sound sharper but carry less meaning. &#8220;Make this clearer and stronger&#8221; gives the model permission to add words when the added words explain the mechanism.</p><p>The distinction matters for anyone using AI to edit articles, essays, newsletters, scripts, or opinion pieces. A model can easily improve flow while making the argument less legible. It can smooth the prose so well that the missing logic becomes harder to notice.</p><p>That is one of the more dangerous forms of AI-assisted editing. The output feels better to read at first glance, but the idea has become harder to defend.</p><h3>Vague verbs let the model borrow from overlapping meanings</h3><p>Words like &#8220;pass,&#8221; &#8220;signal,&#8221; &#8220;align,&#8221; &#8220;optimize,&#8221; &#8220;moderate,&#8221; &#8220;support,&#8221; &#8220;address,&#8221; &#8220;handle,&#8221; and &#8220;engage&#8221; are attractive to LLMs because they work in many contexts. That flexibility is exactly why they can become weak.</p><p>A human editor should ask which meaning is intended.</p><p>&#8220;Pass&#8221; might mean &#8220;pass a credentialing test.&#8221; &#8220;Signal&#8221; might mean &#8220;show loyalty to the approved view.&#8221; &#8220;Align&#8221; might mean &#8220;change outputs to match a policy.&#8221; &#8220;Moderate&#8221; might mean &#8220;remove, downrank, block, label, or filter.&#8221;</p><p>The fix is not always to use simpler words. The fix is to use words that name the actual action.</p><p>This is why AI writing often loses force. A sentence can use serious, professional language while avoiding the concrete thing that happened. &#8220;The platform addressed the issue&#8221; sounds responsible. 
It does not tell the reader whether the platform answered a question, fixed a bug, denied a claim, changed a policy, delayed a decision, or buried the complaint.</p><p>The same problem appears in policy writing, corporate updates, academic prose, and AI-generated analysis. The vague verb carries the emotional tone of authority without carrying the mechanical explanation.</p><p>Readers notice this, even when they cannot immediately name the problem. They feel that the sentence is asking for trust without giving them the evidence needed to grant it.</p><h3>Post-training rewards polished helpfulness</h3><p>Instruction-tuned models are trained to follow user intent better than raw base models. The <a href="https://arxiv.org/abs/2203.02155">InstructGPT paper</a> describes a process using supervised demonstrations and human-ranked outputs to make models better aligned with user intent.</p><p>That can make the output more useful. It can also make prose sound more agreeable, complete, and confident than it deserves to sound. The model learns the shape of a helpful answer: smooth structure, clean phrasing, few hesitations, orderly points.</p><p>For writing, that can become a trap. The model may produce a sentence that looks editorially &#8220;done&#8221; before the thought is actually explained.</p><p>This is especially common when the user asks for a polished rewrite. The model has learned that a helpful answer should appear organized and confident. It will often deliver that appearance quickly. But a smooth answer is not always a clear answer. A complete-looking paragraph can still leave the key mechanism implied.</p><p>That is why AI-assisted editing needs a second pass aimed at meaning. The first pass can improve rhythm. 
The second pass has to ask whether each sentence still earns its place.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Nga3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9a409-f324-4859-9d56-a4b1c9e9258e_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Nga3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9a409-f324-4859-9d56-a4b1c9e9258e_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!Nga3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9a409-f324-4859-9d56-a4b1c9e9258e_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!Nga3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9a409-f324-4859-9d56-a4b1c9e9258e_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!Nga3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9a409-f324-4859-9d56-a4b1c9e9258e_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Nga3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9a409-f324-4859-9d56-a4b1c9e9258e_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fea9a409-f324-4859-9d56-a4b1c9e9258e_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2079524,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/199302887?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9a409-f324-4859-9d56-a4b1c9e9258e_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Nga3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9a409-f324-4859-9d56-a4b1c9e9258e_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!Nga3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9a409-f324-4859-9d56-a4b1c9e9258e_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!Nga3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9a409-f324-4859-9d56-a4b1c9e9258e_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!Nga3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea9a409-f324-4859-9d56-a4b1c9e9258e_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Learn why AI writing gets too compressed, how vague verbs weaken meaning, and how to use semantic decompression to fix it. &#169; Popular AI</figcaption></figure></div><h3>The model assumes shared context the reader does not have</h3><p><a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Anthropic&#8217;s context engineering guidance</a> describes a common failure mode where prompts give vague high-level guidance and falsely assume shared context. It also notes that minimal context does not necessarily mean short context. The model still needs enough information to follow the desired behavior.</p><p>This applies to public writing too. A sentence can be short and still require too much private context. 
The reader should not need to guess which version of the idea the model had in mind.</p><p>Writers often know more than they put on the page. AI models can exaggerate that problem because they generate language from patterns and associations that are not visible to the reader. The model may have a cluster of related meanings around a word, but the final sentence exposes only one compressed token.</p><p>The reader is left to infer the rest.</p><p>That may be fine in a private note. It is not fine in public writing where the goal is persuasion, explanation, or analysis. Readers are more patient when they feel guided. They are less patient when the prose sounds polished but makes them do the author&#8217;s work.</p><h3>The fix: use a semantic decompression pass</h3><p>Do not ask the model only to &#8220;improve&#8221; the writing. That invites polish. Ask it to decompress the meaning.</p><p>Use this prompt after a draft is written:</p><pre><code><code>Review this draft for over-compressed language.

Find phrases where the wording is concise but the reader may not know exactly what is meant.

For each issue:
1. Quote the phrase.
2. Explain what is ambiguous or under-explained.
3. Rewrite it so the actor, action, object, mechanism, and consequence are visible.
4. Preserve the author&#8217;s argument and force. Do not soften the point.

Pay special attention to vague verbs such as pass, signal, align, support, address, handle, engage, optimize, moderate, process, and navigate.
</code></code></pre><p>This works because it gives the model a specific editing target. OpenAI&#8217;s <a href="https://developers.openai.com/api/docs/guides/prompt-engineering">prompt engineering guidance</a> recommends using instructions, examples, context, and clear message structure to guide model behavior. It also describes few-shot examples as a way to steer a model toward the desired pattern.</p><p>The model needs to know that the job is not &#8220;make it sound better.&#8221; The job is &#8220;make the implied logic visible.&#8221;</p><p>That one change can dramatically improve AI-assisted writing. Instead of rewarding the model for polish alone, it rewards the model for exposing the actor, action, object, mechanism, and consequence. Those five pieces are the difference between a sentence that merely sounds persuasive and a sentence that actually carries the reader through the idea.</p><h3>A better prompt for rewriting AI-assisted prose</h3><p>Use this when asking an LLM to edit your article, essay, newsletter, or polemic:</p><pre><code><code>Edit this draft for clarity, force, and reader comprehension.

Do not make the writing bland.
Do not soften strong claims unless they are unsupported.
Do not compress meaning into vague verbs or abstract nouns.

For every sentence, check whether a smart reader can identify:
- who is acting
- what they are doing
- what object or idea the action applies to
- what mechanism makes the claim true
- why the sentence matters

If a phrase depends on context that is not visible to the reader, expand it.

After the rewrite, add a short ambiguity audit listing the phrases you changed because they were too compressed.
</code></code></pre><p>That final &#8220;ambiguity audit&#8221; is important. It forces the model to expose the edits instead of silently smoothing over them.</p><p>The audit also gives the writer a way to regain control. AI rewriting can feel authoritative because it returns a clean version of the prose. But clean prose can hide editorial choices. When the model lists the phrases it changed and explains why, the author can decide whether the edit preserved the intended meaning.</p><p>That is the right relationship between writer and model. The model can flag ambiguity. The writer still decides the argument.</p><h3>Use examples to train the style you want</h3><p>A model will often follow examples better than abstract style commands. OpenAI&#8217;s <a href="https://developers.openai.com/api/docs/guides/prompt-engineering">guide to prompt engineering</a> describes few-shot prompting as including input and output examples so the model can pick up the pattern.</p><p>Give it examples like this:</p><pre><code><code>Bad:
The platform aligned the model.

Better:
The platform changed the model&#8217;s outputs to match its policy rules, so some requests that previously worked now get filtered, refused, or redirected.

Bad:
The system handles controversial content.

Better:
The system classifies controversial content, applies a policy label, and then decides whether to answer, refuse, downrank, or redirect the user.
</code></code></pre><p>This gives the model a concrete pattern: keep the force, add the missing mechanism.</p><p>Examples matter because the model can imitate structure more reliably when the desired pattern is visible. &#8220;Make it clearer&#8221; is useful, but it can still produce generic polish. Showing a bad sentence and a better sentence tells the model what kind of clarity you want.</p><p>The example also protects the tone. Many AI rewrites become bland when they clarify. The goal is to keep the argument sharp while making the hidden logic visible.</p><h3>Watch for danger words that sound clearer than they are</h3><p>Some words are not bad by themselves. They become bad when they replace the actual idea.</p><p>&#8220;Pass&#8221; should trigger a simple question: pass what, through what, by whose standard? Better options include &#8220;pass a test,&#8221; &#8220;pass a filter,&#8221; &#8220;pass inspection,&#8221; &#8220;pass as acceptable,&#8221; or &#8220;pass through a gatekeeping process.&#8221;</p><p>&#8220;Signal&#8221; needs the same pressure. Signal what, to whom, and why? Stronger versions might name loyalty, compliance, status, or ideological belonging.</p><p>&#8220;Align&#8221; should make the editor ask what rule, incentive, metric, policy, or authority controls the change. More precise language might say &#8220;match the policy,&#8221; &#8220;satisfy the benchmark,&#8221; &#8220;obey the moderation rule,&#8221; or &#8220;fit the company&#8217;s risk tolerance.&#8221;</p><p>&#8220;Moderate&#8221; should name the action being taken. The system might remove, filter, downrank, block, label, demonetize, restrict, delay, or refuse.</p><p>&#8220;Support&#8221; needs a concrete mechanism. Does the actor fund, host, promote, enable, document, maintain, integrate, defend, or recommend something?</p><p>&#8220;Address&#8221; is one of the easiest words to abuse. Ask what the actor actually did. 
Did they answer, deny, fix, explain, postpone, evade, patch, rewrite, or enforce?</p><p>These verbs are common in institutional and AI-generated prose because they sound responsible without committing to a concrete mechanism. Replace them with actions readers can picture.</p><p>The test is simple. If the reader cannot picture the action, the word is probably doing too much.</p><div><hr></div><h3>Clarity does not mean over-explaining</h3><p>The fix is not to make every sentence long. The fix is to include the missing piece.</p><p>A sentence needs enough information for the reader to understand the claim without guessing. 
<a href="https://www.stylemanual.gov.au/writing-and-designing-content/clear-language-and-writing-style/plain-language-and-word-choice">Plain-language guidance</a> makes the same point from the reader&#8217;s side: use familiar words, clarify expressions readers may not know, and write so people can understand the message quickly.</p><p>Compare these:</p><p>Too compressed:</p><blockquote><p>The model was aligned.</p></blockquote><p>Over-explained:</p><blockquote><p>The model underwent a complex post-training procedure involving various forms of evaluation and behavioral adjustment in order to better conform with desired rules and expected outputs in a range of situations.</p></blockquote><p>Clear:</p><blockquote><p>The model was changed after training so its answers would follow the platform&#8217;s policy rules more consistently.</p></blockquote><p>The clear version is not longer than it needs to be. It names the action, the purpose, and the control point.</p><p>That is the standard. Clarity does not require stuffing every sentence with background. It requires giving the reader the one missing piece that makes the claim intelligible.</p><p>A strong sentence can still be short. &#8220;The platform filtered the request because it matched a policy rule&#8221; is clear because it names the actor, action, mechanism, and reason. &#8220;The platform handled the request&#8221; is shorter, but it hides the actual event.</p><p>Concise writing is valuable when it removes clutter. It becomes a problem when it removes the bridge between claim and meaning.</p><h3>A workflow for AI power users</h3><p>Use this five-pass workflow when you care about strong writing.</p><p>Start by drafting for argument first. Do not let the model polish too early. Start with the point you actually want to make.</p><p>Prompt:</p><pre><code><code>Help me develop the argument before editing the prose. Identify the thesis, the strongest supporting claims, the weakest claims, and the missing mechanisms.
</code></code></pre><p>Next, rewrite for clarity, not politeness.</p><p>Prompt:</p><pre><code><code>Rewrite this for clarity and force. Keep the argument sharp. Do not make it more neutral unless the claim is unsupported.
</code></code></pre><p>Then run the semantic decompression pass. Use the earlier prompt to find compressed phrases.</p><p>This pass should catch sentences that sound good but leave the reader asking, &#8220;What does that mean?&#8221;</p><p>After that, run the reader objection pass.</p><p>Prompt:</p><pre><code><code>Read this as a skeptical but fair reader. List every sentence where you would ask: what exactly does this mean, who is doing it, how does it work, or why should I believe it?
</code></code></pre><p>Finally, do the human edit.</p><p>This is where the writer earns the piece. AI can find ambiguity, but the author has to choose the intended meaning.</p><p>Do not accept every AI rewrite. Use the model as an editor that flags weak spots, not as an authority that decides the final argument.</p><p>This workflow slows the editing process in a useful way. It separates idea development from surface polish, then separates polish from meaning. That prevents the model from giving you a clean version of an underdeveloped thought.</p><h3>Keep context clean</h3><p>Compressed writing gets worse when the model is carrying too much unrelated context. A long-running chat full of old instructions, old drafts, audience notes, and unrelated strategy can push the model toward invisible assumptions.</p><p>For writing workflows, keep separate chats or projects for research, drafting, editing, and private strategy. Popular AI&#8217;s guide to <a href="https://www.popularai.org/p/context-contamination-why-ai-feels-off-topic">context contamination</a> explains why dumping everything into one AI workspace often makes the output feel subtly off-topic or over-assumptive.</p><p>For sensitive drafts, local models can also help because they give you more control over context boundaries, storage, and workflow separation. They will not magically fix vague prompting, but they make it easier to keep private notes and public drafts apart.</p><p>Clean context helps because writing is highly sensitive to assumptions. If the model is carrying background material that the reader will never see, the prose may start leaning on that hidden material. The model may refer to ideas too briefly, skip needed transitions, or choose vague words because the surrounding chat made the meaning feel obvious.</p><p>The reader does not have that chat. 
The article has to stand on its own.</p><h3>The simple rule for clearer AI writing</h3><p>Every important sentence should answer at least three questions. What exactly is happening? Who or what is doing it? What mechanism makes it true?</p><p>For sharper analysis, add two more. Who benefits? What changes for the reader?</p><p>When a sentence fails those questions, the model may have produced compressed language. Expand it before publishing.</p><p>This does not mean every sentence needs to become a miniature essay. It means every important claim needs enough visible logic to carry the reader forward. When the writer knows the mechanism but the sentence hides it, the reader has to guess. When the model knows only a vague association, the sentence may sound good while carrying very little meaning.</p><p>Either way, the solution is the same. Name the action. Name the object. Name the mechanism. Give the reader the bridge.</p><h3>Bottom line</h3><p>The awkwardness in AI writing often comes from missing reader-visible reasoning. The model has produced a sentence that fits the style pattern, but it has not carried the reader across the gap.</p><p>Do not ask AI merely to be concise. Ask it to be explicit where the meaning depends on hidden context.</p><p>Concise writing is good when it removes waste. 
It is bad when it removes the bridge.</p><div><hr></div><p style="text-align: center;"><em><strong>Explore more from Popular AI:</strong></em></p><p style="text-align: center;"><strong><a href="https://popularai.org/t/start-here">Start here</a> | <a href="https://popularai.org/t/local-ai">Local AI</a> | <a href="https://popularai.org/t/walkthroughs">Fixes &amp; guides</a> | <a href="https://popularai.org/t/ai-builds-gear">Builds &amp; gear</a> | <a href="https://popularai.org/t/popular-ai-podcast">Popular AI podcast</a></strong></p>]]></content:encoded></item><item><title><![CDATA[GGUF Loader Agentic Mode: local coding agents without cloud accounts]]></title><description><![CDATA[A practical guide to GGUF Loader Agentic Mode, including local setup, safe workspace rules, Claude Code comparisons, and privacy tradeoffs.]]></description><link>https://www.popularai.org/p/gguf-loader-agentic-mode-local-coding-agent</link><guid isPermaLink="false">https://www.popularai.org/p/gguf-loader-agentic-mode-local-coding-agent</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Wed, 20 May 2026 13:31:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!6Ic0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca4adae-46db-4d86-958e-89993baebd13_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!6Ic0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca4adae-46db-4d86-958e-89993baebd13_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Ic0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca4adae-46db-4d86-958e-89993baebd13_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!6Ic0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca4adae-46db-4d86-958e-89993baebd13_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!6Ic0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca4adae-46db-4d86-958e-89993baebd13_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!6Ic0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca4adae-46db-4d86-958e-89993baebd13_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6Ic0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca4adae-46db-4d86-958e-89993baebd13_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eca4adae-46db-4d86-958e-89993baebd13_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1888256,&quot;alt&quot;:&quot;How to use GGUF Loader Agentic Mode safely for local 
coding&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/198398535?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca4adae-46db-4d86-958e-89993baebd13_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="How to use GGUF Loader Agentic Mode safely for local coding" title="How to use GGUF Loader Agentic Mode safely for local coding" srcset="https://substackcdn.com/image/fetch/$s_!6Ic0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca4adae-46db-4d86-958e-89993baebd13_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!6Ic0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca4adae-46db-4d86-958e-89993baebd13_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!6Ic0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca4adae-46db-4d86-958e-89993baebd13_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!6Ic0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feca4adae-46db-4d86-958e-89993baebd13_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a>
<figcaption class="image-caption">GGUF Loader Agentic Mode gives local GGUF models file access for coding tasks. Here is what it does well, where it falls short, and how to use it safely. &#169; Popular AI</figcaption></figure></div><p>GGUF Loader Agentic Mode is for developers who want a coding agent that can work on local files without sending a repository through a hosted AI account.</p><p>The feature lets a local model read, create, edit, and organize files inside a selected workspace folder. That makes GGUF Loader more than a chat window. It becomes a small local coding assistant that can touch files directly.</p><p>That is also where caution matters.</p><p>A local file-writing agent can save time on boilerplate, docs, cleanup, and small code edits. 
It can also make bad changes quickly if you point it at a messy repo with no rollback plan.</p><p>The short version: GGUF Loader Agentic Mode is useful, private, and interesting, but it should be treated as an early local agent workflow rather than a mature replacement for Claude Code.</p><div><hr></div><h4><em><strong>More on GGUF Loader:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;e875a3ae-9c4f-49c5-aacc-cd08d8fd2254&quot;,&quot;caption&quot;:&quot;If you want the usefulness of a capable assistant while keeping your workflow local, GGUF Loader is built for that job. It is a cross platform desktop app for Windows, Linux, and macOS that loads GGUF format language models and lets you run them on your own machine, with a simple download, load, chat loop. The project positions itself as pri&#8230;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How to run GGUF models locally with GGUF Loader&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-21T00:20:20.375Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!eM7y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60d5adca-b2d1-4554-ae47-525e803310e8_1312x736.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/run-gguf-models-locally-gguf-loader&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:188431547,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:0,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h3>Quick verdict</h3><blockquote><p><strong>Best for:</strong> small local coding tasks, boilerplate generation, README drafts, project cleanup, config edits, and private repo experiments.</p></blockquote><blockquote><p><strong>Skip it if:</strong> you need Claude Code-level reasoning, reliable test execution, deep multi-file refactors, or a mature permission system.</p></blockquote><blockquote><p><strong>Main strength:</strong> GGUF Loader gives local GGUF models file operations through a simple desktop app.</p></blockquote><blockquote><p><strong>Main weakness:</strong> the quality of the agent depends heavily on the model, hardware, prompt discipline, and how safely you 
define the workspace.</p></blockquote><h3>What is GGUF Loader?</h3><p><a href="https://github.com/GGUFloader/gguf-loader">GGUF Loader</a> is a free, open source desktop app for running GGUF-format large language models locally. The <a href="https://github.com/GGUFloader/gguf-loader/blob/main/docs/faq.md">GGUF Loader FAQ</a> describes it as a desktop application for local LLM chat, with support for Windows, Linux, and macOS, and says it is released under the MIT License.</p><p>GGUF itself is a binary file format for model weights and metadata, used by llama.cpp and the tools built around it. Hugging Face has <a href="https://huggingface.co/docs/hub/gguf">GGUF documentation</a> for browsing GGUF files, viewing metadata, and working with quantized local models.</p><p>The important part for users is simple: you download a model file, load it locally, and run inference on your own machine. The GGUF Loader FAQ says that after a model is downloaded, everything runs locally with no internet connection required.</p><p>GGUF Loader also has a <a href="https://huggingface.co/Hussain2050/GGUF-Loader">Hugging Face project listing</a>, which describes it as a local, open source app without cloud integration.</p><h3>What is GGUF Loader Agentic Mode?</h3><p>Agentic Mode was introduced in v2.1.1, as announced in the <a href="https://github.com/GGUFloader/gguf-loader/discussions/8">GGUFLoader v2.1.1 Agentic Mode discussion</a>. The release notes describe it as a mode that gives the assistant file system access, tool execution, workspace awareness, and the ability to perform coding tasks.</p><p>That means the model can do more than answer questions. 
It can work inside a selected workspace folder and help with tasks like creating files, updating code, organizing folders, writing documentation, and generating simple project pieces.</p><p>The release discussion gives examples such as building APIs, creating unit tests, refactoring folders, updating configuration files, and generating project files.</p><p>That makes GGUF Loader Agentic Mode a local coding agent. It is not only a local chatbot that talks about code. It can act on files.</p><h3>What GGUF Loader Agentic Mode does well</h3><p>GGUF Loader Agentic Mode has a clear appeal: it brings coding-agent behavior to a local model workflow.</p><p>For small jobs, that is valuable. It can help create boilerplate files, summarize a tiny repo, add a function, update a README, draft config files, or organize a project folder. Those are the kinds of tasks where local models can be useful if the scope is tight.</p><p>It also fits privacy-sensitive workflows. The GGUF Loader FAQ says inference can run locally after the model is downloaded, while the Hugging Face listing says the app has no cloud integration. That is the core reason to care about it.</p><p>The workflow is also approachable. You do not need to build your own llama.cpp stack, wire an editor extension, or expose a local OpenAI-compatible endpoint. 
You can load a GGUF model in a desktop interface, select a workspace, and begin testing file-aware prompts.</p><p>That simplicity is the product&#8217;s best feature.</p><div><hr></div><h3>Where GGUF Loader falls short</h3><p>The main limitation is model quality.</p><p>A 7B-class local model can be helpful, but it will not behave like a frontier hosted coding model. It may miss instructions, over-edit files, fail to reason across multiple files, or produce plausible code that needs careful review.</p><p>Agentic Mode also appears less mature than established coding-agent tools. Claude Code, Aider, Continue, and other tools have more developed workflows around diffs, permissions, terminals, IDEs, and test loops.</p><p>The v2.1.1 release is exciting, but it should be treated as an early file-agent feature. Use it with tight boundaries, small tasks, and version control.</p><h3>Pricing and plans</h3><p>GGUF Loader itself is free and open source. The FAQ says it is released under the MIT License.</p><p>That does not mean the whole workflow has zero cost. Local inference costs show up in hardware, storage, setup time, and model selection.</p><p>A quantized 7B model can be practical on ordinary hardware, but larger models need more RAM, more disk space, and sometimes GPU acceleration. The FAQ says GPU acceleration is optional and can improve performance, while CPU use is supported.</p><p>Compared with hosted tools, the tradeoff is clear. 
GGUF Loader can reduce subscription and account dependency, but hosted agents often provide stronger models, better tooling, and more reliable coding behavior.</p><h3>Privacy, data use, and account risk</h3><p>Privacy is the strongest argument for GGUF Loader Agentic Mode.</p><p>The GGUF Loader FAQ says that once a model is downloaded, the app can work locally with no internet connection required. The Hugging Face listing also describes the app as having no cloud integration.</p><p>That matters for local code, prototypes, private scripts, and sensitive notes. A local model workflow means prompts and repository content do not have to be sent to a hosted AI provider for inference.</p><p>That is a major difference from Claude Code-style tools. Anthropic&#8217;s <a href="https://code.claude.com/docs/en/overview">Claude Code overview</a> describes Claude Code as an agentic coding tool that can read a codebase, edit files, run commands, and integrate with developer tools. Anthropic&#8217;s <a href="https://code.claude.com/docs/en/data-usage">Claude Code data usage documentation</a> says Claude Code runs locally on the user&#8217;s machine, but sends user prompts and model outputs over the network to interact with the LLM.</p><p>The data-policy difference is real. Anthropic says consumer Claude users can choose whether data is used to improve future Claude models. For commercial users, Anthropic says it does not train generative models using code or prompts sent to Claude Code under commercial terms unless the customer opts in.</p><p>Anthropic also lists retention periods for Claude Code data based on account type and settings, including 30-day retention for consumer users who do not allow model-improvement use and 5-year retention for users who do.</p><p>GGUF Loader avoids that hosted-model data path when it is run offline. The tradeoff is capability. 
You keep code local, but you accept the limits of your local model and hardware.</p><h3>Control and lock-in</h3><p>GGUF Loader gives users more control in four ways.</p><p>First, there is no required cloud model account for inference once the model is downloaded.</p><p>Second, model files are portable. GGUF is a widely used local inference format, and Hugging Face supports GGUF browsing and model-file inspection.</p><p>Third, the code is open source under MIT.</p><p>Fourth, Agentic Mode works around a selected workspace folder rather than a vendor-hosted repo environment.</p><p>The remaining lock-in is softer. You still depend on GGUF Loader&#8217;s implementation, release quality, chosen model, and local runner stack. If the app breaks, you can move the model elsewhere, but the exact agent workflow may not carry over cleanly.</p><p>Developers who want to inspect or install from source can use the <a href="https://github.com/GGUFloader/gguf-loader.git">GGUF Loader Git repository</a>, while the main project page remains available through the <a href="https://github.com/GGUFloader/gguf-loader">GGUF Loader GitHub repo</a>.</p><div class="callout-block" data-callout="true"><h3>Scorecard</h3><p><strong>Capability: 3/5.</strong> Strong for local file-aware assistance, small project scaffolds, and controlled edits. Weak for difficult reasoning compared with hosted frontier coding agents.</p><p><strong>Cost-to-capability: 4/5.</strong> The app is free, and 7B-class GGUF models are practical on ordinary machines. The real cost is hardware headroom and setup time.</p><p><strong>Privacy and control: 4/5.</strong> Local inference and workspace-based file access are the main advantages. The score is not higher because users still need to verify the app, model source, and file-write behavior before trusting it with important code.</p><p><strong>Reliability and transparency: 3/5.</strong> The project is open source and documented, but Agentic Mode is young. 
Treat it as useful local tooling rather than a mature enterprise agent framework.</p><p><strong>Vendor leverage and account risk: 5/5.</strong> There is no hosted model account required for offline local use after model download.</p><p><strong>Ease of use: 4/5.</strong> The Windows executable path is beginner-friendly, and pip and source options exist. The learning curve returns when you tune models, GPU acceleration, or agent workflows.</p></div><h3>How to use GGUF Loader Agentic Mode safely</h3><p>The mistake is pointing an agent at a real repo and asking it to &#8220;fix everything.&#8221;</p><p>Use a staged workflow instead.</p><h3>What you need before starting</h3><p>You need:</p><ol><li><p><strong>GGUF Loader v2.1.1 or newer</strong><br>The v2.1.1 release is the Agentic Mode release, and the discussion lists Windows executable, pip, and source install paths.</p></li><li><p><strong>A GGUF model</strong><br>GGUF Loader recommends Mistral-7B Instruct for Agentic Mode in its release discussion.</p></li><li><p><strong>A test workspace</strong><br>Create a new folder that contains only files you are willing to let an agent read and edit.</p></li><li><p><strong>Git or another rollback method</strong><br>Use version control even for throwaway tests. Local file agents need a clean undo path.</p></li><li><p><strong>No secrets in the workspace</strong><br>Do not include <code>.env</code>, API keys, private credentials, customer data, SSH keys, production configs, or tokens.</p></li></ol><h3>What you will have when finished</h3><p>You will have a local GGUF model running inside GGUF Loader with Agentic Mode enabled for a chosen workspace folder. 
The agent should be able to inspect local files and perform a small, reversible coding task.</p><div><hr></div><h3>Step 1: Install or launch GGUF Loader</h3><p>On Windows, use the v2.1.1 executable from the release page. The release discussion says to run <code>GGUFLoader_v2.1.1.exe</code>, then use &#8220;More info&#8221; and &#8220;Run anyway&#8221; if Windows shows a security warning for the new app.</p><p>For pip installation, the release discussion gives:</p><pre><code><code>pip install ggufloader==2.1.1
ggufloader
</code></code></pre><p>For source installation, it gives:</p><pre><code><code>git clone https://github.com/GGUFloader/gguf-loader.git
cd gguf-loader
git checkout v2.1.1
.\launch.bat
</code></code></pre><p>On Linux or macOS, the release discussion lists:</p><pre><code><code>./launch.sh
</code></code></pre><h3>Step 2: Download a model</h3><p>Start with the recommended model path before experimenting. The v2.1.1 release discussion recommends Mistral-7B Instruct for Agentic Mode and lists it as a 4.23GB download.</p><p>Save the model somewhere easy to find, such as:</p><pre><code><code>C:\AI\models\
</code></code></pre><p>or:</p><pre><code><code>~/AI/models/
</code></code></pre><p>Avoid downloading random model files with no license, provenance, or model card. Model weights carry their own terms, separate from GGUF Loader&#8217;s MIT software license.</p><h3>Step 3: Load the model</h3><p>Open GGUF Loader, choose the model file, and wait for it to load. The project&#8217;s main docs describe the basic flow as clicking &#8220;Load Model,&#8221; choosing a GGUF file, and opening it.</p><p>Test normal chat first:</p><pre><code><code>Reply with one sentence: you are running locally.
</code></code></pre><p>If the model cannot respond reliably in normal chat, do not move to Agentic Mode yet.</p><h3>Step 4: Create a safe workspace</h3><p>Create a new folder:</p><pre><code><code>agent-test-workspace/
</code></code></pre><p>Add a tiny project:</p><pre><code><code>agent-test-workspace/
  README.md
  src/
    calculator.py
</code></code></pre><p>Put this in <code>calculator.py</code>:</p><pre><code><code>def add(a, b):
    return a + b
</code></code></pre><p>Initialize git:</p><pre><code><code>cd agent-test-workspace
git init
git add .
git commit -m "Initial safe workspace"
</code></code></pre><p>This gives you a clean rollback point.</p><h3>Step 5: Enable Agentic Mode</h3><p>The v2.1.1 release discussion says to find &#8220;Agent Mode&#8221; in the left sidebar, check &#8220;Enable Agent Mode,&#8221; select or browse to your workspace folder, and wait for the &#8220;Agent: Ready&#8221; status.</p><p>Select only the test workspace. Do not select your home directory, downloads folder, full company repo folder, or desktop.</p><h3>Step 6: Start with read-only behavior</h3><p>Use this first prompt:</p><pre><code><code>Read the workspace and summarize the files you see. Do not create, edit, move, rename, or delete any files.
</code></code></pre><p>Expected result: the agent describes <code>README.md</code> and <code>src/calculator.py</code>.</p><p>If it tries to write files anyway, stop. That model or setup is not following instructions well enough for file operations.</p><h3>Step 7: Ask for a plan before edits</h3><p>Use this prompt:</p><pre><code><code>Propose a small change that adds a subtract function to src/calculator.py and updates README.md. Do not write files yet. Return a short plan first.
</code></code></pre><p>You want the model to show intent before action. This is the habit that keeps local agents useful instead of chaotic.</p><h3>Step 8: Approve one small edit</h3><p>Use this prompt:</p><pre><code><code>Apply only this change: add a subtract(a, b) function to src/calculator.py. Do not modify any other file.
</code></code></pre><p>Then inspect the diff:</p><pre><code><code>git diff
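# Added check (not from the release notes): files the agent newly created are untracked,
# so they will not show up in `git diff`. List them as well:
git status --short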
</code></code></pre><p>If the diff is clean, commit it:</p><pre><code><code>git add src/calculator.py
git commit -m "Add subtract function"
</code></code></pre><p>If the diff is bad, revert it:</p><pre><code><code>git restore src/calculator.py
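# Added note (not from the release notes): `git restore` only resets tracked files.
# If the agent also created stray files, preview what an untracked-file cleanup
# would delete with a dry run before removing anything:
git clean -n -d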
</code></code></pre><h3>Step 9: Add documentation as a separate edit</h3><p>Use this prompt:</p><pre><code><code>Update README.md with a short usage example for add and subtract. Do not edit source files.
</code></code></pre><p>Inspect again:</p><pre><code><code>git diff
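</code></code></pre><p>This is the right rhythm for local agents: small task, inspect, test, commit.</p><h3>Step 10: Test the result</h3><p>Run a simple Python check from the workspace root. The heredoc syntax below works in bash or zsh (including Git Bash on Windows), not in Command Prompt or PowerShell:</p><pre><code><code>python - &lt;&lt;'PY'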
from src.calculator import add, subtract
assert add(2, 3) == 5
assert subtract(5, 3) == 2
print("OK")
PY
</code></code></pre><p>Expected output:</p><pre><code><code>OK
</code></code></pre><p>If the model broke imports or file structure, revert the change and try a smaller prompt.</p><h3>Safe prompts for GGUF Loader Agentic Mode</h3><p>Use these as defaults.</p><h3>First scan</h3><pre><code><code>Read this workspace and summarize its structure. Do not write, rename, move, or delete files.
</code></code></pre><h3>Plan before edits</h3><pre><code><code>Create a plan for the requested change. Do not modify files yet. Include the exact files you would edit.
</code></code></pre><h3>One-file edit</h3><pre><code><code>Edit only [FILE PATH]. Make the smallest change that satisfies this request: [REQUEST]. Do not modify any other file.
</code></code></pre><h3>Diff-aware review</h3><pre><code><code>Review the current changes and explain what changed. Do not make further edits.
</code></code></pre><h3>Documentation draft</h3><pre><code><code>Create or update README.md with setup and usage notes. Do not edit source code.
</code></code></pre><h3>Hard boundary</h3><pre><code><code>Never access files outside the selected workspace. Never read or write secrets, tokens, .env files, SSH keys, production credentials, customer data, or private documents. Ask before writing any file.
</code></code></pre><h3>Common problems and fixes</h3><h3>Problem: The model writes too much</h3><blockquote><p><strong>What it means:</strong> The task is too broad, or the model is weak at instruction following.</p><p><strong>Fix:</strong> Ask for a plan first, then approve one file at a time.</p><div><hr></div></blockquote><h3>Problem: The agent edits the wrong file</h3><blockquote><p><strong>What it means:</strong> The model misunderstood the project structure.</p><p><strong>Fix:</strong> Give the exact file path in the prompt and ask it to restate the target file before editing.</p><div><hr></div></blockquote><h3>Problem: The output is slow</h3><blockquote><p><strong>What it means:</strong> Your model may be too large for your hardware, or you may be running mostly on CPU.</p><p><strong>Fix:</strong> Start with a smaller Q4 model. If using NVIDIA GPU acceleration, verify your CUDA setup before assuming the app is broken.</p><div><hr></div></blockquote><h3>Problem: The model gives code in chat but does not edit files well</h3><blockquote><p><strong>What it means:</strong> Chat ability and file-editing ability are different. <a href="https://aider.chat/docs/llms.html">Aider&#8217;s LLM documentation</a> makes a similar point: weaker models may return code but fail to produce usable code edits.</p><p><strong>Fix:</strong> Use a stronger instruction-tuned coding model, or limit GGUF Loader to documentation and boilerplate tasks.</p><div><hr></div></blockquote><h3>Problem: You do not trust the agent with the repo</h3><blockquote><p><strong>What it means:</strong> Your instinct is healthy.</p><p><strong>Fix:</strong> Copy only the target files into a separate workspace. 
Use the existing Popular AI guide on <a href="https://www.popularai.org/p/how-to-run-gguf-models-locally-with">running GGUF models locally with GGUF Loader</a> as the broader setup reference.</p><div><hr></div></blockquote><h3>GGUF Loader Agentic Mode vs Claude Code</h3><p>Claude Code is much more mature as a coding agent. Anthropic describes it as an agentic coding tool that can read a codebase, edit files, run commands, and integrate with developer tools.</p><p>It also has a more developed permission model. The <a href="https://code.claude.com/docs/en/permission-modes">Claude Code permission mode documentation</a> says default mode reviews actions as they come, while looser modes allow more uninterrupted work. The <a href="https://code.claude.com/docs/en/how-claude-code-works">Claude Code workflow documentation</a> also describes checkpoints that snapshot file contents before edits, with limits for external side effects.</p><p>GGUF Loader&#8217;s advantage is different. It is local, simpler, and account-free after model download. The FAQ says offline use works once the model is downloaded.</p><p>Use Claude Code when you need stronger reasoning, command execution, test loops, and a polished agent workflow.</p><p>Use GGUF Loader Agentic Mode when the code is private, the task is small, and keeping files off hosted model infrastructure matters more than maximum reasoning quality.</p><p>A practical hybrid is also possible. Use GGUF Loader for local repo summaries, documentation drafts, and safe boilerplate. Use a hosted agent only for the parts where stronger reasoning justifies the data exposure.</p><h3>Best alternatives to GGUF Loader Agentic Mode</h3><h3>Aider</h3><p>Aider is better if you want a terminal coding assistant that works inside a git repo. 
Its docs say it can work with local models through Ollama and with local models that expose an OpenAI-compatible API.</p><p>Use Aider if you are comfortable in the terminal and want a coding-agent workflow built around git.</p><p>Skip it if you want a simple GUI and direct GGUF loading.</p><div><hr></div><h3>Continue with Ollama</h3><p>Continue is better if you want an IDE assistant. Its <a href="https://docs.continue.dev/guides/ollama-guide">Ollama guide</a> covers local AI development and lists macOS, Linux, Windows, 8GB minimum RAM, 16GB recommended RAM, and 10GB free storage as prerequisites. Continue&#8217;s <a href="https://docs.continue.dev/customize/models">model documentation</a> warns that local models can be challenging for agent mode because of limited tool calling and reasoning.</p><p>Use Continue if you want local chat and coding help inside VS Code or JetBrains.</p><p>Skip it if your main goal is a simple local file agent with a desktop GUI.</p><div><hr></div><h3>Claude Code</h3><p>Claude Code is better if you want the most capable coding-agent experience and can accept account dependency, cloud model calls, and the data terms attached to your plan. 
Anthropic&#8217;s docs say Claude Code sends prompts and model outputs over the network to interact with the LLM.</p><p>Use Claude Code for difficult refactors, test-driven work, and serious multi-file engineering.</p><p>Skip it for highly sensitive code unless your organization has the right commercial terms and data controls.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FE_S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba85191a-e114-4042-aa97-6c5f70964ac6_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FE_S!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba85191a-e114-4042-aa97-6c5f70964ac6_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!FE_S!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba85191a-e114-4042-aa97-6c5f70964ac6_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!FE_S!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba85191a-e114-4042-aa97-6c5f70964ac6_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!FE_S!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba85191a-e114-4042-aa97-6c5f70964ac6_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FE_S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba85191a-e114-4042-aa97-6c5f70964ac6_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba85191a-e114-4042-aa97-6c5f70964ac6_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2452785,&quot;alt&quot;:&quot;GGUF Loader Agentic Mode lets AI edit local files offline&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/198398535?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba85191a-e114-4042-aa97-6c5f70964ac6_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="GGUF Loader Agentic Mode lets AI edit local files offline" title="GGUF Loader Agentic Mode lets AI edit local files offline" srcset="https://substackcdn.com/image/fetch/$s_!FE_S!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba85191a-e114-4042-aa97-6c5f70964ac6_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!FE_S!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba85191a-e114-4042-aa97-6c5f70964ac6_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!FE_S!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba85191a-e114-4042-aa97-6c5f70964ac6_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!FE_S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba85191a-e114-4042-aa97-6c5f70964ac6_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div 
class="pencraft pc-display-flex pc-gap-8 pc-reset"></div></div></div></a><figcaption class="image-caption">Want a local coding agent without a cloud AI account? GGUF Loader Agentic Mode can edit files in a workspace, but it needs strict guardrails. &#169; Popular AI</figcaption></figure></div><h3>Who should use GGUF Loader Agentic Mode?</h3><p>Use GGUF Loader Agentic Mode if you want a local coding assistant without a hosted model account, especially for private notes, scripts, prototypes, and small repositories. 
It is also a good fit if you prefer a GUI over terminal-first tools and are willing to inspect diffs before committing anything.</p><p>It is less suitable if you expect frontier-model coding quality, automated test execution, shell workflows, full IDE integration, or reliable multi-file refactors. It also demands comfort with local models, since performance depends on model size, quantization, RAM, and CPU or GPU acceleration.</p><p>The best user is practical and cautious. They want the privacy of a local model, but they also understand that local file agents need firm boundaries.</p><div class="callout-block" data-callout="true"><h3>Final recommendation</h3><p>GGUF Loader Agentic Mode is a promising local agent for controlled file operations. Its best role is giving users a private, account-free agent for small coding tasks, repo cleanup, documentation, boilerplate, and local workflow experiments.</p><p>Use it with a strict workspace, no secrets, git commits after every good change, and prompts that force planning before writing.</p><p>Local file access is powerful. Treat it like a tool with teeth.</p></div><h3>FAQ</h3><h3>Is GGUF Loader Agentic Mode fully local?</h3><blockquote><p>Yes. GGUF Loader&#8217;s docs say that after a model is downloaded, everything runs locally with no internet connection required. Its Hugging Face listing also says there is no cloud integration.</p><div><hr></div></blockquote><h3>Can GGUF Loader Agentic Mode read and write files?</h3><blockquote><p>Yes. The v2.1.1 release discussion says Agentic Mode includes file system access and tool execution, with examples that create files, refactor folders, generate APIs, create unit tests, and update config files.</p><div><hr></div></blockquote><h3>Does Agentic Mode need a workspace folder?</h3><blockquote><p>Yes. The release discussion says users select or browse to a workspace folder when enabling Agent Mode. 
The Hugging Face listing also says file operations require explicit workspace folder selection.</p><div><hr></div></blockquote><h3>Is GGUF Loader better than Claude Code?</h3><blockquote><p>For raw coding-agent capability, Claude Code is more mature, has stronger hosted-model access, and has detailed permissions and checkpointing. GGUF Loader is better when local processing, account independence, and simple file operations matter more.</p><div><hr></div></blockquote><h3>What model should I use with GGUF Loader Agentic Mode?</h3><blockquote><p>The v2.1.1 release discussion recommends Mistral-7B Instruct for Agentic Mode. Treat that as the starter model, then test stronger coding-tuned GGUF models if your hardware can handle them.</p><div><hr></div></blockquote><h3>Is GGUF Loader free?</h3><blockquote><p>Yes. The project FAQ says GGUF Loader is free and open source under the MIT License.</p><div><hr></div></blockquote>]]></content:encoded></item><item><title><![CDATA[How to build a local AI PC under $1,000 in 
2026]]></title><description><![CDATA[Before buying an &#8220;AI PC,&#8221; learn why VRAM, CUDA support, RAM, airflow, and used GPU pricing matter more for local LLMs.]]></description><link>https://www.popularai.org/p/best-budget-local-llm-pc-under-1000-2026</link><guid isPermaLink="false">https://www.popularai.org/p/best-budget-local-llm-pc-under-1000-2026</guid><dc:creator><![CDATA[Popular AI]]></dc:creator><pubDate>Tue, 19 May 2026 13:35:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1CQp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6abea302-ae1f-49bf-81f6-0589609b42b9_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1CQp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6abea302-ae1f-49bf-81f6-0589609b42b9_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1CQp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6abea302-ae1f-49bf-81f6-0589609b42b9_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!1CQp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6abea302-ae1f-49bf-81f6-0589609b42b9_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!1CQp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6abea302-ae1f-49bf-81f6-0589609b42b9_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!1CQp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6abea302-ae1f-49bf-81f6-0589609b42b9_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1CQp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6abea302-ae1f-49bf-81f6-0589609b42b9_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6abea302-ae1f-49bf-81f6-0589609b42b9_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2505929,&quot;alt&quot;:&quot;Best budget local LLM PC under $1,000 in 2026&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/198385359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6abea302-ae1f-49bf-81f6-0589609b42b9_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Best budget local LLM PC under $1,000 in 2026" title="Best budget local LLM PC under $1,000 in 2026" srcset="https://substackcdn.com/image/fetch/$s_!1CQp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6abea302-ae1f-49bf-81f6-0589609b42b9_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!1CQp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6abea302-ae1f-49bf-81f6-0589609b42b9_1672x941.png 848w, 
https://substackcdn.com/image/fetch/$s_!1CQp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6abea302-ae1f-49bf-81f6-0589609b42b9_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!1CQp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6abea302-ae1f-49bf-81f6-0589609b42b9_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Build a budget local LLM PC under $1,000 in 2026 with used RTX 3090, 12GB GPU, and CPU-only fallback paths. 
&#169; Popular AI</figcaption></figure></div><p>The best budget local LLM PC under $1,000 in 2026 starts with one boring rule: buy as much NVIDIA VRAM as the budget can handle, then keep everything else practical.</p><p>That means this is a VRAM-first PC build, not a sticker-first &#8220;AI PC&#8221; build. If the goal is to run useful local models in Ollama, LM Studio, <code>llama.cpp</code>, Open WebUI, private document Q&amp;A, or coding-agent workflows, the graphics card matters more than RGB, premium motherboards, huge CPUs, or marketing around NPUs.</p><p>The difficult part here will be price. In 2026, used GPUs, RAM, and SSDs can swing enough to ruin a perfect parts list in a week. A used <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> can still be the best value for local AI because it gives you 24GB of VRAM, but it only works as a sub-$1,000 build if the rest of the machine stays disciplined.</p><div><hr></div><h4>Quick verdict</h4><blockquote><p>The best overall path under $1,000 is a used <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> build, especially if you can reuse parts or buy a used CPU, motherboard, and RAM bundle. 
NVIDIA&#8217;s official <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3090-3090ti/">RTX 3090 specifications</a> list 24GB of GDDR6X on a 384-bit memory interface, and MSI&#8217;s <a href="https://www.msi.com/Graphics-card/GeForce-RTX-3090-VENTUS-3X-24G-OC/Specification">RTX 3090 Ventus specification page</a> lists 350W power consumption with a 750W recommended PSU.</p></blockquote><blockquote><p>The safest all-new-ish budget path is an <a href="https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20">RTX 3060 12GB</a> build. It is slower and more limited, but it keeps the total PC price realistic. NVIDIA&#8217;s <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3060-3060ti/">RTX 3060 family specifications</a> list the <a href="https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20">RTX 3060</a> with a 12GB GDDR6 configuration, and MSI&#8217;s <a href="https://storage-asset.msi.com/datasheet/vga/global/GeForce-RTX-3060-GAMING-X-12G.pdf">RTX 3060 Gaming X 12G datasheet</a> lists 170W power consumption and a 550W recommended PSU.</p></blockquote><blockquote><p>The best stretch-budget path is an <a href="https://www.amazon.com/s?k=RTX+5060+Ti+16GB&amp;tag=popularai-20">RTX 5060 Ti 16GB</a> or a heavily discounted <a href="https://www.amazon.com/s?k=RTX+4060+Ti+16GB&amp;tag=popularai-20">RTX 4060 Ti 16GB</a>. NVIDIA announced the <a href="https://www.amazon.com/s?k=RTX+5060+Ti+16GB&amp;tag=popularai-20">RTX 5060 Ti 16GB</a> at a $429 starting price in April 2025 in its <a href="https://nvidianews.nvidia.com/news/nvidia-blackwell-geforce-rtx-arrives-for-every-gamer-starting-at-299">Blackwell GeForce launch announcement</a>, and NVIDIA&#8217;s current <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/rtx-5060-family/">RTX 5060 family page</a> lists the <a href="https://www.amazon.com/s?k=RTX+5060+Ti+16GB&amp;tag=popularai-20">RTX 5060 Ti</a> with 16GB and 8GB GDDR7 options. 
The older <a href="https://www.nvidia.com/en-us/geforce/graphics-cards/40-series/rtx-4060-4060ti/">RTX 4060 Ti page</a> still matters because the 16GB version can be a sensible local AI card when the price is right.</p></blockquote><blockquote><p>The best CPU-only fallback is a used office tower with 64GB RAM, but only when speed is secondary. CPU-only local LLMs can work for private notes, slow chat, testing small quantized models, and learning. They are a poor fit for anyone expecting a snappy coding assistant.</p></blockquote><blockquote><p>The thing to skip is an 8GB &#8220;AI PC&#8221; sold as a local LLM workstation. Microsoft&#8217;s <a href="https://www.microsoft.com/en-us/windows/windows-11-specifications">Windows 11 specifications</a> say Copilot+ PCs require a 40+ TOPS NPU, 16GB RAM, and 256GB storage. Those requirements matter for Windows AI features, but they do not replace the GPU VRAM needed for larger local LLMs.</p></blockquote><h3>Who this guide is for</h3><p>This guide is for readers who want a real local LLM PC for Ollama or LM Studio chat, private document Q&amp;A, local coding models, Open WebUI, <code>llama.cpp</code> experiments, small local agents, occasional ComfyUI work, or a fallback when hosted AI tools become expensive, restricted, slow, or unreliable.</p><p>It is also for anyone who wants a local AI desktop that can be built, maintained, and upgraded like a normal PC. The target is a practical machine, not a workstation fantasy build.</p><p>This is not the right guide for training frontier models, running huge 70B models smoothly on a single cheap GPU, or building a silent living-room PC around a 350W used card. 
A sub-$1,000 desktop can be useful, private, and flexible, but it will not replace the best hosted models for every task.</p><p>For broader context, Popular AI has a related guide on <a href="https://www.popularai.org/p/the-best-budget-local-llm-pc-in-2026">why your first budget local AI PC should still start with a used RTX 3090</a> and a separate guide explaining <a href="https://www.popularai.org/p/why-ollama-and-llama-cpp-crawl-when-models-spill-into-ram-and-how-to-fix-it">why Ollama and llama.cpp slow down when models spill into RAM</a>.</p><h3>Why VRAM decides the build</h3><p>For local LLMs, VRAM decides what fits. If the model, context, and cache fit in GPU memory, the experience can feel responsive. If too much work spills into system RAM, replies slow down hard.</p><p>That is why a used 24GB <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> can beat a newer 8GB GPU for local AI. The 3090 is older, hotter, louder, and riskier on the used market, but 24GB of VRAM is the feature that changes what the machine can actually do.</p><p>A 12GB GPU can run useful smaller quantized models. A 16GB GPU gives more room and usually comes with better power behavior. A 24GB GPU gives the most breathing room in this budget range, especially for larger quantized chat models, coding models, longer context, and local tools running at the same time.</p><p>The simplest buying rule is this: fit comes before speed. A model that fits comfortably on a slower card often feels better than a larger model forced into painful offloading.</p><h3>CUDA support still keeps NVIDIA in front</h3><p>NVIDIA remains the easiest default recommendation for most budget local LLM PC builds because so many tools, tutorials, and troubleshooting paths assume CUDA.</p><p>Ollama says its <a href="https://docs.ollama.com/gpu">GPU support</a> includes NVIDIA GPUs with compute capability 5.0 or newer and driver version 531 or newer. 
The <code>llama.cpp</code> <a href="https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md">build documentation</a> covers several acceleration backends, including CUDA, HIP, Vulkan, Metal, OpenCL, and more.</p><p>AMD and Intel GPUs can work in some local AI setups, especially for experiments and specific workflows. For a first budget local LLM desktop, NVIDIA is still the safer recommendation because there are fewer software surprises.</p><h3>System RAM still matters</h3><p>A good local LLM desktop should ideally have 64GB of system RAM. That gives the operating system, browser tabs, model files, vector databases, coding tools, Docker containers, and CPU fallback room to breathe.</p><p>The problem is 2026 RAM pricing. The Verge reported on <a href="https://www.theverge.com/news/850376/framework-ram-memory-ddr5-price-hikes">Framework&#8217;s memory price hikes</a> amid broader memory shortage pressure, which is exactly the kind of market weirdness that can blow up a budget PC build.</p><p>That changes the advice. If 64GB RAM is wildly expensive when you buy, start with 32GB and leave two slots open so you can add more later. Used DDR4 from a reputable seller can also make sense if you are building on AM4 or an older Intel platform.</p><p>Do not sacrifice the GPU budget just to force overpriced new RAM into the build. VRAM is still the part that determines which models feel usable.</p><h3>Keep the CPU boring</h3><p>For a GPU-based local LLM build, a <a href="https://www.amazon.com/AMD-5600-12-Thread-Unlocked-Processor/dp/B09VCHR1VH?tag=popularai-20">Ryzen 5 5600</a>, <a href="https://www.amazon.com/AMD-Ryzen-3600-12-Thread-Processor/dp/B07STGGQ18?tag=popularai-20">Ryzen 5 3600</a>, <a href="https://www.amazon.com/INTEL-i5-12400F-2-5GHz-6xxChipset-BX8071512400F/dp/B09NPJRDGD?tag=popularai-20">Intel i5-12400F</a>, or similar 6-core chip is enough for most people. 
AMD lists the <a href="https://www.amd.com/en/support/downloads/drivers.html/processors/ryzen/ryzen-5000-series/amd-ryzen-5-5600.html">Ryzen 5 5600</a> as a 6-core, 12-thread, 65W AM4 processor.</p><p>That is the right kind of CPU for this budget. Spend on VRAM before spending on extra CPU cores. A bigger processor will not fix a GPU with too little memory.</p><h3>Power and airflow are part of the build</h3><p>An <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> is a serious power and cooling part. MSI&#8217;s <a href="https://www.amazon.com/s?k=RTX+3090+Ventus+24G&amp;tag=popularai-20">RTX 3090 Ventus</a> spec lists 350W power consumption and a 750W recommended PSU. Treat that 750W recommendation as the floor for a used-card build, not as a luxury target.</p><p>For an <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> build, use a quality 750W PSU at minimum. An 850W unit is better if the price difference is small. Avoid cheap, old, unknown, or mystery-brand power supplies. The PSU is the last place to gamble when the GPU is the most expensive part of the machine.</p><p>Case airflow matters too. Many used 3090 cards are long, thick, and hot. Check card length, card thickness, front fan clearance, and power connector space before buying a case.</p><div><hr></div><h3>How these builds were chosen</h3><p>These builds are selected for local AI usefulness, not gaming benchmark glory. The goal is the best mix of VRAM per dollar, software support, current used-market reality, power and cooling sanity, upgrade path, and risk management.</p><p>A GPU can look strong in games and still be a weak local LLM choice if it has too little VRAM. That is why 8GB cards fall down the list, even when they are newer.</p><p>Pricing is also treated as a moving target. As of late April 2026, used GPUs, RAM, and SSDs remain volatile enough that exact shopping carts age quickly. The recommendations below use price bands instead of pretending every reader will see the same checkout total.</p><div><hr></div><p><em>Disclosure: This post includes Amazon affiliate links. 
If you buy through them, Popular AI may earn a small commission at no extra cost to you.</em></p><div><hr></div><h3>Build path 1: Used RTX 3090 local LLM PC under $1,000</h3><p>This is the best path if the goal is maximum local LLM usefulness for the money.</p><h4>Target parts</h4><p><strong>GPU:</strong> Used <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> 24GB</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0zWF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7836cf47-6019-4521-8b86-849cf4770c17_1480x669.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0zWF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7836cf47-6019-4521-8b86-849cf4770c17_1480x669.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0zWF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7836cf47-6019-4521-8b86-849cf4770c17_1480x669.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0zWF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7836cf47-6019-4521-8b86-849cf4770c17_1480x669.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0zWF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7836cf47-6019-4521-8b86-849cf4770c17_1480x669.jpeg" width="614" height="277.54459459459457" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7836cf47-6019-4521-8b86-849cf4770c17_1480x669.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:669,&quot;width&quot;:1480,&quot;resizeWidth&quot;:614,&quot;bytes&quot;:92090,&quot;alt&quot;:&quot;A budget local LLM PC lives or dies by the GPU: VRAM matters more than flashy &#8220;AI PC&#8221; branding.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A budget local LLM PC lives or dies by the GPU: VRAM matters more than flashy &#8220;AI PC&#8221; branding." title="A budget local LLM PC lives or dies by the GPU: VRAM matters more than flashy &#8220;AI PC&#8221; branding." 
srcset="https://substackcdn.com/image/fetch/$s_!0zWF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7836cf47-6019-4521-8b86-849cf4770c17_1480x669.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0zWF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7836cf47-6019-4521-8b86-849cf4770c17_1480x669.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0zWF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7836cf47-6019-4521-8b86-849cf4770c17_1480x669.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0zWF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7836cf47-6019-4521-8b86-849cf4770c17_1480x669.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Target price: $600 to $750 if possible. Walk away if the card is close to $900 unless the rest of the system is unusually cheap.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20&quot;,&quot;text&quot;:&quot;Find used RTX 3090 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20"><span>Find used RTX 3090 deals on Amazon</span></a></p><div><hr></div><p><strong>CPU:</strong> <a href="https://www.amazon.com/AMD-5600-12-Thread-Unlocked-Processor/dp/B09VCHR1VH?tag=popularai-20">Ryzen 5 5600</a>, <a href="https://www.amazon.com/AMD-Ryzen-3600-12-Thread-Processor/dp/B07STGGQ18?tag=popularai-20">Ryzen 5 3600</a>, <a href="https://www.amazon.com/INTEL-i5-12400F-2-5GHz-6xxChipset-BX8071512400F/dp/B09NPJRDGD?tag=popularai-20">Intel i5-12400F</a>, or similar</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/AMD-5600-12-Thread-Unlocked-Processor/dp/B09VCHR1VH?tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!u5gL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71698676-27d3-4e6a-b278-2130365e6eb0_864x926.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!u5gL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71698676-27d3-4e6a-b278-2130365e6eb0_864x926.jpeg 848w, https://substackcdn.com/image/fetch/$s_!u5gL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71698676-27d3-4e6a-b278-2130365e6eb0_864x926.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!u5gL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71698676-27d3-4e6a-b278-2130365e6eb0_864x926.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!u5gL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71698676-27d3-4e6a-b278-2130365e6eb0_864x926.jpeg" width="411" height="440.49305555555554" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/71698676-27d3-4e6a-b278-2130365e6eb0_864x926.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:926,&quot;width&quot;:864,&quot;resizeWidth&quot;:411,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;The RTX 3090, RTX 3060 12GB, and modern 16GB cards define the real sub-$1,000 local AI tradeoff: more VRAM, lower cost, or less used-hardware risk.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/AMD-5600-12-Thread-Unlocked-Processor/dp/B09VCHR1VH?tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The RTX 3090, RTX 3060 12GB, and modern 16GB cards define the real sub-$1,000 local AI tradeoff: more VRAM, lower cost, or less used-hardware risk." 
title="The RTX 3090, RTX 3060 12GB, and modern 16GB cards define the real sub-$1,000 local AI tradeoff: more VRAM, lower cost, or less used-hardware risk." srcset="https://substackcdn.com/image/fetch/$s_!u5gL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71698676-27d3-4e6a-b278-2130365e6eb0_864x926.jpeg 424w, https://substackcdn.com/image/fetch/$s_!u5gL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71698676-27d3-4e6a-b278-2130365e6eb0_864x926.jpeg 848w, https://substackcdn.com/image/fetch/$s_!u5gL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71698676-27d3-4e6a-b278-2130365e6eb0_864x926.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!u5gL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71698676-27d3-4e6a-b278-2130365e6eb0_864x926.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Target price: $80 to $150, depending on used versus new.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/AMD-5600-12-Thread-Unlocked-Processor/dp/B09VCHR1VH?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find Ryzen 5 5600 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/AMD-5600-12-Thread-Unlocked-Processor/dp/B09VCHR1VH?tag=popularai-20"><span>Find Ryzen 5 5600 deals on Amazon</span></a></p><div><hr></div><p><strong>Motherboard:</strong> Used <a href="https://www.amazon.com/s?k=B450+motherboard&amp;tag=popularai-20">B450</a>, <a href="https://www.amazon.com/s?k=B550+motherboard&amp;tag=popularai-20">B550</a>, or <a href="https://www.amazon.com/s?k=LGA+1700+motherboard&amp;tag=popularai-20">LGA 1700</a> board</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/s?k=B550+motherboard&amp;tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AbGv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda620b26-b3f4-4e5b-a3a3-d7ce97f781ce_1243x1500.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!AbGv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda620b26-b3f4-4e5b-a3a3-d7ce97f781ce_1243x1500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!AbGv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda620b26-b3f4-4e5b-a3a3-d7ce97f781ce_1243x1500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!AbGv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda620b26-b3f4-4e5b-a3a3-d7ce97f781ce_1243x1500.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AbGv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda620b26-b3f4-4e5b-a3a3-d7ce97f781ce_1243x1500.jpeg" width="467" height="563.5559131134353" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da620b26-b3f4-4e5b-a3a3-d7ce97f781ce_1243x1500.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1500,&quot;width&quot;:1243,&quot;resizeWidth&quot;:467,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;For local LLMs in 2026, the smartest budget build starts with boring CPUs, enough RAM, and the most NVIDIA VRAM you can afford.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.amazon.com/s?k=B550+motherboard&amp;tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="For local LLMs in 2026, the smartest budget build starts with boring CPUs, enough RAM, and the most NVIDIA VRAM you can afford." 
title="For local LLMs in 2026, the smartest budget build starts with boring CPUs, enough RAM, and the most NVIDIA VRAM you can afford." srcset="https://substackcdn.com/image/fetch/$s_!AbGv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda620b26-b3f4-4e5b-a3a3-d7ce97f781ce_1243x1500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!AbGv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda620b26-b3f4-4e5b-a3a3-d7ce97f781ce_1243x1500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!AbGv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda620b26-b3f4-4e5b-a3a3-d7ce97f781ce_1243x1500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!AbGv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda620b26-b3f4-4e5b-a3a3-d7ce97f781ce_1243x1500.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Target price: $60 to $110.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/s?k=B550+motherboard&amp;tag=popularai-20&quot;,&quot;text&quot;:&quot;Find used B550 motherboards on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/s?k=B550+motherboard&amp;tag=popularai-20"><span>Find used B550 motherboards on Amazon</span></a></p><div><hr></div><p><strong>RAM:</strong> 32GB minimum, 64GB preferred<br>Target price: highly variable. 
Buy used if new 64GB pricing is unreasonable.</p><div><hr></div><p><strong>Storage:</strong> 1TB NVMe SSD<br>Target price: $80 to $150, depending on the current SSD market.</p><div><hr></div><p><strong>PSU:</strong> Quality 750W minimum, 850W preferred<br>Target price: $80 to $130.</p><div><hr></div><p><strong>Case:</strong> Airflow case that fits the card<br>Target price: $50 to $80.</p><div><hr></div><h4><em><strong>Read our detailed RTX 3090 build guide:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;6650fce2-d97f-44af-96c1-dac4508bb3bf&quot;,&quot;caption&quot;:&quot;The best first local LLM PC build in 2026 is still refreshingly simple: buy a used RTX 3090 with 24GB of VRAM, pair it with 64GB of system RAM, and run the machine on one clean Linux install.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The best budget local AI PC in 2026 starts with a used RTX 3090&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually 
matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-23T18:41:00.962Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!mNHY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2a94f7b-e1fc-49d6-8df5-2afc01d93a4d_2400x1437.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/the-best-budget-local-llm-pc-in-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:191894407,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h4>Realistic total</h4><p>A clean <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> build under $1,000 usually requires at least one of these moves:</p><ul><li><p>Reuse an old case, SSD, motherboard, or RAM.<br></p></li><li><p>Buy a used CPU, motherboard, and RAM bundle.<br></p></li><li><p>Start with 32GB RAM and upgrade later.<br></p></li><li><p>Find an <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> near the lower end of the used price band.<br></p></li></ul><p>If every part is new except the GPU, the <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> path can drift 
above $1,000 because the card, PSU, and RAM eat the budget fast.</p><h4>What it runs well</h4><p>This is the best budget path for quantized local chat models that are too large for a 12GB GPU to handle comfortably, coding models with longer context, Open WebUI and Ollama workflows, private document search, local agents that need more breathing room, and ComfyUI experiments where 24GB VRAM helps.</p><h4>What it does badly</h4><p>It uses a lot of power. It can be loud. Used cards may have mining history, worn fans, old thermal pads, or memory-temperature issues. Some models will still be too large. The total build cost can break $1,000 if RAM prices are bad.</p><h4>Used <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> buying rules</h4><ul><li><p>Ask for a timestamped photo or video of the card running.<br></p></li><li><p>Ask for a <code>GPU-Z</code> or <code>nvidia-smi</code> screenshot taken while the card is under load, or another load-test result.<br></p></li><li><p>Avoid &#8220;for parts,&#8221; &#8220;untested,&#8221; &#8220;no returns,&#8221; and suspiciously cheap listings.<br></p></li><li><p>Prefer sellers with real history.<br></p></li><li><p>Avoid blower-style server-pull cards unless you understand noise and cooling.<br></p></li><li><p>Check card length before buying a case.<br></p></li><li><p>If the price is unusually good, budget for new thermal pads or a fan replacement.<br></p></li></ul><div class="callout-block" data-callout="true"><h4>Verdict</h4><p>Use the <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> path if you want the most useful local LLM PC near $1,000 and you are comfortable buying used hardware. 
Skip it if you need a quiet, low-power, warranty-backed machine.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20&quot;,&quot;text&quot;:&quot;Find used RTX 3090 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20"><span>Find used RTX 3090 deals on Amazon</span></a></p></div><h3>Build path 2: RTX 3060 12GB local LLM PC under $1,000</h3><p>This is the safer budget build for people who want a working local LLM PC without putting the whole budget into a used 350W GPU.</p><h4>Target parts</h4><p><strong>GPU:</strong> <a href="https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20">RTX 3060 12GB</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cTwt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74f6651-65a9-42c7-ba85-2f619752262f_1500x671.jpeg 424w, https://substackcdn.com/image/fetch/$s_!cTwt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74f6651-65a9-42c7-ba85-2f619752262f_1500x671.jpeg 848w, https://substackcdn.com/image/fetch/$s_!cTwt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74f6651-65a9-42c7-ba85-2f619752262f_1500x671.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!cTwt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74f6651-65a9-42c7-ba85-2f619752262f_1500x671.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cTwt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74f6651-65a9-42c7-ba85-2f619752262f_1500x671.jpeg" width="576" height="257.664" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e74f6651-65a9-42c7-ba85-2f619752262f_1500x671.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:671,&quot;width&quot;:1500,&quot;resizeWidth&quot;:576,&quot;bytes&quot;:150110,&quot;alt&quot;:&quot;A budget local LLM PC lives or dies by the GPU: VRAM matters more than flashy &#8220;AI PC&#8221; branding.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A budget local LLM PC lives or dies by the GPU: VRAM matters more than flashy &#8220;AI PC&#8221; branding." title="A budget local LLM PC lives or dies by the GPU: VRAM matters more than flashy &#8220;AI PC&#8221; branding." 
srcset="https://substackcdn.com/image/fetch/$s_!cTwt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74f6651-65a9-42c7-ba85-2f619752262f_1500x671.jpeg 424w, https://substackcdn.com/image/fetch/$s_!cTwt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74f6651-65a9-42c7-ba85-2f619752262f_1500x671.jpeg 848w, https://substackcdn.com/image/fetch/$s_!cTwt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74f6651-65a9-42c7-ba85-2f619752262f_1500x671.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!cTwt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe74f6651-65a9-42c7-ba85-2f619752262f_1500x671.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Target price: $150 to $250 used, or around $300 to $400 if new pricing is bad. Tom&#8217;s Hardware reported in April 2026 that <a href="https://www.tomshardware.com/pc-components/gpus/nvidia-rtx-3060-comeback-in-2026-could-alleviate-soaring-gpu-prices-and-memory-shortages-rumored-rtx-5050-9gb-abruptly-shelved-amid-speculation">RTX 3060 12GB cards were readily available</a> for $350 to $400 on Amazon and as low as $150 to $200 on second-hand marketplaces.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20&quot;,&quot;text&quot;:&quot;Find used RTX 3060 12GB deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20"><span>Find used RTX 3060 12GB deals on Amazon</span></a></p><div><hr></div><p><strong>CPU:</strong> <a href="https://www.amazon.com/AMD-5600-12-Thread-Unlocked-Processor/dp/B09VCHR1VH?tag=popularai-20">Ryzen 5 5600</a> or <a href="https://www.amazon.com/INTEL-i5-12400F-2-5GHz-6xxChipset-BX8071512400F/dp/B09NPJRDGD?tag=popularai-20">Intel i5-12400F</a> class CPU.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C6kR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ff0748-0673-46e2-8ee6-ba3b1c6ab000_864x926.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!C6kR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ff0748-0673-46e2-8ee6-ba3b1c6ab000_864x926.jpeg 424w, https://substackcdn.com/image/fetch/$s_!C6kR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ff0748-0673-46e2-8ee6-ba3b1c6ab000_864x926.jpeg 848w, https://substackcdn.com/image/fetch/$s_!C6kR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ff0748-0673-46e2-8ee6-ba3b1c6ab000_864x926.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!C6kR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ff0748-0673-46e2-8ee6-ba3b1c6ab000_864x926.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C6kR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ff0748-0673-46e2-8ee6-ba3b1c6ab000_864x926.jpeg" width="419" height="449.0671296296296" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b8ff0748-0673-46e2-8ee6-ba3b1c6ab000_864x926.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:926,&quot;width&quot;:864,&quot;resizeWidth&quot;:419,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;The RTX 3090, RTX 3060 12GB, and modern 16GB cards define the real sub-$1,000 local AI tradeoff: more VRAM, lower cost, or less used-hardware risk.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The RTX 3090, RTX 
3060 12GB, and modern 16GB cards define the real sub-$1,000 local AI tradeoff: more VRAM, lower cost, or less used-hardware risk." title="The RTX 3090, RTX 3060 12GB, and modern 16GB cards define the real sub-$1,000 local AI tradeoff: more VRAM, lower cost, or less used-hardware risk." srcset="https://substackcdn.com/image/fetch/$s_!C6kR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ff0748-0673-46e2-8ee6-ba3b1c6ab000_864x926.jpeg 424w, https://substackcdn.com/image/fetch/$s_!C6kR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ff0748-0673-46e2-8ee6-ba3b1c6ab000_864x926.jpeg 848w, https://substackcdn.com/image/fetch/$s_!C6kR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ff0748-0673-46e2-8ee6-ba3b1c6ab000_864x926.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!C6kR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ff0748-0673-46e2-8ee6-ba3b1c6ab000_864x926.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/AMD-5600-12-Thread-Unlocked-Processor/dp/B09VCHR1VH?tag=popularai-20&quot;,&quot;text&quot;:&quot;Find Ryzen 5 5600 deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/AMD-5600-12-Thread-Unlocked-Processor/dp/B09VCHR1VH?tag=popularai-20"><span>Find Ryzen 5 5600 deals on Amazon</span></a></p><div><hr></div><p><strong>RAM:</strong> 32GB minimum, 64GB if the price is sane.</p><div><hr></div><p><strong>Storage:</strong> 1TB NVMe SSD.</p><div><hr></div><p><strong>PSU:</strong> 550W to 650W quality unit. MSI&#8217;s <a href="https://www.amazon.com/s?k=RTX+3060+Gaming+X+12G&amp;tag=popularai-20">RTX 3060 Gaming X 12G</a> datasheet lists 170W power consumption and a 550W recommended PSU.</p><div><hr></div><h4><em><strong>More on building an AI PC around the RTX 3060:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;4bdb17bc-71af-4a85-b954-c452cecfa002&quot;,&quot;caption&quot;:&quot;The RTX 3060 12GB refuses to fade away for local AI work because VRAM still decides what kind of ComfyUI workflow you can run before raw speed becomes the real bottleneck. 
NVIDIA&#8217;s own specs s&#8230;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Is the RTX 3060 12GB still worth buying for ComfyUI in 2026?&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-04-17T14:08:16.466Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JN-k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16b0dd23-9fd9-47da-b58e-96f076595aa5_2110x1187.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/rtx-3060-comfyui-performance-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:193975583,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h4>Realistic total</h4><p>This build can land around $650 to 
$900 depending on used parts, RAM pricing, SSD pricing, and whether the <a href="https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20">RTX 3060</a> is bought used or new.</p><h4>What it runs well</h4><p>An <a href="https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20">RTX 3060 12GB</a> build is good for 7B and 8B class quantized models, light coding assistants, private chat, basic document Q&amp;A, local AI learning, and some 13B class models with compromises.</p><h4>What it does badly</h4><p>This is not a run-everything card. Long context can become the problem. Larger coding models may feel constrained. ComfyUI workflows will hit VRAM limits sooner. You may outgrow it if local AI becomes part of daily work.</p><div class="callout-block" data-callout="true"><h4>Verdict</h4><p>Use the <a href="https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20">RTX 3060 12GB</a> path if you want the cheapest sane local LLM PC. Skip it if you already know you want larger models, longer context, or more serious coding-agent work.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20&quot;,&quot;text&quot;:&quot;Find used RTX 3060 12GB deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20"><span>Find used RTX 3060 12GB deals on Amazon</span></a></p></div><h3>Build path 3: RTX 5060 Ti 16GB or RTX 4060 Ti 16GB stretch build</h3><p>This is the modern 16GB path. It can work under $1,000, but only if the card price is disciplined and the rest of the build stays lean.</p><h4>Why 16GB is attractive</h4><p>A 16GB GPU sits between the cheap 12GB path and the used 24GB <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> path. 
It gives you more room than the <a href="https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20">RTX 3060 12GB</a> without the heat, power draw, and used-card risk of the <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a>.</p><p>The <a href="https://www.amazon.com/s?k=RTX+5060+Ti+16GB&amp;tag=popularai-20">RTX 5060 Ti</a> family includes 16GB and 8GB versions, and NVIDIA&#8217;s specifications list 4,608 CUDA cores on the <a href="https://www.amazon.com/s?k=RTX+5060+Ti+16GB&amp;tag=popularai-20">RTX 5060 Ti</a>. The older <a href="https://www.amazon.com/s?k=RTX+4060+Ti+16GB&amp;tag=popularai-20">RTX 4060 Ti 16GB</a> remains relevant when discounted, especially because NVIDIA lists it with 16GB or 8GB GDDR6 and a 128-bit memory interface.</p><h4>What to buy</h4><ul><li><p>Buy the <a href="https://www.amazon.com/s?k=RTX+5060+Ti+16GB&amp;tag=popularai-20">RTX 5060 Ti 16GB</a> if it is close to the manufacturer&#8217;s suggested retail price.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/s?k=RTX+5060+Ti+16GB&amp;tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JlgL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F542249d1-d484-4756-8c85-dbab6e6b8b08_1172x506.png 424w, https://substackcdn.com/image/fetch/$s_!JlgL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F542249d1-d484-4756-8c85-dbab6e6b8b08_1172x506.png 848w, https://substackcdn.com/image/fetch/$s_!JlgL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F542249d1-d484-4756-8c85-dbab6e6b8b08_1172x506.png 1272w, 
https://substackcdn.com/image/fetch/$s_!JlgL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F542249d1-d484-4756-8c85-dbab6e6b8b08_1172x506.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JlgL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F542249d1-d484-4756-8c85-dbab6e6b8b08_1172x506.png" width="592" height="255.59044368600684" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/542249d1-d484-4756-8c85-dbab6e6b8b08_1172x506.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:506,&quot;width&quot;:1172,&quot;resizeWidth&quot;:592,&quot;bytes&quot;:526518,&quot;alt&quot;:&quot;For local LLMs in 2026, the smartest budget build starts with boring CPUs, enough RAM, and the most NVIDIA VRAM you can afford.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.amazon.com/s?k=RTX+5060+Ti+16GB&amp;tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/198385359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F542249d1-d484-4756-8c85-dbab6e6b8b08_1172x506.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="For local LLMs in 2026, the smartest budget build starts with boring CPUs, enough RAM, and the most NVIDIA VRAM you can afford." title="For local LLMs in 2026, the smartest budget build starts with boring CPUs, enough RAM, and the most NVIDIA VRAM you can afford." 
srcset="https://substackcdn.com/image/fetch/$s_!JlgL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F542249d1-d484-4756-8c85-dbab6e6b8b08_1172x506.png 424w, https://substackcdn.com/image/fetch/$s_!JlgL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F542249d1-d484-4756-8c85-dbab6e6b8b08_1172x506.png 848w, https://substackcdn.com/image/fetch/$s_!JlgL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F542249d1-d484-4756-8c85-dbab6e6b8b08_1172x506.png 1272w, https://substackcdn.com/image/fetch/$s_!JlgL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F542249d1-d484-4756-8c85-dbab6e6b8b08_1172x506.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/s?k=RTX+5060+Ti+16GB&amp;tag=popularai-20&quot;,&quot;text&quot;:&quot;Find RTX 5060 Ti 16GB deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/s?k=RTX+5060+Ti+16GB&amp;tag=popularai-20"><span>Find RTX 5060 Ti 16GB deals on Amazon</span></a></p><div><hr></div></li><li><p>Buy the <a href="https://www.amazon.com/s?k=RTX+4060+Ti+16GB&amp;tag=popularai-20">RTX 4060 Ti 16GB</a> only if it is meaningfully discounted.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.amazon.com/s?k=RTX+4060+Ti+16GB&amp;tag=popularai-20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yhjx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fec7376-7b15-41b7-8fed-839d193a81c1_1500x584.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yhjx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fec7376-7b15-41b7-8fed-839d193a81c1_1500x584.jpeg 848w, https://substackcdn.com/image/fetch/$s_!yhjx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fec7376-7b15-41b7-8fed-839d193a81c1_1500x584.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!yhjx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fec7376-7b15-41b7-8fed-839d193a81c1_1500x584.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yhjx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fec7376-7b15-41b7-8fed-839d193a81c1_1500x584.jpeg" width="1500" height="584" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9fec7376-7b15-41b7-8fed-839d193a81c1_1500x584.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:584,&quot;width&quot;:1500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:176724,&quot;alt&quot;:&quot;A budget local LLM PC lives or dies by the GPU: VRAM matters more than flashy &#8220;AI PC&#8221; branding.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.amazon.com/s?k=RTX+4060+Ti+16GB&amp;tag=popularai-20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A budget local LLM PC lives or dies by the GPU: VRAM matters more than flashy &#8220;AI PC&#8221; branding." title="A budget local LLM PC lives or dies by the GPU: VRAM matters more than flashy &#8220;AI PC&#8221; branding." 
srcset="https://substackcdn.com/image/fetch/$s_!yhjx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fec7376-7b15-41b7-8fed-839d193a81c1_1500x584.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yhjx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fec7376-7b15-41b7-8fed-839d193a81c1_1500x584.jpeg 848w, https://substackcdn.com/image/fetch/$s_!yhjx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fec7376-7b15-41b7-8fed-839d193a81c1_1500x584.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!yhjx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fec7376-7b15-41b7-8fed-839d193a81c1_1500x584.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/s?k=RTX+4060+Ti+16GB&amp;tag=popularai-20&quot;,&quot;text&quot;:&quot;Find RTX 4060 Ti 16GB deals on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/s?k=RTX+4060+Ti+16GB&amp;tag=popularai-20"><span>Find RTX 4060 Ti 16GB deals on Amazon</span></a></p><div><hr></div></li><li><p>Avoid the 8GB version for a local LLM desktop unless the budget is extremely tight and you accept the ceiling.</p></li></ul><div><hr></div><h4><em><strong>More on budget GPUs for local AI:</strong></em></h4><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;66082497-11f9-48c9-9268-4f663d01e012&quot;,&quot;caption&quot;:&quot;For anyone building a cheap local AI box in 2026, the first rule has not changed. VRAM matters more than gamer marketing. A Llama 3.1 8B Q4 build in Ollama is 4.9GB. 
A Gemma 3 12B Q4 build lands at 8.1GB, while its Q8 &#8230;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;showDescription&quot;:true,&quot;showImage&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;The best budget GPUs for local LLMs in 2026: 5 smart buys for Ollama&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:362090995,&quot;name&quot;:&quot;Popular AI&quot;,&quot;bio&quot;:&quot;Popular AI covers local AI for power users who want more autonomy, hardware-specific fixes, accessible user guides, build advice, and clear analysis of the AI changes that actually matter.&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d33e76e-6901-474e-b732-a93e6bca8acd_514x514.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-04-21T13:31:02.258Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!vIue!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12ed22ac-c47a-4628-85f2-763942f38049_2303x1478.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.popularai.org/p/best-budget-gpus-local-llms-2026&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:194906880,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:1,&quot;publication_id&quot;:5553661,&quot;publication_name&quot;:&quot;Popular AI&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ea4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0dc4955-a9ab-44cd-b158-63f55cabea52_514x514.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h4>Realistic total</h4><p>A 
careful 16GB build can fit around $850 to $1,050, depending on RAM and SSD pricing. If the GPU price is inflated, the build loses its point.</p><h4>What it runs well</h4><p>This path works well for 8B and 12B class local models with more comfort than 12GB, moderate coding models, LM Studio and Ollama daily use, lower-power local AI desktops, and small creator workflows.</p><h4>What it does badly</h4><p>It still does not replace 24GB VRAM. The 128-bit memory bus on these cards is not ideal for every workload. At bad pricing, a used <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> may be the better AI buy.</p><div class="callout-block" data-callout="true"><h4>Verdict</h4><p>Use the 16GB path if you want a newer, lower-power, warranty-backed local LLM PC. Skip it if the card is priced too close to a used <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a>.</p></div><h3>Build path 4: CPU-only local LLM PC under $1,000</h3><p>This is the fallback path. It can be useful, but it should be treated as a stepping stone or privacy-first machine.</p><h4>When CPU-only makes sense</h4><p>CPU-only local LLMs make sense when you mainly want privacy, run small quantized models, do not care about speed, already own the PC, want to learn before buying a GPU, or are building a home server that also handles file storage, backups, or light automation.</p><p>LM Studio&#8217;s <a href="https://lmstudio.ai/docs/app/system-requirements">system requirements</a> recommend at least 16GB of RAM on Windows and at least 4GB of dedicated VRAM for GPU use. 
The <a href="https://github.com/ggml-org/llama.cpp">llama.cpp project</a> is designed for local LLM inference across a wide range of hardware, including CPU and hybrid CPU plus GPU setups.</p><div><hr></div><h4>What to buy</h4><ul><li><p>Used office tower with a 6-core or 8-core CPU.<br></p></li><li><p>32GB RAM minimum.<br></p></li><li><p>64GB RAM if affordable.<br></p></li><li><p>1TB NVMe or SATA SSD.<br></p></li><li><p>Case and PSU that leave room for a future GPU.</p></li></ul><div><hr></div><h4><em><strong>More on CPUs for local AI:</strong></em></h4><p><a href="https://www.popularai.org/p/best-cpu-for-running-local-llms-top">The best CPU for running local LLMs: top AMD vs Intel processors ranked</a></p><div><hr></div><h4>What to avoid</h4><ul><li><p>Do not spend $900 on a CPU-only local LLM PC if a used GPU build is available.<br></p></li><li><p>Do not buy a small-form-factor office PC unless you know it can accept the GPU you may want later.<br></p></li><li><p>Do not mistake NPU marketing for local LLM capability.<br></p></li></ul><div class="callout-block" data-callout="true"><h4>Verdict</h4><p>Use CPU-only if it is a cheap stepping stone or a privacy-first fallback. 
Avoid it as the main local LLM workstation if you want speed.</p></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NX1m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c4eeb6-5f41-4223-afd0-0abdf67ac340_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NX1m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c4eeb6-5f41-4223-afd0-0abdf67ac340_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!NX1m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c4eeb6-5f41-4223-afd0-0abdf67ac340_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!NX1m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c4eeb6-5f41-4223-afd0-0abdf67ac340_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!NX1m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c4eeb6-5f41-4223-afd0-0abdf67ac340_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NX1m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c4eeb6-5f41-4223-afd0-0abdf67ac340_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03c4eeb6-5f41-4223-afd0-0abdf67ac340_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1974886,&quot;alt&quot;:&quot;A budget local LLM PC lives or dies by the GPU: VRAM matters more than flashy &#8220;AI PC&#8221; branding.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.popularai.org/i/198385359?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c4eeb6-5f41-4223-afd0-0abdf67ac340_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A budget local LLM PC lives or dies by the GPU: VRAM matters more than flashy &#8220;AI PC&#8221; branding." title="A budget local LLM PC lives or dies by the GPU: VRAM matters more than flashy &#8220;AI PC&#8221; branding." 
srcset="https://substackcdn.com/image/fetch/$s_!NX1m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c4eeb6-5f41-4223-afd0-0abdf67ac340_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!NX1m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c4eeb6-5f41-4223-afd0-0abdf67ac340_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!NX1m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c4eeb6-5f41-4223-afd0-0abdf67ac340_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!NX1m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03c4eeb6-5f41-4223-afd0-0abdf67ac340_1672x941.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">The RTX 3090, RTX 3060 12GB, and modern 16GB cards define the real sub-$1,000 local AI tradeoff: more VRAM, lower cost, or less used-hardware risk. &#169; Popular AI</figcaption></figure></div><h3>Recommended parts strategy by budget</h3><h4>Around $500</h4><p>Buy used.</p><p>Best path:</p><ul><li><p>Used office tower.<br></p></li><li><p>32GB RAM.<br></p></li><li><p>1TB SSD.<br></p></li><li><p>No GPU, or a used <a href="https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20">RTX 3060 12GB</a> if you find a real deal.<br></p></li></ul><p>This is a learning machine. It is not a serious long-term local AI workstation.</p><div><hr></div><h4>Around $750</h4><p>Best path:</p><ul><li><p><a href="https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20">RTX 3060 12GB</a>.<br></p></li><li><p><a href="https://www.amazon.com/AMD-5600-12-Thread-Unlocked-Processor/dp/B09VCHR1VH?tag=popularai-20">Ryzen 5 5600</a> or used equivalent.<br></p></li><li><p>32GB RAM.<br></p></li><li><p>1TB SSD.<br></p></li><li><p>Quality 550W to 650W PSU.<br></p></li></ul><p>This is the most realistic cheap local LLM PC.</p><div><hr></div><h4>Around $1,000</h4><p>Best path:</p><ul><li><p>Used <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> if the rest of the build is cheap.<br></p></li><li><p><a href="https://www.amazon.com/s?k=RTX+5060+Ti+16GB&amp;tag=popularai-20">RTX 5060 Ti 16GB</a> if new-card pricing is reasonable.<br></p></li><li><p><a href="https://www.amazon.com/s?k=RTX+4060+Ti+16GB&amp;tag=popularai-20">RTX 4060 Ti 16GB</a> if heavily discounted.<br></p></li></ul><p>At this tier, avoid RGB spending, 
premium motherboards, liquid cooling, oversized CPUs, and &#8220;AI PC&#8221; branding.</p><h3>What to avoid buying</h3><p>Avoid an 8GB GPU as the centerpiece. An 8GB card can run small models, but it is a weak 2026 local LLM build target unless the price is extremely low. You will hit the VRAM wall too quickly.</p><p>Avoid a new <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> at inflated third-party prices. The <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> makes sense as a used-value card. It loses its charm when priced like a collector&#8217;s item.</p><p>Avoid tiny office PCs for full-size GPU builds. Small-form-factor Dell, HP, and Lenovo office machines can be great cheap desktops, but many cannot fit a full-size GPU or a proper PSU.</p><p>Avoid overbuying the CPU. A 12-core or 16-core processor looks impressive in a parts list, but it will not make up for too little VRAM.</p><p>Avoid buying an &#8220;AI PC&#8221; purely because of the NPU. A 40+ TOPS NPU can matter for Windows AI features, but it does not give you 12GB, 16GB, or 24GB of GPU VRAM for local LLMs.</p><h3>Windows or Linux for a local LLM PC?</h3><p>Use Windows if you want the easiest desktop experience with LM Studio, NVIDIA drivers, and general software compatibility. This is usually the better starting point for first-time builders who want to test models quickly.</p><p>Use Linux if the machine will become a dedicated local AI box for Ollama, Open WebUI, Docker, SSH access, and server-style workflows. Linux can feel cleaner once the hardware is stable and the machine has one job.</p><p>The practical path is simple. Start on Windows if you are learning. Move to Linux if the machine becomes a dedicated local AI server. Use one clean OS install. 
Avoid turning the first local LLM build into a triple-boot science project.</p><h3>Best software stack for this PC</h3><h4>Beginner stack</h4><ul><li><p>LM Studio.<br></p></li><li><p>Ollama.<br></p></li><li><p>Open WebUI later.<br></p></li></ul><p>This is the easiest route for testing models and learning what your hardware can handle.</p><div><hr></div><h4>Practical local server stack</h4><ul><li><p>Linux.<br></p></li><li><p>Ollama.<br></p></li><li><p>Open WebUI.<br></p></li><li><p>Optional <code>llama.cpp</code>.<br></p></li></ul><p>This works well if the machine will sit on your network and serve other devices.</p><div><hr></div><h4>Power-user stack</h4><ul><li><p>Linux.<br></p></li><li><p><code>llama.cpp</code>.<br></p></li><li><p>vLLM for supported models and more advanced serving.<br></p></li><li><p>Docker where it actually helps.<br></p></li><li><p>Manual model management.<br></p></li></ul><p>This path is stronger, but it is not where most first-time budget builders should start.</p><div><hr></div><h3>FAQ</h3><h3>Can you build a good local LLM PC under $1,000 in 2026?</h3><blockquote><p>Yes, but the word &#8220;good&#8221; needs discipline. A used <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> build can be excellent for the money if you buy carefully and keep the rest of the parts cheap. 
A new-parts build under $1,000 is more likely to land on an <a href="https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20">RTX 3060 12GB</a> or a 16GB midrange GPU.</p></blockquote><h3>Is a used <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> still worth it for local LLMs?</h3><blockquote><p>Yes, if the price is right and you accept used-card risk. The 24GB VRAM is the reason to buy it. Heat, power draw, age, and seller risk are the reasons to inspect carefully.</p></blockquote><h3>Is 12GB VRAM enough for local LLMs?</h3><blockquote><p>It is enough to start. An <a href="https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20">RTX 3060 12GB</a> can run useful smaller quantized models, but it is not ideal for larger models, long context, or heavier coding workflows.</p></blockquote><h3>Is 16GB VRAM enough for local AI?</h3><blockquote><p>16GB is a strong middle ground. It is more comfortable than 12GB, easier to cool than a 3090, and often available in newer cards. It still does not give the same headroom as 24GB.</p></blockquote><h3>How much RAM should a local LLM PC have?</h3><blockquote><p>32GB is the minimum practical target for a budget build. 64GB is better, but 2026 RAM pricing can make that painful. Leave room to upgrade if the budget forces a 32GB starting point.</p></blockquote><h3>Should you buy AMD or Intel GPUs for a budget local LLM PC?</h3><blockquote><p>For most first-time local LLM builders, NVIDIA is still the safer choice because CUDA support is widely assumed. AMD and Intel can work in some setups, but the friction is higher.</p></blockquote><h3>Is a Copilot+ PC good for local LLMs?</h3><blockquote><p>A Copilot+ PC is built around Windows AI features and a 40+ TOPS NPU requirement. 
That does not replace the GPU VRAM needed for serious local LLM work.</p></blockquote><h3>Should you buy a prebuilt PC instead?</h3><blockquote><p>Only if the price is close to the cost of parts and the GPU has enough VRAM. Many prebuilts under $1,000 use 8GB GPUs, weak power supplies, cramped cases, or proprietary parts that make upgrades annoying.</p></blockquote><div class="callout-block" data-callout="true"><h3>Final recommendation</h3><p>The best budget local LLM PC under $1,000 in 2026 comes down to three realistic choices.</p><p>Buy the used <a href="https://www.amazon.com/s?k=RTX+3090+24GB&amp;tag=popularai-20">RTX 3090</a> build if you want the most local AI capability near the budget limit and can handle used hardware risk.</p><p>Buy the <a href="https://www.amazon.com/s?k=RTX+3060+12GB&amp;tag=popularai-20">RTX 3060 12GB</a> build if you want the cheapest sane local LLM desktop and are comfortable with smaller models.</p><p>Buy the <a href="https://www.amazon.com/s?k=RTX+5060+Ti+16GB&amp;tag=popularai-20">RTX 5060 Ti 16GB</a> or <a href="https://www.amazon.com/s?k=RTX+4060+Ti+16GB&amp;tag=popularai-20">RTX 4060 Ti 16GB</a> build if you want a newer, lower-power machine and the card is priced well.</p><p>Skip 8GB GPU builds, overbuilt CPUs, tiny office PCs with no GPU path, and NPU-branded &#8220;AI PCs&#8221; sold as if they solve local LLM hardware reality.</p><p>For local AI, the old rule still holds: fit comes first. Speed is nice. 
VRAM decides whether the model runs comfortably at all.</p></div>]]></content:encoded></item></channel></rss>