Welcome back
Curated from 200+ sources across AI & machine learning

Andreessen Horowitz is set to co-lead the round, with Nvidia and Thrive Capital also expected to participate


So I've been putting this off for months because every tutorial made it sound like you need a PhD and a startup budget to even begin. Turns out that's bullshit. Started yesterday at 2pm with literally just OpenAI's API and a Python script. No frameworks, no fancy vector databases, just me trying to make something that could answer questions about my company's support docs.

First attempt was embarrassing. The thing would confidently tell customers we sold motorcycles (we don't, we make accounting software). But I kept going. By 9pm I had something that actually worked. Like, genuinely helpful responses that pulled the right info from our knowledge base.

The secret wasn't some complex architecture, it was just understanding the basic flow. You feed the user question to a search function that finds relevant docs. Those docs get stuffed into a prompt with the original question. Send it all to GPT. Done. Obviously this is the kiddie pool version and I'm already hitting walls (the thin
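The flow described above can be sketched in a few lines. Everything here is illustrative: the doc list and the keyword-overlap "search" are stand-ins for a real knowledge base, and in practice the assembled prompt would be sent to the OpenAI chat API rather than just built.

```python
import re

DOCS = [
    "Invoices can be exported as CSV from Settings > Billing.",
    "Password resets are emailed within five minutes.",
    "Our accounting software supports multi-currency ledgers.",
]

def tokens(text):
    # Lowercase word tokens, punctuation stripped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search(question, docs, top_k=2):
    # Rank docs by naive keyword overlap with the question.
    q = tokens(question)
    return sorted(docs, key=lambda d: -len(q & tokens(d)))[:top_k]

def build_prompt(question, docs):
    # Stuff the retrieved docs into a prompt alongside the question.
    context = "\n".join(f"- {d}" for d in search(question, docs))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {question}")

prompt = build_prompt("How do I export invoices?", DOCS)
# prompt now carries the invoice doc plus the question; send it to GPT.
```

Swapping the keyword overlap for embedding similarity is the usual next step, but the overall question → search → prompt → model shape stays the same.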
Hi! I just finished building a workstation specifically for local inference and wanted to get your thoughts on my setup and model recommendations.

• GPU: AMD Radeon AI PRO R9700 (32GB GDDR6 VRAM)
• CPU: AMD Ryzen 7 9700X
• RAM: 64GB DDR5
• OS: Fedora Workstation
• Software: LM Studio (Vulkan backend), wanna test LLAMA
• Performance: Currently hitting a steady ~120 tok/s on simple prompts (qwen3.6-35b-a3b)

What is the largest model architecture you recommend running comfortably? Should I be focusing on Q4_K_M quantizations?

submitted by /u/jsorres

Tech workers in China are being instructed by their bosses to train AI agents to replace them—and it’s prompting a wave of soul-searching among otherwise enthusiastic early adopters. Earlier this month a GitHub project called Colleague Skill, which claimed workers could use it to “distill” their colleagues’ skills and personality traits and replicate them with…

The Mac Mini is now all but out of stock because it is the most cost-effective machine for running locally hosted AI agents.

Article URL: https://tessl.io/blog/a-proposed-framework-for-evaluating-skills-research-eng-blog/ Comments URL: https://news.ycombinator.com/item?id=47832351 Points: 2 # Comments: 0
arXiv:2604.15646v1 Announce Type: new Abstract: Clinicians exploring oncology trial repositories often need ad-hoc, multi-constraint queries over biomarkers, endpoints, interventions, and time, yet writing SQL requires schema expertise. We demo FD-NL2SQL, a feedback-driven clinical NL2SQL assistant for SQLite-based oncology databases. Given a natural-language question, a schema-aware LLM decomposes it into predicate-level sub-questions, retrieves semantically similar expert-verified NL2SQL exemplars via sentence embeddings, and synthesizes executable SQL conditioned on the decomposition, retrieved exemplars, and schema, with post-processing validity checks. To improve with use, FD-NL2SQL incorporates two update signals: (i) clinician edits of generated SQL are approved and added to the exemplar bank; and (ii) lightweight logic-based SQL augmentation applies a single atomic mutation (e.g., operator or column change), retaining variants only if they return non-empty results. A second LL
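The augmentation signal described in the abstract, apply one atomic mutation to a verified query and retain the variant only if it still returns rows, can be sketched as follows. The toy schema and the particular mutation (swapping a comparison operator) are illustrative assumptions, not taken from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trials (id INTEGER, enrollment INTEGER);
    INSERT INTO trials VALUES (1, 50), (2, 200), (3, 400);
""")

def returns_rows(sql):
    # Non-empty-result check; invalid mutations are discarded too.
    try:
        return len(conn.execute(sql).fetchall()) > 0
    except sqlite3.Error:
        return False

verified = "SELECT id FROM trials WHERE enrollment > 100"
variant = verified.replace(">", "<", 1)  # single atomic operator mutation

exemplars = [verified]
if returns_rows(variant):
    exemplars.append(variant)  # retained: the mutated query still matches rows
```

In the system itself, retained variants would feed back into the exemplar bank alongside clinician-approved edits.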
For people just starting out in GPU kernel engineering or LLM inference (FlashAttention / FlashInfer / SGLang / vLLM style work), most job postings still list "C++17, CuTe, CUTLASS" as hard requirements. At the same time NVIDIA has been pushing CuTeDSL (the Python DSL in CUTLASS 4.x) hard since late 2025 as the new recommended path for new kernels — same performance, no template metaprogramming, JIT, much faster iteration, and direct TorchInductor integration. The shift feels real in FlashAttention-4, FlashInfer, and SGLang's NVIDIA collab roadmap.

Question for those already working in this space: For someone starting fresh in 2026, is it still worth going deep on legacy C++ CuTe/CUTLASS templates, or should they prioritize CuTeDSL → Triton → Mojo (and keep only light C++ for reading old code)? Is the "new stack" (CuTeDSL + Triton + Rust/Mojo for serving) actually production-viable right now, or are the job postings correct that you still need strong C++ CUTLASS skills to get hire

Article URL: https://github.com/moeen-mahmud/remen Comments URL: https://news.ycombinator.com/item?id=47825712 Points: 1 # Comments: 0
A couple of early-to-mid-stage startups I'm consulting with are asking the same question: their AI/ML team wants production Postgres data, and nobody's quite sure how to give it to them. I've handled this before for BI teams — read replica with a generous `max_standby_streaming_delay` and `hot_standby_feedback` on, accepting the occasional bloat on the primary. Worked fine. But the AI/ML ask feels different in ways I can't fully articulate yet, which is part of why I'm asking. A few things I'm trying to calibrate: Where does the agent actually connect? Primary with RLS, read replica, warehouse (Snowflake/BigQuery/Redshift), lakehouse (Iceberg/Delta on S3), or something else? If you're not doing this — is it compliance, cost fear, bad experiences (runaway queries, PII in prompts), or something else? And the one I'm most curious about: does this actually feel different from giving BI tools DB access, or is it the same problem wearing new clothes? Not looking for product recommendations.
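For reference, the read-replica approach mentioned above boils down to a couple of standby settings. These are real PostgreSQL parameters, but the values are illustrative starting points, not recommendations:

```ini
# postgresql.conf on the standby
hot_standby = on
hot_standby_feedback = on            # avoids query cancellations, at the cost
                                     # of some bloat on the primary (as noted)
max_standby_streaming_delay = 300s   # let long analytical queries finish
                                     # before replication conflicts cancel them
```

Whether the same trade-offs are acceptable for agentic workloads, whose query patterns are less predictable than BI dashboards, is exactly the open question in the post.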
The link will be in the comments. Plz give me advice and everything if anyone has experience with this. I am super excited to get into this world. idk if Friday is allowed, it's a total rip off but oh well lol

submitted by /u/Time_Appeal2458

Google's A2UI 0.9 is a framework-agnostic standard that lets AI agents generate UI elements on the fly, tapping into an app's existing components across web, mobile, and other platforms. The article Google launches generative UI standard for AI agents appeared first on The Decoder.

This week: AI-enabled market entry, vision intelligence, true chemical operations autonomy, LNG tankers, Gemini embodied reasoning, smaller/cheaper/recycled EVs

A research team developed an OpenClaw agent for smart glasses to find out how continuously perceiving AI changes the way people use agentic AI systems. The article Always-on Ray-Ban Meta glasses powered by OpenClaw speed up everyday tasks in new study appeared first on The Decoder.

Anthropic's Opus 4.7 matches its predecessor's per-token price, but each request ends up costing significantly more. The reason: a new tokenizer that breaks the same text into up to 47 percent more tokens. Early measurements show what that shift means in practice for Claude Code users. The article First token counts reveal Opus 4.7 costs significantly more than 4.6 despite Anthropic's flat pricing appeared first on The Decoder.
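The per-request arithmetic behind that claim is simple: with a flat per-token price, cost scales with token count. The 47 percent figure is from the article; the price and token counts below are made-up round numbers for illustration.

```python
price_per_token = 0.000015           # hypothetical flat per-token price
old_tokens = 1000                    # tokens the old tokenizer produced
new_tokens = int(old_tokens * 1.47)  # up to 47% more tokens for the same text

old_cost = old_tokens * price_per_token
new_cost = new_tokens * price_per_token
increase = new_cost / old_cost - 1   # ~0.47, i.e. ~47% more per request
```

In other words, "same price per token" and "same price per request" diverge exactly as fast as the tokenizer inflates the token count.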
Investing.com -- Alphabet’s Google is in talks with Marvell Technology to develop two new chips designed to run AI models more efficiently, The Information reported Sunday, sending the chipmaker’s shares higher in premarket trading today.
Talks signal push to reduce reliance on external chip suppliers

Article URL: https://gizmodo.com/salesforce-announces-huge-ai-initiative-and-calls-it-headless-360-2000748243 Comments URL: https://news.ycombinator.com/item?id=47829523 Points: 4 # Comments: 0

Mark Zuckerberg and Jack Dorsey have different visions for how to use AI for management purposes, but both imagine a system of heightened control.
Only counting those categorized as cs.LG. I'm sure there are multiple other subcategories with even more ML papers uploaded, such as cs.AI and math.OC. How are you keeping up with the research in this field?

submitted by /u/NeighborhoodFatCat

Article URL: https://9to5mac.com/2026/04/19/apple-local-ai-server-hosting-new-business-model/ Comments URL: https://news.ycombinator.com/item?id=47827682 Points: 2 # Comments: 0

On the latest episode of Equity, we discuss OpenAI's latest acquisitions and whether they address "two big existential problems" for the company.

A lot of AI startups exist partly because the foundation models haven't expanded into their category yet. As many jokingly acknowledge, that won't last forever.

Welcome back to TechCrunch Mobility, your hub for the future of transportation and now, more than ever, how AI is playing a part.

Vercel, a major development platform that hosts and deploys web apps, was compromised, and the hackers are attempting to sell stolen data. A person claiming to be a member of ShinyHunters, which was behind the recent hack of Rockstar Games, posted some data online, including employee names, email addresses, and activity timestamps. Vercel confirmed in a post on X that a "security incident" had occurred, and that it impacted a "limited subset" of its customers. Vercel said that a compromised third-party AI tool was the avenue for attack, though it did not specify which third party was involved. We've identified a security incident that inv … Read the full story at The Verge.

NVIDIA, ticker NasdaqGS:NVDA, has denied recent rumors that it is seeking a major acquisition of a PC manufacturer such as Dell or HP. The company continues to gain traction in AI infrastructure, with Foxconn citing strong demand tied to the NVIDIA ecosystem. NVIDIA has expanded its partnership with QNX to support safety-critical industrial edge AI systems across sectors like robotics and medical devices. NVIDIA shares most recently traded at $201.68, with NasdaqGS:NVDA up 6.9% over the...