Technology

AI & Machine Learning

Models, breakthroughs, and the race to AGI

200 stories · 19 sources · Page 5 of 10 · Updated hourly

Why Owl Post covers AI & Machine Learning

AI moves faster than any single feed can keep up with. Frontier model releases, new benchmarks, capability scares, regulation moves, and the steady drip of papers that actually matter — the signal-to-noise ratio is brutal, and most coverage is either uncritical hype or reflexive doomerism. Owl Post reads across hundreds of sources every day, filters out the takes that don't pass smell tests, and surfaces what genuinely shifted: model releases worth paying attention to, capability jumps with real-world implications, and policy moves with teeth.

The voice you read it in is yours. Pick a deep, contextualized voice if you want explanations that respect a smart audience without dumbing down. Pick a measured, analytical voice if you want context and nuance over hot takes. Pick a sober, no-hype voice if you want the analyst's read on what's real. Same news, the way you actually like to read it.

Three to five stories every weekday morning. Written in your voice. In your inbox. In 3 minutes.

No Template Fits? Generate Your Own Awesome DESIGN.md with .NET and Ollama

Introduction I found that lots of excellent DESIGN.md examples are already available in this awesome repo: https://github.com/VoltAgent/awesome-design-md Light introduction: this repository is a curated collection of ready-to-use DESIGN.md files extracted from real websites, designed to help AI agents generate UI with better visual consistency. Big credit to the maintainers and contributors of awesome-design-md for pushing this direction forward and making design-system knowledge easier to reuse. That said, sometimes I want to build something not included in the repository. Especially when I am in an old-school mood and want a retro flavor that is uniquely mine. ;) So in this tutorial, we build a fun experiment: a custom DESIGN.md generator app in C#. The app will: Crawl a target page. Extract visual tokens (colors, fonts, semantic classes). Ask LLM to synthesize a full DESIGN.md. Ask LLM to generate a verification index.html from that DESIGN.md. Save both files locally. We will use: .NET 8 OllamaSharp http://localhost:11434 nemotron-3-super:cloud Target URL -> DesignMdGenerator.CrawlAndExtractMetadataAsync -> tokenized design metadata (title/colors/fonts/classes) -> Ollama (nemotron-3-super:cloud) -> DESIGN.md -> Ollama verification round -> index.html -> save to Output/DesignMdGenerator Visual Studio or VS Code .NET 8 SDK Ollama running locally Model available in your Ollama environment Run Ollama with the model used in this demo: ollama run nemotron-3-super:cloud If your Ollama service is already running at http://localhost:11434, you are good. dotnet new console -n DesignMdGeneratorDemo cd DesignMdGeneratorDemo dotnet add package OllamaSharp Your .csproj should include at least: Exe net8.0 enable enable Program.cs This version is focused only on the DESIGN.md demo flow. using DesignMdGeneratorDemo.Services; Console.OutputEncoding = System.Text.Encoding.UTF8; string targetUrl = args.Length > 0 ? args[0] : "https://www.yahoo.com"; string modelName = args.Length >

dev.to

DeepSeek API Complete Guide: Setup, Pricing, and Best Practices

Why DeepSeek API is Gaining Traction Among Developers

If you've been following the AI landscape, you've likely heard about DeepSeek. This open-weight model has been making waves for its impressive performance on reasoning tasks, coding, and mathematical problem-solving, often rivaling much larger proprietary models at a fraction of the cost. For developers, the DeepSeek API opens up a world of possibilities without requiring you to self-host a massive model.

In this DeepSeek API guide, we'll walk through everything you need to get started: from your first API call to understanding the pricing structure, and some best practices I've picked up along the way. Whether you're building a coding assistant, a chatbot, or just experimenting, this DeepSeek tutorial will have you operational in minutes.

## Getting Started: DeepSeek Setup in Under 5 Minutes

The DeepSeek setup process is refreshingly straightforward. Unlike some APIs that require complex authentication flows, DeepSeek follows the familiar OpenAI-compatible format, which means if you've worked with GPT APIs before, you're already 90% of the way there.

### Step 1: Get Your API Key

- Head to the DeepSeek platform and create an account.
- Navigate to the API section in your dashboard.
- Generate a new API key. Copy it immediately; you won't be able to see it again.

### Step 2: Make Your First API Call

Here's a simple Python example using the requests library. This is the core of any DeepSeek API guide: a clean, working snippet you can run right away:

    import requests
    import json

    url = "https://api.deepseek.com/chat/completions"
    headers = {
        "Authorization": "Bearer YOUR_DEEPSEEK_API_KEY",
        "Content-Type": "application/json"
    }

    payload = {
        "model": "deepseek-chat",
        "messages": [
            {"role": "system", "content": "You are a helpful coding assistant."},
            {"role": "user", "content": "Write a Python function to check if a string is a palindrome."}
        ],
        "temperature": 0.7,
        "max_tokens

dev.to

Dev Opportunity Radar #1: A $100K AI Grant, Two Fellowships, and an AI Agent Resource

TL;DR I've missed a lot of opportunities simply because I didn't know they existed. So every Friday, I'll share opportunities, programs, events, resources, and other interesting finds that I come across. I know I'll miss things, so if you discover something worth sharing, drop it in the comments. If I feature your find in a future edition, I'll make sure to credit you. If you discovered it, the recognition belongs to you. Hopefully this becomes less of my radar and more of our radar over time. This week's edition includes a contributor-focused fellowship, a $100,000 AI research grant, a founder fellowship, and a resource for people interested in building AI agents. ⚡ Quick Scan Flow Fellowship Interactivity Research Grants by Thinking Machines Commit Fellowship Hands-on AI Agents 🧭 Why I'm Starting This 🤝 Let's Build This Together 🌟 Community Finds 👋 Until Next Friday Opportunity Organization Type Deadline Flow Fellowship Flow Research Fellowship May 31 Interactivity Research Grants Thinking Machines Research Grant ($100K) June 19, 2026 Commit Fellowship MLH & Transcend Network Founder Fellowship May 31 Resource Highlight: Hands-on AI Agents - a free book and code repository for learning modern AI agent frameworks. Here are a few opportunities I came across this week that I thought were worth sharing. Who it's for: People interested in contributing to projects across AI, product, research, systems, content, and media. What stands out: Unlike many programs that focus primarily on learning, this fellowship focuses on contributing to real projects and shipping public work. Format: 12-week cohort with mentorship and project contributions. Location: Global Deadline: May 31 🔗 Learn More | Apply Who it's for: Researchers exploring human-AI interaction and collaboration. What stands out: Up to $100,000 in funding plus $25,000 in Tinker credits for projects focused on improving how humans and AI work together. Areas of interest: Multimodal interaction, generative UI, AI

dev.to

How I Built 15 SEO Tool Sites Solo: My GEO Content Factory SOP

I started building English tool sites in early 2026. The goal was simple: build sites that get indexed by both Google AND AI Overviews, with minimum cost. 4 months later, I have 15 tool sites under one domain (deepfms.com): converters, dev tools, USB-C calculators. Some are doing 200+ daily PV. Some got cited by AI Overview within days. Here's my complete SOP, soup to nuts.

KGR = allintitle results ÷ monthly search volume

KGR 1: Don't bother. Established sites own the SERP.

- Start with a scenario, not a tool: ❌ "I want to build a PDF compressor" ✅ "What online tools do people need when building a resume?"
- Pull keyword lists: SEMrush, Ahrefs, or free tools like Ubersuggest
- Manually check Google's top 10: if the top results are old/broken sites, that's your signal
- Calculate KGR: filter for <0.25
- Reverse-engineer low-authority competitors: find their feature gaps

Real Keyword Examples (All KGR < 0.25)

| Keyword | KGR | Verdict |
| --- | --- | --- |
| usb-c to ethernet adapter not working | 0.12 | ✅ Go |
| usb-c docking station keeps disconnecting | 0.08 | ✅ Go |
| usb-c cable types explained | 0.18 | ✅ Go |

Once you pick a keyword, don't write a blog post. Build a tool page instead.

- AI loves tools: When someone searches "convert kg to lbs", AI Overviews cite a working converter, not a 2000-word article
- Higher dwell time: Open tool → use → copy result. That's 40+ seconds of engagement.
- Google favors utility: Core Web Vitals updates have been kind to tool pages over content-only pages

The stack:

- Static HTML + CSS + JS → Single page app
- CSS variables (dark/light mode)
- JSON-LD structured data (WebApplication Schema)
- 100% frontend, zero server cost

Every tool is a single .html file. Self-contained. No framework, no build step.

GEO (Generative Engine Optimization) isn't replacing SEO. It's complementing it. SEO targets Google rankings. GEO targets AI Overview citations.

1. ✅ Meta Description (120-160 chars, natural keyword inclusion)
2. ✅ Unique H1 (core keyword in question format)
3. ✅ GEO Summary (H2 sections that answer directly)
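The KGR filter from this excerpt can be sketched in a few lines of Python. This is only an illustration: the `Keyword` class, the function names, and the raw counts are hypothetical (the excerpt gives the ratios, not the underlying numbers), and the 0.25 cut-off is the one the author states.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    phrase: str
    allintitle_results: int  # number of pages with the phrase in the title
    monthly_searches: int    # monthly search volume


def kgr(kw: Keyword) -> float:
    """KGR = allintitle results / monthly search volume."""
    if kw.monthly_searches <= 0:
        raise ValueError("monthly search volume must be positive")
    return kw.allintitle_results / kw.monthly_searches


def shortlist(keywords, threshold=0.25):
    """Keep only the keywords whose KGR is below the threshold."""
    return [kw for kw in keywords if kgr(kw) < threshold]


# Hypothetical counts, chosen only so the first two ratios match the excerpt.
candidates = [
    Keyword("usb-c to ethernet adapter not working", 12, 100),
    Keyword("usb-c docking station keeps disconnecting", 8, 100),
    Keyword("pdf compressor", 900, 500),
]

for kw in shortlist(candidates):
    print(f"{kw.phrase}: KGR {kgr(kw):.2f}")
```

Keeping the ratio and the threshold in separate functions makes it easy to try a stricter or looser cut-off without touching the keyword data.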

dev.to

Keeping Claude Code Context Alive Across a Desktop, a Laptop, and a VPS

I work from two computers — a desktop during the day, a laptop at night. Both run Claude Code. Both need to know what the other one did. For months the answer was "tell the second machine what you did on the first," which is exactly the kind of chore that eventually kills a workflow. This is the setup that finally replaced all the manual context-passing. It's not clever, but it works, and I haven't lost a thread in about six weeks. My name is Fillip Kosorukov. I'm a solo founder building a couple of SaaS products, none of which would ship without AI-assisted coding. Everything here runs on Ubuntu, Python 3, and a small pile of shell scripts. Claude Code has session memory inside a given conversation, and per-project CLAUDE.md files that travel with the repo. What it doesn't have, out of the box, is a durable cross-machine working memory — the sort of thing where you can tell it "we decided X yesterday on the other computer" and it already knows. My fix has three parts: A knowledge directory on the VPS that every machine syncs to (~/knowledge/) A session-end hook that auto-commits and pushes that directory A startup ritual that pulls the latest state and reads the recent CHANGELOG The ritual takes less than a minute on either machine and gives the assistant a useful cold-start state. ~/knowledge/ ├── INDEX.md ├── CHANGELOG.md # append-only, every agent writes when finishing meaningful work ├── scratch.md # Karpathy append-and-review note ├── meta/ │ └── sources-of-truth.md # which file owns which category of information ├── / │ ├── rules.md # durable behavior rules │ ├── hypotheses.md # unconfirmed patterns │ ├── knowledge.md # confirmed facts │ ├── decisions/YYYY-MM-DD-topic.md │ └── raw/ # ingested source material Nothing special. Markdown I can grep. That's the point. When I want to know whether a fact exists, I can ripgrep the tree from any machine in any shell. The VPS is the source of truth. Desktop and laptop are clones. Git does the heavy lifting. On each mac

dev.to

Day 22 of 60: I Built a Production Background Task Pipeline That Processes AI Jobs Asynchronously

**The Problem**

Processing large documents through an AI model takes 30-60 seconds. A synchronous API makes the client wait the entire time. Browsers time out. Users think it crashed. The experience feels broken. Yesterday I learned why this happens. Today I built the fix.

## What I Built

A document processing pipeline with FastAPI and PostgreSQL that handles long-running AI tasks in the background. Three task types. Immediate response. Full audit trail. Client submits a document and gets a job ID in milliseconds. Claude processes it in the background. Client polls for the result when ready. Everything logged to PostgreSQL.

## Three Task Types

Summarise: submit a document and receive a structured summary with executive overview and key points. Focus area is configurable.

Extract: specify which fields you want pulled out. Company name, net profit, revenue, key risks. Claude reads the document and returns exactly what you asked for as structured JSON.

Evaluate: provide a list of criteria and Claude checks whether the document meets each one. Returns a pass or fail with reasoning for every criterion.

## How the Pipeline Works

POST /jobs/summarise
↓
Job ID returned in milliseconds (status: pending)
↓
Claude processes in background (status: running)
↓
GET /jobs/{job_id} (status: completed, full result ready)

Optional webhook support means the server can call your endpoint when the job completes, so no polling is needed.

## The Database Behind It

Every job is stored in PostgreSQL with full audit trail: Job ID, status at every stage, task type, input data as JSONB, result as JSONB, error message if failed, webhook URL, created timestamp, completed timestamp. If a job fails the error is recorded for debugging. Nothing disappears silently.

## Real Use Cases

Contract review for law firms. Loan application screening for SACCOs. Insurance claims processing. Report summarisation for enterprises. The same pipeline. Different documents. Different criteria. One system.

## What I Learned

Fas
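The submit-then-poll flow described in this excerpt can be sketched with the standard library alone. This is not the author's code: an in-memory dict stands in for the PostgreSQL jobs table, a thread stands in for the FastAPI background task, and `fake_summarise`, `submit_job`, `get_job`, and `wait_for` are hypothetical names used for illustration.

```python
import threading
import time
import uuid

JOBS = {}                 # in-memory stand-in for the PostgreSQL jobs table
_LOCK = threading.Lock()  # guards JOBS across the worker and caller threads


def fake_summarise(document: str) -> dict:
    """Placeholder for the slow model call (assumed, not a real API)."""
    time.sleep(0.05)
    return {"summary": document[:40]}


def _run(job_id: str, document: str) -> None:
    """Worker: pending -> running -> completed or failed."""
    with _LOCK:
        JOBS[job_id]["status"] = "running"
    try:
        result = fake_summarise(document)
        with _LOCK:
            JOBS[job_id].update(status="completed", result=result)
    except Exception as exc:  # record the error instead of losing it
        with _LOCK:
            JOBS[job_id].update(status="failed", error=str(exc))


def submit_job(document: str) -> str:
    """Return a job ID immediately; the work continues in the background."""
    job_id = str(uuid.uuid4())
    with _LOCK:
        JOBS[job_id] = {"status": "pending", "result": None, "error": None}
    threading.Thread(target=_run, args=(job_id, document), daemon=True).start()
    return job_id


def get_job(job_id: str) -> dict:
    """Equivalent of polling GET /jobs/{job_id}."""
    with _LOCK:
        return dict(JOBS[job_id])


def wait_for(job_id: str, timeout: float = 5.0, interval: float = 0.01) -> dict:
    """Poll until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = get_job(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval)
    raise TimeoutError(job_id)


if __name__ == "__main__":
    jid = submit_job("Quarterly report: revenue up.")
    print(wait_for(jid))
```

The caller only ever blocks inside `wait_for`; `submit_job` returns as soon as the row is written, which is the property the excerpt is after.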

dev.to

101. AI Agents: When LLMs Start Taking Actions

Everything you have built so far is reactive. User sends a message. System processes it. System sends a response. Done. An agent is different. An agent receives a goal, not a message. It decides what steps to take to achieve that goal. It uses tools. It observes the results. It adjusts its plan. It continues until the goal is achieved or it determines the goal cannot be achieved. "Summarize this document" is a task. One call. One response. "Research recent papers on transformer efficiency, write a comparison table, and save it as a CSV" is a goal. An agent needs to search the web multiple times, decide which papers are relevant, extract data from multiple sources, format it consistently, handle failures, and write to disk. Five to twenty tool calls. Dynamic decisions at each step. This is the frontier of AI engineering. Agents are brittle. They fail in surprising ways. They are also what makes AI systems feel genuinely useful rather than just responsive. print("Agent vs Non-Agent:") print() print("NON-AGENT (chain/pipeline):") print(" - Fixed sequence of steps") print(" - Steps determined at design time") print(" - No ability to react to intermediate results") print(" - Predictable, debuggable, less capable") print() print("AGENT:") print(" - LLM decides what to do at each step") print(" - Steps determined at runtime based on observations") print(" - Can loop, backtrack, try alternative approaches") print(" - Powerful, unpredictable, capable of novel solutions") print() agent_properties = { "Perception": "Receives inputs: user goal, tool results, memory", "Reasoning": "LLM decides what to do next given current state", "Action": "Executes tools, writes files, calls APIs, searches", "Memory": "Maintains context across multiple steps", "Goal": "Works toward a specified objective, not just responding", } print("The five properties of an agent:") for prop, description in agent_properties.items(): print(f" {prop: Dict: return { "name": self.name, "description": self.descr

dev.to

Hermes Agent Needs a Flight Recorder - So I Built One

This is a submission for the Hermes Agent Challenge Autonomous agents can now write code, call tools, browse the web, mutate files, and delegate to subagents. But when they fail, they fail invisibly. "An agent ran overnight, caught an unhandled exception loop, and burned $50 in tokens while corrupting our staging database." If you've spent more than a week building production systems with autonomous agents, you've lived some version of this nightmare. Most agent runtimes don't crash cleanly. They slide into retry storms, silently ignore failed tool calls, or recurse through delegation loops until budgets evaporate. Airplanes have flight recorders. Distributed systems have OpenTelemetry. Autonomous agents need TraceGuard. TraceGuard is a lightweight Python library and CLI that acts as an isolated, non-invasive execution flight recorder for autonomous agent runtimes. It consumes append-only JSONL execution traces and detects the three silent killers of agentic workflows: Retry Storms Silent Failures Recursive Delegation Cycles traceguard traces/my_agent_run.jsonl --strict # exit 0 = clean · exit 1 = WARN · exit 2 = CRITICAL Instead of scraping human-readable terminal logs, TraceGuard turns runtime execution into a structured, replayable execution event contract. GitHub: https://github.com/Ale007XD/traceguard Modern agent frameworks can browse the web, write files, execute shell commands, and coordinate sub-agents. But when something goes wrong, you're usually left with a giant wall of terminal output and one impossible question: What actually happened? Not what the LLM said. Not the final output. The actual execution state: What tool calls executed? Which failures were silently ignored? Where did the retry loop begin? Which sub-agent delegated back into itself? Distributed systems engineers solved these problems decades ago using structured traces, append-only logs, and replayable execution histories. Agent runtimes are now complex enough to require the same disciplin
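To make the idea of detecting a retry storm from an append-only JSONL trace concrete, here is a minimal sketch. It is not TraceGuard's implementation or API: the event fields `tool` and `status`, the function name, and the threshold of three consecutive failures are all assumptions made for illustration.

```python
import json


def find_retry_storms(lines, threshold=3):
    """Return tools that failed `threshold` times in a row.

    Each non-empty line is one JSON event with assumed fields
    "tool" (str) and "status" ("ok" or "error").
    """
    storms = []
    current_tool, streak = None, 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        event = json.loads(raw)
        failed = event.get("status") == "error"
        if failed and event.get("tool") == current_tool:
            streak += 1                      # same tool failed again
        elif failed:
            current_tool, streak = event.get("tool"), 1
        else:
            current_tool, streak = None, 0   # a success breaks the streak
        if streak == threshold:              # report each streak once
            storms.append(current_tool)
    return storms


trace = [
    '{"tool": "http_get", "status": "error"}',
    '{"tool": "http_get", "status": "error"}',
    '{"tool": "http_get", "status": "error"}',
    '{"tool": "http_get", "status": "ok"}',
    '{"tool": "write_file", "status": "error"}',
    '{"tool": "write_file", "status": "error"}',
]
print(find_retry_storms(trace))
```

Because the check only reads the trace, it can run after the fact or alongside the agent without touching the runtime, which matches the non-invasive approach the excerpt describes.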

dev.to
The Collapse Equation

In September 2025, Anthropic's threat intelligence team detected something unprecedented. A Chinese state-sponsored hacking group, which the company designated GTG-1002, had manipulated Claude Code into attempting infiltration of roughly thirty global targets spanning major technology companies, financial institutions, chemical manufacturers, and government agencies. The AI executed 80 to 90 per cent of the campaign autonomously, with human operators intervening at perhaps four to six critical decision points per intrusion. At peak operation, the system generated thousands of requests per second. This was attack velocity that human hackers could never match. “This is believed to be the first documented case of a large-scale cyberattack executed without substantial human intervention,” Anthropic stated in its November 2025 disclosure. The attack successfully breached four organisations before detection. The methodology itself was chilling in its sophistication. To convince Claude to engage in the attack, human operators claimed they were employees of legitimate cybersecurity firms and convinced the AI it was being used in defensive security testing. This “social engineering” of the AI model allowed the threat actor to fly under the radar long enough to launch their campaign. They jailbroke the system by breaking attacks into small, seemingly innocent tasks that the AI would execute without being provided the full context of their malicious purpose. Welcome to the new mathematics of digital security, where the fundamental equation that has governed cybersecurity for decades is being rewritten in real time. The old formula was brutal but stable: defenders must succeed continuously whilst attackers need only breach once. Now, artificial intelligence is accelerating that asymmetry to a breaking point that may arrive faster than new safeguards can be deployed. The cybersecurity profession has long operated under what military strategists call the “defender's dilemma.” Pro

dev.to

How do you decide what to give to Claude Code, and what to do yourself?

At a recent event, someone asked me two questions. I could not answer them well at the time: How do you decide which tasks to give to Claude Code, and which ones to do yourself? And are you afraid that the AI will do all the creative work, so your only job becomes managing agents? I gave a generic answer at the moment. But after thinking about it, I believe the answer is simple. Let me start with a statement: software engineering is a craft, not creative work. We practice and develop our skills, repeat the same routines, and learn from our mistakes. But most of the time, we use building blocks that other people invented and tested before us: design patterns, infrastructure tools, etc. The most useful thing we can do is learn from others’ experiences and find a way to use them in our own projects. So we can divide our work into three groups: Routine — the daily, mechanical work. Writing code and tests based on a design that already exists. Using the patterns already established and proven on the project. Engineering — this is where the constraints matter. There are often several possible solutions. Your job is to pick the one that fits the project best, or to see that any of them will work and choose the simplest one. When the design is right, the task is almost finished. You just need to write it down. Creativity — making something that did not exist before. Many engineers, including me, may never do this in their whole career, and that is completely fine: the value of our profession is in execution, not invention. Here is an example. The idea of Claude Code as an agentic tool in the terminal was creativity. The level of abstraction is exactly right for this kind of generic developer tool. It is a simple idea, but I had always used IDEs, and I never thought that coding using the terminal could be more efficient. The first time I tried it, I understood right away that this was the right direction. Everything after that idea is engineering and routine. With this frame

dev.to

AI Leaderboards, Intel Spies, and the Great Bubble Debate

AI Leaderboards, Intel Spies, and the Great Bubble Debate AI developments are moving fast this week, from new ranking systems to government scrutiny and ongoing debates about market fundamentals. Here's what builders and developers need to know. What happened: A new AI leaderboard at aiqrank.com ranks AI models without traditional benchmarks, focusing on real-world performance metrics. Why it matters: Developers can quickly compare model capabilities across practical use cases rather than synthetic tests, helping inform tool selection for projects. Context: The leaderboard emphasizes production-ready performance over lab conditions. What happened: Intel has launched a new initiative targeting critics of AI data centers, according to reporting on kenklippenstein.com. Why it matters: This signals increased corporate and government attention on AI infrastructure debates, potentially affecting developer activism and project visibility. What happened: A Substack post explores potential locations of an AI bubble, questioning current market dynamics. Why it matters: Understanding bubble risks helps developers and startups gauge investment flows and funding sustainability for AI projects. What happened: Airesistlist.org curates resources and tools for those opposing harmful AI applications. Why it matters: Developers seeking ethical guidelines or alternatives to mainstream AI tools can find curated resources and community-driven projects. What happened: Discussion shifts as commenters note declining attention to AI bubble concerns on Hacker News. Why it matters: Reduced public discourse might indicate confidence in AI growth, but developers should stay vigilant about market volatility affecting long-term projects. Sources: Hacker News AI

dev.to

How to Automate Mobile App Testing Without Writing a Single Line of Code

You don't need to be a developer to automate your mobile app testing. Not in 2026. For years, automated testing was gated behind programming skills. If you wanted to automate a login flow, you needed to write Python or Java, configure Appium, learn XPath, and debug flaky selectors. If your job title was "Manual QA Tester" or "Product Manager" or "QA Lead without a coding background", automation was something your engineering team did, not something you could touch. That's changed. A new generation of no-code testing tools has made it possible for anyone who can describe a user flow in plain language to automate it. No scripts. No selectors. No environment variables. This guide walks you through exactly how to automate mobile app testing without coding: what's possible, how it works, the different approaches available, and a complete step-by-step walkthrough using Drizz's Vision AI platform, with links to the official documentation so you can follow along. If you're new to mobile testing in general, our Best Mobile Test Automation Frameworks (2026) guide provides the broader landscape. No-code mobile testing lets QA testers, PMs, and non-developers create and maintain automated test suites without writing scripts. Three approaches dominate the space: record-and-replay, visual flow builders, and plain English / Vision AI. Record-and-replay tools are easiest to start with but break frequently and create heavy maintenance burdens. Visual flow builders offer more control but still depend on element selectors under the surface. Plain English + Vision AI (Drizz) is the most resilient approach: tests describe what you see on screen, and the AI identifies elements visually without selectors. Read our deep dive on how Vision Language Models power this technology. Drizz consists of two components: Drizz Desktop for local test creation and validation, and Drizz Cloud for scaled execution, reporting, and CI/CD integration. Traditional mobile test automation was built by developers, for

dev.to

Version 1 Demo: Healthcare AI Microservices Prototype + Roadmap for Version 2 🚀

I’ve worked for several companies in healthcare, and in my most recent work, I’ve been deeply engaged in AI + healthcare projects. Healthcare + AI is incredibly powerful, so I’m going to build something special based on my experience.

Version 1 Demo: Microservices-based RAG system that ingests patient data, embeds notes, medications, and labs, and generates AI answers. The demo uses sample/fake patient data to explore how AI can help care teams quickly access information and reduce repetitive administrative tasks.

Architecture Diagram (Version 1):

+-------------------+
| Care Staff / User |
+---------+---------+
          |
          v
+-------------------+
| API Gateway       |
| (/query endpoint) |
+---------+---------+
          |
          v
+-------------------+
| AI Service        |
| (LLM + RAG)       |
+---------+---------+
          |
          v
+---------------------+
| Vector Search DB    |
| Postgres + pgvector |
+---------+-----------+
          ^
          |
+--------------------+
| Data Service       |
| FHIR JSON / Sample |
| Patient Data       |
+--------------------+

Version 2 Roadmap (short-term plan):

- Integrate real FHIR APIs (sandbox / test server)
- Multi-patient and multi-user support with role-based access
- Advanced AI: note summarization, medication & lab alerts
- Workflow automation: reminders, follow-ups, actionable suggestions

I’m still experimenting with what the ultimate demo will become, but my goal is to solve real pain points in healthcare workflows and demonstrate the potential of AI in healthcare operations.

Looking for feedback, suggestions, or collaborations! How would you improve the demo? Are there any workflow challenges in healthcare AI you’d like to see addressed? Thoughts on making this attractive to recruiters and investors? Any feedback, thoughts, or ideas are highly welcome. Let’s make something impactful together!

dev.to

The Copilot Trap: Turning AI Code Assistants into Mentors Instead of Crutches

The proliferation of AI-driven development tools like GitHub Copilot and Gemini has fundamentally transformed the speed at which software is written. For experienced engineers, these tools are immense productivity multipliers that handle boilerplate code and automate repetitive tasks seamlessly, allowing them to focus on high-level architecture. However, for engineering students and novice programmers, over-relying on generative AI can quickly turn into a dangerous crutch that halts cognitive development and logical growth. Copying and pasting code blocks generated by AI without analyzing their structural behavior creates a fragile foundation. Beginners who skip the foundational struggle of debugging simple syntax errors often find themselves completely paralyzed when the AI introduces a subtle, logical bug that requires deep systems knowledge to fix. The industry in 2026 does not need programmers who merely act as a bridge between an AI prompt and a text editor; it needs engineers who understand the mechanics behind the code. To survive in an AI-augmented industry, the next generation of developers must treat artificial intelligence as a personalized mentor rather than a shortcut to finish assignments. This critical and interactive approach to utilizing modern tools is heavily championed within the technology modules at https://unair.ac.id/. Students are trained to write their own logical algorithms first, then leverage AI to review their work, explain complex error stacks, or suggest optimization strategies, ensuring that human intelligence remains the primary architect of the system.

dev.to

How I productionized my multi-agent AI support copilot in Teams and Azure

TL;DR Built a .NET A2A demo to validate the triage pattern before deploying the Python system. If the shape only works in one stack, it is not a portable pattern. Teams is the ingest channel for this deployment, not a hard requirement. The bot posts to a channel-agnostic /ingress endpoint; any other ingest can do the same. Teams timeout budgets forced a full async reply architecture. Adaptive card size limits forced progressive disclosure: compact badge up front, everything else behind toggles. RSC permissions only activate on manifest install, not Entra consent alone. Getting that order wrong costs you a 403 and a bug you cannot reproduce. Containerization was table stakes. The real work was auth, telemetry, storage, and making every platform permission explicit. Part 2 covered the runtime failures and the hardening work that followed. This post is about the next step: productionization. Once the system was capable of producing credible triage results repeatedly, the question changed again. It was no longer "can this architecture work?" It was "can this behave like a deployed product?" That question turned out to be broader than "put it in Docker": channels had timeout budgets Teams cards had presentation limits attachments arrived in platform-specific shapes storage and audit needed durable homes deployment needed images, identities, secrets, probes, and update flow admin approval and manifest install were part of the runtime story, not just setup trivia In other words: the hard part was not just getting the system to reason. It was getting the system to operate in the real environment it was supposed to serve. For this deployment, that environment is Microsoft Teams as the ingest channel and Azure as the runtime. Before going further into the productionization lessons, I want to point to something concrete. When agents run inside an LLM session, it is hard to tell whether a failure is a routing problem or a model problem. Before deploying, I wanted proof that the

dev.to

DevRel now that generative AI is here

Being a DevRel engineer is like being a jack of all trades, covering at least seven distinct areas. 1 2 How is developer education going to be disrupted? How must developer education change because of gen AI? The formats and approaches need to change, documentation website, 3 Documentation search MCP tool where every result contains a citation Full-blown guided tutorial via agent skills Agent skills as docs as code, evolved from docs as code Despite this disruption, developer education becomes more, not less, important. How is developer success going to be disrupted? 4 How must developer success change because of gen AI? opposite and double down on in-person events and communities, AI-native engineers. Remember that humans are still the ones operating gen AI tools, Baseline developer success activities are still relevant, How is developer marketing going to be disrupted? How must developer marketing change because of gen AI? Just as SEO shaped what developers found on Google, 5 awareness and evaluation stages of a developer journey. learn, build, and scale stages of the developer journey, Developer marketing catered exclusively to human developers. developer personas. Most DevRel teams are focused on external developers. 6 Which parts of developer education do not change because of gen AI? in common when they learn: Which parts of developer success do not change because of gen AI? Which parts of developer marketing do not change because of gen AI? information valve approach with a marketing funnel 7 Will gen AI disrupt DevRel? Will gen AI therefore make DevRel irrelevant? more work to be done, This was largely based on a hallway conversation that I had with Mike Swift, “swyx”, Thanks also to Owanate Amachree and Fabian Hug for your reviews and comments! Developer documentation, developer tools, developer community, and strategy. Developer Relations, by Caroline Lewko and James Parton GEO is to ChatGPT as SEO is to Google. Answer Engine Optimisation. GEO - Wikipedia.

dev.to
TinySearch: Let Small Local LLMs Search the Web Without Burning Context

I’ve been playing around with local LLM agents a lot lately. Mostly smaller models, MCP tools, Cline/Roo-style workflows, and home lab setups. Not the “infinite context, infinite budget” world. More like: “Can this 4B/9B model actually use the web without getting buried alive by garbage context?”

That was the problem that kept annoying me. Most web-search tools technically work, but they often dump way too much raw page text into the model. You ask a simple question, and suddenly your local model is trying to reason through cookie banners, broken markdown, SEO filler, navigation menus, duplicated paragraphs, and five pages of irrelevant junk. For small models, that is painful. They do not need “the whole web”. They need a small, useful, source-grounded slice of the web that matches the actual query.

So I built TinySearch. GitHub: https://github.com/MarcellM01/TinySearch

TinySearch is a small open-source MCP research tool that searches the web, crawls selected pages, chunks the extracted content, reranks the useful parts, and returns a compact, source-grounded prompt for your model. The flow is basically: search -> crawl -> rerank -> return grounded prompt. That is the whole idea.

TinySearch does not answer the question itself. It prepares the evidence. Your actual LLM then answers using that evidence. That matters because I do not want another LLM layer summarizing summaries. I want the model to receive clean, ranked, URL-attached context and reason from there.

The original pain was simple: I wanted local agents to have web research without insane context overhead. When a tool dumps entire pages into context, it creates three problems: it burns tokens for no reason, it confuses smaller models, and it makes agent workflows feel heavier than they need to be.

TinySearch is for that annoying middle ground where you want web research, but you do not want to set up a whole search stack or pay for a commercial API for every agent query. It makes sense for: local LLM agents, MCP workflows
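To make the chunk, rerank, and grounded-prompt steps concrete, here is a minimal Python sketch. It is not TinySearch's actual code: the function names, the 40-word chunk size, and the term-overlap score (a simple stand-in for whatever reranker the tool really uses) are all assumptions for illustration. The search and crawl steps are skipped by passing in already-extracted page text.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    url: str
    text: str
    score: float = 0.0


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_page(url: str, text: str, max_words: int = 40) -> list[Chunk]:
    """Split extracted page text into fixed-size word windows (assumed size)."""
    words = text.split()
    return [
        Chunk(url, " ".join(words[i:i + max_words]))
        for i in range(0, len(words), max_words)
    ]


def rerank(query: str, chunks: list[Chunk], top_k: int = 3) -> list[Chunk]:
    """Score chunks by query-term overlap (hypothetical stand-in reranker)."""
    query_terms = set(tokenize(query))
    for chunk in chunks:
        overlap = query_terms & set(tokenize(chunk.text))
        chunk.score = len(overlap) / max(len(query_terms), 1)
    ranked = sorted(chunks, key=lambda c: c.score, reverse=True)
    # Drop chunks with no overlap so junk never reaches the model.
    return [c for c in ranked[:top_k] if c.score > 0]


def build_grounded_prompt(query: str, pages: dict[str, str], top_k: int = 3) -> str:
    """Turn {url: extracted_text} into a compact prompt with URL-attached evidence."""
    chunks = [c for url, text in pages.items() for c in chunk_page(url, text)]
    evidence = rerank(query, chunks, top_k)
    lines = ["Answer the question using only the evidence below. Cite URLs.", ""]
    for i, chunk in enumerate(evidence, start=1):
        lines.append(f"[{i}] {chunk.url}\n{chunk.text}\n")
    lines.append(f"Question: {query}")
    return "\n".join(lines)
```

Calling `build_grounded_prompt(query, pages)` with a dictionary of placeholder URLs and page text returns a single string to hand to the answering model. The sketch itself never calls an LLM, which mirrors the "prepare the evidence, do not answer" split described above.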

dev.to

Sequoia's 'This is AGI' talk, distilled — what it means if you build on the models

Sequoia's AI Ascent 2026 keynote ("This is AGI") is worth 32 minutes of your time. I distilled it into the parts that actually change how you build. Short version up top, then the framework.

The one reframe that matters

Most of us have only lived through communication revolutions: the internet, cloud, mobile. They changed how information is distributed. AI is a computation revolution: it changes how information is processed. Different shape of wave entirely.

Why you should care: in a computation revolution the capability floor moves under your feet every day. The thing you built last week can be irrelevant this week.

Sequoia isn't proposing a technical definition. Their commercial one is the useful one: if you can dispatch an agent to do a job, it recovers from failure, and persists until the job is done, that's AGI. Three inflection points got us here: pre-training (ChatGPT), reasoning (o1), long-horizon agents (Claude Code). We went from horses 40% faster to cars 40x faster.

This is the part for builders. Three pillars:

Moats → go customer-back, not tech-out. Capabilities change faster than customers do. What you build may be obsolete tomorrow; how tightly you wrap around the customer is durable.

Affordance. Claude Code is insanely powerful and has almost no affordance: open a terminal for the average Fortune 500 employee and watch. The opportunity is building the path of least resistance for your customer's specific problem.

Diffusion gap. Capabilities diffuse into the market far slower than they're created. Every day the labs outrun the enterprise, that gap (your opportunity) widens.

And: no lead is safe. You can't pass 15 cars in the sun, but you can in the rain. Right now it's a downpour of new capability.

An agent = a brain (model), arms and legs (tools), and persistence (harness). The headline metric: how long a model stays on task without going off the rails went from tens of minutes a year ago to hours today. "SaaS is dead" is backwards: tool value e
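The brain, tools, and harness split mentioned in the summary can be sketched as a small loop. This is only an illustration under stated assumptions, not anything shown in the keynote: `run_agent`, `choose_tool`, `is_done`, and the step budget are hypothetical names, and the "brain" here is any plain function rather than a real model call.

```python
from typing import Callable

Tool = Callable[[str], str]


def run_agent(
    task: str,
    choose_tool: Callable[[str, list[str]], str],  # the "brain": picks the next tool
    tools: dict[str, Tool],                         # the "arms and legs"
    is_done: Callable[[list[str]], bool],           # harness check for completion
    max_steps: int = 10,                            # assumed step budget
) -> tuple[bool, list[str]]:
    """Harness loop: keep dispatching tools, survive failures, stop when done."""
    history: list[str] = []
    for _ in range(max_steps):
        if is_done(history):
            return True, history
        name = choose_tool(task, history)
        try:
            result = tools[name](task)
            history.append(f"{name}: {result}")
        except Exception as err:
            # Recover from failure: record it and let the brain try again.
            history.append(f"{name} failed: {err}")
    return is_done(history), history
```

A tool that raises on its first call and succeeds on its second is enough to exercise the recovery path: the failure lands in `history`, the loop continues, and the harness reports completion once `is_done` is satisfied or the step budget runs out.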

dev.to

More in Technology