Trending Topics
# Bonk-ecosystem meme coins show resilient momentum
# Pump.fun reportedly plans a token launch at a $4B valuation, sparking market speculation
# New Solana token launchpad Boop.Fun is riding high

CodecFlow
The execution layer for AI Operators and Robotics on @Solana
CA:69LjZUUzxj3Cb3Fxeo1X4QpYEQTboApkhXTysPpbpump
VLAs are still very new, and many people find it hard to grasp the difference between VLAs and LLMs.
Here is a deep dive into how these AI systems differ in reasoning, perception, and action. Part one.
Let's break down the key distinctions, and how AI agents built as wrappers around an LLM differ from operator agents that use VLA models:
1. Perception: how they sense the world
Agent (LLM): processes text or structured data such as JSON, APIs, and sometimes images. Like a brain working on clean, abstract inputs; think reading a manual or parsing a spreadsheet. Great in structured environments, but limited to the data it is fed.
Operator (VLA): takes in raw, real-time pixels from cameras, plus sensor data (e.g. touch, position) and proprioception (self-awareness of its own motion). Like navigating the world with eyes and senses, it adapts to dynamic, messy environments such as user interfaces or physical spaces.
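A minimal sketch of the two input surfaces (the type names and fields here are illustrative assumptions, not CodecFlow APIs):

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical input types, for illustration only.

@dataclass
class AgentInput:
    prompt: str        # clean, abstract text
    records: dict      # structured data, e.g. JSON parsed from an API

@dataclass
class OperatorInput:
    frame: np.ndarray         # raw HxWx3 camera pixels, every tick
    touch: np.ndarray         # tactile/position sensor readings
    joint_angles: np.ndarray  # proprioception: where its own body is
```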
2. Action: how they interact
Agent: acts by calling functions, tools, or APIs. Picture a manager sending precise instructions such as "book a flight via the Expedia API." Deliberate, but dependent on pre-built tools and clean interfaces.
Operator: executes continuous, low-level actions such as moving a mouse cursor, typing, or driving robot joints. Like a skilled worker manipulating the environment directly, it suits tasks that demand real-time precision.
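To make the contrast concrete, a toy sketch of the two action spaces (the tool name and fields are made up for illustration):

```python
# Agent: one discrete, intentional tool call against a clean interface.
agent_action = {"tool": "book_flight", "args": {"from": "NYC", "to": "SFO"}}

# Operator: a stream of continuous low-level commands, emitted many times per second.
operator_action = {"mouse_dx": 3.2, "mouse_dy": -1.7, "click": False}
joint_command = [0.12, -0.45, 0.88, 0.00, 0.30, -0.10]  # per-joint targets
```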
3. Control: how they make decisions
Agent: follows a slow, reflective loop: plan, call a tool, evaluate the result, repeat. It is token-bound (limited by text processing) and network-bound (waiting on API responses), which makes it methodical but slow for real-time tasks.
Operator: makes step-by-step decisions in a tight feedback loop, like a gamer reacting instantly to what is on screen. That speed enables fluid interaction but demands serious real-time compute.
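A runnable sketch of the two control styles, with every function stubbed out (none of these names are a real API):

```python
import random

def llm_plan(goal):           # stand-in for an LLM writing out a plan in text
    return [f"{goal}: step {i}" for i in range(3)]

def call_tool(step):          # stand-in for a blocking tool/API call
    return f"result of {step}"

def agent_loop(goal):
    # Slow, reflective loop: token-bound and network-bound.
    for step in llm_plan(goal):
        result = call_tool(step)      # wait on the network, evaluate, continue
        print(result)

def operator_loop(policy, ticks=5):
    # Tight feedback loop: one policy forward pass per tick (think 30+ Hz).
    obs = 0.0                         # stand-in for the latest camera frame
    for _ in range(ticks):
        action = policy(obs)          # react to what is on screen right now
        obs += action                 # the environment responds immediately

agent_loop("book a flight")
operator_loop(lambda obs: random.uniform(-1, 1))
```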
4. Training data: what drives their learning
Agent: trained on huge text corpora, instructions, documentation, or RAG (retrieval-augmented generation) datasets. It learns from books, code, and FAQs, and excels at reasoning over structured knowledge.
Operator: learns from demonstrations (e.g. videos of humans performing a task), teleoperation logs, or reward signals. Like learning by watching and doing, it suits tasks where explicit instructions are scarce.
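The shape of the two training corpora, as illustrative records (all field names are assumptions):

```python
# Agent (LLM): text in, text out.
llm_example = {
    "prompt": "Summarize the refund policy below ...",
    "completion": "Refunds are issued within 14 days ...",
}

# Operator (VLA): an observation and instruction paired with the action actually
# taken, e.g. one frame out of a human teleoperation log.
vla_example = {
    "image": "frame_000123.png",
    "instruction": "pick up the red cup",
    "action": [0.02, -0.11, 0.30],  # recorded end-effector delta
    "reward": 0.0,                  # optional signal for RL-style training
}
```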
5. Failure modes: where they break
Agent: prone to hallucination (making up answers) and brittle long-horizon plans that collapse if one step fails. Like a strategist who overthinks or misreads the situation.
Operator: suffers from covariate shift (when training data does not match real-world conditions) and compounding control errors (small mistakes accumulating). Like a driver losing control on an unfamiliar road.
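The compounding-error point is easy to quantify: even a high per-step success rate decays quickly over long action horizons (the numbers below are purely illustrative):

```python
# If each low-level step succeeds independently with probability p,
# a whole n-step trajectory succeeds with probability p**n.
p = 0.99
for n in (10, 100, 500):
    print(n, round(p**n, 3))  # 10 -> 0.904, 100 -> 0.366, 500 -> 0.007
```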
6. Infrastructure: the stack behind them
Agent: relies on prompts/routers to decide which tools to call, a tool registry for available functions, and memory/RAG for context. A modular setup, like a command center orchestrating tasks.
Operator: needs video-ingestion pipelines, action servers for real-time control, safety guards against harmful behavior, and replay buffers to store experience. A high-performance system built for dynamic environments.
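As one example of that stack, a minimal replay buffer of the kind an operator pipeline needs to store experience (capacity and field layout are assumptions):

```python
from collections import deque
import random

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest experience falls off the end

    def add(self, obs, action, reward, next_obs):
        self.buffer.append((obs, action, reward, next_obs))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)  # random minibatch for training
```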
7. Where each wins: their sweet spots
Agent: dominates workflows with clean APIs (e.g. automating business processes), reasoning over documents (e.g. summarizing reports), and code generation. The go-to for structured, high-level tasks.
Operator: excels in messy, API-less environments: navigating clunky user interfaces, controlling robots, or handling game-like tasks. If it involves real-time interaction with unpredictable systems, VLA is king.
8. Mental model: planner + doer
Think of the LLM agent as the planner: it breaks a complex task down into clear, logical goals.
The VLA operator is the doer, executing those goals by interacting directly with pixels or physical systems. A checker (another system or agent) monitors the outcome to confirm success.
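A runnable sketch of that planner + doer + checker pattern, with all three roles stubbed out (none of this is CodecFlow code):

```python
def planner(task):                   # LLM role: decompose the task into goals
    return [f"{task} - goal {i}" for i in (1, 2)]

def doer(goal):                      # VLA role: act on pixels or actuators
    return {"goal": goal, "done": True}

def checker(outcome):                # separate system verifying the result
    return outcome["done"]

def run(task):
    for goal in planner(task):
        outcome = doer(goal)
        if not checker(outcome):     # on failure: re-plan or retry
            print("retrying", goal)

run("sort the inbox")
```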
$CODEC

CodecFlow Optr offers a unified approach to building agents that can observe, reason, and act across digital and physical environments. Whether automating a desktop workflow, driving a robot arm, or testing in simulation, it uses the same mental model and the same primitives.
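One way to picture "the same mental model and the same primitives" is a single operator loop behind an environment interface. This is a hypothetical sketch of the idea, not the actual Optr API:

```python
from abc import ABC, abstractmethod

class Environment(ABC):
    @abstractmethod
    def observe(self): ...      # pixels from a desktop, a camera, or a simulator

    @abstractmethod
    def act(self, action): ...  # clicks and keystrokes, or joint commands

class Operator:
    def __init__(self, policy):
        self.policy = policy    # the same see-reason-act core everywhere

    def step(self, env: Environment):
        obs = env.observe()
        env.act(self.policy(obs))  # identical loop for desktop, robot, or sim
```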

Louround 🥂 August 21, 2025
Dips in a bull market are meant to be bought, especially on projects with big catalysts
We all know that AI is the narrative of this cycle, started by ai16z and Virtuals last year.
My bet is that the market will focus on more complex and sophisticated technologies such as VLAs, and let me tell you why.
LLMs (Large Language Models) mainly read and write text: they're great at explaining, planning, and generating instructions, but they don't by themselves control motors or interact with the physical world (as you may have experienced with ChatGPT).
VLAs (Vision Language Action models) differ from LLMs in that they are multimodal systems that look at things (vision), understand instructions (language), and directly produce actions. It's like telling a robot to pick up a red cup and having it move its arm to do it.
VLAs are trained on examples that pair images/video + instructions + real action traces (how a robot actually moved), and they must run fast and safely in real time. LLMs on their side are trained on huge text collections and focus on reasoning and language tasks.
TL;DR: LLMs think and speak, while VLAs see, reason, and act.
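A toy sketch of that TL;DR as two interfaces (both functions are made-up stubs, not any real model API):

```python
def llm(prompt: str) -> str:
    # Text in, text out: explain, plan, generate instructions.
    return "1. Locate the red cup. 2. Grasp it. 3. Lift."

def vla(image_pixels, instruction: str) -> list[float]:
    # Pixels + language in, motor commands out: actually move the arm.
    return [0.04, -0.02, 0.10]  # e.g. an end-effector displacement

print(llm("How do I pick up a red cup?"))
print(vla("frame.png", "pick up the red cup"))
```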
As you can see, VLAs are a major addition to LLMs and will notably enable the next 0-to-1 innovation in the broader economy: robotics. A majority of investment funds are allocating a large share of their capital to this sector, which is seen as the next logical evolution in the AI industry.
I already made a post a while ago on the current leader in the crypto market, @codecopenflow, which did not raise capital (fair launch) yet is shipping cutting-edge products and currently sitting at $23M FDV.
For reference, other crypto competitors such as @openmind_agi raised $20M at what is probably a $200M to $300M+ FDV, with no product or community built and shipped yet.
What makes Codec a leading project in the sector is that it tackles a crucial bottleneck in robotics and AI: the difficulty of getting all the AI tools to work together. Let me explain.
Their latest release, OPTR (operator), is a toolkit that helps build operators capable of acting across multiple platforms such as robots, desktops, browsers, and simulations. The objective of an operator is to see, reason, and act (VLA) in both digital (computers) and physical (robots) worlds.
This toolkit serves as core infrastructure for robotics teams looking to test their products, streamlining the overall process by providing one unified experience instead of separate ones for web browsers, simulations, and robots. This essentially makes the operator adaptive and autonomous regardless of its environment.
So you get it: this is a big time-saver for companies and developers who previously had to work through each step manually, and where you save time, you save money.
It will also let Codec build its own operator projects and bring new capabilities to market relatively fast, notably through its marketplace.
TL;DR: You have probably seen videos of robots folding laundry, sorting boxes, or jumping across obstacles. Each one was trained for that very specific use case, and unfortunately a skill cannot be reused in another environment the way a human's can. OPTR from Codec solves this by making skills transferable across environments and situations, making training and development far faster and cheaper for enterprises.
This is why Codec is so interesting: it unifies the digital world with the physical world.
$CODEC, Coded.
