Convergence AI Proxy 1.0 Challenges OpenAI's Operator.

Convergence introduced Proxy 1.0, an AI agent designed to automate web-based tasks. OpenAI launched the SWE-Lancer benchmark to evaluate AI models on real-world freelance software tasks. ByteDance introduced Phantom, advancing AI-powered video generation.

Hello, AI Enthusiasts

In Today’s AI Compass:

  • Convergence launched Proxy 1.0: London-based startup tops AI web agent benchmarks at $20/month, giving OpenAI's Operator some real competition.

  • OpenAI introduced the SWE-Lancer benchmark: Can frontier LLMs earn $1 million from real-world freelance software engineering?

  • ByteDance introduces Phantom: A groundbreaking paper focusing on subject-consistent video generation.

  • Quick Industry Moves: Claude's getting web search and reasoning, Grok's going desktop, and Moonshot's tackling long-context scaling.

Read Time: 5 minutes

TOP NEWS

CONVERGENCE

source: convergence

Convergence, a London-based startup, has introduced Proxy 1.0, an AI agent designed to automate web-based tasks. Proxy 1.0 is ten times cheaper than comparable offerings such as OpenAI's Operator and is available globally with a free tier. It can perform actions like clicking, scrolling, and typing, which could change how tasks such as job applications and marketing campaigns are managed.

Details:

  • Benchmark Performance: Leading WebVoyager scores across multi-step web tasks, outperforming OpenAI's Operator

  • Core Technology: Built-in task learning system that remembers and optimizes repeatable workflows

  • Automation Capabilities: Handles everything from form-filling to database queries with minimal human input

  • Pricing & Availability: $20/month with free tier, global launch (vs. OpenAI's US-only, premium-priced Operator)

  • Coming Soon: Parallel processing support, custom workflow marketplace, document analysis features

    Proxy turns regular tasks into set-and-forget automations.

    source: convergence

    Want to browse the web? Proxy handles that like a human would

    source: convergence

OPENAI

source: OpenAI

OpenAI has launched the SWE-Lancer benchmark, designed to evaluate AI models on real-world freelance software engineering tasks from Upwork, valued at $1 million in total payouts. The benchmark tests AI capability across the full engineering stack, covering both technical and managerial tasks to assess AI's economic impact in software development.
GitHub repo for SWE-Lancer: this repo contains the dataset and code for the paper.

Details:

  • Benchmark: SWE-Lancer for evaluating AI coding performance

  • Task Source: Over 1,400 freelance tasks from Upwork

  • Total Value: Collectively worth $1 million

  • Task Range: From $50 bug fixes to $32,000 feature implementations

  • Scope: Covers both coding and managerial decision-making tasks

  • Objective: Map AI performance to real-world economic value and promote research on AI's role in software engineering
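
To make the "economic value" idea concrete, here is a minimal Python sketch of how a payout-weighted score differs from a plain pass rate. It is not OpenAI's evaluation code: the `Task` class, the `earnings_summary` function, and the sample tasks are all hypothetical, and only the $50 and $32,000 price points come from the task range above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A hypothetical freelance task record (not the real dataset schema)."""
    task_id: str
    payout_usd: float
    solved: bool


def earnings_summary(tasks):
    """Compare dollars earned against the plain fraction of tasks solved."""
    total = sum(t.payout_usd for t in tasks)
    earned = sum(t.payout_usd for t in tasks if t.solved)
    solved_count = sum(1 for t in tasks if t.solved)
    return {
        "earned_usd": earned,
        "total_usd": total,
        # Share of the available money the model would have earned.
        "earn_rate": earned / total if total else 0.0,
        # Share of tasks solved, ignoring what each task pays.
        "pass_rate": solved_count / len(tasks) if tasks else 0.0,
    }


# Illustrative data only: the $500 task and the solved flags are made up.
sample = [
    Task("bug-fix", 50.0, True),
    Task("small-feature", 500.0, True),
    Task("large-feature", 32000.0, False),
]

summary = earnings_summary(sample)
print(summary)
```

In this made-up sample the model solves two of three tasks yet earns under 2% of the available money, which is why weighting by payout can tell a different story than counting solved tasks.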

Current frontier model performance

Anthropic's Claude 3.5 Sonnet solved more problems than OpenAI's GPT-4o and o1.

source: OpenAI


BYTEDANCE

source: X (@ai_for_success)

ByteDance introduces Phantom, a groundbreaking paper focusing on subject-consistent video generation through cross-modal alignment. Phantom generates videos in which the subject's identity is preserved across frames, using either a single reference image or multiple reference images.

Details:

  • Identity-Preserving Video Generation: Generates videos using a facial reference image while strictly maintaining the subject’s identity and following the given prompt.

  • Single-Reference Subject-to-Video Generation: Creates videos from a single reference image, preserving details of objects, clothing, animals, virtual characters, and more.

  • Multi-Reference Subject-to-Video Generation: Uses multiple reference images to generate realistic interactions, such as group scenes, product demonstrations, and virtual try-ons.

    Identity-Preserving Video Generation (source: X, @ai_for_success)

    Single-Reference Subject-to-Video Generation (source: X, @ai_for_success)

    Multi-Reference Subject-to-Video Generation (source: X, @ai_for_success)


Free AI Training Course

  1. DeepLearning.AI launched a new free short course, “Attention in Transformers: Concepts and Code in PyTorch.”

In the Spotlight

  • Anthropic Teases Major Claude Upgrade with New iOS App Features

    Code from Claude's iOS application reveals new icons labeled "Steps," "Think," and "MagnifyingGlass," hinting at upcoming enhancements in reasoning and web search capabilities. These features suggest Anthropic is preparing to expand Claude’s cognitive and research abilities, potentially making it a more powerful AI assistant.

  • Grok AI Expands Beyond X with Standalone Desktop Apps

    Grok, X’s AI assistant, is breaking out of the platform with dedicated macOS and Windows applications. This expansion is a significant step toward broader accessibility, letting users interact with Grok outside of X's ecosystem, alongside the web-based chatbot at grok.com.

  • Moonshot AI Unveils MoBA Architecture for Faster Long-Context Processing

    Moonshot AI has introduced MoBA architecture, delivering a 6.5x speed boost for processing million-token contexts. Already implemented at Kimi.ai, this high-performance AI framework is now open-source on GitHub, enabling developers to explore its potential for large-scale AI tasks.

That’s a Wrap

That's it for today!

Thank you for reading today’s newsletter!
If you have any feedback or suggestions on how we can improve, feel free to reply and let us know.
