Self-invoking code benchmarks help you decide which LLMs to use for your programming tasks

Ben Dickson @ VentureBeat · 01/10/2025 08:58 EDT

LLMs are good at coding simple functions. But how good are they at calling their own functions to solve complex problems?

News from the same source
VentureBeat

1

Newark apartment complex bought for much less than prior value

George Avalos @ Silicon Valley 1 place · 02/07/2106 01:28 EDT

An East Bay apartment complex has been bought at a price that's well below its prior value.

2

PG&E buys San Jose building to bolster South Bay operations

George Avalos @ Silicon Valley 2 place · 02/07/2106 01:28 EDT

A PG&E Corp. unit has bought a San Jose building in a move to bolster the utility's South Bay operations.

3

Smaller businesses are struggling to breach Canada’s defence sector: report

Trevor Nichols @ BetaKit 1 place · today 11:53 EDT

Industry leaders say country must move faster to bring defence-adjacent firms into the fold.

4

Anbernic’s next Android handheld has strong Nintendo Switch Lite vibes

Ryan McNeal @ Android Authority 1 place · today 11:53 EDT

Anbernic shares first official look at the RG 55G1.

5

"One view of the future is that Roblox grows and eats gaming" - Epic Games boss Tim Sweeney delivers impassioned speech about the future of gaming

Sherif Saed @ Eurogamer.net 1 place · today 11:50 EDT

Epic Games boss Tim Sweeney took to the State of Unreal stage yesterday to deliver a rallying cry of sorts to the games industry, which he described as being "in a time of both crisis and also opportunity". And guess where the opportunity is? But we'll come back to that.

6

Apple Announces Major App Store Changes on iOS in Brazil

Joe Rossignol @ MacRumors 1 place · today 11:50 EDT

Apple today announced that developers in Brazil will be allowed to distribute iPhone apps through alternative app marketplaces on iOS, and accept payments through third-party platforms. In other words, developers in Brazil will be able to circumvent the App Store and Apple's in-app purchase system, but there are still fees. Alternative app marketplaces will have to be authorized by Apple and will need to meet ongoing requirements. For apps that...

7

The Last Exponential

sslock @ Habr 1 place · today 11:49 EDT

The technological singularity was one of the last great forms of faith in progress that still tried to look like a theory. At its core lay a simple image: if technological development accelerates, if each new stage makes the next one not merely possible but faster, then sooner or later the historical graph stops being a continuation of the previous trend and begins to resemble an exponential. And in the historical imagination an exponential always looks like a particular kind of promise: for a long time almost nothing, then almost...

8

Binance EU access in doubt as Greece MiCA license faces rejection

Yasmin Werner @ 150sec 1 place · today 11:46 EDT

Binance, the world’s largest cryptocurrency exchange, faces a major European setback as its Greek Markets in Crypto-Assets (MiCA) licence application reportedly nears rejection before the deadline. The EU’s MiCA transitional period is set to conclude on July 1, 2026. Under this new framework, any exchange operating in the Union must hold an authorized license from ...

9

Ethereum Foundation loses another key leader as co-executive director Hsiao-Wei Wang resigns

Margaux Nijkerk @ CoinDesk 1 place · today 11:41 EDT

Wang's departure follows the resignation of fellow co-executive director Tomasz Stańczak and marks the latest in a string of high-profile exits at the EF.

10

Pixi wants to replace your boring text messages with AR characters that react to you

Manisha Priyadarshini @ Digital Trends 1 place · today 11:41 EDT

Pixi Platforms launched an iMessage app that lets users send intelligent AR characters capable of reacting to real world surroundings and facial expressions.

11

Apache Camel for .NET, taken apart piece by piece: an HTTP connector without ASP.NET MVC plus the Content-Based Router pattern

grelikt @ Habr 2 place · today 11:40 EDT

Series: redb ecosystem / redb.Route deep-dive. In redb.Route, our Apache Camel-style ESB for .NET, a route always reads the same way: From(source) → [processors] → To(sink). Today we take one simple integration pattern and one connector and examine both in full detail.

12

GTA 6 may be far away, so Rockstar gave GTA 5 a fresh coat of paint

Moinak Pal @ Digital Trends 2 place · today 11:39 EDT

Rockstar has released a free upgrade for GTA 5 owners, bringing enhanced graphics, faster loading times, new vehicles, and next-gen features to PC players ahead of GTA 6.

13

How we boosted conversion at PREMIER: a breakdown of A/B tests

KostyaAB (RUTUBE) @ Habr 3 place · today 11:38 EDT

These are real stories. Out of respect for the reader, all the data is real. Out of respect for trade secrets, all the figures are aggregated. Hi everyone! My name is Kostya, and I work as a product manager at the online cinema PREMIER. In 2025–2026 we ran a series of A/B tests to improve the user experience and raise the service's key metrics. In this article I will go through four illustrative experiments. No filler, only what worked. Let's go!

14

I quit my customer service job to make AI videos full-time on YouTube. People don't realize how expensive they are to produce.

Daniel T. Allen,Jessica Orwig @ Business Insider 1 place · today 11:34 EDT

This girl isn't real. She's an AI-generated character named Chloe and created by Jonathan Laramy, who runs the YouTube channel "Chloe VS History."

15

No more lightbulbs, much more sports: Five predictions for Roku’s future

Janko Roettgers @ The Verge 1 place · today 11:30 EDT

This is Lowpass by Janko Roettgers, a newsletter on the ever-evolving intersection of tech and entertainment, syndicated just for The Verge subscribers once a week. When Fox announced its acquisition of Roku earlier this week, executives of both companies were quick to promise that not much would change in the near future. Sure, getting its […]

16

I've lived in the Hamptons for most of my life. Here are 5 mistakes I see visitors make every summer.

Vanessa Gordon @ Business Insider 2 place · today 11:25 EDT

As a local who's lived in the Hamptons for most of my life, I've seen visitors make many of the same mistakes year after year.

17

Playnance’s $GCOIN Token Lists on KoinBX as India Community Surpasses 130 Partners

Nickie Louise @ Startups News 1 place · today 11:23 EDT

Playnance announced Thursday that its native token, $GCOIN, has been listed on KoinBX, giving users of the company’s blockchain-powered Web3 gaming ecosystem a new venue to trade the token. The listing comes as Playnance gains traction in India, where more ...

18

Leaked iPhone Ultra renders suggest Apple has been closely watching Samsung

Shimul Sood @ Android Authority 2 place · today 11:21 EDT

The race for thinner phones is getting delightfully ridiculous.

19

GTA 6 pre-orders open June 25 as Rockstar ramps up its launch campaign

Sudhanshu Kumar Mangalam @ Digital Trends 3 place · today 11:18 EDT

Rockstar Games has confirmed that GTA 6 pre-orders will open on June 25, giving fans another reason to believe the highly anticipated game remains on track for its November 2026 release.

20

Best USB Chargers 2026: Our tested phone and laptop charger picks, from compact GaN to budget charging bliss

Tom's Hardware 1 place · today 11:15 EDT

We tested 20 laptop and phone chargers, ranging from cheap no-name 15W options to 140W beasts. Find out what stood out as the best.

The most popular news from the same source for the last week
VentureBeat

1

Context compression finally works in production: new research cuts LLM input 16x without the accuracy hit

VentureBeat · 06/11/2026 13:23 EDT

Context windows are becoming a computational bottleneck. The longer an agent runs, the more tokens accumulate from retrieved documents, reasoning traces and conversation history, and the more memory and compute that growing context demands. Most existing solutions either degrade model accuracy, require the full context to load before compression begins, or produce memory savings that don't translate into real speedups in standard serving infrastructure. A research team from NYU, Columbia, Princeton,...

2

Xiaomi's new open source, agentic AI coding harness MiMo Code beats Claude Code at ultra-long, 200+ step tasks

VentureBeat · 06/11/2026 19:14 EDT

Xiaomi's MiMo AI team has open-sourced MiMo Code V0.1.0, a terminal-native AI coding assistant that the Chinese electronics giant says outperforms Anthropic's Claude Code on key agentic coding benchmarks, especially on long-horizon, multi-step tasks (200+ steps), at least according to its own internal beta release and survey of 576 developers. It's also bundling limited-time free access to MiMo-V2.5, its multimodal flagship model with a million-token context window, requiring no...

3

Microsoft’s open-source SkillOpt automatically upgrades AI agent skills without touching model weights

VentureBeat · 06/11/2026 19:37 EDT

Agent skills have become an important part of real-world AI applications, providing a mechanism, usually a set of instructions saved in a folder of text-based markdown (.md) files, for models to adapt to specific enterprise use cases and complex workflows. However, optimizing these skills is a slow and faulty process, as they cannot be trained in the same way as the parameters of the underlying AI model....

4

PixelRAG beats text parsers on accuracy and cuts AI agent token costs 10x

VentureBeat · 06/12/2026 11:39 EDT

Most enterprise RAG pipelines start the same way: a text parser converts web pages and documents into plain text so they can be chunked and indexed for retrieval. That conversion step destroys retrieval signals and, according to new research, it's responsible for the majority of wrong answers. A research team from UC Berkeley, Princeton University, EPFL and Databricks published a paper this week introducing PixelRAG, a system that skips that...

5

NanoClaw and JFrog launch 'immune system' to block AI agents from downloading malicious code

VentureBeat · 06/12/2026 12:46 EDT

The creators of the hit, enterprise-friendly, open source OpenClaw variant NanoClaw are partnering with software supply chain management leader JFrog to launch a new, joint security integration they say will protect NanoClaw autonomous agents from malicious code injection. "These agents are doing things that you cannot necessarily control, and you cannot necessarily train," said Gal Marder, Chief Strategy Officer at JFrog, in an exclusive interview with VentureBeat. Available immediately, the...

6

Google researchers introduce 'faithful uncertainty', allowing LLMs to offer best guesses instead of hallucinations

VentureBeat 3 place · 06/12/2026 17:27 EDT

Large language models continue to struggle with hallucinations, presenting a major roadblock for real-world enterprise applications. Reducing these errors is a messy business, forcing model developers to navigate a strict tradeoff where eliminating factual errors often suppresses valid answers. In a new paper, Google researchers introduce the concept of "faithful uncertainty," a metacognitive technique that aligns a model's response with its internal confidence. This alignment allows the model to offer appro...

7

Kimi K2.7-Code cuts thinking tokens 30% — but practitioners say the benchmarks don't check out

VentureBeat 2 place · 06/12/2026 17:55 EDT

Moonshot AI released Kimi K2.7-Code this week, an open-source update to its K2 coding model family, claiming leaner reasoning and double-digit performance gains. K2.7-Code is built on the same trillion-parameter mixture-of-experts architecture as its predecessor K2.6, and drops in via an OpenAI-compatible API, which matters for teams already running K2.6 in production gateways. When K2.6 launched in April, it topped OpenRouter's weekly LLM leaderboard, a ranking based on actual API...

8

Anthropic blocks all public access to Claude Fable 5, Mythos 5 following US government order — what enterprises should do

VentureBeat 1 place · 06/13/2026 08:24 EDT

The US government last night issued an unprecedented export control directive ordering Anthropic to immediately suspend all access to its top-tier Claude Fable 5 and Claude Mythos 5 models for foreign nationals, citing unspecified national security authorities. In response, Anthropic has blocked all public access to both models, globally, meaning no users around the world can access them at this time, even paying enterprise customers and Anthropic employees internally....

9

MCP solved tool calling. A2A solved coordination. What solves transport?

VentureBeat 1 place · 06/14/2026 00:00 EDT

The history of distributed computing is one of protocol proliferation followed by consolidation. Common Object Request Broker Architecture (CORBA), Distributed Component Object Model (DCOM), Java remote method invocation (RMI), and early simple object access protocol (SOAP) competed for the enterprise integration market in the late 1990s before representational state transfer (REST) quietly won by being simpler and HTTP-native. Extensible Messaging and Presence Protocol (XMPP), Internet Relay Chat (IRC), an...

10

Attackers scale deception with AI. Defenders need truth at machine speed.

VentureBeat · 06/15/2026 03:00 EDT

Presented by Splunk

AI has changed the economics of cyber deception. An attacker can now generate thousands of convincing phishing lures, fake identities, and tailored pretexts before a defender finishes a single change-control cycle. That is the new security challenge: deception got faster and cheaper, while verification did not. Much of the discussion around AI for defense centers on detection models. Detection matters, but it is not the only bottleneck. The deeper constraint...
