Self-invoking code benchmarks help you decide which LLMs to use for your programming tasks

VentureBeat
Ben Dickson @ VentureBeat · 01/10/2025 08:58 EDT

LLMs are good at coding simple functions. But how good are they at calling their own functions to solve complex problems?


News from the same source
VentureBeat
Silicon Valley
George Avalos @ Silicon Valley 1 place · 02/07/2106 01:28 EDT

Newark apartment complex bought for much less than prior value

An East Bay apartment complex has been bought at a price that's well below its prior value.

Silicon Valley
George Avalos @ Silicon Valley 2 place · 02/07/2106 01:28 EDT

PG&E buys San Jose building to bolster South Bay operations

A PG&E Corp. unit has bought a San Jose building in a move to bolster the utility's South Bay operations.

BetaKit
Trevor Nichols @ BetaKit 1 place · today 11:53 EDT

Smaller businesses are struggling to breach Canada’s defence sector: report

Industry leaders say the country must move faster to bring defence-adjacent firms into the fold.

Eurogamer.net
Sherif Saed @ Eurogamer.net 1 place · today 11:50 EDT

"One view of the future is that Roblox grows and eats gaming" - Epic Games boss Tim Sweeney delivers impassioned speech about the future of gaming

Epic Games boss Tim Sweeney took to the State of Unreal stage yesterday to deliver a rallying cry of sorts to the games industry, which he described as being "in a time of both crisis and also opportunity". And guess where the opportunity is? But we'll come back to that.

MacRumors
Joe Rossignol @ MacRumors 1 place · today 11:50 EDT

Apple Announces Major App Store Changes on iOS in Brazil

Apple today announced that developers in Brazil will be allowed to distribute iPhone apps through alternative app marketplaces on iOS, and accept payments through third-party platforms. In other words, developers in Brazil will be able to circumvent the App Store and Apple's in-app purchase system, but there are still fees. Alternative app marketplaces will have to be authorized by Apple and will need to meet ongoing requirements. For apps that...

Habr
sslock @ Habr 1 place · today 11:49 EDT

The Last Exponential

The technological singularity was one of the last great forms of faith in progress that still tried to look like a theory. At its core lay a simple image: if technological development is accelerating, and if each new stage makes the next not merely possible but faster, then sooner or later the historical curve stops being a continuation of the earlier trend and begins to resemble an exponential. And in the historical imagination, an exponential always looks like a particular kind of promise: almost nothing for a long time, then almost...

150sec
Yasmin Werner @ 150sec 1 place · today 11:46 EDT

Binance EU access in doubt as Greece MiCA license faces rejection

Binance, the world’s largest cryptocurrency exchange, faces a major European setback as its Greek Markets in Crypto-Assets (MiCA) licence application reportedly nears rejection before the deadline. The EU’s MiCA transitional period is set to conclude on July 1, 2026. Under this new framework, any exchange operating in the Union must hold an authorized license from ...

CoinDesk
Margaux Nijkerk @ CoinDesk 1 place · today 11:41 EDT

Ethereum Foundation loses another key leader as co-executive director Hsiao-Wei Wang resigns

Wang's departure follows the resignation of fellow co-executive director Tomasz Stańczak and marks the latest in a string of high-profile exits at the EF.

Digital Trends
Manisha Priyadarshini @ Digital Trends 1 place · today 11:41 EDT

Pixi wants to replace your boring text messages with AR characters that react to you

Pixi Platforms launched an iMessage app that lets users send intelligent AR characters capable of reacting to real-world surroundings and facial expressions.

Habr
grelikt @ Habr 2 place · today 11:40 EDT

Apache Camel for .NET, dissected: an HTTP connector without ASP.NET MVC + the Content-Based Router pattern

Series: redb ecosystem / redb.Route deep-dive. In redb.Route, our Apache Camel-style ESB for .NET, a route always reads the same way: From(source) → [processors] → To(sink). Today we take one simple integration pattern and one connector and take both apart down to the last detail.

Digital Trends
Moinak Pal @ Digital Trends 2 place · today 11:39 EDT

GTA 6 may be far away, so Rockstar gave GTA 5 a fresh coat of paint

Rockstar has released a free upgrade for GTA 5 owners, bringing enhanced graphics, faster loading times, new vehicles, and next-gen features to PC players ahead of GTA 6.

Habr
KostyaAB (RUTUBE) @ Habr 3 place · today 11:38 EDT

How we boosted conversion at PREMIER: an A/B test breakdown

These are real stories. Out of respect for the reader, all the data is real. Out of respect for trade secrets, all the figures are aggregated. Hi everyone! My name is Kostya, and I work as a product manager at the online cinema PREMIER. In 2025–2026 we ran a series of A/B tests to improve the user experience and raise the service's key metrics. In this article I'll walk through four illustrative experiments. No filler, only what worked. Let's go! Behind the scenes of Premier's A/B tests

Business Insider
Daniel T. Allen,Jessica Orwig @ Business Insider 1 place · today 11:34 EDT

I quit my customer service job to make AI videos full-time on YouTube. People don't realize how expensive they are to produce.

This girl isn't real. She's an AI-generated character named Chloe, created by Jonathan Laramy, who runs the YouTube channel "Chloe VS History."

The Verge
Janko Roettgers @ The Verge 1 place · today 11:30 EDT

No more lightbulbs, much more sports: Five predictions for Roku’s future

This is Lowpass by Janko Roettgers, a newsletter on the ever-evolving intersection of tech and entertainment, syndicated just for The Verge subscribers once a week. When Fox announced its acquisition of Roku earlier this week, executives of both companies were quick to promise that not much would change in the near future. Sure, getting its […]

Business Insider
Vanessa Gordon @ Business Insider 2 place · today 11:25 EDT

I've lived in the Hamptons for most of my life. Here are 5 mistakes I see visitors make every summer.

As a local who's lived in the Hamptons for most of my life, I've seen visitors make many of the same mistakes year after year.

Startups News
Nickie Louise @ Startups News 1 place · today 11:23 EDT

Playnance’s $GCOIN Token Lists on KoinBX as India Community Surpasses 130 Partners

Playnance announced Thursday that its native token, $GCOIN, has been listed on KoinBX, giving users of the company’s blockchain-powered Web3 gaming ecosystem a new venue to trade the token. The listing comes as Playnance gains traction in India, where more ...

Digital Trends
Sudhanshu Kumar Mangalam @ Digital Trends 3 place · today 11:18 EDT

GTA 6 pre-orders open June 25 as Rockstar ramps up its launch campaign

Rockstar Games has confirmed that GTA 6 pre-orders will open on June 25, giving fans another reason to believe the highly anticipated game remains on track for its November 2026 release.

The most popular news from the same source for the last week
VentureBeat
VentureBeat
VentureBeat · 06/11/2026 13:23 EDT

Context windows are becoming a computational bottleneck. The longer an agent runs, the more tokens accumulate from retrieved documents, reasoning traces and conversation history, and the more memory and compute that growing context demands. Most existing solutions either degrade model accuracy, require the full context to load before compression begins, or produce memory savings that don't translate into real speedups in standard serving infrastructure. A research team from NYU, Columbia, Princeton,...

VentureBeat
VentureBeat · 06/11/2026 19:14 EDT

Xiaomi's MiMo AI team has open-sourced MiMo Code V0.1.0, a terminal-native AI coding assistant that the Chinese electronics giant says outperforms Anthropic's Claude Code on key agentic coding benchmarks, especially on long-horizon, multi-step tasks (200+ steps), at least according to its own internal beta release and survey of 576 developers. It's also bundling limited-time free access to MiMo-V2.5, its multimodal flagship model with a million-token context window, requiring no...

VentureBeat
VentureBeat · 06/11/2026 19:37 EDT

Agent skills have become an important part of real-world AI applications, providing a mechanism, usually a set of instructions saved in a folder of text-based markdown (.md) files, for models to adapt to specific enterprise use cases and complex workflows. However, optimizing these skills is a slow and faulty process, as they cannot be trained in the same way as the parameters of the underlying AI model....

VentureBeat
VentureBeat · 06/12/2026 11:39 EDT

Most enterprise RAG pipelines start the same way: a text parser converts web pages and documents into plain text so they can be chunked and indexed for retrieval. That conversion step destroys retrieval signals, and according to new research, it's responsible for the majority of wrong answers. A research team from UC Berkeley, Princeton University, EPFL and Databricks published a paper this week introducing PixelRAG, a system that skips that...

VentureBeat
VentureBeat · 06/12/2026 12:46 EDT

The creators of the hit, enterprise-friendly, open source OpenClaw variant NanoClaw are partnering with software supply chain management leader JFrog to launch a new, joint security integration they say will protect NanoClaw autonomous agents from malicious code injection. "These agents are doing things that you cannot necessarily control, and you cannot necessarily train," said Gal Marder, Chief Strategy Officer at JFrog, in an exclusive interview with VentureBeat. Available immediately, the...

VentureBeat
VentureBeat 3 place · 06/12/2026 17:27 EDT

Large language models continue to struggle with hallucinations, presenting a major roadblock for real-world enterprise applications. Reducing these errors is a messy business, forcing model developers to navigate a strict tradeoff where eliminating factual errors often suppresses valid answers. In a new paper, Google researchers introduce the concept of "faithful uncertainty," a metacognitive technique that aligns a model's response with its internal confidence. This alignment allows the model to offer appro...

VentureBeat
VentureBeat 2 place · 06/12/2026 17:55 EDT

Moonshot AI released Kimi K2.7-Code this week, an open-source update to its K2 coding model family, claiming leaner reasoning and double-digit performance gains. K2.7-Code is built on the same trillion-parameter mixture-of-experts architecture as its predecessor K2.6, and drops in via an OpenAI-compatible API, which matters for teams already running K2.6 in production gateways. When K2.6 launched in April, it topped OpenRouter's weekly LLM leaderboard, a ranking based on actual API...

VentureBeat
VentureBeat 1 place · 06/13/2026 08:24 EDT

The US government last night issued an unprecedented export control directive ordering Anthropic to immediately suspend all access to its top-tier Claude Fable 5 and Claude Mythos 5 models for foreign nationals, citing unspecified national security authorities. In response, Anthropic has blocked all public access to both models globally, meaning no users around the world can access them at this time, even paying enterprise customers and Anthropic employees internally....

VentureBeat
VentureBeat 1 place · 06/14/2026 00:00 EDT

The history of distributed computing is one of protocol proliferation followed by consolidation. Common Object Request Broker Architecture (CORBA), Distributed Component Object Model (DCOM), Java remote method invocation (RMI), and early simple object access protocol (SOAP) competed for the enterprise integration market in the late 1990s before representational state transfer (REST) quietly won by being simpler and HTTP-native. Extensible Messaging and Presence Protocol (XMPP), Internet Relay Chat (IRC), an...

VentureBeat
VentureBeat · 06/15/2026 03:00 EDT

Presented by Splunk. AI has changed the economics of cyber deception. An attacker can now generate thousands of convincing phishing lures, fake identities, and tailored pretexts before a defender finishes a single change-control cycle. That is the new security challenge: deception got faster and cheaper, while verification did not. Much of the discussion around AI for defense centers on detection models. Detection matters, but it is not the only bottleneck. The deeper constraint...

18.06.2026 12:04
Last update: 11:55 EDT.
News rating updated: 18:50.

What is Times42?

Times42 brings you the most popular news from tech news portals in a real-time chart.
Read about us in FAQ section.


Times42 © 2026