
The 70% factuality ceiling: why Google’s new ‘FACTS’ benchmark is a wake-up call for enterprise AI

VentureBeat · 12/10/2025 18:00 EDT

There's no shortage of generative AI benchmarks designed to measure the performance and accuracy of a given model on completing various helpful enterprise tasks, from coding to instruction following to agentic web browsing and tool use. But many of these benchmarks have one major shortcoming: they measure the AI's ability to complete specific problems and requests, not how factual the model is in its outputs (how well it generates objectively correct information tied to real-world data), especially when d…







News from the same source
Matt Novak @ Gizmodo 1 place · 12/10/2025 15:20 EDT

Marco Rubio Orders State Dept to Stop Using Calibri Font in Anti-DEI Push

“Switching to Calibri achieved nothing except the degradation of the department’s official correspondence,” Rubio's memo said. Read more

1,024

Aditi Bharade @ Business Insider 1 place · today 00:03 EDT

David Ellison told Warner Bros. shareholders it's 'not too late' to switch teams from Netflix to Paramount

Paramount's CEO urged shareholders to tender their shares and slammed WBD for an "opaque sales process" that gave Netflix preferential treatment. Read more

861 fresh

Trevor Mogg @ Digital Trends 1 place · 12/10/2025 23:20 EDT

After the fireworks: The everyday journey of a Falcon 9 space rocket

SpaceX has been landing the first stage of its workhorse Falcon 9 booster since 2015, and the sight of the vehicle coming in for an upright touchdown, engines blazing, never gets old. Most of the landings take place on a droneship waiting in the ocean, though occasionally SpaceX also lands the booster back near the ... Read more

363 fresh

BeauHD @ Slashdot 1 place · 12/10/2025 19:50 EDT

Operation Bluebird Wants To Relaunch 'Twitter' For a New Social Network

A startup called Operation Bluebird is petitioning the US Patent and Trademark Office to strip X Corp of the "Twitter" and "tweet" trademarks, hoping to relaunch a new Twitter with the old brand, bird logo, and "town square" vibe. "The TWITTER and TWEET brands have been eradicated from X Corp.'s products, services, and marketing, effectively abandoning the storied brand, with no intention to resume use of the mark," the petition... Read more

219 fresh

Sean Hollister @ The Verge 1 place · 12/10/2025 18:37 EDT

Donald Trump reminds the entire world he has no idea what 6G means

When business leaders spout buzzwords like "AI," "8K" and "5G," sometimes in the same sentence, we often get a sneaking suspicion they don't know what they mean! With President Donald Trump, there's no need to wonder: he clearly has no idea. "What does [6G] do? Give you a little bit deeper view into somebody's skin?" […] Read more

212 fresh

Ian Carlos Campbell @ Engadget 1 place · 12/10/2025 14:04 EDT

State Department: Calibri font was a DEI hire

The US Department of State is unwinding a 2023 decision to use sans-serif Calibri font on all official communications and switching to Times New Roman instead, The New York Times reports. In a memo obtained by NYT titled "Return to Tradition: Times New Roman 14-Point Font Required for All Department Paper," Secretary of State Marco Rubio frames the change as a way to return professionalism to the State Department. "Switching to... Read more

202

BeauHD @ Slashdot 2 place · 12/10/2025 17:50 EDT

Qualcomm Acquires RISC-V Chip Designer Ventana Micro Systems

Qualcomm has acquired RISC-V startup Ventana to strengthen its CPU ambitions beyond mobile, "reinforcing its commitment and leadership in the development of the RISC-V standard and ecosystem," the company said in a press release. CRN Magazine reports: The San Diego-based company said Ventana's expertise in RISC-V, a free and open alternative to the Arm and x86 instruction set architectures, will enhance its CPU engineering capabilities and complement "existing efforts to... Read more

198 fresh

Emily Mullin @ Wired 2 place · 12/10/2025 14:12 EDT

Many States Say They’ll Defy RFK Jr.’s Changes to Hepatitis B Vaccination

Most Democratic-led states will continue to recommend the hepatitis B vaccine at birth, despite a CDC advisory panel’s vote against it. Read more

181

Mashable 1 place · 12/10/2025 22:00 EDT

NYT Connections hints today: Clues, answers for December 11, 2025

Connections is a New York Times word game that's all about finding the "common threads between words." How to solve the puzzle. Read more

172 fresh

Emma Roth @ The Verge 2 place · 12/10/2025 18:21 EDT

Trump could introduce ‘mandatory’ social media reviews for travelers

The Trump administration could soon require tourists from dozens of nations to hand over their social media handles before entering the country. Under a proposal from US Customs and Border Protection, the agency would make social media history from the past five years a "mandatory" part of the screening process, as reported earlier by The […] Read more

159 fresh

Ben Bergman @ Business Insider 2 place · 12/10/2025 18:15 EDT

Google stands to make $111 billion if SpaceX goes public at a $1.5 trillion valuation

A SpaceX IPO could hand Google billions. Its early bet on the rocket company may turn out to be one of the most lucrative startup investments ever. Read more

153 fresh

Emma Roth @ The Verge 3 place · 12/10/2025 16:31 EDT

Operation Bluebird wants to reclaim Twitter’s ‘abandoned’ trademarks for a new social network

A startup called Operation Bluebird is trying to reclaim Twitter's branding, as reported earlier by Ars Technica and Reuters. Last week, Operation Bluebird filed a petition that asks the US Patent and Trademark Office (USPTO) to cancel X Corp.'s ownership of the "Twitter" and "Tweet" trademarks, claiming they've been "abandoned" by the Elon Musk-owned company. […] Read more

135 fresh

The most popular news from the same source for the last week
VentureBeat 1 place · 12/05/2025 08:00 EDT

Three years ago this week, ChatGPT was born. It amazed the world and ignited unprecedented investment and excitement in AI. Today, ChatGPT is still a toddler, but public sentiment around the AI boom has turned sharply negative. The shift began when OpenAI released GPT-5 this summer to mixed reviews, mostly from casual users who, unsurprisingly, judged the system by its surface flaws rather than its underlying capabilities. Since then, pundits... Read more

73

VentureBeat 1 place · 12/04/2025 18:00 EDT

OpenAI researchers have introduced a novel method that acts as a "truth serum" for large language models (LLMs), compelling them to self-report their own misbehavior, hallucinations and policy violations. This technique, "confessions," addresses a growing concern in enterprise AI: Models can be dishonest, overstating their confidence or covering up the shortcuts they take to arrive at an answer. For real-world applications, this technique evolves the creation of more transparent and... Read more

29

VentureBeat 2 place · 12/08/2025 03:00 EDT

Presented by Design.com. For most of history, design was the last step in starting a business: something entrepreneurs invested in once the idea was proven. Today, it’s one of the first. The rise of generative AI has shifted how small businesses imagine, launch, and grow, turning what used to be a months-long creative process into something interactive, iterative, and accessible from day one. Search data tells the story. Since 2022,... Read more

28

VentureBeat · 12/09/2025 11:00 EDT

There is no shortage of AI benchmarks in the market today, with popular options like Humanity's Last Exam (HLE), ARC-AGI-2 and GDPval, among numerous others. AI agents excel at solving abstract math problems and passing PhD-level exams that most benchmarks are based on, but Databricks has a question for the enterprise: Can they actually handle the document-heavy work most enterprises need them to do? The answer, according to new research from the... Read more

9

VentureBeat · 12/09/2025 00:00 EDT

Presented by Celonis. The State of Oklahoma discovered its blind spots the hard way. In April 2023, a legislative report revealed its agencies had spent $3 billion without proper oversight. Janet Morrow, Director of Oklahoma's Risk, Assessment and Compliance Division, set out to track thousands of monthly transactions across dozens of disconnected systems. The Sooner State became the first U.S. state to apply process intelligence (PI) technology for procurement oversight. The transformation,... Read more

6

VentureBeat 2 place · 12/04/2025 09:02 EDT

Amazon Web Services on Wednesday introduced Kiro powers, a system that allows software developers to give their AI coding assistants instant, specialized expertise in specific tools and workflows, addressing what the company calls a fundamental bottleneck in how artificial intelligence agents operate today. AWS made the announcement at its annual re:Invent conference in Las Vegas. The capability marks a departure from how most AI coding tools work today. Typically, these... Read more

1

VentureBeat · 12/09/2025 14:44 EDT

French AI startup Mistral has weathered a rocky period of public questioning over the last year to emerge, now here in December 2025, with new, crowd-pleasing models for enterprise and indie developers. Just days after releasing its powerful open source, general purpose Mistral 3 LLM family for edge devices and local hardware, the company returned today to debut Devstral 2. The release includes a new pair of models optimized for software engineering... Read more

2

VentureBeat · 12/04/2025 04:00 EDT

For all their superhuman power, today’s AI models suffer from a surprisingly human flaw: They forget. Give an AI assistant a sprawling conversation, a multi-step reasoning task or a project spanning days, and it will eventually lose the thread. Engineers refer to this phenomenon as “context rot,” and it has quietly become one of the most significant obstacles to building AI agents that can function reliably in the real world. A... Read more

0

VentureBeat 3 place · 12/04/2025 09:00 EDT

The debate over whether artificial intelligence belongs in the corporate boardroom appears to be over, at least for the people responsible for generating revenue. Seven in ten enterprise revenue leaders now trust AI to regularly inform their business decisions, according to a sweeping new study released Thursday by Gong, the revenue intelligence company. The finding marks a dramatic shift from just two years ago, when most organizations treated AI as... Read more

0

VentureBeat 1 place · 12/07/2025 00:00 EDT

Remember this Quora comment (which also became a meme)? (Source: Quora) In the pre-large language model (LLM) Stack Overflow era, the challenge was discerning which code snippets to adopt and adapt effectively. Now, while generating code has become trivially easy, the more profound challenge lies in reliably identifying and integrating high-quality, enterprise-grade code into production environments. This article will examine the practical pitfalls and limitations observed when engineers use modern coding agen… Read more

0

Most popular sources

  • Showing 964 of 964 news items.
  • Sources: 61 of 61.
Business Insider 17% 22
Ars Technica 16% 9
Gizmodo 14% 7
Wired 8% 4
The Verge 6% 0


11.12.2025 00:35
Last update: 00:20 EDT.
News rating updated: 07:31.

What is Times42?

Times42 brings you the most popular news from tech news portals in a real-time chart.
Read about us in FAQ section.


Times42 © 2025