47 place 1

499 Study Accuses LM Arena of Helping Top AI Labs Game Its Benchmark

Slashdot
msmash @ Slashdot · 05/01/2025 09:00 EDT

An anonymous reader shares a report: A new paper from AI lab Cohere, Stanford, MIT, and Ai2 accuses LM Arena, the organization behind the popular crowdsourced AI benchmark Chatbot Arena, of helping a select group of AI companies achieve better leaderboard scores at the expense of rivals.

According to the authors, LM Arena allowed some industry-leading AI companies like Meta, OpenAI, Google, and Amazon to privately test several variants of AI models, then not publish the scores of the lowest performers. Th


News from the same source
Slashdot
The Verge
Emma Roth @ The Verge 1 place · today 14:00 EDT

Android 16 has arrived with iPhone-style Live Updates

Android 16 is officially here, and it includes Google’s take on the iPhone’s Live Activities. On Tuesday, Google announced that Live Updates are rolling out first with ride-share and food delivery apps, allowing you to track the progress of your ride or delivery with persistent, real-time notifications. Google first started working on Live Updates last […] Read more

2,077 fresh

Business Insider
Callie Ahlgrim @ Business Insider 1 place · today 14:40 EDT

Amanda Seyfried says Paramount still owes her money for using her face on 'Mean Girls' merch

Amanda Seyfried said she feels "a little resentful" toward Paramount for selling "Mean Girls" merchandise without compensating her for her likeness. Read more

1,955 fresh

Vox
Gabriela Fernandez @ Vox 1 place · today 14:15 EDT

I’m the daughter of immigrants. The LA I know isn’t in the news.

My mom has been a housekeeper for as long as I can remember. As a child, I’d accompany her on the bus to the houses she cleaned, impressed with how it seemed like she knew just about everyone en route to their own jobs. There was always friendly acknowledgment and solidarity — especially with those in […] Read more

1,886 fresh

Engadget
Devindra Hardawar @ Engadget 1 place · today 14:19 EDT

Apple's Liquid Glass is Windows Vista done well

It's hard to look at Apple's new "Liquid Glass" aesthetic and not think about Windows Vista, Microsoft's much-maligned OS, which also touted transparencies and glass-like effects as a bold new vision for computing. You can see the similarities between Apple's UI and Vista's "Windows Aero" design language everywhere, from the glassified app icons in iOS 26 and macOS Tahoe 26, which look a lot like Vista's glossy icons, to the... Read more

1,160 fresh

Vox
Ian Millhiser @ Vox 2 place · 06/09/2025 18:45 EDT

Trump asks the Supreme Court to neutralize the Convention Against Torture

Federal law states that the United States shall not “expel, extradite, or otherwise effect the involuntary return of any person to a country in which there are substantial grounds for believing the person would be in danger of being subjected to torture.” This law implements a treaty, known as the Convention Against Torture, which the […] Read more

1,079

Tom's Hardware
Tom's Hardware 1 place · today 13:41 EDT

AMD Radeon RX 9060 XT 16GB review: plenty of performance with 16GB

AMD’s Radeon RX 9060 XT 16GB delivers solid performance at seemingly reasonable pricing, though current retail costs are about 10% higher than the official MSRP. It’s a good option with no massive shortcomings, unlike the 8GB cards that are still being foisted off on less knowledgeable consumers. Read more

997 fresh

Wired
Julian Chokkattu @ Wired 1 place · today 14:41 EDT

The Top New Features in Apple’s iOS 26 and iPadOS 26

Liquid Glass, iPad multitasking, and call screening—take a look at Apple's latest features coming to your iPhone and iPad later this year. Read more

963 fresh

Business Insider
Selima Hussain,Havovi Cooper,Mia de Graaf,Lilian Manansala,Yuele @ Business Insider 3 place · today 11:35 EDT

What ultra-processed seed oils actually mean for your nutrition

Seed oils are ultra-processed and chemically treated cooking fats. But does that make them worse for you than alternatives like beef tallow? Read more

794 fresh

Business Insider
Brent D. Griffiths @ Business Insider · today 12:22 EDT

Have you seen this car?

Donald Trump's red Tesla isn't parked at the White House. Read more

722 fresh

Wired
Joseph Cox @ Wired 2 place · today 09:00 EDT

Airlines Don’t Want You to Know They Sold Your Flight Data to DHS

A contract obtained by 404 Media shows that an airline-owned data broker forbids the feds from revealing it sold them detailed passenger data. Read more

706 fresh

Gizmodo
Ed Cara @ Gizmodo 1 place · today 11:15 EDT

RFK Jr. Purges CDC’s Vital Vaccine Advisory Committee

On Monday afternoon, the head of HHS enacted a "clean sweep" of the Advisory Committee on Immunization Practices. Read more

671 fresh

Wired
Dell Cameron @ Wired 3 place · today 12:24 EDT

The ‘Long-Term Danger’ of Trump Sending Troops to the LA Protests

President Trump’s deployment of more than 700 Marines to Los Angeles—following ICE raids and mass protests—has ignited a fierce national debate over state sovereignty and civil-military boundaries. Read more

642 fresh

MacRumors
Juli Clover @ MacRumors 1 place · today 12:35 EDT

iOS 26 Will Let You Add Your U.S. Passport to Wallet for Identity Verification

In iOS 26, Apple is implementing a new Digital ID feature that builds on integration for Driver's Licenses in the Wallet app. Starting this fall, Apple Wallet will allow iPhone users to add a U.S. passport that can be used in lieu of a physical passport for domestic travel. The Digital ID can be stored on the ‌iPhone‌ or the Apple Watch, and it can be used at select TSA... Read more

532 fresh

Business Insider
Cameron Dever @ Business Insider · today 10:34 EDT

I've photographed weddings for over a decade. Here are 4 of the biggest mistakes couples make.

I've been photographing weddings for over a decade, so I've witnessed couples make many mistakes, from overscheduling to trying to please everyone. Read more

407 fresh

The most popular news from the same source for the last week
Slashdot
Slashdot
msmash @ Slashdot · 06/06/2025 10:40 EDT

YouTube Pulls Tech Creator's Self-Hosting Tutorial as 'Harmful Content'

YouTube pulled a popular tutorial video from tech creator Jeff Geerling this week, claiming his guide to installing LibreELEC on a Raspberry Pi 5 violated policies against "harmful content." The video, which showed viewers how to set up their own home media servers, had been live for over a year and racked up more than 500,000 views. YouTube's automated systems flagged the content for allegedly teaching people "how to get... Read more

137

Slashdot
EditorDavid @ Slashdot · 06/09/2025 03:34 EDT

'AI Is Not Intelligent': The Atlantic Criticizes 'Scam' Underlying the AI Industry

The Atlantic makes the case that "the foundation of the AI industry is a scam" and that AI "is not what its developers are selling it as: a new class of thinking — and, soon, feeling — machines." [OpenAI CEO Sam] Altman brags about ChatGPT-4.5's improved "emotional intelligence," which he says makes users feel like they're "talking to a thoughtful person." Dario Amodei, the CEO of the AI company Anthropic,... Read more

126

Slashdot
BeauHD @ Slashdot · 06/05/2025 21:00 EDT

UK Tech Job Openings Climb 21% To Pre-Pandemic Highs

UK tech job openings have surged 21% to pre-pandemic levels, driven largely by a 200% spike in demand for AI skills. London accounted for 80% of the AI-related postings. The Register reports: Accenture collected data from LinkedIn in the first and second week of February 2025, and supplemented the results with a survey of more than 4,000 respondents conducted by research firm YouGov between July and August 2024. The research... Read more

101

Slashdot
BeauHD @ Slashdot · 06/05/2025 17:00 EDT

Discord's CTO Is Just As Worried About Enshittification As You Are

An anonymous reader quotes a report from Engadget: Discord co-founder and CTO Stanislav Vishnevskiy wants you to know he thinks a lot about enshittification. With reports of an upcoming IPO and the news of his co-founder, Jason Citron, recently stepping down to hand leadership of the company over to Humam Sakhnini, a former Activision Blizzard executive, many Discord users are rightfully worried the platform is about to become, well, shit.... Read more

92

Slashdot
EditorDavid @ Slashdot · 06/09/2025 00:34 EDT

Scientists Show Reforestation Helps Cool the Planet Even More Than Thought

"Replanting forests can help cool the planet even more than some scientists once believed, especially in the tropics," according to a recent announcement from the University of California, Riverside. In a new modeling study published in Communications Earth & Environment, researchers at the University of California, Riverside, showed that restoring forests to their preindustrial extent could lower global average temperatures by 0.34 degrees Celsius. That is roughly one-quarter of the... Read more

80

Slashdot
msmash @ Slashdot · 06/06/2025 11:20 EDT

Trump AI Czar Sacks on Universal Basic Income: 'It's Not Going To Happen'

David Sacks, President Trump's AI policy advisor, has dismissed the prospect of implementing a universal basic income program, declaring "it's not going to happen" during his tenure. He said: The future of AI has become a Rorschach test where everyone sees what they want. The Left envisions a post-economic order in which people stop working and instead receive government benefits. In other words, everyone on welfare. This is their fantasy;... Read more

76

Slashdot
msmash @ Slashdot · 06/05/2025 12:05 EDT

Andrew Ng Says Vibe Coding is a Bad Name For a Very Real and Exhausting Job

An anonymous reader shares a report: Vibe coding might sound chill, but Andrew Ng thinks the name is unfortunate. The Stanford professor and former Google Brain scientist said the term misleads people into imagining engineers just "go with the vibes" when using AI tools to write code. "It's unfortunate that that's called vibe coding," Ng said at a fireside chat in May at the LangChain Interrupt conference. "It's misleading a lot... Read more

69

Slashdot
EditorDavid @ Slashdot · 06/07/2025 12:34 EDT

'For Algorithms, a Little Memory Outweighs a Lot of Time'

MIT comp-sci professor Ryan Williams suspected that a small amount of memory "would be as helpful as a lot of time in all conceivable computations..." writes Quanta magazine. "In February, he finally posted his proof online, to widespread acclaim..." Every algorithm takes some time to run, and requires some space to store data while it's running. Until now, the only known algorithms for accomplishing certain tasks required an amount of... Read more

66

Slashdot
BeauHD @ Slashdot · 06/04/2025 23:30 EDT

Chinese Hacked US Telecom a Year Before Known Wireless Breaches

An anonymous reader quotes a report from Bloomberg: Corporate investigators found evidence that Chinese hackers broke into an American telecommunications company in the summer of 2023, indicating that Chinese attackers penetrated the US communications system earlier than publicly known. Investigators working for the telecommunications firm discovered last year that malware used by Chinese state-backed hacking groups was on the company's systems for seven months starting in the summer of 2023,... Read more

57

Slashdot
msmash @ Slashdot · 06/04/2025 11:43 EDT

ChatGPT Adds Enterprise Cloud Integrations For Dropbox, Box, OneDrive, Google Drive, Meeting Transcription

OpenAI is expanding ChatGPT's enterprise capabilities with new integrations that connect the chatbot directly to business cloud services and productivity tools. The Microsoft-backed startup announced connectors for Dropbox, Box, SharePoint, OneDrive and Google Drive that allow ChatGPT to search across users' organizational documents and files to answer questions, such as helping analysts build investment theses from company slide decks. The update includes meeting recording and transcription features that... Read more

56

10.06.2025 15:50
Last update: 15:46 EDT.
News rating updated: 22:41.

What is Times42?

Times42 brings you the most popular news from tech news portals in real-time chart.
Read about us in FAQ section.


Times42 © 2025