Eval JavaScript - Search News

Hugging Face Introduces Community Evals for Transparent Model Benchmarking

Hugging Face has launched Community Evals, a feature that enables benchmark datasets on the Hub to host their own ...

OpenClaw Integrates VirusTotal Scanning to Detect Malicious ClawHub Skills

OpenClaw integrates VirusTotal Code Insight scanning for ClawHub skills following reports of malicious plugins, prompt injection, and exposed instances.

GitHub

METR/eval-analysis-public

This repository contains the analysis code and data for METR's time horizon methodology, as described in "Measuring AI Ability to Complete Long Tasks".
.
├── src/horizon/ # Analysis code (installable ...

IEEE

CAST-Eval: A Domain-Specific Benchmark for Large Language Models in Civil Aviation Safety

Abstract: In this paper, we present CAST-Eval, a novel, comprehensive and domain-specific benchmark designed to assess the knowledge and reasoning capabilities of large language models (LLMs) in the ...

GitHub

XiaoMaColtAI

An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si… ...
