Wildguard - Search

About 7,300 results

Open links in new tab

Any time

arxiv.org
https://arxiv.org › abs
WildGuard: Open One-Stop Moderation Tools for Safety Risks, …
Jun 26, 2024 · We introduce WildGuard -- an open, light-weight moderation tool for LLM safety that achieves three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model responses, and (3) determining model refusal rate.
github.com
https://github.com › allenai › wildguard
WildGuard: Open One-stop Moderation Tools for Safety Risks ... - GitHub
WildGuard is a safety classification model for user-model chat exchanges. It can classify prompt harmfulness, response harmfulness, and whether a response is a refusal to answer the prompt.
huggingface.co
https://huggingface.co › allenai › wildguard
allenai/wildguard · Hugging Face
The high-level risk categories covered by WildGuard are: 1) Privacy, 2) Misinformation, 3) Harmful language, and 4) Malicious uses. Each category has a set of subcategories, consisting of total 13 subcategories.
wikipedia.org
https://en.m.wikipedia.org › wiki › WildGuard
WildGuard - Wikipedia
WildGuard is a comic book series about a "made-for-TV" superhero team. The series was created, written and drawn by Todd Nauck and published by Image Comics. Nauck initially published the characters in 1993, [citation needed] in ashcan comics he published while sending out submissions. It was published again in 2003.
huggingface.co
https://huggingface.co › datasets › allenai › wildguardmix
allenai/wildguardmix · Datasets at Hugging Face
WildGuardMix consists of two splits, WildGuardTrain and WildGuardTest. Here's the breakdown of WildGuardMix: Data Size: corresponds of 86,759 examples, of which 48,783 are prompt-only and 37,976 contain a prompt and response. Data types: synthetic data (87%), in-the-wild user-LLLM interactions (11%), and existing annotator-written data (2%).
arxiv.org
https://arxiv.org › pdf
[PDF]
W G : Open One-stop Moderation Tools for Safety Risks, …
Our comprehensive evaluations on WILDGUARDTEST and ten existing public benchmarks show that WILDGUARD outperforms the strongest existing open-source baselines (e.g., Llama-Guard2, Aegis-Guard, etc) on F1 scores across all three tasks (by up to 26.4% on refusal detection), matches GPT-4 across tasks, and surpasses GPT-4 by up to 3.9% on ...
pypi.org
https://pypi.org › project › wildguard
wildguard · PyPI
Jul 10, 2024 · WildGuard is a safety classification model for user-model chat exchanges. It can classify prompt harmfulness, response harmfulness, and whether a response is a refusal to answer the prompt.
openreview.net
https://openreview.net › forum
WildGuard: Open One-stop Moderation Tools for Safety Risks...
Sep 26, 2024 · We introduce WildGuard---an open, light-weight moderation tool for LLM safety that achieves three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model responses, and (3) determining model refusal rate.
promptlayer.com
https://www.promptlayer.com › research-papers › wildguard-open-one...
WildGuard: Open One-Stop Moderation Tools for Safety Risks, …
Jun 26, 2024 · WildGuard isn't just another safety tool; it's a comprehensive system that identifies malicious user prompts (like those sneaky "jailbreak" attempts to bypass safety protocols), flags risky model responses, and measures how often a model refuses inappropriate requests.
marktechpost.com
https://www.marktechpost.com › wildguard-a-light-weight...
WildGuard: A Light-weight, Multi-Purpose Moderation Tool for …
Jul 2, 2024 · A team of researchers from the Allen Institute for AI, the University of Washington, and Seoul National University propose WILDGUARD, a novel, lightweight moderation tool designed to address the limitations of existing methods.

Pagination
- 1
- 2
- 3
- 4
- Next

WildGuard: Open One-Stop Moderation Tools for Safety Risks, …

WildGuard: Open One-stop Moderation Tools for Safety Risks ... - GitHub

allenai/wildguard · Hugging Face

WildGuard - Wikipedia

allenai/wildguardmix · Datasets at Hugging Face

W G : Open One-stop Moderation Tools for Safety Risks, …

wildguard · PyPI

WildGuard: Open One-stop Moderation Tools for Safety Risks...

WildGuard: Open One-Stop Moderation Tools for Safety Risks, …

WildGuard: A Light-weight, Multi-Purpose Moderation Tool for …