Researchers boost multilingual hate speech detection by combining web-scale pre-training with LLM-generated synthetic labels across four languages.
arXiv cs.CL · April 14, 2026
AI Summary
•Continued pre-training on unlabeled OpenWebSearch.eu data improved BERT models by ~3% average macro-F1 across 16 benchmarks, with larger gains in low-resource languages
•Ensemble of four open-source LLMs (Mistral-7B, Llama3.1-8B, Gemma2-9B, Qwen2.5-14B) generated synthetic annotations for hate speech detection
•LightGBM meta-learner ensemble outperformed simpler strategies like mean averaging and majority voting for combining LLM predictions
•Study covers English, German, Spanish, and Vietnamese, demonstrating improved cross-lingual generalization in hateful-content detection