LLM-based data generators show promise for privacy-preserving synthetic data, but the models' learned priors bias their output and undermine statistical accuracy.
arXiv cs.LG · April 20, 2026
AI Summary
• PersonaLedger, an LLM-powered financial simulator, reached a fraud-detection AUC of 0.70 at epsilon = 1 when seeded with differentially private synthetic personas.
• The system showed significant distribution drift: for temporal and demographic features, the LLM's learned priors overrode the statistics supplied in the input.
• LLMs offer advantages over traditional methods for generating complex synthetic data from high-dimensional user profiles, but these systematic biases must be resolved before practical deployment.
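
The distribution drift described above can be made concrete by comparing the marginal distribution of a feature in the seed personas with the same marginal in the generated output. The sketch below is illustrative only and is not taken from the paper: the choice of total variation distance, the function names `marginal` and `total_variation`, and the time-of-day data are all assumptions.

```python
from collections import Counter


def marginal(values):
    """Empirical distribution of a categorical feature as {category: probability}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    0.0 means the distributions are identical; 1.0 means they share no support.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


# Hypothetical data (assumption): time-of-day buckets for transactions.
# The seed statistics are uniform, while the generated output favors daytime.
seed_hours = (
    ["night"] * 25 + ["morning"] * 25 + ["afternoon"] * 25 + ["evening"] * 25
)
generated_hours = (
    ["night"] * 5 + ["morning"] * 30 + ["afternoon"] * 40 + ["evening"] * 25
)

drift = total_variation(marginal(seed_hours), marginal(generated_hours))
print(f"total variation drift: {drift:.2f}")
```

A value near zero would suggest the generator preserved the input statistics for that feature. Larger values flag features, such as the temporal one in this made-up example, where the generator's own priors may be dominating.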