Detect Encoding in PHP
PHP gives us tools to handle this, but they aren't magic. Let’s look at how to reliably detect encoding, and when you shouldn't rely on detection at all.

PHP’s Multibyte String extension (mbstring) provides mb_detect_encoding(). It scans a string and tries to guess the character set.

$string = "Café";
$encoding = mb_detect_encoding($string);
echo $encoding; // UTF-8 (usually)

By default, it checks the encodings in the current detection order (see mb_detect_order()), which is typically just ASCII and UTF-8. You can pass a custom list of encodings as the second argument.
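As a minimal sketch of passing a custom candidate list, the snippet below feeds mb_detect_encoding() a byte string that is not valid UTF-8 and enables strict mode through the third argument. The sample bytes and the candidate list are illustrative assumptions, not a recommended default.

```php
<?php
// "Café" encoded as ISO-8859-1: the é is the single byte 0xE9,
// which is not a valid UTF-8 sequence on its own.
$latin1 = "Caf\xE9";

// Candidates to consider. Strict mode (third argument) rejects
// any candidate in which the bytes are not valid.
$candidates = ['UTF-8', 'ISO-8859-1'];
$detected = mb_detect_encoding($latin1, $candidates, true);

// If no candidate matches, mb_detect_encoding() returns false.
$noMatch = mb_detect_encoding("\xFF", ['UTF-8'], true);

var_dump($detected, $noMatch);
```

Because the function can return false, compare the result with === before using it as an encoding name.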
// Double-check UTF-8 validity
if ($detected === 'UTF-8' && !mb_check_encoding($string, 'UTF-8')) {
    return 'Windows-1252'; // common fallback
}
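The validity check above can be wrapped in a small helper. The following is only a sketch: the function name detectEncodingOrFallback() and its $fallback parameter are hypothetical, and the Windows-1252 default is an assumption about where the data came from, not something detection can prove.

```php
<?php
/**
 * Hypothetical helper: report UTF-8 only when the bytes are verifiably
 * valid UTF-8, otherwise return an assumed legacy fallback.
 */
function detectEncodingOrFallback(string $string, string $fallback = 'Windows-1252'): string
{
    // Strict detection against a single candidate yields 'UTF-8' or false.
    $detected = mb_detect_encoding($string, ['UTF-8'], true);

    // Double-check the guess before trusting it.
    if ($detected === 'UTF-8' && mb_check_encoding($string, 'UTF-8')) {
        return 'UTF-8';
    }

    // Legacy single-byte encodings cannot be told apart reliably,
    // so this value is a guess supplied by the caller.
    return $fallback;
}

echo detectEncodingOrFallback("Caf\xC3\xA9"), "\n"; // valid UTF-8 bytes
echo detectEncodingOrFallback("Caf\xE9"), "\n";     // lone 0xE9 is not valid UTF-8
```

Plain ASCII input is reported as UTF-8 here, since ASCII is a subset of UTF-8.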
What About Files? finfo vs mb_detect_encoding

Don't confuse file encoding (how bytes are structured) with MIME content type.

// Wrong approach for text encoding:
$finfo = finfo_open(FILEINFO_MIME_ENCODING);
echo finfo_file($finfo, 'file.txt'); // "us-ascii" or "utf-8" (unreliable)

// Better: read content and detect
$content = file_get_contents('file.txt');
echo mb_detect_encoding($content);

For serious work, mb_detect_encoding() has limitations: it can only choose among the candidates it is given, and many single-byte encodings accept almost any byte sequence. Consider a dedicated package such as nelexa/encoding. Be careful with look-alike package names: symfony/polyfill-intl-normalizer normalizes Unicode text and does not detect encodings, and jaybizzle/crawler-detect identifies bots from user-agent strings, not character sets. The gold standard is Mozilla’s universalchardet; if no PHP port of it fits your project, at least use mb_detect_encoding() in strict mode.

There’s also a pure-PHP option that, combined with the mb_* functions, gives you a U::toUtf8() method that attempts detection and conversion.
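The detect-then-convert idea can also be sketched with mbstring alone. The function below is a hypothetical illustration, not the API of any of the libraries named above: the name toUtf8() and the default list of legacy candidates are assumptions chosen for the example.

```php
<?php
/**
 * Hypothetical sketch: return $bytes as UTF-8, converting from a guessed
 * legacy encoding when the input is not already valid UTF-8.
 *
 * @param string[] $legacyCandidates Encodings to try for non-UTF-8 input (assumed list).
 */
function toUtf8(string $bytes, array $legacyCandidates = ['Windows-1252', 'ISO-8859-1']): string
{
    // Valid UTF-8 (which includes plain ASCII) is returned unchanged.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    // Strict mode discards candidates in which the bytes are invalid.
    $source = mb_detect_encoding($bytes, $legacyCandidates, true);
    if ($source === false) {
        throw new RuntimeException('No candidate encoding matched the input.');
    }

    return mb_convert_encoding($bytes, 'UTF-8', $source);
}

// Usage: a lone 0xE9 byte is converted to the two-byte UTF-8 form of é.
echo bin2hex(toUtf8("Caf\xE9")), "\n";
```

Checking UTF-8 validity first keeps the common case deterministic; the guess among legacy encodings only runs when that check fails. For a file, the same function can be applied to the result of file_get_contents().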