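The BOM (byte order mark) is the Unicode character U+FEFF, originally named “zero-width non-breaking space”. It is invisible when the file is decoded correctly, but it is not a NULL character: it is stored as real bytes at the very start of the file.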
Unicode Encoding | In ISO-8859-1 BOM appears as |
UTF-8 | ï»¿ |
UTF-16 (big endian) | þÿ |
UTF-16 (little endian) | ÿþ |
UTF-32 (big endian) | □□þÿ (□ is the ASCII null character) |
UTF-32 (little endian) | ÿþ□□ (□ is the ASCII null character) |
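The table above can be turned into a quick check of a file's leading bytes. This is a minimal sketch, assuming a system with `head -c` and `od`; the function name `detect_bom` and its output labels are hypothetical and chosen only for illustration.

```shell
# detect_bom FILE: print the encoding suggested by the file's leading bytes.
# Hypothetical helper; the name and the labels it prints are illustrative.
detect_bom() {
    # First four bytes as one lowercase hex string, e.g. "efbbbf3c".
    sig=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        efbbbf*)  echo "UTF-8" ;;
        0000feff) echo "UTF-32BE" ;;
        fffe0000) echo "UTF-32LE" ;;   # checked before UTF-16LE, which shares "fffe"
        feff*)    echo "UTF-16BE" ;;
        fffe*)    echo "UTF-16LE" ;;
        *)        echo "none" ;;
    esac
}
```

Usage: `detect_bom file.xml`. Note that the bytes `ff fe 00 00` are ambiguous: they could also be a UTF-16 (little endian) BOM followed by a NUL character, so the result is a hint rather than a guarantee.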
Remove BOM from an XML file
Open the file in the vim text editor, run the “nobomb” command, then save and quit:
# vim file.xml
:set nobomb
:wq
Removal from HTML Files
Even if you have set the “charset=utf-8” meta property correctly, you can still run into the BOM problem. If a BOM character is causing problems in your HTML display, the problem lies in the text editor that saved the file, not in your HTML/CSS code.
Most HTML editors, such as Dreamweaver, Programmer’s Notepad and TextPad, provide a way to disable the BOM. The option is usually found where you set the file encoding in the editor, and may appear as “UTF-8 without BOM” or “UTF-8 No BOM”.
The appearance of the ï»¿ characters in your HTML output can also be fixed with the same encoding change in your HTML editor.
Detection and Removal of BOM in UNIX/Linux
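Find the list of files that start with a UTF-8 BOM. Here find prints each file path followed by a hex dump of its first three bytes, grep -B 1 keeps the path line that precedes each matching dump, and the final grep keeps only the path lines:
find /var/www/website/ -type f -print -exec hd -n 3 {} \; | grep -B 1 "ef bb bf" | grep "some_part_of_the_path" > bom_lines.txt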
Then remove the BOM from the first line of each listed file:
while IFS= read -r l; do sed -i '1 s/^\xef\xbb\xbf//' "$l" ; done < bom_lines.txt
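The sed command above relies on GNU sed behaviour (`-i` without a suffix and `\x` escapes) and may not work with other sed implementations. As an alternative, here is a minimal sketch that uses only `head`, `od`, `tail` and `mktemp`; the function name `strip_utf8_bom` is hypothetical, and the sketch handles only the UTF-8 BOM.

```shell
# strip_utf8_bom FILE...: remove a leading UTF-8 BOM (EF BB BF) in place.
# Hypothetical helper; files without a BOM are left untouched.
strip_utf8_bom() {
    for f in "$@"; do
        # Compare the first three bytes, as hex, with the UTF-8 BOM.
        if [ "$(head -c 3 "$f" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
            tmp=$(mktemp) || return 1
            # tail -c +4 copies the file starting at byte 4, i.e. after the BOM.
            # Writing back with cat keeps the original file's permissions.
            tail -c +4 "$f" > "$tmp" && cat "$tmp" > "$f"
            rm -f "$tmp"
        fi
    done
}
```

It can be driven from the same list of paths, for example: `while IFS= read -r l; do strip_utf8_bom "$l"; done < bom_lines.txt`. Because the check runs per file, paths that were listed by mistake are simply skipped.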