How to remove the BOM character from HTML Files?


The Byte Order Mark (BOM) tells the computer how the bytes are ordered in a Unicode document. Because Unicode text can be stored in 8-, 16- or 32-bit units (UTF-8, UTF-16 and UTF-32), the computer needs to know which encoding and byte order were used. The BOM at the very start of the file provides exactly that information.

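The BOM is actually the “zero-width non-breaking space” character, U+FEFF. It is invisible when the file is decoded correctly (it is not a NULL character), but its bytes show up as stray characters when the file is read in a different encoding.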
Unicode encoding          BOM as it appears in ISO-8859-1
UTF-8                     ï»¿
UTF-16 (big endian)       þÿ
UTF-16 (little endian)    ÿþ
UTF-32 (big endian)       □□þÿ
UTF-32 (little endian)    ÿþ□□

(□ is the ASCII null character)
In HTML code the BOM character can also appear as the numeric character reference &#65279; (65279 is the decimal value of U+FEFF).
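
The byte patterns listed above can be checked directly on a file. The following is a minimal sketch, assuming a POSIX shell with od and tr available; the function name bom_type is made up for this example and is not a standard tool.

```shell
#!/bin/sh
# bom_type FILE: print which BOM (if any) FILE starts with.
# "bom_type" is an illustrative name, not an existing utility.
bom_type() {
    # First four bytes as a lowercase hex string, e.g. "efbbbf3c".
    sig=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case "$sig" in
        efbbbf*)  echo "UTF-8" ;;
        0000feff) echo "UTF-32 (big endian)" ;;
        # Checked before UTF-16 LE because both start with ff fe.
        # A UTF-16 LE file whose first character is NUL looks the same.
        fffe0000) echo "UTF-32 (little endian)" ;;
        feff*)    echo "UTF-16 (big endian)" ;;
        fffe*)    echo "UTF-16 (little endian)" ;;
        *)        echo "no BOM" ;;
    esac
}
```

Running bom_type index.html on a page saved by an editor that adds a BOM should report UTF-8; a file saved as "UTF-8 without BOM" should report no BOM.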

Remove BOM from an XML file

Just open the file in the vim text editor, turn off the “bomb” option with “:set nobomb”, and save:
# vim file.xml
:set nobomb
:wq

Removal from HTML Files

Even if you have set the “charset=utf-8” meta property correctly, you may still run into the BOM problem. If a BOM character is causing problems in your HTML display, the problem actually lies in the text editor and not in your HTML/CSS code.
Most HTML editors, like Dreamweaver, Programmer’s Notepad, TextPad etc., provide a way to disable the BOM. The option is usually found where you set the encoding in your text editor. It may appear as “UTF-8 without BOM” or “UTF-8 No BOM”.
A stray &#65279; character in your HTML code can also be fixed with the same encoding change in the HTML editor.
[Image: Setting UTF without BOM in Macromedia Dreamweaver]
[Image: Setting UTF without BOM in Programmer's Notepad]

Detection and Removal of BOM in UNIX/Linux

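Find the list of files containing BOM characters. The command prints each file name followed by a hex dump of its first three bytes, keeps the name line that precedes every “ef bb bf” dump, and then filters on part of the path so that only file names end up in bom_lines.txt:

find /var/www/website/ -type f -print -exec hd -n 3 {} \; | grep -B 1 "ef bb bf" | grep "some_part_of_the_path" > bom_lines.txt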
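
The pipeline above depends on the hd tool and on the layout of its output. As a hedged alternative, the sketch below compares the first three bytes of every file directly, using only find, od and tr; the function name list_bom_files is an assumption made for this example.

```shell
#!/bin/sh
# list_bom_files DIR: print every regular file under DIR that starts
# with the UTF-8 BOM bytes EF BB BF. The name is illustrative only.
list_bom_files() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            # Render the first three bytes as hex and compare with the BOM.
            sig=$(od -An -tx1 -N3 "$f" | tr -d " \n")
            if [ "$sig" = "efbbbf" ]; then
                printf "%s\n" "$f"
            fi
        done
    ' sh {} +
}

# Example (path taken from the command above):
#   list_bom_files /var/www/website/ > bom_lines.txt
```

File names containing spaces are handled, but the one-name-per-line output still assumes that no file name contains a newline.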

Remove the BOM character from every file in the list:

while IFS= read -r l; do sed -i '1 s/^\xef\xbb\xbf//' "$l"; done < bom_lines.txt
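
The sed command above relies on GNU sed (the \x escapes and -i without an argument) and edits files without keeping a copy. A more cautious variant could look like the sketch below, which keeps a .bak backup and only touches files that really start with the BOM. The function name strip_bom and the .bak suffix are choices made for this example.

```shell
#!/bin/sh
# strip_bom FILE: remove a leading UTF-8 BOM from FILE, keeping the
# original as FILE.bak. Files without a BOM are left untouched.
strip_bom() {
    sig=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    if [ "$sig" = "efbbbf" ]; then
        # Back up first, then rewrite the file from its fourth byte onward
        # (tail -c +4 skips exactly the three BOM bytes).
        cp -p "$1" "$1.bak" && tail -c +4 "$1.bak" > "$1"
    fi
}

# Example, using the list produced earlier:
#   while IFS= read -r f; do strip_bom "$f"; done < bom_lines.txt
```

Once the pages display correctly, the .bak files can be deleted.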



