As per Unicode documentation, the presence of BOM in file are useless, because it causes problems with non-BOM-aware software’s to identify or parse the leading characters having at the start. Same has been quoted at the bottom of the Wikipedia page:
Related errors in PostgreSQL:
ERROR: invalid input syntax for integer: “9” (in psql-client)
SQL state: 22P02 (in PgAdmin-III)
Test case & fix on Windows:
Sample file “state_data.txt” created in NOTEPAD with unicode characters in it:
9,Karnataka,कर्नाटक
10,Kerala,केरळा
Table to import data:
create table states(state_code int, state_name char(30), state_in_hindi text);
Error:
postgres=# copy test from 'c:/Pgfile/state_data.txt' with delimiter ',' CSV;
ERROR: invalid input syntax for integer: "9"
CONTEXT: COPY test, line 1, column state_code: "9"
To fix, I have used a tool “bomremover.exe” to remove leading characters from a file as its on windows, if its on linux, then there are many tips & tricks available on net to wipe BOM from a utf-8 format file.
Tool Download link and usage:
http://www.mannaz.at/codebase/utf-byte-order-mark-bom-remover/
Eg:-
C:Pgfile>bomremover.exe . *
Added '.state_data.txt' to processing list.
Press enter to process all files in the list. (1 files in total)
Processing file '.state_data.txt'...
Finished. Press Enter to Exit
After running bomremover.exe on file, re-run COPY command which will succeed to import data.
state_code | state_name | State_name_in_hindi
------------+------------+---------------------
9 | Karnataka | αñòαñ░αÑ<8d>αñ¿αñ╛αñƒαñò
10 | Kerala | केरळा
(2 rows)
Some of the editors, avoids default saving text with UTF8-BOM:
– Windows – Notepad++ (In Notepade default BOM enabled)
– Linux – VI
– Mac – TextEdit
–Raghav
thanks, nice tip.
thanks, nice tip.