As per Unicode documentation, the presence of BOM in file are useless, because it causes problems with non-BOM-aware software’s to identify or parse the leading characters having at the start. Same has been quoted at the bottom of the Wikipedia page:
Related errors in PostgreSQL:
ERROR: invalid input syntax for integer: “ï»¿9” (in psql-client)
SQL state: 22P02 (in PgAdmin-III)
Test case & fix on Windows:
Sample file “state_data.txt” created in NOTEPAD with unicode characters in it:
Table to import data:
create table states(state_code int, state_name char(30), state_in_hindi text);
postgres=# copy test from 'c:/Pgfile/state_data.txt' with delimiter ',' CSV;
ERROR: invalid input syntax for integer: "ï»¿9"
CONTEXT: COPY test, line 1, column state_code: "ï»¿9"
To fix, I have used a tool “bomremover.exe” to remove leading characters from a file as its on windows, if its on linux, then there are many tips & tricks available on net to wipe BOM from a utf-8 format file.
Tool Download link and usage:
C:Pgfile>bomremover.exe . *
Added '.state_data.txt' to processing list.
Press enter to process all files in the list. (1 files in total)
Processing file '.state_data.txt'...
Finished. Press Enter to Exit
After running bomremover.exe on file, re-run COPY command which will succeed to import data.
state_code | state_name | State_name_in_hindi
9 | Karnataka | αñòαñ░αÑ<8d>αñ¿αñ╛αñƒαñò
10 | Kerala | αñòαÑçαñ░αñ│αñ╛
Some of the editors, avoids default saving text with UTF8-BOM:
– Windows – Notepad++ (In Notepade default BOM enabled)
– Linux – VI
– Mac – TextEdit