
Commit 0afc0a7


Fix unaccent generation script in Windows

As originally coded, the script would fail on Windows 10 with Python 3, because stdout was switched to UTF-8 only for Python 2. This patch makes that apply to both versions.

Also add Python 2 compatibility markers so that we know what to remove once we drop support for that. Also use a "with" clause to ensure the file descriptor is closed promptly.

Author: Hugh Ranalli, Ramanarayana
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAKm4Xs7_61XMyOWmHs3n0mmkS0O4S0pvfWk=7cQ5P0gs177f7A@mail.gmail.com
Discussion: https://postgr.es/m/15548-cef1b3f8de190d4f@postgresql.org
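The stdout change described above can be illustrated with a minimal sketch. It is not the script's own code: the helper name `utf8_writer` is hypothetical, and an in-memory byte stream stands in for the real `sys.stdout` so the result can be inspected. The idea is that `codecs.getwriter('utf8')` needs a byte-oriented stream, which on Python 3 is the `.buffer` attribute of a text stream, while on Python 2 `sys.stdout` already accepts bytes.

```python
import codecs
import io


def utf8_writer(stream):
    """Return a writer that encodes text to UTF-8 on top of `stream`.

    Hypothetical helper: a Python 3 text stream exposes its underlying
    byte stream as `.buffer`; a stream without that attribute is
    assumed to accept bytes directly.
    """
    raw = getattr(stream, "buffer", stream)
    return codecs.getwriter("utf8")(raw)


# Use an in-memory byte stream in place of the real stdout.
sink = io.BytesIO()
out = utf8_writer(sink)
out.write(u"\u00e9\te\n")  # an accented letter, a tab, its plain form
encoded = sink.getvalue()
```

Calling `utf8_writer(sys.stdout)` would wrap the real standard output in the same way, so that printed rules are UTF-8 encoded regardless of the console's default encoding.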

1 parent b438e7e, commit 0afc0a7

`contrib/unaccent/generate_unaccent_rules.py`

Lines changed: 24 additions & 20 deletions

Original file line number	Diff line number	Diff line change
`@@ -32,9 +32,15 @@`
`32`	`32`	`# The approach is to be Python3 compatible with Python2 "backports".`
`33`	`33`	`from __future__ import print_function`
`34`	`34`	`from __future__ import unicode_literals`
	`35`	`+# END: Python 2/3 compatibility - remove when Python 2 compatibility dropped`
	`36`	`+`
	`37`	`+import argparse`
`35`	`38`	`import codecs`
	`39`	`+import re`
`36`	`40`	`import sys`
	`41`	`+import xml.etree.ElementTree as ET`
`37`	`42`
	`43`	`+# BEGIN: Python 2/3 compatibility - remove when Python 2 compatibility dropped`
`38`	`44`	`if sys.version_info[0] <= 2:`
`39`	`45`	`    # Encode stdout as UTF-8, so we can just print to it`
`40`	`46`	`    sys.stdout = codecs.getwriter('utf8')(sys.stdout)`
`@@ -45,12 +51,9 @@`
`45`	`51`	`    # Python 2 and 3 compatible bytes call`
`46`	`52`	`    def bytes(source, encoding='ascii', errors='strict'):`
`47`	`53`	`        return source.encode(encoding=encoding, errors=errors)`
	`54`	`+else:`
`48`	`55`	`# END: Python 2/3 compatibility - remove when Python 2 compatibility dropped`
`49`		`-`
`50`		`-import re`
`51`		`-import argparse`
`52`		`-import sys`
`53`		`-import xml.etree.ElementTree as ET`
	`56`	`+    sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)`
`54`	`57`
`55`	`58`	`# The ranges of Unicode characters that we consider to be "plain letters".`
`56`	`59`	`# For now we are being conservative by including only Latin and Greek. This`
`@@ -233,21 +236,22 @@ def main(args):`
`233`	`236`	`    charactersSet = set()`
`234`	`237`
`235`	`238`	`    # read file UnicodeData.txt`
`236`		`-    unicodeDataFile = open(args.unicodeDataFilePath, 'r')`
`237`		`-`
`238`		`-    # read everything we need into memory`
`239`		`-    for line in unicodeDataFile:`
`240`		`-        fields = line.split(";")`
`241`		`-        if len(fields) > 5:`
`242`		`-            # http://www.unicode.org/reports/tr44/tr44-14.html#UnicodeData.txt`
`243`		`-            general_category = fields[2]`
`244`		`-            decomposition = fields[5]`
`245`		`-            decomposition = re.sub(decomposition_type_pattern, ' ', decomposition)`
`246`		`-            id = int(fields[0], 16)`
`247`		`-            combining_ids = [int(s, 16) for s in decomposition.split(" ") if s != ""]`
`248`		`-            codepoint = Codepoint(id, general_category, combining_ids)`
`249`		`-            table[id] = codepoint`
`250`		`-            all.append(codepoint)`
	`239`	`+    with codecs.open(`
	`240`	`+        args.unicodeDataFilePath, mode='r', encoding='UTF-8',`
	`241`	`+    ) as unicodeDataFile:`
	`242`	`+        # read everything we need into memory`
	`243`	`+        for line in unicodeDataFile:`
	`244`	`+            fields = line.split(";")`
	`245`	`+            if len(fields) > 5:`
	`246`	`+                # http://www.unicode.org/reports/tr44/tr44-14.html#UnicodeData.txt`
	`247`	`+                general_category = fields[2]`
	`248`	`+                decomposition = fields[5]`
	`249`	`+                decomposition = re.sub(decomposition_type_pattern, ' ', decomposition)`
	`250`	`+                id = int(fields[0], 16)`
	`251`	`+                combining_ids = [int(s, 16) for s in decomposition.split(" ") if s != ""]`
	`252`	`+                codepoint = Codepoint(id, general_category, combining_ids)`
	`253`	`+                table[id] = codepoint`
	`254`	`+                all.append(codepoint)`
`251`	`255`
`252`	`256`	`    # walk through all the codepoints looking for interesting mappings`
`253`	`257`	`    for codepoint in all:`
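The reading loop in the last hunk can be tried in isolation with the sketch below. It is a simplified stand-in, not the script itself: `read_decompositions`, `TAG_PATTERN`, and the two sample lines are illustrative assumptions. In particular, the regular expression is a guess at what `decomposition_type_pattern` does (stripping tags such as `<noBreak>`), and a plain tuple replaces the `Codepoint` class. The point is that the `with` block closes the file as soon as the loop finishes, and that the explicit `encoding='UTF-8'` makes decoding independent of the platform default.

```python
import codecs
import os
import re
import tempfile

# Assumed pattern: removes compatibility tags such as "<noBreak>".
TAG_PATTERN = re.compile(r" *<[^>]*> *")

# Two lines in UnicodeData.txt layout, used only for illustration.
SAMPLE = (
    "00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;\n"
    "00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;"
    "LATIN CAPITAL LETTER A GRAVE;;;00E0;\n"
)


def read_decompositions(path):
    """Map each code point to (general category, decomposition ids)."""
    table = {}
    # The file is closed when the with block exits, even on error.
    with codecs.open(path, mode="r", encoding="UTF-8") as data_file:
        for line in data_file:
            fields = line.split(";")
            if len(fields) > 5:
                decomposition = re.sub(TAG_PATTERN, " ", fields[5])
                code = int(fields[0], 16)
                parts = [int(s, 16) for s in decomposition.split(" ") if s != ""]
                table[code] = (fields[2], parts)
    return table


# Write the sample to a temporary file, parse it, then clean up.
handle, sample_path = tempfile.mkstemp(suffix=".txt")
os.close(handle)
try:
    with codecs.open(sample_path, mode="w", encoding="UTF-8") as sample_file:
        sample_file.write(SAMPLE)
    table = read_decompositions(sample_path)
finally:
    os.remove(sample_path)
```

Here `table[0xC0]` holds the category `Lu` and the ids for `A` plus the combining grave accent, which is the kind of mapping the later "interesting mappings" loop works from.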

0 commit comments
