Commit 0afc0a7

Fix unaccent generation script in Windows
As originally coded, the script would fail on Windows 10 and Python 3 because stdout was switched to UTF-8 only for Python 2. This patch makes that apply to both versions.

Also add Python 2 compatibility markers so that we know what to remove once we drop support for that. Also use a "with" clause to ensure the file descriptor is closed promptly.

Author: Hugh Ranalli, Ramanarayana
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAKm4Xs7_61XMyOWmHs3n0mmkS0O4S0pvfWk=7cQ5P0gs177f7A@mail.gmail.com
Discussion: https://postgr.es/m/15548-cef1b3f8de190d4f@postgresql.org

1 parent b438e7e, commit 0afc0a7
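
The stdout handling described in the commit message can be illustrated with a minimal sketch. The helper names `utf8_writer` and `utf8_stdout` are hypothetical and do not appear in the patch; an in-memory `io.BytesIO` stands in for the console so the encoded bytes can be inspected.

```python
import codecs
import io
import sys


def utf8_writer(binary_stream):
    """Wrap a byte-oriented stream so text written to it is encoded as UTF-8."""
    return codecs.getwriter('utf8')(binary_stream)


def utf8_stdout():
    """Return a UTF-8 writer for stdout on either Python version (hypothetical helper)."""
    if sys.version_info[0] <= 2:
        # Python 2: sys.stdout itself accepts bytes
        return utf8_writer(sys.stdout)
    # Python 3: the byte-oriented stream behind sys.stdout is sys.stdout.buffer
    return utf8_writer(sys.stdout.buffer)


# Demonstration with an in-memory stream standing in for the console
demo = io.BytesIO()
writer = utf8_writer(demo)
writer.write(u'\u00e9\te\n')
encoded = demo.getvalue()
```

Because the writer encodes explicitly, the output presumably no longer depends on the console's default code page, which is what differed on Windows.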

1 file changed

contrib/unaccent/generate_unaccent_rules.py

Lines changed: 24 additions & 20 deletions
@@ -32,9 +32,15 @@
 # The approach is to be Python3 compatible with Python2 "backports".
 from __future__ import print_function
 from __future__ import unicode_literals
+# END: Python 2/3 compatibility - remove when Python 2 compatibility dropped
+
+import argparse
 import codecs
+import re
 import sys
+import xml.etree.ElementTree as ET

+# BEGIN: Python 2/3 compatibility - remove when Python 2 compatibility dropped
 if sys.version_info[0] <= 2:
     # Encode stdout as UTF-8, so we can just print to it
     sys.stdout = codecs.getwriter('utf8')(sys.stdout)
@@ -45,12 +51,9 @@
     # Python 2 and 3 compatible bytes call
     def bytes(source, encoding='ascii', errors='strict'):
         return source.encode(encoding=encoding, errors=errors)
+else:
 # END: Python 2/3 compatibility - remove when Python 2 compatibility dropped
-
-import re
-import argparse
-import sys
-import xml.etree.ElementTree as ET
+    sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)

 # The ranges of Unicode characters that we consider to be "plain letters".
 # For now we are being conservative by including only Latin and Greek. This
@@ -233,21 +236,22 @@ def main(args):
     charactersSet = set()

     # read file UnicodeData.txt
-    unicodeDataFile = open(args.unicodeDataFilePath, 'r')
-
-    # read everything we need into memory
-    for line in unicodeDataFile:
-        fields = line.split(";")
-        if len(fields) > 5:
-            # http://www.unicode.org/reports/tr44/tr44-14.html#UnicodeData.txt
-            general_category = fields[2]
-            decomposition = fields[5]
-            decomposition = re.sub(decomposition_type_pattern, ' ', decomposition)
-            id = int(fields[0], 16)
-            combining_ids = [int(s, 16) for s in decomposition.split(" ") if s != ""]
-            codepoint = Codepoint(id, general_category, combining_ids)
-            table[id] = codepoint
-            all.append(codepoint)
+    with codecs.open(
+        args.unicodeDataFilePath, mode='r', encoding='UTF-8',
+    ) as unicodeDataFile:
+        # read everything we need into memory
+        for line in unicodeDataFile:
+            fields = line.split(";")
+            if len(fields) > 5:
+                # http://www.unicode.org/reports/tr44/tr44-14.html#UnicodeData.txt
+                general_category = fields[2]
+                decomposition = fields[5]
+                decomposition = re.sub(decomposition_type_pattern, ' ', decomposition)
+                id = int(fields[0], 16)
+                combining_ids = [int(s, 16) for s in decomposition.split(" ") if s != ""]
+                codepoint = Codepoint(id, general_category, combining_ids)
+                table[id] = codepoint
+                all.append(codepoint)

     # walk through all the codepoints looking for interesting mappings
     for codepoint in all:
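
The effect of the "with" clause in the last hunk can be checked in isolation with a small sketch. It is not part of the patch: the `Codepoint` tuple and the regular expression bound to `decomposition_type_pattern` are stand-ins (the diff does not show how the script defines them), `read_decompositions` is a hypothetical helper, and the two sample records only follow the UnicodeData.txt field layout.

```python
import codecs
import collections
import os
import re
import tempfile

# Assumed stand-ins: the diff does not show the script's own definitions.
Codepoint = collections.namedtuple(
    'Codepoint', ['id', 'general_category', 'combining_ids'])
decomposition_type_pattern = re.compile(r' *<[^>]*> *')


def read_decompositions(path):
    """Parse a UnicodeData.txt-style file; report whether the file got closed."""
    table = {}
    with codecs.open(path, mode='r', encoding='UTF-8') as data_file:
        for line in data_file:
            fields = line.split(';')
            if len(fields) > 5:
                general_category = fields[2]
                # Drop a compatibility tag such as "<super>" before parsing
                decomposition = re.sub(decomposition_type_pattern, ' ', fields[5])
                cp_id = int(fields[0], 16)
                combining_ids = [int(s, 16)
                                 for s in decomposition.split(' ') if s != '']
                table[cp_id] = Codepoint(cp_id, general_category, combining_ids)
    # The with block has exited, so the file should already be closed here
    return table, data_file.closed


sample = (u'00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;;;00C9;;00C9\n'
          u'00AA;FEMININE ORDINAL INDICATOR;Lo;0;L;<super> 0061;;;;N;;;;;\n')

# Write the sample to a temporary file so codecs.open has a real path to read
tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    tmp.write(sample.encode('utf-8'))
    tmp.close()
    table, was_closed = read_decompositions(tmp.name)
finally:
    os.remove(tmp.name)
```

On Python 3 alone, the built-in `open` with an `encoding` argument does the same job; `codecs.open` is kept here only because the patch uses it.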
