prepare-benchmark get xbench-ds fails with UnicodeEncodeError: 'gbk' codec can't encode character '\u2011' in position 273: illegal multibyte sequence #100

Open
@zhoukai83

Describe the bug
Running: uv run main.py prepare-benchmark get xbench-ds
fails with the following error:
\MiroFlow\utils\prepare_benchmark\main.py:179 in get │
│ │
│ 176 │ ds_file = env.data_dir / dataset / env.meta_filename │
│ 177 │ with open(ds_file, mode="w") as f: │
│ 178 │ │ for task in ds_gen(): │
│ ❱ 179 │ │ │ f.write(task.to_json().decode() + "\n") │
│ 180 │ print("\n" + "=" * 80) │
│ 181 │ print(f" Benchmark: {dataset}") │
│ 182 │ print(f" Saved to: {ds_file}") │
│ │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ dataset = 'xbench-ds' │ │
│ │ ds_file = WindowsPath('D:/myCode/github/MiroFlow/data/xbench-ds/standardized_data.jsonl') │ │
│ │ env = _Env(data_dir=WindowsPath('D:/myCode/github/MiroFlow/data'), hf_token='') │ │
│ │ f = <_io.TextIOWrapper │ │
│ │ name='D:\myCode\github\MiroFlow\data\xbench-ds\standardized_data.jsonl' │ │
│ │ mode='w' encoding='cp936'> │ │
│ │ task = Task( │ │
│ │ │ task_id=1, │ │
│ │ │ │ │
│ │ task_question='截至2024年12月31日,2024年上海黄金交易所Au(T+D)合约的“最高价”与“最… │ │
│ │ │ ground_truth='161.27元', │ │
│ │ │ file_path=None, │ │
│ │ │ metadata={ │ │
│ │ │ │ 'reference_steps': '1. │ │
│ │ 访问官方行情页面:https://www.sge.com.cn/sjzx/quotation_daily_new\n2. │ │
│ │ 设置查询区间:在页面中,按月度查询'+223 │ │
│ │ │ } │ │
│ │ ) │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
UnicodeEncodeError: 'gbk' codec can't encode character '\u2011' in position 273: illegal multibyte sequence
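
The failure can be reproduced outside MiroFlow. The locals in the traceback show the file opened with encoding='cp936', which Python treats as an alias of GBK, and U+2011 (NON-BREAKING HYPHEN) has no mapping in that codec. The sketch below is only an illustration: the sample string is made up and is not the actual xbench-ds content.

```python
import codecs

# Hypothetical sample text containing U+2011 (NON-BREAKING HYPHEN).
sample = "A\u2011B"

# cp936 is registered in Python as an alias of the gbk codec.
codec_name = codecs.lookup("cp936").name

try:
    sample.encode("cp936")
    error = None
except UnicodeEncodeError as exc:
    # Same exception type as in the traceback above.
    error = exc

# UTF-8 can represent every Unicode code point, so this always succeeds.
utf8_bytes = sample.encode("utf-8")

print(codec_name)
print(error)
print(utf8_bytes)
```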

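The code does not account for the fact that the default text encoding can differ across systems and environments. When open() is called without an encoding argument, Python uses the locale's preferred encoding, which is cp936 (GBK) on this Windows machine.
After changing line 177 of utils/prepare_benchmark/main.py as follows, the command runs successfully: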
with open(ds_file, mode="w", encoding='utf8') as f:
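
A minimal sketch of why the explicit encoding helps, assuming a JSONL line that contains U+2011. The helper name write_jsonl_line and the sample record are hypothetical and are not part of MiroFlow; only the open(..., encoding=...) call mirrors the fix above.

```python
import tempfile
from pathlib import Path

# Hypothetical JSONL record containing U+2011 (NON-BREAKING HYPHEN).
line = '{"task_question": "non\u2011breaking hyphen"}\n'


def write_jsonl_line(path, text, encoding):
    """Write one line of text to path using an explicit encoding."""
    with open(path, mode="w", encoding=encoding) as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "standardized_data.jsonl"

    # Simulates the Windows default seen in the traceback (cp936 / GBK).
    try:
        write_jsonl_line(path, line, "gbk")
        gbk_ok = True
    except UnicodeEncodeError:
        gbk_ok = False

    # With UTF-8 the write succeeds regardless of the system locale.
    write_jsonl_line(path, line, "utf-8")
    round_trip = path.read_text(encoding="utf-8")

print(gbk_ok)
print(round_trip == line)
```

Whatever later reads the file should pass the same encoding, otherwise the mismatch just moves from the write side to the read side.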
