ENH: Implement translations infrastructure #61380


Open
goanpeca wants to merge 11 commits into pandas-dev:main from goanpeca:translations

@goanpeca commented Apr 30, 2025

Hello team!

This PR is a proposal for adding the translations infrastructure to the pandas web page.

Following the discussion in #56301, we (a group of folks working on the Scientific Python grant) have been working to set up infrastructure and translate the contents of the pandas website. As of this moment, we have 100% translations for the pandas website into Spanish and Brazilian Portuguese, with other languages available for translation (depending on volunteer translators).

To build, the command remains the same:

python pandas_web.py pandas/content --target-path build

If you want to check out other related work, please take a look at scipy/scipy.org#617

You can read more about how the translation process works at https://scientific-python-translations.github.io/docs/

What this PR does?

Supersedes #61220

Demo

[Image: pandas]


cc @mroeschke @datapythonista

@datapythonista
Member

/preview

@github-actions
Contributor

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/61380/

@datapythonista
Member

Thanks @goanpeca for this. Do you mind adding some more context here? I can't see much information in https://github.com/Scientific-Python-Translations/pandas-translations, like what languages are available, or how to fix a bad translation, which would be useful to know.

Also, in the docs generated from this PR I can't see any language dropdown or anything different from our current docs. What are we expecting?

@melissawm
Contributor

Hi @datapythonista - this is a follow up to #61220, a proof-of-concept CI job to build the website with translations that don't live in this repo. This PR and #61220 are meant to work together and I'm happy to incorporate one into the other once we agree on the general direction and workflow for this.

Let us know if we can answer any other questions. Unfortunately I'm not sure how to get the preview for the other PR, I relied on building locally to test that things were working.

@datapythonista
Member

Sorry, I missed #61220 and the issue discussion.

I don't fully understand what you're doing here, but below I describe how to add translations without adding too much complexity to this repo, which I don't think any core dev would be on board with.

  1. You decide on how to generate translations and manage it independently from this repo, and end up with a structure like this with the translated documents:

     + es/
       - index.md
       + about/
         - team.md
         - ...
       - ...
     + pt/
       - index.md
       + about/
         - team.md
         - ...
       - ...

  2. In our CI, before calling pandas_web.py you download this directory structure to the web/ directory. No other changes needed, this will create all translated pages.
  3. We add a dropdown with the languages to the website (you can add the language list to web/pandas/config.yml)

I think this makes everyone's life easy, and we get the expected result.
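
As a rough illustration of the layout described above, the sketch below walks a content directory and lists the Markdown pages found under each language folder. The function name `list_translated_pages` and the explicit `languages` argument are assumptions made for this example; they are not taken from pandas_web.py or from the PR.

```python
from pathlib import Path


def list_translated_pages(content_root, languages):
    """Map each language code to the Markdown pages found under its folder.

    Hypothetical helper: `languages` stands in for a language list that
    could be read from the website configuration.
    """
    root = Path(content_root)
    pages = {}
    for lang in languages:
        lang_dir = root / lang
        if not lang_dir.is_dir():
            # Language is listed but no translated files are present: skip it.
            continue
        pages[lang] = sorted(
            path.relative_to(lang_dir).as_posix()
            for path in lang_dir.rglob("*.md")
        )
    return pages
```

A list like this could also feed the language dropdown, since only languages that actually have a folder end up in the result.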

@melissawm
Contributor

Thanks @datapythonista!

Can you clarify what is missing from #61220 to match your description? That is pretty much what is done in that PR. Maybe this is confusing because we chose to do it in two parts exactly because we wanted to decouple the reorganization of the repo + switcher (in #61220) from the actual translations (this PR).

Happy to follow up with any feedback in the other PR as well. Cheers!

@datapythonista
Member

In #61220 you are moving all the current website pages; that should be undone. You are adding the translated pages to this repo; we don't want that. You are making changes to pandas_web.py; this is not needed based on what I described above.

The only changes in a PR to this pandas repo should be adding a CI step as per step 2, and editing the website template with the language dropdown as per step 3.

@melissawm
Contributor

I see! I will rework what I have there to match your proposal. Thanks!

@goanpeca force-pushed the translations branch 3 times, most recently from cd72e5c to 2b85ad4 (May 8, 2025 01:47)

@datapythonista left a comment
Member

Great improvement to the PR, this makes a lot of sense to me.

I'd personally still simplify things more here in two ways. And feel free to disagree, as it's something opinionated.

First, if I understand correctly, you download the translations of the web, and the percentage of the translated content, and then you check for each language if it's translated enough to be published. Personally, I think it would be better to take care of this logic in your repo when generating the tar, not here. First to simplify the code here, and second to avoid downloading translations that are not going to be used.

Just as a suggestion, I wouldn't use this approach, even in your repo. Imagine you publish translations that are at least 90%, and we have Spanish at 100%. Then I add new content that is 11% of the website. And automatically the Spanish translations that are already indexed by search engines, in user bookmarks, in links in blog posts... are deleted from our website. Not great in my opinion, much better to simply get the new content in English and hope it will eventually be translated.
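
To make the scenario concrete, here is a minimal sketch of the two policies, assuming translation progress is tracked as (translated units, total units) per language. The function names, the `already_published` parameter, and the numbers are illustrative assumptions, not code from this PR or from the translations repo.

```python
def translated_fraction(translated_units, total_units):
    """Return the share of content that is translated (0.0 for an empty site)."""
    return translated_units / total_units if total_units else 0.0


def languages_to_publish(stats, threshold, already_published=()):
    """Pick the languages to publish.

    stats maps a language code to (translated_units, total_units).
    A language passes if it meets the threshold, or if it was already
    published before (hypothetical "never unpublish" policy).
    """
    keep = set(already_published)
    return sorted(
        lang
        for lang, (done, total) in stats.items()
        if lang in keep or translated_fraction(done, total) >= threshold
    )


# Spanish was fully translated, then new English content made up 11% of the site.
stats = {"es": (89, 100)}
threshold_only = languages_to_publish(stats, threshold=0.9)
sticky = languages_to_publish(stats, threshold=0.9, already_published=["es"])
```

With a pure threshold, Spanish drops out of the published set as soon as the new content lands; with the sticky variant it stays, and the untranslated pages would simply be served in English until translators catch up.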

Another thing I would do is to extract the tar file as it is downloaded. So the tar file is passed to gzip/tarfile in memory, with the io module, not as a path on disk. With this you can get all the code here in a single short function. Or we could even create a GitHub action in your repo with this, as it's generic, and just use it here. So only the CI step would live in this PR.
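
One way the in-memory extraction could look is sketched below, using only the standard library. The function names are hypothetical, and any URL passed in would be a placeholder. The PR's actual script may differ.

```python
import io
import tarfile
import urllib.request


def extract_tar_bytes(data, target_dir):
    """Extract a gzip-compressed tar archive held in memory into target_dir."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        # Use the safer "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(target_dir, **kwargs)


def download_and_extract(url, target_dir):
    """Fetch the archive at url and unpack it without writing a temp file."""
    with urllib.request.urlopen(url) as response:
        extract_tar_bytes(response.read(), target_dir)
```

Splitting the download from the extraction keeps the extraction step testable without network access.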

Finally, we already have a configuration file for the website. We could save the url of the translations tar there. Also good in the script, just a question of preference. Or if you go for the github action approach, it could simply be a parameter in the CI step. Then you would need another one for the target dir in this repo.

In any case, the approach here is also very reasonable, all above are suggestions that personally I think would make things simpler.

goanpeca reacted with thumbs up emoji
@goanpeca force-pushed the translations branch 3 times, most recently from 4a6532b to 04f9259 (May 12, 2025 02:38)
@goanpeca marked this pull request as ready for review (May 12, 2025 02:47)
@goanpeca
Author

goanpeca commented May 12, 2025

> Great improvement to the PR, this makes a lot of sense to me.

Thanks for the review @datapythonista.

> First, if I understand correctly, you download the translations of the web, and the percentage of the translated content, and then you check for each language if it's translated enough to be published. Personally, I think it would be better to take care of this logic in your repo when generating the tar, not here. First to simplify the code here, and second to avoid downloading translations that are not going to be used.

Fixed!

> Just as a suggestion, I wouldn't use this approach, even in your repo. Imagine you publish translations that are at least 90%, and we have Spanish at 100%. Then I add new content that is 11% of the website. And automatically the Spanish translations that are already indexed by search engines, in user bookmarks, in links in blog posts... are deleted from our website. Not great in my opinion, much better to simply get the new content in English and hope it will eventually be translated.

This is also fixed!

> Another thing I would do is to extract the tar file as it is downloaded. So the tar file is passed to gzip/tarfile in memory, with the io module, not as a path on disk. With this you can get all the code here in a single short function. Or we could even create a GitHub action in your repo with this, as it's generic, and just use it here. So only the CI step would live in this PR.

Did not follow this one as I think things are now simpler and in a single script.

> Finally, we already have a configuration file for the website. We could save the url of the translations tar there. Also good in the script, just a question of preference. Or if you go for the github action approach, it could simply be a parameter in the CI step. Then you would need another one for the target dir in this repo.

Added the information to the config file as requested and updated the scripts to handle site generation for languages. Moved all logic to the existing script.

> In any case, the approach here is also very reasonable, all above are suggestions that personally I think would make things simpler.

Please let me know what you think about the current changes.


This PR now supersedes #61220

Thanks @melissawm!

@goanpeca changed the title from "Update CI to include Translations from Scientific Python Repo" to "Implement translations infrastructure" (May 12, 2025)
@goanpeca force-pushed the translations branch 3 times, most recently from 09aaca5 to 7e68731 (May 12, 2025 03:04)
@goanpeca
Author

goanpeca commented May 19, 2025

Hi @datapythonista, I implemented the suggestions.

@goanpeca force-pushed the translations branch 3 times, most recently from 3523a00 to 70ad69f (May 19, 2025 23:33)

@datapythonista left a comment
Member

Thanks @goanpeca for the updates. This is getting much simpler and cleaner. I added a few more comments that I think should simplify this PR even more, but this is starting to look much more reasonable in my opinion.

@goanpeca
Author

goanpeca commented May 21, 2025

Hi @datapythonista, I implemented most of the additional suggestions. Please see comments.

@goanpeca force-pushed the translations branch 2 times, most recently from 4a99e10 to 0964c17 (May 21, 2025 09:33)

@datapythonista left a comment
Member

Great job. This looks very clear now. I'd split the preprocessors in a slightly different way as suggested in the comments, but I think the way the code is now is very simple and easy to understand and maintain. Thanks a lot for all the updates here.

goanpeca reacted with hooray emoji; melissawm reacted with heart emoji
@goanpeca
Author

> I'd split the preprocessors in a slightly different way as suggested in the comments, but I think the way the code is now is very simple and easy to understand and maintain. Thanks a lot for all the updates here.

Made some new changes based on your suggestions @datapythonista.

@goanpeca force-pushed the translations branch 2 times, most recently from 9bc7032 to 5245445 (May 21, 2025 13:02)
@datapythonista
Member

/preview

@github-actions
Contributor

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/61380/

@datapythonista
Member

The sponsor logos in the home page don't render correctly. I guess the problem is not in this PR, but in the translation of the html file, no?

@goanpeca
Author

goanpeca commented May 21, 2025

> The sponsor logos in the home page don't render correctly. I guess the problem is not in this PR, but in the translation of the html file, no?

Would it be OK to use absolute URLs? The English pages live in the root but the translated pages live in es/something. That would not work on the preview, though.

I could use /static/img... instead of ../static as is currently used. Frameworks rely on filters like relative_url / absolute_url to handle these cases and prepend the appropriate base_folder or base_url to the link.

Either that, or copying assets folder into each language.
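
For illustration, a filter of the kind mentioned above could be sketched as follows. This is a hypothetical helper, not something pandas_web.py provides; it computes the link to a shared asset relative to the page that references it, so the same template works for pages in the root and pages under a language folder. The paths are made-up examples.

```python
import posixpath


def relative_url(page_path, asset_path):
    """Return asset_path expressed relative to the directory of page_path.

    Both paths are given relative to the site root, using forward slashes.
    """
    page_dir = posixpath.dirname(page_path) or "."
    return posixpath.relpath(asset_path, start=page_dir)


# A root page links straight into static/, a translated page climbs one level.
root_link = relative_url("index.html", "static/img/logo.svg")
spanish_link = relative_url("es/index.html", "static/img/logo.svg")
```

Unlike absolute URLs, links computed this way would keep working when the site is served from a subpath such as a preview directory.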

@datapythonista
Member

The images of the books should be implemented in the same exact way, and those seem to be working fine in the translated pages. Doesn't seem like we need to change the links, feels more like a problem in the translated content for those images, no?

@goanpeca
Author

> The images of the books should be implemented in the same exact way, and those seem to be working fine in the translated pages. Doesn't seem like we need to change the links, feels more like a problem in the translated content for those images, no?

I will look into the content.

@goanpeca
Author

goanpeca commented May 21, 2025

> I guess the problem is not in this PR, but in the translation of the html file, no?

Correct. This is an issue in the translations. (Working on those fixes)

Besides that, is there anything else you think needs revision?
