Add BE Publisher (Politico EU) #811

Open
rascaria wants to merge 3 commits into flairNLP:master from rascaria:add-politico-eu
Changes from 1 commit
fix: Enhance PoliticoEu parser with images and improved selectors
rascaria committed Nov 5, 2025
commit 9da4ad496eb22a067b2f19f7df639f72a3e13f32
28 changes: 23 additions & 5 deletions src/fundus/publishers/be/politico_eu.py
@@ -1,25 +1,32 @@
 import datetime
+import re
 from typing import List, Optional

 from lxml.cssselect import CSSSelector
+from lxml.etree import XPath

-from fundus.parser import ArticleBody, BaseParser, ParserProxy, attribute
+from fundus.parser import ArticleBody, BaseParser, Image, ParserProxy, attribute
 from fundus.parser.utility import (
     extract_article_body_with_selector,
     generic_author_parsing,
     generic_date_parsing,
+    image_extraction,
 )


 class PoliticoEuParser(ParserProxy):
     class V1(BaseParser):
-        _paragraph_selector = CSSSelector(".article__content > p")
+        _paragraph_selector = CSSSelector(".article__content p, .sidebar-grid_content p")
+        _subheadline_selector = CSSSelector(".article__content h3, .sidebar-grid__content h3")
+        _summary_selector = CSSSelector("p.hero__excerpt")

         @attribute
         def body(self) -> Optional[ArticleBody]:
             return extract_article_body_with_selector(
                 self.precomputed.doc,
                 paragraph_selector=self._paragraph_selector,
+                subheadline_selector=self._subheadline_selector,
+                summary_selector=self._summary_selector,
             )

         @attribute
@@ -36,9 +43,20 @@ def title(self) -> Optional[str]:

         @attribute
         def topics(self) -> List[str]:
-            keywords: Optional[List[str]] = self.precomputed.ld.bf_search("keywords")
+            keywords_string = self.precomputed.meta.get("keywords")

-            if keywords is None:
+            if keywords_string is None:
                 return []

-            return keywords
+            return keywords_string.split(",")
+
+        @attribute
+        def images(self) -> List[Image]:
+            return image_extraction(
+                doc=self.precomputed.doc,
+                upper_boundary_selector=CSSSelector("article"),
+                image_selector=CSSSelector("figure img"),
+                paragraph_selector=self._paragraph_selector,
+                caption_selector=XPath("./ancestor::figure//div[contains(@class, 'figcaption__inner')]"),
+                author_selector=re.compile(r"\|(?P<credits>.*)$"),
+            )
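
The `author_selector` pattern in the `images` attribute captures everything after the first `|` in a caption as the image credits. The following is a minimal standard-library sketch of that split, not the fundus implementation: `split_caption` and the sample caption are hypothetical, and how `image_extraction` applies the pattern internally is not shown in this diff.

```python
import re
from typing import Optional, Tuple

# Same pattern as the author_selector in the diff: everything after the first "|".
CREDITS_PATTERN = re.compile(r"\|(?P<credits>.*)$")


def split_caption(raw: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split a raw caption into (caption, credits)."""
    match = CREDITS_PATTERN.search(raw)
    if match is None:
        # No pipe in the caption, so there are no credits to separate.
        return raw.strip(), None
    # Text before the pipe is the caption, the named group holds the credits.
    caption = raw[: match.start()].strip()
    credits = match.group("credits").strip()
    return caption, credits
```

Called as `split_caption("A sample caption. | Jane Doe/Example Agency")`, this returns `("A sample caption.", "Jane Doe/Example Agency")`. A caption without a pipe comes back unchanged with `None` for the credits.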
135 changes: 123 additions & 12 deletions tests/resources/parser/test_data/be/PoliticoEu.json
@@ -4,7 +4,9 @@
       "Shawn Pogatchnik"
     ],
     "body": {
-      "summary": [],
+      "summary": [
+        "Independent socialist Catherine Connolly’s coolness to Brussels and hostility to Donald Trump put her at odds with the Irish government."
+      ],
       "sections": [
         {
           "headline": [],
@@ -14,18 +16,39 @@
             "The latest opinion poll, published Wednesday night, put Connolly on 55.7 percent support compared to Humphreys’ 31.6 percent. Results will be announced Saturday, but the surprisingly fleet-footed 68-year-old Connolly acts and talks like she’s already won.",
             "A Connolly victory shouldn’t be a surprise, given how Ireland’s presidency has evolved in the public mind since civil rights lawyer Mary Robinson’s trailblazing win in 1990.",
             "Once a sinecure for senior statesmen backed by the dominant Fianna Fáil party, Robinson’s breakthrough augured in a new era of presidents coming from the opposition benches or outside political ranks. That reflects many voters’ apparent preference today for a presidency — a largely ceremonial post with no role in day-to-day government — that can challenge the establishment and, more specifically, the current Fianna Fáil-led coalition.",
-            "Should she win, Connolly will succeed another Galway socialist, Michael D. Higgins, who spent the past two terms and 14 years expanding what the president is allowed to say and do.",
+            "Should she win, Connolly will succeed another Galway socialist, Michael D. Higgins, who spent the past two terms and 14 years expanding what the president is allowed to say and do."
+          ]
+        },
+        {
+          "headline": [
+            "Unapologetically outspoken"
+          ],
+          "paragraphs": [
             "Like Higgins, Connolly has been outspoken in condemning Israel for its two-year war in Gaza — a certain vote-winner in a country that openly sympathizes with the Palestinians and has wretched relations with Tel Aviv.",
             "But Connolly has gone farther, defending Hamas’ right to play a future role in any Palestinian state. That drew rebukes from Prime Minister Micheál Martin, the leader of Fianna Fáil, and Foreign Minister Simon Harris, leader of Fine Gael, the other party in Ireland’s center-right government.",
             "It’s her NATO-critical stance on Ukraine, and opposition to wider European security moves, that could soon be generating the most awkward headlines for an Irish government caught between the state’s official neutrality and its support for EU efforts to bolster Ukraine.",
             "Connolly faced questioning from supporters at one campaign event in a Dublin pub after she compared Germany’s current plans to boost defense spending with Nazi militarization in the 1930s. But she’s stood firm in her opposition to the EU’s ReArm Europe plans to boost defense spending by €800 billion.",
             "At the final televised presidential debate Tuesday night, Connolly was asked how she would treat a visiting U.S. President Donald Trump — and whether she would challenge him to his face about U.S. backing of Israel and its Gaza war.",
             "“Genocide was enabled and resourced by American money,” Connolly began, before being asked again if she would say this to Trump, who has a golf resort in Ireland and plans to visit when it hosts the Irish Open next year.",
-            "“If it’s just a meet and greet, then I will meet and greet. If the discussion is genocide, then that’s a completely different thing,” she said.",
+            "“If it’s just a meet and greet, then I will meet and greet. If the discussion is genocide, then that’s a completely different thing,” she said."
+          ]
+        },
+        {
+          "headline": [
+            "Unifying the opposition"
+          ],
+          "paragraphs": [
             "Connolly’s dominance in the campaign has been delivered, in part, by her ability to win backing from all of the opposition parties on Ireland’s normally fractious left wing — most crucially the nationalist Sinn Féin, which declined to run its own candidate. Instead, Sinn Féin chief Mary Lou McDonald and the party’s leader of the government in neighboring Northern Ireland, First Minister Michelle O’Neill, have joined Connolly on the campaign trail.",
             "Conspicuous in their absence have been Connolly’s previous highest-profile allies, socialist radicals Mick Wallace and Clare Daly — dubbed Moscow Mick and Kremlin Clare by their political opponents.",
             "Connolly in 2018 joined Daly and Wallace in a tour of government-controlled parts of Syria back when all three were opposition lawmakers in the Irish parliament. Daly and Wallace subsequently were elected MEPs, but lost their Brussels seats in the 2024 European Parliament election.",
-            "In media interviews and televised debates, Connolly has repeatedly batted away questions about the wisdom of taking that one-sided tour of Syria in areas under the control of President Bashar al-Assad, who was deposed by rebels last year. She has dismissed media questions about her links with Daly and Wallace as attempts at “guilt by association.”",
+            "In media interviews and televised debates, Connolly has repeatedly batted away questions about the wisdom of taking that one-sided tour of Syria in areas under the control of President Bashar al-Assad, who was deposed by rebels last year. She has dismissed media questions about her links with Daly and Wallace as attempts at “guilt by association.”"
+          ]
+        },
+        {
+          "headline": [
+            "Speaking for Ireland"
+          ],
+          "paragraphs": [
             "Veterans of Ireland’s diplomatic service have expressed fears that a President Connolly might confuse the world about the Irish government’s true positions. They stress Ireland’s economic dependence on hundreds of U.S. multinationals and the EU’s strong support for Ireland following the U.K.’s disruptive exit.",
             "Bobby McDonagh, a former Irish ambassador to the U.K. and EU, says Connolly “has made statements and taken stances that, in my strong view and the view of many, are essentially anti-EU.”",
             "He called Connolly’s criticisms of increased German defense spending in the face of rising Russian security threats “fatuous” and her comments suggesting a parallel to Nazi-era militarization “absurd.”",
@@ -34,17 +57,105 @@
         }
       ]
     },
+    "images": [
+      {
+        "versions": [
+          {
+            "url": "https://www.politico.eu/cdn-cgi/image/width=480,quality=80,onerror=redirect,format=auto/wp-content/uploads/2025/10/23/GettyImages-2242056166-scaled.jpg",
+            "query_width": null,
+            "size": {
+              "width": 480,
+              "height": 320
+            },
+            "type": "image/jpeg"
+          },
+          {
+            "url": "https://www.politico.eu/cdn-cgi/image/width=768,quality=80,onerror=redirect,format=auto/wp-content/uploads/2025/10/23/GettyImages-2242056166-scaled.jpg",
+            "query_width": null,
+            "size": {
+              "width": 768,
+              "height": 512
+            },
+            "type": "image/jpeg"
+          },
+          {
+            "url": "https://www.politico.eu/cdn-cgi/image/width=1024,quality=80,onerror=redirect,format=auto/wp-content/uploads/2025/10/23/GettyImages-2242056166-scaled.jpg",
+            "query_width": null,
+            "size": {
+              "width": 1024,
+              "height": 682
+            },
+            "type": "image/jpeg"
+          },
+          {
+            "url": "https://www.politico.eu/cdn-cgi/image/width=1280,quality=80,onerror=redirect,format=auto/wp-content/uploads/2025/10/23/GettyImages-2242056166-scaled.jpg",
+            "query_width": null,
+            "size": {
+              "width": 1280,
+              "height": 853
+            },
+            "type": "image/jpeg"
+          },
+          {
+            "url": "https://www.politico.eu/cdn-cgi/image/width=1440,quality=80,onerror=redirect,format=auto/wp-content/uploads/2025/10/23/GettyImages-2242056166-scaled.jpg",
+            "query_width": null,
+            "size": {
+              "width": 1440,
+              "height": 960
+            },
+            "type": "image/jpeg"
+          },
+          {
+            "url": "https://www.politico.eu/cdn-cgi/image/width=1920,quality=80,onerror=redirect,format=auto/wp-content/uploads/2025/10/23/GettyImages-2242056166-scaled.jpg",
+            "query_width": null,
+            "size": {
+              "width": 1920,
+              "height": 1279
+            },
+            "type": "image/jpeg"
+          },
+          {
+            "url": "https://www.politico.eu/cdn-cgi/image/width=2640,quality=80,onerror=redirect,format=auto/wp-content/uploads/2025/10/23/GettyImages-2242056166-scaled.jpg",
+            "query_width": null,
+            "size": {
+              "width": 2640,
+              "height": 1759
+            },
+            "type": "image/jpeg"
+          }
+        ],
+        "is_cover": true,
+        "description": "Irish presidential election",
+        "caption": "Should she win, Catherine Connolly will succeed another Galway socialist, Michael D. Higgins.",
+        "authors": [
+          "Niall Carson/PA Images via Getty Images"
+        ],
+        "position": 512
+      }
+    ],
     "publishing_date": "2025-10-23 15:50:00+00:00",
     "title": "Socialist critic of NATO and EU poised to win Ireland’s presidency",
     "topics": [
-      "defense",
-      "elections",
-      "mayors",
-      "media",
-      "parliament",
-      "poll",
-      "security",
-      "services"
+      "Clare Daly",
+      "Defense",
+      "Donald Trump",
+      "Elections",
+      "Ireland",
+      "Mary Lou McDonald",
+      "Mary Robinson",
+      "Mayors",
+      "Media",
+      "Michael D. Higgins",
+      "Micheál Martin",
+      "Michelle O'Neill",
+      "Mick Wallace",
+      "Northern Ireland",
+      "Parliament",
+      "Poll",
+      "Security",
+      "Services",
+      "Simon Harris",
+      "United Kingdom"
     ]
   }
 }
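
The expected `topics` now come from the `keywords` meta tag, which `topics` in `politico_eu.py` splits with `keywords_string.split(",")`. A plain `split(",")` keeps any whitespace around each entry and returns `['']` for an empty string. The sketch below shows a more defensive variant, assuming the meta value is a comma-separated string; `split_keywords` is a hypothetical helper and not part of fundus or this PR.

```python
from typing import List, Optional


def split_keywords(keywords_string: Optional[str]) -> List[str]:
    """Hypothetical helper: split a comma-separated keywords string."""
    if keywords_string is None:
        # Mirrors the early return in the parser when the meta tag is missing.
        return []
    # Strip whitespace around each entry and drop entries that end up empty.
    return [part.strip() for part in keywords_string.split(",") if part.strip()]
```

Whether the extra stripping is needed depends on how the publisher formats the meta tag, which the test data alone does not show.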
