\$\begingroup\$

I'm not sure if the problem here is how I'm wording the question or going about finding an answer, but I have what I think is a relatively trivial task: getting fact-check claims from the Google Fact Check API, saving them in a database and then using that database to get the counts of fact checked claims & who claimed them.

The project is available here and the specific piece of code I'm wondering about is here. What I'm trying to accomplish is basically getting a count of claims by claimant, along with the claims themselves. Here's an output of that function:

The Gateway Pundit
{'claims': [Says Joe Biden “imports oil from Iran.”], 'number_of_claims': 1}
~~~
Real Raw News, social media users
{'claims': [Hillary Clinton was hanged at Guantanamo Bay], 'number_of_claims': 1}
~~~
Posts on Facebook and Instagram
{'claims': [Donald Trump has been permitted to once again use the Facebook and Instagram accounts he used when US president.], 'number_of_claims': 1}
~~~
Donald Trump
{'claims': [Georgia didn’t update its voter rolls prior to the 2020 presidential election; “this means we (you!) won the presidential election in Georgia.”, “Thank you and congratulations to Laura Baigert of the Georgia Star News on the incredible reporting you have done. Keep going! The scam [in Fulton County, Georgia] is all unraveling fast.”, Twitter banned President Muhammadu Buhari, “Republican state senators” who started an audit of 2020 election results in Maricopa County are “exposing this fraud.”], 'number_of_claims': 4}
~~~
Joe Biden
{'claims': [“In my first four months in office, more than two million jobs have been created. That’s more than double the rate of my predecessor, and more than eight times the rate of President Reagan.”], 'number_of_claims': 1}

My question is: it seems there must be a much easier way to get what I'm after without doing the whole defaultdict dance we're doing in lines 93-97. I suspect there's a pretty easy way to do what I'm doing there with a more clever SQL query, but it's eluding me right now. Any ideas? Of course, any other comments on the code are very much appreciated as well!

main.py:

import datetime
import os
from collections import defaultdict

import requests
from sqlalchemy import create_engine, func
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import NullPool

from models import Claim, Claimant, claims, claimants


def create_db_session():
    engine = create_engine(
        f"postgresql+psycopg2://postgres:"
        + f"{os.getenv('POSTGRES_PASSWORD')}@127.0.0.1"
        + f":5432/gfc",
        poolclass=NullPool,
    )
    Session = sessionmaker(bind=engine)
    session = Session()
    return session


def get_api_key():
    API_KEY = os.getenv("API_KEY")
    if API_KEY is None:
        raise EnvironmentError(
            "Need API_KEY to be set to valid Google Fact Check Tools API key.\n"
        )
    return API_KEY


def process_claim(claim, session):
    claimant = Claimant(name=claim["claimant"])
    session.add(claimant)
    try:
        session.commit()
    except IntegrityError:
        print(f'Claimant "{claimant.name}" is already in the database.')
        session.rollback()
        claimant = session.query(Claimant).where(Claimant.name == claimant.name).first()
    claim = Claim(
        text=claim["text"],
        date=datetime.datetime.strptime(claim["claimDate"][:-10], '%Y-%m-%d').date(),
        claimant=claimant
    )
    session.add(claim)
    try:
        session.commit()
    except IntegrityError:
        print(f'Claim "{claim.text}" is already in the database.')
        session.rollback()
    return claim


def search_query(API_KEY):
    url = (
        f"https://content-factchecktools.googleapis.com/v1alpha1/claims:search?"
        f"pageSize=10&"
        f"query=biden&"
        f"maxAgeDays=30&"
        f"offset=0&languageCode=en-US&key={API_KEY}"
    )
    r = requests.get(url, headers={"x-referer": "https://explorer.apis.google.com"})
    data = r.json()
    next_page_token = data.get("nextPageToken")
    claims = data.pop("claims")
    return claims


def main():
    API_KEY = get_api_key()
    claims = search_query(API_KEY=API_KEY)
    session = create_db_session()
    for claim in claims:
        process_claim(claim, session)


def source_of_claims(session):
    results = session.query(
        claims._columns['claimant_id'],  # id of claimant
        func.count(claims._columns['claimant_id'])  # number of claims claimant is responsible for
    ).group_by(
        claims._columns['claimant_id']
    ).all()  # returns tuple of claimant_id, # of instances
    parsed_results = defaultdict(list)
    for result in results:
        claimant = session.query(Claimant).get(ident=result[0])
        claims_ = session.query(Claim).where(Claim.claimant_id == claimant.id).all()
        parsed_results[claimant] = dict(claims=claims_, number_of_claims=len(claims_))
    for key, value in parsed_results.items():
        print('~~~')
        print(key)
        print(value)
    #  TODO: build claim result stuff
    return parsed_results


if __name__ == "__main__":
    session = create_db_session()
    source_of_claims(session=session)

models.py:

from sqlalchemy import Column, Date, ForeignKey, Integer, MetaData, Table, Text
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()
metadata = MetaData()


class Claim(Base):
    __tablename__ = "claims"

    id = Column(Integer, primary_key=True)
    text = Column(Text, unique=True)
    date = Column(Date)
    claimant_id = Column(Integer, ForeignKey("claimants.id"))
    claimant = relationship("Claimant")

    def __str__(self):
        return f"{self.text}"

    def __repr__(self):
        return f"{self.text}"


class Claimant(Base):
    __tablename__ = "claimants"

    id = Column(Integer, primary_key=True)
    name = Column(Text, unique=True)

    def __str__(self):
        return f"{self.name}"


claims = Table(
    "claims", metadata,
    Column('id', Integer, primary_key=True),
    Column('text', Text, unique=True),
    Column('date', Date),
    Column('claimant_id', Integer, ForeignKey('claimants.id')),
)

claimants = Table(
    "claimants", metadata,
    Column('id', Integer, primary_key=True),
    Column('name', Text, unique=True)
)
asked Jun 25, 2021 at 17:02 by n1c9
\$\endgroup\$
  • Why use a database at all, if you're just doing counts? Can you count things streamed from the API? (Commented Jun 25, 2021 at 20:56)
  • I could, and that would be sufficient for now, but I'd like to not have to constantly hit the API to do these kinds of queries. (Commented Jun 25, 2021 at 21:38)

1 Answer

\$\begingroup\$
  • Could benefit from PEP 484 type hints, e.g. process_claim(claim: Dict[str, Any], session: Session) -> Claim:
  • Is strptime(claim["claimDate"][:-10], '%Y-%m-%d') correct? The slice [:-10] takes the whole string from the beginning up to ten characters before the end. Wildly guessing, you want either the first ten characters or the last ten characters; this gives you neither, unless the string happens to be exactly twenty characters long.
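To illustrate both points, here is a minimal sketch of a type-hinted date parser. The helper name parse_claim_date is hypothetical, and the timestamp shapes in the comments are assumptions about what the API returns, not something confirmed by the question.

```python
import datetime


def parse_claim_date(raw: str) -> datetime.date:
    """Parse the leading YYYY-MM-DD part of a timestamp string.

    Hypothetical helper: it assumes the value starts with an ISO-style
    date and ignores whatever time or zone suffix follows it.
    """
    # Slicing from the front keeps exactly the ten date characters,
    # no matter how long the rest of the string is.
    return datetime.datetime.strptime(raw[:10], "%Y-%m-%d").date()


# Assumed sample values, not taken from the API documentation.
short_stamp = "2021-06-01T00:00:00Z"      # 20 characters
long_stamp = "2021-06-01T00:00:00.000Z"   # 24 characters

print(parse_claim_date(short_stamp))
print(parse_claim_date(long_stamp))
```

With raw[:-10] the 20-character value would still parse, but the 24-character one would leave "2021-06-01T00:" and strptime would raise a ValueError, which is why slicing from the front is the safer choice.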

This:

url = (
    f"https://content-factchecktools.googleapis.com/v1alpha1/claims:search?"
    f"pageSize=10&"
    f"query=biden&"
    f"maxAgeDays=30&"
    f"offset=0&languageCode=en-US&key={API_KEY}"
)
r = requests.get(url, headers={"x-referer": "https://explorer.apis.google.com"})

is better represented as

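from requests import get

# Let requests build and URL-encode the query string from a dict.
with get(
    "https://content-factchecktools.googleapis.com/v1alpha1/claims:search",
    params={
        'pageSize': 10,
        'query': 'biden',
        'maxAgeDays': 30,
        'offset': 0,
        'languageCode': 'en-US',
        'key': API_KEY,
    },
    headers={
        "X-Referer": "https://explorer.apis.google.com",
    },
) as r:
    r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    data = r.json()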
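As for the original question about replacing the defaultdict loop with a single query, one possible approach is to join the two tables and let the database do the grouping. The sketch below is not the project's code: it uses the standard-library sqlite3 module with an in-memory database as a stand-in for Postgres, the sample rows are invented, and the unit-separator trick assumes that character never appears in claim text. On PostgreSQL, array_agg(claims.text) could take the place of GROUP_CONCAT and would avoid the separator altogether.

```python
import sqlite3

SEP = "\x1f"  # unit separator; assumed never to occur inside a claim's text

# In-memory stand-in for the real database; the schema mirrors models.py.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE claimants (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE claims (
        id INTEGER PRIMARY KEY,
        text TEXT UNIQUE,
        date TEXT,
        claimant_id INTEGER REFERENCES claimants(id)
    );
    -- Invented sample rows, for illustration only.
    INSERT INTO claimants (id, name) VALUES (1, 'Claimant A'), (2, 'Claimant B');
    INSERT INTO claims (text, claimant_id) VALUES
        ('first claim', 1), ('second claim', 1), ('third claim', 2);
""")


def source_of_claims(conn):
    """Return {claimant name: {'claims': [...], 'number_of_claims': n}}."""
    rows = conn.execute(
        """
        SELECT claimants.name,
               COUNT(claims.id),
               GROUP_CONCAT(claims.text, ?)
        FROM claimants
        JOIN claims ON claims.claimant_id = claimants.id
        GROUP BY claimants.id
        ORDER BY claimants.id
        """,
        (SEP,),
    ).fetchall()
    # One row per claimant: split the concatenated texts back into a list.
    return {
        name: {"claims": texts.split(SEP), "number_of_claims": count}
        for name, count, texts in rows
    }


for name, summary in source_of_claims(conn).items():
    print(name, summary)
```

This issues one query instead of one per claimant. The order of the texts inside each group is not guaranteed, so sort them in Python if the order matters.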
answered Jun 26, 2021 at 2:14 by Reinderien
\$\endgroup\$
