View the previous post at https://friendlyuser.github.io/posts/stonks/scrap_from_sec_with_python
This code fetches 13F filings for a list of companies from the Securities and Exchange Commission's (SEC) Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system. 13F filings are quarterly reports in which institutional investment managers with at least $100 million in assets under management disclose their holdings. The filings function from the secedgar library retrieves the filings, and get_company_of_interest is passed to it as the entry_filter, so only entries whose form type contains "13F" and whose company name appears in company_list are kept. The comparison is case-insensitive, and the list is read from holdings_list.txt, one company name per line. The fetch_filings function then calls the save method of the combo_filings object to write the retrieved filings to the temp folder. The get_year_quarter_from_path function takes the first two numbers found in a file path and returns them as a (year, quarter) tuple of strings.

    from secedgar import filings, FilingType
    from datetime import date
    import glob
    import matplotlib.pyplot as plt
    import seaborn as sns
    import re
    import pandas as pd
    import json
    from typing import List
    from parser import DocParser

    # read company list from holdings_list.txt
    with open("holdings_list.txt", "r") as f:
        holdings_list = f.read().splitlines()

    company_list = holdings_list


    def get_company_of_interest(filing_entry):
        # keep 13F entries whose company name is in company_list (case-insensitive)
        if "13F" in filing_entry.form_type and filing_entry.company_name.lower() in (
            name.lower() for name in company_list
        ):
            return True
        return False


    def fetch_filings():
        # 13F filings for the companies listed in holdings_list.txt
        combo_filings = filings(
            start_date=date(2022, 1, 1),
            end_date=date(2022, 12, 20),
            # filing_type=FilingType.FILING_13F,
            user_agent="Your name <dlcoding20@gmail.com>",
            entry_filter=get_company_of_interest,
            rate_limit=5,
        )
        # map folder to year and quarter
        combo_filings.save("temp")


    def get_year_quarter_from_path(path: str):
        """
        Get year and quarter from path
        """
        # get all numbers from path using regex
        year, quarter = re.findall(r'\d+', path)[0:2]
        return year, quarter

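The filter and the path helper can be tried out without calling EDGAR. The following is only a sketch under a few assumptions: `make_entry_filter` is a hypothetical variant of get_company_of_interest that takes the names as an argument, `SimpleNamespace` objects stand in for secedgar's filing entries (assumed to expose `form_type` and `company_name`, as the filter above uses), the company names are made up, and the sample path assumes a `temp/<year>/QTR<n>/<cik>/` layout.

```python
import re
from types import SimpleNamespace


def make_entry_filter(company_names):
    """Build a filter like get_company_of_interest for a fixed list of names."""
    wanted = {name.lower() for name in company_names}

    def entry_filter(filing_entry):
        # Keep only 13F-family forms filed by one of the wanted companies.
        return (
            "13F" in filing_entry.form_type
            and filing_entry.company_name.lower() in wanted
        )

    return entry_filter


def get_year_quarter_from_path(path: str):
    """Return the first two digit runs in the path as (year, quarter)."""
    year, quarter = re.findall(r"\d+", path)[0:2]
    return year, quarter


# Stand-ins for index entries; names and form types are illustrative only.
entries = [
    SimpleNamespace(form_type="13F-HR", company_name="Example Capital LP"),
    SimpleNamespace(form_type="10-K", company_name="Example Capital LP"),
    SimpleNamespace(form_type="13F-HR", company_name="Other Fund LLC"),
]
keep = make_entry_filter(["EXAMPLE CAPITAL LP"])
kept = [entry.form_type for entry in entries if keep(entry)]
print(kept)  # ['13F-HR']

print(get_year_quarter_from_path("temp/2022/QTR3/1649339/filing.txt"))  # ('2022', '3')
```

Because the helper simply takes the first two numbers in the path, a base folder whose name contains digits would shift the result, so a digit-free folder name such as temp is the safer choice.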
This code parses the downloaded 13F documents and writes the results to a markdown file. The parse_filings function takes a dictionary (data) and reads the filer's CIK number from its cik field. A CIK is a unique identifier that the SEC assigns to entities that file with it. If the cik field is missing, it defaults to "1649339". The output filename comes from the filename field, defaulting to "burry". The function then uses glob to iterate over every text file under temp/2022/QTR*/ that sits in a folder named after the CIK. Each file is parsed with the DocParser class, and only documents of type "INFORMATION_TABLE" are kept. For each kept document, the year and quarter extracted from the file path are added as columns to its data frame, and the document is appended to the final_docs list. Once final_docs holds four documents (one per quarter), the loop stops. Finally, the function sorts final_docs by quarter and calls output_to_md to write the data to a markdown file. The metadata dictionary holds the details about the filing set (output filename, label, category, dates, and CIK) that are written into the output file.

    def parse_filings(data: dict = {}):
        """
        Parse filings
        """
        cik = data.get("cik", "1649339")
        output_name = data.get("filename", "burry")
        # find files under temp/2022/QTR{1,2,3,4}/*.txt with glob
        final_docs = []
        for filename in glob.iglob(f'temp/2022/QTR*/{cik}/*.txt', recursive=True):
            documents = DocParser(filename, "13F").parse()
            # filter for INFORMATION_TABLE
            for document in documents:
                if document["type"] == "INFORMATION_TABLE":
                    document["filename"] = filename
                    # add year and quarter to document
                    year, quarter = get_year_quarter_from_path(filename)
                    temp_df = document["df"]
                    # add year quarter to df
                    temp_df["year"] = year
                    temp_df["quarter"] = quarter
                    final_docs.append(document)
            if len(final_docs) == 4:
                break
        # sort by quarter
        final_docs = sorted(final_docs, key=lambda x: x["df"]["quarter"].iloc[0])
        metadata = {
            "filename": f"{output_name}.md",
            "company_name": data.get("outputLabel", "Burry"),
            "category": "13F",
            "date": "2022-12-21",
            "start_date": "2022-01-01",
            "end_date": "2022-12-20",
            "cik": cik,
        }
        # eventually parse all this from a metadata yaml file or json file
        output_to_md(final_docs, metadata)

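The glob pattern in parse_filings relies on the filings being saved as `temp/<year>/QTR<n>/<cik>/*.txt`. The sketch below recreates that assumed layout with empty placeholder files in a temporary directory, to show which files the pattern picks up and how the quarter falls out of the path. The file names and the second CIK are made up for illustration.

```python
import glob
import os
import re
import tempfile

cik = "1649339"  # the default CIK used in parse_filings

with tempfile.TemporaryDirectory() as root:
    # Recreate the assumed temp/<year>/QTR<n>/<cik>/ layout with empty files.
    for rel in [
        f"temp/2022/QTR2/{cik}/a.txt",
        f"temp/2022/QTR1/{cik}/b.txt",
        "temp/2022/QTR1/0000000000/c.txt",  # a different filer, should be skipped
    ]:
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()

    pattern = os.path.join(root, "temp", "2022", "QTR*", cik, "*.txt")
    matches = glob.glob(pattern)

    # Read the quarter from the path relative to root, so digits in the
    # temporary directory name cannot interfere with the regex.
    quarters = sorted(
        re.findall(r"\d+", os.path.relpath(match, root))[1] for match in matches
    )

print(len(matches))  # 2
print(quarters)      # ['1', '2']
```

glob does not guarantee any particular order for its matches, which is why parse_filings sorts final_docs by quarter afterwards.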
This code processes the parsed 13F documents and writes the output. The output_to_md function takes a list of parsed documents (final_docs), each of which carries its holdings in a "df" data frame, and a dictionary of metadata (metadata). It first concatenates all the data frames into a single data frame called combined_df, writes that to a CSV file, and groups it by the cusip column, storing the result in grouped_df. It then takes the last two documents in final_docs as the previous and current quarter, concatenates their data frames into last_two_quarters, and keeps only the quarter, nameOfIssuer, and value columns. The value column is converted to numeric values, and the data frame is pivoted by nameOfIssuer and quarter into pivot_df, which is plotted as a bar chart and saved as a PNG file. The two quarters are also merged with pd.merge (with indicator=True) into last_two_quarter_diff.
The function then opens the file named in the filename field of the metadata dictionary and writes the title, date, and category metadata lines, followed by a "Quarterly holdings" section for the chart that compares the last two quarters. It then writes the combined holdings from all quarters as one markdown table, followed by one table per cusip. In the code shown here, last_two_quarter_diff is computed but not written to the file.

    def output_to_md(final_docs: List[dict], metadata: dict):
        """
        Output dataframe to markdown file
        """
        combined_df = None
        # iterate across final_docs and append them all
        for doc in final_docs:
            if combined_df is None:
                combined_df = doc["df"]
            else:
                # pandas.concat
                combined_df = pd.concat([combined_df, doc["df"]])
        # combine df then group by cusip
        grouped_df = combined_df.groupby("cusip")
        # get 2nd last element in list
        last_quarter = final_docs[-2]
        current_quarter = final_docs[-1]
        filename = metadata["filename"]
        csv_filename = filename.replace(".md", ".csv")
        # write to csv
        combined_df.to_csv(csv_filename, index=False)
        merge_df = pd.merge(last_quarter["df"], current_quarter["df"], how="outer")
        # quarterly_df = combined_df.groupby("quarter")
        # append last two quarters
        last_two_quarters = pd.concat([last_quarter["df"], current_quarter["df"]])
        # copy the column subset so the assignment below does not write to a view
        last_two_quarters = last_two_quarters[["quarter", "nameOfIssuer", "value"]].copy()
        # plot ax grouped by quarter
        # make value numeric
        last_two_quarters["value"] = pd.to_numeric(last_two_quarters["value"])
        png_filename = filename.replace(".md", ".png")
        # make pivot table
        pivot_df = last_two_quarters.pivot(index="nameOfIssuer", columns="quarter", values="value")
        # plot pivot table
        # first color blue, second color green
        ax = pivot_df.plot(kind="bar", color=["blue", "green"])
        # make image big enough for labels
        ax.figure.savefig(png_filename, bbox_inches="tight")
        # quarter diff combined
        last_two_quarter_diff = pd.merge(last_quarter["df"], current_quarter["df"], how="outer", indicator=True)
        with open(filename, "w") as f:
            f.write(f"Title: {metadata['company_name']} - {metadata['cik']}\n")
            f.write(f"Date: {metadata['date']}\n")
            f.write(f"Category: {metadata['category']}\n")
            # merge df
            f.write("## Quarterly holdings\n\n")
            # write png_filename
            f.write(f"\n\n")
            f.write("uses last two quarters to compare\n\n")
            f.write("## All tables\n\n")
            # write all tables
            reduced_df = combined_df[["nameOfIssuer", "value", "sshPrnamt", "sshPrnamtType", "year", "quarter"]].copy()
            # if combined_df has putCall column
            if "putCall" in combined_df.columns:
                reduced_df["putCall"] = combined_df["putCall"]
            f.write(reduced_df.to_markdown(index=False))
            f.write("\n\n")
            # write header
            f.write("## Tables by cusip\n\n")
            for cusip, group in grouped_df:
                f.write("\n\n")
                f.write(f"### {cusip}\n\n")
                simple_group = group[["nameOfIssuer", "value", "sshPrnamt", "sshPrnamtType", "year", "quarter"]].copy()
                if "putCall" in group.columns:
                    simple_group["putCall"] = group["putCall"]
                f.write(simple_group.to_markdown(index=False))
                f.write("\n\n")
            f.write("\n\n")

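To make the pivot and the quarter-over-quarter merge easier to follow, here is a small sketch on toy data. The issuers, CUSIPs, and values are made up. One assumption differs from the function above: the merge here joins on cusip and nameOfIssuer only. As far as I can tell, a pd.merge with no `on` argument joins on every shared column, including quarter, so rows from two different quarters would never match and every row would come out as left_only or right_only.

```python
import pandas as pd

# Toy holdings with made-up issuers, CUSIPs, and values.
last_quarter = pd.DataFrame({
    "cusip": ["000000001", "000000002"],
    "nameOfIssuer": ["ALPHA CORP", "BETA INC"],
    "value": ["100", "200"],
    "quarter": ["2", "2"],
})
current_quarter = pd.DataFrame({
    "cusip": ["000000002", "000000003"],
    "nameOfIssuer": ["BETA INC", "GAMMA LTD"],
    "value": ["250", "50"],
    "quarter": ["3", "3"],
})

# Values arrive as strings, so convert them before plotting or comparing.
for df in (last_quarter, current_quarter):
    df["value"] = pd.to_numeric(df["value"])

# One row per issuer, one column per quarter.
both = pd.concat([last_quarter, current_quarter])
pivot_df = both.pivot(index="nameOfIssuer", columns="quarter", values="value")

# Join on the security itself so positions line up across quarters.
keys = ["cusip", "nameOfIssuer"]
diff = pd.merge(
    last_quarter[keys + ["value"]],
    current_quarter[keys + ["value"]],
    on=keys,
    how="outer",
    suffixes=("_prev", "_curr"),
    indicator=True,
)
status = dict(zip(diff["nameOfIssuer"], diff["_merge"].astype(str)))

print(pivot_df)
print(status)
```

In the result, the `_merge` column marks a position as left_only when it was closed, right_only when it is new, and both when it was held in both quarters, with value_prev and value_curr side by side for comparison.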
To view the sample output, please go to the following link:
13F.grandfleet.eu.org
The site is generated by Pelican, a static site generator written in Python. The source code for the site is available at the following link:
https://github.com/FriendlyUser/13F-sec
Pelican generates static HTML files for a website from templates and content written in a markup language such as Markdown or reStructuredText. It is designed to be flexible, allowing users to customize the look and feel of their website and to extend its functionality through plugins. It also integrates with popular tools such as Jinja2 templates and the Pygments syntax highlighter. Pelican is a popular choice for bloggers, documentation authors, and other users who want a simple, fast-loading website without a dynamic backend.
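Pelican reads article metadata such as Title, Date, and Category from the first lines of a Markdown file, which is why output_to_md starts the file with those three lines. As a rough sketch, `pelican_header` below is a hypothetical helper (not part of the original script) that builds the same header from the metadata dictionary used earlier:

```python
def pelican_header(metadata: dict) -> str:
    """Build the metadata lines that open a Pelican Markdown article."""
    lines = [
        f"Title: {metadata['company_name']} - {metadata['cik']}",
        f"Date: {metadata['date']}",
        f"Category: {metadata['category']}",
    ]
    # A blank line separates the metadata from the article body.
    return "\n".join(lines) + "\n\n"


# Values taken from the defaults in parse_filings.
metadata = {
    "company_name": "Burry",
    "cik": "1649339",
    "date": "2022-12-21",
    "category": "13F",
}
header = pelican_header(metadata)
print(header)
```

Keeping the header in one place like this would make it easier to add further Pelican fields later, such as tags or a slug.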