Matt Layman

Posted on • Originally published at mattlayman.com

Publish to DEV automatically with GitHub Actions

DEV is a great community for developer content. If you have articles that you don't want to go live all at once, how can you publish them on a schedule automatically? In this article, let's use GitHub Actions to get your content online on your timeline.

GitHub Actions is a way to run code on GitHub's servers. The service is effectively a Continuous Integration (CI) service from GitHub. To use GitHub Actions, we create a workflow in a Git repository. That workflow contains all the steps to run the code we care about.

To publish to DEV, we can:

  1. Use a workflow...
  2. To read a publishing schedule we create...
  3. And check the current time...
  4. And publish an article to DEV if the publish date is in the past.

Before we get to the workflow, let's look at the code that will run.

Creating a schedule

The first step in this whole process is to create a publishing schedule that our script can read. Minimally, that requires two types of data from a list of articles:

  1. The ID of the article
  2. The datetime to publish at

So, where do we get those things?

The first datum comes from your actual unpublished articles. From my observation, the DEV website does not expose this ID in an obvious fashion from the UI, but it's not hard to find.

From your unpublished article page, view the source of the page. In both Firefox and Chrome, you can get to this by selecting "View Page Source" when you right click somewhere on the page.

View Page Source on Firefox

On the source page, search for article-id with Ctrl+F (or Cmd+F on macOS). You should find a number value like 168727.

Now we have what we need to build a schedule. We'll use the JSON format for the schedule because Python can read that format easily. You can name the file whatever you want, but I called mine devto-schedule.json and stored it at the root of my repository.

We also need a datetime when we want to publish the article. I used the ISO-8601 standard to make dates that we can compare to the current time. To continue our example, if I want to publish the article on October 22, 2019 at 2pm Eastern Time in the US, then the schedule looks like:

```json
[
  {"id": 168727, "publish_date": "2019-10-22T18:00:00+00:00", "title": "Ep 11"},
  {"id": 167526, "publish_date": "2019-10-15T18:00:00+00:00", "title": "Ep 10"}
]
```

We've got the ID as id and the datetime as publish_date. The publish_date uses UTC to store the time. Timezones are hard, so I would always recommend storing your datetimes in UTC.
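If you want to double-check a conversion, Python's standard library can produce exactly this string. Here's a quick sketch (the UTC-4 offset for US Eastern daylight time is hard-coded for illustration):

```python
from datetime import datetime, timezone, timedelta

# US Eastern is UTC-4 during daylight saving time (hard-coded for this sketch).
eastern = timezone(timedelta(hours=-4))
local = datetime(2019, 10, 22, 14, 0, tzinfo=eastern)  # 2pm Eastern

# Convert to UTC and render the ISO-8601 string the schedule uses.
utc_string = local.astimezone(timezone.utc).isoformat()
print(utc_string)  # 2019-10-22T18:00:00+00:00
```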

I also included a title so I know what article the object refers to in the future. The title isn't necessary, but it's useful for tracking the articles.

Notice that this is a JSON list. I included an extra article so you can see what this would look like if the script checks multiple articles (which is kind of the point).

Publishing to DEV

With the schedule in hand, we're ready to look at the code that will do all the work. I'll start by showing the whole script, and we'll break it down into each section.

First, the main file, publish.py:

```python
"""Publish scheduled articles to DEV."""
import datetime
import json
import logging
import sys

from dateutil.parser import parse

from devto import DEVGateway

logger = logging.getLogger(__name__)


def publish_scheduled_articles():
    dev_gateway = DEVGateway()
    published_articles = dev_gateway.get_published_articles()
    published_article_ids = set([article["id"] for article in published_articles])
    scheduled_articles = get_scheduled_articles()
    for scheduled_article in scheduled_articles:
        if should_publish(scheduled_article, published_article_ids):
            article_id = scheduled_article["id"]
            logger.info(f"Publishing article with id: {article_id}")
            dev_gateway.publish(article_id)


def get_scheduled_articles():
    """Get the schedule."""
    logger.info("Read schedule.")
    with open("devto-schedule.json", "r") as f:
        return json.load(f)


def should_publish(article, published_article_ids):
    """Check if the article should be published.

    This depends on if the date is in the past
    and if the article is already published.
    """
    publish_date = parse(article["publish_date"])
    now = datetime.datetime.now(datetime.timezone.utc)
    return publish_date < now and article["id"] not in published_article_ids


if __name__ == "__main__":
    logging.basicConfig(
        format="%(levelname)s: %(message)s", level=logging.INFO, stream=sys.stdout
    )
    publish_scheduled_articles()
```

And then the supporting DEV file, devto.py:

```python
import os

import requests


class DEVGateway:
    url = "https://dev.to/api"

    def __init__(self):
        self.session = requests.Session()
        self.session.headers = {"api-key": os.environ["DEV_API_KEY"]}

    def get_published_articles(self):
        """Get all the published articles."""
        response = self.session.get(
            f"{self.url}/articles/me/published", params={"per_page": 1000}
        )
        response.raise_for_status()
        return response.json()

    def publish(self, article_id):
        """Publish the article."""
        data = {"article": {"published": True}}
        response = self.session.put(f"{self.url}/articles/{article_id}", json=data)
        response.raise_for_status()
```

The top down view

Let's start our examination with the entry point at the bottom of the publish.py file.

```python
if __name__ == "__main__":
    logging.basicConfig(
        format="%(levelname)s: %(message)s", level=logging.INFO, stream=sys.stdout
    )
    publish_scheduled_articles()
```

This code follows Python's popular pattern of checking the __name__ attribute to see if it is __main__. This will be true when the code runs with python3 publish.py.
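If the pattern is new to you, here's a tiny self-contained illustration (the main function is hypothetical, not part of the article's script): the guarded code runs when the file is executed directly, but not when the file is imported as a module.

```python
def main():
    return "ran as a script"

# This block runs under `python3 thisfile.py`, but not under `import thisfile`,
# because an imported module's __name__ is the module name, not "__main__".
if __name__ == "__main__":
    print(main())
```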

Since the script is going to run in Continuous Integration, we configure the logging to output to stdout. Logging to stdout is necessary because we don't have easy access to files generated from GitHub Actions, including log files.

The next call is to publish_scheduled_articles. I could have called this main, but I chose to give it a descriptive name. We'll continue working top down to see what publish_scheduled_articles does.

```python
def publish_scheduled_articles():
    dev_gateway = DEVGateway()
    published_articles = dev_gateway.get_published_articles()
    published_article_ids = set([article["id"] for article in published_articles])
    scheduled_articles = get_scheduled_articles()
    for scheduled_article in scheduled_articles:
        if should_publish(scheduled_article, published_article_ids):
            article_id = scheduled_article["id"]
            logger.info(f"Publishing article with id: {article_id}")
            dev_gateway.publish(article_id)
```

The overall flow of this function can be described as:

  1. Get what is already published to DEV.
  2. Get what is scheduled for publishing.
  3. Publish a scheduled article if it should be published.

To see what is published on DEV, we need to dig into the next layer and see how we communicate with DEV.

A gateway to DEV

DEV has a JSON API that developers can interact with. To keep a clean abstraction layer, I used the Gateway Pattern to hide the details of how that API communication happens. By doing that, the script interacts with a high level interface (like get_published_articles) rather than fiddling with the requests module directly.

```python
class DEVGateway:
    url = "https://dev.to/api"

    def __init__(self):
        self.session = requests.Session()
        self.session.headers = {"api-key": os.environ["DEV_API_KEY"]}

    def get_published_articles(self):
        """Get all the published articles."""
        response = self.session.get(
            f"{self.url}/articles/me/published", params={"per_page": 1000}
        )
        response.raise_for_status()
        return response.json()
```

When we create a gateway instance, we start a requests.Session that is configured with the proper API key. We'll talk about how GitHub gets access to that API key later, but you'll need to create a key on the DEV website. You can find this under your Settings/Account section.

Generating an API key

With the key available, we can ask DEV about our own stuff. I used the /articles/me/published API to get the list of published articles. The published articles are what keep the script idempotent (which means that we can safely run it repeatedly without suffering from weird side effects).

Since the gateway is small, let's look at its other method before moving on.

```python
def publish(self, article_id):
    """Publish the article."""
    data = {"article": {"published": True}}
    response = self.session.put(f"{self.url}/articles/{article_id}", json=data)
    response.raise_for_status()
```

When the script confirms that an article is ready to publish, we change the state of the article by setting the published attribute to True. We do this state modification with an HTTP PUT.

The other bit of code worth discussing is response.raise_for_status(). Truthfully, I was a bit lazy when writing this part of the script. raise_for_status will raise an exception whenever the response comes back with an error status code (4xx or 5xx). I put it in as a safeguard in case something about the API changes in the future. The part that's missing (and the laziness that I mentioned) is the handling of those exceptions.

When a request failure occurs, this script is going to fail hard and dump out a traceback. From my point of view, that's fine. Since this will be part of a CI job, the only person seeing the traceback will be me, and I will want whatever data the traceback reports. This is one of those scenarios where a developer tool can have a sharper edge than production-caliber code and be perfectly reasonable.
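To make the "fail hard" control flow concrete, here's a minimal stand-in for what raise_for_status does. The FakeResponse class is hypothetical, and RuntimeError stands in for requests' own HTTPError:

```python
class FakeResponse:
    """A stand-in for requests.Response, just to show the control flow."""

    def __init__(self, status_code):
        self.status_code = status_code

    def raise_for_status(self):
        # Mirrors requests' behavior: error status codes become exceptions.
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP {self.status_code}")

FakeResponse(200).raise_for_status()  # success: no exception raised
try:
    FakeResponse(422).raise_for_status()
except RuntimeError as exc:
    print(exc)  # HTTP 422
```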

Checking the schedule

Now we've seen how the gateway works. We can move our attention to the schedule.

```python
def get_scheduled_articles():
    """Get the schedule."""
    logger.info("Read schedule.")
    with open("devto-schedule.json", "r") as f:
        return json.load(f)
```

This might be the most vanilla code in the entire script. This code reads the JSON file which holds our schedule of articles. The file handle is passed to json.load to create the list of scheduled articles.

```python
for scheduled_article in scheduled_articles:
    if should_publish(scheduled_article, published_article_ids):
        article_id = scheduled_article["id"]
        logger.info(f"Publishing article with id: {article_id}")
        dev_gateway.publish(article_id)
```

Here is the heart of the script. We:

  1. Loop through the schedule.
  2. Check if we should publish an article.
  3. Publish the article if we should.

Why did I pull the list of published articles earlier? Now we get our answer when examining should_publish, which needs the published articles.

```python
def should_publish(article, published_article_ids):
    """Check if the article should be published.

    This depends on if the date is in the past
    and if the article is already published.
    """
    publish_date = parse(article["publish_date"])
    now = datetime.datetime.now(datetime.timezone.utc)
    return publish_date < now and article["id"] not in published_article_ids
```

There are actually two factors to consider when checking a scheduled article.

  • Is the article's publication date in the past?
  • Has the script already published the article?

That second question is less obvious until you see that this CI job is running regularly. Because we'll configure the script to run every hour, we want to ensure that we don't try to publish an article multiple times. should_publish guards against multiple publishing by checking if the article's id is in the set of published IDs.
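You can exercise this logic on its own with the standard library. This sketch swaps dateutil's parse for datetime.fromisoformat (available since Python 3.7, and sufficient for the +00:00 offset format used in the schedule); the article entries here are made-up test data:

```python
from datetime import datetime, timezone

def should_publish(article, published_article_ids):
    # Stdlib-only variant of the article's check: due date passed,
    # and the article is not already in the published set.
    publish_date = datetime.fromisoformat(article["publish_date"])
    now = datetime.now(timezone.utc)
    return publish_date < now and article["id"] not in published_article_ids

past = {"id": 1, "publish_date": "2019-10-22T18:00:00+00:00"}
future = {"id": 2, "publish_date": "2999-01-01T00:00:00+00:00"}

print(should_publish(past, set()))    # True: due, and not yet published
print(should_publish(past, {1}))      # False: already published
print(should_publish(future, set()))  # False: still in the future
```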

Now you've seen from top to bottom how this script can publish articles to DEV. Our next step is to hook the script into GitHub Actions.

Running on GitHub Actions

GitHub Actions works by defining workflows in a .github/workflows directory in your repository. The workflow file is a YAML file with the commands to do the work.

We're calling our workflow .github/workflows/devto-publisher.yaml, but you can name it whatever you want as long as the extension ends with .yaml or .yml.

As before, let's look at the entire workflow file and break down the pieces to get an understanding of what's happening.

```yaml
name: DEV Publisher

on:
  schedule:
    - cron: '0 * * * *'

jobs:
  publish:
    name: Publish to DEV
    runs-on: ubuntu-latest
    steps:
      - name: Get the code
        uses: actions/checkout@v1
        with:
          fetch-depth: 1
      - name: Set up Python
        uses: actions/setup-python@v1
        with:
          python-version: '3.x'
          architecture: 'x64'
      - name: Install packages
        run: pip install -r requirements/pub.txt
      - name: Run publisher
        run: python bin/publish.py
        env:
          DEV_API_KEY: ${{ secrets.DEV_API_KEY }}
```

You can optionally give your workflow a name. I like adding a name because it's friendlier to read in the logs than the filename. If filenames suit you better, skip it!

```yaml
on:
  schedule:
    - cron: '0 * * * *'
```

The on section is critical and defines when your action will occur. There are tons of different events you can use. For this workflow, we need the schedule trigger. All the gory details about it are in the documentation.

If you're familiar with cron, this syntax won't shock you. Non-cron users might wonder what the heck is going on. The docs explain all the syntax, but this particular string of 0 * * * * means "run on the zeroth minute of every hour of every day." In other words, "run at the start of every hour."
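For reference, the five cron fields read right to left like this:

```text
0 * * * *
│ │ │ │ └─ day of week (any)
│ │ │ └─── month (any)
│ │ └───── day of month (any)
│ └─────── hour (every hour)
└───────── minute (0, i.e., the top of the hour)
```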

```yaml
jobs:
  publish:
    name: Publish to DEV
    runs-on: ubuntu-latest
```

All workflows need at least one job. Without a job, there is nothing for the action to do. I've named this job publish, given it a friendly name, and defined what operating system to run on (namely, Ubuntu).

```yaml
steps:
  - name: Get the code
    uses: actions/checkout@v1
    with:
      fetch-depth: 1
  - name: Set up Python
    uses: actions/setup-python@v1
    with:
      python-version: '3.x'
      architecture: 'x64'
  - name: Install packages
    run: pip install -r requirements/pub.txt
  - name: Run publisher
    run: python bin/publish.py
    env:
      DEV_API_KEY: ${{ secrets.DEV_API_KEY }}
```

The job runs a series of steps. Again, I tried to name each step so that we can get a readable flow of what is happening. If I extract the names, we get:

  1. Get the code
  2. Set up Python
  3. Install packages
  4. Run publisher

The first two steps in the process pull from pre-built actions developed by GitHub. GitHub gives us these actions to save time from figuring these things out ourselves. For instance, without the actions/checkout@v1 action, how would we get a local clone? It would be really challenging.

The third step starts to use configuration specific to my job. In order for the script to run, it needs dependencies installed so we can import requests and dateutil. I defined a requirements/pub.txt file that includes all the needed dependencies.

Here's what the requirements file looks like:

```text
#
# This file is autogenerated by pip-compile
# To update, run:
#
#    pip-compile requirements/pub.in
#
certifi==2019.9.11        # via requests
chardet==3.0.4            # via requests
idna==2.8                 # via requests
python-dateutil==2.8.0
requests==2.22.0
six==1.12.0               # via python-dateutil
urllib3==1.25.3           # via requests
```

Finally, we get to the step that runs the publisher.

```yaml
- name: Run publisher
  run: python bin/publish.py
  env:
    DEV_API_KEY: ${{ secrets.DEV_API_KEY }}
```

This is pretty vanilla except for the DEV_API_KEY line. Look back at the DEVGateway.__init__ method. The method pulls from os.environ to get the API key. In our YAML file, we're wiring the DEV_API_KEY environment variable to secrets.DEV_API_KEY. We need to tell GitHub about our API key.

In your GitHub repository settings, you can define secrets. These secrets are available to workflow files, as we see from the "Run publisher" step. Create a secret named DEV_API_KEY that contains your DEV API key.

GitHub secrets

Now everything is in place to run the publisher. This workflow will execute every hour and run the publish job. That job will set up the environment and execute the publish.py script. The script will call DEV and publish any articles in your schedule that should be published.

What did we learn?

In this article, I tried to show off a few different topics.

  1. How to communicate with the DEV API
  2. How to create a file that a computer can read and parse (i.e., the JSON schedule)
  3. How to use GitHub Actions to run some code for your repository

Want to see all this code in context? The repository for this website has all the files we covered.

There are so many other things you can do with GitHub Actions. Now you can write about them all and put them on a schedule for other developers to read when you're ready.

If you have questions or enjoyed this article, please feel free to message me on Twitter at @mblayman, or share it if others might be interested too.
