Hello again and welcome to the third post in my Exploring AWS IoT Services series! In my previous post, I walked readers through how we can use AWS IoT Rules to listen to topics, format message payloads, and route messages to other AWS services.
In this post, we’ll look at using AWS IoT services to execute tasks on IoT devices. More specifically, we’ll use IoT Jobs to update configuration information on a fleet of devices. This post builds on the first and second in this series so please work through both of them if you haven’t already.
Before we begin, please note that this post references files found in the following Git repository:
Generally speaking, an IoT “job” entails subscribing devices to an MQTT topic so they can listen for events that carry the information needed to perform tasks such as software updates, certificate rotation, reboots, or configuration changes. To make this point clearer, let’s look at how AWS describes the IoT Jobs component of its IoT Core service.
As previously mentioned, we’ll be simulating an IoT device being reconfigured using information provided to it by IoT Jobs. This will entail subscribing to a topic, downloading a web-hosted file, overwriting the device’s current configuration, and then using the newly introduced configuration to continue normal operations.
First, let’s speak to what an IoT Job is and what it’s composed of.
“A job is a remote operation that is sent to and run on one or more devices connected to AWS IoT.”
The main components of an IoT Job are as follows:
Job document - Job documents are UTF-8 encoded JSON documents that contain the information your devices need to perform a job. A job document typically contains one or more URLs from which the device can download an update or other data.
Targets - A list of IoT devices targeted to perform a set of operations. The targets can be “things” or “thing groups” (or both).
Job type - How the job is delivered to its targets.
Snapshot: This is used to send a job document to a specific set of devices. After the targeted devices complete the job (or report that they're unable to do so), the job is complete.
Continuous: This is used to target dynamic thing groups and will send the configured job document to all devices contained in a group. As the job operates in perpetuity, the configured job document will be delivered to devices that are later added to the group.
Knowing this, and assuming that job targets and types are relatively easy to understand, let’s talk about job documents.
Job documents are JSON-formatted instructions an IoT device will utilize to perform some set of tasks or operations. These documents are consumed by “things” and can be comprised of whatever is needed by your device. A common convention is to provide devices a URL (or set of URLs) where remote resources can be accessed and utilized by the task at hand.
This brings us to pre-signed URLs. AWS provides a mechanism for accessing private files in an Amazon S3 bucket via pre-signed URLs. This is a method of temporarily authenticating a device and authorizing it to perform an HTTP operation (GET or PUT) on a (potentially) sensitive S3 object. In other words, pre-signed URLs are an easy way to allow an IoT device to download a file from Amazon S3 and keep that file protected from unauthorized access.
“By default, all S3 objects are private. Only the object owner has permission to access them. However, the object owner can optionally share objects with others by creating a presigned URL, using their own security credentials, to grant time-limited permission to download the objects.“
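To make the “time-limited permission” aspect concrete, here’s a quick look at the anatomy of a hypothetical pre-signed URL using only Python’s standard library. The bucket, key, credential, and signature values below are made-up placeholders, not output from AWS:

```python
from urllib.parse import urlparse, parse_qs

# A hypothetical pre-signed URL (bucket, key, credential, and signature
# are placeholders for illustration only).
url = (
    "https://example-bucket.s3.amazonaws.com/configs/config.json"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Credential=AKIAEXAMPLE%2F20240101%2Fus-east-1%2Fs3%2Faws4_request"
    "&X-Amz-Date=20240101T000000Z"
    "&X-Amz-Expires=3600"
    "&X-Amz-SignedHeaders=host"
    "&X-Amz-Signature=deadbeef"
)

params = parse_qs(urlparse(url).query)

# The URL carries its own credentials and lifetime; the device needs no
# AWS credentials of its own, only the ability to issue an HTTP GET.
print(params["X-Amz-Expires"][0])    # lifetime of the URL, in seconds
print(params["X-Amz-Algorithm"][0])  # signing algorithm
```

Once `X-Amz-Expires` seconds have elapsed from `X-Amz-Date`, S3 rejects the request, which is what keeps the object protected after the job completes.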
In the context of IoT jobs, you can create S3 presigned URLs via the use of a placeholder.
This document contains a single action and URL. Note the S3 pre-signed URL placeholder being used here. Write this JSON document to a file to be used later.
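If you prefer to script the setup, a document of this shape can be written with a few lines of Python. Note that the file path, field names, bucket name, and object key below are my own assumptions for illustration; use the document shown above for the exercise. The `${aws:iot:s3-presigned-url:...}` placeholder is what IoT Jobs replaces with a working pre-signed URL when the document is delivered to a device:

```python
import json

# Hypothetical job document: field names, bucket, and key are assumptions.
job_document = {
    "operation": "update-config",
    "configFileUrl": (
        "${aws:iot:s3-presigned-url:"
        "https://s3.amazonaws.com/trek10-iot-bucket/configs/config.json}"
    ),
}

# Write the document to a file so it can be referenced when creating the job.
with open("/tmp/trek10-iot-job-document.json", "w") as f:
    json.dump(job_document, f, indent=2)
```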
I’m assuming you’ve been following along with my previous two posts and already have a private S3 bucket provisioned. We created one in the second post of this series. If not, you can create one by executing the following AWS CLI commands.
With our policy documents written, we’ll create a role and attach an inline policy to it using the following AWS CLI commands:
aws iam create-role --role-name trek10-iot-job-role \
aws iam put-role-policy --role-name trek10-iot-job-role \
--policy-name trek10-iot-job-role-policy \
Now that we have our job’s role provisioned, we’ll turn our attention to the IoT policy we’ll attach to the devices (things) we want to manage. If you recall from the first post in this series, we created a basic IoT policy that allowed our simulated thing to publish and subscribe to an IoT Core topic. For this post, we’ll create a new policy that allows our devices to interact with special topics used by IoT Jobs.
Execute the following CLI command to write an IoT policy to “/tmp/trek10-iot-job-policy-2.json”. Note that you’ll need to change the account ID ("111222333444") and region to match the ID of your AWS account and region you're working in:
You’ll note various placeholder variables used in this policy.
iot:ClientId - The client ID used to connect to the AWS IoT Core message broker.
iot:Connection.Thing.ThingName - This resolves to the name of the thing for which the policy is being evaluated.
These “policy variables” help us apply the Principle of Least Privilege to IoT policies. They can be used in IoT Core policies in “Resource” or “Condition” blocks and are replaced by actual values when a policy is evaluated.
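As an illustration, a policy statement along these lines (the region and account ID are placeholders, matching the “111222333444” example used elsewhere in this post) restricts each device to subscribing only to its own job topics:

```json
{
  "Effect": "Allow",
  "Action": "iot:Subscribe",
  "Resource": "arn:aws:iot:us-east-1:111222333444:topicfilter/$aws/things/${iot:Connection.Thing.ThingName}/jobs/*"
}
```

Because `${iot:Connection.Thing.ThingName}` is resolved at evaluation time, one policy document serves the entire fleet without granting any device access to another device’s topics.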
You can read more on how to use these variables by referencing the following link.
“Devices can communicate with AWS IoT Jobs using the MQTT protocol. Devices subscribe to MQTT topics to be notified of new jobs and to receive responses from the AWS IoT Jobs service. Devices publish on MQTT topics to query or update the state of a job execution. Each device has its own general MQTT topic.”
To be more specific, a thing will typically interact with the following topics when working through an IoT Job:
$aws/things/<thing-name>/jobs/notify - AWS IoT Jobs uses this topic to notify devices when a job execution is added to or removed from the list of pending job executions.
$aws/things/<thing-name>/jobs/notify-next - AWS IoT Jobs uses this topic to notify devices when the next pending job execution has changed.
$aws/things/<thing-name>/jobs/get - Used by devices to request the list of pending job executions for a given thing.
$aws/things/<thing-name>/jobs/<job-id>/get - Used by devices to request the details of a specific job execution.
$aws/things/<thing-name>/jobs/<job-id>/update - Used by devices to update a job execution’s status.
$aws/things/<thing-name>/jobs/<request type>/accepted - AWS IoT Jobs publishes success messages for a given thing on this topic.
$aws/things/<thing-name>/jobs/<request type>/rejected - AWS IoT Jobs publishes failure messages for a given thing on this topic.
$aws/things/<thing-name>/jobs/<job-id>/<request type>/accepted - AWS IoT Jobs publishes success messages for a given thing and job on this topic.
$aws/things/<thing-name>/jobs/<job-id>/<request type>/rejected - AWS IoT Jobs publishes failure messages for a given thing and job on this topic.
You can read more about these topics and how they’re used at the following links:
For the sake of this blog post, we’re going to keep things relatively simple. The workflows employed by our simulated thing(s) will have two different entry points for obtaining job information, which converge on a single path of logic that handles the actual job execution.
Entry Point 1: Upon device boot (execution of our simulation script):
Subscribe to the “$aws/things/<thing-name>/jobs/notify” topic.
Publish a message to the “$aws/things/<thing-name>/jobs/get” topic.
Sleep for a few seconds so you can view the IN_PROGRESS state of the job execution for each device.
Parse the JSON payload to obtain the job document information.
Download our configuration file stored in S3 using the pre-signed URL (generated by IoT Jobs) contained in the job document.
Back up the current configuration file.
Overwrite the current configuration file with the one downloaded from S3.
Publish a “SUCCEEDED” message to the “$aws/things/<thing-name>/jobs/<job-id>/update” topic if no issues are encountered.
Publish a “FAILED” message to the “$aws/things/<thing-name>/jobs/<job-id>/update” topic if issues are encountered.
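The download/back-up/overwrite portion of the steps above can be sketched as follows. The function name and paths are my own; the real logic lives in the repo’s Python device script:

```python
import json
import shutil
import urllib.request

def apply_config_update(presigned_url, config_path):
    """Download a new config, back up the current one, and swap them in.

    Returns "SUCCEEDED" or "FAILED", mirroring the status values the
    device reports on the jobs/<job-id>/update topic.
    """
    try:
        # Fetch the new configuration via the pre-signed URL and make
        # sure it parses as JSON before touching anything on disk.
        with urllib.request.urlopen(presigned_url) as resp:
            new_config = json.loads(resp.read())

        # Back up the current configuration, then overwrite it.
        shutil.copyfile(config_path, config_path + ".bak")
        with open(config_path, "w") as f:
            json.dump(new_config, f, indent=2)
        return "SUCCEEDED"
    except Exception:
        return "FAILED"
```

Keeping the backup step ahead of the overwrite is what later makes recovery possible if a job is canceled mid-execution.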
The logic supporting this workflow will be able to handle having one or more jobs simultaneously queued for a given device fleet. However, note that only a single job will be executed at a time. This is made possible largely by how IoT Jobs publishes messages to the “$aws/things/<thing-name>/jobs/notify” topic.
Something else to note is that I added the ability for our simulated devices to recover their configuration files in the event that a job gets canceled during its execution. Having a device periodically check for job termination during the course of executing a task could prove useful in certain situations. I’ll leave it to the reader to look over the Python device script to see how this functionality (along with the aforementioned workflows) was implemented.
Should you want to experiment with canceling jobs, you will need to add the “force” flag to the API call. This looks like the following when using the AWS CLI:
aws iot cancel-job --job-id <job-id> --force
This seems like an appropriate time to note that it is a good idea to match your devices’ client IDs to the thing names registered in IoT Core. As shown above, the topics utilized by IoT Jobs are built from thing names, not from a device’s client ID. Matching client IDs with thing names makes life easier when configuring devices, as you only need to remember one ID per device instead of two.
For the sake of demonstrating how to run a job on a fleet of devices, we’ll utilize three IoT things in this post. Execute the following AWS CLI commands to create our things:
At this point, we’ll need to attach the IoT policy we just created to a device certificate and then associate the certificate with our things. Remember that we already created a certificate in the first post of this series for our simulated thing to use. We’ll repurpose that certificate by associating it with the three things we’re using for this exercise.
We’ll need to obtain the certificate ARN prior to doing this. Execute the following command to retrieve your certificate’s ARN:
aws iot list-certificates
You should see something like the following returned on the console:
Now that we have everything in place to create an IoT job, let’s talk about the scripts we’re using to simulate the devices in our thing fleet. The original versions of the publish and subscribe scripts used in the first and second posts in this series had important variables hard-coded at the top of each file. I revised both to read a JSON configuration file, to help illustrate how to update a file on a device through the use of an IoT Job. The second version of these scripts now looks for a configuration file in a directory named “conf”.
An additional caveat to note is that we’re using three copies of the subscribe script to illustrate the three devices in our thing fleet. Each copy has a hard-coded device ID and looks for a config file with the path “conf/config-N.json” (where N is 1, 2, or 3).
I’ve added a script (“create-config-files.sh”) to the repo supporting this blog series that will populate all three config files for you. Just make sure to edit the following two variables within the script before executing.
You will populate AWS_IOT_DATA_ENDPOINT with the value returned by the following AWS CLI command:
You will populate CERT_DIR with the path to the directory where you stored the certificate files created in the first post of this series.
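If you’d rather see roughly what create-config-files.sh produces, a sketch of a Python equivalent follows. The exact config schema here is an assumption based on this exercise (a device ID, the IoT data endpoint, the certificate directory, and the temperature scale); defer to the actual script in the repo:

```python
import json
from pathlib import Path

# Assumed values; the real script has you edit these two variables.
AWS_IOT_DATA_ENDPOINT = "xxxxxxxxxxxxxx-ats.iot.us-east-1.amazonaws.com"
CERT_DIR = "/path/to/certs"

Path("conf").mkdir(exist_ok=True)
for n in (1, 2, 3):
    config = {
        "device_id": f"trek10-thing-{n}",
        "endpoint": AWS_IOT_DATA_ENDPOINT,
        "cert_dir": CERT_DIR,
        "scale": "c",  # starting scale: Celsius; the IoT job flips this to "f"
    }
    with open(f"conf/config-{n}.json", "w") as f:
        json.dump(config, f, indent=2)
```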
Note that this script populates the temperature scale property with a “c” for Celsius. This is the starting configuration our devices will use. As you can guess, this is what we’ll be looking to alter via use of an IoT job. Which brings us to our next step.
After editing and executing create-config-files.sh, copy one of the configuration files written to the “conf” directory to “/tmp/config.json”, open it in an editor, and change the “scale” property’s value to “f”. Once you’ve finished editing your copy, execute the following AWS CLI command to copy it to our bucket:
You may remember this path from the pre-signed URL we placed in the job document we created earlier in this post. This is the updated config file we will push to our thing fleet from our IoT job to ensure devices report temperatures using the Fahrenheit scale.
At this point, we are ready to activate our simulated IoT devices. Let’s start this by opening three command prompts and executing a copy of the subscribe script in each window. Make sure you’ve installed the “paho.mqtt” package before doing so. Please revisit my first post if this doesn’t sound familiar.
Execute the following commands in three separate windows:
Open up two more command prompts so we can run our publish script in one and watch the “conf” directory for changes in another.
Execute the following command to list the “conf” directory every second:
watch -n 1 "ls -lh conf"
Execute the following command to run the updated version of the publish script in a loop every second:
while true; do python3 publish-to-iot-core-v2.py; sleep 1; done
Notice that the publish script is publishing messages with message IDs from 1 to 3 with all three simulated devices having their temperature scale property set to “c” for Celsius.
We’ll open up yet one more command prompt to create our job and monitor its execution. Once opened, we’ll execute the following AWS CLI commands to create our IoT job. Note that we’re running this with the SNAPSHOT job type.
Once the job has been created, you should see something like the following in each of the simulated device’s output:
What you should have witnessed was the successful execution of the workflow mentioned above. Let’s recap what happened.
Our device (“trek10-thing-1”) connected to IoT Core and subscribed to the “$aws/things/trek10-thing-1/jobs/notify” topic.
Our device published a message to the “$aws/things/trek10-thing-1/jobs/get” topic in order to obtain a list of pending jobs.
IoT Jobs published a message on “$aws/things/trek10-thing-1/jobs/get/accepted” with an empty list of pending jobs.
Shortly after creating our job, IoT Jobs published a message on the “$aws/things/trek10-thing-1/jobs/notify” topic to let our device know a new job has been queued.
Our device published a message to the “$aws/things/trek10-thing-1/jobs/job-1/get” topic to retrieve information about the job that was just queued.
IoT Jobs published a message to the “$aws/things/trek10-thing-1/jobs/job-1/get/accepted” topic that contained a list of pending jobs.
Our device published an “IN_PROGRESS” message to the “$aws/things/trek10-thing-1/jobs/job-1/update” topic to update the job execution status.
IoT Jobs published a message to the “$aws/things/trek10-thing-1/jobs/job-1/update/accepted” to inform our device that the execution status update was successful.
Our device published a message to the “$aws/things/trek10-thing-1/jobs/job-1/get” topic to determine if the status has changed and the job execution needs to be terminated.
IoT Jobs published a message to the “$aws/things/trek10-thing-1/jobs/job-1/get/accepted” to provide our device with the status update initiated in step 9.
Exactly as in step 9, our device published a message to the “$aws/things/trek10-thing-1/jobs/job-1/get” topic again to determine if the status has changed and the job execution needs to be terminated.
Exactly as in step 10, IoT Jobs published a message to the “$aws/things/trek10-thing-1/jobs/job-1/get/accepted” to provide our device with the status update initiated in step 11.
Our device published a “SUCCEEDED” message to the “$aws/things/trek10-thing-1/jobs/job-1/update” topic to update the job execution status.
IoT Jobs published a message to the “$aws/things/trek10-thing-1/jobs/job-1/update/accepted” to inform our device that the execution status update was successful.
IoT Jobs published a message to the “$aws/things/trek10-thing-1/jobs/notify” topic to let our device know that there weren’t any pending jobs.
Steps 9 through 12 were executed to allow the device to determine if the job was canceled by IoT Jobs. This is the job termination logic I spoke of earlier. FYI, these checks occur before and after the configuration file is overwritten.
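The status updates exchanged above are just small JSON payloads published to the jobs/<job-id>/update topic. A sketch of building one is shown below; the “status” and “statusDetails” field names follow the IoT Jobs update request format, while the helper name and the details content are my own:

```python
import json

def build_status_update(status, details=None):
    """Build the JSON payload a device publishes to
    $aws/things/<thing-name>/jobs/<job-id>/update."""
    payload = {"status": status}            # e.g. IN_PROGRESS, SUCCEEDED, FAILED
    if details:
        payload["statusDetails"] = details  # free-form key/value progress info
    return json.dumps(payload)

print(build_status_update("IN_PROGRESS", {"step": "downloading config"}))
```

IoT Jobs answers each such update on the corresponding update/accepted (or update/rejected) topic, which is exactly the exchange seen in the device output above.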
Now let’s take a closer look at the message published to the “$aws/things/trek10-thing-1/jobs/job-1/get/accepted” topic by IoT Jobs in step 6 that contained the job document needed by our device to update its configuration file.
Note that IoT Jobs translated our S3 pre-signed URL placeholder in the job document to a functional URL. This is what allowed our device to authenticate to S3 and download the new configuration file.
Looking at the “conf” directory, we see our device configuration files were backed up and updated. We can also verify that our devices are now reporting temperatures in Fahrenheit, with the scale property set to “f”. You may also note that the actual temperatures are within the Fahrenheit range. Seems like they were running a little hot!
And finally, we look over the job execution details and notice that each device's status transitioned from “QUEUED” to “IN_PROGRESS” and then finally to “SUCCEEDED”.
You can also watch the execution status change within the web console. Once completed, our job’s execution status ought to look like the following in the web console:
With the thing-specific execution statuses looking like the following:
Hopefully, you were able to duplicate this exercise and witnessed the same outcome. If not, go back and see what you might have missed.
And with that, we’ve successfully worked through learning how to use IoT Jobs to execute a task on a fleet of IoT devices. More specifically, we were able to successfully push a new configuration file to alter the temperature scale of data being reported by devices.
You can utilize a script I included in the Git repository supporting this blog series should you want to repeat the exercise we just worked through. This script deletes the overwritten configuration files (along with any backups), recreates the configuration files, and then uses the AWS CLI to delete the job created during the exercise. You will supply this script with the job ID used during the exercise. This script is executed in the following manner:
You may have to wait a minute for the IoT job to finish deleting before recreating and monitoring the job. Alternatively, you can change the JOB_ID variable to save yourself some time.
Much like the previous post in this series, I have created a script to automate all of the steps undertaken in this exercise outside of the actual job creation itself. A cleanup script is also available for deleting all of the generated resources.
And lastly, thanks for spending time with me again! Please do reach out for support on your next IoT project or even just to brainstorm how Trek10 can help.