Scheduler is a small task scheduling utility built for Django using Celery.
I had to execute a task at specific intervals. At first, I tried the one and only fan favorite: the cron job. Luckily, Celery has something that does exactly the same thing: @periodic_task. Just enter your cron rule and your job is done. Voila! Problem solved. The celebrations came too early, though: I soon discovered a few scenarios that cron rules alone could not handle.
To understand this, let's see what a complicated rule of our old, faithful cron looks like.
5-30/3 */2 * 1,5,9 2
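The above rule translates to: every 3rd minute from minute 5 through minute 30, of every 2nd hour, on every Tuesday in January, May and September.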
This seems all nice and fair. However, I had some bizarre use cases.
- Task should execute every 2nd Friday
- Task should execute on the second-last day of every month
- Task should execute on every even day
A cron rule for the first is still manageable in some cases. For the second, it is impossible. And for the third, it misfires after every month with an odd number of days (though the non-standard way of doing it is 0 0 2/2 * *). After banging my head for a day and countless Google searches, I glanced at my calendar app and saw some recurring events, like weekly sprints, month-end discussions, etc. When I tried to create one with a repetition pattern, I saw half a dozen configuration options. Initially, it seemed like some Google kind of stuff (of course, you can only expect such sophistication from a Google-like company). But then I saw similar settings in Evolution. To my utter surprise, I could edit the recurrence rules set for the event on Google Calendar, and the changes got synced to the app. This commonality meant that both follow some shared practice or convention. After digging into it for some time, I hit the jackpot. It's called Recurrence Rules, an actual standard set by the IETF as part of the iCalendar specification, RFC 5545 (the IETF being the gods behind everything related to the internet, including HTTP, LDAP, sockets, etc. You can check the list of standards they have in place here.)
Recurrence rules (RRULE)
They are similar to cron jobs. Like cron, they solve the problem of defining rules for programming date and time patterns. Unlike cron, RRULE is a complete DSL with limited grammar. You can stack multiple RRULEs to create an inclusion-exclusion pattern. Look at this example below,
The above rule executes every day for the next 5 days. Not impressed? Take another example:
This rule executes on the second-last day of the month for the next 3 months. Sounds familiar? Yes, this solves my second use case too.
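Both descriptions above can be tried out with python-dateutil. In this minimal sketch the two RRULE strings and the start date are assumptions, picked to match "every day, 5 times" and "second-last day of the month, 3 times"; they are illustrative rather than canonical.

```python
from datetime import datetime
from dateutil.rrule import rrulestr

# Assumed start of the recurrence; any datetime would do.
start = datetime(2024, 1, 1, 9, 0)

# Every day, five occurrences in total.
daily = list(rrulestr("FREQ=DAILY;COUNT=5", dtstart=start))

# Second-last day of each month (BYMONTHDAY=-2), three occurrences.
second_last = list(
    rrulestr("FREQ=MONTHLY;BYMONTHDAY=-2;COUNT=3", dtstart=start)
)

for moment in second_last:
    print(moment.date())
```

With this start date the second rule lands on 30 January, 28 February and 30 March 2024, because negative BYMONTHDAY values count back from the end of the month.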
One by one, all my use cases were handled by RRULEs. But building a complete RRULE parser in Python is a prodigious chore. I had written an article about writing DSLs in Python a while back, which you may refer to in order to understand how much it takes to build even a basic DSL. This was just too much for me to do. I had almost settled for cron jobs again when I realised that the python-dateutil library already has something built for RRULEs. Oh yes! It ships an RRULE parser (dateutil.rrule) which also has a Python API giving even greater control over the individual elements of the RRULE. Such a sweet delight!
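To give a feel for the Python API, here is a rough sketch of the two remaining use cases from my list. The start date is an assumption, and "every 2nd Friday" is read here as "every other Friday"; if it meant the second Friday of each month, the rule would need a different shape.

```python
from datetime import datetime
from dateutil.rrule import rrule, DAILY, WEEKLY, FR

# Assumed start date (1 January 2024 is a Monday).
start = datetime(2024, 1, 1)

# "Every 2nd Friday", interpreted as every other Friday.
every_other_friday = rrule(WEEKLY, interval=2, byweekday=FR, dtstart=start)

# "Every even day": daily frequency, filtered to even days of the month.
even_days = rrule(DAILY, bymonthday=list(range(2, 32, 2)), dtstart=start)

# rrule objects support slicing, which returns a list of occurrences.
fridays = every_other_friday[:3]

# Occurrences strictly between two datetimes, across a month boundary.
month_end = even_days.between(datetime(2024, 1, 29), datetime(2024, 2, 3))
```

The even-day rule skips from 30 January straight to 2 February, which is exactly the month boundary where the cron version goes wrong.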
The library has a function, dateutil.rrule.rrulestr, which converts a raw RRULE string into an rrule object. The best part about this object is that some parameters of the RRULE, like dtstart, can be changed directly by modifying the object's properties. Better still, the rrule object is a generator, which means I can iterate over it and it will give me the next timestamp in the sequence. Overall, this utility provided the perfect suite for managing RRULEs.
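A small sketch of that workflow, assuming a made-up daily rule and made-up dates. To change dtstart it uses rrule.replace(), which returns a new rrule with the given attributes swapped; that is one way to do it, not necessarily how Scheduler does it internally.

```python
from datetime import datetime
from dateutil.rrule import rrulestr

# Parse a raw RRULE string; dtstart is passed in because the string has none.
rule = rrulestr("FREQ=DAILY;COUNT=3", dtstart=datetime(2024, 1, 1, 9, 0))

# Build a copy of the rule with a different start.
shifted = rule.replace(dtstart=datetime(2024, 2, 1, 9, 0))

# Iterating yields the occurrences in order.
occurrences = list(shifted)

# after() returns the first occurrence later than the given datetime,
# or None when the rule is exhausted.
next_one = shifted.after(datetime(2024, 2, 1, 9, 0))
exhausted = shifted.after(datetime(2024, 2, 3, 9, 0))
```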
The initial idea for the flow was:
1. Get the RRULEs defined by the user from the database
2. Build the rrule object
3. Get the next timestamp in the sequence
4. Pickle the rrule object, store it in the database, and execute the function at the specified timestamp using the eta argument in Celery's apply_async
5. When the function gets executed, unpickle the rrule object, get the next timestamp and then repeat step 4
6. Repeat until rrule no longer returns a timestamp
However, there was one major issue: the rrule object could not be pickled. I also realised that the rrule object was not a generator but an iterator instead. A workaround for this (a hack!) was to take the time when the function last executed and keep fetching timestamps from the rrule iterator until one was greater than that time. Since I no longer had to pickle the rrule object, I just stored the last executed timestamp in the database which, in my opinion, is a much cleaner and much more library-agnostic solution.
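The workaround can be sketched roughly like this. The function name next_run_after and its parameters are hypothetical, chosen for illustration; only the rule text, its start and the last execution time need to be stored.

```python
from datetime import datetime
from dateutil.rrule import rrulestr


def next_run_after(rule_text, dtstart, last_run):
    """Return the first occurrence later than last_run, or None if none is left."""
    # Rebuild the rule from its string form instead of unpickling an object.
    rule = rrulestr(rule_text, dtstart=dtstart)
    # Walk the sequence from the start until we pass the last execution time.
    for occurrence in rule:
        if occurrence > last_run:
            return occurrence
    return None


start = datetime(2024, 1, 1, 9, 0)
upcoming = next_run_after("FREQ=DAILY;COUNT=3", start, last_run=start)
finished = next_run_after(
    "FREQ=DAILY;COUNT=3", start, last_run=datetime(2024, 1, 3, 9, 0)
)
```

The loop mirrors the description above step by step. dateutil's rrule.after(last_run) gives the same answer in a single call, so the explicit loop is mostly there to show the idea.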
Also, I didn't want to interfere with the usual way Celery works and wanted this flow to be introduced with as little custom code as possible. So, instead of creating a new task decorator, I preferred Celery's task inheritance and created a base task class called RepeatTask, which contains the entire logic for fetching the function, fetching the arguments and keyword arguments, executing the function, getting the next timestamp and then scheduling the task for its next execution.
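To make that control flow concrete, here is a deliberately Celery-free sketch. RepeatRunner and its schedule callback are hypothetical names, not the real RepeatTask: the callback merely stands in for whatever re-queues the task (in Celery, a call to apply_async with an eta), so the logic can be read and run on its own.

```python
from datetime import datetime
from dateutil.rrule import rrulestr


class RepeatRunner:
    """Hypothetical stand-in for a RepeatTask-style base class."""

    def __init__(self, rule_text, dtstart, func, schedule):
        self.rule_text = rule_text
        self.dtstart = dtstart
        self.func = func
        self.schedule = schedule  # placeholder for the real scheduling call

    def run(self, last_run, *args, **kwargs):
        # 1. Execute the wrapped function with its arguments.
        result = self.func(*args, **kwargs)
        # 2. Rebuild the rule and find the occurrence after this run.
        rule = rrulestr(self.rule_text, dtstart=self.dtstart)
        next_time = rule.after(last_run)
        # 3. Re-schedule only while the rule still yields timestamps.
        if next_time is not None:
            self.schedule(eta=next_time, args=args, kwargs=kwargs)
        return result


scheduled = []  # records what would have been queued
runner = RepeatRunner(
    "FREQ=DAILY;COUNT=2",
    datetime(2024, 1, 1, 9, 0),
    func=lambda name: f"hello {name}",
    schedule=lambda **call: scheduled.append(call),
)
greeting = runner.run(datetime(2024, 1, 1, 9, 0), "world")
```

After the first run, scheduled holds one entry whose eta is the next occurrence of the rule; a run at the final occurrence schedules nothing further, which is how the chain ends.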