I was setting up a Trino cluster on GCP for one of my clients and found that shutting down nodes caused query errors.
I have been using Trino (previously called Presto) on GCP for over 4 years now. It's
an amazing product with phenomenal response times when querying Hive tables stored
as ORC on HDFS.
I need to shut down Trino workers (nodes) during specific hours,
refresh them, or resize the number of workers throughout the day or
night. The clusters run 24x7 to support UK and USA time zones as well as the
countries in between. Using an instance group was challenging: when you issue
a command to resize an instance group, the per-VM shutdown-script method is
hit and miss and unpredictable.
A GCP instance group shuts down network communication on the machine
when you issue a resize command and does not give the Trino worker enough
time to finish any query that is currently running. Once you issue the shutdown for a VM within an
instance group you only have a 90-120 second window. This was just not
good enough for my client, as some queries can run anywhere between 1s and 1hr …
So I ruled out the shutdown-script … No good, won't work.
A little digging around later, I wrote a shell script. It's not
rocket science, but it does the trick. I wanted it
to cover a few scenarios, and it can run from Airflow or a cron job.
1. Shut down a Trino worker and remove the worker VM from the instance group once it has finished all its tasks.
(Important: the worker lets the coordinator know it is going to shut down so it won't take on any more work,
and it remains in the SHUTTING_DOWN state until all its work has finished.)
2. Resize the Trino cluster on demand, up or down.
3. Bring different-sized workers into the Trino cluster depending on the time of day for cost efficiency,
for example highmem16 or highmem8 machines.
The shell script is simple. It's a divide and conquer approach. I
tell the worker to shut down gracefully, and once it has done everything it
needs to do I check whether the node is still alive and then delete the
machine from the instance group.
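As a rough illustration of the "tell the worker to shut down gracefully" step, here is a minimal sketch that sends Trino's graceful shutdown request to one worker. The function name, `TRINO_PORT` and `TRINO_USER` are assumptions for illustration and are not taken from the script in the repo; the user you pass must also be allowed to change node state by your access control rules.

```shell
#!/usr/bin/env bash
# Sketch only. TRINO_PORT and TRINO_USER are assumed values.
TRINO_PORT="${TRINO_PORT:-8080}"
TRINO_USER="${TRINO_USER:-admin}"

# Ask one worker to shut down gracefully and print the HTTP status code.
signal_shutdown() {
  local host="$1"
  # PUT "SHUTTING_DOWN" to the worker's state endpoint: the worker stops
  # taking new tasks and exits after its running tasks have completed.
  curl -s -o /dev/null -w '%{http_code}' -X PUT \
    -H "Content-Type: application/json" \
    -H "X-Trino-User: ${TRINO_USER}" \
    -d '"SHUTTING_DOWN"' \
    "http://${host}:${TRINO_PORT}/v1/info/state"
}
```

Calling `signal_shutdown my-worker-vm` (a hypothetical hostname) only signals the worker. The VM itself is left running until a later step deletes it.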
To do this I have split the script into
multiple processes:
1. SIGNAL-SHUTDOWN: Tell the workers to shut down (pass the number of
workers to shut down). I store the hostnames of the VMs I have signalled in a file, as
only the required number of workers are told to shut down.
2. SHUTDOWN: A separate process. This part of the script
checks whether the VM (Trino node) is still in the ACTIVE or SHUTTING_DOWN state. If Trino has
shut down the worker completely you won't get a response back, so take that as
a kill signal: delete the VM and remove it from the list of machines that need to
be removed from the instance group. This process can be run on a timer to keep
checking, in case, say, 2 of 5 nodes have not yet shut down because the queries
they are running take longer than the others.
3. RESIZE: I use this just for
scaling upwards, as it's literally a resize of an instance group.
4. SHUTDOWN-RESIZE: This one
can also be used after SIGNAL-SHUTDOWN, but it recreates the Trino worker
machines after they have shut down. This can be used to refresh the worker
VMs. Trino does like to be refreshed to keep everything ticking over
correctly.
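The SHUTDOWN check described in step 2 could look something like the sketch below. It is not the script from the repo: the function names, the list file layout (one VM name per line) and `TRINO_PORT` are assumptions, and it assumes the stored hostname is also the instance name that gcloud expects.

```shell
#!/usr/bin/env bash
# Sketch only. TRINO_PORT and the helper names are assumptions.
TRINO_PORT="${TRINO_PORT:-8080}"

# Print the worker state (for example ACTIVE or SHUTTING_DOWN),
# or nothing at all if the worker no longer answers.
worker_state() {
  curl -s --max-time 5 "http://$1:${TRINO_PORT}/v1/info/state" 2>/dev/null | tr -d '"'
}

# Delete every listed VM whose Trino process has gone away; keep the rest
# in the list file so the next timed run checks them again.
reap_finished_workers() {
  local list_file="$1" group="$2" zone="$3" remaining host state
  remaining="$(mktemp)"
  while IFS= read -r host; do
    [ -n "$host" ] || continue
    state="$(worker_state "$host")"
    # No response means Trino has exited, so the VM is safe to delete.
    # stdin is redirected so gcloud cannot swallow the rest of the list.
    if [ -z "$state" ] && gcloud compute instance-groups managed delete-instances \
         "$group" --instances="$host" --zone="$zone" --quiet < /dev/null; then
      echo "removed $host"
    else
      echo "$host" >> "$remaining"   # still working, or the delete failed
    fi
  done < "$list_file"
  mv "$remaining" "$list_file"
}
```

Run from cron or an Airflow task every few minutes, each pass removes only the workers that have finished, which is why slow queries on a couple of nodes do not hold up the others.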
The script takes a few simple parameters:
1. The number of Trino workers (VMs) to remove
2. Signal type ()
3. Instance group name
4. Number of workers to resize to
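To show how four positional parameters like these might drive the processes, here is a hypothetical dispatcher. It assumes the signal type values match the process names listed earlier, and that the zone comes from a `ZONE` variable; neither is confirmed by the original script. Only RESIZE is implemented for real, as a plain instance group resize; the other branches just print what they would do.

```shell
#!/usr/bin/env bash
# Sketch only. ZONE and the signal type values are assumptions.
ZONE="${ZONE:-europe-west2-a}"

dispatch() {
  local remove_count="$1" signal="$2" group="$3" target_size="$4"
  case "$signal" in
    SIGNAL-SHUTDOWN)
      echo "would signal ${remove_count} workers in ${group} to shut down" ;;
    SHUTDOWN)
      echo "would delete finished workers from ${group}" ;;
    RESIZE)
      # Scaling up is a plain resize of the managed instance group.
      gcloud compute instance-groups managed resize "$group" \
        --size="$target_size" --zone="$ZONE" ;;
    SHUTDOWN-RESIZE)
      echo "would delete finished workers from ${group}, then resize to ${target_size}" ;;
    *)
      echo "unknown signal type: ${signal}" >&2
      return 1 ;;
  esac
}
```

For example, `dispatch 0 RESIZE trino-workers 8` would grow a group called `trino-workers` (a made-up name) to 8 VMs.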
Feel free to amend the script and enhance it to your
requirements. We have used Apache Airflow to orchestrate the commands, with
sensors that detect when the process is complete: the script issues ALLDONE once the VM list
file is empty (check out the script). Keeping it simple is the key.
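The completion check can be very small. The sketch below prints ALLDONE when the VM list file is empty or missing, which is the string a sensor would look for. The function name and the PENDING output are my own additions for illustration.

```shell
#!/usr/bin/env bash
# Sketch only. report_done and the PENDING line are hypothetical.
report_done() {
  local list_file="$1"
  if [ ! -s "$list_file" ]; then
    # Empty or missing list: every signalled worker has been removed.
    echo "ALLDONE"
  else
    # Count the non-empty lines, i.e. the workers still shutting down.
    echo "PENDING $(grep -c . "$list_file")"
  fi
}
```

An orchestrator can call this on a schedule and treat any output other than ALLDONE as "check again later".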
The script is available on our Git repo: https://github.com/deliverbi/Trino-GCP
In a future post I will write up the instructions on how to
create an auto-scaling Trino Cluster using instance groups on GCP.