Finn is updated on a regular basis. A best practice is to pin a specific version of Finn for your production forecast. You can do this with the renv package when running Finn on your local machine, or with Docker containers when running Finn in the cloud.
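As a minimal sketch of pinning a version with renv: the snippet below assumes Finn is installed from CRAN under the package name finnts, and the version number shown is only a placeholder for whichever release you have validated.

```r
# One-time setup, run from the root of the forecasting project.
renv::init(bare = TRUE)          # create a project-local package library

# Install one exact Finn release instead of "latest".
# "0.4.0" is a placeholder; substitute the version you have validated.
renv::install("finnts@0.4.0")

# Record the package versions the project uses in renv.lock.
# Commit renv.lock alongside your forecasting code.
renv::snapshot()

# Later, on another machine or inside a Docker image build,
# recreate the same library from renv.lock:
# renv::restore()
```

Because renv.lock is a plain file, the same lockfile can drive both a local setup and a container build, so both environments resolve to the same Finn version.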
Finn was built to run at scale in Azure, leveraging Spark as the parallel back end. Check out the parallel processing vignette to learn how to get Finn running on Azure services like Databricks. The best way to run Finn in production is through Azure Machine Learning, specifically Azure ML Pipelines.
Below are a few tips for leveraging Azure ML Pipelines:
- Break forecast_time_series() out into its sub-component functions, so that each step of the Finn forecast process runs as its own pipeline step.
- Set add_unique_id to FALSE within set_run_info(). That lets you reference the exact same run info in each separate pipeline step. Make sure you use your own unique run_name within set_run_info(): one that differs from previous Finn runs but is identical in every pipeline step of the current run.
- For prep_data() and prep_models(), use the default Spark cluster settings, where each Spark task is sent to a specific core on an executor.
- For train_models(), ensemble_models(), and final_models(), consider adjusting the Spark cluster settings, such as setting "spark.executor.cores" to 1, and set inner_parallel to TRUE within each function. That way only a single task (one time series) is sent to each Spark executor node, and all cores on that node can be used to model that time series. Use num_cores within each function to control how many cores on the executor to use; the default is all available cores minus one. This can significantly speed up run time, but if you have many time series and want to run all of the models within Finn, make sure your Spark cluster can scale to many VMs.
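The tips above can be sketched as one R function per pipeline step. This is an illustration under stated assumptions, not a definitive implementation: the experiment name, run name, storage path, column names, and the step_* function names are all hypothetical, and the argument names reflect the finnts interface as I understand it, so check them against the version you have installed. It also assumes a Spark connection has already been set up as described in the parallel processing vignette.

```r
# Hypothetical settings shared by every pipeline step.
experiment_name <- "sales_forecast"
run_name <- "sales_forecast_run_01"   # new for each production run
output_path <- "/mnt/finn_runs"       # storage visible to every step

# Every step rebuilds the same run info. add_unique_id = FALSE keeps
# Finn from appending a unique suffix that would differ between steps.
get_run_info <- function() {
  finnts::set_run_info(
    experiment_name = experiment_name,
    run_name = run_name,
    path = output_path,
    add_unique_id = FALSE
  )
}

# Pipeline step 1: data and model prep on default Spark cluster settings.
step_prep <- function(input_data) {
  run_info <- get_run_info()
  finnts::prep_data(
    run_info = run_info,
    input_data = input_data,
    combo_variables = c("id"),        # hypothetical column names
    target_variable = "value",
    date_type = "month",
    forecast_horizon = 3,
    parallel_processing = "spark"
  )
  finnts::prep_models(run_info = run_info)
}

# Pipeline steps 2-4: intended for a cluster configured with
# spark.executor.cores = 1, so each executor handles one time series
# and inner_parallel = TRUE spreads the modeling work across its cores.
step_train <- function() {
  finnts::train_models(
    run_info = get_run_info(),
    parallel_processing = "spark",
    inner_parallel = TRUE
  )
}

step_ensemble <- function() {
  finnts::ensemble_models(
    run_info = get_run_info(),
    parallel_processing = "spark",
    inner_parallel = TRUE
  )
}

step_final <- function() {
  finnts::final_models(
    run_info = get_run_info(),
    parallel_processing = "spark",
    inner_parallel = TRUE
  )
}
```

Each Azure ML pipeline step would then source this file and call exactly one step_* function. Keeping run_name in a single shared place is what makes the steps resolve to the same run; num_cores is left at its default here and can be passed to the last three functions to cap core usage per executor.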