With this, the pipeline is defined using the `Pipeline` class from Spark ML. To make predictions, we first need to fit the pipeline to data, which yields a `PipelineModel`. Since our pipeline contains only pretrained stages, no training is actually performed, and fitting is just a formality.