Reconsider use of Resource#exists in MDA stage initialisation
Description
Environment
Activity
Ian Young May 21, 2020 at 1:19 PM
Phase 2, changed the behaviour of DOMResourceSourceStage, commit c69f1c301949320c120ad98b6c571234c055b1dd.
Ian Young May 21, 2020 at 9:36 AM
Phase 1, removed the two entirely redundant tests, commit 9b9632ca65633f03eeddd52b6ec91991c3a0d86c.
Ian Young May 21, 2020 at 9:00 AM (edited)
Here are the call sites for exists on a Resource (there are other calls on a File, which we're not interested in here):
* AbstractXSLProcessingStage calls this during doInitialize, then opens a stream anyway. No added value. This will affect numerous child classes.
* DOMResourceSourceStage calls it during doInitialize, but as described in the OP, this seems to be neither necessary nor beneficial.
* X509RSAOpenSSLBlacklistValidator calls it during doInitialize, then opens a stream anyway. No added value.
Looks like we should just remove all three of these.
In two cases, initialization will fail with a slightly different {{ComponentInitializationException}}, but I don't think we even need to mention that, provided we check what the new exceptions look like in those two cases.
In the DOMResourceSourceStage case, the exception will be deferred to execute time. This case should be documented in the release notes.
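The difference between the two outcomes can be sketched as follows. This is only an illustration under stated assumptions, not the actual MDA code: SketchResource, SketchInitializationException, MissingResource and the three static methods are hypothetical stand-ins, and wrapping the IOException in an initialization exception is assumed rather than taken from the real stages.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

class ExistsRemovalSketch {

    /** Hypothetical stand-in for Spring's Resource, reduced to what this sketch needs. */
    interface SketchResource {
        InputStream getInputStream() throws IOException;
        String getDescription();
    }

    /** Hypothetical stand-in for ComponentInitializationException. */
    static class SketchInitializationException extends Exception {
        SketchInitializationException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** A resource that can never be opened. */
    static class MissingResource implements SketchResource {
        @Override
        public InputStream getInputStream() throws IOException {
            throw new FileNotFoundException(getDescription() + " cannot be opened");
        }

        @Override
        public String getDescription() {
            return "missing resource";
        }
    }

    /** A stage that opens a stream during initialization anyway. */
    static void initializeByReading(SketchResource resource) throws SketchInitializationException {
        // No exists() check: the failed open is wrapped (an assumption of this
        // sketch), so the failure is still reported at initialization time.
        try (InputStream in = resource.getInputStream()) {
            in.read();
        } catch (IOException e) {
            throw new SketchInitializationException(
                    "Unable to read " + resource.getDescription(), e);
        }
    }

    /** A stage that only touches the resource when it is executed. */
    static void initializeWithoutReading(SketchResource resource) {
        // Nothing is opened here, so a missing resource goes unnoticed.
    }

    /** Execution is where the second kind of stage first opens the resource. */
    static void execute(SketchResource resource) throws IOException {
        try (InputStream in = resource.getInputStream()) {
            in.read();
        }
    }

    public static void main(String[] args) {
        SketchResource missing = new MissingResource();
        try {
            initializeByReading(missing);
        } catch (SketchInitializationException e) {
            System.out.println("init-time failure: " + e.getMessage()
                    + " (cause: " + e.getCause().getMessage() + ")");
        }
        initializeWithoutReading(missing); // succeeds despite the missing resource
        try {
            execute(missing);
        } catch (IOException e) {
            System.out.println("execute-time failure: " + e.getMessage());
        }
    }
}
```

In the first kind of stage only the exception text changes. In the second, initialization succeeds and the failure appears on each execution, which is the behaviour change worth a release note.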
Several of the existing MDA stages accept a Spring Resource as input, and attempt to validate this by testing for "existence" as part of stage initialization. For example, in DOMResourceSourceStage:

    if (!domResource.exists()) {
        throw new ComponentInitializationException("Unable to initialize " + getId() + ", DOM resource "
                + domResource.getDescription() + " does not exist");
    }

If the Resource refers to, say, a local file, this provides rapid termination, which is a good thing. For other resource types, such as Spring's UrlResource or our HTTPResource, it can be counter-productive:
* It replaces a run-time check as to whether the resource exists when the stage is executed with one which checks whether the resource exists when the stage is initialized.
* It provides no details about the nature of the failure (was the URL invalid? Did it return a 404, a 500 or something else?).
* In the case of a large HTTP-based resource which doesn't respond to HEAD, it causes the resource to be downloaded twice.
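The last two points can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a model of Spring's UrlResource or our HTTPResource: CountingRemoteResource is a hypothetical class that pretends every access is a full GET (as if the server did not support HEAD) and counts those requests.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ExistsCostSketch {

    /** Hypothetical remote resource whose server is assumed not to support HEAD. */
    static class CountingRemoteResource {
        private final int status;  // simulated HTTP status returned for every GET
        private final byte[] body; // simulated response body
        int fetches;               // number of simulated GET requests so far

        CountingRemoteResource(int status, byte[] body) {
            this.status = status;
            this.body = body;
        }

        /** Simulates a full GET of the resource. */
        InputStream getInputStream() throws IOException {
            fetches++;
            if (status != 200) {
                throw new IOException("GET failed with HTTP status " + status);
            }
            return new ByteArrayInputStream(body);
        }

        /** Without HEAD, existence can only be tested by fetching; the reason for a failure is discarded. */
        boolean exists() {
            try {
                getInputStream().close();
                return true;
            } catch (IOException e) {
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Check-then-open: the resource is fetched once by exists() and once for real.
        CountingRemoteResource ok = new CountingRemoteResource(200, new byte[] {1, 2, 3});
        if (ok.exists()) {
            try (InputStream in = ok.getInputStream()) {
                in.read();
            }
        }
        System.out.println("fetches with exists(): " + ok.fetches);

        // A failing resource: exists() says only "false", opening it says why.
        CountingRemoteResource broken = new CountingRemoteResource(404, new byte[0]);
        System.out.println("exists(): " + broken.exists());
        try {
            broken.getInputStream();
        } catch (IOException e) {
            System.out.println("open: " + e.getMessage());
        }
    }
}
```

Dropping the exists() call halves the number of fetches in the first case and, in the second, lets the caller report the status that a bare boolean hides.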
We should enumerate the places where Resource#exists() is used, and remove these checks from places where they are subtracting value rather than adding it. We also need to document the change in behaviour: for example, whether a resulting failure will still be reported at initialization time, or whether it will be deferred to (each) execution of the stage.