Metadata Aggregator / MDA-42

add per-stage performance instrumentation



    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.6.0
    • Fix Version/s: None
    • Component/s: Pipeline
    • Labels:


      I now have a large number of stages all doing various things, and would like to be able to pick parts of the system worth spending time optimising. To do this, it would be useful to be able to see the amount of time spent executing each stage. Note that for these purposes I'm not very interested in how long it takes to process each item, just the total across all items for that stage.

      To dig into the time being spent in XSLT processing (which profiling had already shown to be a large proportion of my overall processing time), I tweaked doExecute in AbstractXSLProcessingStage to surround the real work with this:

      long startTime = System.currentTimeMillis();
      // ... the existing body of doExecute (the real work) goes here ...
      long endTime = System.currentTimeMillis();
      double time = (endTime - startTime) / 1000.0D;
      if (time >= 0.1) {
          log.info("XSL stage " + getId() + " time " + time);
      }

      This logs the elapsed time for each XSL-based stage that takes 0.1 seconds or more to execute.

      This might not be quite the right thing to add as a generic facility, though. Instead it might make sense to (for example):

      • add a boolean logPerformance property to BaseStage
      • perform the logging within execute, not doExecute
      • only execute the System.currentTimeMillis() calls and the other work if logPerformance is set
      • make the threshold another property on BaseStage
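
      To make the four points above concrete, here is a rough sketch of how they might fit together. It is not the real BaseStage: TimedStage, setThresholdSeconds and setLogSink are names made up for illustration, the log sink stands in for the real logger, and I have used System.nanoTime() rather than System.currentTimeMillis() because it is the monotonic clock intended for measuring elapsed time.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for BaseStage; the real class has much more to it. */
abstract class TimedStage<T> {
    private final String id;
    private boolean logPerformance;         // proposed property, off by default
    private double thresholdSeconds = 0.1;  // proposed property; 0.1 mirrors the hack above
    private Consumer<String> logSink = System.out::println; // assumption: stands in for the logger

    TimedStage(String id) { this.id = id; }

    String getId() { return id; }
    void setLogPerformance(boolean flag) { logPerformance = flag; }
    void setThresholdSeconds(double seconds) { thresholdSeconds = seconds; }
    void setLogSink(Consumer<String> sink) { logSink = sink; }

    /** Times the whole stage, across all items, but only when asked to. */
    final void execute(Collection<T> items) {
        if (!logPerformance) {
            // No clock calls at all on the normal path.
            doExecute(items);
            return;
        }
        long start = System.nanoTime();
        try {
            doExecute(items);
        } finally {
            double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
            if (seconds >= thresholdSeconds) {
                logSink.accept("stage " + id + " time " + seconds);
            }
        }
    }

    /** The real work, supplied by each concrete stage. */
    protected abstract void doExecute(Collection<T> items);
}
```

      Doing the timing in execute rather than doExecute means every stage type gets it without being touched, and the try/finally means a stage that fails part way through still reports how long it ran.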

      On the other hand, the threshold makes most sense if it is global across all stages, and in some ways that also applies to the logging property itself. I don't think static properties are regarded as very bean-like, though.
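
      One way to get a global setting without static properties might be a single shared settings bean that is handed to every stage as an ordinary property. The sketch below is only an illustration of that idea: PerformanceSettings, ConfiguredStage and all of their methods are hypothetical names, not anything in the aggregator today.

```java
/** Hypothetical settings object shared by every stage in a pipeline. */
class PerformanceSettings {
    private boolean enabled;                // global on/off switch
    private double thresholdSeconds = 0.1;  // global reporting threshold

    boolean isEnabled() { return enabled; }
    void setEnabled(boolean flag) { enabled = flag; }
    double getThresholdSeconds() { return thresholdSeconds; }
    void setThresholdSeconds(double seconds) { thresholdSeconds = seconds; }

    /** True when a stage that ran for the given time should be reported. */
    boolean shouldReport(double elapsedSeconds) {
        return enabled && elapsedSeconds >= thresholdSeconds;
    }
}

/** Hypothetical stage that receives the shared settings as a normal bean property. */
class ConfiguredStage {
    private final String id;
    private PerformanceSettings performanceSettings = new PerformanceSettings();

    ConfiguredStage(String id) { this.id = id; }

    void setPerformanceSettings(PerformanceSettings settings) {
        performanceSettings = settings;
    }

    /** Returns the log line for a run of the given length, or null if none is due. */
    String report(double elapsedSeconds) {
        return performanceSettings.shouldReport(elapsedSeconds)
                ? "stage " + id + " time " + elapsedSeconds
                : null;
    }
}
```

      Every stage wired to the same PerformanceSettings instance sees a change to the switch or the threshold at once, while each stage still only has plain instance properties.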

      Some of the above is just rambling about possible implementations and can be ignored if we want to do something along these lines. It would be useful to have something that lets people dig down into large pipelines to understand performance behaviour without having to modify the aggregator sources as I did.




            ian@iay.org.uk Ian Young