WMI Counters
On install, Windows Management Instrumentation (WMI) counters are set up to report data and events from the Endpoint Web Service and Dispatch Service. The simplest way to access these is through the Windows Performance Monitor,
which is included in Windows.
Please see Alerting Recommendations for suggestions about which counters to monitor for alerting purposes.
PA Data Hub Category
The PA Data Hub category describes general events and rates for both the Endpoint Web Service and Dispatch Service.
The counters are:
- Dispatch Service Status
- Set to 1 when the Dispatch Service is Running.
- Set to 0 otherwise.
- This status can also be monitored by checking the state of the Dispatch Service using Windows Service tools.
- Endpoint Tasks Available
- Indicates how many concurrent tasks are available to deliver messages from the endpoint queue.
- Endpoint Tasks In Use
- Indicates how many concurrent tasks are delivering messages from the endpoint queue.
- Endpoint Queue Size
- The number of messages in the Endpoint Queue, including those currently being processed.
- Endpoint Queue Status
- Set to 1 when the last message received was successfully queued.
- Set to 0 otherwise (either no messages yet received or the last message received was not successfully queued).
- Endpoint Status
- Set to 1 when the Endpoint Web Service is active.
- Set to 0 when the Endpoint Web Service is inactive.
- WARNING: If IIS Idle Time-out is enabled, the service will become inactive after a period of inactivity (20 minutes by default), but become active again when receiving a new request.
- Incoming Message
- Rate at which messages (requests) are received by the Endpoint Web Service.
- Incoming Message Failed
- Rate at which messages fail to be enqueued by the Endpoint Web Service.
- Incoming Message Succeeded
- Rate at which messages are enqueued by the Endpoint Web Service.
- Incoming Message Succeeded Bytes
- Rate at which message bodies, in bytes, are enqueued by the Endpoint Web Service.
- Looped Message Discarded
- Looped Message Dispatched
- Message Movement Failed
- Rate at which messages fail to move from one queue to another due to an unexpected problem on the destination queue.
- Message Not Matched To Destination
- Rate at which messages are discarded because they do not match any destination's criteria, even though they were processed successfully.
- Queuing Blocked
- Set to 0 during normal operations; changes to 1 if the Data Hub is unable to queue a message because RabbitMQ has raised a disk or memory alarm.
- Response Time
- Measures the time spent responding to incoming requests, in milliseconds.
PA Data Hub Dispatch Destinations Category
The PA Data Hub Dispatch Destinations category describes events and rates that are tied to the configured destinations. Each destination has an associated WMI instance to allow monitoring individual destinations
as needed.
The counters are:
- Destination Status
- Set to 1 when the destination is accepting messages.
- Set to 0 when the destination is considered offline.
- Dispatch Attempted
- Rate at which messages are attempted to be dispatched to the given destination.
- Note that this counter includes messages that are sent directly to the offline queue, when the destination is considered offline.
- Dispatch Destination Error
- Rate at which messages receive error responses (e.g., 500 Internal Server Error) from the destination.
- Dispatch Destination Offline
- Rate at which messages receive offline responses (e.g., 404 Not Found) from the destination.
- Note that this counter includes messages that are sent directly to the offline queue, when the destination is considered offline.
- Dispatch Failed
- Rate at which messages fail to be delivered to the destination for any reason, including error and offline responses.
- Dispatch Succeeded
- Rate at which messages are successfully delivered to the given destination.
- Dispatch Succeeded Bytes
- Rate, in bytes, at which message bodies are successfully delivered to the given destination.
- Error Message Given Up
- Rate at which messages that continue to result in error responses are deleted.
- WARNING: For instances whose destinations are non-durable (and for the built-in _all_non_durable_destinations aggregate instance), this counter is always 0.
- This rate is affected by the errorRetry and errorGiveup settings.
PA Data Hub Dispatch Destinations By Source Queue Category
The PA Data Hub Dispatch Destinations By Source Queue category describes events and rates that are tied to the queues used for each configured destination. Each destination has an associated WMI instance. This
category is primarily for use in debugging production issues.
- Dispatch Attempted From Endpoint
- Rate at which messages originating from the Endpoint Queue are attempted to be dispatched to the given destination.
- Dispatch Attempted From Error
- Rate at which messages originating from the destination's Error Queue are attempted to be dispatched to the given destination.
- WARNING: For instances whose destinations are non-durable (and for the built-in _all_non_durable_destinations aggregate instance), this counter is always 0.
- Dispatch Attempted From Offline
- Rate at which messages originating from the destination's Offline Queue are attempted to be dispatched to the given destination.
- WARNING: For instances whose destinations are non-durable (and for the built-in _all_non_durable_destinations aggregate instance), this counter is always 0.
- Dispatch Failed From Endpoint
- Rate at which messages originating from the Endpoint Queue fail to be delivered to the destination for any reason, including error and offline responses.
- Dispatch Failed From Error
- Rate at which messages originating from the destination's Error Queue fail to be delivered to the destination for any reason, including error and offline responses.
- WARNING: For instances whose destinations are non-durable (and for the built-in _all_non_durable_destinations aggregate instance), this counter is always 0.
- Dispatch Failed From Offline
- Rate at which messages originating from the destination's Offline Queue fail to be delivered to the destination for any reason, including error and offline responses.
- WARNING: For instances whose destinations are non-durable (and for the built-in _all_non_durable_destinations aggregate instance), this counter is always 0.
- Dispatch Succeeded Bytes From Endpoint
- Rate at which message bodies, in bytes, are successfully delivered to the given destination from the Endpoint Queue.
- Dispatch Succeeded Bytes From Error
- Rate at which message bodies, in bytes, are successfully delivered to the given destination from the destination's Error Queue.
- WARNING: For instances whose destinations are non-durable (and for the built-in _all_non_durable_destinations aggregate instance), this counter is always 0.
- Dispatch Succeeded Bytes From Offline
- Rate at which message bodies, in bytes, are successfully delivered to the given destination from the destination's Offline Queue.
- WARNING: For instances whose destinations are non-durable (and for the built-in _all_non_durable_destinations aggregate instance), this counter is always 0.
- Dispatch Succeeded From Endpoint
- Rate at which messages originating from the Endpoint Queue are successfully delivered to the given destination.
- Dispatch Succeeded From Error
- Rate at which messages originating from the destination's Error Queue are successfully delivered to the given destination.
- WARNING: For instances whose destinations are non-durable (and for the built-in _all_non_durable_destinations aggregate instance), this counter is always 0.
- Dispatch Succeeded From Offline
- Rate at which messages originating from the destination's Offline Queue are successfully delivered to the given destination.
- WARNING: For instances whose destinations are non-durable (and for the built-in _all_non_durable_destinations aggregate instance), this counter is always 0.
- Error Queue Size
- The number of messages in the destination's Error Queue, including those currently being processed.
- Offline and Error Tasks Available
- Indicates how many concurrent tasks are available to deliver messages from the Offline and/or Error queues.
- Offline and Error Tasks In Use
- Indicates how many concurrent tasks are delivering messages from the Offline and/or Error queues.
- Offline Queue Size
- The number of messages in the destination's Offline Queue, including those currently being processed.
Alerting Recommendations
First, we recommend monitoring the Dispatch Service process and the Endpoint IIS website directly, rather than relying only on the Data Hub's WMI counters to notify you of the overall service status. If the Dispatch Service's process
is killed, or IIS is killed, the WMI counters will not be updated to reflect that the services are no longer active. If you monitor status only through these WMI counters, you will not be notified of a problem in this type of situation.
Additionally, we recommend monitoring the following resources of the host itself:
- CPU: Ensure total CPU usage does not stay above 90% for 20 minutes or longer.
- Memory: Ensure the amount of free memory does not drop below 10% of total memory.
- Disk: Ensure every disk has at least 3 GB of free space. MongoDB and RabbitMQ will stop accepting new data once free disk space drops to 2 GB and 1 GB, respectively.
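The host thresholds above can be expressed as simple checks. The following is a minimal sketch, not part of the Data Hub: the function names are hypothetical, CPU readings are assumed to arrive once per minute, and 1 GB is assumed to mean 1024^3 bytes. How the readings are collected is left to your monitoring tool.

```python
from typing import Dict, List, Sequence

GB = 1024 ** 3  # assumption: binary gigabytes


def cpu_alert(cpu_percent_per_minute: Sequence[float],
              threshold: float = 90.0,
              window_minutes: int = 20) -> bool:
    """True if the most recent `window_minutes` readings are all above the threshold.

    Assumes one reading per minute, oldest first.
    """
    recent = cpu_percent_per_minute[-window_minutes:]
    return len(recent) == window_minutes and all(v > threshold for v in recent)


def memory_alert(free_bytes: int, total_bytes: int,
                 min_free_percent: int = 10) -> bool:
    """True if free memory is below `min_free_percent` of total memory."""
    return free_bytes * 100 < total_bytes * min_free_percent


def disk_alerts(free_bytes_by_disk: Dict[str, int],
                min_free_bytes: int = 3 * GB) -> List[str]:
    """Names of disks with less than `min_free_bytes` free, sorted."""
    return sorted(name for name, free in free_bytes_by_disk.items()
                  if free < min_free_bytes)
```

For example, `disk_alerts({"C:": 50 * GB, "D:": 2 * GB})` flags only `D:`, which is already at the level where MongoDB stops accepting new data.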
Once basic process and resource monitoring is in place, we recommend using the following counters for alerting purposes:
- PA Data Hub category
- Dispatch Service Status
- If 0, no destinations are receiving messages because the Dispatch Service is not running.
- Endpoint Queue Size
- If too large (actual value depends on usage scenario), the Data Hub is not dispatching fast enough to keep up with the incoming message rate.
- Endpoint Queue Status
- If 0 (and at least one message has been sent to this Data Hub), messages cannot be queued by the endpoint and will be rejected.
- Endpoint Status
- If 0, the Endpoint will not respond to requests from clients, unless it became inactive because of the IIS Idle Time-out (in which case it becomes active again on the next request). To better monitor the Endpoint Status, we recommend disabling the Idle Time-out.
- Looped Message Discarded
- If not 0, a dispatch configuration (possibly on another Data Hub) is probably misconfigured.
- Message Movement Failed
- If not 0, there is some critical issue preventing the Data Hub from operating, such as insufficient space on disk.
- Message Not Matched To Destination
- If not 0, some messages are being dropped because they are not matched to any destination's rules.
- Queuing Blocked
- Set to 0 during normal operations; changes to 1 if the Data Hub is unable to queue a message because RabbitMQ has raised a disk or memory alarm.
- PA Data Hub Dispatch Destinations category
- Destination Status
- If 0, this destination appears to be offline.
- PA Data Hub Dispatch Destinations By Source Queue category
- Error Queue Size (for durable destinations)
- If too large (actual value depends on usage scenario), a large number of message attempts are receiving error responses from the destination.
- Offline Queue Size (for durable destinations)
- If too large (actual value depends on usage scenario), a large number of message attempts are receiving offline responses from the destination.
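As an illustration, the PA Data Hub category rules above could be evaluated against a snapshot of counter values as follows. This is a sketch under stated assumptions: the function name and the dictionary-based snapshot are hypothetical, the counter names are taken from this page, and the queue size limit is a value you choose for your own usage scenario. Per-destination counters (Destination Status, Error Queue Size, Offline Queue Size) would follow the same pattern, one snapshot per instance.

```python
from typing import Dict, List

# Status counters where 0 signals a problem.
ZERO_IS_BAD = ("Dispatch Service Status", "Endpoint Status")

# Counters where any non-zero value signals a problem.
NONZERO_IS_BAD = (
    "Looped Message Discarded",
    "Message Movement Failed",
    "Message Not Matched To Destination",
    "Queuing Blocked",
)


def data_hub_alerts(counters: Dict[str, float],
                    max_endpoint_queue_size: float,
                    any_message_sent: bool = True) -> List[str]:
    """Return the names of the counters that should raise an alert.

    Counters missing from the snapshot are skipped rather than reported.
    """
    alerts = []
    for name in ZERO_IS_BAD:
        if counters.get(name) == 0:
            alerts.append(name)
    # Endpoint Queue Status is also 0 before the first message arrives,
    # so it is only meaningful once at least one message has been sent.
    if any_message_sent and counters.get("Endpoint Queue Status") == 0:
        alerts.append("Endpoint Queue Status")
    if counters.get("Endpoint Queue Size", 0) > max_endpoint_queue_size:
        alerts.append("Endpoint Queue Size")
    for name in NONZERO_IS_BAD:
        if counters.get(name, 0) != 0:
            alerts.append(name)
    return alerts
```

If the IIS Idle Time-out is left enabled, an Endpoint Status alert from this sketch may only mean that the Endpoint is idle.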
The following counters may also be helpful to monitor, depending on your particular needs:
WMI Instances
With the exception of the PA Data Hub category, all counters have associated WMI instances, one for each destination. The instance names are based on the destination ID specified in the Dispatch configuration,
with the following changes:
- Names are always lowercase.
- Parentheses, ( and ), are replaced by square brackets, [ and ].
- Hash symbols and slashes, #, /, and \, are replaced by underscores, _.
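When scripting against these counters, it can help to derive the expected instance name from a destination ID. The sketch below applies only the three rules listed above; the function name is hypothetical, and any further normalization the Data Hub may perform is not covered.

```python
# Character substitutions described in the list above.
_INSTANCE_NAME_TABLE = str.maketrans({
    "(": "[",
    ")": "]",
    "#": "_",
    "/": "_",
    "\\": "_",
})


def to_instance_name(destination_id: str) -> str:
    """Lowercase the destination ID and replace the reserved characters."""
    return destination_id.lower().translate(_INSTANCE_NAME_TABLE)
```

For example, a destination ID of `Billing (EU)#1/Prod` would map to the instance name `billing [eu]_1_prod`.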
In addition to such destination instances, there are also three aggregate instances:
- _all_destinations
- The sum of the counts for all destinations.
- In the case of Destination Status, set to 1 when all destinations are considered online.
- _all_durable_destinations
- The sum of the counts for all durable destinations.
- In the case of Destination Status, set to 1 when all durable destinations are considered online.
- _all_non_durable_destinations
- The sum of the counts for all non-durable destinations.
- In the case of Destination Status, set to 1 when all non-durable destinations are considered online.
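The arithmetic behind the aggregate instances can be illustrated as follows. The Data Hub computes these values itself; this sketch only mirrors the description above. The `DestinationReading` type and function name are hypothetical, Dispatch Succeeded stands in for any summed counter, and reporting a status of 0 for a group with no destinations is an assumption, since this page does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DestinationReading:
    """Hypothetical per-destination values, for illustration only."""
    durable: bool
    online: bool               # Destination Status == 1
    dispatch_succeeded: float  # stands in for any summed counter


def aggregate_instances(
        readings: Dict[str, DestinationReading]) -> Dict[str, Dict[str, float]]:
    """Compute the three aggregate instances from per-destination readings."""
    everything = list(readings.values())
    groups = {
        "_all_destinations": everything,
        "_all_durable_destinations": [r for r in everything if r.durable],
        "_all_non_durable_destinations": [r for r in everything if not r.durable],
    }
    result = {}
    for name, members in groups.items():
        result[name] = {
            # Counts are summed across the group.
            "Dispatch Succeeded": sum(r.dispatch_succeeded for r in members),
            # Status is 1 only when every member is online.
            # Assumption: an empty group reports 0.
            "Destination Status": 1 if members and all(r.online for r in members) else 0,
        }
    return result
```

One consequence worth noting: a single offline destination is enough to set Destination Status to 0 on `_all_destinations`, so that instance works as a coarse "any destination offline" signal.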
To access the Performance Monitor:
- Open the Performance Monitor Console (perfmon.exe).
- Select Performance Monitor in the left pane.
- You may wish to check the Show Description checkbox, near the bottom of the window.
To add counters to the view:
- Click the green + icon.
- From the Available counters section, expand one of the three PA Data Hub categories.
- Select one or more counters to add.
- If applicable, select one or more instances to add.
- Click Add.
- Repeat the previous three steps for all counters to be added.
- Click OK.
To remove counters from the view:
- Select a counter from the display at the bottom of the page.
- Click the red X icon.
If you have selected many counters, you may wish to change the view to a single-page report:
- Select the Change Graph Type drop-down icon.
- Select Report.
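For scripted access, the counters shown in Performance Monitor can usually also be addressed by a counter path of the form \Category(instance)\Counter, for example with the Windows typeperf command. The sketch below only builds such paths and a typeperf argument list; it assumes the category and counter names are registered exactly as they appear on this page, and the helper names are hypothetical.

```python
from typing import List, Optional, Sequence


def counter_path(category: str, counter: str,
                 instance: Optional[str] = None) -> str:
    """Build a Windows performance counter path.

    Categories without instances (such as PA Data Hub) omit the
    parenthesized instance part.
    """
    if instance is None:
        return f"\\{category}\\{counter}"
    return f"\\{category}({instance})\\{counter}"


def typeperf_command(paths: Sequence[str], samples: int = 1) -> List[str]:
    """Argument list for typeperf that collects `samples` samples and exits."""
    return ["typeperf", *paths, "-sc", str(samples)]


# Example: one service-wide counter and one aggregate destination counter.
example_paths = [
    counter_path("PA Data Hub", "Endpoint Queue Size"),
    counter_path("PA Data Hub Dispatch Destinations",
                 "Destination Status", "_all_destinations"),
]
example_command = typeperf_command(example_paths)
```

On the Data Hub host, the resulting list could be passed to `subprocess.run` to take a one-off reading; verify the paths in Performance Monitor first, since a misspelled category or counter name will be rejected.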