Azure Machine Learning Services Workspaces

Introduction

An Azure Machine Learning (Azure ML) Workspace is a centralized platform within Azure Machine Learning Services that enables data scientists and developers to efficiently manage machine learning (ML) projects. It acts as a collaborative environment for building, training, deploying, and monitoring ML models while ensuring security and scalability.

Use the OpsRamp Azure Public Cloud Integration to discover Machine Learning Services Workspaces and collect metrics for them.

Setup

To set up the Azure integration and discover the Azure Machine Learning Services Workspaces resources, do the following:

1. Create an Azure Integration if one is not already available in your installed integrations. For more information on how to install the Azure Integration, refer to Install Azure Integration.
2. Create a discovery profile. For more information on how to create a discovery profile, refer to Create Discovery Profile.
3. Select Machine Learning Services Workspaces under the Filter Criteria on the Edit Discovery Profile page.
4. Save the discovery profile to make it available in the list of Discovery Profiles.
5. Run a scan to discover the resources at any time, independent of the predefined schedule.
6. Once the scan is completed, view the Machine Learning Services Workspaces resources under the Infrastructure > Resources > Microsoft Azure category.

Event support

OpsRamp supports Azure events for Machine Learning Services Workspaces. Configure Azure Events in the OpsRamp Azure integration discovery profile.

See Process Azure Events for more information on how to configure Azure events.

Supported metrics

OpsRamp Metric	Azure Metric	Metric Display Name	Unit	Aggregation Type	Description
azure_ml_services_workspaces_Agents	Agents	Agents	Count	Average	Number of events for AI Agents in this workspace
azure_ml_services_workspaces_IndexedFiles	IndexedFiles	IndexedFiles	Count	Average	Number of files indexed for file search in this workspace
azure_ml_services_workspaces_Messages	Messages	Messages	Count	Average	Number of events for AI Agent messages in this workspace
azure_ml_services_workspaces_Runs	Runs	Runs	Count	Average	Number of runs by AI Agents in this workspace
azure_ml_services_workspaces_Threads	Threads	Threads	Count	Average	Number of events for AI Agent threads in this workspace
azure_ml_services_workspaces_Tokens	Tokens	Tokens	Count	Average	Count of tokens by AI Agents in this workspace
azure_ml_services_workspaces_ToolCalls	ToolCalls	ToolCalls	Count	Average	Tool calls made by AI Agents in this workspace
azure_ml_services_workspaces_Model_Deploy_Failed	Model Deploy Failed	Model Deploy Failed	Count	Total	Number of model deployments that failed in this workspace
azure_ml_services_workspaces_Model_Deploy_Started	Model Deploy Started	Model Deploy Started	Count	Total	Number of model deployments started in this workspace
azure_ml_services_workspaces_Model_Deploy_Succeeded	Model Deploy Succeeded	Model Deploy Succeeded	Count	Total	Number of model deployments that succeeded in this workspace
azure_ml_services_workspaces_Model_Register_Failed	Model Register Failed	Model Register Failed	Count	Total	Number of model registrations that failed in this workspace
azure_ml_services_workspaces_Model_Register_Succeeded	Model Register Succeeded	Model Register Succeeded	Count	Total	Number of model registrations that succeeded in this workspace
azure_ml_services_workspaces_Active_Cores	Active Cores	Active Cores	Count	Average	Number of active cores
azure_ml_services_workspaces_Active_Nodes	Active Nodes	Active Nodes	Count	Average	Number of active nodes. These are the nodes that are actively running a job
azure_ml_services_workspaces_Idle_Cores	Idle Cores	Idle Cores	Count	Average	Number of idle cores
azure_ml_services_workspaces_Idle_Nodes	Idle Nodes	Idle Nodes	Count	Average	Number of idle nodes. Idle nodes are the nodes that are not running any jobs but can accept a new job if one is available
azure_ml_services_workspaces_Leaving_Cores	Leaving Cores	Leaving Cores	Count	Average	Number of leaving cores
azure_ml_services_workspaces_Leaving_Nodes	Leaving Nodes	Leaving Nodes	Count	Average	Number of leaving nodes. Leaving nodes are the nodes that have just finished processing a job and will go to the Idle state
azure_ml_services_workspaces_Preempted_Cores	Preempted Cores	Preempted Cores	Count	Average	Number of preempted cores
azure_ml_services_workspaces_Preempted_Nodes	Preempted Nodes	Preempted Nodes	Count	Average	Number of preempted nodes. These are the low-priority nodes that are taken away from the available node pool
azure_ml_services_workspaces_Quota_Utilization_Percentage	Quota Utilization Percentage	Quota Utilization Percentage	Count	Average	Percent of quota utilized
azure_ml_services_workspaces_Total_Cores	Total Cores	Total Cores	Count	Average	Number of total cores
azure_ml_services_workspaces_Total_Nodes	Total Nodes	Total Nodes	Count	Average	Number of total nodes. This total includes some of Active Nodes, Idle Nodes, Unusable Nodes, Preempted Nodes, Leaving Nodes
azure_ml_services_workspaces_Unusable_Cores	Unusable Cores	Unusable Cores	Count	Average	Number of unusable cores
azure_ml_services_workspaces_Unusable_Nodes	Unusable Nodes	Unusable Nodes	Count	Average	Number of unusable nodes. Unusable nodes are not functional due to some unresolvable issue. Azure will recycle these nodes
azure_ml_services_workspaces_CpuCapacityMillicores	CpuCapacityMillicores	CpuCapacityMillicores	Count	Average	Maximum capacity of a CPU node in millicores. Capacity is aggregated in one minute intervals
azure_ml_services_workspaces_CpuMemoryCapacityMegabytes	CpuMemoryCapacityMegabytes	CpuMemoryCapacityMegabytes	Count	Average	Maximum memory capacity of a CPU node in megabytes. Capacity is aggregated in one minute intervals
azure_ml_services_workspaces_CpuMemoryUtilizationMegabytes	CpuMemoryUtilizationMegabytes	CpuMemoryUtilizationMegabytes	Count	Average	Memory utilization of a CPU node in megabytes. Utilization is aggregated in one minute intervals
azure_ml_services_workspaces_CpuMemoryUtilizationPercentage	CpuMemoryUtilizationPercentage	CpuMemoryUtilizationPercentage	Count	Average	Memory utilization percentage of a CPU node. Utilization is aggregated in one minute intervals
azure_ml_services_workspaces_CpuUtilization	CpuUtilization	CpuUtilization	Count	Average	Percentage of utilization on a CPU node. Utilization is reported at one minute intervals
azure_ml_services_workspaces_CpuUtilizationMillicores	CpuUtilizationMillicores	CpuUtilizationMillicores	Count	Average	Utilization of a CPU node in millicores. Utilization is aggregated in one minute intervals
azure_ml_services_workspaces_CpuUtilizationPercentage	CpuUtilizationPercentage	CpuUtilizationPercentage	Count	Average	Utilization percentage of a CPU node. Utilization is aggregated in one minute intervals
azure_ml_services_workspaces_DiskAvailMegabytes	DiskAvailMegabytes	DiskAvailMegabytes	Count	Average	Available disk space in megabytes. Metrics are aggregated in one minute intervals
azure_ml_services_workspaces_DiskReadMegabytes	DiskReadMegabytes	DiskReadMegabytes	Count	Average	Data read from disk in megabytes. Metrics are aggregated in one minute intervals
azure_ml_services_workspaces_DiskUsedMegabytes	DiskUsedMegabytes	DiskUsedMegabytes	Count	Average	Used disk space in megabytes. Metrics are aggregated in one minute intervals
azure_ml_services_workspaces_DiskWriteMegabytes	DiskWriteMegabytes	DiskWriteMegabytes	Count	Average	Data written into disk in megabytes. Metrics are aggregated in one minute intervals
azure_ml_services_workspaces_GpuCapacityMilliGPUs	GpuCapacityMilliGPUs	GpuCapacityMilliGPUs	Count	Average	Maximum capacity of a GPU device in milli-GPUs. Capacity is aggregated in one minute intervals
azure_ml_services_workspaces_GpuEnergyJoules	GpuEnergyJoules	GpuEnergyJoules	Count	Average	Interval energy in Joules on a GPU node. Energy is reported at one minute intervals
azure_ml_services_workspaces_GpuMemoryCapacityMegabytes	GpuMemoryCapacityMegabytes	GpuMemoryCapacityMegabytes	Count	Average	Maximum memory capacity of a GPU device in megabytes. Capacity is aggregated in one minute intervals
azure_ml_services_workspaces_GpuMemoryUtilization	GpuMemoryUtilization	GpuMemoryUtilization	Count	Average	Percentage of memory utilization on a GPU node. Utilization is reported at one minute intervals
azure_ml_services_workspaces_GpuMemoryUtilizationMegabytes	GpuMemoryUtilizationMegabytes	GpuMemoryUtilizationMegabytes	Count	Average	Memory utilization of a GPU device in megabytes. Utilization is aggregated in one minute intervals
azure_ml_services_workspaces_GpuMemoryUtilizationPercentage	GpuMemoryUtilizationPercentage	GpuMemoryUtilizationPercentage	Count	Average	Memory utilization percentage of a GPU device. Utilization is aggregated in one minute intervals
azure_ml_services_workspaces_GpuUtilization	GpuUtilization	GpuUtilization	Count	Average	Percentage of utilization on a GPU node. Utilization is reported at one minute intervals
azure_ml_services_workspaces_GpuUtilizationMilliGPUs	GpuUtilizationMilliGPUs	GpuUtilizationMilliGPUs	Count	Average	Utilization of a GPU device in milli-GPUs. Utilization is aggregated in one minute intervals
azure_ml_services_workspaces_GpuUtilizationPercentage	GpuUtilizationPercentage	GpuUtilizationPercentage	Count	Average	Utilization percentage of a GPU device. Utilization is aggregated in one minute intervals
azure_ml_services_workspaces_IBReceiveMegabytes	IBReceiveMegabytes	IBReceiveMegabytes	Count	Average	Network data received over InfiniBand in megabytes. Metrics are aggregated in one minute intervals
azure_ml_services_workspaces_IBTransmitMegabytes	IBTransmitMegabytes	IBTransmitMegabytes	Count	Average	Network data sent over InfiniBand in megabytes. Metrics are aggregated in one minute intervals
azure_ml_services_workspaces_NetworkInputMegabytes	NetworkInputMegabytes	NetworkInputMegabytes	Count	Average	Network data received in megabytes. Metrics are aggregated in one minute intervals
azure_ml_services_workspaces_NetworkOutputMegabytes	NetworkOutputMegabytes	NetworkOutputMegabytes	Count	Average	Network data sent in megabytes. Metrics are aggregated in one minute intervals
azure_ml_services_workspaces_StorageAPIFailureCount	StorageAPIFailureCount	StorageAPIFailureCount	Count	Average	Azure Blob Storage API calls failure count
azure_ml_services_workspaces_StorageAPISuccessCount	StorageAPISuccessCount	StorageAPISuccessCount	Count	Average	Azure Blob Storage API calls success count
azure_ml_services_workspaces_Cancel_Requested_Runs	Cancel Requested Runs	Cancel Requested Runs	Count	Total	Number of runs where cancel was requested for this workspace. Count is updated when a cancellation request has been received for a run
azure_ml_services_workspaces_Cancelled_Runs	Cancelled Runs	Cancelled Runs	Count	Total	Number of runs cancelled for this workspace. Count is updated when a run is successfully cancelled
azure_ml_services_workspaces_Completed_Runs	Completed Runs	Completed Runs	Count	Total	Number of runs completed successfully for this workspace. Count is updated when a run has completed and output has been collected
azure_ml_services_workspaces_Errors	Errors	Errors	Count	Total	Number of run errors in this workspace. Count is updated whenever a run encounters an error
azure_ml_services_workspaces_Failed_Runs	Failed Runs	Failed Runs	Count	Total	Number of runs failed for this workspace. Count is updated when a run fails
azure_ml_services_workspaces_Finalizing_Runs	Finalizing Runs	Finalizing Runs	Count	Total	Number of runs that entered the finalizing state for this workspace. Count is updated when a run has completed but output collection is still in progress
azure_ml_services_workspaces_Not_Responding_Runs	Not Responding Runs	Not Responding Runs	Count	Total	Number of runs not responding for this workspace. Count is updated when a run enters Not Responding state
azure_ml_services_workspaces_Not_Started_Runs	Not Started Runs	Not Started Runs	Count	Total	Number of runs in Not Started state for this workspace. Count is updated when a request is received to create a run but run information has not yet been populated
azure_ml_services_workspaces_Preparing_Runs	Preparing Runs	Preparing Runs	Count	Total	Number of runs that are preparing for this workspace. Count is updated when a run enters Preparing state while the run environment is being prepared
azure_ml_services_workspaces_Provisioning_Runs	Provisioning Runs	Provisioning Runs	Count	Total	Number of runs that are provisioning for this workspace. Count is updated when a run is waiting on compute target creation or provisioning
azure_ml_services_workspaces_Queued_Runs	Queued Runs	Queued Runs	Count	Total	Number of runs that are queued for this workspace. Count is updated when a run is queued in the compute target. Can occur when waiting for required compute nodes to be ready
azure_ml_services_workspaces_Started_Runs	Started Runs	Started Runs	Count	Total	Number of runs running for this workspace. Count is updated when a run starts running on required resources
azure_ml_services_workspaces_Starting_Runs	Starting Runs	Starting Runs	Count	Total	Number of runs started for this workspace. Count is updated after a request to create a run has been received and run info, such as the Run Id, has been populated
azure_ml_services_workspaces_Warnings	Warnings	Warnings	Count	Total	Number of run warnings in this workspace. Count is updated whenever a run encounters a warning
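
In every row of the table above, the OpsRamp metric name appears to be the Azure metric name with spaces replaced by underscores and the prefix azure_ml_services_workspaces_ added. The following Python snippet is a minimal sketch of that pattern, which can be handy when building alert definitions or dashboards from Azure metric names. The function name opsramp_metric_name is hypothetical, and the rule is inferred from the rows listed here; it is not a documented OpsRamp API.

```python
# Prefix shared by every OpsRamp metric name in the table above.
PREFIX = "azure_ml_services_workspaces_"


def opsramp_metric_name(azure_metric: str) -> str:
    """Derive an OpsRamp metric name from an Azure metric name.

    Assumption (inferred from the table, not from an official API):
    spaces become underscores and the shared prefix is prepended.
    """
    return PREFIX + azure_metric.strip().replace(" ", "_")


if __name__ == "__main__":
    # Azure metric names taken from the table above.
    for name in ("Active Nodes", "CpuUtilization", "Model Deploy Failed"):
        print(f"{name} -> {opsramp_metric_name(name)}")
```

Names that contain no spaces, such as CpuUtilization, pass through unchanged apart from the prefix.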