MLflow Tracking Server
MLflow provides experiment tracking for training and fine-tuning workloads. In Alauda AI, MLflow is installed as a cluster plugin through the MLflow Operator. The operator deploys an MLflow Tracking Server, exposes it through the platform ingress, and adds an MLFlow entry in the Tools menu.
Prerequisites
- Alauda AI is installed on the target cluster.
- A PostgreSQL database is available for MLflow metadata.
- The platform OAuth/OIDC provider is configured.
- For namespace-backed workspaces, each target workspace namespace must carry the label matched by the MLflow workspace label selector, for example mlflow-enabled=true.
Install Or Upgrade
- Upload the MLflow cluster plugin package to the global cluster.
- In the Web Console, go to Administrator > Marketplace > Upload Packages and verify that the MLflow package version appears.
- Install or upgrade the MLflow cluster plugin for the target cluster.
- Configure the PostgreSQL host, port, username, and password.
- Enable multi-tenancy if users need to access MLflow workspaces backed by Kubernetes namespaces.
- After the plugin reaches the running state, open Alauda AI > Tools > MLFlow.
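MLflow reads its metadata store from a SQLAlchemy database URI, such as the value passed to mlflow server --backend-store-uri. The following Python sketch shows how the PostgreSQL host, port, username, and password from the plugin form could combine into such a URI. The helper name, the default database name mlflow, and the example values are assumptions for illustration. The plugin may assemble its connection string differently.

```python
from urllib.parse import quote


def build_backend_store_uri(host: str, port: int, username: str, password: str,
                            database: str = "mlflow") -> str:
    """Build a postgresql:// URI in the SQLAlchemy form MLflow accepts as a
    backend store. The default database name is an assumption."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    # Percent-encode credentials so '@', ':' or '/' cannot break the URI.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    # IPv6 literals must be wrapped in brackets inside a URI.
    host_part = f"[{host}]" if ":" in host else host
    return f"postgresql://{user}:{secret}@{host_part}:{port}/{database}"


if __name__ == "__main__":
    # HYPOTHETICAL values; use your PostgreSQL service details.
    print(build_backend_store_uri("postgres.example.svc", 5432, "mlflow", "p@ss:word"))
```

Percent-encoding the credentials matters when a password contains characters such as @ or :, which would otherwise be parsed as URI delimiters.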
Workspace Access
MLflow workspaces map to Kubernetes namespaces. Only namespaces matching the configured label selector are visible as workspaces.
Example namespace:
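Because only namespaces that match the label selector appear as workspaces, a workspace namespace needs the selector label. The following Python sketch builds a Namespace manifest with the example label mlflow-enabled=true and checks whether a set of labels satisfies an equality-based selector. The namespace name team-a and the helper names are hypothetical, and the parser handles only comma-separated key=value terms, not the full Kubernetes selector syntax (!=, in, notin, exists).

```python
import json


def parse_equality_selector(selector: str) -> dict:
    """Parse a selector such as "mlflow-enabled=true,team=a" into a dict.

    Only comma-separated key=value terms are supported in this sketch.
    """
    requirements = {}
    for term in selector.split(","):
        term = term.strip()
        if not term:
            continue
        key, sep, value = term.partition("=")
        # Reject forms this sketch does not implement, such as "!=" or "==".
        if not sep or not key or key.endswith("!") or value.startswith("="):
            raise ValueError(f"unsupported selector term: {term!r}")
        requirements[key.strip()] = value.strip()
    return requirements


def matches(labels: dict, selector: str) -> bool:
    """Return True if every key=value requirement is present in labels."""
    return all(labels.get(k) == v for k, v in parse_equality_selector(selector).items())


def workspace_namespace(name: str, label_key: str = "mlflow-enabled",
                        label_value: str = "true") -> dict:
    """Build a Namespace manifest that carries the workspace label."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": {label_key: label_value}},
    }


if __name__ == "__main__":
    ns = workspace_namespace("team-a")  # hypothetical namespace name
    print(json.dumps(ns, indent=2))
    print("visible as workspace:", matches(ns["metadata"]["labels"], "mlflow-enabled=true"))
```

Printing the manifest as JSON lets you pipe it to kubectl apply -f -, since kubectl accepts JSON as well as YAML.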
Grant users access to MLflow resources in a workspace with Kubernetes RBAC:
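Access is granted per workspace namespace with a standard Kubernetes RoleBinding. The following Python sketch builds a RoleBinding that binds one user to a role inside one workspace namespace. The role name mlflow-workspace-user is a hypothetical placeholder; substitute the Role or ClusterRole that grants MLflow permissions in your installation, because this page does not name it.

```python
import json
import re

RBAC_GROUP = "rbac.authorization.k8s.io"


def _object_name(text: str) -> str:
    """Lowercase text and replace characters outside [a-z0-9-] with '-'."""
    return re.sub(r"[^a-z0-9-]+", "-", text.lower()).strip("-")


def workspace_role_binding(namespace: str, user: str, role_name: str,
                           role_kind: str = "ClusterRole") -> dict:
    """Bind a user to a Role or ClusterRole inside one workspace namespace."""
    if role_kind not in ("Role", "ClusterRole"):
        raise ValueError("role_kind must be 'Role' or 'ClusterRole'")
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {
            "name": _object_name(f"{role_name}-{user}"),
            "namespace": namespace,
        },
        "roleRef": {"apiGroup": RBAC_GROUP, "kind": role_kind, "name": role_name},
        "subjects": [{"apiGroup": RBAC_GROUP, "kind": "User", "name": user}],
    }


if __name__ == "__main__":
    # HYPOTHETICAL role name; replace with the role that grants MLflow access.
    binding = workspace_role_binding("team-a", "alice@example.com", "mlflow-workspace-user")
    print(json.dumps(binding, indent=2))
```

Binding a ClusterRole through a RoleBinding limits the granted permissions to that one namespace, which fits the one-workspace-per-namespace model.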
Client Configuration
Set the MLflow tracking URI to the platform route and select the workspace:
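The MLflow client reads the tracking server address from the MLFLOW_TRACKING_URI environment variable when no URI is set in code with mlflow.set_tracking_uri(). The following Python sketch validates a tracking URI and prepares the client environment. The route URL and the MLFLOW_WORKSPACE variable used to select the workspace are hypothetical; check the client configuration for your MLflow version for the actual workspace setting.

```python
import os
from urllib.parse import urlparse

# HYPOTHETICAL: the variable name used to select a workspace is an assumption.
WORKSPACE_ENV = "MLFLOW_WORKSPACE"


def client_env(tracking_uri: str, workspace: str) -> dict:
    """Validate the tracking URI and return environment variables for the client."""
    parsed = urlparse(tracking_uri)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"tracking URI must be an http(s) URL: {tracking_uri!r}")
    if not workspace:
        raise ValueError("workspace (namespace name) must not be empty")
    return {
        # Standard variable read by the MLflow client when no URI is set in code.
        "MLFLOW_TRACKING_URI": tracking_uri.rstrip("/"),
        WORKSPACE_ENV: workspace,
    }


if __name__ == "__main__":
    # HYPOTHETICAL route; use the URL that Alauda AI > Tools > MLFlow opens.
    os.environ.update(client_env("https://ai.example.com/mlflow", "team-a"))
    print(os.environ["MLFLOW_TRACKING_URI"])
```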
For HTTP clients, pass the workspace header:
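The following Python sketch builds, but does not send, a request to the MLflow REST endpoint POST /api/2.0/mlflow/experiments/search with a workspace header attached. The header name X-MLflow-Workspace, the route URL, and the use of an OIDC bearer token in the Authorization header are assumptions; use the header name and authentication method documented for your MLflow version.

```python
import json
import urllib.request

# HYPOTHETICAL header name; use the workspace header documented for your version.
WORKSPACE_HEADER = "X-MLflow-Workspace"


def search_experiments_request(tracking_uri: str, workspace: str, token: str,
                               max_results: int = 10) -> urllib.request.Request:
    """Build a POST request to the experiments/search REST endpoint."""
    url = tracking_uri.rstrip("/") + "/api/2.0/mlflow/experiments/search"
    body = json.dumps({"max_results": max_results}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            WORKSPACE_HEADER: workspace,
            # Assumption: the platform ingress accepts an OIDC bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    req = search_experiments_request("https://ai.example.com/mlflow", "team-a", "<token>")
    print(req.get_method(), req.full_url)
```

Pass the returned request to urllib.request.urlopen() to send it; the response body is JSON.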
High Availability And Storage
MLflow uses an external PostgreSQL database for metadata. Use a highly available PostgreSQL service for production environments.
The default artifact path is local to the MLflow pod, so artifacts stored there can be lost when the pod is recreated unless the path is backed by persistent storage. For production, configure durable artifact storage through the MLflow deployment settings before users store experiment artifacts. By default, the MLflow server runs as a single replica and is not highly available, unless the release notes for your version state otherwise.
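On the server side, MLflow takes artifact locations through options such as --default-artifact-root and --artifacts-destination; how the operator exposes these in its deployment settings is not covered here. The following Python sketch classifies an artifact location by URI scheme so that a pod-local path can be spotted before users log artifacts. The scheme list covers common MLflow artifact stores but is not exhaustive, and a pod-local verdict assumes nothing durable is mounted at that path.

```python
from urllib.parse import urlparse

# Remote artifact store schemes MLflow supports (not an exhaustive list).
REMOTE_SCHEMES = {"s3", "gs", "wasbs", "hdfs", "sftp", "ftp"}


def classify_artifact_location(location: str) -> str:
    """Classify an artifact URI as pod-local, proxied, remote, or unknown."""
    scheme = urlparse(location).scheme.lower()
    if scheme in ("", "file"):
        # Plain paths and file:// URIs live on the server pod's filesystem.
        return "pod-local"
    if scheme == "mlflow-artifacts":
        # Proxied through the tracking server; durability depends on where
        # the server's --artifacts-destination points.
        return "proxied"
    if scheme in REMOTE_SCHEMES:
        return "remote"
    return "unknown"


if __name__ == "__main__":
    for loc in ("./mlruns", "s3://example-bucket/mlflow", "mlflow-artifacts:/"):
        print(loc, "->", classify_artifact_location(loc))
```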
Troubleshooting
- If the MLFlow entry is missing from the Tools menu, verify that the aml-mlflow-menu-config ConfigMap exists in the MLflow namespace and has the label aml.cpaas.io/centralMenuItem: "true".
- If a workspace is not visible, verify that its namespace matches the MLflow workspace label selector.
- If requests are denied, check the user's Kubernetes RoleBinding in the workspace namespace.
- If the server does not start, verify PostgreSQL connectivity and credentials.
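Two of the troubleshooting checks can be scripted. The following Python sketch tests TCP reachability of the PostgreSQL endpoint and verifies the menu ConfigMap's name and label from its JSON form, for example the output of kubectl get configmap aml-mlflow-menu-config -n <mlflow-namespace> -o json. The helper names and the <mlflow-namespace> placeholder are hypothetical. A successful TCP connection does not prove that the credentials are valid.

```python
import socket


def postgres_reachable(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    This checks network reachability only; it does not validate credentials.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False


def menu_configmap_ok(configmap: dict) -> bool:
    """Check a ConfigMap (parsed from kubectl JSON output) for the name and
    label that the Tools menu entry relies on."""
    meta = configmap.get("metadata", {})
    labels = meta.get("labels") or {}
    return (meta.get("name") == "aml-mlflow-menu-config"
            and labels.get("aml.cpaas.io/centralMenuItem") == "true")
```

Run postgres_reachable from a pod in the MLflow namespace so that the check follows the same network path as the server.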