DataIQ Tutorial with Code Examples
Example 1: Creating a Data Catalog in DataIQ
from dataiq.sdk import DataCatalog
catalog = DataCatalog()
catalog.create_dataset("customer_data", description="Customer demographics and transactions")
catalog.add_metadata("customer_data", {"source": "CRM", "updated": "2024-02-24"})
catalog.publish("customer_data")
Explanation
Initialize Data Catalog – DataCatalog() creates a DataIQ catalog instance, the entry point for managing data assets, governance, and metadata tracking on the platform.
Create a Dataset – catalog.create_dataset("customer_data", ...) registers a dataset in DataIQ, making it accessible for analysis, lineage tracking, and collaboration.
Add Metadata – catalog.add_metadata("customer_data", ...) attaches metadata, here the source system and the last update date, which improves discoverability.
Publish Dataset – catalog.publish("customer_data") makes the dataset available across the organization, subject to governance policies and quality standards.
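To make the create, annotate, publish sequence above concrete without depending on the SDK, here is a minimal in-memory sketch. InMemoryCatalog and DatasetEntry are hypothetical names invented for illustration; they are not part of DataIQ, and the real catalog may store and validate entries differently.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One catalog record: a description, free-form metadata, and a publish flag."""
    description: str
    metadata: dict = field(default_factory=dict)
    published: bool = False


class InMemoryCatalog:
    """Hypothetical stand-in for a data catalog; NOT the DataIQ SDK."""

    def __init__(self):
        self._entries = {}

    def create_dataset(self, name, description=""):
        # Refuse to overwrite an existing registration.
        if name in self._entries:
            raise ValueError(f"dataset {name!r} already exists")
        self._entries[name] = DatasetEntry(description)

    def add_metadata(self, name, metadata):
        # Merge new keys into the existing metadata rather than replacing it.
        self._entries[name].metadata.update(metadata)

    def publish(self, name):
        self._entries[name].published = True

    def get(self, name):
        return self._entries[name]


catalog = InMemoryCatalog()
catalog.create_dataset("customer_data", description="Customer demographics and transactions")
catalog.add_metadata("customer_data", {"source": "CRM", "updated": "2024-02-24"})
catalog.publish("customer_data")
entry = catalog.get("customer_data")
```

Keeping the publish flag separate from creation mirrors the flow in Example 1: a dataset can be registered and annotated before anyone else can see it.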
Example 2: Data Quality Check with DataIQ
from dataiq.sdk import DataQuality
dq = DataQuality()
dq.add_rule("customer_data", "email_valid", "email IS NOT NULL AND email LIKE '%@%'")
dq.run_checks("customer_data")
dq.get_results("customer_data")
Explanation
Initialize Data Quality – dq = DataQuality() initializes the DataIQ Data Quality module, which applies validation rules to datasets for integrity checks.
Define a Rule – dq.add_rule("customer_data", "email_valid", "email IS NOT NULL AND email LIKE '%@%'") registers a rule named email_valid that requires every email value to be present and to contain an "@" character. This is a basic presence check, not full email format validation.
Run Quality Checks – dq.run_checks("customer_data") executes the defined validation rules across the dataset, identifying records that do not meet the criteria.
Retrieve Check Results – dq.get_results("customer_data") fetches the validation outcomes, helping analysts review failed records and take corrective action.
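To see what the email_valid rule expression actually accepts and rejects, it can be run against a small table with Python's built-in sqlite3 module. This is only a sketch: the table layout and sample rows are made up for illustration, and it assumes DataIQ evaluates the rule with standard SQL semantics, which the tutorial does not confirm.

```python
import sqlite3

# The rule expression from Example 2, unchanged.
RULE = "email IS NOT NULL AND email LIKE '%@%'"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_data (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customer_data (id, email) VALUES (?, ?)",
    [
        (1, "ana@example.com"),  # passes
        (2, None),               # fails: missing value
        (3, "no-at-sign"),       # fails: no "@"
        (4, "@"),                # passes, although it is not a usable address
    ],
)

# A row fails when the rule evaluates to false (or NULL, treated as false here).
failed_ids = [
    row[0]
    for row in conn.execute(
        f"SELECT id FROM customer_data WHERE NOT coalesce(({RULE}), 0) ORDER BY id"
    )
]
print(failed_ids)
```

Row 4 passing shows why the rule is best read as a presence check; a stricter pattern would be needed to reject values such as a bare "@".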
Example 3: Automating Data Classification in DataIQ
from dataiq.sdk import DataClassifier
classifier = DataClassifier()
classifier.train("customer_data", labels=["personal", "transactional"])
classifier.predict("new_customer_data")
classifier.save_model("customer_classifier")
Explanation
Initialize Data Classifier – classifier = DataClassifier() creates a classification model in DataIQ that automatically labels data according to predefined categories.
Train the Model – classifier.train("customer_data", labels=["personal", "transactional"]) learns patterns from historical data so that records can be categorized as "personal" or "transactional."
Apply Predictions – classifier.predict("new_customer_data") uses the trained model to classify incoming data, giving consistent labels for governance and compliance.
Save the Model – classifier.save_model("customer_classifier") persists the trained model so it can be reused for ongoing classification without retraining.
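The train, predict, save cycle above can be illustrated with a deliberately tiny classifier. The tutorial does not say what algorithm DataIQ uses, so this sketch swaps in a simple token-count scheme over field names. TokenCountClassifier, its load_model method, and the training examples are all hypothetical and exist only to show the shape of the workflow.

```python
import json
import os
import tempfile
from collections import Counter, defaultdict


class TokenCountClassifier:
    """Hypothetical toy classifier; NOT the DataIQ algorithm."""

    def __init__(self):
        # label -> Counter of tokens seen in training examples for that label
        self.token_counts = defaultdict(Counter)

    @staticmethod
    def _tokens(text):
        return text.lower().replace("_", " ").split()

    def train(self, examples):
        for text, label in examples:
            self.token_counts[label].update(self._tokens(text))

    def predict(self, text):
        # Score each label by how often it saw these tokens during training.
        # Ties (including all-zero scores) resolve to the first trained label.
        tokens = self._tokens(text)
        scores = {
            label: sum(counts[t] for t in tokens)
            for label, counts in self.token_counts.items()
        }
        return max(scores, key=scores.get)

    def save_model(self, path):
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({label: dict(c) for label, c in self.token_counts.items()}, fh)

    @classmethod
    def load_model(cls, path):
        model = cls()
        with open(path, encoding="utf-8") as fh:
            for label, counts in json.load(fh).items():
                model.token_counts[label] = Counter(counts)
        return model


classifier = TokenCountClassifier()
classifier.train([
    ("customer_email", "personal"),
    ("customer_phone", "personal"),
    ("order_amount", "transactional"),
    ("order_date", "transactional"),
])
prediction = classifier.predict("billing_email")

# Persist the learned counts, then reload them without retraining.
model_path = os.path.join(tempfile.mkdtemp(), "customer_classifier.json")
classifier.save_model(model_path)
reloaded = TokenCountClassifier.load_model(model_path)
```

The reloaded model answers from the saved counts alone, which is the point of the save step in Example 3: classification can continue later without access to the training data.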
Example 4: Enforcing Access Control in DataIQ
from dataiq.sdk import AccessControl
ac = AccessControl()
ac.grant_permission("customer_data", "user123", "read")
ac.revoke_permission("customer_data", "user456")
ac.list_permissions("customer_data")
Explanation
Initialize Access Control – ac = AccessControl() initializes the security module in DataIQ, which provides fine-grained control over who can access datasets.
Grant Read Permission – ac.grant_permission("customer_data", "user123", "read") lets user123 view, but not modify, the dataset.
Revoke User Access – ac.revoke_permission("customer_data", "user456") removes access for user456, enforcing security policies and protecting sensitive information.
List Current Permissions – ac.list_permissions("customer_data") retrieves all permissions assigned on the dataset, which supports auditing and compliance tracking.
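The grant, revoke, and list operations above amount to maintaining a mapping from datasets to users to permissions. The sketch below models that mapping in memory. InMemoryAccessControl and its is_allowed helper are hypothetical and not part of DataIQ. The extra grant for user456 is added only so the revoke has something to remove.

```python
from collections import defaultdict


class InMemoryAccessControl:
    """Hypothetical stand-in for a permission store; NOT the DataIQ SDK."""

    def __init__(self):
        # dataset -> {user: set of permission names}
        self._grants = defaultdict(dict)

    def grant_permission(self, dataset, user, permission):
        self._grants[dataset].setdefault(user, set()).add(permission)

    def revoke_permission(self, dataset, user):
        # Removes every permission the user holds; a no-op if there are none.
        self._grants[dataset].pop(user, None)

    def list_permissions(self, dataset):
        return {user: sorted(perms) for user, perms in self._grants[dataset].items()}

    def is_allowed(self, dataset, user, permission):
        return permission in self._grants[dataset].get(user, set())


ac = InMemoryAccessControl()
ac.grant_permission("customer_data", "user123", "read")
ac.grant_permission("customer_data", "user456", "read")  # assumed earlier grant
ac.revoke_permission("customer_data", "user456")
permissions = ac.list_permissions("customer_data")
```

Because only "read" was granted, a check such as ac.is_allowed("customer_data", "user123", "write") returns False, which matches the view-but-not-modify behavior described for user123.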
These four examples demonstrate essential DataIQ functionalities, covering cataloging, quality checks, classification, and access control.