What is AWS Glue DataBrew?
AWS Glue DataBrew is an interactive data preparation tool for cleaning, normalizing, analyzing, and adjusting datasets.
You can upload a dataset directly or connect to data sources such as the AWS Glue Data Catalog, Amazon RDS and Aurora, Amazon Redshift, and Amazon S3. DataBrew is also available through the AWS SDKs and API, so it can be integrated into existing pipelines.
Once the data is loaded, a sample of it is available within the tool to analyze and manipulate. The transformations you apply are recorded as steps in a recipe, which can be reused on other datasets. A job can then be created to run the recipe against the entire dataset and publish the results to an output location such as S3.
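The dataset, recipe, and job flow described above can also be driven through the SDK. The following is a minimal sketch, assuming the boto3 `databrew` client; the resource names, bucket, column name, and role ARN are hypothetical placeholders, and the recipe version "1.0" assumes the recipe has been published exactly once.

```python
# Sketch of the dataset -> recipe -> job flow via the boto3 "databrew" client.
# All names, the bucket, the column, and the role ARN are hypothetical.

def build_requests(bucket, key, role_arn):
    """Build the request payloads for the dataset, recipe, and recipe job."""
    dataset = {
        "Name": "sales-dataset",
        "Input": {"S3InputDefinition": {"Bucket": bucket, "Key": key}},
    }
    recipe = {
        "Name": "sales-recipe",
        # One recorded transformation: upper-case the (assumed) "region" column.
        "Steps": [
            {
                "Action": {
                    "Operation": "UPPER_CASE",
                    "Parameters": {"sourceColumn": "region"},
                }
            }
        ],
    }
    job = {
        "Name": "sales-job",
        "DatasetName": dataset["Name"],
        # Assumes the first published version of the recipe is "1.0".
        "RecipeReference": {"Name": recipe["Name"], "RecipeVersion": "1.0"},
        "RoleArn": role_arn,
        "Outputs": [
            {"Location": {"Bucket": bucket, "Key": "output/"}, "Format": "CSV"}
        ],
    }
    return dataset, recipe, job


def run_workflow(client, bucket, key, role_arn):
    """Create the dataset and recipe, publish the recipe, then run the job."""
    dataset, recipe, job = build_requests(bucket, key, role_arn)
    client.create_dataset(**dataset)
    client.create_recipe(**recipe)
    client.publish_recipe(Name=recipe["Name"])
    client.create_recipe_job(**job)
    return client.start_job_run(Name=job["Name"])

# Real usage would pass an actual client, for example:
#   import boto3
#   run_workflow(boto3.client("databrew"), "my-bucket", "input/sales.csv",
#                "arn:aws:iam::123456789012:role/MyDataBrewRole")
```

Passing the client in as an argument keeps the payload construction separate from the AWS calls, which makes the flow easy to inspect or dry-run before anything is created in an account.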
Features of AWS Glue DataBrew
- Serverless - Pay only for what you use
- No code required
- Over 250 pre-built transformations (extract, group, join, merge, normalize, pivot, remove, split, tokenize, etc.)
- Profile and analyze data for detailed statistics, quality, and anomaly detection
- Ingests data in CSV, Excel, JSON, ORC, and Parquet file formats
- View dataset lineage - Know where the data comes from, and what interacts with it
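The profiling feature in the list above can be requested through the SDK as well. This is a rough sketch, assuming the boto3 `databrew` client and its `create_profile_job` call; the job name, dataset name, bucket, and role ARN are hypothetical placeholders.

```python
# Sketch: request a profile job for an existing DataBrew dataset.
# The job name, dataset name, bucket, and role ARN are hypothetical.

def build_profile_job(dataset_name, bucket, role_arn):
    """Build the payload for a profile job that writes its report to S3."""
    return {
        "Name": f"{dataset_name}-profile",
        "DatasetName": dataset_name,
        "OutputLocation": {"Bucket": bucket, "Key": "profiles/"},
        "RoleArn": role_arn,
    }


def run_profile(client, dataset_name, bucket, role_arn):
    """Create the profile job, then start a run of it."""
    request = build_profile_job(dataset_name, bucket, role_arn)
    client.create_profile_job(**request)
    return client.start_job_run(Name=request["Name"])

# Real usage would pass an actual client, for example:
#   import boto3
#   run_profile(boto3.client("databrew"), "sales-dataset", "my-bucket",
#               "arn:aws:iam::123456789012:role/MyDataBrewRole")
```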