pentaho

Data Prep Analytics

Imagine being able to access analytics from anywhere in the data pipeline, not just after data has been prepared and exported to a business analytics tool. Leverage the only platform on the market that brings analytics into data prep, without the need to switch in and out of tools, so that you can shorten the cycle from data to insights.

  • Bringing Analytics into Data Prep: ETL developers and data prep staff can spot check analytics in-flight with access to charts, graphs, visualizations, or ad hoc analysis from any step in the data prep process.
  • Share Analytics During Data Prep: Publish data sources for the business while preparing data. With the ability to immediately share data sources, IT can better collaborate with the business for a quicker, less iterative approach to the right analytics.

New Spark capabilities

Organizations are adopting Spark to fuel fast, flexible big data processing and analysis, but, given a shortage of relevant development skills, it can be challenging to maximize the value of Spark in production. These updates help extend the benefits of Spark to a wider audience, while allowing teams to operationalize Spark as part of bigger data-driven business processes.

  • SQL on Spark Access: Access SQL on Spark as a data source within Pentaho Data Integration (PDI), making it easier for ETL developers and data analysts to query Spark data and integrate it with other data for preparation and analytics.
  • Extended Spark Orchestration: Visually coordinate and schedule Spark applications that use a wider variety of libraries, including Spark Streaming and Spark SQL, as well as Spark ML and Spark MLlib for machine learning. Additionally, Pentaho now supports the orchestration of Spark applications written in Python.

Expanded Hadoop Security

Visual development tools for big data must comply with security frameworks that protect key enterprise data resources from intrusion. Facilitate big data governance and reduce risk with Pentaho’s expanded integration with Hadoop security technologies.

  • Expanded Kerberos Integration: Promote secure data integration for more users with an updated capability that enables multiple Pentaho users to access Kerberos-enabled Cloudera clusters as multiple Hadoop users.
  • Sentry Integration: PDI works with Sentry for role-based access to specific Hadoop data sets, enabling granular tracking and enforcement of enterprise data authorization rules.

Enhanced Metadata injection

IT teams spend countless hours coding ingestion and processing jobs to onboard a wide variety of big data sources. Increase IT productivity when building out many data migration and onboarding processes by automating and scaling big data pipelines with metadata injection.

  • Metadata Injection Extended to More Steps: Enable IT teams to auto-generate a wider variety of data transformations at runtime with metadata injection support for 30+ additional PDI steps. The new injection-enabled steps include operations related to Hadoop, HBase, JSON, XML, Vertica, Greenplum, and other big data sources.
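
To make the idea of metadata injection concrete, here is a minimal conceptual sketch in Python. It is not PDI code and does not use any Pentaho API: the template layout, the `inject_metadata` function, and the source descriptions are all hypothetical, chosen only to show how one template can yield many pipeline definitions when source-specific metadata is filled in at runtime.

```python
from copy import deepcopy

# Hypothetical template: a generic "read CSV, write table" pipeline whose
# source-specific settings are left empty. This is NOT PDI's file format.
TEMPLATE = {
    "steps": [
        {"type": "csv_input", "filename": None, "fields": []},
        {"type": "table_output", "table": None},
    ]
}


def inject_metadata(template, metadata):
    """Return a new pipeline definition with the metadata filled in.

    The template itself is left untouched so it can be reused.
    """
    pipeline = deepcopy(template)
    csv_step, table_step = pipeline["steps"]
    csv_step["filename"] = metadata["filename"]
    csv_step["fields"] = list(metadata["fields"])
    table_step["table"] = metadata["target_table"]
    return pipeline


# Hypothetical source descriptions; in practice these might come from a
# data dictionary or a configuration table.
SOURCES = [
    {"filename": "customers.csv", "fields": ["id", "name"],
     "target_table": "stg_customers"},
    {"filename": "orders.csv", "fields": ["order_id", "customer_id", "total"],
     "target_table": "stg_orders"},
]

# One template, one generated pipeline per source.
pipelines = [inject_metadata(TEMPLATE, source) for source in SOURCES]
```

The point of the pattern is that onboarding a new source means adding one metadata entry rather than building another pipeline by hand.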

Additional enhancements

Pentaho 7.0 includes a number of additional enhancements to future-proof your investment and support a big data blended world.

  • Support for Kafka: Send data to, and receive data from, Kafka, the popular messaging queue technology leveraged in big data and IoT.
  • Support for Avro and Parquet: Output files in Avro and Parquet, two formats commonly used to store data in Hadoop, for big data onboarding use cases.
  • Simplified Configuration, Deployment, and Administration: More easily and quickly configure, deploy, and manage a unified data integration and business analytics server to support Pentaho development and production environments.
