PrestoDB vs PrestoSQL & the new Presto Foundation

Last year we posted an introductory article on Presto. So what is new in the Presto world since then? A ton!

PrestoDB moves to the Linux Foundation

In September 2019, Facebook, Uber, Twitter, and Alibaba launched the official Presto Foundation under the Linux Foundation.

A formal, official foundation is what was needed for the Presto ecosystem to prosper. The formation and transition to a formal foundation under the Linux Foundation’s auspices was a significant first step to deal with confusion in the community.

Kudos to Facebook, Uber, Twitter, and others for making this a reality. Having an open, shared, and community-driven organization is critical to the future success of Presto. You can read more about these principles and roadmaps here.

Revisiting PrestoDB vs. PrestoSQL

Why is a formal, independent foundation necessary? The Presto Foundation established a set of much-needed guiding principles for the community. In the post last year, we highlighted some confusion about the two principal Presto project sites: https://prestodb.io/ and prestosql.io.

We referred to prestosql as the “fork.” On GitHub, the fork is located at prestosql/presto. However, the official project is prestodb/presto.

So why is there confusion? Here is how they describe themselves:

  • prestodb/presto: “The official home of the Presto distributed SQL query engine for big data https://prestodb.github.io.”
  • prestosql/presto: “Official home of Presto, the distributed SQL query engine for big data https://prestosql.io.”

How many Presto Foundations or official projects do we need?

Last year I was approached by O’Reilly to act as a technical reviewer for “Presto: The Definitive Guide.” I was initially excited to be able to contribute to the work. However, in reviewing the initial drafts, it was clear the book was focused on prestosql. Most of the referenced documentation, code, and Docker resources pointed to prestosql and Starburst. As a result, I decided not to participate as a technical reviewer.

My concern today, as it was last year, is that the forked prestosql and its similarly named “Presto Software Foundation” have proclaimed themselves “official.” They also have the appearance of being an extension of a commercial operation (i.e., Starburst).

I want to make clear that I have no issue with the commercialization efforts of Presto. The Starburst team is helping move Presto forward, which is essential. We are also big fans of what Amazon has done (is doing) with Athena.

However, the ecosystem was fractured, which confuses outsiders. Confusion can impact interest and slow adoption. Having a well-respected, well-defined framework like the Linux Foundation’s Presto Foundation is critical.

For a healthy and vibrant Presto ecosystem, I think everyone in the Presto community would welcome convergence of efforts for the good of all.

Presto deployments gaining traction

Last year we pointed out how excited we were about the opportunities the Presto community and commercialization efforts would unlock for a broader user base.

Athena is a top choice for our customers to query their data lakes. We have now completed over 100 Amazon Athena deployments. Athena (which uses the Linux Foundation’s PrestoDB) makes using a data lake for ordinary, everyday analytics activity a reality. Connect Tableau, Power BI, Looker, or any other supported tool to Athena, and you have immediate access to the contents of your data lake.

The AWS implementation of Presto makes the technology accessible to teams that generally do not have the technical skills to roll their own implementation. For example, in Building A Serverless Business Intelligence Stack With Apache Parquet, Tableau, and Amazon Athena, we detailed how teams can quickly build a Presto architecture using a data lake and the Athena query engine.

We have also seen interesting ELT and ETL hybrid data lake architectures leveraging Presto. For example, one of our customers has an ELT process that moves billions of Adobe Analytics events to an AWS data lake. Next, they use Athena to connect the data lake to an enterprise Oracle Cloud environment. This hybrid cloud model allows the Oracle team to run ETL testing jobs, minimize the data imported to Oracle, and create new data models or applications without impacting downstream workflows in Oracle.

Amazon recently released federated queries for Athena. Federated queries expand on the core distributed query engine model promoted by Presto.
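To make the federated idea a little more concrete, here is a minimal sketch of how a query joining a data lake table with a table from another registered data source might be assembled and submitted to Athena. The catalog, database, table, column, and bucket names are all hypothetical placeholders, and the SQL shape is an assumption about how such a join could look rather than a description of any specific deployment. The only AWS call used is boto3's Athena `start_query_execution`.

```python
from typing import Any, Dict


def build_federated_query(lake_table: str, external_catalog: str, external_table: str) -> str:
    """Build SQL that joins a data lake table with a table from another catalog.

    All table and column names here are hypothetical placeholders.
    """
    return (
        "SELECT e.event_id, e.event_time, c.customer_name\n"
        f"FROM {lake_table} AS e\n"
        f"JOIN {external_catalog}.{external_table} AS c\n"
        "  ON e.customer_id = c.customer_id\n"
        "LIMIT 100"
    )


def build_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Assemble the keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def submit(request: Dict[str, Any]) -> str:
    """Send the request to Athena and return the query execution ID.

    boto3 is imported lazily so the builders above can be used without AWS access.
    """
    import boto3

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Hypothetical names: an events table in the lake and a CRM table in a federated catalog.
    sql = build_federated_query("web_events", "crm_catalog", "sales.customers")
    request = build_request(sql, "analytics_lake", "s3://example-bucket/athena-results/")
    print(request["QueryString"])
    # Call submit(request) only with valid AWS credentials and real resource names.
```

Keeping the query builder separate from the submit step means the SQL can be reviewed or logged before anything is sent to AWS.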

What’s Next?

In addition to cloud vendors like AWS providing prestodb, new commercial entrants in the prestodb space are needed. There are ample opportunities for vendors to provide additional support that enterprises need, offer robust implementations of the full prestodb feature set, and offer dedicated expertise beyond the community channels.

Given the moves by Facebook with the Presto Foundation, we are certainly looking forward to the growth of the community and new entrants in the commercial space.

Want to discuss Presto or Athena for your organization? Need a platform and team of experts to kickstart your data and analytics efforts? We can help! Getting traction when adopting new technologies, especially if it means your team is working in different and unfamiliar ways, can be a roadblock to success. This is especially true in a self-service only world. If you want to discuss a proof-of-concept, pilot, project, or any other effort, the Openbridge platform and team of data experts are ready to help.

Reach out to us at hello@openbridge.com. Prefer to talk to someone? Set up a call with our team of data experts.

PrestoDB vs PrestoSQL & the new Presto Foundation was originally published in Openbridge on Medium, where people are continuing the conversation by highlighting and responding to this story.



source https://blog.openbridge.com/prestodb-vs-prestosql-the-new-presto-foundation-341405fb7ba3?source=rss----4c5221789b3---4
