by Simon Goldberg, Autrin Abdi, Ayman Kazmi, and Emile Baizel on 08 AUG 2024 in Amazon Athena, Amazon Bedrock, Analytics, Blockchain, Foundational (100), Technical How-to
Data within public blockchain networks such as Bitcoin and Ethereum can be accessed by anyone. It holds a wealth of valuable insights that can drive business decisions, inform investment strategies, and uncover emerging trends. However, accessing and making sense of this information has traditionally been a complex and technical undertaking. Much of the data is encoded and stored as bytes, rather than in a human-readable format.
The data structures used in blockchains are optimized to provide tamper-evidence and immutability of the data, but not to perform queries and analytics. Before the data can be queried, it must first be processed through an extract, transform, and load (ETL) pipeline and converted into a format that can be used with common business intelligence (BI) tools and query languages.
The AWS Public Blockchain Open Data Set was created to solve these challenges, and is available for Bitcoin and Ethereum. These datasets provide historical data, allowing analysts to issue SQL queries to services like Amazon Athena and Amazon Redshift to glean insights. One of the key advantages of these datasets is the potential to aggregate and analyze activity across multiple blockchain networks. However, the process of querying and analyzing blockchain data still requires a comprehensive understanding of data schemas and the ability to construct appropriate queries.
With generative artificial intelligence (AI), you can now extend this analysis capability to support natural language queries. This enables users who may not be familiar with SQL to gain similar insights from blockchain data. In this post, we introduce a solution that demonstrates how you can chat with blockchain data using Amazon Bedrock and the AWS Public Blockchain datasets. We discuss Amazon Bedrock, review the solution architecture, provide example prompts, share interesting findings, and go over how you can extend the solution to integrate with different data sources.
Recent advancements in large language models (LLMs) and generative AI have opened up new possibilities for interacting with data in more natural and intuitive ways. These models have demonstrated the ability to understand and generate human-like text, enabling natural language understanding and generation.
Amazon Bedrock is a fully managed service that makes it simple for customers to build generative AI applications and provides access to a variety of foundation models (FMs), agents, and knowledge bases for retrieval augmented generation (RAG) workflows.
Amazon Bedrock allows you to experiment, customize, and deploy FMs without having to manage the underlying infrastructure and training complexity.
A key capability within Amazon Bedrock is the ability to create autonomous agents using Agents for Amazon Bedrock that can assist users in completing tasks. These agents use the reasoning and language understanding capabilities of FMs to perform the following functions:
Agents are well-suited for building generative AI applications that can automate tasks and engage with users in a natural and conversational way.
This solution is available as an automated AWS Cloud Development Kit (AWS CDK) deployment in the accompanying GitHub repository. At the core of this solution is an agent using Anthropic Claude 3 Haiku on Amazon Bedrock, an LLM that allows the agent to understand user requests based on a given set of instructions and take appropriate action. Docker is used to build components of the CDK application locally, while the CDK utilizes CloudFormation to deploy the solution to AWS.
Instead of having to understand complex data schemas and construct SQL queries, you can simply express queries in natural language, and the agent will interpret your intent and translate it into queries you can run on the AWS Public Blockchain datasets. This lowers the barrier to entry for querying and analyzing blockchain data, making it more accessible to a broader range of users.
The following architecture showcases how the solution simplifies the process of querying blockchain data and effectively handles error recovery.
The workflow includes the following steps:
/athenaQuery, that accepts a request body containing a SQL query.
aws-public-blockchain. The Lambda function returns the query results back to the agent’s action group, which are expected to be a ResultSet array containing the rows returned by the query, as defined in the OpenAPI schema.
The solution’s AWS CDK application additionally deploys an AWS CloudFormation stack that creates AWS Glue tables and partition definitions in the AWS Glue Data Catalog, which serves as a centralized repository for metadata about the blockchain data stored in Amazon S3. By defining the schema and partitioning structure of the datasets, the AWS Glue tables provide a logical abstraction layer that allows Athena to efficiently query and analyze the underlying data. The CloudFormation template also deploys a Lambda function that runs daily to update the partition definitions in the Data Catalog, so the data is up to date.
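The action group round trip described above can be sketched as a small Lambda handler. This is a minimal illustration, not the code from the repository: it assumes the event and response shapes documented for Agents for Amazon Bedrock Lambda functions, the property name `query` is hypothetical, and `run_query` stands in for whatever code submits the SQL to Athena.

```python
import json
from typing import Callable, Dict, List

# Hypothetical name of the request-body property that carries the SQL text.
QUERY_PROPERTY = "query"


def extract_sql(event: Dict) -> str:
    """Pull the SQL string out of an action group request body."""
    properties = event["requestBody"]["content"]["application/json"]["properties"]
    for prop in properties:
        if prop["name"] == QUERY_PROPERTY:
            return prop["value"]
    raise KeyError(f"request body has no '{QUERY_PROPERTY}' property")


def build_response(event: Dict, status: int, payload: Dict) -> Dict:
    """Wrap a payload in the response envelope returned to the agent."""
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": event["apiPath"],
            "httpMethod": event["httpMethod"],
            "httpStatusCode": status,
            "responseBody": {"application/json": {"body": json.dumps(payload)}},
        },
    }


def handle(event: Dict, run_query: Callable[[str], List[Dict]]) -> Dict:
    """Run the SQL from the request and return the rows, or the error text."""
    try:
        rows = run_query(extract_sql(event))
        return build_response(event, 200, {"ResultSet": rows})
    except Exception as exc:
        # The error text goes back to the agent so it can reformulate the query.
        return build_response(event, 400, {"error": str(exc)})
```

Passing `run_query` in as a parameter keeps the sketch testable without AWS access; a real handler would call Athena at that point.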
One of the key strengths of this solution is its robust error-handling mechanism. In the event that the initial SQL query fails due to syntax errors, missing tables, or other issues, the agent doesn’t simply return an error message to the user. Instead, it analyzes the error feedback, identifies the root cause, and autonomously reformulates the query to address the problem.
This iterative process continues until a valid query is generated and runs successfully, which helps the agent return the desired results even for complex or problematic queries. If the agent is unable to generate a valid query after multiple attempts, it informs the user that it cannot assist with the given prompt.
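The retry behavior can be pictured as a bounded loop that feeds each error back into the next attempt. In the solution itself this happens inside the agent's own orchestration rather than in explicit code, so the following is only a conceptual sketch; `generate_sql`, `run_query`, and the limit of three attempts are hypothetical stand-ins.

```python
from typing import Callable, List, Optional, Tuple


def query_with_retries(
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],
    run_query: Callable[[str], List[dict]],
    max_attempts: int = 3,
) -> Tuple[Optional[List[dict]], List[str]]:
    """Try up to max_attempts queries, feeding each error back to the generator.

    Returns (rows, errors); rows is None when every attempt failed.
    """
    errors: List[str] = []
    last_error: Optional[str] = None
    for _ in range(max_attempts):
        # The generator sees the previous error, if any, and can correct for it.
        sql = generate_sql(question, last_error)
        try:
            return run_query(sql), errors
        except Exception as exc:
            last_error = str(exc)
            errors.append(last_error)
    # Every attempt failed: the caller should tell the user it cannot help.
    return None, errors
```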
Accurately translating complex natural language queries to SQL can be a challenge for LLMs. To improve the accuracy and reliability of the solution, the agent references contextual information about the dataset schemas.
The agent instruction, which can be found in the GitHub repo, plays a crucial role in enabling the agent to generate accurate and optimized SQL queries tailored for the AWS Public Blockchain datasets. It covers several key aspects:
btc for Bitcoin and eth for Ethereum) when referencing tables and fields.
UNNEST keyword, how to perform date comparisons and time range calculations, and how to handle token addresses in the Ethereum database using the lower function.
Complete the following prerequisite steps to deploy the solution. It is recommended that you deploy the solution in a dedicated sandbox AWS account. AWS CloudTrail, which is enabled by default, provides monitoring and auditing capabilities for your account. Additionally, make sure that you have properly configured AWS Identity and Access Management (IAM) permissions, limiting access of the deployment to specific users with the necessary permissions.
npm install -g aws-cdk
git clone https://github.com/aws-samples/chat-with-blockchain-data-with-amazon-bedrock.git
cd chat-with-blockchain-data-with-amazon-bedrock
npm install
aws configure
cdk bootstrap aws://<ACCOUNT_NUMBER>/<REGION>
cdk deploy BedrockBlockchainDataAgentStack
It takes approximately 2 minutes for the stack to be deployed.
Note: Due to the Glue Catalog synchronization process, it will take approximately 4-5 minutes for the Ethereum data to become available after the initial deployment.
If this is your first time using Amazon Bedrock, choose Model access in the navigation pane on the Amazon Bedrock console and enable access for Anthropic Claude 3 Haiku.
Now that the AWS CDK stack has been deployed, you can test the solution.
The following screenshot illustrates an example prompt and the corresponding output generated by the model.
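The agent can also be exercised programmatically. The following is a minimal sketch, assuming the boto3 `bedrock-agent-runtime` client and its `invoke_agent` call; the helper name `ask_agent`, the placeholder IDs, and the example question are illustrative and do not come from the repository.

```python
import uuid
from typing import Any, Optional


def ask_agent(
    client: Any,
    agent_id: str,
    agent_alias_id: str,
    question: str,
    session_id: Optional[str] = None,
) -> str:
    """Send one question to the agent and join the streamed answer chunks."""
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        # Reuse a session ID to continue a conversation; otherwise start a new one.
        sessionId=session_id or str(uuid.uuid4()),
        inputText=question,
    )
    parts = []
    for event in response["completion"]:
        chunk = event.get("chunk")
        if chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)


# Example usage (requires AWS credentials; the IDs are placeholders
# to be replaced with the values from your own deployment):
#
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   print(ask_agent(client, "AGENT_ID", "AGENT_ALIAS_ID",
#                   "How many Bitcoin blocks were mined yesterday?"))
```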
You can use the following sample questions as a starting point, but be sure to test the agent with your own questions.
The following are sample questions regarding Bitcoin:
The following are sample questions regarding Ethereum:
Feel free to experiment with different queries and prompts to explore the full capabilities of the agent and the AWS Public Blockchain datasets.
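To make the translation step concrete, here is a sketch of the kind of Athena SQL the agent might produce for a question about daily transfers of one Ethereum token. It applies the eth schema prefix, a date range filter, and the lower function for token addresses. The table `eth.token_transfers`, its `token_address` column, and the string-valued `date` partition column are assumptions about the dataset schema, and the helper itself is hypothetical.

```python
import re
from datetime import date

# A 0x-prefixed, 40-hex-character Ethereum address.
ADDRESS_PATTERN = re.compile(r"^0x[0-9a-fA-F]{40}$")


def token_transfer_count_sql(token_address: str, start: date, end: date) -> str:
    """Build an Athena query that counts transfers of one token per day."""
    # Validating the address also keeps arbitrary text out of the SQL string.
    if not ADDRESS_PATTERN.match(token_address):
        raise ValueError("expected a 0x-prefixed, 40-hex-character address")
    if start > end:
        raise ValueError("start must not be after end")
    # Assumed schema: eth.token_transfers with a `date` partition column.
    return (
        "SELECT date, COUNT(*) AS transfers\n"
        "FROM eth.token_transfers\n"
        f"WHERE lower(token_address) = lower('{token_address}')\n"
        f"  AND date BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'\n"
        "GROUP BY date\n"
        "ORDER BY date"
    )
```

Filtering on the partition column limits how much data Athena has to scan, which matters for the cost considerations discussed later in this post.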
To avoid incurring future charges, delete the resources you created by running the following AWS CDK command from the root of the project directory:
cdk destroy
When developing this solution, we encountered several discoveries that highlight the capabilities enabled by natural language querying of blockchain data:
These findings demonstrate the powerful capabilities of agents in understanding and interacting with blockchain data in a natural and intuitive way.
Although this solution has demonstrated the power of natural language querying for the AWS Public Blockchain datasets, you can extend the capabilities further to integrate with additional data sources. Two promising avenues for expansion are Amazon Managed Blockchain (AMB) Query and The Graph.
AMB Query provides serverless access to historical token balances, transaction data, and more. During our testing, we found that retrieving balance information for a single Bitcoin address using an Athena query on the AWS Public Blockchain dataset required scanning 1.15 TB of data, which had a runtime of 40 seconds and an associated cost of approximately $6 USD. The reason for this high cost is that the AWS Public Blockchain dataset is stored in its raw form, without any indexing or optimizations for specific queries. As a result, Athena must scan the entire dataset to retrieve the requested information, leading to long runtimes and high costs, especially for queries that involve large amounts of data or complex computations.
In contrast, AMB Query can retrieve the same balance information in milliseconds, with a much lower cost of $0.000007 USD per request (or $7 USD per million requests). AMB Query uses specialized indexing to optimize access to blockchain data, resulting in significantly faster and more cost-effective retrieval of information.
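The cost gap can be checked with a few lines of arithmetic. This sketch assumes Athena's on-demand price is $5 per TB scanned, an assumption not stated in this post (check current pricing for your Region); the AMB Query price is the per-request figure quoted above, and the function names are illustrative.

```python
# Assumption: on-demand Athena price per TB scanned (not stated in this post).
ATHENA_USD_PER_TB = 5.00
# From the text: $0.000007 per request, or $7 per million requests.
AMB_QUERY_USD_PER_REQUEST = 0.000007


def athena_scan_cost(tb_scanned: float, usd_per_tb: float = ATHENA_USD_PER_TB) -> float:
    """Cost in USD of one Athena query that scans tb_scanned terabytes."""
    return tb_scanned * usd_per_tb


def amb_query_cost(
    requests: int, usd_per_request: float = AMB_QUERY_USD_PER_REQUEST
) -> float:
    """Cost in USD of the given number of AMB Query requests."""
    return requests * usd_per_request


def break_even_requests(tb_scanned: float) -> int:
    """Number of AMB Query requests that cost as much as one Athena scan."""
    return int(athena_scan_cost(tb_scanned) / AMB_QUERY_USD_PER_REQUEST)
```

With the 1.15 TB scan from the example, `athena_scan_cost(1.15)` comes to about $5.75, in line with the roughly $6 reported. Under the same assumptions, one such scan costs about as much as 820,000 AMB Query requests.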
It’s important to be aware of the potential costs associated with running complex or data-intensive queries on the raw dataset using Athena. Running multiple balance or transaction queries can incur substantial costs, because each query may need to scan large portions of the dataset. In such cases, alternative solutions like AMB Query can be more cost-effective.
This difference in latency and cost highlights the potential benefits of extending this solution to use AMB Query as an additional data source. This would allow for the seamless transition between querying the public blockchain datasets and the more optimized responses through AMB Query, all through the same natural language interface.
Another area of exploration is integrating with The Graph, a decentralized protocol for indexing and querying blockchain data. By integrating the agent with The Graph, users could ask natural language questions related to specific smart contracts and their associated data. For example, you could ask the agent questions about the various liquidity pools on the Uniswap decentralized exchange, and have it generate the appropriate GraphQL queries to retrieve the relevant information.
By incorporating additional data sources, this solution can provide users with an even more comprehensive and cost-effective way to perform cross-chain analytics by chatting with blockchain data. The flexibility to integrate with various data providers further enhances the value and versatility of this approach.
Lastly, as you consider expanding on this solution, it is recommended to implement Guardrails for Amazon Bedrock and associate it with your Agent. This feature allows you to establish safeguards, such as detecting and blocking potentially malicious user inputs that attempt to override or manipulate the Agent Instruction. Additionally, it is advisable to research best practices for prompt injection security. This approach will help mitigate the risks associated with prompt injection attacks, ensuring the integrity and reliability of the solution.
In this post, we covered how you can use Agents for Amazon Bedrock to enable natural language queries on the AWS Public Blockchain datasets. This solution allows you to gain insights from blockchain data in a natural and conversational manner, without the need for deep technical expertise. We discussed the key components of the solution and how it can be extended. As a next step, you can try deploying the GitHub repository in your AWS account. Let us know in the comments section if you have any questions.