The original post has a better layout, with images.
Intro
I am currently using both GCP and AWS for a certain project. While each cloud provider offers nice tools and visualizations for its own spending (Cloud Billing for GCP and the Cost & Usage Report for AWS), I wanted to consolidate both providers' usage reports and visualize them with a single BI tool. The report does not need to be real-time; I only need daily granularity. (No streaming.)
So I had to spin something up, and these were a couple of options:
- House data in AWS using Athena + QuickSight (or another BI tool)
- House data in GCP using BigQuery + Data Studio (or another BI tool)
- Spend more time searching for third-party tools
I am a big fan of BigQuery and I prefer to do ELT rather than ETL on BQ. Moreover, since I was already using BigQuery for the project with holistics.io, I chose to house the usage and cost data in BigQuery.
Move data to BQ
GCP Cloud Billing
Setup
This is easy; Cloud Billing natively supports BigQuery export.
- Enable billing for the GCP project. I assume that you already have a billing account set up. (You wouldn't need data analytics unless you were spending money.) If you need to enable billing for the project, refer to the following link: Modify a project's billing settings | Cloud Billing | Google Cloud
- Create a dataset in BigQuery within the GCP project you desire - I named it billing_export
- You need appropriate permissions to set up the export:
  - Billing Account Administrator
  - BigQuery Admin for the project
  - resourcemanager.projects.update permission
- Enable Cloud Billing export to the BigQuery dataset:
  - Go to the Billing menu in the Navigation menu (https://console.cloud.google.com/billing/)
  - Go to the linked billing account (this should already be set; if not, you need to add payment info)
  - Select Billing export → select the BigQuery export tab
  - Select Edit settings on Daily cost detail
  - Select the project and dataset where you want the data to be sinked. (Note: the BigQuery API is required to export data to BigQuery. If the project you selected doesn't have the BigQuery API enabled, you will be prompted to enable it. Click Enable BigQuery API and the API will be enabled for you.)
Once everything is set, your screen will look like this.
When you first enable the daily cost detail export to BigQuery, it might take a few hours to start seeing your Google Cloud cost data. A table named gcp_billing_export_v1_<some_hash> will be generated. The table is automatically partitioned by day (you can query the partitioned data to save cost).
Schema
After a while you will notice data being exported to the table. You can click the table and click preview, or query it:
SELECT usage_start_time, location.region, cost, currency, usage.amount, usage.unit, credits
FROM `<project>.billing_export.gcp_billing_export_v1_<hash>`
WHERE _PARTITIONTIME >= "2020-12-01 00:00:00"
If you need specific information about each column, refer to this link. It will help you understand each column of data.
Understanding the Cloud Billing data tables in BigQuery | Google Cloud
A special note on the credits nested field if you are using the free tier.
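Credits (including free tier credits) show up in the credits repeated field as negative amounts, so adding them to cost gives you what you were actually charged. As a minimal sketch, reusing the table from the query above:

```sql
-- Gross vs. net daily cost; credit amounts are negative in the export,
-- so adding them to cost yields the net charge.
SELECT
  DATE(usage_start_time) AS usage_date,
  SUM(cost) AS gross_cost,
  SUM(cost) + SUM(IFNULL((SELECT SUM(c.amount) FROM UNNEST(credits) c), 0)) AS net_cost
FROM `<project>.billing_export.gcp_billing_export_v1_<hash>`
WHERE _PARTITIONTIME >= "2020-12-01 00:00:00"
GROUP BY usage_date
ORDER BY usage_date
```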
That's it for GCP! Now on to the hard part.
AWS Cost & Usage Report
I didn't want to reinvent the wheel, so, as all good developers do, I googled. I found a couple of great articles on this subject, but they seemed complicated and outdated. Luckily, there is already a natively supported S3 transfer service in BigQuery. It supports transferring csv and parquet files.
Amazon S3 transfers | BigQuery Data Transfer Service | Google Cloud
Let's save the cost and usage report to S3 first. Go to AWS Cost and Usage Reports. You can create a report here.
- Report name - daily_report_gzip
- Include resource IDs - check
- Data refresh settings - uncheck
- S3 bucket - create a bucket or select a bucket
- Prefix - billing/daily_report_gzip
- Time granularity - Daily
- Report versioning - overwrite
- Compression type - gzip
After creating the report, it takes a couple of hours to show up on S3. Once the report is created, you will see gz files exported under the following URI:
s3://<bucket>/billing/daily_report_gzip/20201201-20210101/
- data - daily_report_gzip-00001.csv.gz
- metadata - daily_report_gzip-Manifest.json
Schema
We need to know the schema in order to house the data in a BigQuery table. If you want to know what each field represents, consult the documentation. When you open up the daily_report_gzip-Manifest.json file, you will see how each column is structured. Take note of the variety of types - String, OptionalBigDecimal, DateInterval and so on.
- JSON - manifest file example
Converting the manifest to a BQ table schema
In order to transfer the data, the BQ table needs the correct schema to house the csv data. BQ has different types, and there are too many fields to copy and paste by hand. You can configure the table schema with Edit as text (e.g. Field1:TYPE,Field2:TYPE). Let's use jq to extract the types as a BQ table schema. I also need to replace some field types with BigQuery-compatible types:
- BigDecimal, OptionalBigDecimal → BigNumeric
- Interval → String
- DateTime → Timestamp
- OptionalString → String
Download the json file and pipe it through the following jq command:
cat daily_report_gzip-Manifest.json | jq -jr '.columns[] | .category+"_"+.name+":"+ .type +","' | sed s/.$// | sed 's/OptionalString/String/g' | sed 's/\<Interval\>/String/g' | sed 's/OptionalBigDecimal/BigNumeric/g' | sed 's/BigDecimal/BigNumeric/g' | sed 's/DateTime/Timestamp/g'
- jq -jr '.columns[] | .category+"_"+.name+":"+ .type +","' - this extracts each item in the columns array and formats it as <category>_<name>:<type>. I needed the category prepended because there are duplicate names. -r is for raw output.
- sed s/.$// - removes the trailing ,
- sed 's/<original type>/<bq type>/g' - replaces types with BQ-compatible types
This command results in the following text:
identity_LineItemId:String,identity_TimeInterval:String,bill_InvoiceId:String,bill_BillingEntity:String,bill_BillType:String,bill_PayerAccountId:String,bill_BillingPeriodStartDate:Timestamp,bill_BillingPeriodEndDate:Timestamp,lineItem_UsageAccountId:String,lineItem_LineItemType:String,lineItem_UsageStartDate:Timestamp,lineItem_UsageEndDate:Timestamp,lineItem_ProductCode:String,lineItem_UsageType:String,lineItem_Operation:String,lineItem_AvailabilityZone:String,lineItem_ResourceId:String,lineItem_UsageAmount:BigNumeric,lineItem_NormalizationFactor:BigNumeric,lineItem_NormalizedUsageAmount:BigNumeric,lineItem_CurrencyCode:String,lineItem_UnblendedRate:String,lineItem_UnblendedCost:BigNumeric,lineItem_BlendedRate:String,lineItem_BlendedCost:BigNumeric,lineItem_LineItemDescription:String,lineItem_TaxType:String,lineItem_LegalEntity:String,product_ProductName:String,product_accountAssistance:String,product_alarmType:String,product_architecturalReview:String,product_architectureSupport:String,product_availability:String,product_baseProductReferenceCode:String,product_bestPractices:String,product_cacheEngine:String,product_capacitystatus:String,product_caseSeverityresponseTimes:String,product_clientLocation:String,product_clockSpeed:String,product_computeFamily:String,product_computeType:String,product_cputype:String,product_currentGeneration:String,product_customerServiceAndCommunities:String,product_databaseEngine:String,product_deploymentOption:String,product_description:String,product_dominantnondominant:String,product_durability:String,product_ecu:String,product_endpointType:String,product_engineCode:String,product_enhancedNetworkingSupported:String,product_eventType:String,product_feeDescription:String,product_fromLocation:String,product_fromLocationType:String,product_group:String,product_groupDescription:String,product_includedServices:String,product_insightstype:String,product_instanceFamily:String,product_instanceType:String,product_instanceTypeFamily:String,product_intelAvx2Available:String,product_intelAvxAvailable:String,product_intelTurboAvailable:String,product_launchSupport:String,product_licenseModel:String,product_location:String,product_locationType:String,product_logsDestination:String,product_maxIopsBurstPerformance:String,product_maxIopsvolume:String,product_maxThroughputvolume:String,product_maxVolumeSize:String,product_memory:String,product_memoryGib:String,product_memorytype:String,product_messageDeliveryFrequency:String,product_messageDeliveryOrder:String,product_minVolumeSize:String,product_networkPerformance:String,product_normalizationSizeFactor:String,product_operatingSystem:String,product_operation:String,product_operationsSupport:String,product_physicalProcessor:String,product_preInstalledSw:String,product_pricingUnit:String,product_proactiveGuidance:String,product_processorArchitecture:String,product_processorFeatures:String,product_productFamily:String,product_programmaticCaseManagement:String,product_queueType:String,product_region:String,product_requestDescription:String,product_requestType:String,product_resourcePriceGroup:String,product_routingTarget:String,product_routingType:String,product_servicecode:String,product_servicename:String,product_sku:String,product_storage:String,product_storageClass:String,product_storageMedia:String,product_storageType:String,product_technicalSupport:String,product_tenancy:String,product_thirdpartySoftwareSupport:String,product_tiertype:String,product_toLocation:String,product_toLocationType:String,product_trafficDirection:String,product_training:String,product_transferType:String,product_usagetype:String,product_vcpu:String,product_version:String,product_volumeApiName:String,product_volumeType:String,product_whoCanOpenCases:String,pricing_LeaseContractLength:String,pricing_OfferingClass:String,pricing_PurchaseOption:String,pricing_RateId:String,pricing_currency:String,pricing_publicOnDemandCost:BigNumeric,pricing_publicOnDemandRate:String,pricing_term:String,pricing_unit:String,reservation_AmortizedUpfrontCostForUsage:BigNumeric,reservation_AmortizedUpfrontFeeForBillingPeriod:BigNumeric,reservation_EffectiveCost:BigNumeric,reservation_EndTime:String,reservation_ModificationStatus:String,reservation_NormalizedUnitsPerReservation:String,reservation_NumberOfReservations:String,reservation_RecurringFeeForUsage:BigNumeric,reservation_ReservationARN:String,reservation_StartTime:String,reservation_SubscriptionId:String,reservation_TotalReservedNormalizedUnits:String,reservation_TotalReservedUnits:String,reservation_UnitsPerReservation:String,reservation_UnusedAmortizedUpfrontFeeForBillingPeriod:BigNumeric,reservation_UnusedNormalizedUnitQuantity:BigNumeric,reservation_UnusedQuantity:BigNumeric,reservation_UnusedRecurringFee:BigNumeric,reservation_UpfrontValue:BigNumeric,savingsPlan_TotalCommitmentToDate:BigNumeric,savingsPlan_SavingsPlanARN:String,savingsPlan_SavingsPlanRate:BigNumeric,savingsPlan_UsedCommitment:BigNumeric,savingsPlan_SavingsPlanEffectiveCost:BigNumeric,savingsPlan_AmortizedUpfrontCommitmentForBillingPeriod:BigNumeric,savingsPlan_RecurringCommitmentForBillingPeriod:BigNumeric
Create BQ Table
Let's create a BQ table. You should already have a dataset created if you exported Cloud Billing; if not, create one now. Once the dataset is created, click Create table and fill in the following (a DDL equivalent is sketched after this list):
- Table from - Empty table
- Destination - search for a project
  - Project name - the project where you created the dataset
  - Dataset name - select the one you created
  - Table name - aws
- For Schema, toggle Edit as text and paste the schema you generated with the jq command
- Partition - you can skip this or select a timestamp field that you will commonly filter on. I chose bill_BillingPeriodStartDate and partitioned by day.
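If you prefer DDL to the console, the same table can be created with a query. This is only a sketch showing a handful of the generated columns; the full column list is the jq output above, and the dataset name is the one from the GCP section:

```sql
CREATE TABLE `<project>.billing_export.aws` (
  identity_LineItemId STRING,
  bill_BillingPeriodStartDate TIMESTAMP,
  bill_BillingPeriodEndDate TIMESTAMP,
  lineItem_UsageStartDate TIMESTAMP,
  lineItem_UsageEndDate TIMESTAMP,
  lineItem_ProductCode STRING,
  lineItem_UnblendedCost BIGNUMERIC
  -- ...all remaining columns from the generated schema go here, in order
)
PARTITION BY DATE(bill_BillingPeriodStartDate);
```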
Transfer Prereq
Once the table is ready, let's create a transfer. You need the following permissions first:
- bigquery.transfers.update
- bigquery.datasets.update
These permissions are included in the predefined bigquery.admin role.
AWS credential
Create an IAM user with AWS credentials for the transfer. You only need read-only access to the specific bucket you created, but if you are lazy, just use the AmazonS3ReadOnlyAccess AWS managed policy on all resources. If you don't know how to create an IAM user, please refer to the following link:
Amazon S3 - Creating an IAM user
Configure Transfer
On the BigQuery console, click the Transfers menu in the left pane. You will be redirected to the transfer configuration page. Click the + Create Transfer button.
- Source - Amazon S3
- Display Name - aws-billing-export
- Schedule options - starts now, daily
- Destination
  - Dataset ID - the dataset you created
  - Destination table - aws
- Amazon S3 URI - s3://<bucket>/billing/daily_report_gzip/*/*.gz
- Access key ID
- Secret access key
- File format - csv (BQ supports gzip natively)
- Transfer options
  - Number of errors allowed - 0
  - Ignore unknown values - check
  - Field delimiter - ,
  - Header rows to skip - 1
- Notification - you can create a new Pub/Sub topic and configure notifications here, but you can also configure this later, so leave it blank for now.
Click Save!
You will see the following green check mark if the transfer succeeds.
If you see an error with a red marker, examine the error logs on the right pane. Common errors include:
- Wrong S3 URI
- Wrong AWS credentials
- Wrong table schema
- No objects
Cleaning Data
You now have two tables populated with billing data! It's time to do some analytics and visualization.
Let's review the fields:
- AWS
- GCP - Understanding the Cloud Billing data tables in BigQuery | Google Cloud
You can use the BQ console to query and test the data for integrity, to make sure the load is intact. I won't go into detail on the SQL queries.
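For instance, a quick sanity check on the AWS table (using the table and column names created earlier in this post) could look like this:

```sql
-- Row count, usage window and total cost per billing period after a load
SELECT
  DATE(bill_BillingPeriodStartDate) AS billing_period,
  COUNT(*) AS line_items,
  MIN(lineItem_UsageStartDate) AS first_usage,
  MAX(lineItem_UsageEndDate) AS last_usage,
  SUM(lineItem_UnblendedCost) AS unblended_cost
FROM `<project>.billing_export.aws`
GROUP BY billing_period
ORDER BY billing_period
```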
Once extracted and loaded, I do the transform on Holistics.io, leaving the raw data intact, but you can also minimize, clean, and consolidate the data into a single table or view using the scheduled query feature of BigQuery.
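For example, a consolidated view can be as simple as daily cost per provider. This is only a sketch using the table names from this post, and it assumes both accounts bill in the same currency (check the currency and lineItem_CurrencyCode fields):

```sql
CREATE OR REPLACE VIEW `<project>.billing_export.daily_cost` AS
SELECT
  'gcp' AS provider,
  DATE(usage_start_time) AS usage_date,
  SUM(cost) AS cost
FROM `<project>.billing_export.gcp_billing_export_v1_<hash>`
GROUP BY usage_date
UNION ALL
SELECT
  'aws' AS provider,
  DATE(lineItem_UsageStartDate) AS usage_date,
  -- cast BIGNUMERIC to FLOAT64 so both branches have the same type
  CAST(SUM(lineItem_UnblendedCost) AS FLOAT64) AS cost
FROM `<project>.billing_export.aws`
GROUP BY usage_date;
```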
Visualize data
After inspecting the data with a couple of queries (I am just a newbie in the SQL world), I use Data Studio to explore it. When you use BigQuery, you can connect the data to Data Studio without extra configuration. You can also use another BI tool to connect to BQ and visualize and analyze your data from here.
Extra
- [ ] Check data integrity with daily and monthly rollovers
Conclusion
As BQ supports S3 transfers, it was a lot easier for me to move the data around, but there was still some work to be done.
Please let me know if you find a more concrete solution for consolidating billing data in a single place!
Thank you for reading! Here are some pictures of Kokos!
Reference
Export Cloud Billing data to BigQuery | Google Cloud
Overview of Amazon S3 transfers | BigQuery Data Transfer Service
Using runtime parameters in transfers | BigQuery Data Transfer Service
Top comments (5)
This is a great post. I too had previously found the very complex AppEngine workflow, but this version is much simpler. I chose to use parquet as the export format, but I don't think that makes much difference. What I have observed though is that I end up with duplicated AWS billing data the next day; the AWS billing export is updated every day in the S3 bucket, and if I run the BigQuery transfer again, end up with doubled costs because of duplicated and changed rows. How did you get around this problem? Just de-duplicating the data naively with SELECT DISTINCT * doesn't quite work either, because some rows aren't exactly the same, but the second day's row is a cumulative update to the previous day's row, so you still have the cost in there twice for long-running resources. Did you find a solution to this issue?
Did you find a solution for this by any chance?
Yes - I partition the BigQuery table by day, and then delete the duplicates with a cloud function as follows. I use a BigQuery Transfer job to copy the AWS data from S3 into BigQuery once a day, and this cloud function triggers to tidy up.
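The function body isn't included in this thread. As a rough sketch of that kind of cleanup, assuming the table is partitioned by ingestion time (so the _PARTITIONTIME pseudo-column is available, unlike the column-partitioned table created in the post) and that each transfer run reloads every billing period via the wildcard URI, the deduplication could boil down to keeping only the latest load:

```sql
-- Keep only rows from the most recent transfer run; older loads are
-- superseded because each run re-delivers the full cumulative report.
DELETE FROM `<project>.billing_export.aws`
WHERE _PARTITIONTIME < (
  SELECT MAX(_PARTITIONTIME) FROM `<project>.billing_export.aws`
);
```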
In case you want to transfer data from BigQuery to S3, you can follow this video, as it explains each step of transferring data from BigQuery to S3 in detail.
[Transfer data from Google BigQuery to AWS S3 In Under 10 Minutes] - youtu.be/whHpS4mbwdU
I see a website that is working on Medical Billing Services in New Jersey. I want to make some code; will you provide it to me?