Google sales his Big Query based on a Pay by Use business model.
I assume that you will use big Query for what it is designed for, I mean a lot of data (billions of lines) reads several times a day during batches, calculation, aggregation or reports reads
As you pay by use, you need to pay attention to the types you are going to use.
For example, to store numbers Google will invite you to use NUMERIC types.
Let’s dive in the Google documentation: you have 3 types of numerics
- NUMERIC: 16 bytes
- FLOAT64: 8 bytes
- INT64: 8 bytes
I made a POC about two different types in order to check about elapsed time, IOs and CPU consumption
- I created 2 tables with a payment amount column (PAYMENT_AMT)
- In F_SALES_TRX_LINE_PAYMENT_LOAD_ETL_TYPES table I defined it as a FLOAT64
- In F_SALES_TRX_LINE_PAYMENT_LOAD_ETL table I defined it as a NUMERIC
- I loaded the table with the exact same data and the same number of records
Here is the result!
Comparing the duration
FLOAT64 Type
select sum(PAYMENT_AMT) from ade_test.F_SALES_TRX_LINE_PAYMENT_LOAD_ETL_TYPES; -- 1.4466987007999676E8
NUMERIC Type
select sum(PAYMENT_AMT) from ade_test.F_SALES_TRX_LINE_PAYMENT_LOAD_ETL --144669870.08
Duration analysis
Duration is almost the double for Numeric (258 – 153)
Money costs
FLOAT Type costs
NUMERIC Type costs
Google will charge the double amount to perform the same calculation and obtain the exact same result.
57Mb will be charged for NUMERIC, vs 29Mb for FLOAT64
About Precision
Naturally losing precision on operations
Float implies some differences according to IEEE.
If you execute
SELECT 4.35 * 100 a_float , CAST(4.35 AS NUMERIC) * 100 a_numeric
You will obtain
Workaround
A good workaround that will not imply any cost if to ROUND the float
Just put the right precision within the ROUND function
SELECT 4.35 * 100 Regular_Float , ROUND(4.35 * 100) Rounded_float , CAST(4.35 AS NUMERIC) * 100 Regular_numeric
SELECT (4.35 * 9.3 )* 100 Regular_Float , ROUND((4.35 *9.3 ) * 100) Rounded_float , ROUND( ((4.35 *9.3 ) * 100) ,2 ) Rounded_float_2 , CAST((4.35 *9.3 ) AS NUMERIC) * 100 Regular_numeric
Conclusion
By choosing the right data types, you will
- Drastically reduce the cost on your Google invoice
- Make your queries faster
- And participate in saving the planet 😉