As we know, the first 70% of SQL is pretty straightforward but the remaining 30% can be pretty tricky.
So, in this blog, some popular hard SQL interview questions will be covered for people to sharpen their skills.
Part 1: How much a key metric, e.g., monthly active users, changes between months, e.g., a table named ‘logins’ is shown as below.
Q: find the month-over-month percentage change for monthly active users
WITH mau AS
DATE_TRUNC('month', date) AS month_timestamp,
COUNT(DISTINCT user_id) AS mau
GROUP BY DATE_TRUNC('month', date)
a.month_timestamp AS previous_month,
a.mau AS previous_mau,
b.month_timestamp AS current_month,
b.mau AS current_mau,
ROUND(100.0*(b.mau - a.mau)/a.mau,2) AS percent_change
FROM mau a
JOIN mau b
ON a.month_timestamp = b.month_timestamp …
As a BI Analyst working in an Online Travelling Agency company, interpreting customer behaviors data into meaningful insights is a Business As Usual task.
Google Analytics is a popular web analytics service tracking website traffic. Therefore, for BI&Reporting team, how to interpret Google Analytics data seems to be an essential skill.
In the technical side, standard SQL can be used in Google BigQuery(cloud-based data warehousing platform) to generate data insights from Google Analytics.
Let’s take some query samples to have a look. Firstly, some dynamic values need to be understood before we write the first query:
The maximum length of a Google Analytics payload is 8192 bytes. It is useful to check if you are approaching this value with some of your hits because if the payload length exceeds this, the hit is never sent to GA.
Today i will show you how to send the payload size as a custom dimension to GA with each hit. The tool is Google Tag Manager.
Before starting, creating a new hit-scoped custom dimension in GA is essential, named ‘Hit Payload Length’ and check its index, which will be used in the next step.
Then, create a custom task in GTM, the custom task is to let users modify the request sent to GA before it is sent. We can take Client ID as an example. …
Due to the massive volume of data, Spark is built to handle big data in many user cases. It is an open source project on Apache.
Spark can use data stored in a variety of formats, including parquet files.
Spark is a general-purpose distributed data processing engine that is suitable for use.
On top of the Spark core data processing engine, there are libraries for SQL, machine learning, etc. Tasks most frequently associated with Spark include ETL and SQL batch jobs across large data sets.
It has an extensive set of developer libraries and APIs and supports languages such as Java, Python, R, and Scala. Its flexibility makes it well-suited for a range of use cases, for this blog, we will just talk about data integration. Python is revealed the Spark programming model to work with structured data by the Spark Python API which is called as PySpark. …
Let me give you an example here to answer this question.
Every company has many departments, and different departments use different tools to store their data. For example, marketing team may use hubspot tool.
Now, we have different departments which store different types of data in a company.
However, insightful information is needed to make business decisions through those large amount of data.
What can we do?
Maybe we can connect all the databases everytime to generate reports. However, it will cost us large amount of time, then the term of data integration is raised.
Data integration is a process in which heterogeneous data is retrieved and combined as an incorporated form and structure. …
The practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer.
Different kinds of hosting:
Last blog I wrote why we need a Data Warehouse.
It is a centralized relational database that pulls together data from different sources (CRM, marketing stack, etc.) for better business insights.
It stores current and historical data are used for reporting and analysis.
However, here is the problem:
1 Define Business Requirements
Because it touches all areas of a company, all departments need to be onboard with the design. Each department needs to understand what the benefits of data warehouse and what results they can expect from it.
What objectives we can focus on:
Last Blog I demonstrated the data pipeline we can use CrUX to analyze the site performance. This is from a BI developer perspective.
However, for a company, especially the leadership team, what they want is the final dashboard that generated from BI department, so management plan can be gained.
So this blog I will show you what kind of dashboard we can generate after the steps of data collection from Google public dataset and ETL. …
Last blog I gave some examples of how we can use the Chrome User Experience report (CrUX) to gain some insights about site speed. This blog I will continue to show you how to use bigquery to compare your site with the competitors.
chrome-ux-reportdataset and explore the way the tables are structured in 'preview'