Jacqui W

Imagine you are a customer of a cloud provider and you want to spin up some machines. You can go into a web console, fill in some forms, click some buttons, and launch an instance. But you can also use Terraform.

Terraform allows you to do the same…


What is CloudFormation?

CloudFormation is a tool that can spin up resources on AWS. If someone wants to be an AWS expert, CloudFormation is an essential service to master.

Before we jump into writing a CloudFormation template, let’s briefly review how AWS infrastructure was managed before CloudFormation.

Without CloudFormation, automating…


As we know, the first 70% of SQL is pretty straightforward, but the remaining 30% can be pretty tricky.

So this post covers some popular hard SQL interview questions to help people sharpen their skills.

Self-Join Practice Problems

Part 1: How much a key metric, e.g., monthly active users, changes…


As a BI Analyst at an online travel agency, turning customer behavior data into meaningful insights is a business-as-usual task for me.

Google Analytics is a popular web analytics service that tracks website traffic. …


The maximum length of a Google Analytics payload is 8192 bytes. It is useful to check if you are approaching this value with some of your hits because if the payload length exceeds this, the hit is never sent to GA.

How can we know the payload size with each hit?
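
One rough way to estimate it offline is sketched below, assuming the payload is the URL-encoded query string of a hit. The helper names `payload_size` and `is_near_limit`, the 90% warning threshold, and the sample parameter values are illustrative assumptions, not part of any GA tooling.

```python
from urllib.parse import urlencode

# Limit stated in the text above.
GA_MAX_PAYLOAD_BYTES = 8192


def payload_size(params: dict) -> int:
    """Return the size in bytes of the URL-encoded hit payload."""
    return len(urlencode(params).encode("utf-8"))


def is_near_limit(params: dict, threshold: float = 0.9) -> bool:
    """True when the payload uses at least `threshold` of the limit (assumed 90%)."""
    return payload_size(params) >= threshold * GA_MAX_PAYLOAD_BYTES


# Sample hit with placeholder values (hypothetical, for illustration only).
hit = {"v": "1", "t": "pageview", "tid": "UA-XXXXX-Y", "cid": "555", "dp": "/home"}
print(payload_size(hit), is_near_limit(hit))
```

This only measures the encoded parameters you pass in, so treat the result as an estimate rather than the exact size of what the browser sends.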

Today I will show you how to send the…


Spark is built to handle massive volumes of data across many use cases. It is an open-source Apache project.

Spark can use data stored in a variety of formats, including Parquet files.

What is Spark?

Spark is a general-purpose distributed data processing engine that is suitable…


The first question is: why do we need data integration?

Let me give you an example here to answer this question.

Every company has many departments, and different departments use different tools to store their data. For example, the marketing team may use HubSpot.

So within one company, different departments store different types of data.

However, insightful…


What is cloud computing?

Cloud computing is the practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer.

On-premise:

  • You own the servers
  • You hire the IT people
  • You pay for or rent the real estate
  • You take all the risk

Cloud…


In my last post, I wrote about why we need a data warehouse.

First, what is a data warehouse?

It is a centralized relational database that pulls together data from different sources (CRM, marketing stack, etc.) for better business insights.

It stores current and historical data that are used for reporting and analysis.

However, here is the problem:

How can we design a data warehouse?

1 Define…

A current Data Analyst in a subsidiary under Webjet, with experience in applying data science techniques to business.
