How to do web scrapping in Power Query in Power BI?

On web, there are public data sourse which can be imported and transformed to make insight reports or dashboards.

There are two circumstances:

For example, we can try to make an easy one.

Here is the web link: https://en.wikipedia.org/wiki/Demographics_of_China. What we want is to import the tables from a Wikipedia webpage.

The simple steps are as follow:

Open Power BI Desktop->Click Get Data->Click Web-> Paste the URL in the dialog and click OK.

So, there is the screenshot:

Image for post
Image for post

In Navigatot dialog, we can select the corresponding table that we want to import and transform it.

This kind of importing from web in Power BI is simple. Because Power Query can identify the table in HTML and tables are in the <table>…</table> tags.

If we meet this kind of circumstance, we will end up seeing the Document entity in the Navigator.

Next blog, we will demonstrate how to extract the relevant elements in HTML with Power Query in Power BI.

The tricky part id how to find the correct path in the HTML source tree, which needs you to have a basic but not prior HTML knowledge.

To motivate you to keep reading, here is the final web scarping screenshot which i made recently:

Image for post
Image for post

If you are interested in or have any problems with Power BI or Power Query, feel free to contact me.

Or you can connect with me through my LinkedIn.

Originally published at http://jacquiwu.com on February 3, 2020.

Written by

A current Data Analyst in a subsidiary under Webjet, with experience in applying data science techniques to business.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store