How to Build a Web Scraping Pipeline in Python Using BeautifulSoup

Published: Mar 28, 2022

Authors: Yuan Yin and Mulugheta T. SOLOMON

Use Case: Extracting Information about Products from an Online Store

In this tutorial, you will learn how to:

  • Create a web scraping pipeline in Python
  • Navigate and parse HTML code
  • Use Beautiful Soup and Requests to fetch and extract data from websites
  • Go through multiple pages and avoid crashes by handling exceptions
  • Clean and store extracted data in a meaningful way
  • Build a mindset to mentally prepare for web scraping

Web scraping is the process of automating data extraction from websites. You can build a web scraper to pull specific content out of web pages, such as gathering book reviews from a third-party platform or downloading the lyrics of your favorite songs, or you can simply do it for fun.

Several popular tools are available for web scraping, such as Beautiful Soup, Scrapy, and Selenium. Beautiful Soup and Scrapy are both excellent starting points. Selenium is powerful for web automation, such as clicking a button or selecting an item from a menu, but it is a little trickier to use.

This tutorial focuses on Beautiful Soup and builds a web scraper step by step to extract information about books listed on the Books to Scrape website. Books to Scrape is a demo website built for web scraping practice, with the typical, well-organized structure of a retail website. We assume that you’re new to web scraping, so a static and stable website is a good choice for learning and practicing. Enjoy the journey!

We will go through three phases involving six steps: 

  • Phase 1 – Setup: identifying your scraping goal, exploring and inspecting the website, and installing and importing the necessary packages.
  • Phase 2 – Acquisition: accessing the website and parsing its HTML.
  • Phase 3 – Extraction and processing: extracting, cleaning, and storing the data of interest, and saving the final result.

Let’s start!

STEP 1: Identify Your Goal and Explore the Website of Interest

Yes, the initial step is not to open a Jupyter Notebook or your favorite IDE. You will start web scraping by clarifying your scraping goal and how the target website presents the information you want. In this tutorial, our goal is to extract information about products listed on a book store website, which may include category, book title, price, rating, availability, etc.

Book Store Website

By visiting the home page, you will find that your web scraper may be able to do a lot of things, such as: 

  • Filtering books by category
  • Getting the information you’re interested in, like book title and price
  • Going to the next page to load more books
  • Going to a single product page to get more detailed information about a book

STEP 2: Inspect Web Page’s HTML

HTML (Hypertext Markup Language) is the standard markup language for Web pages. With HTML, you can create your website or scrape existing websites.

It is fine to follow the tutorial without HTML background knowledge, as we will introduce the basic structure of HTML to make your work easier. You also can refer to Wikipedia or other resources for more information about HTML.

After exploring the website, it’s time to switch your role from customer to scraper and get familiar with the HTML structure of the target website. Right-clicking the item of interest and selecting ‘Inspect’ opens the Developer Tools, where you can see the HTML and expand, collapse, and even edit it.

Inspect Web Page’s HTML

Don’t be intimidated by the full HTML page if this is your first time inspecting a website. Be patient, and you will find what you want! In this example, all the books are listed under <ol class="row">. The HTML structure is the same for each book: under <li class="col-xs-6 col-sm-4 col-md-3 col-lg-3"> and then under <article class="product_pod">, there are several sub-sections containing the book image, rating, title, price, and so on. Here, <ol>, <li>, and <article> are HTML tags; they represent an ordered list, a list item, and an article, respectively. Note that most tags come in pairs: an opening tag such as <li> and a closing tag such as </li>.

Let’s take a closer look at the <h3> tag.

...
<ol class="row">
    <li class="col-xs-6 col-sm-4 col-md-3 col-lg-3">
        <article class="product_pod">
            <div class="image_container">...</div>
            <p class="star-rating Three">...</p>
            <h3>
                <a href="../../../the-secret-garden_413/index.html" title="The Secret Garden">The Secret Garden</a>
            </h3>
            <div class="product_price">...</div>
        </article>
    </li>
    <li class="col-xs-6 col-sm-4 col-md-3 col-lg-3">...</li>
    <li class="col-xs-6 col-sm-4 col-md-3 col-lg-3">...</li>
...

<h1> through <h6> define HTML headings. The <a> tag under <h3> defines a hyperlink, whose href attribute holds the URL of the product page. The book title also appears within <a>.

By right-clicking on the price and inspecting it, you will find the price information is located as follows:

...
<div class="product_price">
<p class="price_color">£15.08</p>
<p class="instock availability">
<i class="icon-ok"></i>
In stock
</p>
...

In addition to price, the in-stock status is also available here.

Right-clicking an item of interest and inspecting its HTML is the essential skill for fetching information from a website. The more you explore and inspect a website, the more familiar you become with its HTML, and the better your web scraper will work.

STEP 3: Install and Import Libraries

It’s coding time!

Your web scraping goal and the target website determine which libraries you need to install and import. Generally, you need the following tools:

  • Requests: lets you send HTTP requests to a website easily
  • Beautiful Soup: pulls data out of HTML files
  • Pandas: handles the extracted data (alternatives include csv, json, or any other library you prefer)

Optional:

  • re: a module in Python’s standard library for checking whether a string matches a specified search pattern
  • datetime: a module in Python’s standard library that supplies classes for manipulating dates and times

Install the beautifulsoup4, requests, and pandas packages first if you haven’t done so (Code Snippet 1), then import the modules (Code Snippet 2). re and datetime are part of Python’s standard library, so they can be imported directly without installation.

pip install beautifulsoup4
pip install requests
pip install pandas

Code Snippet 1

from bs4 import BeautifulSoup as soup
import requests 
import pandas as pd 
import re
import datetime

Code Snippet 2

The next two steps will help you construct the main body of your script, that is, retrieving the website and parsing its HTML, as well as extracting and storing data of interest.

STEP 4: Retrieve Website and Parse HTML

This step is extremely important but pretty easy to achieve; only a few lines of code are needed (Code Snippet 3):

# Identify the target website's address, i.e., URL
books_url = 'https://books.toscrape.com/index.html'
# Create a response object to get the web page's HTML content
get_url = requests.get(books_url)
# Create a beautiful soup object to parse HTML text with the help of the html.parser
books_soup = soup(get_url.text, 'html.parser')
# Check website's response
print(get_url)

Code Snippet 3

If running print(get_url) displays <Response [200]>, the server has accepted your request and returned the page. A 403 means the server understood your request but refuses to fulfill it. A 404 means the requested page could not be found, although it may become available again in the future.
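The numeric code itself is available on the response object as get_url.status_code. As a minimal sketch, the standard library's http.HTTPStatus can translate the codes discussed above into their standard reason phrases. The helper name describe_status is our own, chosen for illustration.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short description such as '200 OK' for an HTTP status code."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"

# The three codes mentioned in the text
for code in (200, 403, 404):
    print(describe_status(code))

# With Requests, the same number is available as get_url.status_code,
# and get_url.raise_for_status() raises an exception for 4xx/5xx responses.
```

In a scraper, checking the status code before parsing helps you avoid feeding an error page to Beautiful Soup.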

# Get some intuition by printing out HTML
# This step is not required to build a web scraper
print(get_url.text)
print(books_soup)
# Use prettify() method to make the HTML be nicely formatted
print(books_soup.prettify())

Code Snippet 4

If you print out the HTML text of the website (Code Snippet 4), you will notice that the output of print(get_url.text) looks like the HTML text you inspected in Step 2. Beautiful Soup makes the HTML content easy to access, and the prettify() method can be used to present the output nicely.

Now, you have successfully retrieved the target website and created a beautiful soup object to parse HTML of the website.

STEP 5: Extract, Clean, and Store Data

Step 5 is a bit more complicated and involves several sub-steps.

STEP 5.1: Get Ready By Answering Questions (Mental Preparation)

Before extracting data, you have to answer some questions, such as:

  • What data will you scrape?
  • How do you store and save the extracted data?
  • What is your workflow?
  • What are some problems you may encounter?

It is extremely important to take time to think about these questions before scraping. But where do you find the answers? Here are some hints:

  • Answer the first question based on your purpose and the content of your target website
  • Answer the second question based on the types of your extracted data (Number? Text? Image?) or your preference (Store data in a list? A dictionary? A list of dictionaries? Save data as a CSV file? JSON? Excel?)
  • Answer the third question based on the structure of your target website
  • Answer the fourth question as you try to answer the above three questions

Let’s go through the questions one by one. 

Q1: What data will you scrape?

A1: Suppose we will use the data extracted from the website for some particular purposes, including analyzing price volatility, tracking in-stock status, and comparing books of different categories. Based on these purposes, we consider scraping the following information: 

  • scraping date: Information such as price, rating, and in-stock status changes over time. You may want to do a time series analysis in the future, so you first need to record the date of scraping. How do you record a timestamp? (Note: Books to Scrape is a demo website; prices and ratings were randomly assigned and remain unchanged. That is not the case for real-world websites.)
  • book id: The unique ID that identifies a book. (Generally, books need an ISBN and non-book products need a UPC as an identifier, but in this case only a UPC is available. ¹) It can also be a unique product ID created by the website itself.

You can’t find a book’s UPC on the products list page. It’s on the single product page, inside a table. How do you get information from a table?

How to get information from a table
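One possible way to read such a table, sketched on a simplified stand-in for the product page: if each row holds a <th> label and a <td> value, the rows can be turned into a dictionary. The HTML below and its placeholder UPC are assumptions for illustration, not copied from the site.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical stand-in for the product information table
sample_html = """
<table class="table table-striped">
  <tr><th>UPC</th><td>placeholder-upc</td></tr>
  <tr><th>Availability</th><td>In stock (5 available)</td></tr>
</table>
"""

page = BeautifulSoup(sample_html, 'html.parser')

# Map each row's header cell to its data cell
product_info = {}
for row in page.find('table').find_all('tr'):
    label = row.find('th').text.strip()
    value = row.find('td').text.strip()
    product_info[label] = value

print(product_info['UPC'])
```

Storing the whole table as a dictionary lets you pick the fields you need later by label instead of by position.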

¹ “ISBN and UPC product code information – ECPA.” https://www.ecpa.org/page/Codes/ISBN-and-UPC-product-code-information.htm?page=businesssolutions. Accessed 15 Feb. 2022.

  • book title: No worries? Yes worries!

Book Title

Not all books show the complete title. If you inspect a book, you will find that the text of <a> is truncated, but the value of its title attribute is complete. How do you fetch exactly what you want?

<a href="../../../and-then-there-were-none_119/index.html" title="And Then There Were None">And Then There Were ...</a>
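As a small sketch of the difference, the element above can be parsed on its own: the text attribute returns the truncated text shown on the page, while get('title') returns the complete title. The variable names are ours.

```python
from bs4 import BeautifulSoup

link_html = ('<a href="../../../and-then-there-were-none_119/index.html" '
             'title="And Then There Were None">And Then There Were ...</a>')
link = BeautifulSoup(link_html, 'html.parser').find('a')

shown_text = link.text          # the truncated text displayed on the page
full_title = link.get('title')  # the complete title stored in the attribute

print(shown_text)
print(full_title)
```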

  • category: When you fetch categories from the website, you may find that ‘Books’ is also listed in the extracted data. You definitely don’t want ‘Books’ as a category, but how do you drop exactly what you don’t want?
  • price: Several prices are listed in the product information table above. Which price should be fetched? It’s up to you. In this tutorial, we will not extract the price from the table but use the single price located under the book title. Can we keep only the numeric part by removing the currency symbol? In this case, the currency symbol is £.
  • in stock: In-stock status may be either in stock or out of stock. You can also use Y and N to represent the status through some data cleaning. Is it necessary to take data cleaning into account when building a web scraper?
  • availability: As you can see in the product information table, availability is combined with in-stock status, like ‘In stock (5 available)’. But what if it is ‘Out of stock (0 available)’? How do you extract content that follows the same pattern in different contexts?
  • rating: The rating on the Books to Scrape website is odd: it is embedded in the class name, which is generally not where data lives. For example:

On Amazon, the rating is 4.8 out of 5 stars instead of a-icon-alt

<span class="a-icon-alt">4.8 out of 5 stars</span></i>

On Walmart, the rating is (4.6) instead of f7 rating-number

<span class="f7 rating-number">(4.6)</span>

But on Books to Scrape, the rating is Two, which is part of the class name

<p class="star-rating Two">

So, is there a silver bullet to handle all forms of websites?
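For this particular site, one workable approach is to read the class names and translate the word into a digit. The sketch below assumes a word-to-digit dictionary (named w_to_d here) and parses only the single tag shown above.

```python
from bs4 import BeautifulSoup

# Assumed lookup table that converts the rating word to a number
w_to_d = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}

rating_tag = BeautifulSoup('<p class="star-rating Two"></p>', 'html.parser').find('p')

# Beautiful Soup returns a multi-valued class attribute as a list of names
class_names = rating_tag.get('class')
# Pick the first class name that is a known rating word, or None if absent
rating = next((w_to_d[name] for name in class_names if name in w_to_d), None)

print(class_names)
print(rating)
```

Other websites store ratings differently, so this lookup is specific to markup shaped like the example.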

  • link: The unique web address, i.e., URL. 

Note: The URL is the string assigned to href in the <a> element, but it looks incomplete. For example:

<h3>
        <a href="../../../the-secret-garden_413/index.html" title="The Secret Garden">The Secret Garden</a>
</h3>

The unique web address

You can go to the specific book page by clicking the link directly in the inspector, but if you copy the element (right-click the HTML, then select Copy element) and paste the string into your notebook, you will find that it does not work. That is because the link in the <a> element is a relative path, that is, the location of the book’s page relative to the current page. The complete URL is https://books.toscrape.com/catalogue/the-secret-garden_413/index.html.

How do you make your scraped URL (the incomplete one) work well?
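One option, sketched here as an alternative to joining strings by hand, is urljoin from Python's standard library, which resolves a relative link against the URL of the page it was found on. The page_url below is an assumption for illustration; any products list page at the same depth resolves the same way.

```python
from urllib.parse import urljoin

# Assumed for illustration: the page on which the relative link was found
page_url = 'https://books.toscrape.com/catalogue/category/books/travel_2/index.html'
relative_href = '../../../the-secret-garden_413/index.html'

# Each '../' climbs one directory level up from the page's location
book_url = urljoin(page_url, relative_href)
print(book_url)
```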

After deciding what information to scrape, and of course, raising more questions, let’s go to the next one.
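Two of the cleaning questions raised above, removing the currency symbol and splitting the availability text, can be sketched with regular expressions. The function names and the Y/N convention are our assumptions; the sample strings come from the examples above.

```python
import re

def parse_price(price_text):
    """Keep only the numeric part of a price such as '£15.08'."""
    match = re.search(r'\d+(?:\.\d+)?', price_text)
    return float(match.group()) if match else None

def parse_availability(text):
    """Split 'In stock (5 available)' into an in-stock flag and a count."""
    in_stock = 'Y' if text.strip().lower().startswith('in stock') else 'N'
    match = re.search(r'\((\d+) available\)', text)
    count = int(match.group(1)) if match else 0
    return in_stock, count

print(parse_price('£15.08'))
print(parse_availability('In stock (5 available)'))
print(parse_availability('Out of stock (0 available)'))
```

Because both patterns look for the structure of the text rather than a fixed position, the same functions handle the in-stock and out-of-stock variants.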

Q2: How do you store and save the extracted data?

A2: Choose a meaningful way to store data. As you can imagine, in this case, the extracted data will be numbers or text, unless you want to get the image of a book. In the following work, we will build a list of dictionaries to store all the books. Why? Because it’s easy to append a dictionary, which contains the information of a book, to a list, just like how you put a real book into a shopping cart. And after extracting all the data, you can simply convert the list of dictionaries to a dataframe and save it as a CSV file. That is one of the meaningful ways we stated before. It is based on the types of your extracted data, your purpose, and sometimes, your preference.
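To make the idea concrete, here is a minimal sketch of the list-of-dictionaries pattern using the standard csv module and an in-memory buffer in place of a file. The records are hypothetical placeholders, and the field names are our own choice.

```python
import csv
import io

# Hypothetical records; each dictionary holds the information of one book
books = []
books.append({'title': 'Example Book A', 'price': 10.0, 'in_stock': 'Y'})
books.append({'title': 'Example Book B', 'price': 12.5, 'in_stock': 'N'})

# io.StringIO stands in for an open file so the sketch runs anywhere
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['title', 'price', 'in_stock'])
writer.writeheader()
writer.writerows(books)
csv_text = buffer.getvalue()
print(csv_text)

# With pandas, the equivalent would be:
# pd.DataFrame(books).to_csv('books.csv', index=False)
```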

Q3: What is your workflow?

A3: You are visiting the website’s home page, exploring it, and intending to get the information of all the books by category; you are not satisfied with the information listed on the products list page and want to get more. So we guess that your workflow would be: 

Find all categories (from the first to the last); go to the products list page of a category; scrape all the books listed on the current page, including each book’s detailed information from its single product page; and move on to the next page if the current page is not the last one. Done!

We can build several functions to go through the workflow.

def find_categories():
    # Find all categories
    pass

def fetch_books_by_category():
    # Fetch all the books under a category page by page
    pass

def fetch_current_page_books():
    # Fetch all the books listed on the current page
    # Build a dictionary to store extracted data
    # Append the dictionary to a list
    # Go to next page if current page is not the last one
    pass

def fetch_more_info():
    # Get detailed info about a book
    pass

def fetch_all_books():
    # Fetch all the books of all the categories
    # Return the list of dictionaries that contains all the extracted data
    return books_all

Code Snippet 5

Q4: What problems may you encounter?

A4: The questions raised above, and more! You don’t need to resolve all of them before you start extracting data (and it would be impossible anyway), but it is definitely good practice to consider them. You will explore the website, inspect and search HTML elements, find answers (and probably more questions), and write and test your code, going back and forth.

If this is your first web scraper, we advise you to begin with a small chunk, such as fetching categories only, to get some experience, which is also a good way to see if your script works well.

STEP 5.2 Start by Fetching a Single Variable

Let’s take a look at the HTML elements of categories.

...
<ul class="nav nav-list">
    <li>
        <a href="catalogue/category/books_1/index.html">Books</a>
        <ul>
            <li>
                <a href="catalogue/category/books/travel_2/index.html">Travel</a>
            </li>
            <li>...</li>
            <li>...</li>
            <li>...</li>
...

Books is under <ul class="nav nav-list"> and then under <li>. If we consider <ul class="nav nav-list"> the grandparent element and <li> the parent element, the <ul> (without a class name) under <li> can be treated as a child element. It has many grandchild elements, also <li> tags, and each of them represents a category.

Our purpose is to fetch the categories, excluding Books, so the most important thing here is to find the right element in books_soup, the Beautiful Soup object created in Code Snippet 3. A Beautiful Soup object contains all the HTML elements, which can be accessed with the find() or find_all() method.

# Find all the categories listed on the web page
# This step is used for testing and practicing, which can be skipped for the final scraper
categories = books_soup.find('ul', {'class': 'nav nav-list'}).find('li').find('ul').find_all('li')

Code Snippet 6

The find() method returns the first element that matches your criteria, and the find_all() method returns all matching elements. Here, we first find the <ul> element with the class name 'nav nav-list' (passed in a dictionary), then find <li> and <ul> in succession, and finally find all the <li> elements under that <ul>.

Print out categories and len(categories) to see what you have found. If we change the code to books_soup.find('ul', {'class': 'nav nav-list'}).find_all('li'), what will happen? Try it!
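If you want to check your answer offline, the comparison can be reproduced on a cut-down copy of the sidebar markup. This is only a sketch: the HTML keeps two categories, and the variable names are ours.

```python
from bs4 import BeautifulSoup

# Cut-down version of the sidebar markup, with only two categories kept
sidebar_html = """
<ul class="nav nav-list">
  <li>
    <a href="catalogue/category/books_1/index.html">Books</a>
    <ul>
      <li><a href="catalogue/category/books/travel_2/index.html">Travel</a></li>
      <li><a href="catalogue/category/books/mystery_3/index.html">Mystery</a></li>
    </ul>
  </li>
</ul>
"""
sidebar = BeautifulSoup(sidebar_html, 'html.parser')
nav = sidebar.find('ul', {'class': 'nav nav-list'})

# Only the <li> elements inside the inner <ul>
inner_only = nav.find('li').find('ul').find_all('li')
# find_all() searches all descendants, so the outer <li> is matched too
every_li = nav.find_all('li')

inner_names = [li.find('a').text.strip() for li in inner_only]
all_names = [li.find('a').text.strip() for li in every_li]
print(inner_names)
print(all_names)
```

The shorter call returns one extra element, the outer <li> whose first link is Books, which is exactly the entry we want to exclude.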

For a single category, the text between the <a> tags is the category name, which can be obtained with the text attribute. The value of href is the category’s URL, which can be extracted with the get() method. We will loop through categories to fetch the name and URL of each category.

# Base URL used to complete the relative category links
base_url_for_category = 'https://books.toscrape.com/'
# Loop through categories
for category in categories:
    # Get category name by extracting the text part of <a> element
    # Strip the spaces before and after the name
    category_name = category.find('a').text.strip()
    # Get the URL, which leads to the products list page under the category
    category_url_relative = category.find('a').get('href')
    # Complete category's URL by adding the base URL
    category_url = base_url_for_category + category_url_relative
    print(f"{category_name}'s URL is: {category_url}")

Code Snippet 7

Did you notice base_url_for_category? What is it for? The URL you extract is "catalogue/category/books/travel_2/index.html", which is a relative URL and does not lead to a web page on its own. The absolute URL is "https://books.toscrape.com/catalogue/category/books/travel_2/index.html". So here, we complete the link by assigning "https://books.toscrape.com/" (note the trailing slash) to base_url_for_category and prepending it to category_url_relative.

The partial result of running the for loop is:

Travel's URL is: https://books.toscrape.com/catalogue/category/books/travel_2/index.html
Mystery's URL is: https://books.toscrape.com/catalogue/category/books/mystery_3/index.html
Historical Fiction's URL is: https://books.toscrape.com/catalogue/category/books/historical-fiction_4/index.html

Up to now, you have successfully fetched all the categories, each with its name and the URL of its products list page. Next, we are going to fetch all the items.

STEP 5.3 Fetch All the Items through Searching for HTML, Extracting information, Cleaning and Storing Data

Be prepared by getting some variables ready.

# Identify base URL
base_url_for_category = 'https://books.toscrape.com/'
base_url_for_book = 'https://books.toscrape.com/catalogue'
# Get the date of scraping
scraping_date = datetime.date.today()
# Create a dictionary to convert words to digits
# We will use it when fetching rating
w_to_d = {'One': 1,
'Two': 2,
'Three': 3,
'Four': 4,
'Five': 5
}
# Create a list to store all the extracted items
books_all = []

Code Snippet 8

Can you recall the workflow and the functions we mentioned above (Code Snippet 5)? Please look back if you can’t.

Let’s start by building the last function, i.e.,  fetch_all_books() to integrate the workflow. To run this function, we need to input the books_soup created before, which contains all the HTML information in text.

def fetch_all_books(soup):
    # Fetch all the books information
    # Return books_all, a list of dictionary that contains all the extracted data
    
    # Find all the categories by running find_categories() function
    categories = find_categories(soup)
    # Loop through categories
    for category in categories:
        # Fetch product by category
        # Within the fetch_books_by_category function, we will scrape products page by page        
        category_name = category.find('a').text.strip()
        fetch_books_by_category(category_name, category)
        
    return books_all    

Code Snippet 9

It’s easy to find categories as we have done before.

def find_categories(soup):
    # Find all the categories in the Beautiful Soup object passed in
    categories = soup.find('ul', {'class': 'nav nav-list'}).find('li').find('ul').find_all('li')
    return categories

Code Snippet 10

Next, we will fetch books under a single category page by page. Sometimes, it’s a bit tricky to get all the books under a category since there may be one or more pages. We can scrape the next page only if it exists! 

But how can we figure out if the current page is the last page or not? Inspect the next button!

Inspect Web

We take the Fiction category as an example. There are a total of 4 pages.  When you inspect the next button on page 1 through page 3, you can find the next page’s URL, which is under <li class="next">.

<li class="next">
    <a href="page-4.html" style="">next</a>
</li>

But the last page, i.e., page 4, has no next button. So we can let our web scraper try to find <li class="next"> and its <a>. If it succeeds, go to the next page and fetch its products; if it fails, stop fetching and switch to the next category.

One thing we want to highlight here is that you will write the following three lines of code every time you retrieve a web page, whether it is the website’s home page, any products list page under a category, or a single product page. Each time, you identify the URL of the target page, create a response object to get the page’s HTML content, and create a Beautiful Soup object to parse the HTML.

# Identify the target website's address
web_page_url = 'https...'
# Create a response object to get the web page's HTML content
get_url = requests.get(web_page_url)
# Create a beautiful soup object to parse HTML text with the help of the html.parser
soup = BeautifulSoup(get_url.text, 'html.parser')
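Because these three lines repeat so often, one option is to wrap them in a small helper. The sketch below is an assumption for illustration, not part of the original scraper: the name get_soup, the fetch parameter, and the 10-second timeout are all hypothetical choices.

```python
import requests
from bs4 import BeautifulSoup

def get_soup(url, fetch=requests.get, timeout=10):
    """Fetch a page and return it parsed with html.parser."""
    response = fetch(url, timeout=timeout)
    # Stop early on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return BeautifulSoup(response.text, 'html.parser')

# Example usage (performs a real request when uncommented):
# soup = get_soup('https://books.toscrape.com/')
```

Passing the fetch function as a parameter makes it possible to try the helper with a stand-in response object, without touching the network.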

Now, we can create a function (Code Snippet 11) to fetch all the books under a category, page by page. Note: this function runs inside the for loop of the fetch_all_books() function (Code Snippet 9).

def fetch_books_by_category(category_name, category):
    # Fetch books by category
    # Scrape all the books listed on one page
    # Go to next page if current page is not the last page
    # Break the loop at the last page

    # Get category URL, i.e., the link to the first page of books under the category
    books_page_url = base_url_for_category + category.find('a').get('href')
    # Scrape books page by page only when the next page is available
    while True:
        # Retrieve the products list page's HTML
        get_current_page = requests.get(books_page_url)
        # Create a beautiful soup object for the current page
        current_page_soup = BeautifulSoup(get_current_page.text, 'html.parser')
        # Run fetch_current_page_books function to get all the products listed on the current page
        fetch_current_page_books(category_name, current_page_soup)
        # Search for the next page's URL
        # Get the next page's URL if the current page is not the last page
        try:
            find_next_page_url = current_page_soup.find('li', {'class':'next'}).find('a').get('href')
            # Find the index of the last '/'
            index = books_page_url.rfind('/')
            # Skip the string after the last '/' and add the next page url
            books_page_url = books_page_url[:index+1].strip() + find_next_page_url
        except AttributeError:
            # No <li class="next"> on the last page, so stop paginating
            break

Code Snippet 11

When the web scraper reaches the last page, there is no <li class="next"> to find, so find('li', {'class':'next'}) returns None. As a result, current_page_soup.find('li', {'class':'next'}).find('a') raises an AttributeError: ‘NoneType’ object has no attribute ‘find’.

We definitely don’t want the web scraper to crash just because of an avoidable error, so exception handling is very important here. When the try block raises an error, Python skips the rest of that block and runs the code under except instead. In the case above, once the nonexistent element raises an error, the web scraper breaks out of the while loop. You can do whatever you need in the except block.
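To see the mechanism in isolation, the sketch below reproduces the same error with None standing in for the missing element. It catches only AttributeError, so unrelated failures would still surface. The function name href_or_none is a hypothetical helper used for illustration.

```python
def href_or_none(next_li):
    """Return the href of the <a> inside next_li, or None if next_li is missing."""
    try:
        return next_li.find('a').get('href')
    except AttributeError as error:
        # Raised when next_li is None, e.g. on the last page
        print(f'No next page: {error}')
        return None

# None is what find() returns when no matching element exists
result = href_or_none(None)
print(result)
```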

Besides, within the try block, we acquire the next page’s URL by modifying the current page’s URL. You can get some hints by comparing them. For example: 

The URL of a landing page, i.e., the first page under a category,  is

https://books.toscrape.com/catalogue/category/books/fiction_10/index.html

The second page’s absolute URL is

https://books.toscrape.com/catalogue/category/books/fiction_10/page-2.html

The second page’s relative URL, i.e., the incomplete one, is

page-2.html

What we do above is to change the last part of the URL. 
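The same URL can also be built with urljoin from Python’s standard library, which resolves a relative URL against a base URL. The sketch below compares it with the string slicing used in the scraper; the variable names are chosen for illustration only.

```python
from urllib.parse import urljoin

current_page_url = 'https://books.toscrape.com/catalogue/category/books/fiction_10/index.html'
relative_next_url = 'page-2.html'

# Approach used in the scraper: cut after the last '/' and append
index = current_page_url.rfind('/')
manual_url = current_page_url[:index + 1] + relative_next_url

# Standard-library alternative: resolve the relative URL against the current one
joined_url = urljoin(current_page_url, relative_next_url)

print(manual_url)
print(joined_url)
```

Both approaches give the second page’s absolute URL shown above.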

The largest part of the scraper is the function that fetches all the books on one page (Code Snippet 12). It collects most of the information we want about each book. Every piece of code in this function has been discussed before.

def fetch_current_page_books(category_name, current_page_soup):
    # Fetch all the books listed on the current page
    # Build a dictionary to store extracted data
    # Append book information to the books_all list

    # Find all products listed on the current page
    # Here, we don’t need to identify the class name of <li> (Do you know why?)
    current_page_books = current_page_soup.find('ol', {'class':'row'}).find_all('li')

    # Loop through the products
    for book in current_page_books:
        # Extract book info of interest

        # Get book title
        # Replace get('title') with text to see what will happen
        title = book.find('h3').find('a').get('title').strip()

        # Get book price
        price = book.find('p', {'class':'price_color'}).text.strip()

        # Get in stock info
        instock = book.find('p', {'class': 'instock availability'}).text.strip()

        # Get rating
        # get('class') returns a list, ['star-rating', 'Two'], so we take the second item to extract the rating only
        rating_in_words = book.find('p').get('class')[1]
        rating = w_to_d[rating_in_words]

        # Get link
        link = book.find('h3').find('a').get('href').strip()
        link = base_url_for_book + link.replace('../../..', '')

        # Get more info about a book by running fetch_more_info function
        product_info = fetch_more_info(link)

        # Create a book dictionary to store the book’s info
        book = {
            'scraping_date': scraping_date,
            'book_title': title,
            'category': category_name,
            'price': price,
            'rating': rating,
            'instock': instock,
            # Suppose we’re interested in availability and UPC only
            'availability': product_info['Availability'],
            'UPC': product_info['UPC'],
            'link': link
        }
        # Append book dictionary to books_all list
        books_all.append(book)

Code Snippet 12

We do some data cleaning in this function. For example, we use the strip() method to remove whitespace at the beginning and end of a string, and the w_to_d dictionary created earlier converts ratings from words, i.e., One, Two, Three, Four, and Five, to digits, i.e., 1, 2, 3, 4, and 5. This simple cleaning makes your scraped data ready to use.
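As a minimal illustration of these two cleaning steps, the sketch below applies them to hand-written sample values. The contents of w_to_d are assumed from the mapping described above, and the raw strings are made up to mimic what the parser might return.

```python
# Assumed contents of w_to_d, based on the mapping described above
w_to_d = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}

# Made-up raw values that mimic text and class attributes taken from a page
raw_instock = '\n    In stock\n'
class_list = ['star-rating', 'Two']

# strip() removes the surrounding whitespace and line breaks
instock = raw_instock.strip()
# The second item of the class list is the rating in words
rating = w_to_d[class_list[1]]

print(repr(instock), rating)
```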

You may be wondering why we did not strip the currency symbol £ from the price. We can, and it’s better to do so. But first we need to confirm that all the prices are in GBP. If they are not, the extracted digit-only prices will be misleading.
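One way to combine the currency check and the conversion is sketched below. The function name parse_gbp_price and the sample price are hypothetical; the idea is simply to refuse any price that does not start with £ rather than silently dropping the symbol.

```python
from decimal import Decimal

def parse_gbp_price(price_text):
    """Convert a string like '£51.77' to Decimal, refusing other currencies."""
    cleaned = price_text.strip()
    if not cleaned.startswith('£'):
        raise ValueError(f'Unexpected currency in price: {cleaned!r}')
    return Decimal(cleaned[1:])

# Made-up sample price
print(parse_gbp_price(' £51.77 '))
```

Decimal is used here instead of float to avoid rounding surprises with money; that is a design choice of this sketch, not something the tutorial requires.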

Last but not least is the fetch_more_info() function. Its main purpose is to fetch the product information table on the single product page. A web scraper often needs to follow links to deeper pages like this to fetch more information.

def fetch_more_info(link):
    # Go to the single product page to get more info

    # Get url of the web page
    get_url = requests.get(link)
    # Create a beautiful soup object for the book
    book_soup = BeautifulSoup(get_url.text, 'html.parser')

    # Find the product information table
    book_table = book_soup.find('table',{'class':'table table-striped'}).find_all('tr')
    # Build a dictionary to store the information in the table
    product_info = {}
    # Loop through the table
    for info in book_table:
        # Use header cell as key
        key = info.find('th').text.strip()
        # Use cell as value
        value = info.find('td').text.strip()
        product_info[key] = value

    # Extract number from availability using Regular Expressions
    text = product_info['Availability']
    # reassign the number to availability
    product_info['Availability'] = re.findall(r'(\d+)', text)[0]

    return product_info

We use Regular Expressions to extract the number of available books. Regular Expressions can find strings that match a particular pattern in text. Here, r'(\d+)' matches one or more digits. We can find 12 in ‘In stock (12 available)’, and we can also find 0 in ‘Out of stock (0 available)’.
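The pattern can be tried on the two example strings directly. The sketch below is a small variation on the scraper’s code, under stated assumptions: it uses re.search instead of re.findall, converts the result to an integer, and returns None when the text contains no digits. The name extract_available is hypothetical.

```python
import re

def extract_available(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None

for sample in ['In stock (12 available)', 'Out of stock (0 available)', 'In stock']:
    print(sample, '->', extract_available(sample))
```

Handling the no-digit case matters because indexing the result of re.findall with [0] raises an IndexError when nothing matches.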

We have got all we want by running the five functions. And now, we can save the result with joy.

STEP 6: Save File

At the end of scraping, all the extracted data is stored in a list of dictionaries, which can be saved as a csv, json, or excel file as you prefer (Code Snippet 13). It’s better to remove the duplicates at the same time.

def output(books_list):
    # Convert the list with scraped data to a data frame, drop the duplicates, and save the output as a csv file

    # Convert the list to a data frame, drop the duplicates
    books_df = pd.DataFrame(books_list).drop_duplicates()
    print(f'There are {len(books_df)} books in total.')
    # Save the output as a csv file
    books_df.to_csv(f'books_scraper_{scraping_date}.csv', index = False)

Code Snippet 13
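The other formats follow the same pattern. As a minimal sketch without pandas, the standard library can drop duplicate dictionaries and produce JSON. The sample records and the name dedupe_books are made up for illustration, and the approach assumes every value in a record is hashable (strings and numbers are).

```python
import json

def dedupe_books(books_list):
    """Drop exact duplicate dictionaries while keeping the original order."""
    seen = set()
    unique = []
    for book in books_list:
        # A sorted tuple of items identifies a record regardless of key order
        key = tuple(sorted(book.items()))
        if key not in seen:
            seen.add(key)
            unique.append(book)
    return unique

# Made-up records, not real scraped data
sample_books = [
    {'book_title': 'Book A', 'UPC': '111'},
    {'book_title': 'Book A', 'UPC': '111'},
    {'book_title': 'Book B', 'UPC': '222'},
]

unique_books = dedupe_books(sample_books)
json_text = json.dumps(unique_books, ensure_ascii=False, indent=2)
print(json_text)
```

To save the result, the JSON text can be written to a file opened with open(..., 'w', encoding='utf-8').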

Take a look at the final result:

Result

Summary:

In this tutorial, we built a web scraper in Python using Beautiful Soup and requests. The steps are as follows:

  • Step 1: Identify your goal and explore the website of interest
  • Step 2: Inspect web page’s HTML
  • Step 3: Install and import libraries
  • Step 4: Retrieve website and parse HTML
  • Step 5: Extract, clean, and store data
    • Step 5.1: Get ready by answering questions
    • Step 5.2: Start by fetching a single variable
    • Step 5.3: Fetch all the items through searching for HTML, extracting information, cleaning and storing data
  • Step 6: Save File

In addition to the pipeline and fundamental skills, we also emphasize the way of thinking, that is:

  • Ask yourself a bunch of questions before scraping

Some tips for web scraping:

  • Real-world websites change constantly. After practicing your scraping skills on a stable demo website, it’s time to move on to real-world websites. Keep in mind that it’s normal for a web scraper that worked well yesterday to crash today because the website changed. Fortunately, most changes are small, and you only need to modify your scraper a little.
  • Each website is unique. If you want to scrape products from different retail websites, you have to build a customized web scraper for each retailer. 
  • Modern websites are usually dynamic and tailored to the client’s browser. When interacting with such websites, web scrapers also need to be smart enough. For example, a Selenium-equipped web scraper can automatically accept or decline cookies, maximize the window or scroll down to display all the content, and so on.

Learn by doing and good luck! 


Omdena

relational_field_item_text_shadow_colorrgba(0,0,0,0.4)
button_text_shadow_stylenone
button_text_shadow_horizontal_length0em
button_text_shadow_vertical_length0em
button_text_shadow_blur_strength0em
button_text_shadow_colorrgba(0,0,0,0.4)
box_shadow_stylenone
box_shadow_colorrgba(0,0,0,0.3)
box_shadow_positionouter
box_shadow_style_buttonnone
box_shadow_color_buttonrgba(0,0,0,0.3)
box_shadow_position_buttonouter
text_shadow_stylenone
text_shadow_horizontal_length0em
text_shadow_vertical_length0em
text_shadow_blur_strength0em
text_shadow_colorrgba(0,0,0,0.4)
disabledoff
global_colors_info{}
custom_css_main_element_last_editedon|phone
custom_css_main_element_tabletdisplay:block;
custom_css_main_element_phonedisplay:block;

Execution time: 0.0011 seconds
