Working with ParseHub, BoW, and TF-IDF

  1. First, download and install ParseHub. We will use this web scraper for this project.
  2. Open ParseHub and click on "New Project." On the landing page, paste the URL of the Minecraft game reviews results page into the field "Enter a website you'd like to extract data from." The page will now be rendered inside the ParseHub app. Click on "Start a project from this URL."
  3. Once the site is rendered, click on the reviews section, i.e. the parent reviews (99 reviews in all). Next, click on the PLUS(+) sign next to the page selection, choose the Select command, and click on a reviewer's age. The age you've clicked will turn green to indicate that it's been selected, and the rest of the ages will be highlighted in yellow. Click on another age to select the remaining ages on the page.
  4. On the left sidebar, rename your selection to Reviewers_age. You will notice that ParseHub has now extracted the age.
  5. Click the PLUS(+) sign next to the Reviewers_age selection and choose the Relative Select command on the left sidebar.
  6. Using the Relative Select command, click on the first age (say, age 6+) on the page and then on its review. You will see an arrow connecting the two selections.
  7. Repeat steps 5 and 6 to extract any other reviewer information, such as the rating. Make sure to rename your new selections accordingly.
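
To picture what the Relative Select pairing produces, here is a minimal Python sketch that flattens a results file into (age, review) rows. The JSON shape is an assumption: only the Reviewers_age key comes from the steps above, while the "name" and "review" keys and the sample text are hypothetical placeholders, so check them against your own export.

```python
import json

# Hypothetical sample of exported results. Only "Reviewers_age" comes from
# the project above; "name", "review", and the texts are placeholders.
sample_export = """
{
  "Reviewers_age": [
    {"name": "age 6+", "review": "first placeholder review"},
    {"name": "age 8+", "review": "second placeholder review"},
    {"name": "age 10+"}
  ]
}
"""


def flatten_reviews(export_text):
    """Return a list of (age, review) pairs, skipping entries with no review."""
    data = json.loads(export_text)
    rows = []
    for entry in data.get("Reviewers_age", []):
        age = entry.get("name", "").strip()
        review = entry.get("review", "").strip()
        if age and review:
            rows.append((age, review))
    return rows


rows = flatten_reviews(sample_export)
for age, review in rows:
    print(f"{age}: {review}")
```

Each row keeps an age together with the review it was linked to, which is the same pairing the arrow in ParseHub represents.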

Adding Pagination

For any project, you may want to scrape numerous pages of data. So far, we’ve simply scraped the first page of Minecraft review results. Let’s use ParseHub to browse the next 20 review pages.

  1. Go to the left sidebar.
  2. Click on the PLUS(+) sign next to the page selection and choose the Select command.
  3. Then select the Next page link at the bottom of the Minecraft reviews page. Rename the selection to next_button.
  4. By default, ParseHub will extract the text and URL from this link, so expand your new next_button selection and remove these two commands.
  5. Now, click on the PLUS(+) sign of your next_button selection and use the Click command.
  6. A pop-up will appear asking if this is a "Next" link. Click Yes and enter the number of pages you'd like to navigate to. In this case, we will scrape 20 additional pages.
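
The pagination steps above amount to "follow the Next link until a page limit is reached." The sketch below illustrates that logic in plain Python. It is not how ParseHub works internally: fetch_page is a hypothetical callable, and the fake site exists only to make the example runnable.

```python
def scrape_with_pagination(fetch_page, start_url, max_extra_pages=20):
    """Collect items from start_url and up to max_extra_pages further pages.

    fetch_page(url) must return (items, next_url), where next_url is None
    on the last page. It is a hypothetical callable, not a ParseHub API.
    """
    collected = []
    url = start_url
    pages_followed = 0
    while url is not None:
        items, next_url = fetch_page(url)
        collected.extend(items)
        # Stop once the allowed number of "Next" clicks has been used up.
        if pages_followed >= max_extra_pages:
            break
        url = next_url
        pages_followed += 1
    return collected


# A made-up site of 30 pages, one placeholder review per page.
fake_site = {
    f"page-{i}": ([f"review-{i}"], f"page-{i + 1}" if i < 29 else None)
    for i in range(30)
}


def fake_fetch(url):
    return fake_site[url]


reviews = scrape_with_pagination(fake_fetch, "page-0", max_extra_pages=20)
print(len(reviews))  # 21: the first page plus 20 additional pages
```

The limit counts additional pages only, so the first page is always scraped, and the loop also stops early if a page has no Next link.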

Ayanlowo Babatunde

Industrial Engineer with interests in Machine learning/Robotics/IOT