Web scraping involves extracting data from websites and converting it into structured, usable information. Modern websites employ sophisticated bot detection mechanisms that identify automated scraping through unnatural request patterns, nonhuman interaction behaviors, and suspicious client signals. To bypass these protections, developers can leverage specialized scraping infrastructure like Evomi's Scraper API and residential proxy services, which route requests through rotating IP addresses to appear as legitimate human users. This enables successful data extraction from challenging targets like Amazon, where standard automation tools fail. The extracted data can then be stored in databases and displayed through full-stack web applications using technologies like Node.js, React, and MongoDB.
Deep Dive
Web Scraping with Python & JavaScript – MERN Stack Full Course
Welcome to this comprehensive course on modern web scraping using Python and the MERN stack. You will learn to bypass sophisticated bot detection using Evomi's scraper API and scraping browser to extract data from high-value targets like Amazon. The course progresses from simple scripts to building a production-ready full-stack application featuring Playwright, Cheerio, and Vite.
By the end of this course, you'll be able to store, parse, and display real-time data with professional efficiency. Gavin Lon developed this course. And thanks to Evomi for providing a grant to make this course possible. Hi and welcome. I'm Gavin Lon. I'm very excited to bring you this course on web scraping, a powerful skill that allows you to automatically extract data from websites and turn it into structured, usable information. In the world of software engineering, we often say that algorithms are like the engine, but data is the fuel. However, as we move deeper into a world dominated by AI, that analogy has evolved. Data isn't just fuel. It is the architectural blueprint that defines what an AI can and cannot do. In today's data-driven world, vast amounts of valuable information are published online every second. Whether you're interested in tracking prices, analyzing trends, gathering research data, or building intelligent applications, web scraping enables you to collect this data efficiently and at scale.
To provide this course with a real-world context, we are going to create a full-stack web application using the MERN stack, with Node.js on the back end, React on the front end, and MongoDB as our data storage facility. This app will be used to scrape and capture data from the TIOBE Index and Amazon.
When you run the completed app, you'll see the data scraped, captured, and displayed on your own React front end in real time. Unfortunately, automating the scraping of valuable data from the web is not always an easy task. This is because websites that contain publicly available, valuable information will, I would say in most cases, have bot detection mechanisms that can prevent your code from scraping and capturing the data from the targeted web pages.
So the question is, how can we get around these bot detection mechanisms to capture the desired data from the targeted web pages? Evomi provides an infrastructure that we can leverage remotely from our code to get over the bot detection hurdles that prevent us from capturing the desired data from the web. So in our full-stack web application, we are going to leverage tools provided in two of Evomi's subscription plans. The first plan is called the scraper API, which in most cases will enable us to get past any bot detection mechanisms and scrape our targeted websites.
We'll use the scraper API to scrape the TIOBE Index. However, Amazon is notoriously difficult to scrape due to rather formidable bot detection mechanisms. So to overcome this, we use the tools provided in Evomi's core residential plan to scrape Amazon. This plan includes aggressive proxy rotation functionality, which means each request will come from a different residential IP address, which helps us get over Amazon's bot detection hurdles. More on proxy rotation a little bit later. In this course, I use Evomi's scraper API plan, Evomi's core residential plan, and Evomi's scraping browser plan. You can sign up to use these plans for a free trial if you wish to follow along with the practical examples provided in this course. So let's look at the target websites we are going to scrape in this course. Let's firstly look at the TIOBE Index. The TIOBE Index is a monthly ranking of programming languages based on their popularity. These rankings help identify trending languages, declining ones, and long-term shifts in the industry. So this data could be very useful, perhaps in combination with other related data, in providing powerful decision support functionality. You'll learn how you can leverage Evomi's powerful scraper API to scrape the ranked table from the relevant TIOBE Index web page.
So why do we need Evomi's scraper API to scrape a web page? Can't we just make an HTTP request to the target web page, grab the HTML, and then use JavaScript and CSS selectors to drill into the HTML and grab the relevant data? For very simple web pages, you could absolutely do that. But in most cases, data of value will be protected by bot detection mechanisms designed to stop automated processes, i.e. bots, from scraping data from the relevant web pages. Evomi's services provide a sophisticated infrastructure in order to get over those bot detection hurdles. Websites use a mix of technical, behavioral, and policy-based protections to limit or control automated scraping.
Broadly, these aren't about stopping bots completely, which is clearly impossible, but about detecting, slowing down, or blocking suspicious activity.
Let's look at a few examples of how your scraping bot can be detected.
Unnatural request patterns: speed and volume.
Humans browse at an irregular pace. Bots often don't. Here are some red flags, i.e. telltale signs that a visitor to a website is a bot: dozens or hundreds of requests per second, perfectly consistent time intervals between requests, and hitting many pages in a short burst. Real users pause, read, scroll, and click unpredictably. Bots tend to be too fast and too consistent.
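To make the "too consistent" red flag concrete, here is a minimal sketch of how evenly spaced requests could be spotted. It is a toy heuristic for illustration only, not how any particular website's detection works; the function names and the 0.1 threshold are assumptions.

```python
import statistics


def interval_variation(timestamps):
    """Return the coefficient of variation (stdev / mean) of the gaps between requests."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        return 0.0  # every request arrived at the same instant
    return statistics.pstdev(gaps) / mean_gap


def looks_automated(timestamps, min_variation=0.1):
    """Flag traffic whose request spacing is suspiciously regular (threshold is illustrative)."""
    return interval_variation(timestamps) < min_variation


# Request times in seconds: one perfectly regular visitor, one irregular visitor.
bot_like = [0.0, 0.5, 1.0, 1.5, 2.0]
human_like = [0.0, 4.2, 5.1, 19.8, 23.0]

print(looks_automated(bot_like))
print(looks_automated(human_like))
```

The regular sequence has zero variation between gaps, so it is flagged, while the irregular one is not. Real systems combine many more signals than timing alone.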
Nonhuman interaction behavior.
Modern sites track how you interact, not just what you request. Here are some red flags: no mouse movement or scrolling, instant clicks with no hesitation, and linear, perfect navigation paths. Humans are messy. We hesitate, move the mouse unevenly, and scroll back up. Bots often lack these natural patterns.
Suspicious or incomplete client signals.
Every browser sends a rich set of identifying data. Bots often get this wrong. Here are some red flags: missing or inconsistent HTTP headers, a user agent that doesn't match actual behavior, and no JavaScript execution when the site expects it. A real browser environment is complex and internally consistent. Bots frequently appear off or incomplete.
Identity reuse or instability: IP and fingerprinting.
Websites track whether a visitor looks like a stable real user over time. Here are some red flags: many requests from the same IP across different accounts, rapid switching between IP addresses, and repeated fingerprints appearing across sessions. Humans typically have stable identities. Bots often show patterns of reuse or constant switching. These are just a few examples. So you can see there are a lot of telltale signs that a visitor to a website is in fact a bot and not a human. Another common bot detection mechanism is CAPTCHA. I'm sure we're all familiar with those rather annoying little tests that pop up from time to time when trying to log into a website. For example, you get a series of images and you have to select a picture that contains an owl or something.
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. CAPTCHAs work by exploiting the gap between human abilities and automated scripts.
So Evomi's infrastructure and services help with getting over those bot detection hurdles so that you are able to scrape the data you require. As mentioned earlier, it is far easier to scrape the TIOBE Index than Amazon.
Amazon web pages are notoriously difficult to scrape because of more sophisticated or more aggressive bot detection mechanisms.
You'll find that in most cases you are able to scrape your targeted web pages, like the TIOBE Index for example, using Evomi's scraper API. But in some cases, for example if you want to scrape Amazon web pages at scale, you'll need to use aggressive proxy rotation.
Evomi provides the necessary infrastructure for proxy rotation through their core residential plan. So we are going to use Evomi's core residential services to scrape Amazon.
So let's just discuss what exactly proxy rotation is. Proxy rotation is a technique where requests are sent through a changing pool of IP addresses instead of a single one. If, for example, you're trying to scrape Amazon at scale from one IP address, your scraper is likely to be detected very quickly. But using Evomi's infrastructure through Evomi's core residential plan, each request is routed through a different proxy server.
So to the website, the traffic appears to come from multiple users or locations.
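The idea of proxy rotation can be sketched on the client side as follows. This is only an illustration of the concept: the proxy URLs are hypothetical placeholders, and with a managed residential service the rotation typically happens on the provider's side rather than in your own code.

```python
import itertools

# Hypothetical proxy endpoints; a real provider supplies its own hosts and credentials.
PROXY_POOL = [
    "http://user:pass@proxy-a.example.com:8000",
    "http://user:pass@proxy-b.example.com:8000",
    "http://user:pass@proxy-c.example.com:8000",
]


def rotating_proxies(pool):
    """Yield a requests-style proxies mapping, cycling through the pool forever."""
    for proxy_url in itertools.cycle(pool):
        yield {"http": proxy_url, "https": proxy_url}


rotation = rotating_proxies(PROXY_POOL)
for _ in range(4):
    # Each iteration picks the next proxy, wrapping around after the last one.
    print(next(rotation)["https"])
```

Each mapping produced here has the shape accepted by the `proxies=` argument of `requests.get`, so consecutive requests would leave through different addresses.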
Throughout this course, you will learn how to navigate and extract data from HTML pages and how to automate the entire process using modern tools and programming techniques.
We'll use CSS selectors and technologies like Beautiful Soup in our Python code and Cheerio in our JavaScript code to drill down into the raw HTML scraped from the targeted web pages, to mine those nuggets of data, as it were.
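As a rough illustration of drilling into HTML with CSS selectors, the sketch below parses a small made-up table with Beautiful Soup. The markup, the `rankings` id, and the class names are invented for this example and do not reflect the structure of any real page.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a scraped page.
SAMPLE_HTML = """
<table id="rankings">
  <tr><th>Rank</th><th>Language</th></tr>
  <tr><td class="rank">1</td><td class="lang">Language A</td></tr>
  <tr><td class="rank">2</td><td class="lang">Language B</td></tr>
</table>
"""


def parse_rankings(html):
    """Extract rank/language pairs from the table rows using CSS selectors."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table#rankings tr"):
        rank = tr.select_one("td.rank")
        lang = tr.select_one("td.lang")
        if rank is None or lang is None:
            continue  # the header row has no matching cells
        rows.append({
            "rank": int(rank.get_text(strip=True)),
            "language": lang.get_text(strip=True),
        })
    return rows


for row in parse_rankings(SAMPLE_HTML):
    print(row["rank"], row["language"])
```

`select` returns every element matching a selector, while `select_one` returns the first match or `None`, which is why the header row is skipped safely.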
Through these technologies and associated coding techniques, we'll scrape the desired data from our target web pages, appropriately structure the scraped data, capture the data in our own database, and display the data in an aesthetically pleasing style on our React front end. Let me just share a quick anecdote with you. Before I became a professional software developer many years ago, I studied business and marketing. One subject that amused our young, early-twenties student minds was to do with market research. Specifically, a subject called garbology.
>> Garbology. It literally involved diving into human garbage >> a specimen >> to capture data regarding human behavior. I think garbology is still a thing actually.
>> Garbology.
>> You can still become a garbologist if you want to. But I was thinking how much I prefer capturing data through digital automation where possible.
So no offense to any garbologists or aspiring garbologists or professors of garbology out there, but I prefer the cleaner digital approach to capturing my data. But perhaps that's just me. Anyway, I just thought I'd mention garbology.
It still kind of amuses me. I think it's great that in the modern internet AI age, we don't have to dive into dumpsters perhaps as much as we used to.
And think about capturing data at scale.
You'd need an army of garbologists.
You can imagine that would be rather expensive and ultimately impractical.
Anyway, back to the focal point of this course, web scraping. No matter your background, whether you're a beginner or looking to expand your technical toolkit, this course will equip you with practical real-world skills you can immediately apply. Right, let's get started.
In this part of the video, I'll walk you through a series of real-world web scraping scenarios to highlight a common challenge developers face: modern websites actively detecting and blocking automated scraping tools. I have already prepared the code in the form of Python scripts. Later, we are going to create a full-stack web application using Node on the back end, React on the front end, and MongoDB as the data storage facility, for scraping and capturing data from the TIOBE Index and from Amazon.
We'll create this full-stack web application line by line with comprehensive explanations. This part of the video is not about the coding details. It focuses on how using Evomi's proxies and scraping services enables the code to avoid being detected as a bot, i.e. an automated scraping process, and therefore avoids our code being blocked from scraping the desired data.
So in this part of the course we are going to attempt to scrape Booking.com, Indeed.com, and Amazon.com. In each case we'll start by using standard Playwright to scrape the desired data. While Playwright is a powerful automation tool, you'll see that in many cases it struggles when faced with anti-bot protections. These failures will help illustrate the limitations of using default configurations in high-friction environments. Our first example focuses on Booking.com. I'll attempt to extract data using standard Playwright, where the process results in a timeout. Although the data exists on the page, it is effectively inaccessible due to blocking mechanisms. Immediately after, I'll attempt the same task using Evomi's scraping browser via a remote connection over CDP and the WSS protocol. This approach successfully bypasses the restrictions, allowing the data to be retrieved and displayed directly in the VS Code terminal. Next, we move on to Indeed.com.
Using standard Playwright, the request results in an HTTP 403 Forbidden response, an explicit block or hard block indicating that access has been denied. I'll then switch to Evomi's scraper API, where the same request returns an HTTP 200 status. This successful response enables us to scrape and output the relevant job listing data without interruption.
Finally, we'll attempt to scrape a web page pertaining to a search URL on Amazon. A standard Playwright attempt leads to the request being rejected due to anti-bot defenses. In contrast, when using Evomi's residential proxy service, the request succeeds with an HTTP 200 response.
This allows us to extract the desired data and display it in the terminal.
Across these examples, the goal is to demonstrate not just that failures occur, but why they occur and how leveraging specialized scraping infrastructure like Evomi services can overcome these barriers.
By the end of this demonstration, you'll have a clearer understanding of how proxy strategies and remote browser execution can dramatically improve scraping success rates in restricted environments. Right, let's get into it.
Okay, so I'm firstly going to create a local directory to house files for my Python project.
So, I'm going to go into my development folder here, and I'm just going to create a folder called Python Scraper Solutions.
Okay.
And I'm going to invoke an instance of Visual Studio Code.
And I'm going to open that folder, Python Scraper Solutions.
Select folder.
Great. And we're now ready to begin. So the first thing I want to do is invoke the terminal window. So we just press Control and the tilde key (Ctrl+`) to do that.
And firstly, I just want to see what version of Python I'm running. So, python --version is the command we want to run to see what version of Python we're running. And I believe that's the latest stable release as of the time of recording this video.
So, that's great.
So, if you haven't got the latest stable release and you want to install it, you can do that by going to www.python.org/downloads, and you can then download and install the latest stable release of Python.
Great. And now we want to create our virtual environment. A virtual environment is an isolated space with its own Python interpreter and installed packages. The isolation solves a few very real problems. It basically keeps your Python projects from stepping on each other, as it were. Okay, so we're isolating our Python projects from one another by including a virtual environment. Great. And then the command we want to run to create our virtual environment is python -m venv .venv, like this.
Great. You can see here it's created these folders, and this denotes our virtual environment. And now we want to activate our virtual environment. So we just need to run this file. To do that on a Windows machine, you would type the following: .venv, that's this folder here, then Scripts, then activate, i.e. .venv\Scripts\activate.
So, we're targeting the activate file here. And that should activate our virtual environment. Let's press the enter key.
And there we go. Now we're in the virtual environment space. And then we want to make sure that we select the appropriate Python interpreter. Press Ctrl+Shift+P to invoke the Command Palette.
And then we want to select this option from the dropdown, which is Python: Select Interpreter. And you can see here our virtual environment. We want to select this option here: .\.venv\Scripts\python.exe.
And you can see it's recommended here.
So let's select that option.
Excellent.
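A quick way to confirm that the selected interpreter really belongs to the virtual environment is a small check like the one below. It is a sketch using only the standard library; the function name is an assumption made for this example.

```python
import sys


def in_virtual_environment():
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```

When the environment is active, the interpreter path printed should sit inside the `.venv` folder created earlier.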
So the next thing I want to do is install the relevant packages that we're going to use: for example, Requests for making HTTP requests to a remote endpoint, and a package to parse the HTML that we get back from websites.
We're going to use a package called Beautiful Soup for this, which will target various CSS selectors so that we can extract the desired data from the raw HTML. And the other one is python-dotenv.
And this is so that we can read certain values from a .env file, so we don't have to include the actual values in our code. We can read them from an environment variable in a .env file. And then we of course want to install Playwright, and Playwright is what we're going to use to access the relevant data within target web pages.
So let's type the relevant command to install these packages, and then we'll go through each of the packages in a little bit more detail. So, pip install requests.
Let's make myself a bit more room here.
requests, space, beautifulsoup4, space, python-dotenv, and then space, playwright. So the full command is pip install requests beautifulsoup4 python-dotenv playwright.
And before I press the enter key to install these packages, let's just go through each of them. So Requests is a simple HTTP library for sending web requests: GET, POST, etc. You can use it to fetch data from APIs or download web pages without dealing with low-level networking. And then we have Beautiful Soup. Beautiful Soup is a library for parsing and navigating HTML and XML. It is commonly used with Requests to extract specific data like titles, links, tables, etc. from web pages. So in our Python code, this is going to be very instrumental in getting the relevant data from the returned raw HTML. And then we have python-dotenv.
This loads environment variables from a .env file in your app. It is useful for keeping secrets, API keys, passwords, etc. out of your code. So we can store that sensitive information in a .env file, and the values stored in that .env file get read into memory at runtime. So that sensitive information will not be exposed within your code. And then of course we have Playwright. Oh, bit of a typo there: Playwright. And Playwright is a tool for automating real web browsers: Chromium, Firefox, and WebKit, for example. It is used for scraping dynamic sites, testing web apps, or interacting with pages that require JavaScript. Okay, perfect. Let's press the enter key and install those packages.
Excellent.
And it is installing our packages.
Great. Beautiful Soup, Playwright getting installed.
Excellent. Let's clear the screen.
So the last thing we need to actually do is install the Playwright browsers. So to complete our installation of Playwright, we type playwright install at the command prompt and press the enter key.
Excellent. So now our installation of Playwright has completed successfully and we're ready to create our scraping files and test the scraping functionality.
Let's clear the screen. Excellent. Okay.
So now we're going to attempt to scrape Booking.com.
Firstly, we're going to use standard Playwright to attempt to scrape Booking.com.
And then we will see that this actually fails. We get blocked using just standard Playwright. And then subsequently, we're going to use Evomi's scraping browser to scrape Booking.com.
And you'll see how that helps us get over those bot detection hurdles. So, what is Booking.com? Booking.com is an online platform for finding and booking travel-related services. Most people use it to reserve hotels, apartments, and hostels, compare prices and read reviews, and book flights, car rentals, or airport taxis. It works as a marketplace. Accommodation providers list their properties, and travelers browse, compare options, and book directly through the site or app. So we're going to scrape the website for relevant data, and we're just going to do a basic search for properties pertaining to New York City. I'll show you what I mean now. But firstly I want to create the .env file that will contain sensitive information, which will be important when we enlist the Evomi services. I've signed up to use various plans with Evomi, and my credentials will be stored within this .env file. This is sensitive information that I don't want to make available within the actual code. Okay.
And I'm just going to copy those across.
Great. Obviously, you can see I'm not showing you the actual sensitive information. You can see I've got an API key that will be for when we use Evomi's scraper API. This API key will be pertinent to this part of the course because I'm going to be enlisting Evomi's scraping browser. So we're not just using standard Playwright, we're using Evomi's scraping browser to scrape Booking.com, and you'll see how using Evomi's scraping browser will overcome those bot detection hurdles.
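Reading a credential from the .env file rather than hardcoding it might look roughly like this. The variable name `EVOMI_API_KEY` and both function names are assumptions made for this sketch; use whatever names your own .env file defines.

```python
import os


def require_env(name):
    """Return the value of an environment variable, failing loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required setting: {name}")
    return value


def load_api_key(env_file=".env"):
    """Load KEY=value pairs from the .env file, then read the API key."""
    # Imported here so require_env stays usable without python-dotenv installed.
    from dotenv import load_dotenv

    load_dotenv(env_file)  # copies entries from the file into os.environ
    return require_env("EVOMI_API_KEY")  # hypothetical variable name
```

Calling `load_api_key()` from the project folder returns the key without the value ever appearing in the source code. It also helps to list the .env file in `.gitignore` so it is never committed.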
Okay, so I'm going to hit this new file icon and we're going to create the code for scraping Booking.com, using standard Playwright firstly. Okay, so let's call this file scrape booking std, for standard Playwright.
So standard Playwright, .py, like this. And I've already prepared the code and I'm just going to copy this in here.
And here's the code for using standard Playwright. Okay. So we've got our environment variables set up and we've created a new file called scrape booking standard Playwright, where we're going to scrape Booking.com for various search results, specifically a search for New York. We are going to have the HTML returned to us. You can see we've got a response here from page.goto with the URL, which is the search URL here. We're printing out the status that's returned to us, and then we're going to attempt to actually extract the relevant data from the raw HTML. The most important thing to note here is that we are using standard Playwright to navigate to the relevant search URL. So you can see here's the code for standard Playwright. We're not using Evomi services in this particular Python code. We are using Playwright to navigate to this URL, a search URL on Booking.com. I'm just going to show you exactly what this search brings back within the browser. So I'm going to go to the browser here.
Okay. And you can see here I'm just pasting in that search URL. So we have New, and then between New and York there should be %20, which represents a space. And that is our search on Booking.com. This is exactly what we're going to be emulating using Playwright. And you can see here it's bringing back these results.
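The %20 in the search URL is standard percent-encoding for a space, and the standard library can produce it. In the sketch below, the path `searchresults.html` and the query parameter `ss` are assumptions used for illustration; copy the real search URL from your own browser.

```python
from urllib.parse import quote, urlencode

# Assumed base path and parameter name, for illustration only.
BASE_URL = "https://www.booking.com/searchresults.html"


def build_search_url(destination):
    """Build a search URL, encoding spaces as %20 rather than '+'."""
    query = urlencode({"ss": destination}, quote_via=quote)
    return f"{BASE_URL}?{query}"


print(quote("New York"))  # New%20York
print(build_search_url("New York"))
```

By default `urlencode` encodes a space as `+`; passing `quote_via=quote` makes it emit `%20`, matching the URL shown in the browser.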
And in the code, I'm actually just bringing back the name of the property, the name of the hotel for example, and its rating. That's all I'm bringing back. Just a simple example. We're first going to try scraping this web page using standard Playwright. And then we're going to use Evomi's scraping browser to do the same thing. You'll see that we get blocked when we're using standard Playwright, and we can see the advantage of using Evomi's scraping browser, which I'll demonstrate once we've looked at trying to scrape the relevant data using standard Playwright. So this code here just uses standard Playwright to scrape the relevant search URL. And then the rest of the code is really just using Beautiful Soup to try and extract the data. You can see we're scraping the name of the property and then the rating, and then we're just outputting those results within a for loop here. So let's see what happens when we try to run this code. To run the code we type python and then scrape booking std, for standard Playwright, .py. Let's press the enter key and see what happens.
Okay, so it's returned an HTTP status code of 202.
And let's see if it returns the data that we want.
You see what's actually happening here is it's timing out.
So that hasn't succeeded. We've actually been blocked by Booking.com.
So let's just clear the screen. And now I want to try using Evomi's scraping browser to perform the same scraping operation. And let's see what happens. So I'm going to create a file here called scrape booking Evomi, and I'm just going to add scraping browser here, .py. Okay. I'm going to paste that code in here.
Great.
So, let's have a quick look at the code. And you can see the difference here. Now what we're actually doing is using Evomi's scraping browser from our code. So we're able to control that browser remotely from our code, and this is how we are able to do it. We're using the WSS protocol, which stands for WebSocket Secure, a secure WebSocket connection. So think persistent real-time connection rather than a one-off request. Through this connection our code can communicate with Evomi's scraping browser. Our code can both send messages to the remote browser service and receive messages from the remote browser service over the secured WebSocket connection. We're using WSS for that. And we're able to communicate with the scraping browser because I've actually signed up for one of Evomi's services, and you can do the same. You can get a free trial if you want to test out the code yourself. So you just include the browser URL here and then use connect over CDP. The connection is made via CDP, which stands for Chrome DevTools Protocol. CDP is the same protocol tools like Playwright use internally to control a browser. But here we are controlling Evomi's scraping browser over the WSS protocol. So we're not using standard Playwright with a local browser here. We're using Evomi's scraping browser to see if we can get around the bot detection hurdles that Booking.com puts in front of us. So all this is doing is connecting to the scraping browser via the WSS protocol. You can see that there, and we're reading this particular credential. This is the API key that you'll be given when you sign up for the scraping browser plan. So we're getting that API key from the .env file here. And then we're able to access Evomi's scraping browser through this line of code here.
And then we can control the browser through our code remotely using a secure WebSocket connection, WSS. Okay. And you can see here we're waiting for the code. We're printing out the status of the HTTP request, and then we are using the various CSS selectors to actually scrape the relevant search results, and we're outputting the name of the property and the rating of the relevant property, for example the hotel over here. We're getting the results by reading the raw HTML and using specific CSS selectors to extract the relevant data, the name of the hotel and the rating of the hotel. We're adding that to an array, and then we are looping through that array and outputting the results here. Let's see if the scraping browser solves our problem with scraping this Booking.com search URL, where we're looking for hotels in New York City, for example. Okay. So let's type python and then the name of the file that we want to run, which is scrape booking Evomi scraping browser. We are now using Evomi's scraping browser service to try to scrape Booking.com.
And let's see if we have any luck scraping the relevant data. Press the enter key.
Okay, so it's returning the same 202 HTTP status.
Okay, and this is just outputting status as we try to scrape the relevant search URL.
Wow.
And that is awesome. Look at that. We've been able to scrape the relevant web page at Booking.com. So we've scraped the name of the hotel and, for example, the rating here, very good. That's based on 423 reviews. Spring Hill Suites by Marriott, New York, Manhattan, Times Square South. And we've got the rating based on 600 reviews there. And we've been able to scrape the relevant results using Evomi's scraping browser service. So we were unable to scrape the relevant web page using standard Playwright, that is, trying to do it ourselves.
But through Evomi's scraping browser service, we have no problem scraping the results that we want on Booking.com.
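The difference between the two Booking.com scripts, launching a local browser versus attaching to a remote one over CDP, can be sketched like this. The function names are assumptions for this example, and the WebSocket endpoint (including how the API key is embedded in it) must come from your provider's documentation; nothing here reproduces the course's actual script.

```python
def open_browser(playwright, ws_endpoint=None):
    """Return a Chromium browser: local when no endpoint is given, remote over CDP otherwise."""
    if ws_endpoint is None:
        # Standard Playwright: start a browser on this machine.
        return playwright.chromium.launch(headless=True)
    if not ws_endpoint.startswith("wss://"):
        raise ValueError("expected a secure WebSocket (wss://) endpoint")
    # Remote browser: attach over the Chrome DevTools Protocol.
    return playwright.chromium.connect_over_cdp(ws_endpoint)


def fetch_status_and_html(url, ws_endpoint=None):
    """Navigate to url and return (HTTP status, page HTML)."""
    from playwright.sync_api import sync_playwright  # requires `pip install playwright`

    with sync_playwright() as p:
        browser = open_browser(p, ws_endpoint)
        try:
            page = browser.new_page()
            response = page.goto(url, timeout=60_000)
            status = response.status if response else None
            return status, page.content()
        finally:
            browser.close()
```

Calling `fetch_status_and_html(url)` uses the local browser, while passing a `wss://` endpoint as the second argument routes the same navigation through the remote browser. The returned HTML can then be handed to Beautiful Soup as before.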
Excellent. Okay, so now we're going to scrape indeed.com. So indeed.com is an online job search website that connects people looking for jobs with employers who are hiring. So we're going to do a basic search. So let's just go to indeed.com. We've been to Booking.com.
Now, we're going to go to indeed.com and perform a search. So, www.indeed.com.
Let's go there.
Okay. See how it actually wants to verify me, because I've been doing quite a lot of scraping from my IP address. So what we want to do is search for software developer, and let's do a search, and it brings back the relevant data. So we're going to attempt to scrape certain fields from these cards for this particular search criteria here. So firstly I'm just going to go into VS Code. Let's create a new file.
And I'm going to call this one scrape indeed
std, for standard Playwright. So we're going to try and use standard Playwright to scrape Indeed. And I've actually prepared the code for us.
So I'm just going to copy that over here.
And you can actually see the search URL here.
So we're scraping for software developer. That's our search criteria, in New York City. And that's what the search URL looks like. We could even just take that as is. I've just hardcoded it there. And we can paste it in here, and you should get the same kind of results. So hopefully we can scrape these results
from this search URL here on indeed.com.
But let's first try to scrape Indeed just using the standard Playwright code here. And you can see here we're just calling p.chromium.launch.
This is just standard Playwright code.
We're not using Evomi services. We're not using proxy servers or anything like that. So it's going to try to scrape this web page that contains the relevant search results for software developer in New York City.
And it's going to use various parsing code, the details of which are not important for this. But basically all this code is doing is extracting the title, the company, and the location of the relevant job opportunities and displaying them in the terminal window here. But you'll see that we actually get blocked. And it's not just a subtle blocking either. It's a hard block, meaning it will return an actual HTTP status that isn't 200. Sometimes you get soft blocked, and it will still return an HTTP success status of 200, but you've actually been blocked.
It won't return the data that you want.
But you'll see here we actually get hard blocked, and it returns a status code of 403, which means forbidden, which means we are being explicitly blocked.
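The hard block versus soft block distinction can be expressed as a small check. This is a simplified sketch: the function name is an assumption, the `expected_marker` is a hypothetical piece of text you would expect in a genuine results page, and real block pages vary widely.

```python
def classify_response(status, html, expected_marker):
    """Label a response as 'ok', 'hard block', or 'soft block'.

    Simplification: any 4xx/5xx status counts as a hard block, and a
    non-error response that lacks the expected content counts as a soft block.
    """
    if status >= 400:
        return "hard block"
    if expected_marker not in html:
        return "soft block"
    return "ok"


# Hypothetical marker: a class name we expect on each job card.
print(classify_response(403, "", "job-card"))
print(classify_response(200, "<html>Please verify you are human</html>", "job-card"))
print(classify_response(200, '<div class="job-card">Software Developer</div>', "job-card"))
```

Checking for expected content, not just the status code, is what catches the soft-block case where the server answers 200 but withholds the data.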
So let's see that in action. I'm going to run that code now and you'll see that we get hard blocked when we try to scrape this particular web page with this search URL on Indeed. So, we're searching for software developer New York. And if we look at Chrome again, you can see the data returned in our browser, but we want to automate that through a bot, which is basically what this Python code is doing. And we want to be able to scrape the desired data, like for example, the job title and the company.
And then we're going to print that data to the terminal window. So anyway, let's run that code and see what happens. Okay, so Python and then what did I call it here? It is scrape indeed stdplay.py.
Press the enter key and let's see what this does.
403. And if we look up what 403 is, so if we go to Google and we type in 403 HTTP status code and press enter: HTTP 403 is an HTTP status code meaning access to the requested resource is forbidden. So we've been explicitly forbidden from scraping this particular data. So how are we going to get around it? This time we're going to use Evomi's scraper API. So this will just involve a POST request to a specific Evomi URL, along with the URL that we want to scrape, and it will handle getting over all the relevant bot detection hurdles for us on the server and return the raw HTML to us, and then we'll parse that data ourselves in our code.
So let's get that code up and running within VS Code. I've actually already prepared the code. So we've just been blocked when trying to scrape that search URL using standard Playwright. And now we're going to use Evomi's scraper API service to try to scrape the same page and see if it gets over this explicit block here. So I'm going to just create a new file. Let's call this scrape indeed evomi scraper api.
Okay.
And I'm just going to paste that code in.
Okay. So, let's quickly go over this code so you can see what's going on here.
Um, okay. So, we got the main method here.
This is the Evomi endpoint. So, we're reading these settings from the .env file, along with the relevant credentials, because I'm obviously set up on the relevant Evomi scraper API plan. So, I've been given an API key which will allow me to use their services. And then this is the URL that we want to scrape, the search URL on Indeed. And then we've got these headers here. So all of this is basically trying to mimic a typical browser, which we can pass through to the service. So with the referer, for example, and the user agent setting, we're basically adding these settings to our HTTP request header so that we can mimic somebody visiting that web page: a typical human being just surfing the web and performing that search on indeed.com rather than a bot attempting to scrape the relevant web page. So this is what this is all about here, including the user agent and the referer. We're trying to mimic a human being visiting that search URL.
So obviously that's to try to fool the bot detection mechanisms into thinking we're a human being and not a bot. Okay. And then here's the code. It's a typical HTTP POST request where the endpoint, which is Evomi's service, is passed in as the first argument, and then we've got json equals payload, and that is this here, which includes the search URL, the target web page that we want to scrape. And then we include the relevant headers here, and one of the settings is the API key, which allows us to use Evomi's scraper API. Now, you can sign up for a free trial to try out this sort of code yourself and you'll be given an API key with which you can use the scraper API service on Evomi. And then this information is just trying to pretend to be a human being viewing the search URL's content through the browser rather than a bot trying to scrape the content, which is what we're actually doing here. And then I'm just printing out the status and the message here from the response returned by this HTTP POST request. And then this returns the HTML. We pass that to scrape indeed and it returns the jobs in an array. And then we can just print out those jobs to the terminal window. And let's see if we are able now to get over those bot detection hurdles on Indeed by using Evomi's scraper API. Okay, so great. Here's all the code for parsing the relevant data. It's quite detailed.
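The shape of that POST request can be sketched roughly as follows. This is not the course's script and not Evomi's documented API: the endpoint is whatever your plan gives you, and the "url" payload field and the "x-api-key" header name are assumptions chosen for illustration. It uses Python's standard urllib so that it runs without extra packages; the script in the video passes json equals payload to an HTTP client instead.

```python
import json
import urllib.request


def build_scraper_request(endpoint: str, api_key: str, target_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to a scraper-API style endpoint."""
    # Hypothetical payload shape: the page we want the service to fetch for us.
    payload = {"url": target_url}
    headers = {
        "Content-Type": "application/json",
        # Hypothetical header name for the credential; check your plan's docs.
        "x-api-key": api_key,
        # Browser-like headers, as described above, to resemble a human visit.
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Referer": "https://www.google.com/",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def fetch_html(request: urllib.request.Request, timeout: float = 60.0) -> tuple[int, str]:
    """Send the request and return (status, html).

    Note: urllib raises urllib.error.HTTPError for 4xx/5xx responses,
    so a hard block surfaces as an exception rather than a return value.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", errors="replace")
```

Keeping the request construction separate from the network call makes it easy to inspect what would be sent before spending any API credits.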
It's not necessary to understand all this detail here. All we're trying to do here is demonstrate that using standard Playwright, we get blocked with a 403. So, it's a hard block.
And then when we use Evomi's scraper API, we are able to overcome those bot detection hurdles and the data that we want is extracted and we can print that data to the screen. Okay, so let's run the code and see what happens.
Um, and it's scrape indeed evomi scrape API. So let's try that.
Okay. And let's see if we can scrape the relevant data.
Okay. So, let's press the enter key and run our code.
And hopefully we are successful.
Look at that. And we've been able to scrape the relevant data from indeed.com using Evomi's scraper API service. Excellent.
Right. And lastly, I want to demonstrate using standard Playwright for scraping an Amazon search URL. Here you can see we are scraping Amazon. The query is JavaScript and this search URL just means we're scraping Amazon for books pertaining to JavaScript. You can see here we've got this parameter i equals stripbooks.
We're searching books on Amazon pertaining to the JavaScript search criteria and we're using standard Playwright. You've seen this code before.
browser equals await p.chromium.launch.
We're launching a browser and it's in a headless state, meaning it doesn't actually show the browser, and we're going to that particular search URL. We're getting the status back here and we're printing the status there, and we're just using standard Playwright here. So, I'm going to type python scrape Amazon std playwright dot py.
Um, let me just show you what the results will look like in a browser before we actually demonstrate the scraping. So, I'm just going to go to amazon.com.
So, this is essentially the search URL that we are going to target. And if we go JavaScript, you can see that we are being returned these JavaScript books here. And we're scraping the relevant data. For example, the title, the author, the rating of the book, and that sort of information.
We're scraping and we're going to display that in the terminal window.
Anyway, let's run the code for scraping Amazon, the relevant Python script. And we're using just standard Playwright to do this for now.
Okay, we're getting a 503 from Amazon.
We've actually been blocked from performing the relevant Amazon scraping functionality. So let's run our scraper code.
Okay, so you can see we're getting a 503 error when we try to scrape Amazon. If we go to Google here, we can see what a 503 error means.
So a 503 service unavailable error is a server-side code indicating the server is temporarily overloaded or undergoing maintenance, meaning it cannot process the request. So the HTTP status that you get back isn't always clear about the real cause. What's happening here is we're actually just getting blocked by Amazon. So I'm going to try now to use Evomi's proxies to overcome this 503 issue that we're getting, and hopefully we can then scrape the relevant search URL on Amazon successfully.
Great. Okay. So, I'm going to create a new file here and I'm going to call this scrape Amazon evomi core residential. So this is a plan that you can sign up for on Evomi, and you can get a free trial using the core residential proxies on Evomi, and you can then test the relevant code yourself and see if you are able to successfully scrape Amazon. And you'll see now that it's fairly easy to scrape Amazon using this Evomi service. Okay, I'm just going to copy in that code there.
So you can see the code structure is slightly different here from when we used the scraper API. We're still performing an HTTP request, but this time it goes through Evomi's proxies.
So you can see here the relevant object that we pass with our HTTP GET request.
This object includes these proxy URLs here.
And I'm reading that from the .env file. And these are the settings that you will have access to when you sign up for the core residential plan, the core residential service provided by Evomi.
You can sign up for a free trial and you'll get these settings that you can plug in to your code and then test your code scraping Amazon.
Okay. And these headers are just to basically try to fool the bot detection mechanisms into believing that we're a human being rather than a bot. So they look like it's just natural browser settings. So users just through their browser performing a search on Amazon.
And this is what all of these header settings are all about. And you can see how we're performing just a get request to that page URL. We're passing in the proxies that we have defined here.
And these are the settings that we've attained from signing up to the core residential plan on Evomi. And as I said, you can sign up for a free trial to test this out yourself.
And then we're performing a get request.
And this page URL is the search URL on Amazon, where we're searching for JavaScript textbooks, and then we're parsing the relevant HTML. So we're taking the HTML from response.text, passing that through to this parse Amazon HTML function there, and then we're printing the results here. So we're passing in those results, and I've written code to print them out, the authors, the title, the price, rating, reviews, in a fairly neat way to the terminal window.
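As a rough sketch of the proxy setup just described, the snippet below builds a proxies mapping from credentials and attaches it, together with browser-like headers, to an HTTP client. The host, port, username, and password here are placeholders; the real values come from your own core residential plan and would normally be read from the .env file. It uses Python's standard urllib as a stand-in, but the mapping has the same http/https shape that many Python HTTP clients accept for a proxies argument.

```python
import urllib.request
from urllib.parse import quote


def build_proxy_map(host: str, port: int, username: str, password: str) -> dict[str, str]:
    """Return an http/https proxy mapping with credentials embedded in the URL."""
    # Percent-encode credentials so characters like '@' or ':' don't break the URL.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    proxy_url = f"http://{user}:{pwd}@{host}:{port}"
    # The same proxy URL is used for both plain and TLS traffic.
    return {"http": proxy_url, "https": proxy_url}


def build_opener_with_proxy(proxies: dict[str, str], headers: dict[str, str]) -> urllib.request.OpenerDirector:
    """Create an opener that routes requests through the proxy with the given headers."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    # Headers set here are sent with every request made through this opener.
    opener.addheaders = list(headers.items())
    return opener
```

Calling opener.open(page_url) would then perform the GET request through the proxy; that call is left out here because it needs live credentials.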
So, let's try to run that. Let's see if we can get over this 503 error that has been sent back to us from Amazon. Okay, so we've got the scrape Amazon Evomi core residential file here and we are using Evomi's core residential plan to now scrape the relevant search URL on Amazon. And you can see here I'm including some header values to further try to fool the bot detection mechanisms into thinking that we're a human browsing the web. Great.
So let's try to scrape Amazon using the core residential services provided by Evomi. So to do that we type python and then scrape Amazon evomi core residential dot py, and let's press the enter key and see if that works.
Excellent. So great, that's wonderful. You can see all this data has been scraped from the relevant Amazon web page.
So you've got a status of 200 returned instead of that horrible 503 status, which meant we'd been blocked.
And you can see here, status 200. Okay.
And we've got 24 results. JavaScript: The Definitive Guide: Master the World's Most-Used Programming Language, David Flanagan. And we can see we have all these results returned to us, and that is some useful data. And you'll see later on we're going to use JavaScript within a Node project to scrape Amazon in much the same way as we're doing here, using the core residential service provided by Evomi. Excellent. And we could, just for good measure, try another search criteria. So let's scrape for Python this time and see if we get some data back from that. So let's clear the screen.
Python. Let's run the Python script. Scrape Amazon evomi core residential.py, and let's see if we're successful again.
Excellent. 200.
So we've got an HTTP status of 200 returned.
Okay. And Python Crash Course, third edition. We've got all this data. We've got the authors. We've got the price and all this useful data that we've scraped from the relevant search URL on Amazon.
Great. And let's try another search. So, let's look for books on Java and see if we can successfully scrape the web page for that Amazon search URL. So we want book data pertaining to Java returned to us, and then we can scrape the title, the author, the price, and the other data that we want, and display that on the screen. So let's run the relevant script. python scrape Amazon evomi core residential.
And let's see if we get decent search results.
There we go. So the 10th result that we've got back: Java: The Complete Reference, from fundamentals to advanced concepts with modern AI-assisted development, from Bruce Herbert.
Brilliant.
We've got let's see how many results we got. 33 results and of course the HTTP status is 200 and it's not 503. And we've got some useful data returned from the relevant search URL on Amazon.
And you can see here we're using CSS selectors and the Beautiful Soup package to extract the data that we want, and we're displaying it here in our terminal.
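To give a feel for what that extraction step does, here is a small self-contained sketch. It deliberately swaps in Python's built-in html.parser for Beautiful Soup so that it runs without installing anything, and it matches elements by class name rather than by full CSS selectors. The markup, the class names (book, title, author, price), and the parse_books function are invented for illustration; Amazon's real markup is different and changes over time.

```python
from html.parser import HTMLParser


class BookParser(HTMLParser):
    """Collect title/author/price text from a made-up search-results markup."""

    FIELDS = ("title", "author", "price")  # hypothetical class names

    def __init__(self) -> None:
        super().__init__()
        self.books: list[dict[str, str]] = []
        self._current: dict[str, str] | None = None  # book being filled in
        self._field: str | None = None               # field we are inside, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "book" in classes:
            # A new result card starts: begin a fresh record.
            self._current = {}
            self.books.append(self._current)
        elif self._current is not None:
            # Inside a card: remember which field this element holds, if any.
            self._field = next((f for f in self.FIELDS if f in classes), None)

    def handle_data(self, data):
        if self._current is not None and self._field and data.strip():
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        # Leaving any element ends the current field.
        self._field = None


def parse_books(html: str) -> list[dict[str, str]]:
    parser = BookParser()
    parser.feed(html)
    return parser.books
```

A field that is missing from a result, such as a book with no listed price, is simply absent from that record, so the printing code should read fields with a default.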
Excellent.
So before we develop our full stack application, let's look at a tool that can be very useful in showing us how we are perceived, as it were, by targeted websites. Are we perceived as a bot or do we pass the relevant bot detection tests? The results of such tests can be viewed on a web page at this URL: https://bot.sannysoft.com.
This is a great tool to help with getting around bot detection tests. So in this part of the course, we are going to firstly use standard Playwright to access a web page using a headless browser through JavaScript code. I'll explain what Playwright is in just a bit.
When I say standard Playwright, I mean we are going to access the web page at bot.sannysoft.com, where we'll be tested for telltale signs of being a bot, without any help from Evomi's scraping browser. You can sign up to Evomi's scraping browser plan for a free trial if you wish to follow along when we use Evomi's scraping browser to access the same web page and compare the results with those from standard Playwright. The web page we are going to access is not just a normal web page. It is a web page specifically created to output status information regarding how a website will see your incoming HTTP requests in the context of bot detection. It outputs tables of information, highlighted in red where your incoming HTTP requests have failed common bot detection tests. So, as mentioned, this web page can be found at bot.sannysoft.com, and it will tell you why you are being detected as a bot or reveal the fact that you are passing the bot detection tests. Once we have tested accessing the Sannysoft web page through standard Playwright functionality and observed the results of the test, i.e. whether we are being detected as a bot or not, we'll switch to accessing the Sannysoft web page again, but this time leveraging Evomi's scraping browser remotely from our code. As mentioned, you can test the relevant code by signing up for a free trial using Evomi's scraping browser plan. And when you run the relevant code, you'll see that the flags highlighted in red on the Sannysoft web page, which you observed when accessing the page through standard Playwright, will now be green, meaning you have passed the relevant bot detection tests. So this part of the course highlights how using Evomi's scraping browser gets around certain bot detection tests that many websites conduct to prevent bots from scraping their websites. So, what is Playwright?
Playwright lets you simulate a real user interacting with a website, clicking buttons, filling forms, navigating pages, and then check if everything works correctly. Here are some key features of Playwright. Multi-browser support: it works with Chromium, Firefox, and WebKit, the Safari engine. Cross-platform: it runs on Windows, macOS, and Linux.
Auto-waiting: it automatically waits for elements to be ready. So in terms of scraping, you can use Playwright to wait for the entire page to load before executing your scraping code. If you start scraping before the page is fully loaded, the elements that contain the data you wish to target may not yet be present. Headless mode: you can run the browser without opening a visible browser window, i.e. a browser with a UI.
Powerful selectors: easily find elements on the page. Network control: intercept and mock API requests. Great. So, let's get into this part of the course. So, we're going to set up a few JavaScript scripts here that we'll run using Node to illustrate how we can use Evomi's headless browser to get around the various bot detection mechanisms that exist out there. So the first script that we're going to run is this standard Playwright script. So what we're going to do is access a website at https://bot.sannysoft.com.
bot.sannysoft.com is a specialized tool used to test a web browser's stealth and detectability by automated bot detection systems. So I'm just going to copy this code. You can find this code at this URL on GitHub and you can do the same. You can just copy it into VS Code and run the code to test this out yourself. So I'm just going to copy that code like this.
And I'm going to firstly create the file here. So what did we call it here?
Test standard Playwright. So test_std playwright.js, and I'm just going to paste that code in here. So we've got that script pasted in here and you can see we're using standard Playwright to navigate to this URL. So what that does is it makes sure the entire page is loaded before it takes a screenshot of the page. We're using Playwright's headless browser for this purpose. So the primary difference between a headless browser and a headful, or normal, browser lies in the presence of a graphical user interface. While both use the same underlying engine to render web pages, they are used for different purposes and operated in different ways.
So, we're using a headless browser here to access this web page, and then we're scrolling down the page, making sure the entire page has been loaded with all its content, and then we're using Playwright here, the page object's pdf method, passing in this object, and we are essentially screenshotting what's on the page and saving it as a PDF here.
Okay, that's what that code does.
So, we're going to see a screenshot of the web page at this URL, and this will show us our status in terms of if we've been detected as a bot or not. So, let's run this. So, I'm just going to go into the terminal here. Let's just set this project up for node npm init like this.
And that should generate our package.json file. We've got a few dependencies we need to include. We want to read a particular environment variable from a .env file. So we need to install the relevant package for that. So we can do that through this command: npm install and then dotenv, like that.
Okay, great. Let's clear the screen.
I'll create that .env file in a bit. And then we also want to install a particular extension so that we can view PDF files from within Visual Studio Code. So the extension for that is vscode-pdf, like that.
And there it is here. So we want to install this just for convenience so that we can view the relevant PDF file that our code's going to generate within Visual Studio Code. So let's press install like that. Trust and install.
Excellent. Okay. Go out of that. And you can see we're going to be saving that PDF just to our current directory here when we run it.
And then we must install Playwright. So let's install Playwright. And I'll explain what Playwright is in just a bit.
So we can install Playwright using this command: npm init and then playwright at latest, like that.
Okay. Okay. And it's just prompting us here. Okay. To proceed. Yes. So y enter.
Okay.
Do you want to use TypeScript or JavaScript? So I want JavaScript. Enter key.
Okay. I'm just going to press enter here.
Uh add to GitHub actions workflow. I'm just going to go in here.
Um, install Playwright browsers, which can be done manually via npx playwright install.
Okay, I'm just going to go. Yes. And it's now installing.
Okay. So, it's installing all the Playwright browsers for us.
Okay, sure. That took a bit of time, but now it's done. Good. So, we've downloaded all the Playwright browsers. Okay. So, I'm just going to clear the screen there. Okay. So, let me just quickly explain what Playwright is.
So, Playwright is an open-source automation library developed by Microsoft, designed specifically for end-to-end testing and web automation.
It is widely considered the modern successor to tools like Selenium and Puppeteer because it is faster, more reliable, and built to handle the complexities of modern web applications that are built using, for example, React or Vue. We are using a headless browser to access the https://bot.sannysoft.com web page here to see what our status is regarding whether we are being detected as a bot or not. Essentially, it's going to access the web page at this URL. And then we're taking a snapshot, a screenshot of the web page, because remember this is a headless browser we're using here. So, we're not going to be able to view the web page in a browser, but what we can do is stealthily take a screenshot of what the page would look like if you were accessing it using a headful browser, for example. So, let's test the code by typing node and then test std playwright.js, like that. Okay.
Excellent. See what it does.
Okay, great. So that code has executed successfully.
So it's written out the title, which is Antibot, and it says PDF saved as standard playwright.PDF.
So we can now access that PDF. Let's just click on it. We can see it through Visual Studio Code because we installed the relevant extension. So we can actually view the PDF through VS Code.
So if you look here, and this is the point of this section of the course really, to show you how we've been detected as a bot. You can see the user agent test, for example, and the WebDriver test. These are the ways that we've been detected as a bot, these red highlights here. We've passed WebDriver advanced. Okay, that's green. So the green shows where we've passed the test and the red shows where we've failed the test, i.e. where we've been detected as a bot. So now we come to the point of this section of the course where we'll be able to use Evomi's headless browser using the WSS protocol, and I'll explain what that is in just a bit. We'll be able to connect to Evomi's headless browser via Playwright and essentially run the same code, basically screenshot the web page, and hopefully where we see these red highlights it will turn them green, which means we've passed those tests. So you can see here fail, fail: we're being detected as a bot.
Basically, we wouldn't get past that hurdle and we wouldn't be able to scrape the relevant web page, is basically what we're being told here.
So, let's go to GitHub here. And we've got this test_evomi browser script here.
And I'm going to copy that. Copy raw file. I'm going to create this script. So, it's test evomi browser.js.
So let's create the file test evomi browser.js and let's paste that code in to the new script file here.
And now this is the important bit here.
You see we're not just using standard Playwright. We're connecting to Evomi's headless browser via WSS, a secure WebSocket connection. So the WSS protocol is the protocol used to establish a persistent, bidirectional, and encrypted connection between your local script and a remote server. In this case, the browser instance hosted at browser.evomi.com.
So this is what this code is doing here, this connect over CDP call, and then we can use Playwright in pretty much the same way that we used it in this script here. So all we're doing is scrolling down the page and taking a snapshot, a screenshot.
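For readers following along in Python rather than JavaScript, the same idea can be sketched with Playwright's Python API, which offers connect_over_cdp on the chromium browser type. This is a hedged counterpart to the script in the video, not a copy of it: the build_ws_endpoint helper, the "key" query parameter name, and the exact endpoint format are assumptions, so check the connection string your own plan provides. Playwright is imported inside the function so the helper can be loaded even where Playwright isn't installed.

```python
from urllib.parse import urlencode


def build_ws_endpoint(host: str, api_key: str) -> str:
    """Build a wss:// URL for a remote browser.

    The 'key' query parameter name is hypothetical.
    """
    return f"wss://{host}?{urlencode({'key': api_key})}"


def save_page_as_pdf(ws_endpoint: str, target_url: str, pdf_path: str) -> str:
    """Open target_url in a remote browser and save what it renders as a PDF."""
    # Imported lazily: requires the 'playwright' package to actually run.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Attach to an already-running remote Chromium over the DevTools protocol
        # instead of launching a local browser with p.chromium.launch().
        browser = p.chromium.connect_over_cdp(ws_endpoint)
        try:
            page = browser.new_page()
            # Wait until network activity settles so the page is fully loaded.
            page.goto(target_url, wait_until="networkidle")
            title = page.title()
            page.pdf(path=pdf_path, format="A4", print_background=True)
            return title
        finally:
            browser.close()
```

The only line that differs from a local run is the connect_over_cdp call; everything after it is ordinary Playwright page code.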
We're making sure that the entire page loads and then we're taking a screenshot of the relevant web page. We'll be able to inspect the PDF file that it creates, which will be similar to this, but hopefully these red highlights will be turned green, meaning we've passed the relevant bot detection tests. Okay, so let's go to test evomi here and let's run this code. Actually, before we can run this, I'm storing an API key.
If you want to run this, you'll need to sign up for a free trial with Evomi to use Evomi's browser. So, you can see I'm trying to access an environment variable, but I do not have a .env file.
I'm just going to do that quickly. I'm going to create a .env file here. We need to create an environment variable called API_key browser. And I'm just going to paste the relevant key in offscreen. Okay. And we're now ready to run this code. And then we can compare this PDF, where we use standard Playwright, to what the PDF will look like when we scrape using Evomi's browser remotely.
Let's run the script.
So node test evomi browser.js.
Press the enter key.
Great.
Great. So, it's saved the evomi browser.pdf file. So, it's taken a screenshot of the bot.sannysoft.com web page at that URL, but this time we've connected to that URL via Evomi's browser. And we're connecting to Evomi's browser remotely using WSS, the secure WebSocket protocol.
Okay. And you can see we've had to pass in the key there as a parameter, which will be revealed to you when you sign up for the relevant Evomi plan, which in this case would enable you to use Evomi's browser at this remote location. But you need to include your credentials in the form of an API key, which will be given to you once you sign up for the relevant plan, and you can sign up for a free trial and try out this code yourself. Okay, great. So, let's go to the PDF file that's been created now. Evomi browser.pdf.
And look at that. So, now we want to compare with standard playwright.pdf. You've got all these red highlights here, meaning we failed all these tests. So, it's detected us as a bot here. And if we go to evomi browser, we've got lots of green, meaning we've passed all these tests.
Look at that. So when we use standard Playwright, we fail many of the tests, and that is denoted by all these red highlights here. But when we use Evomi's browser using the WSS protocol, as you can see here, we connect via Playwright to Evomi's browser, and then through Evomi's browser we are connecting to the relevant URL, and then we're taking a screenshot of what the page looks like, and we can see it here. And you can see all the green, meaning we're passing all the bot detection tests, meaning we're getting past those bot detection mechanisms and we can successfully scrape the relevant web page. Excellent.
Okay, so now we have had some hands-on experience using Evomi's scraper API, and we've also utilized aggressive proxy rotation using Evomi's core residential plan in order to scrape the notoriously difficult to scrape website Amazon. So let's create a full stack web application with Node on the back end, React on the front end, and MongoDB as our database storage facility. So this can hopefully provide us with a real-world context by implementing web scraping functionality within an application built on top of the MERN stack. In this full stack web application, we are going to implement server-side code in JavaScript within a Node project to scrape the TIOBE Index through Evomi's scraper API. The first time our code scrapes the TIOBE Index, our code will save the scraped data to a MongoDB collection. So when a request for the TIOBE Index data is subsequently made by a calling client, the cached data saved to the relevant MongoDB collection will be returned from the database to the calling client. We'll implement React code to display the TIOBE Index data in an aesthetically pleasing way on our own website. We'll then implement code to scrape Amazon within our Node project. As seen earlier on in the course, it will be better to scrape Amazon using aggressive proxy rotation, which we are able to leverage through Evomi's core residential infrastructure, provided to us through Evomi's core residential plan. As mentioned earlier, you can sign up for a free trial using this plan if you wish to follow along with the practical example in this course. So when Amazon is first scraped for specific search criteria, for example JavaScript or karate or whatever, the relevant search results are scraped from Amazon and then saved to a collection within our MongoDB database. Subsequent requests from the client will result in the data being returned from the database. So say the user inputs, for example, JavaScript on the client and sends a request for relevant Amazon data.
If Amazon data pertaining to the search criteria JavaScript is found within the relevant collection within our MongoDB database, that data will be returned from the database to the client. If the data is not found within the database, this will trigger our code on the server side to scrape Amazon for the relevant search results.
The search results will then be saved to the relevant collection within the MongoDB database. So this full stack web application that leverages the classic MERN stack provides us with a real-world code example of how we can capture data from the internet, structure the data appropriately, and save that data to our own databases.
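The check-the-database-first flow described above can be sketched in a few lines. This is an illustration of the caching logic only, written in Python with a plain dictionary standing in for the MongoDB collection and a caller-supplied function standing in for the Amazon scraper; the course itself implements this in Node with Mongoose, and the get_results name and the lowercase key normalization are assumptions.

```python
from typing import Callable


def get_results(cache: dict, query: str, scrape: Callable[[str], list]) -> tuple[list, str]:
    """Return (results, source), where source is 'database' or 'scrape'."""
    # Normalize the search term so 'JavaScript' and ' javascript ' share one entry.
    key = query.strip().lower()
    if key in cache:
        # Cache hit: serve the stored results without touching the target site.
        return cache[key], "database"
    # Cache miss: scrape once, store the results, then return them.
    results = scrape(key)
    cache[key] = results
    return results, "scrape"
```

With a real database the dictionary lookup becomes a find query and the assignment becomes an insert, but the branching stays the same: only a miss triggers a scrape.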
We could then, for example, use LLMs to glean insights about the captured data or even train our own models on the captured data. In this world of AI, where more inbuilt intelligence is required, applications are certainly becoming more data-driven.
Right, let's create the code for our full stack web application. Okay, so the first thing we're going to do, before we start developing our full stack web application using the MERN stack, is create a root directory to house our project. And then we can load up VS Code and start developing our full stack web application, which uses Node on the server side and React on the client side. And we're using MongoDB for our data storage facility. So first I'm just going to create a folder here called Evomi.
And then I'm going to go into the evomi folder, and inside it I'm going to create our root directory and call it info data, because that's the name of our project. We're going to be capturing data and caching it within a MongoDB database. So we're going to firstly capture the data using web scraping through Evomi. We're going to capture the data from the TIOBE Index web page, which contains data that ranks the top 20 programming languages, and we're going to scrape that data and put it into a table, or a collection rather, within our MongoDB database. So let's call this folder info data.
Okay. And then let's load up Visual Studio Code.
Okay.
Great. Let's make that full screen.
And I'm going to open that folder. Open the root folder that we've just created within CDEV. And this root folder is called info data as discussed. So let's select folder.
And now we want to create the root folder that will house the serverside node code. So we're going to first create a folder called server.
And within server we're going to create a folder called info data capture, because we're going to scrape the data from the relevant web pages. So we're capturing the data from the web pages through Evomi's services and then we're going to cache that data within our database. So our code will first check whether that data exists within the database. If it doesn't, it will scrape that data from the relevant web page and save that data to an appropriate collection within our MongoDB database.
Great. So that is the root directory for our serverside code, our node code. So I'm going to invoke the terminal window.
To do that, we can go Control and the tilde character, like that.
And it's defaulted us to the info data root directory. That's the root directory for the entire project. But we want the root directory for the server side code. So let's go cd server.
Whoops. Server, info data capture. And now we're in the root directory for our server-side code. And now we can create our Node project. And to do that, well, firstly, let's just check that we've got an appropriate installation of Node. I recommend that you install the latest version of Node if you haven't yet installed it. We can type node and then hyphen v. So, I'm on version 25.6.1. I think that'll do. That's good.
I think that might actually be the latest stable release of Node. So, that'll work for me. So, just check your version of Node before you get started.
And then to initialize your node project, you type npm init y. So in case you're wondering what the hyphen y is, it basically skips all the questions and automatically creates the package.json file using default values. So if we didn't include the hyphen y and just np npm init, it's going to take us through a whole bunch of questions that we have to answer before it initializes our project. So we're bypassing that process by including hyphen y and it will just in it will just include the default values within our package.json file.
That's the consequence of including hyphen y to get past all the relevant questions. Let's press the enter key and that's done. And you can see it's created that package.json file there.
Now we want to install a few dependencies. We want Express, which is a minimal web framework for Node.js used to build APIs and web servers. It handles routing, middleware, and HTTP requests and responses. We also need a package to handle CORS. So we're installing the cors package so that your server can allow requests from clients hosted on different origins.
Our client is actually going to be on a different origin, because when we test our code everything runs on localhost but the port numbers differ. The React client will be on port 5173 and the server will be running on port 3000. So we must account for CORS, which is a browser security precaution.
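To see why two localhost addresses still count as different origins, here is a small illustration using Node's built-in URL class. The helper name sameOrigin is made up for this example; the two ports are the ones mentioned above.

```javascript
// Two URLs share an origin only if scheme, host and port all match.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

const client = 'http://localhost:5173'; // React dev server (port from the course)
const server = 'http://localhost:3000'; // Express server (port from the course)

console.log(sameOrigin(client, server));                      // false
console.log(sameOrigin(server, 'http://localhost:3000/api')); // true
```

Because the ports differ, the browser treats a request from the client to the server as cross-origin, which is the situation the cors package is there to handle.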
Okay. Then we're going to include a package called Mongoose. Mongoose is a MongoDB object modeling library for Node.js. It lets you define schemas and models to interact with MongoDB in a structured way. To install all of this, we type npm install.
Let's just give ourselves a bit more space here. So, npm install, and we want express, mongoose, and the cors package: npm install express mongoose cors.
Excellent. Press the Enter key.
Okay, it's doing its thing. As always, when installing software, one has to have patience, but we are getting there. And that's done. Excellent. There's one other dependency we want to install, because we want to store our environment variables in a file called .env.
To do that, our Node code needs an easy way to read that file, so we need another package, and this package is called dotenv. To install it, you type npm install dotenv.
Great. It's done that for us, and we are now ready to create some code. Firstly, within the root directory for our Node code, we're going to create a file called server.js. This is going to be the entry point for our Node application. The first thing we're going to do is import the Express framework to handle routing and server logic. To do that we type const express = require, then parentheses. Note that it's require, not required, so remove that d. Within single quotes we include express, the package we just installed.
So that imports the Express framework to handle routing and server logic. On the next line we import cors, which is middleware to allow cross-origin resource sharing. To import it we type const cors = require, open the parentheses, and include 'cors'.
Now we want to include Mongoose. We type const, and let's call this variable mongoose. It's the same basic thing: require, then 'mongoose' in parentheses, and a semicolon. Then we want to be able to read values from a file called .env, and we installed the dotenv package for this purpose.
So we need to import it. It loads the environment variables from a .env file into process.env.
In order for that to work, we need to include this line of code:
require('dotenv'), that's the package, and then .config(), like this.
Excellent. Okay.
Now I'm just going to set a constant to a controller that we haven't yet created. Have no fear, we will create this controller fairly soon. We type const tiobeController equals require. We haven't created the file at this path yet, but we're going to create a directory called controllers, and within it a file called tiobeController.js.
For this line of code, we just need to include ./controllers/tiobeController, using camel case like this. We don't need to include the extension, but we will create a file within this folder called tiobeController.js in just a bit. Great.
So, we're first going to start by scraping the TIOBE index and saving that to the relevant MongoDB collection.
That's what we're going to do first, and later we're going to get a little more complex and try to scrape Amazon. For now we focus on the TIOBE index. It's a much easier web page to scrape, and you'll see why a little later on when we try to scrape Amazon. The next line of code is const app = express().
Okay.
That initializes a new Express application instance, which is the web framework we're using in our Node project. Then app.use: we want to use cors, and this is how you do that, app.use(cors()). That enables the application to accept requests from different origins, as discussed earlier. So we're handling CORS very simply through this one line of code. Then we want to configure the server to automatically parse incoming JSON request bodies. To do that we include app.use(express.json()), with open and close parentheses.
Excellent. And then we need to connect to our database.
What I'm going to do here is create a .env file, because we're going to read the connection string for our MongoDB database from it, in the specific way that the dotenv package allows. Of course, we only very recently installed the dotenv package. So within this root directory we add a file called .env. Because we've installed the package, our Node code will know where to read the environment variables that we create in this file. The environment variable we're going to create is called URI, and we set it to the relevant URI for MongoDB. It uses the MongoDB protocol to connect, so mongodb, that's the protocol, then colon, slash, slash. We're running it on our local host just during testing, so localhost, and then the default port number for MongoDB. If you've installed MongoDB, you'll be able to connect to your local instance on port 27017.
Then we want to connect to a database called info data, so the database name goes after a final slash.
If that database doesn't exist, MongoDB will create it the first time data is written to it.
So we've got our URI environment variable set up for that. Excellent.
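To make it a bit clearer what loading a .env file amounts to, here is a deliberately simplified illustration: read KEY=VALUE lines and collect them into an object. This is not how the dotenv package is implemented, parseEnv is a made-up helper, and the database name infodata in the sample is an assumed spelling of the name spoken in the video.

```javascript
// Simplified illustration of .env loading; NOT the dotenv implementation.
function parseEnv(text) {
  const result = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) continue; // skip blanks and comments
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue; // ignore lines without a KEY=VALUE shape
    result[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return result;
}

// Hypothetical .env contents; "infodata" is an assumed database name spelling.
const sample = 'URI=mongodb://localhost:27017/infodata\nPORT=3000\n';
const env = parseEnv(sample);
console.log(env.URI);  // mongodb://localhost:27017/infodata
console.log(env.PORT); // 3000 (as a string)
```

In the real project you would not write this yourself. The dotenv package does the reading and places the values on process.env.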
Great. So let's go back to the server code there.
Now we want to create the connection to MongoDB, and we go through Mongoose: mongoose.connect.
To read that environment variable, we pass process.env.URI. That connects to the database using the URI stored in the environment variables. Then we can chain .then, which handles the case where our code successfully connects to the MongoDB instance. You write it using an arrow function that calls console.log, and we output Connected to MongoDB, which will just appear in your terminal window. If an error occurs, we can catch it through .catch. So we're chaining these methods. Again we use an arrow function, and we can log the error: let's output Connection error, and include the details by passing the error object as well.
So the .then code logs a success message if the database connection is established, and the .catch code logs an error message if the connection fails.
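The chaining just described can be sketched like this. The connect function is passed in as a parameter only so the example runs without a database; in server.js it would be mongoose.connect called with process.env.URI. The wrapper name connectToDatabase and the true/false return values are additions for this illustration.

```javascript
// connect is passed in so the sketch runs without a database;
// in server.js it would be mongoose.connect.
function connectToDatabase(connect, uri) {
  return connect(uri)
    .then(() => {
      console.log('Connected to MongoDB');
      return true;
    })
    .catch((error) => {
      console.error('Connection error:', error);
      return false;
    });
}
```

Only one of the two callbacks runs for a given attempt: .then when the promise resolves, .catch when it rejects.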
Okay, perfect.
Great.
Now what we want to do is map a route to the controller that we haven't yet created.
To do that we type app.get, because we want to map an HTTP GET request to this relative path, /api/tiobe. The handler will be on our controller, so tiobeController, then dot, and the function is going to be called getTiobeRankings.
So we're mapping a GET request for TIOBE data to its respective controller function. And that should have a k in it, like that: getTiobeRankings. Okay, excellent.
Then we want to establish the port number for our server-side code. The port will be 3000, which is a common choice for a Node application.
We can read that port number like this. We first want to see if we can read it from the .env file, and because we're using the dotenv package we do that through process.env.PORT. PORT is the name of the environment variable. Then we can use the or operator.
If that doesn't return anything, we default the port to 3000, like this.
Okay. Then we want Express to listen, which starts the server and listens for incoming traffic on the specified port.
So app.listen, and let's pass in the port, and then let's output some status.
We can just output Server running on port. Let's put this between backtick characters so that we can include variable values within the string.
With the string wrapped in backticks, we can include the port number by typing the dollar symbol and then port wrapped in curly braces. So it's that constant we're including within the string, and we want it output to the terminal window when Express is successfully listening on this port, which will be 3000. We've actually configured it within our .env file as 3000.
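Pulling the pieces of server.js together, a sketch of the wiring might look like this. Here express, cors, and the controller are passed in as parameters purely so the example can be exercised without installing anything; in the actual file they are simply require()d at the top, as described. The function names createApp, resolvePort, and startServer are made up for the illustration, and the route path /api/tiobe is an assumed spelling of the path dictated in the video.

```javascript
// express, cors and the controller are parameters here only so the sketch can be
// exercised without installing packages; server.js itself would require() them.
function createApp(express, cors, tiobeController) {
  const app = express();
  app.use(cors());         // allow cross-origin requests
  app.use(express.json()); // parse incoming JSON request bodies
  // Route path is an assumed spelling of the one dictated in the video.
  app.get('/api/tiobe', tiobeController.getTiobeRankings);
  return app;
}

// Use PORT from the environment when set, otherwise fall back to 3000.
function resolvePort(env) {
  return env.PORT || 3000;
}

function startServer(app, env) {
  const PORT = resolvePort(env);
  // Backticks allow ${PORT} to be interpolated into the message.
  return app.listen(PORT, () => console.log(`Server running on port ${PORT}`));
}
```

With the real packages, the equivalent call would be along the lines of startServer(createApp(express, cors, tiobeController), process.env).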
Excellent. Okay, I've just noticed a bit of a typo here. I typed C-O-R-S-E, and this should be just cors, C-O-R-S. Great. Now we want to move on and create the model for our language, which denotes the fields. If this were a relational database, it would be the table with all its fields: we'd be defining what columns are in the table and what their data types are. But this is for a MongoDB collection, so we're going to define what the schema for a collection item, a document, will be.
That's what we're going to define now within a model. So let's create a folder called models, M-O-D-E-L-S. Then let's create a file that's going to house our model. We're going to call this file language.js, because we're scraping the TIOBE index for the top 20 ranked programming languages.
So let's just keep this simple and call it language.
Then let's define each of the fields for a language ranking document. We're going to call the model LanguageRanking, but the file is just called language.js.
Let's firstly write the code to import Mongoose.
So we type const mongoose = require and then, in single quotes, 'mongoose'. That imports the Mongoose library to interface with our MongoDB database.
Okay.
Then let's create the schema. To do that we type const, and let's call this languageSchema, equals new and then mongoose.
We're going to use Mongoose to create our schema,
and the constructor is called Schema.
We pass an object to Schema that denotes each of the fields within our schema. The first field is ranking, so 1, 2, 3, 4, 5.
Python is one, and two could be JavaScript or whatever. I don't think it is JavaScript nowadays, but it might be. Anyway, ranking, and we set its type, which is Number.
Okay. Comma, then required. It's a required field, so every document must have a value for it: required is true, like this.
So Schema defines a new schema structure for your programming language data.
The ranking field specifies a mandatory numeric field to track the language's current rank. Now let's move on to the next field. I'm just going to call this pLang, short for programming language. Put an object next to that and define its type as String.
That should have a capital S, String. Comma, and this must also be required,
so required is true. Then we want to trim off any surrounding white space, so we include trim set to true. So we've defined a required string for the language name, and any leading or trailing white space will be removed automatically.
Okay. And then let's move on to the next field.
Okay. That field will be imagePath.
It's going to be the path of the icon representing the language.
We scrape the path for that image so we can include it on our web page at the appropriate time, and we'll do that when we create the relevant React code. For now we just want to capture the raw data. So imagePath, colon, type, and this is a String. Required? Yep, we want each programming language to have an associated icon or image, so required is true. This field stores the file path or URL for the programming language's logo icon.
Comma. With the last field we want to automatically record the current timestamp whenever a document is created within the collection. Every time a document is inserted, the default value is written to the updatedAt field, and that default is Date.now. So it'll be the current date and time when that document is inserted into the collection.
Okay, excellent.
So, we've got one last piece of code to write to complete this file, and it's module.exports equals mongoose
dot model.
We want to call this model LanguageRanking,
and we pass in the schema that we've just defined, the languageSchema const, as the second argument to the model method.
Okay.
What that does is compile the schema into a model and export it for use in other files. We're going to create those other files in just a bit.
In fact, we're going to create those other files right now. Excellent.
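Here is a sketch of how the model file described above might come together. The field name pLang is an assumed spelling of the field spoken as "p lang". Mongoose is passed in as a parameter only so the example loads without the package installed; in language.js you would require('mongoose') at the top and assign the model straight to module.exports. The wrapper name buildLanguageRankingModel is made up for this illustration.

```javascript
// Field definitions as described in the video. "pLang" is an assumed spelling
// of the field dictated as "p lang".
const languageRankingFields = {
  ranking:   { type: Number, required: true },
  pLang:     { type: String, required: true, trim: true },
  imagePath: { type: String, required: true },
  updatedAt: { type: Date, default: Date.now }, // set when a document is created
};

// mongoose is a parameter so the sketch loads without the package installed;
// language.js would require('mongoose') and export the model directly.
function buildLanguageRankingModel(mongoose) {
  const languageSchema = new mongoose.Schema(languageRankingFields);
  return mongoose.model('LanguageRanking', languageSchema);
}
```

Note that the default is Date.now without parentheses, so the function is called each time a document is created rather than once when the file loads.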
Okay. So, now we're going to create the service code. The service is a core part of this functionality because it actually executes the scraping of the TIOBE index and brings back the raw HTML. Then we're going to use a technology called Cheerio, so we're going to install a package called Cheerio. We can leverage it to parse the HTML and build an array from it, scraping the field values that we defined in the language schema: ranking, pLang, and imagePath. I'll just show you here.
Here's the TIOBE index. We're going to use Evomi's web scraper to scrape this table. You can see we've got Python at number one, C at number two, then C++, Java, C#, JavaScript, Visual Basic. Huh, Visual Basic's gone up one. That's interesting.
Looks like Visual Basic's becoming more popular. Anyway, we're going to be scraping the relevant data here, which includes the ranking value. If we look at the schema, we've got the ranking value: 1, 2, 3, 4, up to 20.
So one is Python, two is C, three is C++, four is Java, five is C#, and so on. We're also going to be scraping the language name, which is this value in the table, as well as the image. We're going to scrape the path to that image so we can include it on our own web page. Okay?
We're going to put all of that into an array of data. Each item in the array will have these fields: ranking, pLang, and imagePath, which are the fields we defined in languageSchema for the model we've called LanguageRanking. So now we're going to create the service that uses Evomi's scraper functionality to scrape the TIOBE index. The Evomi endpoint returns raw HTML, and we're going to use Cheerio to parse that raw HTML and turn it into an array of data based on this schema. First, though, our code will check whether the TIOBE index data exists within the relevant collection. If it doesn't exist, it'll scrape that data and save it to the collection in the MongoDB database.
Okay. So let's create our service. Firstly, let's create a folder within info data capture and call it services. Within services, let's create a file called tiobeService.js.
We're now going to create the JavaScript code for the TIOBE service. Firstly, we need to install Axios, which we'll use to make our HTTP requests and receive the responses. We also want to install the Cheerio package, which will help us parse the raw HTML returned from the Evomi endpoint. When we call that endpoint, the scraping gets kicked off on the Evomi servers, and Evomi returns the raw HTML scraped from this web page, so that we can grab the information from this table and save it to our own collection in our MongoDB database. Okay, great. To install both, we type npm install, then axios, then a space and cheerio:
npm install axios cheerio.
That's all there is to it. We're installing the two packages that we need, so we've got those two dependencies.
Okay. And it's doing its thing. Great.
We've installed those packages and now we can write our code. The first line of code is const axios = require('axios').
Okay, like that.
That line imports the Axios library to make HTTP requests to external APIs or websites.
Let's create another line. Now we want to import Cheerio, so const,
let's call this cheerio, equals
require,
and we pass 'cheerio' as the value. That imports Cheerio, which we use to parse and manipulate HTML markup using a jQuery-like syntax.
We'll look at the details of this in just a bit, but for now let's just import the library. Okay. Now I'm going to firstly create the function that parses the TIOBE HTML.
Evomi is going to return the raw HTML from this web page, and we're going to parse this table here.
We're going to use CSS selectors to target the values we want in our collection,
and we're going to use Cheerio to get those values. Each of these rows will become an item in an array. So we're going to create an array out of this table, basically, and save that array to a collection within our MongoDB database. Let's write the parsing code first. We're going to write a function called parseTiobeHtml, and it accepts a parameter that we're going to call html. That's the HTML returned by Evomi's scraper service, and we're parsing it in our code here, turning it into an array of relevant values. So we're declaring a function that transforms a raw HTML string into a structured data array. The first line of code is const, dollar symbol, equals cheerio.load(html). We load the HTML into Cheerio, and using its jQuery-like syntax we're able to query the raw HTML with CSS selectors and extract the data we want in our array.
Okay. We want to push that data into our own array, so let's define the array.
So const rankingArray equals an empty array. This will store the structured data that we extract from the raw HTML. Next we want all the rows of this table. Bear in mind the raw HTML is going to include everything on the page, so we need to target the table specifically.
We're going to target it through a CSS selector,
and this is what the selector looks like. We type allRows equals dollar, and you can see the syntax is very jQuery-like. Inside, we include the CSS selector: table, then .table-top20, then tbody, then tr. That's how we target the rows of the table that contains the rankings. You can see where this comes from by going to the page in your browser, right-clicking the table, and choosing Inspect.
There you go. You can see that CSS class right here.
We're using the developer tools within the Chrome browser, and you can see why we're using that particular selector: it enables us to target that table.
Here's the table there, and the CSS selector is table.table-top20 tbody tr.
That returns a collection of row elements.
You can see table-top20 is the relevant CSS class on the table. The selector gives us the rows, and the cells in each row hold the data that we want.
You can see number one, that's the ranking, and that's the one we want. You can also see, for example, this image here, and we are actually going to scrape this path. All it gives you is the icon logo for Python, which means we'll be able to include that logo within our own web page. We'll look at that when we get to the React code. So this is how we use CSS selectors to target the TIOBE index table: you inspect the relevant elements on the web page, infer which CSS selectors to use, and then use those selectors within your Cheerio code, this jQuery-like code here, to extract the values from the page. That's essentially how we are scraping the TIOBE index. Great. So this should return a collection of rows, and we can call allRows.each to iterate over each table row found in the HTML. The callback takes index, which is just a counter, so each row has an associated index value,
and we wrap the parameters in round brackets,
comma, element, like that.
Then we include an arrow, because this is an arrow function, and curly braces for the function body.
We also need to close the round bracket that we opened after each. So the two parameters are encapsulated within their own round brackets, and the whole arrow function sits inside the brackets of each.
That's just the syntax for an arrow function. Within allRows.each we can inspect each cell of the row. So const, then dollar td, equals dollar, element in brackets, and this is just how you use Cheerio, then dot find.
We want to find the td elements in each row,
so this finds all table data cells within the current row being processed.
This const is a collection of td elements.
If we look at the table here, the ranking, Python, the image path, all of those columns are included in that collection. We're looping through all the rows, and for each row we pick values out of its cells, so you can think of it as a two-dimensional array.
Essentially that's what we're doing here.
Okay. Then we go rankingArray.push, because we now want to extract the values and push them into a structured array. Open curly braces there.
Now we can include name-value pairs, starting with ranking.
So each item is an object of name-value pairs,
and rankingArray is an array of those objects.
Okay. Let's name the first pair ranking, and then we call parseInt, because this is an integer.
We defined it as a Number earlier on in our schema. Now for the query code.
We're using the td collection here, and it must have a dollar in front,
then dot eq. This is how we're traversing the HTML elements, and we want the first cell in the collection, index 0. So we want this value here, number one. I've found all the td elements, so this is a collection of them,
and we get the first one with eq(0), then call text because we want its text. For Python that will return one, for C it's going to return two; it's the ranking associated with the programming language. Comma. Then we want the actual language name, so pLang. Let's rename the collection to tds, because it holds multiple td elements.
So dollar tds and then dot eq,
and for the language name
we want
the cell at index 4, so eq(4), then .text(),
and we want to trim any white space off there with .trim().
So we're just sanitizing the data a little bit for pLang.
You can see that in the first row, counting cells 0, 1, 2, 3, 4, index 4 is Python.
That's what that corresponds to, and we're doing that for all the rows through allRows.each. We're looping through each of the rows and targeting the values within its cells. Then we want the image path, and we're going to call this name-value pair imagePath. We want dollar tds dot eq,
and this time we're getting the cell at index 3, the one holding the language icon. Then dot find, to find the img element inside it, and we want the src attribute because that contains the path for the language image. That, of course, is just basic HTML: src within an image tag. Then, after the loop, we can return rankingArray to the calling code, and that is now a structured array containing the values we want from the web page. We've used Cheerio to parse the raw HTML and extract a structured array of values that we want to capture in our database and display to our users via the React front-end code that we'll create a little later on.
Okay. Look at that. We've written our parsing code, called parseTiobeHtml.
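Putting the parsing steps together, the function might look roughly like the sketch below. A few things here are assumptions: the selector table.table-top20 tbody tr and the cell indexes 0, 4, and 3 are reconstructed from the spoken description, pLang is an assumed field spelling, and cheerio is passed in as a parameter only so the example can be run with a stand-in. In tiobeService.js it would be the require('cheerio') import, and the function would take just the html argument.

```javascript
// cheerio is a parameter here so the sketch can be exercised with a stand-in;
// in the service file it is the module imported with require('cheerio').
function parseTiobeHtml(cheerio, html) {
  const $ = cheerio.load(html);
  const rankingArray = [];

  // Selector and cell indexes are assumptions based on the narration.
  const allRows = $('table.table-top20 tbody tr');

  allRows.each((index, element) => {
    const $tds = $(element).find('td'); // all cells in the current row
    rankingArray.push({
      ranking: parseInt($tds.eq(0).text(), 10),       // first cell: rank
      pLang: $tds.eq(4).text().trim(),                // cell 4: language name
      imagePath: $tds.eq(3).find('img').attr('src'),  // cell 3: icon path
    });
  });

  return rankingArray;
}
```

If the page layout changes, the selector and the indexes are the parts that would need revisiting, which is why it is worth checking them in the browser's developer tools first.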
Now we want to actually kick off the scraper.
So let's create a function called fetchTiobeRankings. So const fetchTiobeRankings equals, and it's an asynchronous function,
so let's include the async keyword and then an arrow function like this.
Okay. const payload equals an object with a property called url, and its value is the TIOBE URL, which is the address of the page we've been looking at. I've defined that within the .env file. I'm not going to show the .env file at this point because it has sensitive information in it, but I did show you earlier the basics of defining environment variables and their values. So we're reading an environment variable that I've configured offscreen, called TIOBE URL, from the .env file.
So we've got the payload, and that's all it is.
The URL itself isn't particularly sensitive, of course. It's just the URL for the TIOBE index, and we pass it through because the Evomi scraper service needs to know which page to scrape. Then we type const response. Now we're going to call Evomi to kick off the scraping process,
and we're going to use Axios for that:
axios
dot post, because we want to send an HTTP POST request. The first argument comes from process.env, and this one's called Evomi endpoint.
Okay.
The second argument is the payload.
Okay.
Then a comma, and we open curly braces for the third argument.
We've opened those curly braces because we want to pass in a headers object that contains the API key. Of course, API keys are sensitive information. Your API key is going to be different from mine. You've signed up to the Evomi services, perhaps you've paid, so you don't want someone else stealing that key and using up your Evomi credits. This is why I'm keeping those values out of the code and storing them in the .env file. Then you must include x-api-key.
This is your scraper API key.
We're sending the actual value in the HTTP header called x-api-key, and we're reading it from process.env, dot, and this one's called API_key.
That enables you to use Evomi's scraper service. There seems to be a problem, and that's because I've typed a comma where it should be a colon. That looks better.
Now that response is going to contain the raw HTML data.
We don't want to return the raw HTML. We want to return the parsed data.
So we're going to call the parseTiobeHtml function that we just created, which uses Cheerio to parse the HTML and create a structured array from the raw HTML returned by the Evomi service. We pass in response.data, because Evomi returns the raw HTML in the data property. Our custom parsing function extracts the values we want, puts them into a structured array, and we return that array. So fetchTiobeRankings, this asynchronous function, returns a structured array of 20 rows, and each row contains a language's ranking, its name, and its image path, the logo for that programming language.
Great. The last thing we need to do is export this, because it's in a separate module now. So module.exports, and we export this function, fetchTiobeRankings.
That exports the fetching functionality so it can be used by our controllers. We are only going to use it from one particular controller, and we're going to create that in the next section.
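As a sketch of the service function just described: the environment variable names TIOBE_URL and EVOMI_ENDPOINT are assumed spellings (the .env file is not shown in the video), API_key follows the name as spoken, and the export shape at the end is one possibility rather than the confirmed one. Here axios, the environment, and the parser are passed in as parameters so the example can run with stand-ins; in tiobeService.js they would be require('axios'), process.env, and parseTiobeHtml.

```javascript
// axios, env and parseHtml are parameters so this sketch can run with stand-ins;
// in tiobeService.js they would be require('axios'), process.env and parseTiobeHtml.
// TIOBE_URL and EVOMI_ENDPOINT are assumed spellings of the .env variable names.
const fetchTiobeRankings = async ({ axios, env, parseHtml }) => {
  // The page we want Evomi to scrape on our behalf.
  const payload = { url: env.TIOBE_URL };

  // POST to the scraper endpoint, authenticating with the x-api-key header.
  const response = await axios.post(env.EVOMI_ENDPOINT, payload, {
    headers: { 'x-api-key': env.API_key },
  });

  // The raw HTML arrives in response.data; return the parsed rows instead.
  return parseHtml(response.data);
};

module.exports = { fetchTiobeRankings }; // one possible export shape
```

Keeping the parsing in its own function means the network call and the HTML handling can be checked separately.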
Right. So all we've got to do before we can test our server-side code is create our controller. We've got our service and we've got our model ready, so now we can create the controller. Let's create a folder called controllers.
Whoops. We don't want it there. We want it within the info data capture directory. So let's right-click that and go New Folder, and call it controllers. Within controllers, let's create a file called tiobeController.js.
We're using camel case for the controller file name. Now let's create the controller itself.
Okay. So firstly we want to import the Mongoose model, and to do that we write this line of code, starting with const and the model name.
In fact, that should be camel case: languageRanking equals require, open brackets, and the relative path is ../models/ and then the language file.
That's the file that contains our model schema.
You can see we've exported it there, and we're now importing it here.
Okay. It's got a red squiggly.
So that needs to be in lowercase, like that.
Perfect. Then we want to import the TIOBE service, where all the heavy lifting is done.
We can do that with this line of code, using require.
We need to include the relative path to the service we just created. The services folder is where our tiobeService.js file resides, which contains our service code. That imports the service responsible for scraping data from the TIOBE website and parsing it into a structured array. Now for the main method of our controller, which we're going to call getTiobeRankings. We can export it at the same time as we create it, and to do that we use the exports object: exports.getTiobeRankings.
Let's use an arrow function for our method.
The parameters are req and res. Whoops, we don't want that there.
So req, and then res.
req represents the HTTP request and res is the HTTP response. We create our arrow function with curly braces, and now we can add its code. Let's wrap that code in a try/catch block, with catch (error).
We'll include the code for the catch block in just a bit. Let's first write the main code in the try block. const cachedRankings equals languageRanking.find, and we want it sorted too.
So we're querying the MongoDB database here, and the object we pass to sort means we want the results ordered by ranking in ascending order, from 1 to 20. We're first checking whether the TIOBE index data has already been scraped and exists in the relevant MongoDB collection, and we're doing that through Mongoose.
So this code queries the database for all rankings, ordered numerically from one upwards.
If cachedRankings.length is greater than zero, we've found the data. That means the data has been scraped previously and now exists in the relevant collection in our MongoDB database. We want to return an HTTP response in JSON format from the database. So let's mark the source as database, meaning the data came from the database and was not scraped during this particular request.
In other words, a prior request scraped the data from the website and saved it to our MongoDB database, so in this scenario we're returning the cached data, cachedRankings, not freshly scraped data.
If the data exists, it's returned immediately, which means the function ends at this point, because we're returning the JSON data to the calling client. That client will be our React component, which we'll write later on.
So at this point the data has not been found in the database, and we need to scrape it from the TIOBE index web page. So const scrapedRankings equals await, and now we kick off the Evomi service. We wrote the service in the previous section, so all we need to do is call a method, because we've encapsulated the functionality that calls the Evomi scraping service within the fetchTiobeRankings method. If scrapedRankings.length is greater than zero, we've successfully scraped the TIOBE index. In that case we have the data available as a structured array, and we want to save it to the relevant collection in our MongoDB database, passing in scrapedRankings.
So that takes the scraped data in a structured array and inserts it into the relevant collection in our MongoDB database. That means the next time this method is called, there should be data in the database, and the earlier branch will return the JSON array of ranked programming languages to the calling client without running any of the code after it. Only when the data is not found in the database do we need to scrape the TIOBE index web page.
And that's what this code is here. When we get to this point, we simply call res.json to return the JSON data to the calling client, and we tell the client that this is freshly scraped data by marking the source as TIOBE scraper rather than database.
Great. Now let's include the error handling code. We're just going to call console.error, so we write the error out to the terminal window.
The message is TIOBE controller error.
Then a comma, and error.message, like that.
Then we want to return an HTTP status code of 500, which denotes an internal server error. So res.status(500).json, and we return JSON with the details of the error.
We can just return the text Internal Server Error to the client, like that. And that is our controller written.
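The cache-first flow of the controller can be sketched like this. It is a hedged illustration rather than the course's file: createGetTiobeRankings, findCached, scrape, save, and createFakeRes are hypothetical names, the database and scraper are in-memory stand-ins, and the exact source strings and the data key in the JSON payload are assumptions.

```javascript
// Hypothetical factory so the handler runs without Express or Mongoose.
// findCached, scrape and save stand in for the model query, the scraping
// service call and the database insert described above.
function createGetTiobeRankings({ findCached, scrape, save }) {
  return async function getTiobeRankings(req, res) {
    try {
      // 1. Try the cache first and return early on a hit.
      const cachedRankings = await findCached();
      if (cachedRankings.length > 0) {
        return res.json({ source: "database", data: cachedRankings });
      }
      // 2. Nothing cached: scrape, persist if we got rows, return fresh data.
      const scrapedRankings = await scrape();
      if (scrapedRankings.length > 0) {
        await save(scrapedRankings);
      }
      return res.json({ source: "tiobe-scraper", data: scrapedRankings });
    } catch (error) {
      console.error("TIOBE controller error:", error.message);
      return res.status(500).json({ error: "Internal server error" });
    }
  };
}

// In-memory stand-ins for the database and the scraper.
const store = [];
const handler = createGetTiobeRankings({
  findCached: async () => [...store].sort((a, b) => a.ranking - b.ranking),
  scrape: async () => [{ ranking: 2, name: "C" }, { ranking: 1, name: "Python" }],
  save: async (rows) => { store.push(...rows); },
});

// Minimal fake of the response object: records the status and payload.
function createFakeRes() {
  const res = { statusCode: 200, body: null };
  res.status = (code) => { res.statusCode = code; return res; };
  res.json = (payload) => { res.body = payload; return res; };
  return res;
}
```

Calling handler twice with a fresh fake response each time should report the scraper as the source on the first call and the database on the second, which mirrors the Postman test later in the walkthrough.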
So we are now ready to test our server-side functionality for scraping and capturing the data from the TIOBE index. Excellent.
Right. So we are allegedly ready to test our code. But before we do that, I think I've introduced a few bugs.
So if we go to language.js here, first make sure that this language.js file within the models folder does not have a capital L. It should be all lowercase. Then, where we're exporting the model, I want the L in language to be lowercase too, just to make sure everything is consistent. And if we go to the TIOBE controller where we're importing it, the path should use lowercase language.
The language part of the languageRanking const name should also be lowercase. Make sure the L is lowercase in every reference to languageRanking here.
And that should be pretty good.
I've also noticed that this line should have an await, because this function is an async function. The await keyword should be present in the line where we look for the data in the MongoDB database.
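Why the missing await matters can be shown with a small stand-alone sketch. findRankings and checkCache are hypothetical names, and a plain async function stands in for the database query. As far as I know, a Mongoose query without await gives back a Query object rather than a plain Promise, but the effect on a length check should be the same: there is no length to read, so the cache branch never fires.

```javascript
// Stand-in for a database query: resolves to an array of rows.
async function findRankings() {
  return [{ ranking: 1, name: "Python" }];
}

async function checkCache() {
  // Without await we hold a Promise, which has no length property,
  // so the comparison is undefined > 0, which is false.
  const withoutAwait = findRankings();
  const hitWithoutAwait = withoutAwait.length > 0;

  // With await we hold the resolved array and can check its length.
  const withAwait = await findRankings();
  const hitWithAwait = withAwait.length > 0;

  return { hitWithoutAwait, hitWithAwait };
}
```

In the controller, that would mean every request falls through to the scraping branch even when the data is already cached.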
We're looking in the collection and making sure that it actually has data.
If it does have data, we return the data from the database.
If it doesn't, we scrape the TIOBE index for its data, that is, this table here with its 20 ranked programming languages. Great. That all looks good and ready to test. I just want to check one more thing. If we go to the TIOBE service, this is slightly incorrect.
For this line of code, this should be eq, not get. So tds.eq, and the rest of that line looks correct. And here too, this should not have get. It's just eq for referencing the relevant column.
tds holds all the td elements within the row, and with eq we're pointing to index four. Remember that the tds are zero-based, so that's the fifth column, because index zero counts as the first column in the row. And the last one should be eq too, not get, followed by find for the image. That should be good. Make sure you save your changes.
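The difference between eq and get can be illustrated with a toy wrapper. This is not Cheerio itself: Selection is a hypothetical class that only mimics the relevant behaviour, namely that eq returns another wrapped selection (so methods like text are still available), while get returns the raw underlying node, which has none of the wrapper's methods.

```javascript
// Toy stand-in for a wrapped selection; NOT the Cheerio library.
class Selection {
  constructor(nodes) { this.nodes = nodes; }
  // eq(i): a new wrapped selection containing only the node at index i.
  eq(i) { return new Selection(this.nodes.slice(i, i + 1)); }
  // get(i): the raw node at index i, without any wrapper methods.
  get(i) { return this.nodes[i]; }
  // text(): concatenated, trimmed text content of the wrapped nodes.
  text() { return this.nodes.map((n) => n.content).join("").trim(); }
}

// Five cells in a row; the indexes run from 0 to 4.
const tds = new Selection([
  { content: "1" }, { content: "2" }, { content: "" },
  { content: "" }, { content: " Python " },
]);

// Index 4 is the fifth column because counting starts at zero.
const name = tds.eq(4).text();

// The raw node has no text method, so chaining .text() after get would throw.
const rawTextType = typeof tds.get(4).text;
```

That is the reason the chained calls in the service only work once get is replaced with eq.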
I think we're ready to test it.
I'm going to test this code through Postman, so I've got that all loaded up.
It's going to lazily create the database. You can see the database doesn't exist yet. We don't need to create it manually.
Info data, there.
Okay, I'm actually going to delete that, just so there's no confusion.
So, drop database. This is a way you can test it: if the info data database already exists within your instance of MongoDB, you can delete it, because the way we've structured the code means the database will be lazily created for us. We don't have to create it manually. Okay, I think we're ready to test this.
So, server.js, that all looks good. We need to send a GET request to this path here, and it should do everything we want it to do. We're going to test this through Postman, so I'm going to go to Postman and send an API request.
We want to send the GET request to /api/tiobe, and you can see we're running our server-side code on localhost, port 3000.
Now we can send that request to our code, but we first need to run our code.
We do that by making sure we're in the root directory of our server-side code, info data capture, and we are. So we can type node and then server.js, the entry point for our application, which listens on port 3000 on localhost.
So we go node server.js.
Press the Enter key. It's given us some status: server running on port 3000, connected to MongoDB.
Now, we shouldn't have any data. Our info data database doesn't even exist in MongoDB at the moment. So when we run our code, it should scrape the data freshly from this web page and insert it into our database. Let's see if it does what we expect, and send that across.
Brilliant. So the source is TIOBE scraper. If we look at the response, you can see it's scraped all the relevant data.
In first place we've got Python, along with its image path. We could copy an image path, paste it into our browser window, and it should display the relevant icon.
There it is. C. Excellent. So that verifies that the data is accurate.
So we've got C, Java, JavaScript, Visual Basic, and it's scraped the relevant data. But the source is what I want to point out first of all: TIOBE scraper, which means we've scraped the TIOBE index. Our GET request hasn't retrieved the data from the database. It's freshly scraped the TIOBE index and saved it to our MongoDB database. Now let's look at MongoDB through Compass, which gives us a graphical user interface. Firstly, I'm going to reload the data, and you can see info data has now appeared. If we look within info data, you can see the language rankings collection has been created, and it has captured all the data from the TIOBE index. So that has worked correctly.
Excellent. Next we'll create the React code to display the data on our front end.
But what we can do now is test that again. If we go to Postman and run the request again, you'll see that this TIOBE scraper value should change to, what should it change to? Let's go to the TIOBE controller.
There you go. The source value should change from TIOBE scraper to database. So let's go back to Postman and run that.
Excellent. So it's now retrieving the data from our database, because we've cached the scraped TIOBE index data in MongoDB, in a database called info data that was created through our code.
And you can see in Compass that it's created a collection called language rankings, and all the data we've scraped now exists within the database. If we now drop the info data database so that it doesn't exist anymore, the next request will scrape the TIOBE index again.
So if we go here, you'll see that the source value should change back to TIOBE scraper. Let's see if that works.
Excellent. TIOBE scraper. So this data was returned from a freshly scraped TIOBE index, that is, from Evomi scraping the TIOBE index for us. It's returned the relevant JSON data, and we've saved it in the database again. So next time we run it, the source will be database, because the data is retrieved from the cache in our MongoDB database.
And if we reload the data now, info data appears, and you can see the collection contains the TIOBE index data. Excellent. So that's all working perfectly, and we can now move on to creating the React code that will display that data in an aesthetically pleasing manner to the user.
Okay, so now we've created the server, tested it through Postman, and we're pretty satisfied with the results. We're now going to create the React client code. Let's start off by creating a root folder for our client.
So let's right-click here and go New Folder, client. Excellent. Let's cd into that client folder, and now let's create our React project. We're going to use Vite for this. To create a React project using Vite, we type npm create vite@latest and press the Enter key.
Okay, perfect. We want the project name to be info data cap... no, not capture, that's the server-side root folder. We want this to be info data display. Press the Enter key, and press it again. Let's select React here, and we'll just use JavaScript.
Press the Enter key. Install with npm and start now? Yes. And off it goes, creating our React project and installing all the relevant dependencies.
Great. Looking good. It takes a bit of time, but as always with installations and deployments, one needs to exercise patience.
Okay, done. Let's test that all the defaults have been created for us. We can do that by holding the Control key and clicking the link, and here's the default page loaded for us.
The interactivity works. Excellent, we're good to go. So let's Control-C out of this and clear the screen.
Now we can get going. You can see it's created the info data display folder for us. We've got our node_modules folder, so all our dependencies are in place, we've got our package.json file, and the src directory has been created for us. The first thing I want to do is initialize all the CSS styling code, and by initialize I mean get rid of it. So I'm going to select everything in the App.css file, delete it, and save, because I'm going to use the latest version of Bootstrap for styling and layout. We also want to get rid of all the styling code within index.css. So select all, delete, save, and we're ready to start creating the code for our React application.
So firstly I want to create a component called Tiobe, which will be used for displaying the TIOBE index. We basically want to display something similar to this on our own web page, but we're only scraping a few of these fields, so those are the fields we'll display. That's why we're creating this Tiobe component: to display the column values we've scraped in an aesthetically pleasing way, not dissimilar to this. So let's start with that. Firstly, we want to create a new folder within src, because we're creating a new React component. Let's name this folder tiobe and press the Enter key.
Then within that folder, let's create a new file and call it Tiobe.jsx, with the T capitalized.
Excellent. So, Tiobe.jsx.
We're now ready to create the code for our Tiobe component. So let's do that.
Okay. So, in the first line we import from React. We're importing useState.
Oops, that shouldn't have a capital.
We're importing the useState and useEffect hooks from React.
The reason we've got underlines here is that we haven't used the two hooks yet, so that's nothing to worry about. Just for neatness, I'm going to put spaces there. So we're importing the hooks for managing local state and side effects in the component: useState for storing and tracking state, and useEffect for managing the side effects.
Let's press the Enter key. In the next line of code we want to import Axios. We've installed Axios on the server, but we haven't done it for the client, so we need to install it.
To do that, let's first cd into info data display. Then we install Axios with npm install axios.
Of course, we're going to use Axios to handle our HTTP requests and responses.
Okay, so we've installed Axios. Let's clear the screen for good measure.
And let's import it: import axios from axios, like this.
That imports Axios to handle the asynchronous HTTP request to our back end. That's basically what we're going to use Axios for.
Okay. Let's create the actual component code using an arrow function. So const Tiobe equals empty brackets.
No parameters are passed in to this arrow function. Create the arrow, and then within curly braces we'll create the functionality for our Tiobe component. This defines the functional component for displaying the TIOBE rankings. The first thing we want to do is go const, then languages and setLanguages in square brackets, equals useState. So we're now using our useState hook, and we pass in a default value of an empty array.
Excellent. You can see the red squiggle under useState has gone away, because we're now using the hook. So this line initializes a state variable to store the list of languages, defaulting to an empty array.
Perfect. Then we're going to use the useState hook again, this time for a loading indicator. The request might be scraping for a long time, and we don't want the user presented with a blank screen. We want a spinner, or just some text saying loading.
Something that tells the user that nothing's gone wrong and we're just waiting for the data. So we go const, then loading
and setLoading, equals useState.
Let's default this to true, so that the page displays the loading indicator by default.
So this line initializes a boolean state to track whether the data is still being fetched. Okay, excellent. Now let's use a useEffect hook to manage the side effects of the component. We want this code to run only once, when the component first loads. To make sure of that, we pass empty square brackets as the second argument to useEffect. That's an empty dependency array, which tells React to run the effect only once, when the component loads. Okay, so we've got our useEffect hook in place. Now we want to include a function within it. So go const.
We're going to create an arrow function that's asynchronous.
So fetchTiobeData equals async, like that.
Then open the empty brackets and add the arrow.
Now we can fill out the logic for fetchTiobeData. It's self-explanatory what this arrow function does: it fetches the TIOBE data, either from the database or freshly scraped from the TIOBE index. So const response equals await, and then we use Axios. You'll see the red line under axios has now disappeared, because we're using it here. With axios.get we're sending a GET request that either kicks off a scrape or fetches the data from the database if it has already been captured. The URL starts with http and the address of our locally hosted server-side Node code, our web API.
Then /api and then /tiobe, like this. This is the endpoint we want to call to either get the data from the database or kick off the scraper through Evomi's web scraping services, which will scrape the data, save it to the database, and return the freshly scraped data to our React client component. Okay, so we've got const response there, and next we want to log the result, just for debugging purposes. So console.log, like this.
We pass a label, response data,
then a comma, and response.data, like that.
That outputs the raw API response to the console, just for debugging purposes.
Okay. Excellent.
Next comes this code: const actualArray equals, and we need to check that it is an array, so Array.isArray of response.data.
We're just making sure that the data that comes back to us is an array, as expected.
If response.data is an array, we use the response directly.
Otherwise, we check whether the array is stored within response.data.data, in case the data is structured like that instead. Using the or operator, we also check whether the array is in the response.data.languages property. And if it's in none of these properties, we fall back to an empty array.
With our code, the array will be returned in response.data, so the first branch will fire, and that's what gets assigned to actualArray.
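The fallback chain just described can be pulled out into a small helper, sketched here. toRankingsArray is a hypothetical name, and this version is slightly stricter than the expression in the walkthrough: it checks each candidate with Array.isArray and tolerates a null payload, which is a design choice of this sketch rather than something the course code is confirmed to do.

```javascript
// Accepts the body of the HTTP response (what Axios exposes as response.data)
// and always returns an array, whichever of the known shapes it arrived in.
function toRankingsArray(payload) {
  // Shape 1: the payload is the array itself.
  if (Array.isArray(payload)) return payload;
  // Shape 2: the array is nested under a data property.
  if (payload && Array.isArray(payload.data)) return payload.data;
  // Shape 3: the array is nested under a languages property.
  if (payload && Array.isArray(payload.languages)) return payload.languages;
  // Anything else: fall back to an empty array so rendering code can map safely.
  return [];
}
```

With a helper like this, the state setter could be called as setLanguages(toRankingsArray(response.data)), and the table code never has to guard against a non-array value.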
Okay, perfect.
Then we call setLanguages. This is the setLanguages function we defined with our useState hook. We use it to change the state of languages, because the languages variable will store the response returned from the GET request up here. So we pass in actualArray, like that. That will also re-render our component, which is what we want, because the data has changed and we want to reflect that on the front end. So this line updates the languages state with the processed array and triggers a re-render. Perfect.
We also want a try/catch in our code, so that if an error occurs while retrieving our data, we can account for it.
I'm just going to create the basic code for that here. So we've got a basic try/catch block, and let's cut that code and paste it inside the try, like that. Now let's handle the catch part. All we want to do is log the error, so console.error, like that.
We pass Fetch error with a colon, then a comma, then error. And in the finally block, all we want to do is set loading to false, like that.
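The try/catch/finally pattern can be sketched outside React as follows. loadRankings, httpGet, and the two setter parameters are hypothetical names that stand in for axios.get and the useState setters, and the URL is an assumption based on the localhost port and path mentioned earlier. The point of the sketch is that the finally block runs on both success and failure, so the loading flag is always cleared.

```javascript
// httpGet stands in for axios.get; setLanguages and setLoading stand in for
// the state setters returned by the useState hooks.
async function loadRankings({ httpGet, setLanguages, setLoading }) {
  try {
    // Assumed endpoint, taken from the port and path used in the walkthrough.
    const response = await httpGet("http://localhost:3000/api/tiobe");
    setLanguages(Array.isArray(response.data) ? response.data : []);
  } catch (error) {
    // On failure we only log; the languages state keeps its previous value.
    console.error("Fetch error:", error);
  } finally {
    // Runs after success and after failure, so the spinner never gets stuck.
    setLoading(false);
  }
}
```

Inside the component, the equivalent function is declared in the useEffect callback and then called there, since the effect callback itself is not declared async in the walkthrough.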
We only want the loading indicator to display while the data is being retrieved. Once the retrieval has finished, we want the loading indicator to disappear, and that's why we set loading to false here. We still haven't actually called this function anywhere. We've just declared it. So we call it over here:
fetchTiobeData, like that. Don't worry about these warnings for now, because we haven't finished the entire component yet. For now, just make sure you have no structural problems, since we've written quite a few lines of code. For example, check that the curly brace that opens the useEffect callback here closes there, followed by a comma and the empty square brackets indicating that we want this code to fire just once, when the component loads.
That's what this useEffect hook is used for. So make sure your structure is correct before continuing. We've created our useEffect hook, and I just want to tab this across for neatness. Now we want to handle the loading state of our component, which corresponds to whether the data has been fetched yet. If the data is still loading, we want a loading indicator to display, and that's what this code is about. So if, open brackets, and we check the loading variable.
If it's true, we return a loading indicator.
While the data is being fetched, we want a loading indicator displayed to the user. This is for good user experience, so the user doesn't jump to the conclusion that something's gone wrong. So, className. We're writing Bootstrap code here, and we haven't actually installed Bootstrap, so that's the next thing we're going to do.
If we go to the terminal, within info data display, we want to install Bootstrap. While we're here, we're also going to install React Router DOM, because we'll be handling the routes with that package after we finish this component. To install both, we type npm install bootstrap and then react-router-dom.
Let's press the Enter key, and it should install those packages for us. Excellent. So that's been done.
We've got Bootstrap installed, which means we can continue creating our Tiobe component using Bootstrap for styling and layout. Let's clear the screen. There are just a few more things we need to do to ensure that Bootstrap can be used. The first is to go into main.jsx and add an import statement.
We need to include this line of code for styling and layout purposes. Obviously, one of the key advantages of using Bootstrap is screen responsiveness.
We're going to implement a basic menu, and on a mobile phone there'll be a hamburger button the user can press to drop down the menu. For that to work, we need some JavaScript imported.
That's what this second line of code is about. Great. So we need these two import statements in the main.jsx file for Bootstrap to work correctly. Now we can continue with our loading functionality.
Okay. So, className, and we're going to use a whole bunch of Bootstrap classes here. So d, whoops,
d-flex, justify-content-center, align-items-center, like that. And let's include some inline styling.
I'll explain briefly what we're doing once we've created this code. Within these curly brackets we're including an object with a property of height, and that height is 100% of the viewport height, so 100vh. Excellent.
So what have we done there? We're centering the loading spinner both vertically and horizontally. That's what these Bootstrap classes do, with the help of the inline style that sets the height to 100% of the viewport. Okay, great. Then let's include a child div within this div. This code will actually create the spinner. So className equals spinner-border text-primary. Then let's include style equals, open curly braces, with an object that defines the width and the height of the spinner. The width will be 3rem and the height will also be 3rem.
Excellent. So this line of code renders a Bootstrap spinner circle for visual feedback. Within this div we want to include a span element with the class visually-hidden and the text Loading.
And we should put role equals status up here too,
like that.
Right, that's the return for the loading state. It's the first return, used when the component first loads, so the user isn't presented with just a blank screen. The user sees a spinner, a loading indicator, which tells them that nothing's gone wrong and we're just waiting for data to be returned. That data will of course be displayed once it's returned successfully.
Okay. So when the data has been returned, this is what we want to return from our component. It returns the main JSX structure once the data has finished loading. So, a div, like that.
Let's go className equals container-fluid and mt-4, for margin top 4.
This is just Bootstrap layout code. Let's create a header using h2, with className equals mb-4 text-center. The text for our header is Programming Language Rankings. So it displays a centered heading for the ranking table.
Great.
Let's create another div here,
with className equals table-responsive.
Then we want to include a table element, like this.
Let's include the relevant Bootstrap classes. className equals table, whoops,
table, then table-hover, because we want a hover effect included,
table-striped, because we want striping on our table,
and align-middle, because we want the cell contents vertically aligned in the middle.
So we're applying Bootstrap styles for row highlighting, stripes, and vertical alignment. And with table-responsive on the wrapping div, the table will be horizontally scrollable on smaller devices.
Right. So let's include the table headers. So thead, like that.
Let's include the first heading for the first column: th, like that.
scope equals col, then style equals, open curly braces, with an object for the inline style. We just want the width to be responsive, so we set it to 10%.
Okay, sorry, that shouldn't be an equals sign. I was wondering what was going on with the squiggly lines. That should be a colon. Then let's give this the heading Rank.
Let's just duplicate this.
We want this one to be 15%,
and it should have the heading Icon. This is the heading for the programming language logo.
Let's paste that in again and just modify the bits we want to change. This one's a bit easier.
We just include Programming Language for the column furthest to the right,
which is the header for the language name.
Okay, excellent. We're getting there.
As I say, if you get stuck, since there is a lot of code here, check out the GitHub repository. The link has been made available in the description of this video. Okay, great.
So we've got the header for our table. Now we want to include the body.
We add a tbody tag here. Let's check that languages is not null, and then we call map to loop through the items that have been returned.
The callback receives each item along with an index, which is just a counter for our position as we traverse the languages array.
The index increments by one for each item. Then we use the arrow function syntax to write the code we want to run for each item in the languages array, which was returned from our HTTP call to the server-side TIOBE code. So now let's render each row in our table.
Let's create a row here: tr.
Each row should have a unique key identifier.
This is for the benefit of React's rendering efficiency, and we'll just use the index for it.
And then let's create our first TD.
Okay.
And this column will contain our first value which will be item dot ranking.
So it'll be the rank from 1 to 20. So for example, Python has a ranking of one. C has a ranking of two. So that's what this value will be here. And we want this to be in bold. So let's go className equals, and then let's include the relevant Bootstrap class.
And we just go fw-bold like that. Okay. So it displays the rank number in a bold font.
Great. And next we want the image, which will be the icon logo representing the language. So type td.
Okay. And within this td we want an image tag.
Okay. And the source will be the path value returned from our HTTP call. So we go item dot image path, so that it displays the relevant logo for the language in this column.
And we can actually set the alt to the programming language name. So that'll be item dot P lang like that.
Okay. Excellent.
And we want to include some styling within our image tag. Just going to enter there. And so style, we'll include some inline styling here. And within the curly braces, let's pass an object with width as a property.
Colon. And this will be 40 px 40 pixels.
Let's make the height 40 pixels also.
And we want object fit set to contain so it displays the entire image within the relevant table column like that. Okay.
So it renders the language icon with a constrained size and proper aspect ratio object fit. Excellent.
Okay.
And then the last column is the actual programming language name. So td.
Okay. So td, and within curly braces we want to render out the language name, item dot P lang like that. So it displays the text name for the programming language here.
Great. And we close off the tbody there.
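To make the null check and the map call a little more concrete, here is a minimal plain JavaScript sketch of the same idea, without any JSX. The sample data and the field spellings (ranking, imagePath, pLang) are assumptions based on what is said above, and toRows is a hypothetical helper, not the course's code.

```javascript
// Hypothetical sample shaped like the items described above.
// The field spellings (ranking, imagePath, pLang) are assumptions.
const languages = [
  { ranking: 1, imagePath: "/img/python.png", pLang: "Python" },
  { ranking: 2, imagePath: "/img/c.png", pLang: "C" },
];

// Build one plain row description per item.
// Returns an empty array while the data is still null or undefined.
function toRows(items) {
  if (!items) return [];
  return items.map((item, index) => ({
    key: index,              // position in the array, used as the row key
    rank: item.ranking,      // first column
    iconSrc: item.imagePath, // second column: image source
    iconAlt: item.pLang,     // alt text for the image
    name: item.pLang,        // third column
  }));
}

console.log(toRows(languages));
```

In the component, each of these row descriptions corresponds to one tr element with three td cells. Using the index as the key is fine here because the list is not reordered on the client.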
And then we want to make sure that we are exporting our component so that it can be included within the app.jsx file, which we'll do in just a bit. So let's export that.
export default Tiobe like that, and we include a semicolon there. And look at that. We've created our Tiobe component and we're ready to move on to the next component, which will be app.jsx, where we'll create the routing. And at the moment we only have one component.
So we're just going to create the routing for the Tiobe component. So let's go to app.jsx here and let's remove all of this code.
So I'm just going to right-click, select all, and delete all this code, and we're going to start from scratch.
We're going to create the app.jsx code now. So let's firstly import React from react.
So it imports the core react library to create components and manage UI state.
And let's import BrowserRouter as Router.
Remember we installed React Router. We're importing all these components, Routes, Route (singular), and Link, and we're importing them from react-router-dom like that.
Okay, excellent. So that imports routing tools to enable multi-page navigation without reloading the browser. We're only going to include the Tiobe component for this routing functionality for now, but we will be including a route for an Amazon component that we'll create a little bit later. Okay, so we need to import Tiobe. Remember we just exported Tiobe from our component code for our TIOBE index functionality, which just displays the TIOBE index on the front end. So import Tiobe from, and then the path, slash Tiobe like that. So that's the Tiobe component we exported.
Okay, so we're importing that. So it imports the TIOBE ranking component to be displayed on its specific route.
That's what that code does. And now we're going to create a function for the app component. So I'm just going to type function app.
The code that we're going to write now will contain our routing information, a menu linking to each of the routes, and just a very basic homepage. Okay, so we want to return this here, open round braces, and we're going to create the markup that will be returned, but we need to wrap it within this Router element.
So Router like that.
Okay, let's include a div element there.
Include some Bootstrap classes: className equals min-vh-100. So these are just layout and styling related Bootstrap classes that we need to include to make our page look aesthetically pleasing to the user.
flex-column like that. So it creates a full height flexible container that stacks its children vertically, and that's what these Bootstrap classes here do. Okay. And then we want to create a navigation bar. So we're going to use the nav element for that.
So nav oops nav nav like that.
Let's include the relevant class names.
Okay. So className, camel case, equals navbar, oops, navbar, navbar-expand-lg, navbar-dark, bg-dark, and these are just Bootstrap classes that are used to create the aesthetic we want for our web page. So in this particular case these Bootstrap classes define a dark themed Bootstrap navigation bar with a subtle shadow. That's what these Bootstrap classes do. Okay, let's create another div element there. And we just want this to have a container class. So there's a Bootstrap class called container here for this element, and it centers the navigation content and provides responsive padding. That's what this container Bootstrap class does. Okay. And then let's create a Link element. And you can see that that's been imported from React Router DOM there.
And this will link, this will basically just be a link to the homepage. Okay.
className equals, let's just include navbar-brand. So we want the brand here, whatever that is. And we want it in bold. fw-bold. Oops. Bold like that.
And then we want the to attribute of the Link element to point to the homepage, which is just a forward slash like that.
So when you click info data here, which is the name of our application, it will navigate the user to the homepage. That's basically all that is.
So it creates a clickable logo that returns the user to the homepage.
And for responsiveness we're going to include a button toggle here,
which will define a hamburger menu button for mobile devices to toggle the navbar links. So, I like to make all of my UIs responsive and you should also think about responsiveness when creating your UIs. Very important in modern application development. So, right. So, let's go button there.
Okay.
And let's include the attributes. className equals navbar-toggler, just a Bootstrap class.
And the type of this button is button.
Okay.
And data-bs-toggle equals collapse.
Okay.
aria-controls,
which is just for accessibility, equals navbarNav like that.
aria-expanded equals false.
aria-label equals Toggle navigation.
So this is just for accessibility.
Okay.
And then within the button we create a span tag.
Whoops. Sorry. That's outside of the button element. I don't know why I put it there.
So, within the button element, we want to include a span tag, className equals, and let's include navbar-toggler-icon.
So it displays the standard three-line icon for the mobile navigation toggle. That's basically all that does there. And using Bootstrap makes it pretty easy to achieve this responsiveness on the UI. So on small devices, this hamburger icon will display and you can toggle whatever menu you define open and closed using the navbar toggler provided to us through Bootstrap. Okay.
So the next thing we want to do, after the button, is include another div, and this will be the actual menu that we want to be able to open and close. Okay.
So, we'll create an unordered list to contain our menu items. But let's first create this parent div tag. So className equals collapse navbar-collapse like that, and then include id equals navbarNav. So this wraps the navigation links so they can be hidden or shown on smaller screens.
Okay.
And let's include the unordered list. So that's defined by ul here. And we want to include a Bootstrap class, className equals navbar-nav ms-auto. So, it creates a list of navigation items aligned to the right side of the bar. And that's what this code is doing here for the unordered list.
Great. And then let's include an LI element here.
Okay. Here.
And we include another link tag here.
Link className equals nav-link like that. So the li defines an individual list item within the navigation menu. Let's close off our Link element there.
And this one is going to be the TIOBE index.
Okay, TIOBE Index right here. So that's a link to our Tiobe component, which will display the TIOBE index. And we need to set the to attribute to the path of our Tiobe component, which will just be slash Tiobe like that, with the T capitalized there. Okay. So below nav, we now want to include the main tag, main like that.
Okay.
So within main we create a div element, className equals container and text-center.
So this div element centers the nested route content horizontally within a responsive container. Okay. So we also need to include a whole bunch of Bootstrap classes on the main element. So we want flex-grow-1. We want d-flex. We want align-items-center.
We want justify-content-center like that. And we want bg-light like that.
Okay. And that creates a centered, light colored main area that fills the remaining vertical space. That's what these classes on main do. And then these Bootstrap classes on the div
center the nested route content horizontally within a responsive container. And now let's create our routes. So let's use the Routes tag there. Routes. We're using React Router DOM here.
Like that. Okay.
I haven't mastered the art of talking while typing yet, but I'm getting better. Okay. So, and then we want a Route element.
Hey, it worked that time. Cool. So, we've got a Route element. Let's set its path to the homepage, path equals forward slash.
Okay. And we need to set the element now within the Route. In fact, this is a self-closing tag, so we don't want this closing Route tag there. It closes like that.
Self-closing. And then we go element equals, open curly braces.
Okay. And within here we want an H1 element.
Okay. The H1, this will be the homepage heading, info data.
Okay, like that. And we want to include a whole bunch of Bootstrap classes. And I'm actually sick of writing these out, so I'm going to just copy them in there.
Okay, like that. So, we've got display-1, fw-black, text-uppercase, tracking-tighter. And that renders a large stylized brand heading for the landing page. And that's what those classes are for. Okay, excellent. I think we're good there.
Let's just neaten the code up a little bit. Okay, great. So, that's for the homepage. And now we want to create a route for the TIOBE index. Okay. And this is self-closing.
Get rid of that. Okay. Route. And then path equals, and this will be the path to our Tiobe component. So it's forward slash Tiobe like that. Element equals, open curly braces.
And then include the self-closing Tiobe tag like that.
Excellent. You can see we've imported the Tiobe component up here, and we're mapping this path to this component here, which we created there. Okay. And then we want one for Amazon too.
But the problem is we don't yet have an Amazon component. So, oops, I'm actually just going to make a duplicate of that there,
like that for now. So just two routes there. And we'll change this to the Amazon route a little bit later when we get around to the Amazon code.
So Routes acts as a container that switches between different components based on the current URL path.
So this here renders a large stylized brand heading for the landing page.
And then this code here maps the specific Tiobe URL path to the Tiobe component there. And we don't actually need this duplicate, so I'm just going to remove that. But that will be for the Amazon route once we've created the code for that. Okay. And then we need to export the App component. I'm just going to include a semicolon here.
And we can type export default app like that. Okay.
And we're good.
I think that looks good. And we could actually run that now and test our code.
Should we try that? I think we should.
Okay, let's just open another terminal window here. So, I'm just going to navigate into server and then cd into the info data capture directory like that. And we're going to run the server-side code: node server.js like that.
Okay, great. It's connected to MongoDB and we're good to go. Let's run the client now. We created our client using Vite, so to run our client, our React app, we type npm run dev like that. And we can see what happens here now. And now what happens when we go to localhost? Okay, there we go. Okay, so this doesn't look all that great obviously, but we'll sort that out a little bit later. So, this is essentially our front end. And let's see what happens when we click TIOBE Index. Great. There we go. Okay, so it's actually retrieved those values from the database.
And that is not looking great. I think we're missing a few items, just some Bootstrap classes that should be there. But that's basically working.
What that's done is it's called our server-side endpoint to retrieve the TIOBE index.
So when we click TIOBE Index, it's retrieving that data from our MongoDB database at the moment,
or rather from our collection, our language rankings collection in our infodata MongoDB database. So we've got the basic functionality in place. So one thing I'd like to quickly test is kicking off the scraping process, and the way we can do that is by deleting the database. So let's just delete that database through Compass.
So I'm going to drop the database, and we're going to try and run that again. So this time when we click TIOBE Index, it should actually trigger our scraping functionality and save the TIOBE index data to the database. So this should take a little bit longer. So we should see a loading indicator this time. Loading, right?
Okay.
Okay, so something went wrong there, I think.
Okay, so we've got a few bugs to contend with. So what was happening there is our request to scrape the TIOBE index was actually timing out. It times out after 30 seconds, but the processing will still continue in the background. And I'll show you how you can actually poll Evomi to see if the results are ready. Even if the first scraping attempt fails and you time out after 30 seconds, I believe it keeps trying in the background for about 10 minutes. If scraping is not successful after 10 minutes, I believe an error is then thrown. So I'll show you how you can poll a particular Evomi endpoint so that, perhaps in subsequent attempts, success will be achieved and the relevant TIOBE index data can then be sent back to our server side and then back to our client. But firstly let's try scraping the TIOBE index again, because this is actually a day on from the last time I tried to scrape TIOBE.
So we might have been taken off the naughty step. I think we were put on the naughty step because we were trying to scrape the TIOBE index in quick succession. So their security mechanisms detected that we're a bot, basically, and have probably blocked the relevant IP. Okay. But let's try again. Now that we've given it a bit of time between scrapes, let's see if we have success scraping the TIOBE index. So I'm going to try that all again. The first thing we want to do to force the scraping to occur is delete the info data database. So let's do that. Let's drop the database. Okay. And let's clear the terminal. Let's firstly cd into the server's info data capture directory and let's run it.
Okay, injected env from, that's good.
Connected to MongoDB. Excellent. Server running on port 3000. Let's create a new terminal window here and let's cd into the client root directory. Client, info data display, right? And we're in the root directory of all the client code. And let's run our client. So, npm run, oops, npm run dev. Okay. And off it goes.
Brilliant. So, it's listening on localhost 5173. I'm just going to launch that.
Well, I'm just going to run it in Chrome.
Let's run that and see if we can get it to scrape the TIOBE index. So, to do that, we just simply press this menu option here, TIOBE Index, and we need to sort out this loading indicator. And it's let us through straight away. So, we've been able to scrape the TIOBE index without much issue at all. But I'm going to just show you various ways you can get around the relevant security mechanisms if you do get blocked. So, one thing you can do is include additional headers within the headers object here.
So, within the TIOBE service here, I'll just paste that in here.
So you've got your additional headers here, like Referer. So this basically imitates a typical internet user using their browser to look at websites. So here, for example, you could have been on Google, typed in TIOBE, and it's taken you to the TIOBE index. Therefore the Referer reflects that and says https www.google.com here. So you're mimicking somebody just surfing the web, essentially. And you can see here User-Agent. So this is browser-related metadata here.
So you could include those additional headers when you are posting to Evomi. So you've got the payload there and you've got your headers here, and you could include these additional headers within this headers object, or you can actually include another object called additional headers like that, and you would put the additional headers in there. So it'd be something like this. So you'd take that there and you'd create your additional headers like this.
And then you could include those additional headers within your request to Evomi. And when it tries to scrape the relevant target website, the security mechanisms will see all these additional header settings, and hopefully that will fool them into thinking that you're not an automated process, that you're not a bot. The easiest solution, of course, is just leaving some time between scrapes if you're using the Scraper API. But I'll show you in the next section, when we scrape Amazon, that I'm going to use the core residential plan. It's advisable to use that plan when scraping at scale so that the various security mechanisms can't identify you as a bot, an automated process trying to scrape the relevant web page, and block you. But we've actually had success there. So let's have a look at MongoDB and let's refresh, reload data.
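As a rough illustration of the additional headers idea, the sketch below attaches browser-like headers to a request payload. The header values are only examples, withAdditionalHeaders is a hypothetical helper, and the field name additional_headers is an assumption: check the Evomi documentation for the exact key the Scraper API expects.

```javascript
// Browser-like headers of the kind described above. Values are illustrative.
const browserLikeHeaders = {
  Referer: "https://www.google.com/",
  "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  "Accept-Language": "en-US,en;q=0.9",
};

// Hypothetical helper: returns a new payload with the extra headers attached
// under an assumed field name, leaving the original payload untouched.
function withAdditionalHeaders(payload, extraHeaders, fieldName = "additional_headers") {
  return {
    ...payload,
    [fieldName]: { ...(payload[fieldName] || {}), ...extraHeaders },
  };
}

const payload = withAdditionalHeaders(
  { url: "https://www.tiobe.com/tiobe-index/" },
  browserLikeHeaders
);

console.log(payload);
```

Keeping the headers in one object makes it easy to reuse them across requests or to swap in a different User-Agent later.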
Okay. And now if we connect here, we've got info data, and it has scraped the TIOBE index and captured that data within our database. So if we go back to our Chrome browser here, go to the homepage first, and then go to TIOBE Index, it's going to be able to retrieve that data super fast, you'll see, because the data is now cached within our MongoDB database. So let's press the TIOBE Index menu option. And that is super fast. Okay. So I just want to check one more thing here.
Okay, great.
So it's capturing the entire URL here for the image path. There's one thing you might want to check. I noticed that in certain situations it was just capturing the relative path here in the database. So you might want to check that this leading part of the URL exists and, if it does not, add it in, because you need the absolute path.
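One way to handle that check is to let the built-in URL class do the work, since it leaves absolute URLs alone and resolves relative ones against a base. This is a small sketch rather than the course's code: toAbsoluteUrl is a hypothetical helper and the site origin used as the base is an assumption.

```javascript
// Assumed origin of the scraped site, used as the base for relative paths.
const SITE_ORIGIN = "https://www.tiobe.com";

// Hypothetical helper: returns an absolute URL for a scraped image path.
// new URL(path, base) keeps absolute URLs unchanged and resolves relative ones.
function toAbsoluteUrl(imagePath, origin = SITE_ORIGIN) {
  return new URL(imagePath, origin).href;
}

console.log(toAbsoluteUrl("/wp-content/logo.png"));
console.log(toAbsoluteUrl("https://example.com/a.png"));
```

Running each scraped image path through a helper like this before saving means the database always holds absolute URLs.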
Okay, but that looks good. Great. So we've successfully scraped the TIOBE index.
There are just a few issues that I don't like. I don't like the layout here. And when we clicked TIOBE Index, we just saw the loading dot dot dot text displayed, and we actually coded a spinner, which is going to look much better. It's much better for the user experience. So we'll just fix a few cosmetic issues now and then we can move on to scraping Amazon.
Okay. So let's go to Tiobe.jsx here.
And the first thing I noticed is I've introduced some silly mistakes here, because className should be in camel case. So everywhere that we see classname in all lower case, we need to change it to this here, className with a capital N.
I'm just going to do it manually because there aren't very many instances. So wherever you see classname with the N not capitalized, just change it to camel case like this.
And that should fix the majority of our cosmetic issues.
Okay, great.
Okay, excellent.
And I think we've covered all the class name instances, references if you like.
Yeah, that looks good. And then we've got a few issues to resolve here within app.jsx. And we'll just go through those quickly. Firstly, let's just do the easy win here. So area-expanded, that should be aria, like that. Save that. We have a small issue here where we need to include data-bs-target to resolve this issue.
Equals, and then this should be hash navbarNav like that, because we are targeting this div element here in order to expand and collapse the relevant menu. It's got the id navbarNav, and we need to include this line of code here in order for that to work.
Hash obviously means we're targeting an id, and there's the id. Okay, excellent.
So that's sorted there. And I'm going to delete the info data database so that it forces a scrape.
And hopefully we can see that our indicators changed from just displaying that loading dot dot dot text to an actual spinner since we've made the relevant changes.
Okay. So let's run the server. So node, whoops, node server.js.
Enter. Great. That's looking good. Let's go to the client here and go npm run dev.
We're now running the client, and let's load up the app in our browser.
Okay.
Okay. So now let's see if we can scrape the TIOBE index. So let's press the relevant menu option here. And there we go. We've got our spinner working now. And that has worked perfectly and the layout looks much better.
So we've resolved most of the issues there and we're now successfully scraping the TIOBE index, and our cosmetic issues have now been resolved.
So that looks pretty good.
Great.
So, if your scraping does time out, I think the best solution is to give it some time, and then you'll be taken off the naughty step and you'll be able to continue scraping with success. If you scrape the TIOBE index in quick succession, you'll no doubt be blocked eventually, and you'll certainly be blocked on Amazon. Amazon's a much tougher website to scrape. So you will probably need to use Evomi's core residential plan if you're going to scrape Amazon at scale. The TIOBE index is slightly easier to scrape. I've had a lot more success and I've been able to just use the Scraper API plan for that.
Okay. Excellent.
Um, but there's a few things you can do.
As I mentioned earlier, you can include additional headers like this within your POST request to the relevant Evomi endpoint. You can include those additional headers to try and mimic somebody surfing the net and hide the fact that this is actually an automated bot trying to scrape one of their web pages. So you might be able to get past their security mechanisms by using these additional headers, which are typically included when a user is just surfing the net using their browser. You could also include a method like this within your code, because Evomi may still be scraping in the background even though your first attempt times out. Like here, here's the first attempt. You're checking the content, okay? If it has been successful on your first attempt, it'll just return the results here.
But if the content is empty, what you can do is grab a task ID, because the scraper will still be running in the background. If it doesn't return the content, it will return this task ID. So you can set the task ID const to this value here. And then you can make a request to this endpoint, and you can poll it.
And here I'm polling for 2 minutes, 120,000 milliseconds, in this code here, and you may have success with subsequent requests to the relevant Evomi endpoint. So just bear in mind that if the first attempt fails, you can actually poll this Evomi endpoint. You'll have a task ID sent back to you if there's no content delivered after the first request, and you can use this endpoint and this task ID to make several requests. And you can see I'm making requests every two seconds here.
And then after 2 minutes, if it has failed, you throw an exception. Or else, if one of the requests succeeded, you return the content to the calling code like that.
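The polling loop described here can be sketched roughly as follows. This is not the code shown in the video: pollForContent is a hypothetical helper, and checkTask is a caller-supplied function standing in for the request to the Evomi task endpoint, whose URL and response shape are not given here. The two-second interval and two-minute timeout match the values mentioned above.

```javascript
// Small promise-based delay.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// checkTask is a caller-supplied async function (hypothetical) that asks the
// scraping service about taskId and resolves to the content, or to null or
// an empty string while the result is not ready yet.
async function pollForContent(
  checkTask,
  taskId,
  { intervalMs = 2000, timeoutMs = 120000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const content = await checkTask(taskId);
    if (content) return content; // one of the requests succeeded
    await sleep(intervalMs);     // wait before asking again
  }
  // Nothing came back before the deadline, so report the failure.
  throw new Error(`Task ${taskId} did not finish within ${timeoutMs} ms`);
}
```

Passing the status check in as a function keeps the loop independent of any particular HTTP client, and makes it easy to try out with a fake checkTask before pointing it at a real endpoint.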
So this is one other option you can use.
If you are unable to successfully scrape, for example, the TIOBE index and it's not mission critical at that time, the best way to do it is just to down tools, come back a bit later and try again, which is essentially what we've done. And then I was able to successfully scrape the TIOBE index again.
I was, as it were, taken off the naughty step and they let me in to scrape the web page. But if you're scraping it in quick succession, you may indeed get blocked. Okay. So, hope that all makes sense. So, I'm just going to get rid of this code here. We're not going to poll, because we are having success with scraping the TIOBE index.
And it's not mission critical. We can afford just to wait, scrape the TIOBE index a little bit later, and
cache that data within our database, because with subsequent calls to get the TIOBE index, it's going to get that data from our database and it won't scrape the TIOBE index every time. But if you had to scrape a web page with every request, I would recommend using the core residential plan for that, because every request will be made through a different proxy IP, which helps fool their security mechanisms so you're less likely to get blocked. You'll have more success that way. So, if you're scraping at scale, for example a website like Amazon, I highly recommend that you use Evomi's core residential plan. Okay, but we've had success here. It's looking pretty good.
So now if we reload data, you can see info data is there and we've got language rankings, our collection, which is populated with the relevant TIOBE index data. So in subsequent requests it just grabs the relevant data from our collection here. But if you want freshly scraped data from the TIOBE index, our code is written in such a way that you can just delete the info data database, and it will create that info data database and the relevant collection and then scrape the TIOBE index. And when the relevant data is returned, our code saves that data to the language rankings collection. And then subsequent requests for that data will not result in web scraping occurring. It will just grab that data from the cache, which is our language rankings collection within the info data MongoDB database.
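The cache-first behaviour described above can be summarised in a few lines. This is a simplified sketch, not the course's controller: getRankings is a hypothetical function, and store and scrape are stand-ins for the MongoDB collection and the scraping call, passed in so the logic can be read and tried on its own.

```javascript
// store: hypothetical object with findAll() and saveAll(items), standing in
//        for the language rankings collection.
// scrape: hypothetical async function standing in for the scraping request.
async function getRankings(store, scrape) {
  const cached = await store.findAll();
  if (cached.length > 0) {
    // Data already in the collection: no scraping needed.
    return { source: "cache", data: cached };
  }
  // Empty collection (for example after dropping the database): scrape and save.
  const fresh = await scrape();
  await store.saveAll(fresh);
  return { source: "scrape", data: fresh };
}

// Tiny in-memory store for trying the flow out without a database.
function createMemoryStore() {
  let items = [];
  return {
    findAll: async () => items,
    saveAll: async (newItems) => { items = [...newItems]; },
  };
}
```

Calling getRankings twice with the same store scrapes on the first call and reads from the cache on the second, which mirrors what happens after the database is dropped and the menu option is clicked again.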
Great. Right now, we're going to scrape Amazon. Now, this is a real challenge, but we've got all the tools we need to get this job done. So, let's get started. So, this is just going to be creating the code. We're going to go through the code line by line and build up our scraping functionality. So, firstly, we want to be in the server code here. We want to go to server.js, the entry point for our Node application, and we want to create a new route here. So let's just duplicate that and then change the bits that we need to change. So we're going to create the relative path to the Amazon endpoint here.
So let's do that. Let's put the endpoint at API books, because we're scraping book data in this part of the course. Amazon book data. So, Amazon products which happen to be books, textbooks, whatever books the user searches for. And so, API, books, and then Amazon like that. And that is going to be the path to our endpoint. We're going to start with the server-side code and then we'll progress to the client-side code. So the next part of the code we want to build is the model for our book data. So we're going to call this model, no prizes for guessing, book.
So let's right-click models here and go new file and book.js.
Okay. Let's press the enter key. So we've got book.js here. So our first line of code is to do with Mongoose. const mongoose equals require, so we're importing mongoose here, semicolon. So it imports the Mongoose library to create and manage the database schema. We're going to be using Mongoose for that.
Excellent. Now let's create the actual model. So we're going to write const book schema equals to new mongoose dot schema.
Okay. Open round brackets and then open curly braces. So we're passing an object through to the schema method.
So let's build up our object, which is the definition for our model. So let's include search term here, colon, and then let's define what we want this field to be. So type is String. Index is true, because we're going to be searching the books collection for book data based on the search criteria that the user inputs, and an index will speed up that search. Lowercase is true. So what we're doing here by forcing it into lower case is normalizing the search criteria. And then, whoops,
let's trim off any white space before it goes into the database.
Great. So, it stores the search criteria, indexed for speed and normalized for consistent lookups.
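To see why the lowercase and trim options matter, it helps to look at what they do to a value. The function below is a plain JavaScript stand-in for that normalization, not Mongoose itself, and normalizeSearchTerm is a hypothetical name.

```javascript
// Plain JavaScript stand-in for the schema's lowercase and trim options:
// different spellings of the same search end up as the same stored value.
function normalizeSearchTerm(term) {
  return String(term).trim().toLowerCase();
}

console.log(normalizeSearchTerm("  Clean Code "));
console.log(normalizeSearchTerm("CLEAN CODE"));
```

Because both inputs normalize to the same string, a later lookup by search term finds the cached books regardless of how the user typed the query, provided the lookup value is normalized in the same way.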
Search term. So this is the search term that the user will input on the front end, and then on the back end our code will conduct a search against the books collection within our MongoDB database. Okay, let's move on to the next field, which will be title. So this is the title of the book.
So the user's entered the search term. The search term gets propagated through to the server. The server does a search and either finds the relevant content within the collection or scrapes the data, depending on whether it already exists in the MongoDB collection. And this is the definition for the title field. So it defines a String field to store the title of the book.
That's all that is. Okay. And then we want authors, and there could be more than one author. So this is an array of strings. So we open the square brackets and put String inside like that, to define an array of strings to store one or multiple author names. Okay. Whoops.
Okay. So then we go url, colon, String with a capital S, and it stores the direct link to the book's product page on Amazon.
Okay. Comma. And then let's go price here. Now this is our first number, so we define it as Number, which is a JavaScript data type. It stores the numerical price of the book for easy sorting and filtering. Great.
And then rating. This is what Amazon users are rating this book as, so it stores the average customer rating, for example 4.5 out of 5, as a numerical value. So Number again. And then reviews, which is also a Number: it stores the total number of customer reviews as an integer. Of course, you can include whichever data is available on the relevant Amazon web page that we are going to scrape; these are just the fields that I've chosen for this particular application.
And then image. This is the path to the image of the book on Amazon. Much like we scraped the logo representing each language when we scraped the TIOBE index, this is the path where we can find the image representing the relevant book, and it will be a string. So we include String with the capitalized S, and it stores the URL string pointing to the book's cover image.
And then let's go createdAt, which must be in camel case. Let's open the curly braces. The type is Date, and we want a default.
Okay. So createdAt automatically records the date and time when the book entry was first saved. That's why we're defaulting to the current date and time here. We've only got one more thing to do with this file, and that's to export the module so that it can be used within the relevant controller.
So module.exports equals mongoose.model, open round brackets, and this model is going to be called Book, like that. And let's include bookSchema.
We're passing in bookSchema, the definition of each document that will reside within our books collection. We defined that in the bookSchema const here, and we're passing it to the model method provided to us through Mongoose.
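Putting the fields together, the definition object passed to the Schema method might look like the sketch below. The field names follow the walkthrough, but their exact spelling and the option values (index, lowercase and trim set to true, Date.now as the default) are assumptions based on what was described. The Mongoose calls appear only in comments so that the snippet runs without a database.

```javascript
// Field definitions as described in the walkthrough (exact names are assumed).
const bookFields = {
  // Indexed for fast lookups; lowercased and trimmed so searches are consistent.
  searchTerm: { type: String, index: true, lowercase: true, trim: true },
  title: String,
  authors: [String], // one or more author names
  url: String, // link to the product page
  price: Number,
  rating: Number, // average rating, e.g. 4.5
  reviews: Number, // total number of reviews
  image: String, // cover image URL
  createdAt: { type: Date, default: Date.now },
};

// In the model file this object would be handed to Mongoose, roughly:
//   const mongoose = require("mongoose");
//   const bookSchema = new mongoose.Schema(bookFields);
//   module.exports = mongoose.model("Book", bookSchema);

console.log(Object.keys(bookFields).join(", "));
```

Passing Date.now (the function, not its result) as the default means the timestamp is taken when each document is created rather than once when the file loads.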
So, this line compiles the schema into a model named Book and exports it for use in your controllers. We're now ready to move on to creating the service for our Amazon functionality. We've already got the TIOBE service here. You can see that within the TIOBE service we're parsing the scraped data, and here's the scraping code. The service is to do with actually kicking off the scraper and parsing the relevant scraped data. We're going to do the same thing for Amazon now. So in services, let's create a new file and call it amazonService.js.
Okay. Let's create the code for the Amazon service.
Firstly, let's import Axios: const axios equals require, like this, axios.
This imports Axios to handle the complex HTTP requests required for scraping.
Then we want to import Cheerio. We've already installed both Axios and Cheerio, so we don't have to do that now. We're importing Cheerio so we can parse the scraped data: it lets us navigate Amazon's dense HTML structure and extract the relevant data into a well-constructed array.
So let's do the parsing code first. const parseAmazonHTML equals, and it takes html and searchTerm. The raw HTML scraped from the relevant Amazon web page will be passed into the html parameter, and then we'll use Cheerio to extract the relevant data and put it into a structured array. searchTerm is what the user searches for, and it's used as the search term within Amazon. So the search term could be Python programming, and Amazon will produce a page of results, maybe 16 products, which are books pertaining to Python. We scrape that page, it brings back the raw HTML, and now we are going to extract the relevant data from it. This is where we use Cheerio. So we go const dollar, and you'll see a jQuery-type syntax. If you're familiar with jQuery, you'll recognize it. We call load and pass in html.
That loads the HTML content into Cheerio for selector-based querying. So our method defines a function that turns raw Amazon search results into structured book objects, and we've now got all the HTML loaded within Cheerio.
Now we want to return the results in a structured way. So we go return, like this, then the dollar symbol, and then open round brackets.
We want to find the array of results, so we query the raw HTML with a known CSS selector. It's an attribute selector: data-component-type equals s-search-result.
Let's just make sure that's correct: data hyphen component hyphen type, equals, s-search-result.
Then we want to convert this to an array, so toArray, like that, and then we iterate through each element, which is why we pass el here. From each element we can extract the values of the fields that we want to include within the returned result. So we go const $el equals dollar, el. That wraps the current DOM element in a Cheerio object for internal searching.
And then we go const titleRecipe. Don't worry about the name recipe; it's called that because the selector we're searching on has title-recipe in it for some reason.
This locates the specific container holding the book's title and author information.
Okay, great. And then we want to extract the price. So that's const priceText equals $el.find, open round brackets, and here is the CSS selector we need: the a-price class followed by a-offscreen. Then first, then text, like this.
Okay, brilliant.
Now we want to include another return: we're going to return the object containing the relevant fields. First we want searchTerm, like that.
So this returns a clean object representing a single book, and searchTerm attaches the original search criteria to the object for database filtering. Now we want title, colon, and we use titleRecipe. titleRecipe is not just one value; it's a container, so we need to extract values from the elements inside it. So we go titleRecipe.find, and we want what's inside the h2 within the titleRecipe element, then text with empty round brackets, and then trim.
Then we include the or operator. So if we are unable to find the title, we return Unknown Title, like that.
Okay. And then we want authors.
We also need to extract the authors from the titleRecipe object, so titleRecipe.find.
And this is the CSS selector we need to find those authors: the classes a-size-base and a-color-secondary, then a space and a, like that.
Let's do the next part on a new line just for neatness.
We want to convert that to an array, since there's potentially more than one author, and then we iterate through the results. So we go map, then an arrow function with a as the parameter, and inside it dollar, a, then text, then trim to remove the white space.
And then we filter the data using the filter method, like this, by Boolean, which drops any empty strings.
So it converts the author elements into an array of clean strings.
That's what this code does here.
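The map, trim and filter-by-Boolean chain can be tried on a plain array. This is only a sketch: the sample strings are made up, and they stand in for the text that Cheerio would return for each matched a element.

```javascript
// Stand-in for the text of each matched <a> element (hypothetical sample values).
const rawAuthorTexts = ["  Jane Doe ", "", "John Smith\n", "   "];

// Trim each entry, then drop empty strings: filter(Boolean) removes falsy values.
const authors = rawAuthorTexts.map((text) => text.trim()).filter(Boolean);

console.log(authors);
```

Only the two non-empty names survive, because an empty string is falsy in JavaScript.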
Great. I'm glad we can move on from that code; there's quite a lot of detail there. Comma. Okay, and let's go url.
We're still using the titleRecipe object, so titleRecipe.find.
We want to find an a element, and then we drill down into its attribute, the href attribute, so href gets passed into attr there. It extracts the relative URL path to the book's Amazon product page.
Okay. And then we want price. If priceText has a value, then parseFloat. We want it to be a float because, if you remember, price was defined as a Number in the schema. So we pass priceText.replace to parseFloat.
This cleans up the data a bit, and I'm just going to copy in the relevant code here.
That is a regular expression. It matches the characters we don't want, and we replace them with an empty string.
And then the else branch is null. So if priceText doesn't exist, we'll just return null here.
So it uses a regex to remove non-numeric characters and converts the price to a float. That's what this line of code does. Okay, brilliant.
We're just going through all of these fields step by step: extracting them from the raw HTML, cleaning up the data where necessary, and converting them into the data types we defined in the schema that we passed to Mongoose.
Okay. So rating, and ratingText.
If there's a ratingText, then we need to parseFloat it, since rating was also defined as a Number in our schema.
So parseFloat, ratingText.split by space, and grab the first element of the array.
And if there's no ratingText, we just return null, like that.
So it splits the rating string to isolate and convert the first number.
Don't worry too much about the detail of all of this. All we're doing is taking the raw HTML, extracting the values that we want from the web page, and storing them in a structured array. Each object represents a book from the results that came back from Amazon for the search criteria that the user input. Okay, reviews. So reviews is parseInt, $el.find, a, and for this particular CSS selector we're using aria-label containing ratings.
Then we want to drill down further into the attributes within that element, and we want aria hyphen label.
Like that. Then we want to replace the characters matched by this regular expression. I'm just going to paste all this in; we don't need to go into all the detail of regular expressions.
Or, if that doesn't produce a number, we can just say that it's zero. Okay, and I'll just include a closing curly bracket there.
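The three numeric conversions described above can be written as small helper functions and checked in isolation. This is a sketch rather than the course's exact code: the regular expressions are assumptions (the course pastes its own without reading them out), and the helper names and sample strings are made up.

```javascript
// Price text such as "$39.99" becomes 39.99; null when no price text was found.
// The regular expression is an assumption, not the one used in the course.
const parsePrice = (priceText) =>
  priceText ? parseFloat(priceText.replace(/[^0-9.]/g, "")) : null;

// Rating text such as "4.5 out of 5 stars" becomes 4.5; null when missing.
const parseRating = (ratingText) =>
  ratingText ? parseFloat(ratingText.split(" ")[0]) : null;

// Label text such as "1,234 ratings" becomes 1234; falls back to 0.
const parseReviews = (label) =>
  parseInt((label || "").replace(/[^0-9]/g, ""), 10) || 0;

console.log(parsePrice("$39.99"), parseRating("4.5 out of 5 stars"), parseReviews("1,234 ratings"));
```

Returning null for a missing price or rating, rather than zero, keeps "not found" distinguishable from a genuine value of zero when the data is later sorted or filtered.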
Great. And then the last field is image. So we go image, $el.find, open round brackets, img.s-image, like that. Then we want to extract the attribute, the src attribute. So it retrieves the source URL for the book's cover thumbnail. That's what this code does here.
Okay, so a closing bracket there and a semicolon there.
There's probably some kind of issue here.
And there's the problem. I was wondering what all these squiggly lines were: I forgot to open the round brackets here. And all our problems go away just like that, after a single missing bracket caused us all kinds of headaches. And that's basically it. That's the parsing code. Before we continue, I found a little mistake here. This should not be a hyphen separating these two segments of our CSS selector; it should be a dot, like that. You can see these a- class names are separated by dots here.
So that was a hyphen, and we've replaced it with a dot. A little mistake, but these little mistakes can have serious consequences for our data, so it's best to fix them. Okay. So now we want to fetch the data, and we've got our parsing functionality already. We're going to fetch the HTML data by kicking off a scraping process on Evomi.
The difference between this scraping process and the TIOBE index one is that we used the Scraper API plan for scraping the TIOBE index, and we're going to use the core residential plan to scrape Amazon. That's because Amazon is a bit tougher to scrape: it has better bot detection mechanisms in place. To bypass those mechanisms we use the core residential plan, and what's different about that plan is that every scraping request that goes to the target page on Amazon is made through a different proxy each time.
So this helps us bypass Amazon's security mechanisms. Let's create our method: const fetchAmazonBooks, like this, equals async. We're creating an asynchronous arrow function here, and it accepts one argument, query, which is the user's input. This will become a lot clearer when we write the React front-end code. Let's open curly braces there. So we've defined an asynchronous function to fetch the book data and trigger the parsing.
So const pageUrl equals, and let's use the backtick character so we can include variable values within our string. The first value we want comes from an environment variable, process.env.
You want to configure the base URL within your environment variables. I've called my environment variable base URL Amazon, and that's just https://www.amazon.com.
Now, the pattern for the rest of the URL, for searching for book data on Amazon, looks like this.
Question mark, then k equals, where k is the first parameter, then a dollar sign and curly braces, and inside we call encodeURIComponent on the query.
So if, for example, the user included a space within the search criteria, like Python space programming, we need to include the placeholder for a space that's appropriate to a URL. For a space, that's percent 20. So this function would replace Python space programming with Python%20programming.
Then, to separate our parameters, we include the ampersand symbol. The next parameter is called i, and its value is stripbooks, because we're searching for books on Amazon.
And that looks correct. So we've constructed the Amazon search URL specifically for the books category. That's what this code does.
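The URL construction can be sketched as a small function. The k and i parameters and the stripbooks value come from the walkthrough; the /s search path and the function name buildSearchUrl are assumptions made for this example, and the base URL is passed in directly rather than read from an environment variable.

```javascript
// Builds the Amazon book-search URL for a user query.
// The "/s" path and this function name are assumptions for the sketch.
const buildSearchUrl = (baseUrl, query) =>
  `${baseUrl}/s?k=${encodeURIComponent(query)}&i=stripbooks`;

// encodeURIComponent turns the space into %20 so the URL stays valid.
console.log(buildSearchUrl("https://www.amazon.com", "python programming"));
```

encodeURIComponent also escapes characters such as the ampersand and the plus sign, so a query like c++ cannot accidentally break the parameter list.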
And then we can go const response equals await, and we're going to use Axios for our post request: post, then pageUrl, like that. Then we pass the relevant arguments to the post method, and this is where it gets a bit interesting. We need to create a property called proxy and define the object for it. The protocol field must be set to http. Then the host. The reason this is different from when we scraped the TIOBE index is that we're using the core residential plan, and this is just the way we interface with Evomi on that plan; it is slightly different from the way we did it with the Scraper API plan. So host is process.env, and my variable is the Evomi proxy endpoint host. That uses the host address provided by your proxy provider. If you sign up, say, for a free trial of the core residential plan, all of this information is made available to you, including the URL for the proxy host, and you can define those values as environment variables within your .env file. Okay, great. The next field we want is port, and it's on port 1000.
So it connects through the specific port designated for your proxy rotation.
This is why the core residential plan is so effective when scraping at scale: it uses proxy rotation, so that every request comes from a different IP, from a different proxy server. And then username. You'll get a username made available to you when you sign up for your Evomi core residential plan.
I've called my environment variable username. Obviously I'm not showing you the values, because this is sensitive information. So process.env, and then password. You want to protect your username and password for security reasons. These provide the required credentials for the proxy service.
They sit in this auth field. This is all to do with proxy rotation and the core residential plan, and it's how you interface with the relevant endpoint on Evomi, which is slightly different from what we did when we scraped the TIOBE index. So that looks good: host, port 1000. Let's just make sure there's nothing incorrect here. That all looks correct.
And then if response.data includes, so this is a way for us to know whether we've been blocked. We're looking within the raw HTML returned to us for this specific text: Something went wrong.
If something's gone wrong, we throw an exception, and the exception message could be something like Amazon CAPTCHA detected. So they've detected us as a bot and have stopped us from scraping.
It throws an error to stop the process if the scraper was detected as a bot. Okay. And now, if all has gone well, we can return.
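The proxy settings and the block check can be sketched as two small functions. The shape of the proxy object (protocol, host, port, and an auth object with username and password) follows Axios's request config; the environment variable names, the helper names and the sample values are hypothetical, so substitute whatever your proxy provider gives you.

```javascript
// Builds the axios request config for a rotating residential proxy.
// The environment variable names are hypothetical placeholders.
const buildProxyConfig = (env) => ({
  proxy: {
    protocol: "http",
    host: env.PROXY_HOST,
    port: Number(env.PROXY_PORT),
    auth: { username: env.PROXY_USERNAME, password: env.PROXY_PASSWORD },
  },
});

// Throws when the returned HTML contains the block-page text.
const assertNotBlocked = (html) => {
  if (typeof html === "string" && html.includes("Something went wrong")) {
    throw new Error("Amazon CAPTCHA detected");
  }
  return html;
};

// Usage with made-up values; the app would pass process.env instead.
const config = buildProxyConfig({
  PROXY_HOST: "proxy.example.com",
  PROXY_PORT: "1000",
  PROXY_USERNAME: "user",
  PROXY_PASSWORD: "secret",
});
console.log(config.proxy.host, config.proxy.port);
console.log(assertNotBlocked("<html>ok</html>").length);
```

Environment variables are always strings, which is why the port is converted with Number before it goes into the config.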
We don't want to return the raw HTML. What we want to return is the parsed HTML, which gives us a structured array of useful data. So parseAmazonHTML, and we pass response.data, which is the raw HTML, and also the query that the user input on the front end. That could be Python programming, for example, or dance or karate or whatever book data the user wants returned. The last thing we need to write is the code that exports the module: module.exports, like that, equals, and let's export the method fetchAmazonBooks so that it can be used from within the controller.
So it exports the fetch function for use in the Amazon controller, which we haven't yet written.
So we need to write the code for the controller now. We go to controllers here, and let's create a file called Amazon controller. No, let's call it bookController.
Great. So let's write the code. const Book equals require, and here we're importing the model that we created earlier. We can find it at this location: models, forward slash, Book.
Excellent. And then we want the Amazon service imported: amazonService equals require, services, amazonService, like this. So we're importing that module there. Then let's export getAmazonBooks. This is the name of our method, our action method if you like, for the controller.
And if we go back here to server.js, you can see this route still says getTiobeRankings, which isn't right. That needs to be getAmazonBooks right there. And of course the controller mustn't be the TIOBE controller; it needs to be bookController. So we're mapping the route to the appropriate controller and method here. We need to do one more thing in this file, and that is to import the controller that we haven't yet written. So bookController equals require, controllers, and then bookController here.
So we're importing this into the main entry point for the Node part of our full-stack web application. We're making the relevant imports here and mapping the routes to their counterpart methods within the relevant controllers. Okay, so let's go back to bookController and write the method.
Okay, so it's an async method. Let's go equals async, like this, and then two parameters, one for the request and one for the response: req and res. Let's create the arrow, and let's create our function.
And this should be exports, not export.
So it defines an asynchronous route handler for the Amazon book search endpoint. We've defined the route in server.js, and this is the route handler function for getAmazonBooks, which we are going to write here.
So const query equals req, that's the HTTP request, dot query, dot q, question mark, dot, and then we normalize it by converting it to lower case, so that it's consistent for our search. That extracts the q parameter from the URL query string and converts it to lower case.
And then if not query, return res.status: the request that the client has sent is incorrect, so we return a 400 error with the message query required.
So we send the appropriate error message back to the client code in the event that the request did not include a valid query string.
Okay, let's include the try-catch code now. I'm just going to put in the entire structure here: try, then catch with error.
In the catch, let's log out the error: controller error, and then error.message.
Then, in the event of an error, we want to return an internal server error to the client code in JSON format, with the error failed to fetch books, like that.
Okay, so we've got our try-catch structure in place, and let's write the logic within the try block. So const cachedBooks: we're searching the collection on the searchTerm field, which we've indexed, using query, like that.
We want to return the cached data first if it exists.
So if cachedBooks.length is greater than zero, then return res.json with source database. The source for the data returned to the client in this case is the database, and the data is cachedBooks, like that.
That return stops the method from processing any subsequent code, because the data is available within the database and that's what we want to return. But if no data exists for that particular search criteria, for example Python books or JavaScript books or dance or karate, whatever the client is searching for, we need to scrape Amazon for those books. And this is where it gets fun and interesting. So scrapedBooks, and scraped has an r, equals await amazonService.fetchAmazonBooks, passing in the query. We just wrote that method within the Amazon service. It will scrape the page, parse the scraped data, and return the structured array that we want to send to the client in JSON format.
So if scrapedBooks.length is greater than zero, that means there is data, and then await Book, that's the Mongoose model, dot insertMany. We're inserting the structured scraped data, in array format, into our MongoDB collection, so that the next call will grab the data from the database and we don't have to scrape Amazon again for that particular search criteria. That is pretty much finished. We've got one more line of code to write, which is the JSON response.
We return the JSON containing the book data to the client. The source is going to be scraper, since it's freshly scraped data in this case, and the data is scrapedBooks.
Excellent.
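The cache-first flow of the controller can be sketched as follows. To let it run without MongoDB or a live scraper, the handler is produced by a factory that receives the model and the service as arguments; that injection, and the name makeGetAmazonBooks, are additions for this example and not how the course wires things up. The calls on Book (find, insertMany) and on res (status, json) mirror the Mongoose and Express methods mentioned in the walkthrough.

```javascript
// Factory so the handler can be exercised with stand-ins for the Mongoose
// model and the scraping service (an assumption made for this sketch).
const makeGetAmazonBooks = ({ Book, amazonService }) => async (req, res) => {
  // Normalize the q parameter so lookups match the lowercased searchTerm field.
  const query = req.query.q?.toLowerCase();
  if (!query) return res.status(400).json({ error: "Query required" });

  try {
    // 1. Serve cached results when this search term was scraped before.
    const cachedBooks = await Book.find({ searchTerm: query });
    if (cachedBooks.length > 0) {
      return res.json({ source: "database", data: cachedBooks });
    }

    // 2. Otherwise scrape, cache whatever came back, and return it.
    const scrapedBooks = await amazonService.fetchAmazonBooks(query);
    if (scrapedBooks.length > 0) await Book.insertMany(scrapedBooks);
    return res.json({ source: "scraper", data: scrapedBooks });
  } catch (error) {
    console.error("Controller error:", error.message);
    return res.status(500).json({ error: "Failed to fetch books" });
  }
};
```

In the real controller file, the equivalent would be to require the Book model and the Amazon service at the top and export the handler directly; the factory just makes the two branches easy to check in isolation.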
Okay. We have written our controller, which means we are ready to test this.
Pretty exciting stuff. Hopefully there are no real issues; I think we may have ironed them all out. We've got our book controller there, and in server.js we're mapping the routes to their route handler methods. So that's for Amazon and that's for TIOBE.
I think we're ready to test our code, and we're going to test it initially through Postman, because we haven't written the client for the Amazon functionality yet. So let's go to the terminal here. Great. We're in the root path for our server code, so we can run it: node server.js. Hopefully we can kick off the scraper on Evomi and get some results, and those results will be saved to our MongoDB database. We can check that after we've run the code.
Let's see if it connects to MongoDB. And there it is: server running on port 3000, connected to MongoDB. So now we want to use Postman to scrape the data, and hopefully it will create a collection with the scraped data in it. I'm just going to minimize that, and we'll go into Postman now.
Great. Send API request is what we want.
The request will be api, and then the relevant route is not api/tiobe; it's api/books/amazon. But it's also expecting a parameter, so we need to pass one in. If we go to bookController, you can see query.q, so we need to include the q parameter here.
Okay, so let's try that: question mark, q equals, and what should our search criteria be? JavaScript, why not. I think that is correct, so let's try it. Okay: 500 internal server error, failed to fetch books. I've noticed an issue here. This has got to be encode; there was a bit of a typo. So encodeURIComponent. That looks good. Let's go back to Postman.
You can see we got a 500 internal server error because of that typo, which is now fixed. So let's try again. I'm just going to run the code again: node server.js.
Okay. Server running on port 3000, connected to MongoDB. Let's go back to Postman and run that query again. Send. 200 OK, look at that. The scraper has worked: search term javascript, and we've got all these titles. It's brought back all the data we want. Now let's see if our database has been filled with data; hopefully the new collection has been created. First we go to View, then Reload Data, and there it is: books. So our collection has been created within the database. Isn't that brilliant?
So the next time we run it, source should reflect that the data is being retrieved from the database. Let's do that. We'll run that request again through Postman.
And look at that: source equals database. You can see the data has been returned from our database. If you want to test the scraper again, you can go to MongoDB and remove all the data in the collection.
That will force our code to scrape Amazon again for the books pertaining to the relevant search term, which in this case was JavaScript.
Excellent. So now we're ready to create the React part of the Amazon functionality.
So let's get going with that. Firstly we go to our client here, then into the src folder, and we want the App.jsx component. We want to make sure that we include a route here, and this is the one we actually want, so I'm just going to uncomment it. So path equals /amazon and element equals Amazon. We also need to import it up here. Bear in mind we haven't actually created this component yet; that's what we're going to do next.
So import Amazon, like that, from the amazon directory, which we haven't yet created, and then Amazon with a capital A. This is just the naming convention we're using. Okay, good.
That's all looking good. The next step is to create a folder within src, where we're keeping all our code. We go New Folder, and let's call this one amazon, with a small a at the beginning.
Then a new file, and this Amazon has a capital A: Amazon.jsx, like that. And we're ready to write the code for Amazon.jsx, our Amazon component. We've got quite a lot to write here, so let's get going. Import React, and then we want the useState hook from react, like this. That imports React and the useState hook to track search input and results.
And of course, react has a little r here, so that's why it's complaining. It's also showing a red squiggly line here because we are not using the import anywhere yet, but we will be. Let's import axios, like this, from axios. Semicolon.
That imports the Axios library to facilitate network requests to your back end. Then let's create the Amazon component. So const Amazon equals, no parameters, and this is an arrow function, so let's include our arrow, open curly braces, and get going. Here we are defining the functional component for the Amazon book search interface. So const, then query, and this is the first time we're using useState to track state. setQuery is the function that changes the value of this variable and thus changes the state of the component. It equals useState, and we'll include a default of an empty string. So it creates state to store the user's current search keyword.
Let's create our second state variable, books. That's going to store the array of books that we get back from the server, which is essentially the data parsed from the raw HTML returned after scraping Amazon. So useState, open brackets, and include an empty array. This initializes an array state to hold the list of book results returned by the API. Next, let's create a state variable for our loading indicator, loading, with setLoading to change it, and initialize it by passing in false between the two brackets. That manages a boolean flag to show or hide the search progress spinner. All of these squiggly lines are just because we're not using the variables yet, but we will be soon, so there's nothing to worry about. Then const error, to store error information, and setError to set it where needed. Let's initialize this one with useState, open brackets, and null. We'll initialize error to null.
Great. Okay. So, let's create a function called handleSearch.
And it's an async function like this.
Whoops, not like that. Like this. And then one parameter, an arrow, and then open curly braces like that. So, it defines an asynchronous handler triggered when the search form is submitted. So, we're handling the submission of the search form. The user will enter their search criteria for Amazon and then press a submit button, and that search criteria will go off to the endpoint, and scraping will occur appropriately.
Okay. Or, if the results already exist, it will get that data from our MongoDB database. So scraping may not necessarily occur. It only occurs if the relevant data is not yet cached within our MongoDB database. Okay. And then we go e.preventDefault, like this. So that prevents the browser from reloading the page during the form submission.
e.preventDefault. That's why we're using that there. And then we want to check query. So if not query, if there's no query, we need to handle that situation, and all we're going to do is just return. So it exits the function early if the search input field is empty. That's what that line of code does. Else, if we get to this point, we want to setLoading to true, so the loading indicator will be displayed, and then setError to null. We're just setting this again here. We've already initialized it to null, but let's set it to null here when the submission occurs. And then let's go, whoops, let's go try, and let's open curly braces. We're going to create a try-catch block here. Let's do the catch before we create the logic in the try section. catch, open braces, error, like that. Okay. And then let's go setError. So we're setting the error message state variable like this. And the message for that will be: Unable to fetch data. Please ensure the back end is running. And then we want to log that error for debugging purposes.
console.error, error, like that.
Okay. And then we want to include a finally section. So this finally section is no matter what happens whether an error occurs or doesn't occur we want to set the loading indicator state variable to false so that it no longer displays.
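The control flow of the handler described here can be sketched in plain JavaScript. This is a minimal sketch, not the course's exact code: the React setters and the HTTP call are passed in as ordinary functions so it runs outside a component, `runSearch` and `fetchBooks` are hypothetical names, and the `e.preventDefault()` call is left out because there is no real form event here.

```javascript
// Sketch of the handler's control flow. The setters and the HTTP call are
// injected so the example runs outside React; `fetchBooks` is a hypothetical
// stand-in for the axios request made in the course.
async function runSearch(query, { setLoading, setError, setBooks, fetchBooks }) {
  if (!query) return;          // exit early when the search box is empty

  setLoading(true);            // show the spinner
  setError(null);              // clear any error left from a previous search

  try {
    const books = await fetchBooks(query);
    setBooks(books);
  } catch (error) {
    setError('Unable to fetch data. Please ensure the back end is running.');
    console.error(error);      // log the underlying error for debugging
  } finally {
    setLoading(false);         // always hide the spinner, success or failure
  }
}
```

Because the loading flag is reset in the finally section, the spinner is hidden on both the success path and the error path.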
In the case where an error occurs, we might want to display an error to the user, and in the case where an error doesn't occur and everything runs as expected, we obviously want to display the relevant data in an aesthetically pleasing style to the user. We'll get round to that code in just a bit. So for now, let's go const response equals await axios.get, and then I'm just going to hardcode this for now. We really should be putting this in a .env file, so setting this as an environment variable, but for testing I'm just going to include this for now as a hardcoded value like this.
Port 3000, then api, and we've got books, and this is the path to the relevant endpoint, the endpoint that we created. Okay, perfect.
Okay. And then we want to include another argument here to the get method, and this will include the params. So we go params, and in curly braces, q, and query is the parameter value. So it attaches the user's search term as a query parameter to the URL.
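To make the effect of that params argument concrete, here is a small sketch that builds the same kind of URL by hand with the standard URL class. The host, port and path in `API_URL` are assumptions for illustration, and `buildSearchUrl` is a hypothetical helper, not something from the course code.

```javascript
// Assumed endpoint for illustration; the real host, port and path come from
// your own server setup.
const API_URL = 'http://localhost:3000/api/books';

// Build the request URL for a search term, attaching it as the `q` parameter.
function buildSearchUrl(baseUrl, query) {
  const url = new URL(baseUrl);
  url.searchParams.set('q', query); // the value is encoded automatically
  return url.toString();
}

console.log(buildSearchUrl(API_URL, 'javascript'));
// http://localhost:3000/api/books?q=javascript
```

Passing the term as a parameter rather than concatenating it into the string means special characters such as spaces and ampersands are encoded for you.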
That's what that does there. And then we just want to account for the results coming back in two potential formats. So response.data.data. But if nothing exists there, response.data will be the one. So in our case it will be response.data. Okay. So it extracts the array of books from the nested API response.
Okay. Perfect. And then we want to set books.
We've got this semicolon hanging about here. We don't need that. Then we go setBooks, and then Array, oops, dot isArray. So we're checking whether results is an array.
So if it is an array, we set the books to results.
Okay. Else.
So we check whether results is an array.
If it is, we set books to results, or else we set the results like this. So if it's not an array, okay, it ensures the data is stored as an array for the map function.
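The two checks just described can be sketched as one small function. This is an illustrative sketch: `extractBooks` is a hypothetical name, and falling back to an empty array when the payload is not an array is an assumption, since the exact fallback is not spelled out above.

```javascript
// Pull the list of books out of an axios-style response object.
function extractBooks(response) {
  // Accept either { data: { data: [...] } } or { data: [...] }.
  const results = response.data.data || response.data;
  // Assumed fallback: an empty array, so that .map() is always safe to call.
  return Array.isArray(results) ? results : [];
}

console.log(extractBooks({ data: [{ title: 'A' }] }).length);           // 1
console.log(extractBooks({ data: { data: [{ title: 'A' }] } }).length); // 1
console.log(extractBooks({ data: { message: 'oops' } }).length);        // 0
```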
Okay, so we're just making sure that the results that are returned are actually in an array format. Okay, that looks fine, right? So then below this function, return. So we're writing out the rendered part of our code here. So this is to render the relevant elements to the browser. So we include a return keyword. And then this is the parent tag. It's a div tag, obviously. className. We must include that in camel case. We don't want to make the same mistake as we did before, or the same mistake I did before, rather.
container. Okay. mt-5, like that. So it wraps the content in a Bootstrap container with top margin spacing. That's what that does there.
And then a div within the parent div. className equals row, space, justify-content-center, space, mb-5, so margin bottom five, and this is just spacing. So it centers the search bar section horizontally on the page. That's all those Bootstrap classes are doing
for this div element here. And then within this div element here, we want to include another div.
Okay. And we want to include some more Bootstrap classes. className equals col. We've got this row here, so we're using Bootstrap's grid system for layout purposes. So we want col-md-8. So this is going to take up eight columns of the grid, which has a maximum of 12 columns. So it limits the width of the search area on medium and larger screens. So bear in mind the reason to use Bootstrap, or one of the most important reasons to use Bootstrap, is for screen responsiveness.
So this limits the width of the search area on medium and larger screens. So then let's include an h2 element like this. And then we go className equals.
Okay. Then we go text-center, space, mb-4. So margin bottom four, like that.
Okay.
And let's include this on one line rather like that. And the text we want here for the heading is book search.
Okay. So displays the main section heading centered above the input. We're going to include the input section now.
Okay. So we need to create a form.
So let's do that below this H2 tag. So form. Okay.
It's created this action attribute by default. We don't want this. We actually want an onSubmit. We want to capture a specific event. So when the user clicks the submit button, we want it to automatically call our handleSearch function. And we of course just created that here. So the user enters their search criteria, presses the submit button, and the handleSearch method gets fired and sends a request to the relevant endpoint on the server.
Great. So within form, we want the input now. An input tag like that. Okay. So type equals text.
So that's just going to be a text box.
Let's tab that across there. Let's include our Bootstrap classes. So className equals form-control, then form-control-lg, like that. So those are the Bootstrap classes we want for formatting the relevant text box there. We want to include a placeholder. So placeholder equals, and it's just instructional text. So it's Enter book title or keyword, and we'll just put an example here, the classic example of JavaScript.
Okay. So e.g. and JavaScript.
So you might be searching for JavaScript-related books. Dot dot dot, and that's our placeholder text within the text box, so that appears by default to the user within the text box. It's just instructional text before the user actually enters a search criteria value. And then we've got value equals query, whatever the query may be. So that binds the input value to the query state variable. We've got the state variable defined up here.
Okay.
And then onChange. We want to include an onChange event handler, equals, and then within curly braces we can include an arrow function that just sets the query state variable using the setQuery function, and that will be e.target.value, which of course will be this input here. So it sets the query each time the text box value changes. So while the user is inputting information into the text box, that onChange event is at the same time setting the query state variable, keeping it in sync. So it's tracking that variable. Great. Okay. And then below here, still within the form element, we want to include a button element. So let's do that.
button.
Okay, there, button. And then let's include a className attribute here.
Okay, we want it to be a button. So, btn. These are just common Bootstrap classes used for formatting and styling. btn-primary, px-4.
Okay. And then this type of button must be submit, because we want our handleSearch method to fire on submit. So this handles that for us. So the onSubmit event gets fired. handleSearch gets executed.
So the onSubmit event is triggered and the handleSearch method is executed. type equals submit, like that. Okay. And then we want this button to be disabled depending on what the loading state variable is doing. So disabled equals loading. So when the loading state variable is false, it means obviously the button is active and can be pressed, which means that no data is loading, but while data is loading we want this button to be disabled. Obviously we don't want the user trying to submit the inputted search criteria to the server while data is loading. So that's what this code here is about. So it disables the button while a search is in progress to prevent duplicate clicks. And that's what that code does there. And then let's handle the loading-related functionality. So loading, if loading. So if loading is true, open round braces like this. Okay.
And then let's include a span here. So we're still within the button element.
So we're including this span tag within the button element. className equals, and we're going to actually include a spinner within the button. So, spinner-border, spinner-border-sm.
Okay. And we'll just include this role attribute, equals status, like that.
Okay.
Okay.
Great. Close the span.
Okay. And that's it basically. So we're including a spinner within the actual button itself. So the spinner will display when the user clicks the button and while the data the relevant data is loading from the server while the server functionality is firing and it may take a while. So we want to display this spinner to the user within the button.
And then let's include an else part of this here. We include open closing braces like this. And we include search within the button. So a spinner is either displayed while processing is going on as a result of clicking the relevant button.
Or if no processing is going on, we want the label search displayed on the button. That's all that is doing there.
Okay. Excellent.
Okay. Let's handle errors here now.
So error, ampersand ampersand, then a div with className equals alert, and these are just Bootstrap classes that we can use to display an alert to the user. We want it centered, so text-center, like this. Okay. And we want to display that error.
So we can include the error object within curly braces like that. And then let's close the div here.
Something's not quite right there.
Something's not aligning. And it's I suspect it's to do with this div here.
And it's because I should be closing this div here.
because that needs to exist within the parent div there. So that has sorted out our problem.
Okay, so we're handling the error there and we need to continue within this parent div element here so we don't have any further problems.
And then I've noticed another thing that I left out here for styling purposes. We want to include a bunch of classes for the form element, and those classes are input-group. So the group is the text box and the button. So with input-group we're asking Bootstrap to style this for us. We want to include some spacing: mb-3, margin bottom three, and then shadow-sm. So it groups the input and button together with a subtle shadow effect, is what these Bootstrap classes here are doing.
Brilliant. And we're ready to continue with our marathon code writing. Okay. So let's open another div. Now we're going to be displaying the data. We're going to be looping through all the data within the books state variable and we're going to be displaying the relevant data to the user in an aesthetically pleasing style. Okay. So, first thing we want to do is open a div tag here.
And then we want to open curly braces like this.
And let's include this code: books. So that's our state variable that will be populated with loads of book data, hopefully, at this point. Okay. And then map, with book. We want to loop through each book within the books array.
Then include an arrow, and open round braces like this.
Okay.
This needs two closures here because we're opening the map, and then we're also going to be creating the function with round braces there.
Okay, great. Okay, a div, like that.
We want to include a key. This is for React. We want to include a unique identifier for this element because it denotes a book item, one unit of book data displayed to the user. So we want to uniquely identify it through its ID. So key equals the book's ID, and then $oid, or Math.random. So the purpose of this is just to make sure that we have a unique identifier for each book that we're outputting.
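As a rough sketch of that key logic in plain JavaScript: the field names `_id` and `$oid` are assumptions about how the MongoDB document is serialized, and `bookKey` is a hypothetical helper. The fallback here deliberately swaps the random number for the item's position in the list, because a random key changes on every render, which makes React rebuild the element each time.

```javascript
// Work out a key for one book in the rendered list.
// Assumes the id arrives either as a plain string or as { $oid: '...' }.
function bookKey(book, index) {
  const id = book && book._id;
  if (typeof id === 'string') return id;                  // plain string id
  if (id && typeof id.$oid === 'string') return id.$oid;  // extended JSON form
  return `book-${index}`;  // fallback: list position, stable between renders
}

console.log(bookKey({ _id: { $oid: 'abc123' } }, 0)); // abc123
console.log(bookKey({ title: 'No id' }, 4));          // book-4
```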
Okay, we got a dot here.
So that's all that this code is doing: we're including a unique identifier for each book that we're going to be outputting to the user. I'm just going to put this on the next line here. And we're going to include a className attribute and loads of Bootstrap classes, and this is really just for layout and styling purposes and responsiveness on various screen sizes.
So a col class ending in 3, then col-md-4, then col-12, and we're just accounting for various screen sizes, and then mb-4, margin bottom four. So that sets responsive column widths for different screen sizes, and that's what that is doing there. And then within our div. So bear in mind what we're doing is representing each book on the UI. So let's include the image tag here.
And now this is the cover image for the book. book.image, like that.
Okay. And then className equals card, h-100. So that's actually going to be card-img-top, h-100, then w-auto, so width auto.
This is all to do with screen responsiveness. Then we want to include an alt, equals book.title, and style. We're going to include an inline style, and this is objectFit, contain, like that. So that needs to be an equals, style equals, and then the object represents a particular style. We're using objectFit here, which ensures the book covers aren't cropped or distorted within their containers. So this is just for styling purposes. Okay. Excellent. And then we need to close off the image, of course.
Let's go on to the next div here. So that's housing the book cover image.
It's a very important part of our UI because essentially what's going to strike the user on the front end are the book covers which will be displayed within cards on the UI and these cards will depending on the screen size will adapt to that screen size because we're using the relevant Bootstrap classes for that purpose. Okay, we need to further wrap this image in two other divs. Okay. So, include a div here.
There. Okay.
Copy that there.
Tab that across like that.
Tab that in a little bit.
Okay. And I want another div here.
Okay.
And then you'll see that we're going to add Bootstrap classes for each of these divs. And this is for styling purposes. className equals.
And this is card, h-100, shadow-sm, border-0. So it creates a card with uniform height and a shadow but no border.
And that's what those Bootstrap classes are all about there.
We need to add Bootstrap classes here for this div here. So text-center, and this is for centering, then p-3, so padding three. And then we also want to include an inline style here. And this is just the aesthetic that I've chosen for the UI.
Okay. So height.
So let's include a height of 220 pixels, then background, and this will just be hash FFF. So a white background. So it houses the image in a fixed-height container to keep the grid aligned.
That's what these Bootstrap classes are there for, as well as this inline style that we're including there. We want to include the title.
So this houses the actual card. So we want to stay within this card div.
This houses the image. So we want to include the title within another div here. Okay.
And let's include className equals card-body, since this is the body of the card, then d-flex, and then a further flex class. So we're using flexbox functionality through Bootstrap, through these classes here.
And if you want to know all the details of these particular Bootstrap classes, you can go to their website and it's easy just to look these up and uh see what all the details of the CSS behind the scenes are for each of these classes.
Okay, so it uses flexbox to ensure the card body fills available space. That's what these Bootstrap classes here are doing.
Okay. And we're going to include a H6 tag here for the title just for further styling. Let's include class name class name attribute for our Bootstrap classes. And this will be card title custom card title like that. So it renders the book's title as a subheader basically.
Okay, let's press the enter key and let's include the actual value for the title. So, book dot title like that.
Let's include a paragraph tag with className set to text-muted, small, and mb-2, margin bottom two, which displays the author's name in a small font. So, we're going to include the author's name here. And we're just going to actually show the first author here.
So book.authors.length, because there can be multiple authors, but we only want to show the first author, one author, even if there are multiple authors. And then let's include backtick characters here, by, and then within the backtick characters we can include the book's author. We need to include a dollar symbol outside the curly braces, and we can now include the book's author value here: authors, index zero. So we just want to include the first author if there are multiple authors. Okay. And then else.
So, it shows the first author or a fallback label. Okay, let's just say Unknown Author. Okay, I'm just going to say that there.
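The author expression described here can be sketched as a standalone function. `authorLine` is a hypothetical name, and the extra Array.isArray check is an added safeguard for books that have no authors field at all.

```javascript
// Build the author line for a card: first author only, or a fallback label.
function authorLine(book) {
  return Array.isArray(book.authors) && book.authors.length > 0
    ? `by ${book.authors[0]}`   // template literal inside backticks
    : 'Unknown Author';
}

console.log(authorLine({ authors: ['Ada Lovelace', 'Alan Turing'] })); // by Ada Lovelace
console.log(authorLine({ authors: [] }));                              // Unknown Author
```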
So, within the card, we also want to include the price and the rating and then a button that the user can click to actually navigate to the product on Amazon. So, let's do that. So within this div, let's include the price, right?
So we want className equals mt-auto, margin top auto, like that. So that pushes the price and button to the bottom of the card regardless of title length. That's what we're trying to achieve here, and that's just a UI-related styling choice.
Okay. And then we go className equals, and then d-flex. So we're using flexbox for responsiveness through Bootstrap. Then justify-content-between, align-items-center, then mb-3, margin bottom three, and these are just stylistic choices that we are leveraging through Bootstrap.
Okay.
Excellent. And then let's include a span here.
className equals. We want this in bold. So fw-bold, text-dark, h5, mb-0, margin bottom zero, like that.
And again, these are just stylistic choices that I've made for this particular UI. So, it renders the price in a bold, dark, prominent font size.
And then, of course, within the span, we want to include the actual value for the price. So, book.price.
Okay. Backtick characters.
So include a literal dollar, and then a dollar as in the JavaScript dollar reserved character. And so within this we can now include the actual price value within the string, with toFixed, oops.
So to two decimal places, or else we just include an instructional string value, Check Price, so they can check the price on Amazon if no price is available. So we need to close our backtick characters there, not here.
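A small sketch of that price expression as a plain function follows. `priceLabel` is a hypothetical name, and checking that the price is a finite number (rather than just truthy) is an added assumption so that a string or missing value falls through to the fallback text.

```javascript
// Format the price for display, or show fallback text when it is missing.
function priceLabel(book) {
  return Number.isFinite(book.price)
    ? `$${book.price.toFixed(2)}`  // literal "$" followed by a ${...} placeholder
    : 'Check Price';
}

console.log(priceLabel({ price: 12.5 })); // $12.50
console.log(priceLabel({}));              // Check Price
```

The first dollar sign is printed as-is; only the second one, followed by curly braces, starts the template placeholder.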
Okay, great. So if the price exists, we want to display it in a particular format. Or else we want to just display the label Check Price. So it formats the number to two decimal places or shows fallback text, which is Check Price. That's all that's doing there. Okay. And then below this span we want to do the rating. Now the rating field. So book.rating, checking the rating field, ampersand ampersand, open round braces. Let's include the small element like that.
className equals, and we want sort of a faded font here. So text-muted for that.
Okay. And we want the star symbol included there. And then book.rating, like that. So if the rating exists, then we display the rating like this.
So it only renders the star rating if the data is available. Great. And then we want to include an a element here below this div, like that. href.
So we're going to include the literal Amazon URL. Just going to hardcode that for now.
Forward slash, then dollar, and then book.url, like that.
Okay.
And then so we wanted to open Amazon in a different window.
So we include target equals, oops, _blank, like that.
We certainly don't want that there.
rel equals noreferrer, like that.
I'm just going to put that on a different line there.
Let's include the class name.
Let's include our Bootstrap classes for the a element. So, we want this to appear as a button. So, btn, then btn-warning, since we want to use the warning aesthetic, and then a small button, so btn-sm, and then w-100, and then fw-bold, like that.
So this will open the Amazon product page in a new browser tab, and we want to include the label View Item here. So, it allows them to view the actual product on Amazon by clicking the relevant button.
Excellent.
Below this div element here, if there are no results, we want to output a status indicating that fact to the user. So not loading, ampersand ampersand, books.length.
So if not loading, and books.length is equal to zero, and there's no error, just no books, then ampersand ampersand, open round braces like that. We want to output a message to the user. Let's do it within this div tag like this. Let's include some styling here. className equals text-center, text-muted, mt-5, like that. And then within a p tag, let's include the message to the user.
No results yet.
Start by searching for a book above, like that. Okay. And I think that is pretty good. And then below everything here, we must export it. Just include some semicolons here.
And we'll export default Amazon, like that. Semicolon, and that exports the Amazon component for use in the main app routes. Okay. So I think we're pretty good. There are no problems now. So that looks really good.
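The condition for the "No results yet" message described above can be sketched as a small predicate. `showEmptyState` is a hypothetical name; in the component itself the same checks would sit inline in the JSX, chained with the && operator.

```javascript
// Decide whether the "No results yet" message should be rendered.
function showEmptyState({ loading, books, error }) {
  // Only when nothing is in flight, nothing failed, and the list is empty.
  return !loading && !error && books.length === 0;
}

console.log(showEmptyState({ loading: false, books: [], error: null }));  // true
console.log(showEmptyState({ loading: true, books: [], error: null }));   // false
```

Keeping the loading and error checks in the condition avoids showing the empty message underneath a spinner or an error alert.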
So we are ready to test this. You know local host API books Amazon. This looks really good. Okay great. So let's go to terminal here.
want to open a new terminal.
Oh, I might have one open actually for the server. So, we're within the root of the server-side code here and we want to run the server-side code. So, we go node server.js to run the code.
Injecting env from Great. And it should connect to MongoDB shortly.
And there it is. Server running on port 3000 connected to MongoDB.
Okay. And then let's run our React code.
Let's see what we've got going here. npm run dev.
Great.
So, let's run our React client through Chrome.
Okay. Excellent. So, I need to do something about this. This doesn't look right, but we'll look at that in just a bit. So, I'm just going to go to Amazon book search. Here we've got our interface. That looks pretty good. No results yet. Start by search for a book above. It should say searching for a book above, but we'll deal with that a little bit later. Great. And it doesn't look too bad. This layout doesn't look perfect. We might have to look at the layout a bit. But let's do a test. So, if we look at the MongoDB database and we go into books, we've actually already got books pertaining to JavaScript here. So, if I typed in JavaScript, it's going to return the relevant book data from MongoDB. And we actually want it to scrape Amazon. So, I'm going to force a scrape by typing in Python, a search criteria that hasn't yet been used.
So, let's do that. Let's type in Python and press search and see what happens.
Great. Our indicator is looking good.
And let's see what we get back from the server. Okay, that looks good.
So, our layout, it's doing what we asked it to do, but there's something wrong with it. It's actually scraped Amazon and populated the MongoDB database, but there's something wrong with our layout. So, it's probably just a Bootstrap class that hasn't been included, or something along those lines, but it is doing essentially what we want. So, we'll have a bit of a debugging session and sort out the relevant problems. This layout is not correct, and that doesn't look all that great there either. So, the basic functionality is in place. It's displaying the relevant fields that we want and it's looking pretty good, but this might look good on a mobile device.
You know, one card on top of the other, but it doesn't look great on our desktop computers. So, we want at least maybe three or four cards displayed on each row. So that's just something to do with the bootstrap layout functionality. We might not have included the appropriate class for that.
So we'll have a look at that in just a bit, but that looks good. Excellent.
Right. So there's just a few little cosmetic things we need to take care of.
I have identified a few little issues.
So this should be backgroundColor within the Amazon.jsx file. Let's make this backgroundColor, like that. Okay. And the other one I noticed that wasn't quite right is this should be margin bottom two.
Great.
And the big issue is I forgot to put className equals row. And that is why we had the cards displayed one on top of the other. So, it was just to do with this line of code here. And then there's just a minor cosmetic issue within app.jsx where we're displaying these menus. And that's why they were displayed one on top of the other, because they're displayed within one li tag. These should be displayed within two separate li tags. So, I'm just going to take care of that. I'm going to add that there: an li closing tag and then an li opening tag, like that, to remedy this situation here.
Okay, I want to move that back one, and then we need to include a Bootstrap class here. So className equals, and this should be nav-item, and we want to include the same class within the other li element here that houses the Amazon link, and that should sort out that problem. And that's really it. So let's run our code and see how that's affected the UI. Okay, so I'm just going to clear that. Okay, node server. No, that's display. We don't want to do node server there. Let's get the server side running. Info data capture. So, we're in the root of the server code there. And it'll be node server.js to run our server-side code. And there it is. It's running: connected to MongoDB, server running on port 3000.
Okay. And then let's run our React code.
So, npm run dev to run our React code.
Great. I'm going to copy that URL to my Chrome browser.
Okay. And let's see what happens. Okay, so you can see that's sorted out.
They're displayed one next to the other.
And if we make this a smaller screen, let's see how it adapts. Great. It's including the relevant icon there, the classic burger-type icon. And we can now use that to collapse and expand our menu. And this is how it would look on a smaller device, basically. And then when we go to full screen on a big device, you can see that those menu items are positioned next to one another like that. And that looks pretty good. So, okay. So, let's try and test that again.
Let's go to Amazon book search and let's see how our changes have affected our UI. And it really was that className equals row within the parent div that was causing the problem.
So if we type in JavaScript, I don't want to test scraping here. I just want to test how the UI has been affected by our changes. So it's cool that it gets the data from MongoDB. So we've already searched for JavaScript and it saved that data within MongoDB. You can see that we've got loads of JavaScript data. So it will retrieve that data from our MongoDB collection rather than from freshly scraped data from Amazon. So, let's do a search. It's pretty quick.
And look at that. Now, that looks really awesome. I'm really happy with that. And those are all our JavaScript books. And if we put it on a smaller screen, one underneath the other, that looks great on smaller screens. And as we expand the screen size, it adapts. So, our Bootstrap classes now make everything look wonderful. I'm sorry for the brief interruption. I just wanted to quickly make the point. You may have noticed that those book prices are way overpriced. Uh the dollar symbol is in fact incorrect there. The price is correct, but it's in South African rand.
I'm actually recording this course from South Africa. So, it's brought back the data to me in South African rand, and that's why the price seems crazy. It's just that it's in South African rands and not in American dollars. So that dollar symbol is incorrect in this particular case. Let's search for Python. This again we've got Python data. So it will grab the data from the MongoDB database. And look at that.
Super quick. And we can check those product items on Amazon by just clicking the relevant item. Okay, that didn't work. Okay, so there's a bit of a problem there with that. Let's just remove that forward slash and see if that affects it. So, there's an extra forward slash we need to get rid of for the path. And we can do that. No problem. So, we've got a bit of a problem when we try to view the individual items.
And okay, well, that worked.
That worked. Okay. So, it is navigating to this particular product, but ideally you don't want two forward slashes here. So, we can address that in the code.
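One way to address the doubled forward slash in code is to normalize both pieces before joining them. This is only a sketch of one possible fix, not the change made in the course: `productUrl` is a hypothetical helper, and the base URL shown is an assumed value, so substitute whichever Amazon storefront you are scraping.

```javascript
// Assumed storefront for illustration; use the one your scraper targets.
const AMAZON_BASE = 'https://www.amazon.com';

// Join a base URL and a scraped product path with exactly one slash between.
function productUrl(base, path) {
  const cleanBase = base.replace(/\/+$/, '');                // drop trailing slashes
  const cleanPath = String(path || '').replace(/^\/+/, '');  // drop leading slashes
  return `${cleanBase}/${cleanPath}`;
}

console.log(productUrl(AMAZON_BASE + '/', '/dp/EXAMPLE'));
// https://www.amazon.com/dp/EXAMPLE
```

With this in place it no longer matters whether the scraped path starts with a slash or whether the hardcoded base ends with one.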
Great. And now we can see that item. We could even purchase that item through Amazon.
Excellent. And we've scraped that data successfully from Amazon. And we've saved it to our database. And now, let's say we want to force a scrape. So, I'm actually just going to go to the homepage, then back into Amazon, so it initializes the search. And let's search for something that we haven't searched yet. Let's just type in AI and see what books it brings back. Now, this will trigger the scraper to be invoked. So the server should return freshly scraped data for this search, AI, because we don't have search results for this particular search criteria saved to MongoDB. We've only got JavaScript and Python at the moment. So, let's just see. We've got Python here somewhere. That's because we're only displaying 25 items.
Let's display 75 items. We should see Python. There we go. Search term Python.
Okay.
But let's uh search for AI. And this will trigger a scrape and it will return freshly scraped data hopefully to our web page. You can see now it's taking a bit longer, but it didn't take that long. This is brilliant. Look at that.
And it's returned books related to AI and saved that data now to our database.
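The behaviour seen in this test, where the first search for a term scrapes and later searches come straight from the database, can be sketched as a small cache-first lookup. This is a simplified illustration and not the course's server code: a Map stands in for the MongoDB collection, `createBookLookup` and `scrape` are hypothetical names, and everything is synchronous here, whereas real database and scraping calls would be asynchronous.

```javascript
// Cache-first lookup: return stored results if the term was searched before,
// otherwise scrape, store, and return the fresh results.
function createBookLookup(scrape) {
  const cache = new Map(); // stand-in for the MongoDB collection

  return function lookup(term) {
    const key = term.trim().toLowerCase(); // normalize the search term
    if (cache.has(key)) {
      return { source: 'cache', books: cache.get(key) };
    }
    const books = scrape(key); // slow path: hit the scraper
    cache.set(key, books);     // save for next time
    return { source: 'scrape', books };
  };
}

// Usage with a fake scraper that just fabricates one placeholder entry.
const lookup = createBookLookup((term) => [{ title: `Placeholder for ${term}` }]);
console.log(lookup('AI').source); // scrape
console.log(lookup('ai').source); // cache
```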
You can see how quick it is now when it retrieves it from the database. So if we go Python, look how great this is. There it is. Split second. It returns it. Go AI.
It returns it from our database and it's a split second. And we could do something sort of off topic like dance for example. Let's search for books relating to dancing.
Okay, there we go. It's scraping. You can see. Look at that. Dance. Loads of dance books. We could try karate.
Just a different topic from the normal tech topics that we've been searching so far. So, karate. Let's see what happens.
Look at that. It's scraping. It returns martial arts related karate books to us and we can view those books on Amazon.
Okay. So, some of the links aren't quite right, but let's test this one.
And there we go. Great. So, it takes us to the product on Amazon. And we could even purchase that product if we wanted to. Okay. Let's remove this.
So, it unfortunately does have an extra forward slash here, and we can resolve that in code.
No problem. Okay, great.
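One way the extra forward slash could be resolved is to normalize both halves of the link before joining them. The sketch below is in Python and is only an illustration: the function name `build_product_url`, the base address, and the example path are all assumptions, not values taken from the course code.

```python
# Assumed base address for illustration; the app may use a different domain.
AMAZON_BASE = "https://www.amazon.com/"


def build_product_url(href: str, base: str = AMAZON_BASE) -> str:
    """Join a scraped link onto the base URL with exactly one slash between them."""
    # A link that is already absolute needs no joining.
    if href.startswith(("http://", "https://")):
        return href
    # Strip the slashes at the seam, then add a single one back.
    return base.rstrip("/") + "/" + href.lstrip("/")


if __name__ == "__main__":
    # Hypothetical product path, with and without a leading slash.
    print(build_product_url("/dp/EXAMPLE123"))
    print(build_product_url("dp/EXAMPLE123"))
```

Stripping both sides means it doesn't matter whether the base ends with a slash or the scraped link starts with one.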
Excellent. So, everything is working great. If we go to the homepage, we can go to the TIOBE index. We've got the TIOBE index there.
And then we've got the Amazon book search. And you could extend this functionality to scrape other websites and capture that data and perform all kinds of analysis on that data. So you can see the potential of having an app like this where you can capture data from the internet through scraping.
So what do we want to search for here?
Let's do something different here. So okay, let's try C. Let's see if we can scrape for C textbooks.
And there we go. Loads of C books here.
Brilliant.
So everything is working perfectly.
Let's see if we can see this one on Amazon. View item and it's taken us to the relevant book. And we could now purchase this book or read some reviews or whatever. And that is working perfectly. So that completes our full stack web application.
Look at that. And we've got a full catalog of books here in our own database now through web scraping.
Excellent.
And our screens are fully responsive.
And that is the magic of Bootstrap at work there. Makes it very easy to implement responsive web pages.
Here's the TIOBE index on a smaller device's screen. And we've got Amazon. We can type in JavaScript, or let's do AI rather, and see what it returns. There we go.
All these AI-related books that we've scraped from Amazon are presented to us, and we can go to Amazon and look at the book, look at its reviews, and even purchase it. Excellent. I hope you enjoyed this course on web scraping as much as I enjoyed creating it. Web scraping is more than just a technical skill. It's the bridge between raw, unorganized information and actionable insights.
Whether you are feeding a machine learning model, tracking market trends, or automating tedious data entry, you now have the power to gather the exact information you need whenever you need it. Happy scraping, and I hope to see you soon. Thank you and take care.