Accessible Machine Learning for SEOs — Whiteboard Friday
Posted by BritneyMuller
Machine learning — a branch of artificial intelligence that studies the automatic improvement of computer algorithms — might seem far outside the scope of your SEO work. MozCon speaker (and all-around SEO genius) Britney Muller is here with a special edition of Whiteboard Friday to tell you why that’s not true, and to go through a few steps to get you started.
To see more on machine learning from Britney and our other MozCon 2020 speakers, check out this year’s video bundle.
Hey, Moz fans. Welcome to this special edition of Whiteboard Friday. Today we are taking a sneak peek at what I spoke about at MozCon 2020, where I made machine learning accessible to SEOs everywhere.
This is so, so exciting because machine learning is readily at your fingertips today, and I’m going to show you exactly how to get started.
So to kick things off, I learned about this weird concept called brood parasites this summer, and it’s fascinating. It’s basically where one animal tricks another animal, often of a different species, into raising its young.
It’s fascinating, and the more I learned about it, the more I realized: oh my gosh, I’m sort of like a brood parasite when it comes to programming and machine learning! I find these great models and latch on to them. They do all the work, all of the raising, and I put in my data and my ideas, and they do things for me.
So we are going to use this concept to our advantage. In fact, I have been able to teach my dad how to use most of these models, which, again, are readily available to you today within a tool called Colab. Let me just walk you through what that looks like.
Models to get you started
So to get started, if you want to start warming up right now, just start practicing pressing “Shift + Enter” (hold Shift, then press Enter), which runs a notebook cell.
Just start practicing that right now. It’s half the battle. You’re about to be firing up some really cool models.
All right. What are some examples of that? What does that look like? So some of the models you can play with today are things like DeOldify, which is where you repair and colorize old photos. It’s really, really fun.
Another one is a text generator. I created one with GPT-2. It’s super silly: an excuse generator. You can manipulate it and make it do different things for you.
There’s also a really, really great forecasting model, where you basically put in a chunk of time series data and it predicts what the future might have in store. It’s really, really powerful and fun.
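The forecasting notebook itself isn’t reproduced here, so as a stand-in, here is a minimal sketch of the idea: feed in a time series and project it forward. It swaps in a deliberately simple technique, a least-squares straight-line trend fitted with NumPy, rather than whatever model the notebook uses. The function name `forecast_linear_trend` and the sample click counts are hypothetical.

```python
import numpy as np

def forecast_linear_trend(values, steps_ahead):
    """Fit a least-squares straight line to a series and extend it forward.

    A simple stand-in for a real forecasting model: it captures the overall
    trend only, not seasonality, holidays, or sudden shifts.
    """
    y = np.asarray(values, dtype=float)
    x = np.arange(len(y))
    # deg=1 fits a line; polyfit returns coefficients highest power first: [slope, intercept]
    slope, intercept = np.polyfit(x, y, deg=1)
    future_x = np.arange(len(y), len(y) + steps_ahead)
    return slope * future_x + intercept

# Hypothetical daily organic clicks with an upward trend
daily_clicks = [100, 112, 119, 131, 140, 148, 161]
print(forecast_linear_trend(daily_clicks, steps_ahead=3).round(1))
```

A real forecasting model would also learn weekly and yearly patterns; this sketch only shows the input/output shape of the task.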
You can summarize text, which is really valuable. Think about meta descriptions, all that good stuff.
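To make the meta description idea concrete, here is a minimal sketch of extractive summarization in plain Python: score each sentence by how frequent its words are across the whole text, keep the best one, and trim it to a meta-description-friendly length. This is a simple frequency heuristic, not the model from the talk; the stopword list, the `summarize_for_meta` name, and the 155-character limit are assumptions for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real one would be much longer
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "for", "on", "with", "you", "your"}

def summarize_for_meta(text, max_chars=155):
    """Return the highest-scoring sentence, trimmed to max_chars on a word boundary."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOPWORDS)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]
        # Average (not total) frequency so long sentences aren't automatically favored
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    best = max(sentences, key=score)
    if len(best) <= max_chars:
        return best
    # Cut at the last space before the limit, leaving room for the ellipsis
    return best[: max_chars - 3].rsplit(" ", 1)[0] + "..."

page_text = "Colab is free. Colab runs pandas notebooks in the cloud. Cats sleep."
print(summarize_for_meta(page_text))
```

Model-based (abstractive) summarizers write new sentences instead of picking one, which is what makes them more interesting for meta descriptions; the extractive version above is just the easiest way to see the input and output.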
You can also automate keyword research grouping, which I’ll show you here in a second.
You can do really powerful internal link analysis, and I’ve set up a notebook for that.
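The internal link notebook isn’t shown here, but one common starting point can be sketched in pandas: take a crawl export with one row per internal link and count how many distinct pages link to each URL, which also surfaces orphan pages. The `source`/`target` column names, the sample URLs, and the page list are hypothetical; adjust them to whatever your crawler exports.

```python
import pandas as pd

# Hypothetical crawl export: one row per internal link (source page -> target page)
links = pd.DataFrame({
    "source": ["/", "/", "/blog", "/blog", "/blog/post-a"],
    "target": ["/blog", "/pricing", "/blog/post-a", "/blog/post-b", "/pricing"],
})
# Every page you know exists (e.g. from your sitemap), including ones nothing links to
all_pages = ["/", "/blog", "/pricing", "/blog/post-a", "/blog/post-b", "/about"]

# nunique() counts distinct linking pages, so repeated links from one page count once;
# reindex adds pages with no inbound links as 0
inbound = (
    links.groupby("target")["source"]
    .nunique()
    .reindex(all_pages, fill_value=0)
    .sort_values(ascending=False)
)
print(inbound)

# Orphan candidates: pages no other page links to
orphans = inbound[inbound == 0].index.tolist()
print(orphans)
```

Here “/” shows up as an orphan only because the sample data has no links pointing home; on a real crawl, the pages with the fewest inbound links are usually where internal linking can help most.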
Perhaps one of the most powerful things is that you can extract entities and categories as Google perceives them. It’s one of my favorite APIs: Google’s Natural Language API. I pull it into a notebook, you basically put in the URLs you want to extract this information from, and you can see how your URL compares to competitors’.
It’s really, really valuable, fun stuff. So most importantly, you cannot break any of this. Do not be intimidated by any of the code whatsoever. Lots of seasoned developers don’t know what’s happening in some of those code blocks. It’s okay.
We get to play in this environment. It runs in Google’s cloud and saves to your Google Drive, so there’s no fear of this breaking anything on your computer or with your data or anything. So just get ready to dive in with me. Please, it’s going to be so much fun. Okay, so like I said, this is through a free tool called Colab. So you know how Google basically took Excel and made Google Sheets?
They did the same thing with what’s known as Jupyter Notebooks. Jupyter is one of the most popular notebook environments, and it traditionally runs locally on your computer. But it requires some setup, and it can be somewhat clunky. It can get tangled up in different package versions, and yada, yada. Google put that into the cloud and is now calling it Colab. It’s unbelievably powerful.
So, again, it’s free. It’s available to you right now if you want to open it up in a new tab. There is zero setup. Google also gives you access to free GPU and TPU computing, which is great. Sessions can run for up to 12 hours.
One con is that you can hit usage limits. I hit the limits, and now I’m paying $9.99 a month for the Pro version and I’ve had no problems.
Again, I’m not affiliated with this whatsoever. I’m just super passionate about it, and the fact that they offer you a free version is so exciting. I’ve already seen a lot of people get started in this. It’s also worth noting that it’s probably not as secure or robust as Google’s enterprise solutions. So if you’re doing this for a large company or you’re getting really serious about this, you should probably check out some other options. But if you’re just kind of dabbling and want to explore and have fun, let’s keep this party going.
All right. So again, this is basically a cloud hosted notebook environment. So one thing that I want to really focus on here, because I think it’s the most valuable for SEOs, is this library known as “pandas”.
Pandas is a data frame library: you basically run one or two lines of code to load your data. You can choose a file from your local computer, so I usually just upload CSVs. This silly example is one that I really did run with Google Search Console data.
So you run this in a notebook. Again, I’m sharing this entire notebook with you today. If you open it and press “Shift + Enter”, it runs the current cell and moves you to the next one, so it walks you through the cells. It’s not as intimidating as it looks. Just click into that first cell, even if it’s just a text cell, and press “Shift + Enter”, and it will bring you through the notebook.
Once you get to this chunk of code right here and run it, it prompts you to upload your CSV. Once you upload it, you name your data frame.
These are the only two cells you really need to change or interact with.
So we are uploading your file, and then we are grabbing that file name. In this case, mine was just “gsc-example.csv”. Again, once you upload it, you will see the name in that output here. So you just put that within this code block, run this, and then you can do some really easy lines of code to check to make sure that your data is in there.
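Here is a minimal sketch of those two cells, assuming Colab’s `google.colab.files.upload()` helper for the upload step (it opens a file picker and returns a dict keyed by file name). Because that helper only works inside Colab, the runnable part reads a small in-memory CSV instead. The column names and the rows are made up to resemble a Search Console queries export; match them to your own file.

```python
import io
import pandas as pd

# In Colab, the upload step looks roughly like:
#   from google.colab import files
#   uploaded = files.upload()            # opens a file picker; returns {file name: bytes}
#   df = pd.read_csv("gsc-example.csv")  # use the file name printed by the upload cell
# Here a small in-memory CSV stands in for that upload so the cell runs anywhere.
# Column names are an assumption; rename them to match your own export.
sample_csv = io.StringIO(
    "Query,Clicks,Impressions,CTR,Position\n"
    "moz pro,340,2100,16.19%,1.2\n"
    "keyword research,85,4800,1.77%,8.4\n"
    "what is seo,60,9100,0.66%,12.7\n"
)
df = pd.read_csv(sample_csv)

# Exports often store CTR as text like "16.19%"; convert it to a 0-1 float for filtering
df["CTR"] = df["CTR"].str.rstrip("%").astype(float) / 100

print(df.shape)   # (rows, columns): a quick check that the data loaded
print(df.dtypes)
```

Converting CTR to a number right after loading is optional, but it makes the filtering steps later in the notebook much simpler.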
One of the first things most people run is “df.head()”. “df” is the data frame you named with your file right here, and “df.head()” shows you the first five rows of your data frame. You can also do “df.tail()”, and it shows you the last five rows of your data frame.
You can even put a number in the parentheses to change how many rows you want to explore. So maybe you do “df.head(30)”, and then you see the first 30 rows. It’s that easy to get your data in there and see it. Now comes the really fun stuff, and this is just the tip of the iceberg.
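Those quick checks look like this in a cell. The 40-row data frame is made up so that `head()` and `tail()` have something to show; in the notebook you would run them on your uploaded `df`.

```python
import pandas as pd

# Hypothetical data frame with 40 rows of queries
df = pd.DataFrame({
    "Query": [f"query {i}" for i in range(40)],
    "Clicks": list(range(40, 0, -1)),
})

first_five = df.head()       # first 5 rows by default
last_five = df.tail()        # last 5 rows by default
first_thirty = df.head(30)   # pass a number to change how many rows you see

print(first_five)
print(last_five)
print(len(first_thirty))
```

In a notebook you can also leave off `print()` and just put `df.head()` as the last line of a cell; Colab displays it as a formatted table.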
So you can run this really, really cool code cell here to create a filterable table. What’s powerful about this, especially with your Google Search Console data, is that you can easily extract and explore keywords that have a high click-through rate but a low ranking in search. It’s one of my favorite ways to explore keyword opportunities for clients, and it couldn’t be easier.
So check that out. This is kind of the money part right here.
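The filterable table is interactive, but the same “high CTR, low ranking” question can be asked directly in pandas with a boolean filter. In this sketch the rows and both thresholds are hypothetical; CTR is assumed to be a 0–1 fraction and Position the average rank, where a larger number means a lower ranking.

```python
import pandas as pd

# Hypothetical Search Console rows
df = pd.DataFrame({
    "Query": ["moz pro", "keyword research", "link building tips",
              "what is seo", "local seo checklist"],
    "Clicks": [340, 85, 40, 60, 55],
    "Impressions": [2100, 4800, 700, 9100, 600],
    "CTR": [0.162, 0.018, 0.057, 0.007, 0.092],
    "Position": [1.2, 8.4, 11.3, 12.7, 9.6],
})

# Example thresholds only; tune them to your site
MIN_CTR = 0.05      # what counts as a "high" click-through rate
MIN_POSITION = 8    # ranking at position 8 or worse

opportunities = (
    df[(df["CTR"] >= MIN_CTR) & (df["Position"] >= MIN_POSITION)]
    .sort_values("CTR", ascending=False)
)
print(opportunities)
```

Each condition is wrapped in parentheses because `&` binds more tightly than `>=` in Python; leaving them out raises an error.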
If you’re doing keyword research, which can take a lot of time, right, you’re trying to bucket keywords, you’re trying to organize topics and all that good stuff. With pandas, you can instantly create a new column that flags branded keyword terms.
So just to walk you through this, we’re assigning to df["Branded"], which is the name of the new column we’re going to create. Then we call “.str.contains()” on the query column with a regular expression, “moz|rand|ose”. So any keyword that contains one of those terms gets “True” in the “Branded” column.
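Here is a minimal sketch of that branded column, assuming the queries live in a column called `Query` (a hypothetical name; use whatever your export calls it). `str.contains` treats the pattern as a regular expression by default, and `case=False` is added here so capitalized queries match too. The second column shows a stricter, word-boundary version of the same pattern.

```python
import pandas as pd

df = pd.DataFrame({"Query": [
    "moz pro",
    "Rand Fishkin blog",
    "open site explorer ose",
    "close variant keywords",
    "keyword research",
]})

# Plain substring regex, as described above; case=False also catches "Rand"
df["Branded"] = df["Query"].str.contains("moz|rand|ose", case=False)

# Stricter version: \b word boundaries stop "ose" from matching inside "close".
# (?:...) is a non-capturing group, so pandas doesn't warn about match groups.
df["Branded_strict"] = df["Query"].str.contains(r"\b(?:moz|rand|ose)\b", case=False)

print(df)
```

The first version flags “close variant keywords” as branded because “close” contains “ose”. Short brand terms are prone to this kind of false positive, which is why the word-boundary version is often the safer default.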
So now that makes filtering and exploring so much faster. You can even go a step further and create an entirely separate data frame. If you have lots and lots of data, you can use the other cell in that example for that. All of these examples will be in the notebook.
You can use that and export your keywords into buckets like that, and there’s no stall time. Things don’t freeze up like Excel. You can account for misspellings and all sorts of good stuff so, so easily with regular expressions. So super, super cool.
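The bucketing idea can be sketched like this: one regex per topic, where a single pattern can also absorb a common misspelling, then one separate data frame per bucket, each exported as CSV. The bucket names, patterns, and sample queries are all hypothetical; the first matching bucket wins, so order the dictionary from most to least specific.

```python
import pandas as pd

df = pd.DataFrame({
    "Query": ["keyword research tool", "keywrod research", "link building tips",
              "backlink checker", "moz pro pricing"],
    "Clicks": [85, 12, 40, 33, 20],
})

# Hypothetical topic buckets; "keyw(?:or|ro)d" matches both "keyword" and "keywrod"
BUCKETS = {
    "Keyword research": r"keyw(?:or|ro)d",
    "Links": r"link|backlink",
}

df["Bucket"] = "Other"
for bucket, pattern in BUCKETS.items():
    matches = df["Query"].str.contains(pattern, case=False)
    # Only fill rows that an earlier bucket hasn't already claimed
    df.loc[matches & (df["Bucket"] == "Other"), "Bucket"] = bucket

# One data frame per bucket, exported as CSV text.
# In Colab you could write files instead: group.to_csv(f"{bucket}.csv", index=False)
exports = {bucket: group.to_csv(index=False) for bucket, group in df.groupby("Bucket")}

for bucket, csv_text in exports.items():
    print(bucket, csv_text.count("\n") - 1, "rows")
```

Keeping the patterns in one dictionary means you can add a bucket or a misspelling in one place and rerun the cell, rather than editing formulas across a spreadsheet.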
Again, this is just the tip of the iceberg, my friends. I am most excited to sort of plant this seed within all of you so that you can come back and teach me what you’ve been able to accomplish. I think we have so much more to explore in this space. It is going to be so much fun. If you get a kick out of this and you want to continue exploring different models and different programs within Colab, I highly suggest you install the Colab Chrome extension.
It just makes opening up the notebook so much easier. You can save a copy to your drive and play with it all you want. It’s so much fun. I hope this kind of sparked some inspiration in some of you, and I am so excited to hear what all of you think and create. I really appreciate you watching.
So thank you so much. I will see you all next time. Bye.
Ready for more?
You’ll uncover even more SEO goodness in the MozCon 2020 video bundle. At this year’s special low price of $129, this is invaluable content you can access again and again throughout the year to inspire and ignite your SEO strategy:
- 21 full-length videos from some of the brightest minds in digital marketing
- Instant downloads and streaming to your computer, tablet, or mobile device
- Downloadable slide decks for presentations