Python Automation Projects With Machine Learning

In this Python Tutorial I show how to improve different Automation Projects with Machine Learning.

In a previous video, Python Task Automation Ideas, I created different projects, which are now improved with Machine Learning.

Video Transcript

The following text is the unedited video transcription of the corresponding video tutorial. Getting a transcript automatically is one of the projects in this video. It is generated with AssemblyAI's Speech-To-Text API. *

So earlier this year I made a video about different task automation scripts with Python. I created a job board scraper, a price tracker, and a Tweet scheduler, and today we're going to improve those three projects with machine learning. Plus, I added one new automation project for this video. The purpose of this video is to give you an idea of how you can use machine learning in real projects, and hopefully it gets you excited about machine learning in general and inspires you to start creating your own projects. For each project I will show you an overview, the necessary data, and the algorithm we use. So let's get started.

As you might know, I have a website where I post a corresponding article for each video, so I could save a lot of time if I could automatically get the transcript of my videos. Then I could simply convert my video files into text files, paste the result into my article, maybe do a couple of manual adjustments, and the article would be done. For this I need speech recognition, which is a very big research field in deep learning, and it's not a simple task. So for the first project, I decided not to implement this on my own, but instead use a third-party service. Since I don't implement the algorithm myself, I don't need to collect any data here. As a service, I chose AssemblyAI. They offer a Speech-To-Text API that is super simple to use and provides great results. Now a full disclaimer: since I really like the API, AssemblyAI was kind enough to sponsor this video, so a big thank you to AssemblyAI for sponsoring me.

In order to convert my video files into text, I only need to follow four simple steps. Step one: convert the video to audio. To keep the file transfer to a minimum, I only want to upload the MP3 file, and this can be done in Python in only two lines of code with the moviepy package. Step two: upload the MP3 file by hitting the upload endpoint. To use the API you of course need an account and an API key, and you can sign up and test AssemblyAI for free. So I grab the API key, and then I can simply use the requests module and send a POST request to the API endpoint. Step three: after uploading, I send a POST request to the transcript endpoint. This sends back a JSON response with a new ID for the transcription, and after that it takes a few minutes to do the actual transcription. Step four: by hitting the transcript endpoint again with the new ID, I can ask for the transcription status. When it's completed, I simply grab the text, save it in a file, and I'm done. Straightforward and simple, and I'm really happy with the results.

Again, thanks to AssemblyAI for sponsoring me. AssemblyAI offers automatic speech-to-text conversion with a clean API and great accuracy. It's trusted by many companies around the world and offers many nice features, such as batch transcription, real-time transcription, and you can even extract key insights from your data. So if you want to test it yourself, you can sign up for free at AssemblyAI.com.
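Put together, the four steps can look roughly like the following sketch using moviepy and requests. The file names and the YOUR_API_KEY placeholder are just illustrations; the endpoints are AssemblyAI's v2 API:

```python
import time
import requests
from moviepy.editor import VideoFileClip

API_KEY = "YOUR_API_KEY"  # from your AssemblyAI dashboard
HEADERS = {"authorization": API_KEY}

# Step 1: convert the video to audio (two lines with moviepy)
clip = VideoFileClip("my_video.mp4")
clip.audio.write_audiofile("my_video.mp3")

# Step 2: upload the MP3 file to the upload endpoint in chunks
def read_file(filename, chunk_size=5_242_880):
    with open(filename, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk

upload_response = requests.post(
    "https://api.assemblyai.com/v2/upload",
    headers=HEADERS,
    data=read_file("my_video.mp3"),
)
audio_url = upload_response.json()["upload_url"]

# Step 3: request the transcription; the JSON response contains a new ID
transcript_response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers=HEADERS,
    json={"audio_url": audio_url},
)
transcript_id = transcript_response.json()["id"]

# Step 4: poll the transcript endpoint with the new ID until it's completed
while True:
    result = requests.get(
        f"https://api.assemblyai.com/v2/transcript/{transcript_id}",
        headers=HEADERS,
    ).json()
    if result["status"] == "completed":
        with open("transcript.txt", "w") as f:
            f.write(result["text"])
        break
    if result["status"] == "error":
        raise RuntimeError(result["error"])
    time.sleep(10)
```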
Last time we built a price tracker that regularly scrapes the prices of products on Amazon and then sends an email whenever the price falls below a given threshold. In order to improve this, I want to be able to predict the prices in the future, so I can get an alert when the price is likely to drop before it even happens. To collect data, I scrape the prices every day and put them into a simple CSV file that contains the date and the price. So essentially we get time series data, which can then be used for forecasting. As the algorithm, I decided to use the Facebook Prophet package. This is a free module available in Python that is optimized for forecasting. All we need to do is install it, and then we create a model, prepare the time series, and call model.fit and model.predict. We even get a nice looking graph to analyze. But for this project I only care about the trend in the near future: whenever the predicted price falls below a given threshold in the next seven days, I will send an email to myself.
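A minimal sketch of that forecasting step could look like this. It assumes the package is installed as prophet (older releases were published as fbprophet) and that prices.csv has date and price columns; the threshold value is just an example:

```python
import pandas as pd
from prophet import Prophet

THRESHOLD = 100.0  # example alert price

# Prophet expects a DataFrame with the columns "ds" (date) and "y" (value)
df = pd.read_csv("prices.csv")
df = df.rename(columns={"date": "ds", "price": "y"})

model = Prophet()
model.fit(df)

# Predict the next seven days and check the forecast against the threshold
future = model.make_future_dataframe(periods=7)
forecast = model.predict(future)
next_week = forecast.tail(7)

if (next_week["yhat"] < THRESHOLD).any():
    print("Predicted price drop below the threshold - send the alert email!")

# model.plot(forecast) produces the nice looking graph mentioned above
```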
Last time we also created a project to schedule Tweets in the future: we could put a Tweet into a Google spreadsheet, and then we had a script running in the background that regularly checks if the given time has passed and then uses the Twitter API to send out the Tweet. To improve my Tweet creation process, I want to be able to classify a Tweet as a good one or a bad one before I schedule it. So I created a small web application that shows me immediate feedback before I send a Tweet out, and then I can change a potentially bad one into a good one. We could scrape Twitter, download Tweets, and label them all ourselves, but again, we can make our life easier and just take an existing dataset that did exactly this. Luckily, there is one on Kaggle. It consists of 1.6 million Tweets extracted using the Twitter API. The Tweets have already been annotated and can be used to detect sentiment. I won't give a long and detailed explanation here, but the approach is very similar to another video where I did sentiment classification with TensorFlow. Here is the rough overview: to simplify classification, I divided the data into only the labels zero and one, meaning negative or positive Tweets. Then some NLP techniques are applied that involve stemming, tokenization, padding, and word embedding, and at the end of the pipeline I create a model using an LSTM and a dense layer for the classification as the last step. This works pretty well. After training the model, I wrote a small web application using Streamlit, and now whenever I type a Tweet, the model predicts its sentiment in the background and shows me the result. Only if I'm happy with it do I submit it, and then it is stored in my spreadsheet.
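A rough sketch of that pipeline with TensorFlow/Keras might look like the following. The texts and labels are assumed to be loaded from the Kaggle dataset already, and the vocabulary size, sequence length, and layer sizes are illustrative choices of mine:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

VOCAB_SIZE = 10_000
MAX_LEN = 40  # Tweets are short, so a small sequence length is enough

# texts: list of (preprocessed/stemmed) Tweet strings, assumed loaded
# labels: array with 0 = negative and 1 = positive, assumed loaded

# Tokenization: map each word to an integer index
tokenizer = Tokenizer(num_words=VOCAB_SIZE, oov_token="<OOV>")
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)

# Padding: bring all sequences to the same length
padded = pad_sequences(sequences, maxlen=MAX_LEN, padding="post", truncating="post")

# Word embedding + LSTM + dense layer for the classification
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(padded, np.array(labels), epochs=5, validation_split=0.2)
```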
For the last project, I wrote a script that scrapes a job board site each day, and I could specify the tags I was looking for. For example, I could specify the tags Python and backend, and whenever a new job with one of those tags appears, I get an email with a link to apply. This logic is not very clever, so wouldn't it be cool to have an algorithm that knows what I like and what I dislike, and then recommends me only the jobs I really like, and not just all jobs with the tag Python in them? So here we want to write a small recommendation system that learns my preferences. For this I had to put in some work and label the data myself, because after all, I want a recommendation system based on my preferences. So I scraped the job website every day for several weeks and saved all jobs in a CSV file. Along with each job, I also saved the company name, the job title, the tags, the location, and the salary range if it was available. These are the features to train the model, and then I labeled each job with one for I like it or zero for I don't like it.

There is one tricky part about our data: except for the salary range, all features are non-numeric, so we need to transform them in some way. For example, we cannot simply take the job title and feed it to our model, because then the model does not know how to deal with it. Instead, we have to create a so-called categorical variable. There are techniques like ordinal encoding and one-hot encoding that do this. For example, when we know all possible job titles, we can map each title to a dedicated number, and then our model knows how to deal with this number. So the key part here is really all about feature engineering and feature preprocessing. Once we have encoded all of our features in a good way, we can test different models and then choose one. I suggest trying algorithms like logistic regression, random forest, and boosted trees. In my case, I ended up using XGBoost, which usually performs really well and is also used in many Kaggle competitions. But again, the more important part of this project is the feature engineering.
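Here is a hypothetical sketch of how the encoding and training could look, assuming a jobs.csv with the features described above; the column names are my own illustration:

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder
from xgboost import XGBClassifier

df = pd.read_csv("jobs.csv")

# Ordinal encoding: map each known category (e.g. every job title)
# to a dedicated number so the model can deal with it.
# For simplicity the tags string is treated as a single category here.
categorical_cols = ["company", "title", "tags", "location"]
df[categorical_cols] = OrdinalEncoder().fit_transform(df[categorical_cols])

X = df[categorical_cols + ["salary"]]  # XGBoost handles missing salaries (NaN)
y = df["label"]  # 1 = I like it, 0 = I don't

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = XGBClassifier()
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```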
All right, I hope you enjoyed this video and that it gave you an idea of how you can use and integrate machine learning in your own projects. If you haven't watched my first video about Python automation already, I highly recommend that you do so now. I hope to see you in the next video. Bye!

  • * AssemblyAI sponsored this video, so the link above is a sponsored link. By clicking on it you will not incur any additional costs, and you can sign up and test AssemblyAI for free.

FREE VS Code / PyCharm Extensions I Use

✅ Write cleaner code with Sourcery, instant refactoring suggestions: Link*


PySaaS: The Pure Python SaaS Starter Kit

🚀 Build a software business faster with pure Python: Link*

* These are affiliate links. By clicking on them you will not incur any additional costs. Instead, you will support my project. Thank you! 🙏