Whilst preparing for Data Science interviews, I brushed up on SQL. Here is a question and my thought process in answering it.
The problem comes from InterviewQs, but I have no affiliation with them. If you sign up for their free plan, they email you Data Science interview questions three times a week. I have found this website useful in preparing for the technical side of the application process.
In this project, I built a model to classify news as fake or true. As a hypothetical business scenario, I was working on a consulting assignment for a social media platform that had reported an influx of fake news.
The rise of fake news is causing the company significant concern, as it could lead to a decline in user engagement, not to mention the potential reputational damage for any entity targeted by the fake news, which the platform could be seen as facilitating.
By using machine learning to detect fake news, new stories posted on the platform could be systematically…
In this blog post, I will describe the project I completed for a non-technical audience. For the code, I would encourage you to look at my GitHub repository.
Executives of a new movie studio are after actionable insights to maximise their return on investment and ensure successful movies are produced.
“Avengers: Endgame’s $1.2 billion opening weekend is the biggest in movie history” — Vox, April 2019.
“Box office cats-tastrophe: Cats projected to lose $70m” — The Guardian, December 2019.
From these two contrasting headlines, we see that entering the movie industry can be viewed as a high risk/high reward…
For my Flatiron Data Science Bootcamp Capstone project, I knew I wanted to build a recommendation engine. Recommendation systems have become part of our daily lives, from Netflix recommending movies to Amazon showing items we may be interested in. Whilst the MovieLens dataset appears to be the go-to dataset for learning and building recommendation systems, I wanted to use a different data source to make the project more distinctive.
In the middle of the COVID pandemic, video games have seen a surge in popularity as people stay home more. In fact, Steam reported a record number of concurrent users…
In this blog post, we will visualise embeddings of video games (based on data from Steam).
We will start with the following DataFrame relating to Steam computer games. Here, uid represents a unique user id and id represents a unique game id. We have over 4 million rows, each representing a uid/id relationship (namely, user uid owning game id). To see the preprocessing steps which led to this stage, please see the full project on my GitHub.
Classifying X-ray images using deep learning
In this project, I chose to apply deep learning to classify chest X-ray images as belonging to either a patient with pneumonia or a healthy patient. The key takeaway from this experience is the importance of domain knowledge in shaping your decisions. It is relatively straightforward to apply a model, but the true value comes from questioning your decisions and evaluating the results carefully.
The dataset was obtained from Kaggle and can be downloaded here. It contains 5860 images. The first step was to divide the data into training, validation and test sets. Here comes the first decision where…
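The three-way split mentioned above could be sketched roughly as follows. This is only an illustration using the standard library: the function name split_indices, the 10% validation and 10% test fractions, and the fixed seed are all assumptions, not the choices made in the project.

```python
import random


def split_indices(n_items, val_fraction, test_fraction, seed=0):
    """Shuffle item indices and cut them into train/validation/test lists.

    The fractions and the seed are illustrative assumptions only.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # reproducible shuffle

    n_test = int(n_items * test_fraction)
    n_val = int(n_items * val_fraction)

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


# 5860 is the number of images quoted for the dataset.
train, val, test = split_indices(5860, val_fraction=0.1, test_fraction=0.1)
```

Each index would then map to one image file, so the three lists never share an image.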
“We’re starting the telemarketing campaign on Monday and have the budget for 500 calls. Who should we contact to maximise revenue?”
In this scenario, Bank XYZ had onboarded 2000 new customers by acquiring a smaller bank but, due to resourcing and budget constraints, only 500 of these could be contacted. Using data from Bank XYZ’s existing customer base and the results of the campaign for its existing customers, my goal was to identify which 500 customers to contact to maximise revenue and to provide recommendations for future campaigns.
A successful subscriber translates to revenue for Bank XYZ. Based on domain…
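One simple way to turn a call budget into a contact list is to rank customers by a predicted score and keep the top of the ranking. The sketch below assumes that such scores already exist; the function name select_customers_to_call, the customer ids, and the probabilities are hypothetical, and the post does not confirm that this is the selection rule used.

```python
def select_customers_to_call(scores, budget):
    """Return the `budget` customer ids with the highest scores.

    `scores` maps a customer id to a predicted probability of subscribing
    (hypothetical values; any model could have produced them).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:budget]


# Toy example: four customers and a budget of two calls.
toy_scores = {"A": 0.10, "B": 0.85, "C": 0.40, "D": 0.65}
to_call = select_customers_to_call(toy_scores, budget=2)
```

In the scenario above, the dictionary would hold 2000 entries and the budget would be 500.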
In this tutorial, I will guide you step by step to create a map displaying house sales using Bokeh, with a colour mapping indicating sale price. I wanted the viewer to be able to distinguish at a glance which neighbourhoods are most expensive to live in. This is also a useful visualisation for price prediction modelling, as it gives a sense of how important location is and whether new location-related features may be worth engineering.
We will be using a dataset with sale prices for the King County area, which can be found on Kaggle but…
Over the past two weeks, I’ve been working on my first data exploration project involving the movie industry. Here are five technical tricks I learned. Note that as the title implies, I’m very much a beginner in data science and as such would welcome any comments.
I was provided with data from various movie websites including IMDB, The Movie Database and Rotten Tomatoes. The first task was to read them into a Jupyter Notebook. A neat way of accomplishing this is with glob, which returns a list of all files matching a pattern, such as a given extension.
# Create a list of all…
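As a minimal sketch of the glob approach described above: the helper name list_files_with_extension and the file names are hypothetical, and a temporary directory stands in for the project's data folder, which is not shown in this excerpt.

```python
import glob
import os
import tempfile


def list_files_with_extension(directory, extension):
    """Return a sorted list of paths in `directory` ending in `.extension`."""
    pattern = os.path.join(directory, f"*.{extension}")
    return sorted(glob.glob(pattern))


# Demonstrate on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("imdb.csv", "tmdb.csv", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()

    csv_files = list_files_with_extension(tmp, "csv")
    csv_names = [os.path.basename(path) for path in csv_files]
```

Each path in the resulting list could then be passed to a reader of your choice, one file at a time.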
As I’m about to embark on a 10-month Data Science bootcamp, I thought it would be interesting to reflect on the path that led me here.
A Mathematical start
I graduated from University College London (UCL) with a first-class honours MSci degree in Mathematics in 2012. I chose the subject out of a passion for it and particularly enjoyed pure abstract mathematics modules such as Number Theory. I took part in undergraduate research programs over the summer breaks, which led to great friendships and culminated in publications. For my MSci thesis, I analysed the structure (in the mathematical sense)…
Mathematics graduate and Data Scientist