September 19, 2024
Data Science in Finance
 #Finance



Hey guys. In my last video I talked about 5 tips for data scientists, but let’s actually get our hands dirty by analyzing some data related to finance.

There are many facets of data related to finance, but I’ll be analyzing one part in detail, and hopefully you’ll be able to apply the same principles anywhere else.

So let’s get to it. First, let’s define a problem. We’re going to analyze data for a purpose, so what is this purpose? In this case I want to analyze customer churn. Customers churn when they terminate services from a company. My purpose is to determine who is churning, why they are churning, what we can do to reduce it, and whether we can predict when a customer will churn based on transaction history.

So that’s great, we have our purpose. What do we do next? Second, explore our data and understand exactly what it is and what it represents. At this point we should not worry too much about the purpose of the churn analysis; let’s just try to understand the data we have in general, and I do this by plotting some big stats. For this analysis we’re going to use a simple Kaggle data set. Here’s my understanding of it: every row represents a customer who has ever done business with us, Telco. These customers could still be doing business with us, in which case they’re active users, or otherwise they’re not, in which case they are churned users. Telco offers two services: phone and Internet. I’ll run you through this notebook of my analysis and then explain the details through a presentation, so don’t worry, you’ll see how it all comes together in the end.

To get an idea of how many users use our phone services, our Internet services, or both, I plot pie charts for some ballpark numbers. Internet users can have one of two types of connection: fiber optic or DSL. We can plot the distribution of users for each service.

Now we have some understanding of our data, so what’s next? Third, get back to the question of analyzing customer churn, and don’t get sidetracked. To achieve our goal I establish comparisons between different sets of active and churned users. I want to see if there is a difference in how long they spend with us, so I compare active Internet users and churned Internet users, and do the same for phone users too. We can see here that these box plots have the median lifetime of active users larger than that of churned users. But we cannot simply look at these box plots and say that the lifetime of an active user is greater than that of a churned user. We first need to see if this difference is statistically significant. This is done with hypothesis testing. Gotta love those p-values.
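The box-plot comparison described here comes down to a handful of numbers per group. The following is a minimal sketch of that summary using only the Python standard library. The tenure values are made up for illustration and are not taken from the Telco data set, the name `box_summary` is hypothetical, and plotting libraries differ in how they draw whiskers, so this reports the plain minimum and maximum.

```python
from statistics import median, quantiles

# Hypothetical tenure values in months (NOT from the actual data set).
tenure_months = {
    "active": [5, 12, 24, 36, 48, 60, 72],
    "churned": [1, 1, 2, 4, 9, 15, 30],
}


def box_summary(values):
    """Return min, first quartile, median, third quartile and max of values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": max(values),
    }


# One summary per group, mirroring one box per group in the plot.
summaries = {group: box_summary(v) for group, v in tenure_months.items()}
for group, summary in summaries.items():
    print(group, summary)
```

A larger median in one group is only a description of the sample; whether the gap is statistically significant is a separate question that a summary like this cannot answer.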

I’ll prepare a separate video on this, but here’s a short explanation. The type of test you conduct depends on the data you have. For comparing these two groups you’d think of performing something like a t-test, but that test assumes that the data are normally distributed and that both distributions have equal variances. In most of these cases the test for normality fails. I have the option of transforming my data toward a normal distribution with techniques like Box-Cox, but even that fails in my case. So I use the Mann-Whitney U test, which only requires that the data are i.i.d. (independent and identically distributed). Since the p-values obtained after comparison are less than 0.05, we can reject the null hypothesis of the U test, which states that both groups belong to the same distribution. In other words, we can say that active user lifetime and churned user lifetime come from separate distributions, and we can make the claim that the median active user has a longer lifetime with us than the median churned user.

Similarly, I make the comparison between Internet versus non-Internet users and phone versus non-phone users, and finally a 3-way comparison between phone users, Internet users, and Internet-plus-phone users. Multiple groups can be compared with tests like ANOVA (analysis of variance), but I do pairwise Mann-Whitney U tests here because of the normality constraint violations. I could also use the Kruskal-Wallis H test, which is a generalization of the Mann-Whitney U test for multiple groups, but the hypothesis is too weak for me to tell anything useful. The null hypothesis states something like: the median of all three groups is the same. So even if I reject this hypothesis, it just means that at least two of the groups have statistically different medians, but we still won’t know which two from the test alone. This leads me to use the pairwise Mann-Whitney U test.

Anyhow, say that you’ve done your analysis. Now what? Well, if you show this notebook to other people, it’s gonna be pretty difficult to read. In all this code and explanation we can’t see which details are important and which are not. So you need to dig through the entire thing once again, point out which parts stand out, and capitalize on that. We’ll do this by compiling it into a set of presentation slides, which is step four.

I ran through everything before, but let me emphasize my top findings in this presentation. I’ll first kick things off with some big stats. As Telco, we have customers coming for two types of services: Internet and phone. Here are two pie charts showing the number of Internet users, phone users, or users of both services, each plotted for current active users and churned users. Each slice has a number along with the proportion of users. The big takeaway: 85% of our users who left us had both Internet and phone services, and currently such users contribute over 60% of our business. Now note, just because 85% of our users who churned had both Internet and phone services doesn’t mean the combination of Internet and phone is bad. They still constitute a large 63% of our active user base, after all. Similarly, the fact that only 6% of churned users had phone services alone doesn’t mean that the phone-only policy is good; there just aren’t as many phone-only active users to begin with.

I plotted something similar for active and churned phone users. These phone users may or may not have an additional Internet plan: it could be fiber optic, DSL, or no Internet at all. It’s interesting to note that 76% of our churned phone users also had fiber optic Internet service.

Now we compare the tenure, or lifetime, of active and churned users. By lifetime I mean the number of months for which they stayed with us, Telco. The plots on the left are active and churned Internet users, and the plots on the right are those of active and churned phone users. I wrote the median lifetime in months for each plot in red, and under each plot I wrote the p-value from the test of statistical significance; in this case it’s the Mann-Whitney U test. Note here I don’t put the actual p-value, I just state that it is less than 0.001. This is how you should report p-values: if it’s greater than 0.05, write the actual value, which denotes no statistical significance; if it’s between 0.001 and 0.05, write the actual value again, denoting statistical significance; and for anything less than 0.001, just write “less than 0.001” to show strong statistical significance. There’s no need to write 2.6e-10 or whatever the actual p-value is.

Since the p-value from the comparison of active and churned Internet users is statistically significant, we can reject the null hypothesis of the U test, which states that the distributions of both populations are equal. Hence active Internet user lifetime and churned Internet user lifetime follow different distributions. The same can be said for active phone users and churned phone users: they belong to different distributions. So our current Internet and phone users have stayed with us longer than our churned Internet and phone users.

This slide looks like the last, but this time on the left we’re comparing active Internet users to active non-Internet users. Since the difference is statistically significant, indicated by the low p-value, these two distributions are different, and hence we can say that our Internet users are older, in terms of time with us, than our non-Internet users. We see a similar case in churned users too, so the Internet users also stayed with us longer than the users who didn’t have an Internet subscription.

Once again we have a similar slide, but instead of comparing Internet and non-Internet users, we compare phone and non-phone users. The graph on the left shows the comparison of lifetime of current active phone users and current active non-phone users.
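The Mann-Whitney U comparisons and the p-value reporting rule used on these slides can be sketched as follows. The notebook itself is not shown in the transcript, so this is an illustrative from-scratch version using only the Python standard library, not the code behind the slides. It uses the normal approximation with a tie correction and no continuity correction; the function names and the sample tenure values are assumptions made up for the example. In practice a library routine such as `scipy.stats.mannwhitneyu` would be the usual choice.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks.

    Also returns the size of each group of equal values (1 for untied values).
    """
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic of x, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    sd_u = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sd_u == 0:  # every value identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / sd_u
    return u1, min(1.0, 2 * (1 - NormalDist().cdf(abs(z))))


def format_p(p):
    """Report p as described in the talk: exact value unless below 0.001."""
    return "p < 0.001" if p < 0.001 else f"p = {p:.3f}"


# Hypothetical tenure values in months (NOT from the actual data set).
active = [5, 12, 24, 36, 48, 60, 72]
churned = [1, 1, 2, 4, 9, 15, 30]
u_stat, p_value = mann_whitney_u(active, churned)
print(u_stat, format_p(p_value))
```

For a three-group comparison like Internet-only, phone-only and both, the same function would simply be called once per pair of groups.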

Since the p-value after performing the U test is not significant, we cannot reject the null hypothesis, so we really can’t establish a difference in tenure with Telco for these phone and non-phone active users. For churned phone and non-phone users, however, the p-value is significant, and hence we can say that our churned phone users stayed slightly longer with us than our churned non-phone users.

Now we can make the comparison between three box plots. They denote the lifetime of users who had both Internet and phone services, just Internet services, or just phone services. We performed the pairwise Mann-Whitney U test and determined that all p-values are significant. Thus we can say that the users who take both Internet and phone services from us stay longer with us than those who just take Internet services from us. These users in turn stay much longer than those who just take phone services from us. Interestingly, among our churned phone-only users, more than half stayed with us for only a month after activation.

Like I mentioned before, our Internet users have two types of services: fiber optic and DSL. The goal of this slide is to compare the lifetime of users with different types of Internet services. Perhaps users of a specific type of Internet service tend to churn sooner than the rest. Well, in this case we find that our users with fiber optic Internet service stay with us longer than our DSL users.

For our Internet users, we give them an option to opt in or out of our technical support. I thought this might have an impact on customer tenure, so I compared the lifetime of users with tech support and those without tech support, and I find that users with tech support stay much longer than those without.

Now, I’ve only mentioned a few of these facts in the presentation so far, but the list of facts that we can extract is endless. Facts are amazing, but how does knowing these facts help us solve, or at least mitigate, the problem of customer churn? Well, this is another thing to take care of: point number five, coming up with solutions. So I take some of these facts and try to think of what we can do as Telco to decrease customer churn.

I mentioned before that users with Internet and phone services stay with us longer than users with just one of either service. What we could do is, when a user signs up with us just for our phone service, entice them to sign up for an Internet subscription as well, selling both as a package deal. Perhaps more users would sign on and stay with us longer than they do now.

Another fact I pointed out earlier: over half of our past exclusive phone users churned within their first month of activation. Knowing this, what can we do to mitigate customer churn? What we could do here is improve our phone service policy, perhaps by including new features. But since our current exclusive phone users have been with us for over two years on average, I don’t think this is the problem anymore. Then again, new services are always something to think about.

We know that our fiber optic Internet users stay longer than our DSL users. Both services have their advantages: the fiber optic connection is fast, while DSL is reliable and affordable. Customers require Internet services that suit their needs, whether it’s bandwidth, location, usage, or price. So for our larger businesses or larger bandwidth-consuming individuals, we can offer discounted fiber optic package plans. This would better tailor Internet services to make them more customer-centered, and thus we could expect less churn.

Another fact that we found out was that users with technical support churn later than those without. So to increase customer time with us, we can offer a 12-month free tech support subscription. This way they’ll be more satisfied with Telco, stick around longer, and even willingly subscribe to technical support after a year of good service.

So yeah, that’s my mini analysis and presentation with this Kaggle data. It’s always fun playing around and seeing what insights you can come up with. The link to everything is down in the description below, so check that out.

Now this is great, but can we use machine learning in this? Well, a useful application I can come up with is building a churn predictor: given customer information, predict how likely he or she is to churn in the next month. However, I would need the user’s behavior over time. Where were they before Telco? When did they join us?

Now, given this information we could extract features and throw them into our model. But since we only have access to one tuple per customer, we cannot build this classifier given just this data, so I’m not gonna force a model out of this.

Now, here are the general points to keep in mind while conducting this type of analysis. First, define your problem. Have a specific goal in mind: what do you want to get out of this analysis? Number two, don’t be afraid to explore your data. Getting to know your data is very important, so don’t let the goal define how you understand your data.

Perhaps something you thought seemingly unrelated may actually help in the analysis. Number 3, when starting the analysis, don’t lose sight of your goal. After you know your data, conduct the analysis by always keeping your goal in mind. There’s no point in doing a bunch of random shallow analysis. Number 4, presentation matters. You may have done this amazing, complex analysis on data, but if no one understands your work, then it’s like you didn’t even perform the analysis in the first place. Number 5, come up with solutions. As a data scientist you should try to come up with some business solutions based on the insights. Domain knowledge always helps, and it does take some time to research. Number 6, model if you need to, not if you want to. Machine learning isn’t always the answer. If you feel that the model will help address your goal, then it is something to consider.

Though customer churn is just one of the many facets of what you could be dealing with as a data scientist in finance, I’m hoping that going deep into one aspect can help you actually analyze data similarly in any field of finance.

And that’s all I have for you now. If you liked what you saw, hit that like button. If you’re new here, welcome, and hit that subscribe button. I’ve got some cool links in the description, so check them out. Still looking for a daily dose of AI? Then click or tap one of the videos right here for an awesome video, and I will see you in the next one.

Now that you’re fully informed, watch this essential video on Data Science in Finance.
With over 18338 views, this video offers valuable insights into Finance.
