# ML Beyond Jupyter Notebooks

Speaker: Nivit Kannani (name spelling uncertain)

## System Terminology

1. **Reliability**:
   - Availability
   - Fault tolerance: the system continues to work despite failures.
   - Correctness: the speaker discussed how Google stays reliable, and used India-Pakistan issues to show why providing correct information matters.
2. **Latency**: sports scores should update instantly, and sports livestreams should be instant too.
3. **Throughput**: how many requests the system handles per unit of time.
4. **Async flow**: start a task and don't wait for it to finish.
5. **Accuracy**: the fraction of all predictions the model gets right.
6. **Recall**: the model's ability to find the actual positive cases.

## Unsaid rules

- Tradeoffs are everywhere. With latency, for example, focusing on it too much can introduce data inconsistency.
- Duplicating critical components ensures reliability.
- Network calls take time, and disk reads take time.

## Jupyter notebook to production gap

Putting things into production is hard; it takes much more than a Jupyter notebook.

Issues faced:

1. **Data distribution shift (DDS)**:
   - The statistical properties of the data change, such as the mean and standard deviation of a feature. Because the distribution changed, the model may no longer work on this data.
   - So when a model runs in production, they ask for the distribution properties of the training data.
   - **Solution**:
     - Train on a comprehensive dataset.
     - Do a model refresh: retrain on the new data and deploy the new model.
     - Worst case: train a completely new model.
2. **Data imbalance**: 99.9% of all payments in the world are not fraud. A model trained naively on such data won't work properly.
   - Recall on fraudulent transactions will be low.
   - Resampling: oversampling creates duplicates of the minority (fraud) transactions; undersampling reduces the number of non-fraud transactions.
   - **Synthetic** minority oversampling (SMOTE) creates new synthetic minority examples instead of exact duplicates.
   - If you have a loss function
3. **Silent failures**: they keep a simple system of 400 rules, rather than an ML model, for fraud checks in the cases where the model might go wrong.
4. **Data quality**: good-quality data matters a lot.
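The distribution-shift check described above can be sketched as comparing a feature's live statistics against the ones recorded for the training data. This is a minimal sketch under stated assumptions: `summarize` and `has_shifted` are hypothetical helpers, and the thresholds (a live mean more than 3 standard errors from the training mean, or a standard deviation that changed by more than 2×) are illustrative choices, not values from the talk. Formal tests such as Kolmogorov-Smirnov are a common alternative.

```python
import math
import statistics

def summarize(values):
    """Return (mean, standard deviation) for one feature column."""
    return statistics.fmean(values), statistics.stdev(values)

def has_shifted(train_stats, live_values, z_limit=3.0, sd_ratio_limit=2.0):
    """Hypothetical drift check: flag a feature whose live mean or spread
    has moved away from training. Assumes non-zero standard deviations.

    train_stats: (mean, sd) recorded when the model was trained.
    """
    train_mean, train_sd = train_stats
    live_mean, live_sd = summarize(live_values)
    # How many standard errors the live mean sits from the training mean.
    std_error = train_sd / math.sqrt(len(live_values))
    z = abs(live_mean - train_mean) / std_error
    # Larger spread divided by smaller spread (1.0 means unchanged).
    sd_ratio = max(live_sd, train_sd) / min(live_sd, train_sd)
    return z > z_limit or sd_ratio > sd_ratio_limit

train = [100, 102, 98, 101, 99, 100, 103, 97]
stats = summarize(train)  # (100.0, 2.0)
print(has_shifted(stats, [99, 101, 100, 102, 98, 100]))   # False
print(has_shifted(stats, [120, 122, 118, 121, 119, 120]))  # True
```

When the check fires, the options from the notes apply: refresh the model on new data or, in the worst case, train a new one.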
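To see why recall matters under data imbalance, consider 1,000 transactions where only one is fraud, matching the 99.9% figure in the notes. A "model" that always predicts "not fraud" scores 99.9% accuracy yet catches no fraud at all. The helper names are illustrative; libraries such as scikit-learn offer equivalent metrics (for example `sklearn.metrics.recall_score`).

```python
def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives (e.g. fraud) that the model caught."""
    caught = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual = sum(t == positive for t in y_true)
    return caught / actual if actual else 0.0

# 1,000 transactions: 1 fraud (label 1) and 999 legitimate (label 0).
y_true = [1] + [0] * 999
always_legit = [0] * 1000  # a "model" that never predicts fraud

print(accuracy(y_true, always_legit))  # 0.999
print(recall(y_true, always_legit))    # 0.0
```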
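The two resampling strategies from the data-imbalance item can be sketched in plain Python. This assumes binary labels where the minority class (fraud, label `1`) is the smaller one; the function names are hypothetical. SMOTE differs from `random_oversample` by interpolating new synthetic points between neighbouring minority samples rather than copying rows; the imbalanced-learn library provides it as `imblearn.over_sampling.SMOTE`.

```python
import random

def random_oversample(rows, labels, minority=1, seed=0):
    """Duplicate random minority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    majority_count = len(labels) - len(minority_rows)
    extra = [rng.choice(minority_rows) for _ in range(majority_count - len(minority_rows))]
    return rows + extra, labels + [minority] * len(extra)

def random_undersample(rows, labels, minority=1, seed=0):
    """Keep every minority row plus an equal-sized random sample of majority rows."""
    rng = random.Random(seed)
    minority_pairs = [(r, y) for r, y in zip(rows, labels) if y == minority]
    majority_pairs = [(r, y) for r, y in zip(rows, labels) if y != minority]
    pairs = minority_pairs + rng.sample(majority_pairs, len(minority_pairs))
    rng.shuffle(pairs)  # avoid all minority rows sitting at the front
    return [r for r, _ in pairs], [y for _, y in pairs]

# Toy data: 2 fraud rows (label 1) and 8 legitimate rows (label 0).
rows = [f"txn{i}" for i in range(10)]
labels = [1, 1] + [0] * 8

over_rows, over_labels = random_oversample(rows, labels)
print(over_labels.count(1), over_labels.count(0))    # 8 8
under_rows, under_labels = random_undersample(rows, labels)
print(under_labels.count(1), under_labels.count(0))  # 2 2
```

Oversampling keeps all the legitimate data but risks overfitting to the repeated fraud rows; undersampling is cheaper to train on but throws data away.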
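The silent-failures item mentions a 400-rule system kept for cases where the model might go wrong. One plausible reading, sketched here as an assumption rather than the system described in the talk: the rules run alongside the model as a safety net, so a silently wrong low score cannot clear a transaction a rule flags, and the rules become the only check when the model crashes or returns garbage. `Txn`, the two rules, and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Txn:
    amount: float
    card_country: str
    merchant_country: str

# Two hypothetical rules standing in for the ~400 real ones.
RULES = [
    lambda t: t.amount > 10_000,                                          # very large payment
    lambda t: t.card_country != t.merchant_country and t.amount > 1_000,  # large cross-border payment
]

def rules_flag(txn):
    return any(rule(txn) for rule in RULES)

def is_fraud(txn, model_score, threshold=0.5):
    """Combine a model score with a rule-based safety net."""
    rule_hit = rules_flag(txn)
    try:
        score = float(model_score(txn))
    except Exception:  # model crashed or returned something non-numeric
        return rule_hit
    if not 0.0 <= score <= 1.0:  # out of range, or NaN (NaN comparisons are False)
        return rule_hit
    # A silently wrong low score cannot override a rule hit.
    return rule_hit or score >= threshold

big = Txn(20_000, "IN", "IN")
small = Txn(50, "IN", "IN")

def broken_model(txn):
    raise RuntimeError("model server unavailable")

print(is_fraud(big, lambda t: 0.1))    # True: a rule catches what the model missed
print(is_fraud(small, broken_model))   # False: rules only, nothing flagged
print(is_fraud(small, lambda t: 0.9))  # True: the model flags it
```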
Data matters more than the excellence of the model, during both training and inference. Bias in the dataset carries over into the model; for example, mobile social-media filters that work only on white people. Only accept data that falls within the expected range. Data quality work takes the lion's share of engineering time when using ML models in production.

## VISA case study: Transaction fraud detection system

Alongside fraud scoring, they check the bank balance and credit limit. VISA tests for:

- **Fraud**: the merchant is trying to deceive you.
- **Stolen card**: somebody else has stolen your credit card and is using it.
- **Enumeration attack**: attackers try random numbers to find a valid card (a big issue in the US and similar markets, but not in India).

### Challenge

- Acceptable latency for fraud detection: 20 ms.
- Peak throughput: 10,000 requests/second.
- Availability: only 31.5 s of downtime allowed per year.

### Tech stack

- Each transaction goes to the VISA ML model inference stacks.
- The inference stacks read user and merchant profiles from Redis.
- An async flow sends profile updates to Apache Kafka, which feeds them back into Redis.

### Reliability: how?

- European data is stored in Europe itself.
- The US has two independent stacks, each with a different electricity provider.
- Each stack has the capacity to handle the full global peak.
- Within each stack they have multiple

### Low latency: how?

- Use Redis: it is in-memory.
- Each transaction is sent to both stacks; whichever returns a score first wins.
- Persistent connections, which avoid the TCP three-way handshake a new HTTP connection needs for every request.
- Lightweight models.

### References

- Chip Huyen, *Designing Machine Learning Systems* (O'Reilly, 2022)

## Last advice

1. Impact of work matters: two high-impact tasks count for more than five small ones.
2. For a corporation: business > product > engineering.
3. First impressions matter.
4. Reputation no
5. Develop an eye.
6. Pro tip about promotions: take on more responsibilities and don't limit yourself. If you're a junior engineer and do senior-level work, you'll be noticed.
7.
   Build trust within the company and with your manager (this takes about 6 months); after that, they'll discuss things with you.
8. You should actually approach managers with promotion in mind.
9. Explore and learn what other people are working on.
10. Read the code, read the documentation, and review code.
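Returning to the VISA case study, the challenge numbers translate into more familiar terms. 31.5 s of downtime per year is about 99.9999% availability ("six nines"), and by Little's law, 10,000 requests/second that each take up to 20 ms means up to roughly 200 requests in flight at peak. The helper names below are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s, ignoring leap years

def availability(downtime_s_per_year):
    """Fraction of the year the system is up."""
    return 1 - downtime_s_per_year / SECONDS_PER_YEAR

def requests_in_flight(throughput_per_s, latency_s):
    """Little's law: concurrent requests = arrival rate x time each spends in the system."""
    return throughput_per_s * latency_s

print(f"{availability(31.5):.4%}")        # 99.9999%
print(requests_in_flight(10_000, 0.020))  # 200.0
```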
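The VISA tech stack pairs a fast read path (the inference stack reads profiles from Redis) with an async flow that updates profiles through Kafka without making the transaction wait. The sketch below imitates that pattern in a single process and rests on assumptions: a plain dict plays Redis, an `asyncio.Queue` plays Kafka, and the scoring feature (this amount relative to the card's historical average) is hypothetical. It does not use the real Redis or Kafka client APIs.

```python
import asyncio

# Plain dict standing in for Redis: card id -> running profile stats.
profiles = {"card:42": {"txn_count": 3, "total_amount": 150.0}}

def score(txn):
    """Hot path: read the profile and compute a toy feature,
    this amount relative to the card's historical average."""
    p = profiles.get(txn["card"], {"txn_count": 0, "total_amount": 0.0})
    avg = p["total_amount"] / p["txn_count"] if p["txn_count"] else 0.0
    return txn["amount"] / avg if avg else 1.0  # 1.0: no usable history (hypothetical)

async def profile_updater(queue):
    """Background consumer standing in for the Kafka -> Redis update flow."""
    while True:
        txn = await queue.get()
        p = profiles.setdefault(txn["card"], {"txn_count": 0, "total_amount": 0.0})
        p["txn_count"] += 1
        p["total_amount"] += txn["amount"]
        queue.task_done()

async def handle(txn, queue):
    s = score(txn)         # respond using the current profile
    queue.put_nowait(txn)  # fire-and-forget: don't wait for the update
    return s

async def main():
    queue = asyncio.Queue()
    worker = asyncio.create_task(profile_updater(queue))
    print(await handle({"card": "card:42", "amount": 100.0}, queue))  # 2.0
    await queue.join()  # demo only: let the update land before printing
    print(profiles["card:42"])  # {'txn_count': 4, 'total_amount': 250.0}
    worker.cancel()

asyncio.run(main())
```

The scoring call returns before the profile is updated, which is exactly the latency-versus-consistency tradeoff from the "Unsaid rules" section: a transaction arriving in between sees a slightly stale profile.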
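The VISA low-latency trick of sending each transaction to both stacks and taking whichever score comes back first is often called a hedged or raced request. A minimal sketch, assuming `asyncio` and simulated stacks: `score_on_stack` is a hypothetical stand-in that sleeps instead of making a network call, and the 20 ms default budget comes from the challenge section. A stack that errors is skipped so the other can still win.

```python
import asyncio

async def score_on_stack(name, delay_s, score):
    """Hypothetical stand-in for a network call to one inference stack."""
    await asyncio.sleep(delay_s)  # simulated network + inference time
    return name, score

async def race_stacks(calls, timeout_s=0.020):
    """Send the same request to every stack and return the first successful
    answer, or None if nothing succeeds within the latency budget."""
    tasks = [asyncio.create_task(call) for call in calls]
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    pending = set(tasks)
    try:
        while pending:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            done, pending = await asyncio.wait(
                pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:  # skip a stack that errored
                    return task.result()
        return None
    finally:
        for task in tasks:
            task.cancel()  # the slower stack's answer is no longer needed

async def main():
    winner = await race_stacks([
        score_on_stack("us-stack-a", 0.015, 0.12),
        score_on_stack("us-stack-b", 0.005, 0.12),
    ])
    print(winner)

asyncio.run(main())
```

Stack B should normally win here because its simulated delay is shorter. Racing doubles the load on the stacks, which is affordable only because, as the notes say, each stack can handle the full global peak on its own.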