Fig. 4 Keys (20 in number) indicated by SHAP values for a classification studies and b regression studies; c legend for the SMARTS visualization (generated with the use of SMARTS plus, smarts.plus/); Venn diagrams generated with http://bioinformatics.psb.ugent.be/webtools/Venn/

Wojtuch et al. J Cheminform (2021) 13

Fig. 5 Analysis of the metabolic stability prediction for CHEMBL2207577 with the use of SHAP values for the human/KRFP/trees predictive model, with indication of the features influencing its assignment to the class of stable compounds; the SMARTS visualization was generated with the use of SMARTS plus (smarts.plus/)

Models

In our experiments, we examine Naïve Bayes classifiers, Support Vector Machines (SVMs), and several models based on trees. We use the implementations provided in the scikit-learn package [40]. The optimal hyperparameters for these models and the model-specific data preprocessing are determined using five-fold cross-validation and a genetic algorithm implemented in TPOT [41]. The hyperparameter search is run on 5 cores in parallel and we allow it to last for 24 h. To determine the optimal set of hyperparameters, the regression models are evaluated using (negative) mean square error, and the classifiers using one-versus-one area under the ROC curve (AUC), which is the average AUC of all possible pairwise combinations of classes. We use the scikit-learn implementation of the ROC AUC score (roc_auc_score) with the parameter multi_class set to ‘ovo’. The hyperparameters accepted by the models and their values considered during hyperparameter optimization are listed in Tables 3, 4, 5, 6, 7, 8, 9. After the optimal hyperparameter configuration is determined, the model is retrained on the whole training set and evaluated on the test set.

Fig. 6 Screens of the web service for compound analysis using SHAP values: a main page, b submission of a custom compound for evaluation, c stability predictions for a submitted compound and SHAP-based analysis of its structural features

Fig. 7 Custom compound analysis with the use of the prepared web service, together with the application of its output to the optimization of compound structure in terms of its metabolic stability (the human KRFP classification model was used); the SMARTS visualization was generated with the use of SMARTS plus (smarts.plus/)

Table 2 Number of measurements and compounds in the ChEMBL datasets
Dataset   Subset   Number of measurements   Number of compounds
Human     Train    3221                     3149
Human     Test     357                      349
Human     Total    3578                     3498
Rat       Train    1634                     1616
Rat       Test     185                      179
Rat       Total    1819
The table presents the number of measurements and compounds present in the particular datasets used in the study: human and rat data, divided into training and test sets

Table 3 Hyperparameters accepted by different Naïve Bayes classifiers
alpha   fit_prior   norm   var_smoothing
BernoulliNB ComplementNB GaussianNB Multinomi.