<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by MavenCode on Medium]]></title>
        <description><![CDATA[Stories by MavenCode on Medium]]></description>
        <link>https://medium.com/@mavencode?source=rss-b55720387b55------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*qGR7iZ82K3DLab-yhzMe5Q.png</url>
            <title>Stories by MavenCode on Medium</title>
            <link>https://medium.com/@mavencode?source=rss-b55720387b55------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sun, 31 May 2026 14:29:22 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@mavencode/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Using Machine Learning to Detect and Predict the likelihood of a Heart Attack]]></title>
            <link>https://mavencode.medium.com/using-machine-learning-to-detect-and-predict-the-likelihood-of-a-heart-attack-e2db0bbde951?source=rss-b55720387b55------2</link>
            <guid isPermaLink="false">https://medium.com/p/e2db0bbde951</guid>
            <dc:creator><![CDATA[MavenCode]]></dc:creator>
            <pubDate>Wed, 28 Apr 2021 13:55:00 GMT</pubDate>
            <atom:updated>2021-04-28T13:55:00.476Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mGwBw8pCheiagTXrglDpaw.jpeg" /></figure><h3><strong>Introduction</strong></h3><p>Heart diseases, or cardiovascular diseases (CVDs), are the number one cause of death globally, claiming an estimated 17.9 million lives each year. The World Health Organization estimates this to be 31% of deaths worldwide. In the United States alone, the Centers for Disease Control and Prevention estimates that a heart attack occurs every forty seconds, which translates to 805,000 Americans each year.</p><p>Common symptoms include pain and tightness in the chest, shortness of breath, cold sweat, fatigue, and sudden dizziness, among others. Not everybody who has a heart attack experiences the same type or severity of symptoms. Some people experience moderate to intense pain while others have no such symptoms. It is expected that the higher the number of symptoms experienced, the greater the likelihood of heart disease.</p><p>The severity and prevalence of this disease have created a strong need for predictive models that can help in disease management and risk control.</p><p>Our goal is to build ML models for heart disease detection and export the best performing model to Android devices for everyday use. In addition, we will highlight the ML operations for this project, using Katib for hyperparameter tuning of the model and creating a pipeline using Kubeflow.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*i4qjeDrJEc_kguvZdkEaVQ.png" /></figure><h3><strong>Data</strong></h3><p>The dataset used for this project was taken from Kaggle. It contains observations for 303 patients with 14 columns (13 features plus the target), including demographic features such as age and sex and clinical measurements such as fasting blood sugar, cholesterol level, and resting blood pressure. The chance of a heart attack is recorded as a binary label, which we used as the target in our modeling. 
The Data Dictionary for the dataset is shown below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*frqRp3uAeZIfgwNmc0zpbQ.png" /></figure><p>Figure 1: Data Dictionary</p><pre>Age: Age of the patient<br>Sex: Sex of the patient (1 = male; 0 = female)<br>exang: exercise induced angina (1 = yes; 0 = no)<br>ca: number of major vessels (0-3) <br>cp: Chest Pain type <br>    Value 0: no pain<br>    Value 1: typical angina<br>    Value 2: atypical angina<br>    Value 3: non-anginal pain<br>    Value 4: asymptomatic<br>trtbps: resting blood pressure (in mmHg)<br>chol: cholesterol in mg/dl fetched via BMI sensor<br>fbs: (fasting blood sugar &gt; 120 mg/dl) (1 = true; 0 = false)<br>rest_ecg: resting electrocardiographic results <br>    Value 0: normal<br>    Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of &gt; 0.05 mV)<br>    Value 2: showing probable or definite left ventricular hypertrophy by Estes&#39; criteria <br>thalachh: maximum heart rate achieved <br>oldpeak: ST depression induced by exercise relative to rest <br>slp: the slope of the peak exercise ST segment <br>target: 0 = less chance of a heart attack; 1 = more chance of a heart attack</pre><h3><strong>Analysis</strong></h3><p>A review of the features revealed an imbalance in some categories, primarily gender and the target. The average age is 54.37 years, and outliers were found in three features: Cholesterol, Resting Blood Pressure, and Maximum Heart Rate. 
Chest pain type (cp) showed the highest correlation with the target.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*tpcCKMKf2kK36zOv3XSVpQ.jpeg" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/512/1*L0SHwFX-pyc3qlJsy_CpWA.png" /><figcaption>Figure 2: Proportions of Gender and the Target Label</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Y7Tvhkc5iyhpttWG8VSRhA.png" /><figcaption>Figure 3: Box plot Showing the Outliers</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/966/1*7tEw6W_ps9AX6plXr-rpHA.png" /><figcaption>Figure 4: Correlation Plot of all the Features</figcaption></figure><h3><strong>Modeling</strong></h3><p>Data preprocessing tasks include removal of outliers, resampling the target variable for a more balanced distribution, and scaling. After preprocessing, our clean data was tested with six models. Our best performers were the Logistic Regression and CatBoost models, tied at 95% accuracy.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*e1L1F8U4hQjCkCPawnXdmA.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/512/1*usth1VxvlIUHbq3rdKBHrg.png" /><figcaption>Figure 5: Model Accuracy Scores</figcaption></figure><p>The Logistic Regression model showed that the Maximum Heart Rate Achieved (thalachh) feature had the highest influence on the target. Figure 6 below shows the feature importance results.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/916/1*YjmlmrMxyL8T97xiKUpR_w.png" /><figcaption>Figure 6: Logistic Regression Feature Importance</figcaption></figure><h3><strong>Converting Machine Learning Model to TensorFlow Lite for Android Devices</strong></h3><p>A Keras model was converted to the TensorFlow Lite format for use on Android devices. 
The TensorFlow Lite Converter did not support Logistic Regression or CatBoost Classifier models at the time the project was conducted.</p><p>The conversion steps are as follows:</p><ol><li>Install and import TensorFlow, which includes the TensorFlow Lite converter</li></ol><pre>! pip3 install --user tensorflow==2.4.0<br>import tensorflow as tf</pre><p>2. Convert the Keras model to TensorFlow Lite format</p><pre>converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)<br>tflite_model = converter.convert()</pre><p>3. Save the model</p><pre>with open(&#39;HeartFailurePrediction_model.tflite&#39;, &#39;wb&#39;) as f:<br>    f.write(tflite_model)</pre><h3>Model Explainability with Alibi Explain’s ALE (Accumulated Local Effects) Plots</h3><p>In this section, we will use <a href="https://docs.seldon.io/projects/alibi/en/latest/methods/ALE.html">ALE</a> (Accumulated Local Effects) plots to explain the behavior of our best performing model, the Logistic Regression model, on our dataset.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/352/1*2MId8MYCUx2-J1TsOgGQNA.png" /><figcaption>Figure 7: As age increases, the likelihood of having a heart attack decreases.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/352/1*v3U4F1a-pHncAZZe-VTLWA.png" /><figcaption>Figure 8: caa (number of major vessels (0–3) colored by fluoroscopy). 
For caa=0, there is no effect on average predictions, but as caa increases, its effect on classifying the instance as 1, decreases.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/352/1*7Qd10iHkA_taJGnWZzH4-A.png" /><figcaption>Figure 9: For higher cholesterol levels, the model assigns negative probabilities towards classifying instances as 1 while for lower cholesterol levels, the model assigns higher probabilities towards classifying the instance as 0.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/352/1*7DI3c_QZ5SJvkUCVsKcbdg.png" /><figcaption>Figure 10: The model assigns positive probabilities towards classifying chest pain types 1 and 2, and negative probabilities towards classifying them as 0. Chest pain type 0 seems to not affect average prediction.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/352/1*08lrwHrb-94o2cm9jjmwAg.png" /><figcaption>Figure 11: For instances where exng = 0 (that is, patients with no exercise-induced angina), there is no effect on average prediction. For instances where exng = 1 (that is, patients with exercise-induced angina), their effects on average prediction are negative.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/352/1*8BSt_0R5m-PHKM4yMf82Qg.png" /><figcaption>Figure 12: For instances where fbs=1 (fasting blood sugar &gt; 120), there seems to be no effect on average prediction, but for instances where fbs=0 (fasting blood sugar &lt; 120), the model seems to assign positive probabilities towards classifying those instances as 1.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/352/1*ZIT7IL8KF6r5hIa5UOFpWg.png" /><figcaption>Figure 13: oldpeak( ST depression induced by exercise relative to rest). 
As old peak increases, its effect on classifying the instance as a 1 decreases.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/352/1*1RLLo_rsDC3mcaImQV7tUA.png" /><figcaption>Figure 14: For instances where restecg (resting electrocardiographic results) = 0, (normal restecgs results) there’s no effect on average predictions. For instances where restecg = 1( ST-T abnormality), the model assigns positive probabilities towards classifying those instances as 1. For instances where restecg = 2(showing probable or definite left ventricular hypertrophy), the model assigns even higher positive probabilities towards classifying those instances as 1.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/352/1*iDBTyDGH4WtD8Y0fucqYDA.png" /><figcaption>Figure 15: At sex=0 and 1, the model assigns negative probabilities towards classifying the instances as 1, and positive probabilities towards classifying them as 0. So gender does not seem to have much effect on average prediction.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/352/1*OfYCV7QGy4j5XaFkwtZZ0g.png" /><figcaption>Figure 16: slp (the slope of the peak exercise ST segment). For slp =1, the model assigns positive probabilities towards classifying the instance as 1.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/352/1*_Zvi0mQacgtII_cntOy-NQ.png" /><figcaption>Figure 17: As the maximum heart rate achieved increases, the model assigns increasingly higher probabilities towards classifying those instances as 1. 
This means that patients with higher maximum heart rate achieved are more likely to have heart attacks.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/352/1*ZWorixIhzw6RBdZrln0ftw.png" /><figcaption>Figure 18: As thall increases, its effect on classifying an instance as 1 decreases.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/352/1*Ur2U7x0HbPam70dslMhy6g.png" /><figcaption>Figure 19: As resting blood pressure increases, the model assigns lower probabilities towards classifying the instance as 1.</figcaption></figure><p>The ALE plots show maximum heart rate achieved as the feature most effective in predicting heart failure. According to <a href="https://www.heart.org/en/health-topics/heart-attack/understand-your-risks-to-prevent-a-heart-attack">heart.org</a>, older age and high cholesterol levels are factors that increase the risk of a heart attack. Our dataset, however, does not reflect that.</p><h3><strong>Implementing KubeFlow ML Operators for Model Learning</strong></h3><p>An Operator is a method of packaging, deploying and managing a stateful Kubernetes application, which in this context is a machine learning job.</p><p><a href="https://enterprisersproject.com/article/2019/2/kubernetes-operators-plain-english">Operators</a> are software written to encapsulate all of those operational considerations for a specific Kubernetes application and ensure that all aspects of its lifecycle, from configuration and deployment to upgrades, monitoring, and failure-handling, are integrated right into the Kubernetes framework and invoked when needed.</p><p>An ML Operator can be made for a range of actions, from basic functionalities to specific logic for an ML job.</p><p><strong>TensorFlow Operator</strong></p><p>This is one of the operators offered by <a href="https://github.com/kubeflow">Kubeflow</a> to make it easy to run and monitor both distributed and non-distributed TensorFlow jobs on Kubernetes. 
Training TensorFlow models using tf-operator relies on centralized parameter servers for coordination between workers. It supports the TensorFlow framework only.</p><p><strong>TensorFlow Training Jobs (TFJob)</strong></p><p>TensorFlow Training Job (TFJob) is a Kubernetes custom resource with a <a href="https://en.wikipedia.org/wiki/YAML#:~:text=YAML%20(a%20recursive%20acronym%20for,is%20being%20stored%20or%20transmitted.">YAML</a> representation that you can use to run TensorFlow training tasks on Kubernetes. The Kubeflow implementation of TFJob is in <a href="https://github.com/kubeflow/tf-operator">tf-operator</a>.</p><p><strong>TensorFlow Operator for Heart Attack Dataset</strong></p><p>Here, we go through the process of creating a TensorFlow Operator with our dataset:</p><ol><li>Check that the right version of <a href="https://www.tensorflow.org/api_docs/">TensorFlow</a> is available:</li></ol><pre>#! pip3 list | grep tensorflow<br>! pip3 install --user tensorflow==2.4.0<br>! pip3 install --user ipywidgets nbconvert<br>!python -m pip install --user --upgrade pip<br>!pip3 install pandas scikit-learn keras tensorflow-datasets --user</pre><p>2. To package the trainer in a container image, we need a file (on our cluster) that contains the code as well as a file with the resource definition of the job for the Kubernetes cluster:</p><pre>TRAINER_FILE = &quot;tfjobheart.py&quot;<br>KUBERNETES_FILE = &quot;tfjob-heartdisease.yaml&quot;</pre><p>3. 
Define a helper function to capture output from a cell with %%capture that looks like some-resource created:</p><pre>import re</pre><pre>from IPython.utils.capture import CapturedIO</pre><pre>def get_resource(captured_io: CapturedIO) -&gt; str:<br>    &quot;&quot;&quot;<br>    Gets a resource name from &#39;kubectl apply -f &lt;configuration.yaml&gt;&#39;.</pre><pre>    :param str captured_io: Output captured by using `%%capture` cell magic<br>    :return: Name of the Kubernetes resource<br>    :rtype: str<br>    :raises Exception: if the resource could not be created <br>    &quot;&quot;&quot;<br>    out = captured_io.stdout<br>    matches = re.search(r&quot;^(.+)\s+created&quot;, out)<br>    if matches is not None:<br>       return matches.group(1)<br>    else: <br>       raise Exception(f&quot;Cannot get resources as its creation failed: {out}. It may already exist.&quot;)</pre><p>4. Load and Inspect the Data:</p><pre>import pandas as  pd<br>data = pd.read_csv(&quot;heart.csv&quot;)<br>data.head()</pre><p>5. Train the Model in the Notebook:</p><p>We trained the model in a distributed fashion and put all the code in a single cell. That way we could save the file and include it in a container image. 
The %%writefile cell magic saves the file as defined by TRAINER_FILE but does not run it.</p><pre>%%writefile $TRAINER_FILE<br>import argparse<br>import logging<br>import json<br>import os<br>import warnings<br>warnings.filterwarnings(&quot;ignore&quot;, category=DeprecationWarning)</pre><pre>import numpy as np<br>import pandas as pd</pre><pre>from sklearn.model_selection import train_test_split as tts<br>from sklearn.preprocessing import StandardScaler<br>from sklearn.utils import resample</pre><pre>from numpy.random import seed<br>import tensorflow as tf<br>tf.random.set_seed(221)<br>from tensorflow import keras<br>from tensorflow.keras.models import Sequential<br>from tensorflow.keras.layers import Dense, Dropout, BatchNormalization<br>from tensorflow.keras.optimizers import SGD, Adam, RMSprop</pre><pre>logging.getLogger().setLevel(logging.INFO)</pre><pre>def removeOutlier(att, data):<br>    # Keep only the rows whose value of `att` lies within 3 standard deviations of its mean<br>    lowerbound = att.mean() - 3 * att.std()<br>    upperbound = att.mean() + 3 * att.std()<br>    #print(&#39;lowerbound: &#39;, lowerbound, &#39; -------- upperbound: &#39;, upperbound )<br>    df1 = data[(att &gt; lowerbound) &amp; (att &lt; upperbound)]<br>    #print((data.shape[0] - df1.shape[0]), &#39; number of outliers from &#39;, data.shape[0] )<br>    return df1.copy()</pre><pre>def make_datasets_unbatched():<br>    data = pd.read_csv(&quot;heart.csv&quot;)</pre><pre>    # List of variables with missing values<br>    vars_with_na = [var for var in data.columns if data[var].isnull().sum() &gt; 0]</pre><pre>    # Boolean variables<br>    bool_var = [&#39;sex&#39;, &#39;output&#39;, &#39;fbs&#39;, &#39;exng&#39;]<br>    # Categorical variables<br>    cat_var = [&#39;cp&#39;, &#39;restecg&#39;, &#39;slp&#39;, &#39;thall&#39;]<br>    # Numeric variables<br>    num_var = [&#39;age&#39;, &#39;trtbps&#39;, &#39;chol&#39;, &#39;thalachh&#39;, &#39;oldpeak&#39;, &#39;caa&#39;]</pre><pre>    # Remove outliers<br>    data = removeOutlier(data.trtbps, data)<br>    data = removeOutlier(data.chol, data)</pre><pre>    # Separate target classes<br>    df_1 = data[data.output == 1]<br>    df_2 = data[data.output == 0]</pre><pre>    # Upsample minority class<br>    df_upsample_1 = resample(df_2, replace=True,  # sample with replacement<br>                             n_samples=len(df_1),  # match the majority class (163 rows in the original run)<br>                             random_state=123)  # reproducible results</pre><pre>    # Combine majority class with upsampled minority class<br>    df_upsampled = pd.concat([df_1, df_upsample_1])</pre><pre>    # Log new class counts<br>    logging.info(df_upsampled.output.value_counts())</pre><pre>    x = df_upsampled.drop(&#39;output&#39;, axis=1)<br>    y = df_upsampled[&#39;output&#39;]</pre><pre>    # Split dataset<br>    x_train, x_test, y_train, y_test = tts(x, y, test_size=0.2, random_state=111)</pre><pre>    # Scaling: fit on the training set only, then apply the same transform to the test set<br>    scaler = StandardScaler()<br>    x_train = scaler.fit_transform(x_train)<br>    x_test = scaler.transform(x_test)</pre><pre>    train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))<br>    test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))<br>    train = train_dataset.cache().shuffle(2000).repeat()<br>    return train, test_dataset</pre><pre>def model(args):<br>    seed(1)<br>    model = Sequential()<br>    model.add(Dense(10, activation=&#39;relu&#39;, input_dim=13))<br>    #model.add(BatchNormalization())<br>    model.add(Dense(10, activation=&#39;relu&#39;))<br>    #model.add(Dropout(0.2))<br>    model.add(Dense(1, activation=&#39;sigmoid&#39;))</pre><pre>    model.summary()<br>    opt = args.optimizer<br>    model.compile(optimizer=opt, loss=&#39;binary_crossentropy&#39;, metrics=[&#39;accuracy&#39;])<br>    tf.keras.backend.set_value(model.optimizer.learning_rate, args.learning_rate)<br>    return model</pre><pre>def main(args):<br>    # MultiWorkerMirroredStrategy creates copies of all variables in the model&#39;s<br>    # layers on each device across all workers<br>    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(<br>        communication=tf.distribute.experimental.CollectiveCommunication.AUTO)<br>    logging.debug(f&quot;num_replicas_in_sync: {strategy.num_replicas_in_sync}&quot;)<br>    BATCH_SIZE_PER_REPLICA = args.batch_size<br>    BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync</pre><pre>    # Datasets need to be created after instantiation of `MultiWorkerMirroredStrategy`<br>    train_dataset, test_dataset = make_datasets_unbatched()<br>    train_dataset = train_dataset.batch(batch_size=BATCH_SIZE)<br>    test_dataset = test_dataset.batch(batch_size=BATCH_SIZE)</pre><pre>    # See: <a href="https://www.tensorflow.org/api_docs/python/tf/data/experimental/DistributeOptions">https://www.tensorflow.org/api_docs/python/tf/data/experimental/DistributeOptions</a><br>    options = tf.data.Options()<br>    options.experimental_distribute.auto_shard_policy = \<br>        tf.data.experimental.AutoShardPolicy.DATA</pre><pre>    train_datasets_sharded = train_dataset.with_options(options)<br>    test_dataset_sharded = test_dataset.with_options(options)</pre><pre>    with strategy.scope():<br>        # Model building/compiling need to be within `strategy.scope()`.<br>        multi_worker_model = model(args)</pre><pre>    # Keras&#39; `model.fit()` trains the model with specified number of epochs and<br>    # number of steps per epoch.<br>    multi_worker_model.fit(train_datasets_sharded, epochs=50, steps_per_epoch=30)</pre><pre>    eval_loss, eval_acc = multi_worker_model.evaluate(test_dataset_sharded, verbose=0, steps=10)<br>    # Log metrics for Katib<br>    logging.info(&quot;loss={:.4f}&quot;.format(eval_loss))<br>    logging.info(&quot;accuracy={:.4f}&quot;.format(eval_acc))</pre><pre>if __name__ == &#39;__main__&#39;:<br>  parser = argparse.ArgumentParser()<br>  parser.add_argument(&quot;--batch_size&quot;,<br>                     type=int,<br>                     default=32,<br>                     metavar=&quot;N&quot;,<br>                     help=&quot;Batch size for training (default: 32)&quot;)<br>  parser.add_argument(&quot;--learning_rate&quot;,<br>                     type=float,<br>                     default=0.1,<br>                     metavar=&quot;N&quot;,<br>                     help=&#39;Initial learning rate&#39;)<br>  parser.add_argument(&quot;--optimizer&quot;,<br>                     type=str,<br>                     default=&#39;adam&#39;,<br>                     metavar=&quot;N&quot;,<br>                     help=&#39;optimizer&#39;)<br>  parsed_args, _ = parser.parse_known_args()<br>  main(parsed_args)</pre><p>6. Create a Docker Image:</p><p>The Dockerfile looks as follows:</p><pre>FROM tensorflow/tensorflow:2.4.0<br>RUN pip install tensorflow_datasets pandas scikit-learn keras<br>COPY tfjobheart.py / <br>ENTRYPOINT [&quot;python&quot;, &quot;/tfjobheart.py&quot;, &quot;--batch_size&quot;, &quot;64&quot;, &quot;--learning_rate&quot;, &quot;0.1&quot;, &quot;--optimizer&quot;, &quot;adam&quot;]</pre><p>7. Check if the code is correct by running it from within the notebook:</p><pre>%run $TRAINER_FILE --optimizer &#39;adam&#39;</pre><p>8. Create a Distributed TFJob:</p><p>For large training jobs, we want to run our trainer in distributed mode. 
Once the notebook server cluster can access the Docker image from the registry, we can launch a distributed TF Job.</p><p>The specification for a distributed TFJob is defined using YAML:</p><pre>%%writefile $KUBERNETES_FILE<br>apiVersion: &quot;kubeflow.org/v1&quot;<br>kind: &quot;TFJob&quot;<br>metadata:<br>  name: &quot;hrtd&quot;<br>  namespace: mavencodeai # your-user-namespace<br>spec:<br>  cleanPodPolicy: None<br>  tfReplicaSpecs:<br>    Worker:<br>      replicas: 2<br>      restartPolicy: OnFailure<br>      template:<br>        metadata:<br>          annotations:<br>            sidecar.istio.io/inject: &quot;false&quot;<br>        spec:<br>          containers:<br>          - name: tensorflow<br>            # modify this property if you would like to use a custom image<br>            image: mavencodevv/tfjob_heart:v.0.1<br>            command:<br>                - &quot;python&quot;<br>                - &quot;/tfjobheart.py&quot;<br>                - &quot;--batch_size=64&quot;<br>                - &quot;--learning_rate=0.1&quot;<br>                - &quot;--optimizer=adam&quot;</pre><p>9. Deploy the distributed training job:</p><pre>%%capture tf_output --no-stderr<br>! kubectl create -f $KUBERNETES_FILE<br>TF_JOB = get_resource(tf_output)</pre><p>10. See the job status:</p><pre>! kubectl describe $TF_JOB</pre><p>11. See the created pods:</p><pre>! kubectl get pods -l job-name=hrtd</pre><p>12. Stream logs from the worker-0 pod to check the training progress:</p><pre>! kubectl logs -f hrtd-worker-0</pre><p>13. Delete the job:</p><pre>! kubectl delete $TF_JOB</pre><p>14. Check to see if the pod is still up and running:</p><pre>#! kubectl -n mavencodeai logs -f hrtd</pre><h3><strong>Hyperparameter Tuning with Katib for TensorFlow Model</strong></h3><p>Hyperparameter tuning is the process of optimizing a model’s hyperparameter values to maximize the predictive quality of the model. 
<a href="https://github.com/kubeflow/katib">Katib</a> automates the Hyperparameter Tuning process thereby eliminating errors that arise from manual intervention and also saves much-needed resources. Katib is agnostic to ML Frameworks and supports a variety of traditional Hyperparameter Tuning Algorithms. Its concepts are Experiments, Suggestions, Trials, and WorkerJob which are all Custom Resource Definitions integrated on the Kubernetes Engine.</p><p>In a nutshell, an Experiment runs several Trials until an objective is reached. Each Trial evaluates Suggestions which are HP values proposed by the tuning process. The WorkerJob evaluates a Trial and calculates its objective value.</p><p>This section shows how to create and configure an Experiment for the TensorFlow training job. In terms of Kubernetes, such an experiment is a Custom Resource Definition (CRD) run by the Katib operator.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*i74JnjGzTsLRV4-L5h8a7Q.png" /></figure><p>How to Create Experiments:</p><ol><li>Set up a few basic definitions that can be reused:</li></ol><pre>TF_EXPERIMENT_FILE = &quot;katibheartdisease-tfjob-experiment.yaml&quot;</pre><pre>import re</pre><pre>from IPython.utils.capture import CapturedIO</pre><pre>def get_resource(captured_io: CapturedIO) -&gt; str:<br>    &quot;&quot;&quot;<br>    Gets a resource name from `kubectl apply -f &lt;configuration.yaml&gt;`.</pre><pre>    :param str captured_io: Output captured by using `%%capture` cell magic<br>    :return: Name of the Kubernetes resource<br>    :rtype: str<br>    :raises Exception: if the resource could not be created<br>    &quot;&quot;&quot;<br>    out = captured_io.stdout<br>    matches = re.search(r&quot;^(.+)\s+created&quot;, out)<br>    if matches is not None:<br>        return matches.group(1)<br>    else:<br>        raise Exception(f&quot;Cannot get resource as its creation failed: {out}. It may already exist.&quot;)</pre><p>2. 
TensorFlow: Katib TFJob Experiment:</p><p>The TFJob definition for this example is based on the TensorFlow operator notebook shown earlier. For our experiment, we focused on the learning rate, batch size, and optimizer. The following YAML file describes an Experiment object:</p><pre>%%writefile $TF_EXPERIMENT_FILE<br>apiVersion: &quot;kubeflow.org/v1beta1&quot;<br>kind: Experiment<br>metadata:<br>  namespace: mavencodeai<br>  name: heart<br>spec:<br>  parallelTrialCount: 3<br>  maxTrialCount: 12<br>  maxFailedTrialCount: 3<br>  objective:<br>    type: maximize<br>    goal: 0.8<br>    objectiveMetricName: accuracy<br>  algorithm:<br>    algorithmName: random<br>  metricsCollectorSpec:<br>    kind: StdOut<br>  parameters:<br>    - name: learning_rate<br>      parameterType: double<br>      feasibleSpace:<br>        min: &quot;0.01&quot;<br>        max: &quot;0.1&quot;<br>    - name: batch_size<br>      parameterType: int<br>      feasibleSpace:<br>        min: &quot;50&quot;<br>        max: &quot;100&quot;<br>    - name: optimizer<br>      parameterType: categorical<br>      feasibleSpace:<br>        list:<br>          - rmsprop<br>          - adam<br>  trialTemplate:<br>    primaryContainerName: tensorflow<br>    trialParameters:<br>      - name: learningRate<br>        description: Learning rate for the training model<br>        reference: learning_rate<br>      - name: batchSize<br>        description: Batch Size<br>        reference: batch_size<br>      - name: optimizer<br>        description: Training model optimizer (rmsprop, adam)<br>        reference: optimizer<br>    trialSpec:<br>      apiVersion: &quot;kubeflow.org/v1&quot;<br>      kind: TFJob<br>      spec:<br>        tfReplicaSpecs:<br>          Worker:<br>            replicas: 1<br>            restartPolicy: OnFailure<br>            template:<br>              metadata:<br>                annotations:<br>                  sidecar.istio.io/inject: &quot;false&quot;<br>              spec:<br>                containers:<br>                  - name: tensorflow<br>                    image: mavencodevv/tfjob_heart:v.0.1<br>                    command:<br>                      - &quot;python&quot;<br>                      - &quot;/tfjobheart.py&quot;<br>                      - &quot;--batch_size=${trialParameters.batchSize}&quot;<br>                      - &quot;--learning_rate=${trialParameters.learningRate}&quot;<br>                      - &quot;--optimizer=${trialParameters.optimizer}&quot;</pre><p>3. Run and Monitor Experiments:</p><p>You can execute these commands either on your local machine with kubectl or on the notebook server:</p><pre>%%capture kubectl_output --no-stderr<br>! kubectl apply -f $TF_EXPERIMENT_FILE</pre><p>The cell magic grabs the output of the kubectl command and stores it in an object named kubectl_output. From there we can use the utility function we defined earlier:</p><pre>EXPERIMENT = get_resource(kubectl_output)</pre><p>4. See experiment status:</p><pre>! kubectl describe $EXPERIMENT</pre><p>5. Get the list of created experiments:</p><pre>! kubectl get experiments</pre><p>6. Get the list of created trials:</p><pre>! kubectl get trials</pre><p>7. After the experiment is completed, use describe to get the best trial results:</p><pre>! kubectl describe $EXPERIMENT</pre><p>8. Delete Katib job to free up resources:</p><pre>! kubectl delete -f $TF_EXPERIMENT_FILE</pre><p>9. Check to see if the pod is still up and running:</p><pre>! kubectl -n mavencodeai logs -f heart</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*jiF5K2c9MS7Na2oYZdw9BQ.png" /><figcaption>Figure 20: Result of Katib Experiment</figcaption></figure><h3><strong>Model Deployment Using KubeFlow</strong></h3><p>Deployment is a crucial factor in the ML process. For the models we build to be useful to real-life users, we need to host them on a platform that can receive data from as many users as needed and return the predictions. 
For this use case, we will make use of the KubeFlow platform, which provides helpful services and tools that ease the development, deployment, and management of portable, scalable machine learning projects.</p><p>Kubeflow Pipelines offers two kinds of components: Lightweight and Reusable Components. The former are easy to build and update and are useful for testing and deployment, while the latter are stable containerized functions useful for multiple projects. For our use case, the Reusable Components option was adopted. The process is broken down as follows.</p><ol><li>Creating self-contained ML code:</li></ol><p>When creating reusable components, our first step involves creating functions of our ML code that can pass data between themselves, with all other packages needed to run contained within the function. Each step of the ML process should be packaged in this way.</p><pre>import argparse<br>def lr(clean_data):<br>  import joblib<br>  import numpy as np<br>  import pandas as pd<br>  from sklearn import metrics<br>  from sklearn.linear_model import LogisticRegression</pre><pre>  # Load the preprocessed train/test splits produced by an earlier step<br>  data = joblib.load(clean_data)<br>  X_train = data[&#39;X_train&#39;]<br>  y_train = data[&#39;Y_train&#39;]<br>  X_test = data[&#39;X_test&#39;]<br>  y_test = data[&#39;Y_test&#39;]</pre><pre>  lr_model = LogisticRegression()</pre><pre>  lr_model.fit(X_train, y_train)</pre><pre>  y_pred = lr_model.predict(X_test)</pre><pre>  # Test and train scores<br>  test = lr_model.score(X_test, y_test)<br>  train = lr_model.score(X_train, y_train)<br>  print(&#39;test accuracy:&#39;)<br>  print(test)<br>  print(&#39;train accuracy:&#39;)<br>  print(train)</pre><pre>  # Classification Report<br>  report = metrics.classification_report(y_test, y_pred, output_dict=True)<br>  df_classification_report = pd.DataFrame(report).transpose()<br>  print(df_classification_report)</pre><pre>  # Save the metrics and the fitted model for the next pipeline step<br>  lr_metrics = {&#39;train&#39;:train, &#39;test&#39;:test, &#39;report&#39;:df_classification_report, &#39;model&#39;:lr_model}<br>  joblib.dump(lr_metrics,&#39;lr_metrics&#39;)</pre><pre>if __name__ == &#39;__main__&#39;:<br>  parser = argparse.ArgumentParser()<br>  parser.add_argument(&#39;--clean_data&#39;)<br>  args = parser.parse_args()<br>  lr(args.clean_data)</pre><p>2. Create Docker Images:</p><p>Using our packaged ML functions, we create Docker images and push them to the repository where they can be called when needed by the pipeline. By sectioning each step of our code into components, any step can be repeated, scaled, or transformed individually without affecting the other components in the pipeline. Creating Docker images requires Docker to be available on the command line and an account with a repository like DockerHub.</p><pre>FROM python:3.8<br>WORKDIR /logistic<br>RUN pip install -U scikit-learn numpy pandas joblib<br>COPY logistic.py /logistic<br>ENTRYPOINT [&quot;python&quot;, &quot;logistic.py&quot; ]</pre><p>This Dockerfile uses Python 3.8 as the base image, creates a working directory for our Python script, and installs the needed packages.</p><p>With both the Python script (logistic.py) and the Dockerfile in the same directory, we can run some Docker commands to build and push the image to the repository.</p><pre># build the image<br>docker build --tag=lr_heart:v.0.1 .<br># tag to a docker repository<br>docker tag lr_heart:v.0.1 mavencodevv/lr_heart:v.0.1<br># push the image to the repository<br>docker push mavencodevv/lr_heart:v.0.1</pre><p>With mavencodevv as the user id and lr_heart as our image name, we have successfully built the image for our logistic regression component. Each step of our ML pipeline will be built this way before being compiled using Kubeflow’s pipeline functions.</p><p>3. 
Building and Compiling the Pipeline:</p><p>Using a Jupyter notebook environment, we use the Kubeflow Pipelines SDK (the kfp Python package) to build our pipeline from the images we created and then compile it for deployment. The examples rely on dsl.ContainerOp, which belongs to the v1 release of the SDK.</p><pre># Installing the Kubeflow SDK<br>!python -m pip install --user --upgrade pip<br>!pip3 install kfp --upgrade --user</pre><pre># Restart the runtime then import the packages<br>import kfp<br>from kfp import dsl<br>import kfp.components as comp</pre><p>First, we install and import the packages; then we define a component function for each image we created.</p><pre>def lr_op(clean_data):<br>    return dsl.ContainerOp(<br>        name = &#39;Logistic Regression&#39;,<br>        image = &#39;mavencodevv/logistic_heart:v.0.1&#39;,<br>        arguments = [&#39;--clean_data&#39;, clean_data<br>            ],<br>        file_outputs={<br>            &#39;lr_metrics&#39;: &#39;/logistic/lr_metrics&#39;<br>        }<br>)</pre><p>The inputs and outputs are stated explicitly so that data can pass between components. Once all our pipeline components are wrapped in this way, we can call them in a final pipeline function that chains them together.</p><pre>@dsl.pipeline(<br>    name=&#39;Heart Attack Prediction&#39;,<br>    description=&#39;An ML reusable pipeline that predicts the chances of a patient having a heart attack&#39;<br>)</pre><pre># Define parameters to be fed into pipeline<br>def heart_pipeline(bucket_name, credentials):</pre><pre>  _load_data_op = load_data_op()</pre><pre>  _stat_op = stat_op(<br>      dsl.InputArgumentPath(_load_data_op.outputs[&#39;data&#39;])<br>  ).after(_load_data_op)</pre><pre>  _schema_op = schema_op(<br>      dsl.InputArgumentPath(_stat_op.outputs[&#39;stats&#39;])<br>  ).after(_stat_op)</pre><pre>  _val_op = val_op(<br>      dsl.InputArgumentPath(_stat_op.outputs[&#39;stats&#39;]),<br>      dsl.InputArgumentPath(_schema_op.outputs[&#39;schema&#39;])<br>  ).after(_stat_op, _schema_op)</pre><pre>  _preprocess_op = preprocess_op(<br>      dsl.InputArgumentPath(_load_data_op.outputs[&#39;data&#39;])<br>  ).after(_load_data_op, _val_op)</pre><pre>  _rf_op = rf_op(<br>      dsl.InputArgumentPath(_preprocess_op.outputs[&#39;clean_data&#39;])<br>  ).after(_preprocess_op)</pre><pre>  _keras_op = keras_op(<br>      dsl.InputArgumentPath(_preprocess_op.outputs[&#39;clean_data&#39;])<br>  ).after(_preprocess_op)<br>  _lr_op = lr_op(<br>      dsl.InputArgumentPath(_preprocess_op.outputs[&#39;clean_data&#39;])<br>  ).after(_preprocess_op)<br>  _cb_op = cb_op(<br>      dsl.InputArgumentPath(_preprocess_op.outputs[&#39;clean_data&#39;])<br>  ).after(_preprocess_op)<br>  _knn_op = knn_op(<br>      dsl.InputArgumentPath(_preprocess_op.outputs[&#39;clean_data&#39;])<br>  ).after(_preprocess_op)</pre><pre>  _sv_op = sv_op(<br>      dsl.InputArgumentPath(_preprocess_op.outputs[&#39;clean_data&#39;])<br>  ).after(_preprocess_op)</pre><pre>  _eval_op = eval_op(<br>      dsl.InputArgumentPath(_rf_op.outputs[&#39;rf_metrics&#39;]),<br>      dsl.InputArgumentPath(_keras_op.outputs[&#39;keras_metrics&#39;]),<br>      dsl.InputArgumentPath(_lr_op.outputs[&#39;lr_metrics&#39;]),<br>      dsl.InputArgumentPath(_cb_op.outputs[&#39;cb_metrics&#39;]),<br>      dsl.InputArgumentPath(_knn_op.outputs[&#39;knn_metrics&#39;]),<br>      dsl.InputArgumentPath(_sv_op.outputs[&#39;sv_metrics&#39;])<br>  ).after(_rf_op, _keras_op, _lr_op, _cb_op, _knn_op, _sv_op)</pre><pre>  _push_op = push_op(bucket_name, credentials,<br>      dsl.InputArgumentPath(_eval_op.outputs[&#39;best_model&#39;])<br>  ).after(_eval_op)</pre><p>Our pipeline loads the data, computes descriptive statistics, derives a schema, validates the data, preprocesses it for our six models, evaluates their metrics, and exports the best model to cloud storage. 
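</p><p>The evaluation step near the end of this pipeline can be pictured with a short Python sketch. It is only an illustration under stated assumptions: every model component is assumed to have saved a metrics dictionary shaped like lr_metrics (with &#39;train&#39;, &#39;test&#39; and &#39;model&#39; keys), and the best model is assumed to be the one with the highest test accuracy. The select_best_model helper and the sample scores are hypothetical.</p>
<pre>
```python
from typing import Any, Dict, Tuple


def select_best_model(metrics_by_name: Dict[str, Dict[str, Any]]) -> Tuple[str, Dict[str, Any]]:
    """Return the (name, metrics) pair with the highest test accuracy.

    Hypothetical helper: each metrics dict is assumed to carry a 'test' score,
    mirroring the lr_metrics dictionary saved by the logistic regression step.
    """
    if not metrics_by_name:
        raise ValueError("no model metrics supplied")
    best_name = max(metrics_by_name, key=lambda name: metrics_by_name[name]["test"])
    return best_name, metrics_by_name[best_name]


# Made-up scores for illustration only; in a pipeline each dictionary
# would instead be read from a component output with joblib.load(path).
example_metrics = {
    "lr": {"train": 0.86, "test": 0.85, "model": None},
    "knn": {"train": 0.88, "test": 0.82, "model": None},
    "rf": {"train": 1.00, "test": 0.84, "model": None},
}

best_name, best_metrics = select_best_model(example_metrics)
print(best_name, best_metrics["test"])
```
</pre>
<p>Selecting on test accuracy alone keeps the sketch short; a real evaluation component could just as well weigh the classification report.</p><p>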
Each step started as a self-contained function from which a Docker image was created.</p><p>This pipeline function can then be compiled into a YAML file (zip and tar.gz formats are also accepted) and uploaded to the Kubeflow platform.</p><pre># Compile the pipeline to generate its YAML definition.<br>experiment_name = &#39;heart_pipeline&#39;</pre><pre>kfp.compiler.Compiler().compile(heart_pipeline, &#39;{}.yaml&#39;.format(experiment_name))</pre><pre># Client requires an endpoint and credentials to your Kubeflow<br>client = kfp.Client()<br># Values for the pipeline parameters (bucket_name, credentials) go in arguments<br>client.create_run_from_pipeline_func(heart_pipeline, arguments={})</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/823/1*6cWc4fkjwnDppllaq7tR3A.png" /><figcaption>Figure 21: Kubeflow Pipeline</figcaption></figure><h3><strong>Conclusion</strong></h3><p>One goal of AI in healthcare is to make computers and other devices more effective at solving difficult problems, including interpreting the data collected when diagnosing chronic conditions such as cardiovascular (heart) diseases. In that spirit, we applied machine learning tools and techniques to our heart disease use case to help predict the likelihood of a person having a heart attack. We went a step further and made the resulting model available on Android devices for portability and scalability. 
An early estimate of a person’s likelihood of having a heart attack, obtained with our approach, could help minimize complications of the disease.</p><h3><strong>References</strong></h3><p><a href="https://www.who.int/health-topics/cardiovascular-diseases/#tab=tab_1">https://www.who.int/health-topics/cardiovascular-diseases/#tab=tab_1</a></p><p><a href="https://www.cdc.gov/heartdisease/facts.htm">https://www.cdc.gov/heartdisease/facts.htm</a></p><p><a href="https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset">https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset</a></p><p><a href="http://rstudio-pubs-static.s3.amazonaws.com/24341_184a58191486470cab97acdbbfe78ed5.html">http://rstudio-pubs-static.s3.amazonaws.com/24341_184a58191486470cab97acdbbfe78ed5.html</a></p><p><a href="https://docs.seldon.io/projects/alibi/en/latest/methods/ALE.html">https://docs.seldon.io/projects/alibi/en/latest/methods/ALE.html</a></p><p><a href="https://developer.android.com/ml?authuser=1">https://developer.android.com/ml?authuser=1</a></p><p><a href="https://www.tensorflow.org/lite/guide?authuser=1">https://www.tensorflow.org/lite/guide?authuser=1</a></p><p><a href="https://enterprisersproject.com/article/2019/2/kubernetes-operators-plain-english">https://enterprisersproject.com/article/2019/2/kubernetes-operators-plain-english</a></p><p><a href="https://docs.d2iq.com/dkp/kaptain/1.0.1-0.5.0/tutorials/metadata/">https://docs.d2iq.com/dkp/kaptain/1.0.1-0.5.0/tutorials/metadata/</a></p><p><a href="https://www.heart.org/en/health-topics/heart-attack/understand-your-risks-to-prevent-a-heart-attack">https://www.heart.org/en/health-topics/heart-attack/understand-your-risks-to-prevent-a-heart-attack</a></p><p>Website: <a 
href="https://www.mavencode.com">www.mavencode.com</a> <br>Twitter: @mavencode<br>Email: ai@mavencode.com</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=e2db0bbde951" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Simplifying Data Preparation For AI Model Training]]></title>
            <link>https://mavencode.medium.com/simplifying-data-preparation-for-ai-model-training-39d3f8e259c3?source=rss-b55720387b55------2</link>
            <guid isPermaLink="false">https://medium.com/p/39d3f8e259c3</guid>
            <category><![CDATA[cloud-computing]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[data-curation]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[MavenCode]]></dc:creator>
            <pubDate>Fri, 16 Aug 2019 22:53:40 GMT</pubDate>
            <atom:updated>2019-08-16T22:53:40.848Z</atom:updated>
            <content:encoded><![CDATA[<p>In the past few months at MavenCode we have been working on “Contextual Scene Parsing” projects where we needed to identify entities in an image with great accuracy (down to the pixel level) using a Deep Learning Semantic Segmentation approach. One of the primary challenges for effective machine learning and artificial intelligence modeling is to have a constant stream of datasets to train models accurately. However, it’s not only the quantity that matters, but also the quality of the training datasets. If the dataset that is being fed in for model training has not been cleaned, reviewed, and properly curated, one cannot expect to see any worthwhile results. As the old saying goes among programmers and scientists, “Garbage in, Garbage out.” While strides have been made over the last few years to develop tools that help with data cleaning and preparation, human input is often still required to prepare and curate the data, depending on the difficulty and type of modeling undertaken. This multi-stage process can be slow, tiring, and cumbersome, and it can become one of the main choke points holding back the end-to-end, large-scale operation of ML and AI training pipelines. Worse yet, it could also inhibit any possibility of large-scale model deployment in production.</p><p>At MavenCode we develop Artificial Intelligence Solutions for our clients and constantly think about how we can evolve to make the entire ML and AI process (i.e. the pipeline) a lot more efficient for our team during the implementation and development cycle. Ironically, we find ourselves spending a significant amount of time curating and staging the datasets needed for our model training. 
In this post, we briefly describe how we built a pipeline that lets us load and stage data, assign labeling tasks, and review and validate the results before pushing the curated datasets to our training job.</p><h4><strong>Stage 1: Data Acquisition &amp; Curation</strong></h4><p>In the interest of full disclosure, we run our entire operation on Kubernetes because it allows us to instantiate and run ephemeral compute tasks, including fetching images, computing checksums, and fingerprinting for near-duplicate image detection during the ingestion stage. If we are fortunate enough, a client may be able to provide us with an archive of historical datasets that we can leverage to get started. However, in most cases, we have had to source the data with crawlers that pull it in from the web whenever we need to create a quick Proof Of Concept (POC).</p><p>Inside the tool we built internally, ingested data is organized by date, and we create a manifest file that records the attributes of the data ingested on each run. Each run is orchestrated as a Kubernetes CronJob.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*rcMelVd9GFGnrsLq" /></figure><p>We pull data and display it in a Web UI for sorting and organization. Our reviewers can quickly eyeball and verify the integrity of the data and make sure that what we have is relevant to the context of the problem we are trying to solve. For example, if we were trying to identify different objects that would normally be located in a bedroom, a stove or bathtub showing up would be out of place and would negatively impact our model’s training down the road. 
Fixing this kind of problem early in the process goes a long way toward ensuring we start on the right path to a well-performing model.</p><h4><strong>Stage 2: Data Labeling &amp; Validation</strong></h4><p>Once data is available in the staging bucket, the next goal is to annotate and label it as accurately as possible. In some cases, this requires us to hire additional people to augment our in-house team, whom we train and equip with the necessary domain knowledge of the problem we are trying to solve. We assign these users the “Annotator” role in our Web UI tool, then select a set of images and assign it to their corresponding task buckets. We leverage Amazon SageMaker Ground Truth to simplify this stage of the process, bootstrapping a labeling job with Terraform. Once this process runs to completion, it provisions each user in our work pool with a labeling task (or job). As each Annotator finishes their assigned labeling tasks, we process the results and store them in a Review bucket for subsequent evaluation and validation by our Reviewer users. The data output is validated and compared with the original input (i.e. the Ground Truth). 
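</p><p>As a rough picture of what such a check could look like (an illustration under stated assumptions, not a description of our internal tool), the Python sketch below scores a submitted segmentation mask against a reviewer-approved reference mask using intersection-over-union (IoU) and flags it when agreement is low. The IoU metric, the 0.8 threshold, the function names, and the tiny sample masks are all hypothetical choices made for the example.</p>
<pre>
```python
from typing import List

Mask = List[List[int]]


def mask_iou(annotated: Mask, reference: Mask, label: int) -> float:
    """Intersection-over-union of the pixels carrying `label` in two same-sized masks."""
    intersection = 0
    union = 0
    for annotated_row, reference_row in zip(annotated, reference):
        for a, r in zip(annotated_row, reference_row):
            in_annotated = a == label
            in_reference = r == label
            if in_annotated and in_reference:
                intersection += 1
            if in_annotated or in_reference:
                union += 1
    # Neither mask contains the label: treat that as perfect agreement.
    return 1.0 if union == 0 else intersection / union


def needs_reannotation(annotated: Mask, reference: Mask, label: int, threshold: float = 0.8) -> bool:
    """Flag an annotation whose agreement with the reference is below the threshold."""
    return threshold > mask_iou(annotated, reference, label)


# Hypothetical 2x2 masks: 1 marks the object of interest, 0 the background.
submitted = [[1, 1], [0, 0]]
reference = [[1, 0], [0, 0]]
print(mask_iou(submitted, reference, label=1))
print(needs_reannotation(submitted, reference, label=1))
```
</pre>
<p>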
If any of the annotated images are not good enough, we drop them from the annotated dataset and re-stage them for another Annotator user to pick up and re-annotate.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*YrBZYAzJsnrtS-4J" /></figure><h4><strong>Stage 3: Data Transformation &amp; TFRecord Creation</strong></h4><p>Once a sufficient number of validated image annotations are stored in the Data Transformation staging bucket, these annotations are automatically converted to pixel maps as shown below, and subsequently into the TFRecord format that TensorFlow needs for semantic segmentation model training.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*rsLBwcYdR30Rs9iG" /></figure><p>Once a new data batch is delivered to our training file storage location (e.g. Amazon S3 or Google Cloud Storage), a training job automatically kicks off with the generated TFRecord data as the input. This runs to completion on our selected managed cloud infrastructure (i.e. we use Kubeflow Pipelines to bootstrap Google Cloud ML or TensorFlow servers on AWS).</p><p>In this short blog post, we have shown how a cloud-based automated tool that we developed has improved the efficiency and velocity of image dataset preparation and curation for our ML and AI model training at MavenCode. If you or your organization needs help with operationalizing your ML &amp; AI infrastructure, please do not hesitate to reach out to us.</p><p>Website: <a href="https://www.mavencode.com">www.mavencode.com</a> <br>Twitter: @mavencode<br>Email: ai@mavencode.com</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=39d3f8e259c3" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>