Data Labeling in Machine Learning with Python: Explore Modern Ways to Prepare Labeled Data for Training and Fine-Tuning ML and Generative AI Models

Take your data preparation, machine learning, and GenAI skills to the next level by learning a range of Python algorithms and tools for data labeling. Key features: Generate labels for regression in scenarios with limited training data. Apply generative AI and large language models (LLMs) to explore an...

Bibliographic Details
Other Authors:	Suda, Vijaya Kumar (author)
Format:	E-book
Language:	English
Published:	Birmingham, UK : Packt Publishing [2024]
Edition:	First edition
Subjects:	Python (Computer program language); Machine learning; Computer programming
View at Universitat Ramon Llull Library:	https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009799143006719

Table of Contents:

Cover
Title Page
Copyright
Acknowledgments
Contributors
Table of Contents
Preface
Part 1: Labeling Tabular Data
Chapter 1: Exploring Data for Machine Learning
Technical requirements
EDA and data labeling
Understanding the ML project life cycle
Defining the business problem
Data discovery and data collection
Data exploration
Data labeling
Model training
Model evaluation
Model deployment
Introducing Pandas DataFrames
Summary statistics and data aggregates
Summary statistics
Data aggregates of the feature for each target class
Creating visualizations using Seaborn for univariate and bivariate analysis
Univariate analysis
Bivariate analysis
Profiling data using the ydata-profiling library
Variables section
Interactions section
Correlations
Missing values
Sample data
Unlocking insights from data with OpenAI and LangChain
Summary
Chapter 2: Labeling Data for Classification
Technical requirements
Predicting labels with LLMs for tabular data
Data labeling using Snorkel
What is Snorkel?
Why is Snorkel popular?
Loading unlabeled data
Creating the labeling functions
Labeling rules
Constants
Labeling functions
Creating a label model
Predicting labels
Labeling data using the Compose library
Labeling data using semi-supervised learning
What is semi-supervised learning?
What is pseudo-labeling?
Labeling data using K-means clustering
What is unsupervised learning?
K-means clustering
Inertia
Dunn's index
Summary
Chapter 3: Labeling Data for Regression
Technical requirements
Using summary statistics to generate housing price labels
Finding the closest labeled observation to match the label
Using semi-supervised learning to label regression data
Pseudo-labeling
Using data augmentation to label regression data
Using k-means clustering to label regression data
Summary
Part 2: Labeling Image Data
Chapter 4: Exploring Image Data
Technical requirements
Visualizing image data using Matplotlib in Python
Loading the data
Checking the dimensions
Visualizing the data
Checking for outliers
Performing data preprocessing
Checking for class imbalance
Identifying patterns and relationships
Evaluating the impact of preprocessing
Practice example of visualizing data
Practice example for adding annotations to an image
Practice example of image segmentation
Practice example for feature extraction
Analyzing image size and aspect ratio
Impact of aspect ratios on model performance
Image resizing
Image normalization
Performing transformations on images - image augmentation
Summary
Chapter 5: Labeling Image Data Using Rules
Technical requirements
Labeling rules based on image visualization
Image labeling using rules with Snorkel
Weak supervision
Rules based on the manual visualization of an image's object color
Real-world applications
A practical example of plant disease detection
Labeling images using rules based on properties
Bounding boxes
Example 1 - image classification - a bicycle with and without a person
Example 2 - image classification - dog and cat images
Labeling images using transfer learning
Example - digit classification using a pre-trained classifier
Example - person image detection using the YOLO V3 pre-trained classifier
Example - bicycle image detection using the YOLO V3 pre-trained classifier
Labeling images using transformations
Summary
Chapter 6: Labeling Image Data Using Data Augmentation
Technical requirements
Training support vector machines with augmented image data
Kernel trick
Data augmentation
Image data augmentation
Implementing an SVM with data augmentation in Python
Introducing the CIFAR-10 dataset
Loading the CIFAR-10 dataset in Python
Preprocessing the data for SVM training
Implementing an SVM with the default hyperparameters
Evaluating SVM on the original dataset
Implementing an SVM with an augmented dataset
Training the SVM on augmented data
Evaluating the SVM's performance on the augmented dataset
Image classification using the SVM with data augmentation on the MNIST dataset
Convolutional neural networks using augmented image data
How CNNs work
Practical example of a CNN using data augmentation
CNN using image data augmentation with the CIFAR-10 dataset
Summary
Part 3: Labeling Text, Audio, and Video Data
Chapter 7: Labeling Text Data
Technical requirements
Real-world applications of text data labeling
Tools and frameworks for text data labeling
Exploratory data analysis of text
Loading the data
Understanding the data
Cleaning and preprocessing the data
Exploring the text's content
Analyzing relationships between text and other variables
Visualizing the results
Exploratory data analysis of sample text data set
Exploring Generative AI and OpenAI for labeling text data
GPT models by OpenAI
Zero-shot learning capabilities
Text classification with OpenAI models
Data labeling assistance
OpenAI API overview
Use case 1 - summarizing the text
Use case 2 - topic generation for news articles
Use case 3 - classification of customer queries using the user-defined categories and sub-categories
Use case 4 - information retrieval using entity extraction
Use case 5 - aspect-based sentiment analysis
Hands-on labeling of text data using the Snorkel API
Hands-on text labeling using Logistic Regression
Hands-on label prediction using K-means clustering
Generating labels for customer reviews (sentiment analysis)
Summary
Chapter 8: Exploring Video Data
Technical requirements
Loading video data using cv2
Extracting frames from video data for analysis
Extracting features from video frames
Color histogram
Optical flow features
Motion vectors
Deep learning features
Appearance and shape descriptors
Visualizing video data using Matplotlib
Frame visualization
Temporal visualization
Motion visualization
Labeling video data using k-means clustering
Overview of data labeling using k-means clustering
Example of video data labeling using k-means clustering with a color histogram
Advanced concepts in video data analysis
Motion analysis in videos
Object tracking in videos
Facial recognition in videos
Video compression techniques
Real-time video processing
Video data formats and quality in machine learning
Common issues in handling video data for ML models
Troubleshooting steps
Summary
Chapter 9: Labeling Video Data
Technical requirements
Capturing real-time video
Key components and features
A hands-on example to capture real-time video using a webcam
Building a CNN model for labeling video data
Using autoencoders for video data labeling
A hands-on example to label video data using autoencoders
Transfer learning
Using the Watershed algorithm for video data labeling
A hands-on example to label video data segmentation using the Watershed algorithm
Computational complexity
Performance metrics
Real-world examples for video data labeling
Advances in video data labeling and classification
Summary
Chapter 10: Exploring Audio Data
Technical requirements
Real-life applications for labeling audio data
Audio data fundamentals
Hands-on with analyzing audio data
Example code for loading and analyzing sample audio file
Best practices for audio format conversion
Example code for audio data cleaning
Extracting properties from audio data
Tempo
Chroma features
Mel-frequency cepstral coefficients (MFCCs)
Zero-crossing rate
Spectral contrast
Considerations for extracting properties
Visualizing audio data with matplotlib and Librosa
Waveform visualization
Loudness visualization
Spectrogram visualization
Mel spectrogram visualization
Considerations for visualizations
Ethical implications of audio data
Recent advances in audio data analysis
Troubleshooting common issues during data analysis
Troubleshooting common installation issues for audio libraries
Summary
Chapter 11: Labeling Audio Data
Technical requirements
Downloading FFmpeg
Azure Machine Learning
Real-time voice classification with Random Forest
Transcribing audio using the OpenAI Whisper model
Step 1 - importing the Whisper model
Step 2 - loading the base Whisper model
Step 3 - setting up FFmpeg
Step 4 - transcribing the YouTube audio using the Whisper model
Classifying a transcription using Hugging Face transformers
Hands-on - labeling audio data using a CNN
Exploring audio data augmentation
Introducing Azure Cognitive Services - the speech service
Creating an Azure Speech service
Speech to text
Speech translation
Summary
Chapter 12: Hands-On Exploring Data Labeling Tools
Technical requirements
Azure Machine Learning data labeling
Label Studio
pyOpenAnnotate
Data labeling using Azure Machine Learning
Benefits of data labeling with Azure Machine Learning
Data labeling steps using Azure Machine Learning
Image data labeling with Azure Machine Learning
