The Google Cloud Platform provides
a comprehensive big data solution in a single platform.
The Google Cloud Platform is a full
service platform, and it's set up so that you can utilize not only cloud native
services, but also open-source tools. It also supports both batch and stream
data processing modes.
Google Cloud Platform resources
consist of physical resources, like computers and hard disk drives, as well as
virtual resources, for example virtual machines. It's global in scope, with
resources located around the world in Google data centers. And global
distribution has a number of positive implications, including redundancy in the
event of failure. The vast reach of the global data centers offered by Google
Cloud Platform means that you can deploy pretty much whatever number of
resources you need without worry. It also means reduced latency, since you
can locate your services at a data center close to your end users.
Resources reside in regions or
zones. A region is a particular geographical location where resources run. Each
region contains one or more zones. For example, us-central1 is a region
in the central US that has the zones us-central1-a, us-central1-b,
us-central1-c, and us-central1-f. Resources that reside in a zone, for example
virtual machine instances or persistent disks, are called zonal
resources. Other resources, like static external IP addresses, are regional.
Regional resources can be used by any resource within that region, in any
of its zones, while zonal resources can only be used
by other resources within the same zone.
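To make the region and zone relationship concrete, here is a minimal Python sketch. It assumes, based on the us-central1 example above, that a zone name is the region name followed by a hyphen and a short suffix. The helper names region_of and can_use are hypothetical and are not part of any Google API.

```python
def region_of(zone: str) -> str:
    """Return the region part of a zone name such as 'us-central1-a'."""
    region, sep, suffix = zone.rpartition("-")
    if not sep or not region or not suffix:
        raise ValueError(f"not a zone name: {zone!r}")
    return region


def can_use(scope: str, location: str, consumer_zone: str) -> bool:
    """Toy check: can a consumer running in consumer_zone use the resource?

    scope is 'regional' or 'zonal'; location is the region or zone
    where the resource lives.
    """
    if scope == "regional":
        # Regional resources are usable from any zone in the same region.
        return location == region_of(consumer_zone)
    if scope == "zonal":
        # Zonal resources are usable only from the same zone.
        return location == consumer_zone
    raise ValueError(f"unknown scope: {scope!r}")
```

For instance, a regional static IP address in us-central1 would be usable from an instance in us-central1-f, while a persistent disk in us-central1-a would not be usable from us-central1-b.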
Google Cloud Platform resources are hosted across
multiple locations globally. Placing resources in different zones within a
region provides isolation from several common types of infrastructure,
software, and hardware failures. Placing resources in different regions
provides an even higher level of protection against failure. The bottom line is
that you can design robust systems using resources that are spread across
different failure domains. Compute Engine resources are either global,
regional, or zonal. As an example, images themselves are global resources,
while disks are zonal. Global resources are accessible by resources regardless
of region or zone. So virtual machine instances from different zones can use
the same global image. The scope of a resource indicates how accessible it is
to other resources. However, all resources, whether global, zonal,
or regional, must be uniquely named within a project. What this means is that
you can't, for example, name a virtual machine instance demo-instance in one
zone and then try to name another VM within the same project with that same
name.
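The naming rule just described can be pictured with a small toy model. This is only a sketch of the rule as stated above, not a Google Cloud API: the Project class and its create method are hypothetical, and real services apply further naming rules that are not modeled here.

```python
class Project:
    """Toy model of the resource names used inside one project."""

    def __init__(self, project_id):
        self.project_id = project_id
        self._names = {}  # resource name -> location where it was created

    def create(self, name, location):
        """Register a resource name, rejecting duplicates in any location."""
        if name in self._names:
            raise ValueError(
                f"{name!r} already exists in {self._names[name]}"
            )
        self._names[name] = location


project = Project("demo-project")
project.create("demo-instance", "us-central1-a")

try:
    # Same name, different zone, same project: rejected in this model.
    project.create("demo-instance", "us-central1-b")
    duplicate_allowed = True
except ValueError:
    duplicate_allowed = False
```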
Google Cloud Platform Services
Google Cloud Platform provides a
huge number of services.
Some of the more common services
include computing and hosting, storage, and networking, as well
as big data. Let's look at computing and hosting first. First, there's the
managed application platform. This is offered as Google App Engine, and it's a platform as a service
offering. It's a somewhat hands-off approach in which you allow Google to
manage hosting, scaling, and monitoring services. For example, if traffic
to your e-commerce website takes a dramatic upturn, Google will automatically
scale the system for you.
Next is container-based
computing. This is focused on application code rather than deployment and
hosting. Google Kubernetes Engine is referred to as containers as a service,
and is very mature, and one of the most powerful container orchestration
platforms. Virtual machines are offered as a service called Google Compute Engine,
and this is considered to be a type of infrastructure as a service.
With this type of service, you are
responsible for configuration, administration, and monitoring tasks. In other
words, Google will make sure that reliable resources are always available and
up to date, but it's on you to manage and provision them.
Now, let's look at storage services. First, there's Cloud SQL, which is
a database service based on Structured Query Language, or SQL, and it offers
either MySQL or PostgreSQL databases. Google Cloud Platform offers two types of
NoSQL data storage. One is Cloud Datastore and the other is Cloud Bigtable.
Cloud Spanner is a fully managed, highly available, relational database service
for mission-critical applications. Cloud Storage offers large-capacity,
consistent, and scalable data storage. And Compute Engine offers persistent
disks. These are available as the primary storage for your instances, as
both standard persistent disks and solid-state drives.
Now, let's look at networking
services. Compute
Engine provides networking services for virtual machine instances to
use. You can load balance traffic across multiple instances. And there's Cloud
DNS as well, which allows you to create and manage domain name system records.
And Google Cloud
Interconnect is an advanced connectivity service which allows you to
connect your existing network to Google Cloud Platform networking resources.
And finally, big data services.
First, BigQuery.
This is a data analysis service. Its features include custom
schema creation, so you can organize your data as you wish. For example, you
may have a schema structure in mind using specific datasets and tables. It
offers the convenience of querying large datasets using SQL-like commands, so
the learning curve is more manageable. It provides for loading, querying, and
other operations via jobs. And it supports managing and protecting data with
controllable and manageable permissions.
Cloud Dataflow is a managed service that includes software
development kits, or SDKs, for batch and streaming data processing modes. And
Cloud Dataflow is also applicable for extract, transform, load, or ETL,
operations. Then there's Cloud
Pub/Sub. This is an asynchronous messaging service. It allows an
application to send messages as JSON structures. And on the receiving end is a
publishing unit referred to as a topic. These topics are global resources, and
what that means is that other applications and projects owned by your organization
can subscribe to the topic, thereby receiving those messages in the body of
HTTP requests or responses.
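As an illustration of receiving a message in the body of an HTTP request, here is a minimal Python sketch that decodes such a body. It assumes the push delivery envelope as I understand it: a JSON object with a "message" field holding base64-encoded "data", a "messageId", and optional "attributes", plus a "subscription" field. Check the field names against the current Pub/Sub documentation before relying on them. The function name decode_push_body, the sample payload, and the assumption that the payload itself is JSON are all hypothetical.

```python
import base64
import json


def decode_push_body(body: bytes) -> dict:
    """Decode an assumed Pub/Sub-style push envelope into a plain dict."""
    envelope = json.loads(body)
    message = envelope["message"]
    # The message data is assumed to arrive base64-encoded.
    raw = base64.b64decode(message.get("data", "")).decode("utf-8")
    return {
        "message_id": message.get("messageId"),
        "attributes": message.get("attributes", {}),
        # Assumption: the publisher sent a JSON document as the payload.
        "payload": json.loads(raw) if raw else None,
    }


# Build a sample request body the way a publisher's message might arrive.
sample_data = base64.b64encode(
    json.dumps({"device": "sensor-1", "temp_c": 21.5}).encode("utf-8")
).decode("ascii")
sample_body = json.dumps(
    {
        "message": {"data": sample_data, "messageId": "1", "attributes": {}},
        "subscription": "projects/demo-project/subscriptions/demo-sub",
    }
).encode("utf-8")

decoded = decode_push_body(sample_body)
```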
Benefits of Google Cloud Platform
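Some of the main benefits of Google
Cloud Platform include future-proof infrastructure, powerful data and
analytics, serverless computing, customer-friendly pricing, data center
innovation, and security.
Let's have a closer look at these.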
Future-proof infrastructure includes factors like live migration. And that
means Google Compute Engine instances can be moved to nearby hosts, even while
they are active and under high load. Google Cloud Platform offers
pricing innovations like per-second billing and discounts for sustained
use. The platform allows you to configure a wide combination of memory and
virtual CPU, helping to avoid over-provisioning when sizing hardware for a
particular workload. Fast archive restore provides high throughput for
right-now restoration of data. Google's load balancer is the same system that
supplies load balancing to Google products like Gmail and Google Maps over a
globally distributed platform. It's super fast and capable of tolerating extreme
bursts of traffic. You can take advantage of the Google security model, built
and maintained by some of the top application, information, and network
security experts. This is the same infrastructure that secures Google
applications like Gmail and G Suite.
Google maintains a global network
footprint, boasting over 100 points of presence, spanning over 30 countries. Now
let's look at powerful data and analytics as a benefit. You can build
distributed services for fast results on the platform with BigQuery, Cloud Datalab,
and Cloud Dataproc. These are the same services that Google uses. So queries
that traditionally take hours or days can now be performed in a fraction of
the time. Google Cloud Platform offers powerful applications and tools for
working with big data, with data processing tools like Cloud Dataflow, Cloud
Pub/Sub, BigQuery, and Cloud Datalab making it easier to use extreme volumes
of data to deliver results.
Again, these are the same products
that Google itself uses. And Google Cloud Machine Learning provides access to
the powerful deep learning systems that Google uses for services like Google
Translate and Google Photos, as well as voice search. With respect to
serverless computing, there are no upfront provisioning costs, and resources are
allocated dynamically as needed. You simply bring your code and data. Google takes
on full management of servers, which eliminates the repetitive tasks and potential errors
that are inherent in tasks like scaling clusters and applying security patches.
With automatic scaling and dynamic provisioning of resources, you pay only for
what you use.
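To see what paying only for what you use looks like with the per-second billing mentioned earlier, here is a small arithmetic sketch in Python. The hourly rate of 0.10 is a made-up number for illustration, not a real price, and the 60-second minimum charge is an assumption that should be checked against current pricing documentation. Sustained use discounts are not modeled.

```python
import math


def per_second_cost(seconds_used, hourly_rate, minimum_seconds=60):
    """Cost when billing is per second, with an assumed minimum charge."""
    billable_seconds = max(seconds_used, minimum_seconds)
    return billable_seconds * hourly_rate / 3600


def per_hour_cost(seconds_used, hourly_rate):
    """Cost when every started hour is billed in full, for comparison."""
    return math.ceil(seconds_used / 3600) * hourly_rate


HYPOTHETICAL_HOURLY_RATE = 0.10  # illustrative only, not a real price

# A job that runs for 90 seconds:
short_job_per_second = per_second_cost(90, HYPOTHETICAL_HOURLY_RATE)
short_job_per_hour = per_hour_cost(90, HYPOTHETICAL_HOURLY_RATE)
```

With these assumed numbers, the 90-second job costs 90 × 0.10 / 3600 = 0.0025 under per-second billing, compared with 0.10 if a full hour were charged.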
Let's consider a couple of use
cases for serverless computing. Take, for example, a web backend. You employ
Google's App Engine with the highly scalable NoSQL Cloud Datastore database for
a full-scale, powerful backend infrastructure. Or, for Internet of Things, or IoT,
device messaging, combine the real-time, geo-redundant Cloud Pub/Sub messaging
service with Cloud Dataflow serverless stream and batch data processing. When
considering extract, transform, and load, or ETL, we could combine Cloud Dataflow
again for stream and batch data processing with BigQuery for serverless data
warehousing.
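The extract, transform, and load pattern itself can be sketched in a few lines of Python. This is a local stand-in only: it uses the standard library's csv and sqlite3 modules in place of Cloud Dataflow and BigQuery, and the sample readings, table name, and function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: device readings in Fahrenheit, one incomplete row.
RAW_CSV = """device,temp_f
sensor-1,68.0
sensor-2,
sensor-3,77.0
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows and convert Fahrenheit to Celsius."""
    cleaned = []
    for row in rows:
        if not row["temp_f"]:
            continue  # skip readings with no temperature
        temp_c = (float(row["temp_f"]) - 32) * 5 / 9
        cleaned.append((row["device"], round(temp_c, 1)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a queryable table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (device TEXT, temp_c REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Once loaded, the data can be queried with SQL.
result = conn.execute(
    "SELECT device, temp_c FROM readings ORDER BY device"
).fetchall()
```

In a managed setup, the transform step would run in the processing service and the load target would be the data warehouse, but the three stages keep the same shape.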
Now, one of the other benefits: customer-friendly pricing.
As pointed out earlier, you do not have to commit to a specific deployment
size, so no upfront costs are involved. You pay as you go, and with per-second
pricing, that means that you pay for services as you require them. So you don't
have to maintain a mountain of hardware and have that hardware sitting there
idle. And you stop paying when you stop using a service, with no termination
fees. Google Cloud Platform offers, as another benefit, data center innovation.
For example, high performance virtual machines for fast and consistent
performance.
Google's global network provides
fast networking performance, strong redundancy, and high availability. Live migration technology
means that maintenance of virtual machines is transparent, never requiring
downtime for scheduled maintenance. Google maintains very high security compliance
and standards, providing some of the most secure infrastructure on earth. And
Google builds its data centers with energy efficiency top of mind. In fact,
Google is the first major Internet services organisation to obtain ISO 50001
certification. And Google has reportedly been carbon-neutral for over a decade.
Now, consider security.
Google's security model is an
end-to-end process. Google uses practices and controls to secure data access,
and when retired, hard disks that contain customer information undergo a data
destruction process. With only a few exceptions, customer data stored at rest
is always encrypted on Google Cloud Platform. The encryption is automatic and
transparent, so no customer intervention or action is required. Google's secure
global network helps to improve the security of data in transit. And the
combination of Cloud Interconnect and managed VPN means that you can create
encrypted channels from an on-prem private IP environment to Google's network.
In addition to that, Cloud Security Scanner helps App Engine developers
identify common vulnerabilities.
Google Cloud Platform also allows
the configuration of user permissions at the project level, for full control of
who has access to what resources and at what level. Using tools like Google
Cloud Logging and Google Cloud Monitoring simplifies the collection and
analysis of request logs, as well as the monitoring of infrastructure services
availability.
Comparing GCP and Other Models
When you're talking about cloud services
suppliers, there are really three main suppliers: Amazon Web Services,
Google Cloud Platform, and Microsoft Azure. The major differences from platform
to platform include pricing. And one thing that you should keep in mind when
considering pricing is how to calculate cost. For example, the pricing
for Amazon's EC2 and Azure's Virtual Machines scalable computing services can
get pretty complicated, while Google's scalable computing service is perhaps a
little less flexible, but the pricing is way more straightforward. Another major
difference lies in how these vendors name and group the services that they
offer.
Compute offerings
With respect to scalable computing on demand,
Amazon Web Services has its Elastic Compute Cloud, or EC2, while Google Cloud
Platform offers Compute Engine, and Azure has Virtual Machines and Virtual
Machine Scale Sets. For web and mobile apps, we have AWS Elastic Beanstalk and
GCP's App Engine (I'll refer to Google Cloud Platform hereafter as GCP), and
Azure offers Web Apps and Cloud Services. For software container
management, Amazon Web Services has ECS and EKS. ECS is Amazon's EC2 Container
Service, while EKS is Amazon Elastic Container Service for Kubernetes. Google
Cloud Platform provides the Google Kubernetes Engine, or GKE. And I'm quite familiar
with that, having a lot of personal experience with GKE, and I can tell you it
is very powerful, very flexible, and fantastic. Azure offers AKS, which is
Azure Container Service, and Azure Container Instances.
For storage offerings, we have object storage. So
Amazon Web Services offers Simple Storage Service, or S3. GCP has Cloud
Storage, while Azure has Blob Storage. As far as archiving, or as it's known,
cold storage, is concerned, AWS offers Glacier and Data Archive. GCP has Cloud
Storage Nearline. And Azure offers Backup and Archive. With respect to content
delivery networks, AWS offers CloudFront. GCP offers Cloud CDN, and Azure has
its Content Delivery Network.
Now let's look at analytics offerings. For big data, AWS
offers EMR and Athena, while GCP offers BigQuery, Cloud Dataflow, Dataproc,
Datalab, and Pub/Sub. Azure has HDInsight and Data Lake Analytics. For BI, or
business intelligence, AWS offers QuickSight. GCP offers Data Studio, while
Azure has Power BI. Now, you might be thinking, well, Power BI, isn't that an
installable application? Yes, it is an installable application. There are
applications that are installable on desktop, or on servers even. But they
connect to cloud resources very readily and easily. That's why I'm including
those here. With respect to machine learning, AWS has Amazon Machine Learning,
or AML. Google has Cloud Machine Learning, and Azure offers Azure Machine
Learning.
Now let's briefly consider location offerings.
So really you should try to choose a data center close to your users, because it
reduces latency and provides a better user experience. I mean, that goes without
saying. If you're involved in any capacity in networking or administration,
you already know that. So here's the thing, AWS has the most global coverage,
but unfortunately there's no coverage in Africa at this point. Google Cloud
Platform has good coverage in the US, but not so good in Europe and Asia, and
there's none in South America or Africa at this time. But knowing Google, I'm
pretty sure that they are in the process of planning those right now. And as
far as Azure goes, really they have the second best global coverage behind AWS,
but again, there is no coverage in Africa.
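The advice about choosing a location close to your users can be turned into a tiny selection routine. In this Python sketch, the latency figures are invented for illustration and would in practice come from your own measurements. The function name closest_region is hypothetical.

```python
from typing import Mapping


def closest_region(latency_ms: Mapping[str, float]) -> str:
    """Return the region with the lowest measured latency."""
    if not latency_ms:
        raise ValueError("no latency measurements supplied")
    return min(latency_ms, key=latency_ms.get)


# Hypothetical round-trip times measured from one user location.
measured = {
    "us-central1": 120.0,
    "europe-west1": 35.0,
    "asia-east1": 240.0,
}

best = closest_region(measured)
```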
To begin, we need to sign up for a Google Cloud Platform (GCP) account; a new customer gets $300 in credit to spend in the first 12 months.
Once we have logged in to the GCP console, we need to create a new project in order to use the GCP services. Simply click the New Project button and type the project name. At the same time, GCP assigns a unique project ID to the project, which we will need later to access GCP from the terminal.
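Since the project ID is what we will type in the terminal later, it can help to sanity-check it first. The Python sketch below encodes the format rules as I understand them: 6 to 30 characters, lowercase letters, digits, and hyphens only, starting with a letter and not ending with a hyphen. Treat these rules as an assumption and confirm them in the console or the official documentation. The function name is_valid_project_id is hypothetical, and the check says nothing about whether an ID is already taken.

```python
import re

# One leading letter, 4 to 28 middle characters, one trailing letter or
# digit, which gives a total length of 6 to 30 characters.
PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id: str) -> bool:
    """Check a project ID against the assumed format rules."""
    return PROJECT_ID_PATTERN.fullmatch(project_id) is not None
```

For example, an ID such as my-demo-project-123 would pass this check, while My-Project (uppercase letters) and abc (too short) would not.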