Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Topics - Tapushe Rabaya Toma

Pages: 1 2 [3] 4 5 ... 7
31
Faculty Forum / Three Steps to Better Meetings
« on: June 25, 2018, 11:35:33 AM »
Your meetings will be shorter and more productive if you follow this simple formula.

1. Create a visual agenda ahead of time using a mind map you can edit in real time. Display it on the conference room projector (or on your screen using an online meeting platform such as GoToMeeting™ or WebEx™).

2. Move through the agenda items, noting decisions on the mind map. If action items are needed, decide in the meeting who is responsible. There are a few ways to do this in SmartDraw. You can assign the item to that person using Trello, change the shape of the task so it has a row that can show assignment, or simply add a shape representing the person and drag the task under their name in the hierarchy. Everyone in the meeting sees the action item assigned. There is no room for different interpretations of action taken and the person assigned the task is publicly accountable for completing it.

3. Upon conclusion of the meeting, distribute the action item document, in visual format (which becomes the minutes of the meeting), to each attendee by sending them a link.

Source: goo.gl/C4p4HB

33
Software Engineering / Best Database Certifications 2018
« on: May 09, 2018, 06:49:25 PM »
During the past three decades, we've seen a lot of database platforms come and go, but there's never been any question that database technology can be a crucial component for all kinds of applications and computing tasks.

Database certifications may not be as sexy or bleeding edge as cloud computing, storage or computer forensics. But the reality is that there has been, is, and always will be a need for knowledgeable database professionals at all levels and in many related job roles.

To get a better grasp of the available database certifications, it's useful to group them around specific database-related job roles. In part, this reflects the maturity of database technology, and its integration into most aspects of commercial, scientific and academic computing. As you read about the various database certification programs, keep these job roles in mind:
Database Administrator (DBA): Responsible for installing, configuring and maintaining a database management system (DBMS). Often tied to a specific platform such as Oracle, MySQL, DB2, SQL Server and others.
Database Developer: Works with generic and proprietary APIs to build applications that interact with DBMSs (also platform specific, as with DBA roles).
Database Designer/Database Architect: Researches data requirements for specific applications or users, and designs database structures and application capabilities to match.
Data Analyst/Data Scientist: Responsible for analyzing data from multiple disparate sources to discover previously hidden insight, determine meaning behind the data and make business-specific recommendations.
Data Mining/Business Intelligence (BI) Specialist: Specializes in dissecting, analyzing and reporting on important data streams, such as customer data, supply chain data, transaction data and histories, and others.
Data Warehousing Specialist: Specializes in assembling and analyzing data from multiple operational systems (orders, transactions, supply chain information, customer data and so forth) to establish data history, analyze trends, generate reports and forecasts and support general ad hoc queries.
Careful attention to these database job roles highlight two important kinds of technical issues for would-be database professionals to consider. First, a good general background in relational database management systems, including an understanding of the Structured Query Language (SQL), is a basic prerequisite for all database professionals.

Second, although various efforts to standardize database technology exist, much of the whiz-bang capability that databases and database applications can deliver come from proprietary, vendor-specific technologies. Most serious, heavy-duty database skills and knowledge are tied to specific platforms, including various Oracle products (such as the open source MySQL environment), Microsoft SQL Server, IBM DB2 and more.

It's important to note that NoSQL databases – referred to as "not only SQL" and sometimes "non-relational" – handle many types of data, such as structured, semi-structured, unstructured and polymorphic. NoSQL databases are increasingly used in big data applications, which tend to be associated with certifications for data scientists, data mining/warehousing and business intelligence. Although there is some natural overlap, for the most part, we cover those types of certifications in our annually updated Best Big Data Certifications article.

Before you look at each of our featured certifications in detail, consider their popularity with employers. The results of an informal job search conducted on several high-traffic job boards shows which database certifications employers look for when hiring new employees. The results vary from day to day (and job board to job board), but such numbers provide perspective on database certification demand.


Source:
http://www.tomsitpro.com/articles/database-certifications,2-664.html

34
Key-value, document-oriented, column family, graph, relational... Today we seem to have as many kinds of databases as there are kinds of data. While this may make choosing a database harder, it makes choosing the right database easier. Of course, that does require doing your homework. You’ve got to know your databases.

One of the least-understood types of databases out there is the graph database. Designed for working with highly interconnected data, a graph database might be described as more “relational” than a relational database. Graph databases shine when the goal is to capture complex relationships in vast webs of information.
Here is a closer look at what graph databases are, why they’re unlike other databases, and what kinds of data problems they’re built to solve.
Graph database vs. relational database
In a traditional relational or SQL database, the data is organized into tables. Each table records data in a specific format with a fixed number of columns, each column with its own data type (integer, time/date, freeform text, etc.).

This model works best when you’re dealing mainly with data from any one table. It also doesn’t work too badly when you’re aggregating data stored across multiple tables. But that behavior has some notable limits.

Consider a music database, with albums, bands, labels, and performers. If you want to report all the performers that were featured on this album by that band released on these labels—four different tables—you have to explicitly describe those relationships. With a relational database, you accomplish this by way of new data columns (for one-to-one or one-to-many relationships), or new tables (for many-to-many relationships).

This is practical as long as you’re managing a modest number of relationships. If you’re dealing with millions or even billions of relationships—friends of friends of friends, for instance—those queries don’t scale well.
In short, if the relationships between data, not the data itself, are your main concern, then a different kind of database—a graph database—is in order.

Graph database features
The term “graph” comes from the use of the word in mathematics. There it’s used to describe a collection of nodes (or vertices), each containing information (properties), and with labeled relationships (or edges) between the nodes.

A social network is a good example of a graph. The people in the network would be the nodes, the attributes of each person (such as name, age, and so on) would be properties, and the lines connecting the people (with labels such as “friend” or “mother” or “supervisor”) would indicate their relationship.

In a conventional database, queries about relationships can take a long time to process. This is because relationships are implemented with foreign keys and queried by joining tables. As any SQL DBA can tell you, performing joins is expensive, especially when you must sort through large numbers of objects—or, worse, when you must join multiple tables to perform the sorts of indirect (e.g. “friend of a friend”) queries that graph databases excel at.

Graph databases work by storing the relationships along with the data. Because related nodes are physically linked in the database, accessing those relationships is as immediate as accessing the data itself. In other words, instead of calculating the relationship as relational databases must do, graph databases simply read the relationship from storage. Satisfying queries is a simple matter of walking, or “traversing,” the graph. 

A graph database not only stores the relationships between objects in a native way, making queries about relationships fast and easy, but allows you to include different kinds of objects and different kinds of relationships in the graph. Like other NoSQL databases, a graph database is schema-less. Thus, in terms of performance and flexibility, graph databases hew closer to document databases or key-value stores than they do relational or table-oriented databases.

Graph database use cases
Graph databases work best when the data you’re working with is highly connected and should be represented by how it links or refers to other data, typically by way of many-to-many relationships.

Again, a social network is a useful example. Graph databases reduce the amount of work needed to construct and display the data views found in social networks, such as activity feeds, or determining whether or not you might know a given person due to their proximity to other friends you have in the network.

Another application for graph databases is finding patterns of connection in graph data that would be difficult to tease out via other data representations. Fraud detection systems use graph databases to bring to light relationships between entities that might otherwise have been hard to notice.

Similarly, graph databases are a natural fit for applications that manage the relationships or interdependencies between entities. You will often find graph databases behind recommendation engines, content and asset management systems, identity and access management systems, and regulatory compliance and risk management solutions.

Graph database queries
Graph databases—like other NoSQL databases—typically use their own custom query methodology instead of SQL.

One commonly used graph query language is Cypher, originally developed for the Neo4j graph database. Since late 2015 Cypher has been developed as a separate open source project, and a number of other vendors have adopted it as a query system for their products (e.g., SAP HANA).

Here is an example of a Cypher query that returns a search result for everyone who is a friend of Scott:

MATCH (a:Person {name:’Scott’})-[:FRIENDOF]->(b)
RETURN b
The arrow symbol (->) is used in Cypher queries to represent a directed relationship in the graph.

Another common graph query language, Gremlin, was devised for the Apache TinkerPop graph computing framework. Gremlin syntax is similar to that used by some languages’ ORM database access libraries.

Here is an example of a “friends of Scott” query in Gremlin:

g.V().has(“name”,”Scott”).out(“friendof”)
Many graph databases have support for Gremlin by way of a library, either built-in or third-party.

Yet another query language is SPARQL. It was originally developed by the W3C to query data stored in the Resource Description Framework (RDF) format for metadata. In other words, SPARQL wasn’t devised for graph database searches, but can be used for them. On the whole, Cypher and Gremlin have been more broadly adopted.

SPARQL queries have some elements reminiscent of SQL, namely SELECT and WHERE clauses, but the rest of the syntax is radically dissimilar. Don’t think of SPARQL as being related to SQL at all, or for that matter to other graph query languages.

Popular graph databases
Because graph databases serve a relatively niche use case, there aren’t nearly as many of them as there are relational databases. On the plus side, that makes the standout products easier to identify and discuss.

Neo4j
Neo4j is easily the most mature (11 years and counting) and best-known of the graph databases for general use. Unlike previous graph database products, it doesn’t use a SQL back-end. Neo4j is a native graph database that was engineered from the inside out to support large graph structures, as in queries that return hundreds of thousands of relations and more.

Neo4j comes in both free open-source and for-pay enterprise editions, with the latter having no restrictions on the size of a dataset (among other features). You can also experiment with Neo4j online by way of its Sandbox, which includes some sample datasets to practice with.

See InfoWorld’s review of Neo4j for more details.

Microsoft Azure Cosmos DB
The Azure Cosmos DB cloud database is an ambitious project. It’s intended to emulate multiple kinds of databases—conventional tables, document-oriented, column family, and graph—all through a single, unified service with a consistent set of APIs.

To that end, a graph database is just one of the various modes Cosmos DB can operate in. It uses the Gremlin query language and API for graph-type queries, and supports the Gremlin console created for Apache TinkerPop as another interface.

Another big selling point of Cosmos DB is that indexing, scaling, and geo-replication are handled automatically in the Azure cloud, without any knob-twiddling on your end. It isn’t clear yet how Microsoft’s all-in-one architecture measures up to native graph databases in terms of performance, but Cosmos DB certainly offers a useful combination of flexibility and scale.

See InfoWorld’s review of Azure Cosmos DB for more details.

JanusGraph
JanusGraph was forked from the TitanDB project, and is now under the governance of the Linux Foundation. It uses any of a number of supported back ends—Apache Cassandra, Apache HBase, Google Cloud Bigtable, Oracle BerkeleyDB—to store graph data, supports the Gremlin query language (as well as other elements from the Apache TinkerPop stack), and can also incorporate full-text search by way of the Apache Solr, Apache Lucene, or Elasticsearch projects.

IBM, one of the JanusGraph project’s supporters, offers a hosted version of JanusGraph on IBM Cloud, called Compose for JanusGraph. Like Azure Cosmos DB, Compose for JanusGraph provides autoscaling and high availability, with pricing based on resource usage.

35
Software Engineering / Faster data is a safer bet for risk exposure
« on: May 09, 2018, 06:24:10 PM »
 “Rather than just adding up scores at the end of a game, we tried using real-time data. For the last five years, we have been doing sport betting.”
FSB now provides a platform for online gambling and casino sites including 188Bet, BlackType online casino and bookmaker Toals in Northern Ireland.

In effect, it offers sports betting as a service. “Sports betting technology is expensive,” says Lawrence. “Data is becoming a bigger part of this cost.”

The speed of data has a direct impact on the company’s bottom line, he says. “With sports betting, we have the problem of data arriving into a platform that needs to be processed quickly. This data has to be processed fast as it will directly affect margins, so there is always a driver to make data processing faster.”

The company runs the open source PostgreSQL database as its main transactional data store. Lawrence says: “As the company grew organically, we chose open source software that we could get up and running quickly with a commercial support contract.”

But given the need for speedy data processing, he points out: “We needed fast complex query support and the ability to read and write data fast and to scale.”

The company has begun using GridGain, an in-memory database built on Apache Ignite as a data cache.

GridGain says its in-memory computing technology enables massive scale-out of data-intensive applications. It promises to dramatically improve transaction times compared with application architectures with disk-based databases.

Huge amounts of event data
At FSB Technology, huge amounts of event data must be updated constantly, and must be immediately available to a vast number of clients. For example, as a game progresses, new betting opportunities emerge. Odds must be calculated and presented to users in real time, and bets and event results must be processed instantly.

“We have progressed from application database architecture to a cache,” says Lawrence. Data is offloaded and kept in synch on a second data tier for fast in-memory database reads. “For the most part, we use GridGain for reading,” he adds. “On the risk side, when a bet is placed, we have to do fast calculations to ensure we are not exposed.”

FSB can pull complex data structures from its database rapidly and reliably. For example, if a better wants to see all current opportunities for betting on a particular sport, GridGain supports a single, fast query that pulls all the current odds for all the current bets for all the current events for that sport.

“GridGain is extremely flexible,” says Lawrence. “We have been able to create a great user experience for a variety of devices, providing the exact information users need when they need it.”

Although FSB started out using GridGain as a data cache, it is now being used as a layer between the database, says Lawrence. However, there has not been a complete shift to in-memory technology.

For instance, GridGain offers transactional support, but this is an area that FSB is not yet looking at. Lawrence is confident the company will make more use of in-memory technology going forward. “Having developers on the team who have knowledge of GridGain is a great way to get an understanding of new technology,” he says.

Private and public clouds
The company operates across both the private and public clouds, using Rackspace and Google’s public cloud, which means it can add capacity dynamically. This allows it to spin up extra GridGain nodes as and when extra data processing is needed.

Lawrence explains: “Generally, we spin up extra nodes at weekends or during a Champions League game.” This enables FSB to support the extra demands on its systems from peaks in betting, which often occur during major sporting events.

“GridGain’s ability to dynamically add and subtract nodes in the cloud has been critical to cost-effectively scaling our business and meeting our performance goals, even during extreme usage spikes, such as the Grand National,” says Lawrence. “And distributing our cluster across multiple datacentres has ensured availability.”

With GridGain, FSB also easily spun up another instance of the in-memory computing platform for a partner that wanted to run on a dedicated instance of the managed service.

In-memory processing is becoming more mainstream. At the end of last year, Stephen Hawking’s Centre for Theoretical Cosmology (Cosmos) announced it would be using HPE’s in-memory technology to further its research into the early universe and black holes. And in July last year, Japanese telco NTT Docomo rolled out SAP’s Hana in-memory database to analyse data from retail outlets and customer call centres in a bid to improve service levels.

37
Azure Security Center, Microsoft cloud-based security platform for customer instances, now supports custom assessments, allowing users to tailor the system to their needs.

In its standard form, Azure Security Center uses a set of over 150 rules that are used to harden an operating system, spanning firewalls, password policies and other factors that contribute to the system software's security posture.

Azure instances that stray from these rules trigger a security recommendation, alerting users that their virtual machines are vulnerable to attack, unauthorized access and other malicious activities that can lead to a data breach or service disruption.

Visit the Link:

http://www.eweek.com/database/microsoft-adds-custom-cloud-assessments-to-azure-security-center

38
Oracle, which once shuddered at the idea that cloud computing was the wave of the future because it was cutting in on its heavy-metal on-premises systems, now embraces its new business model—cloud services—as if all that history never happened.

There’s lots of evidence about this old criticism 10 years ago of the biggest trend in IT since client-server; you can look it up. We all make mistakes, and companies and founders of companies are no exceptions. Good thing for Oracle shareholders that the company saw the light a half-dozen years ago, read the trend correctly and came up with its own cloud service business (in 2012) before it was too late.

Advancing the timeline to Feb. 12, 2018, Oracle President of Product Development Thomas Kurian (who wasn’t with the company back when it was criticizing clouds, by the way) demonstrated the latest advances in Oracle Cloud Platform at the CloudWorld conference in New York City.


Visit the source:
http://www.eweek.com/database/oracle-turns-on-autonomous-capabilities-throughout-cloud-platform

40
Just as Uber owns no cars itself, iPass owns no actual Wi-Fi network access points (APs), but acts instead as an aggregator of publicly available hotspots in locations such as airports, hotels and shopping malls, selling its single sign-on service to both consumers and enterprises with highly mobile workforces.
It counts airlines such as Emirates and SAS – which offers pilots access to the iPass network in their electronic flight bags (EFBs) – among its key customers, as well as large tech enterprises such as Ericsson and Hitachi, and incidentally, Uber. Its network partners include big names such as AT&T, BT, China Telecom, Orange and Swisscom.

Around three and a half years ago, the organisation decided to pivot from simply reselling connectivity services to a software company offering connectivity as a service powered by big data, analytics and machine learning.

Tomasz Magdanski, iPass director of big data and analytics, who transferred from London to Silicon Valley to lead the project, explained that this pivot came about because while iPass had successfully established itself as a connectivity layer for its customers, it effectively had to also maintain the whole service despite the physical layer being owned by someone else entirely.

“The challenge we used to have was how to monitor and manage the user experience, and steer customers on where to connect, where not to connect and why, based on things like reliability and security,” Magdanski told Computer Weekly.

“We wanted a platform capable of collecting all the data our applications around the world are generating and merging that into a recommendation engine to serve our clients.”

Hadoop cluster
To start out, Magdanski explained, he implemented an on-premise Hadoop cluster in iPass’ datacentre, but this turned out to be a less than optimal decision – the build and implementation grew so complex that the process ended up taking iPass 18 months.

“It was brand-new technology and we didn’t have the expertise in-house,” he said. “By the time it was done there were Spark upgrades and other parts of the infrastructure needed replacing, so the team had to start all over with new versions, new cycles, to just stay current with the technology. It was very hard to stay on top of it with a small team and a small budget.

“The next thing to hit us was scale – we got Hadoop up and running but the data and compute started to scale to the point where we started evaluating whether it even made sense to run big data, it was so overwhelming.”

As the crisis at iPass’ datacentre deepened, Magdanski knew the only solution was to move from on-premise Hadoop up into the cloud with additional Apache Spark to better manage and scale the organisation’s data, and meet customer expectations of 24/7 network access.

“We looked at cloud providers and ended up comparing Amazon EMR (Elastic MapReduce) to Databricks. Based on the experience of the previous 18 months, we knew we needed expertise more than just a cloud environment, but Databricks was actually able to do both,” he said.

“They gave us the ability to launch clusters with the touch of a button, reconfigure on the fly, and access to their engineers to bring us up to speed.

“It took 18 months to set up the first iteration of our big data initiative, and six weeks to set up, build and release the second.”
Databricks’ Unified Analytics Platform is specifically designed to eliminate the need for disparate tools, reduce deployment times, and improve the productivity of data engineering teams, something Magdanski says has been a noticeable plus for iPass. His team is now also using Databricks to build exact, transform, load (ETL) pipelines, streaming, analytics, machine learning, data lake processing and reporting.

“Databricks was the obvious solution as it’s equipped to collect and scale massive amounts of Wi-Fi data. We have been using Databricks in development through to production, which has allowed our data engineers to focus less on building infrastructure and more on creating new data products and providing the best Wi-Fi recommendations to mobile devices,” said Magdanski.

Beyond giving iPass’ IT department some much-needed breathing space, the implementation of Databricks has helped the organisation deliver more relevant and timely information to its customers using data-driven algorithms.

Helping users’ connectivity decisions
It can now help users make far better connectivity decisions by providing them with data on white- or blacklisted APs, steering connections to Wi-Fi or 4G mobile networks based on performance, or from unprotected hotspots to protected ones, for example.

“We now provide reports and dashboards to customers to help them understand their network usage and how our platform fits into their broader connectivity strategy, keeping them secure and productive as they travel,” said Magdanski.

The implementation has also enabled iPass to introduce new product lines. In late 2017, it announced Veri-Fi, a suite of analytics services that effectively productises and monetises the network data itself.

The aggregator is now able to sell data on aspects of performance such as AP functionality, customer experience, signal interference and more back into its service provider partners – the actual network owners – to help them monitor, manage and improve their own services.

For iPass, this is creating more meaningful, deeper relationships with its business partners, and most importantly for its shareholders, new revenue streams.

41
Speakers at the Singapore launch of Tableau’s new data preparation tool said the growing prevalence of data analytics is due to the exponential growth in data volume and leaps in computing power. This presents huge opportunities for organisations to innovate and optimise their businesses.
“Data is the lifeblood of pervasive innovation,” said JY Pook, senior vice-president at Tableau Asia Pacific. “Everyone should be skilful in making decisions based on data on behalf of their organisation. We see smart nation initiatives, organisations and tertiary institutions all working towards understanding their data, and building a data culture.”

For example, the Singapore Sports Institute, a government statutory board, uses analytics to analyse the performance of more than 70 high-performance athletes to help them train effectively for competitions and forecast success.

“In the future, competitiveness is going to be determined by the ability of organisations to manage data,” said Fermin Diez, an adjunct professor at Singapore Management University.

Diez gave the example of how ride-hailing companies Grab and Uber both harness analytics in their operations, but Grab gained the upper hand through better use of technology.

He noted that although Uber, a global competitor, had entered Asia with a superior platform, Malaysian startup Grab used analytics to understand driver and customer needs better, eventually overtaking Uber in market share.
In fact, more than 60% of Grab’s staff engage in data analysis, said Diez. “We are seeing a broadening of analytics skills to as many people as possible, instead of depending only on data scientists who may seem like high priests in black robes in some back room gazing into crystal balls,” he quipped.

It is a challenge to prepare the data for analysis because it may be in the wrong shape or residing in disparate sources. A recent Harvard Business Review study found that people spend 80% of their time cleaning and shaping data, and only 20% of their time analysing it.

Besides having clean data, Diez said the main challenge is to convince people that data analytics does not always start with data.

“You need to figure out the business problem you are trying to solve,” he said, adding that organisations should consider the hypotheses they need to prove, or disprove, before figuring out what data to look at.

He gave the example of how HR departments tend to link employee turnover to pay, while another approach is to explore the relationship between turnover and company profits.

A trailblazer in this space is Merck Group. The global life science and technology company is using HR data to make business processes more efficient. It has rolled out an analytics app that allows managers and HR employees to compile real-time data for business units. This data is combined with payroll, accounting and employee engagement data to enable decision-making on a variety of issues.

“We often say we have this data and ask what insights it throws up…but part of the problem is that we haven’t trained people to ask a question and then look at data,” said Diez. “That type of fishing is much harder.”

Despite the challenges for emerging ASEAN countries to adopt analytics technology, some have made headway.

“The pace of innovation and adoption of modern business intelligence and analytics is amazing in some subsidiaries [in emerging countries],” said Tableau’s Pook. “They are very nimble and fast. They are doing things in the analytics space that their parent company is not doing yet.”

42
The autonomous self-patching, self-healing database, the first version of which is 18c, is a part of a long-term play to help draw the company’s customers into Oracle’s piece of the cloud – which is increasingly packing itself with cloud-based applications and services.
Hurd said it could take almost a year to get on-premise databases patched, whereas patching was instant with the autonomous version. “If everyone had the autonomous database, that would change to instantaneous,” he said.

So where does that leave Oracle DBAs around the world? Possibly in the unemployment queue, at least according to Hurd.

“There are hundreds of thousands of DBAs managing Oracle databases. If all of that moved to the autonomous database, the number would change to zero,” Hurd said at an Oracle media event in Redwood Shores, California.

That could be appealing for companies with large rosters of DBAs from a cost-cutting perspective, but that day is most likely many years away.

It is early days for the 18c version, which became available in March 2018, and most Oracle customers are still kicking its tyres.

Three customers on a panel said they were evaluating 18c with a view of using it in the future. They were Michael Sherwood, IT director for the city of Las Vegas; Glenn Coles, Yamaha US CIO; and Lynden Tennison, CIO of Union Pacific Corporation. Hertz and Accenture are also likely early 18c users.

Meanwhile, Pat Sullivan, Accenture’s North America Oracle business group lead, said at the event that his firm has 20,000 DBAs and their future looked reasonably rosy with many set to become more specialised database experts – if the basic database maintenance role went away with the autonomous version.

Hurd said the performance boost from 18c was on a level with the company’s high-end Exadata Database Machine, used by around 5% of Oracle’s on-premise customers.
“They [Exadata customers] get 20 times better performance than our traditional on-premise customers. Imagine everybody getting Exadata-plus performance. Extreme performance, totally patched, totally optimised,” he said.

Hurd was also very bullish on Oracle’s applications business, which includes revenues from on-premise support, on-premise licensing and software as a service (SaaS).

“I made a prediction in the middle of last year that [the applications business] would grow double digits…and that will happen for us during the year,” he said.

The company is also throwing in top-level platinum level support at no cost for anyone using Oracle’s Fusion SaaS applications. The support package includes 24/7 rapid response technical support, proactive technical monitoring, implementation guidance and improved on-demand education resources.

On the autonomous platform-as-a-service front, which Oracle is increasingly targeting as a future cash cow, the company announced the availability of three new services that have baked in artificial intelligence (AI) and machine learning algorithms. These are the Oracle Autonomous Analytics Cloud, Oracle Autonomous Integration Cloud, and Oracle Autonomous Visual Builder Cloud.

43
Software Engineering / How to use Azure Cosmos DB in .Net
« on: May 09, 2018, 06:08:32 PM »
Azure Cosmos DB is an easy-to-use, scalable, NoSQL database available in Microsoft’s Azure cloud. Cosmos DB is hosted on SSDs and hence data storage and retrieval is very fast. This database service also supports multiple data models including document, key-value, graph, and columnar, so can be leveraged for all kinds of applications. We will focus on Cosmos DB’s document data model, which is accessed through the SQL API (formerly known as the DocumentDB API).

Similar to MongoDB and RavenDB, the document data model in Azure Cosmos DB allows you to store data represented as JSON without an enforced schema. In this article I will discuss the use of Azure Cosmos DB as a JSON document store and how we can work with it in .Net.
Getting started with Azure Cosmos DB
First create a free Microsoft Azure account if you don’t have one. Next, create and configure an Azure Cosmos DB account from the Azure portal. When creating an Azure Cosmos DB account, you will be prompted to select any one of the five API options: Gremlin (graph database), MongoDB (document database), SQL (document database), Cassandra (columnar database), and Azure Table (key-value store). Select SQL as the API for your Azure Cosmos DB account. You can take a look at this Microsoft Quickstart to learn how to configure Azure Cosmos DB with the SQL API.

Now that you have your Cosmos DB account, create a new console application project in Visual Studio and save it with a name. Next, select the project in the Solution Explorer window and install the Microsoft.Azure.DocumentDB .Net client library in your project from the NuGet Package Manager window. Alternatively, you can install this library via the NuGet Package Manager console using the following command.

44
Perhaps the least understood component of secondary storage strategy, archive has become a necessity for modern digital enterprises with petabytes of data and billions of files.
So, what exactly is archive, and why is it so important?

Archiving data involves moving data that is no longer frequently accessed off primary systems for long-term retention.
The most apparent benefit of archiving data is to save precious space on expensive primary NAS or to retain data for regulatory compliance, but archiving can reap long-term benefits for your business as well. For example, archiving the results of scientific experiments that would be costly to replicate can be extremely valuable later for future studies.

In addition, a strong archive tier can cost-effectively protect and enable usage of the huge data sets needed for enhanced analytics, machine learning, and artificial intelligence workflows.

Legacy archive fails for massive unstructured data
However, legacy archive infrastructure wasn’t built to meet the requirements of massive unstructured data, resulting in three key failures of legacy archive solutions.

First, the scale of data has changed greatly, from terabytes to petabytes and quickly growing. Legacy archive can’t move high volumes of data quickly enough and can’t scale with today’s exploding data sets.

Second, the way organizations use data has also changed. It’s no longer adequate to simply throw data into a vault and keep it safe; organizations need to use their archived data as digital assets become integral to business. As more organizations employ cloud computing and machine learning/AI applications using their huge repositories of data, legacy archive falls short in enabling usage of archived data.

Third, traditional data management must become increasingly automated and delivered as-a-Service to relieve management overhead on enterprise IT and reduce total cost of ownership as data explodes beyond petabytes.

Modern archive must overcome these failures of legacy solutions and meet the following requirements.

1. Ingest petabytes of data
Because today’s digital enterprises are generating and using petabytes of data and billions of files, a modern archive solution must have the capacity to ingest enormous amounts of data.

Legacy software uses single-threaded protocols to move data, which was necessary to write to tape and worked for terabyte-scale data but fail for today’s petabyte-scale data.

Modern archive needs highly parallel and latency-aware data movement to efficiently move data from where it lives to where it’s needed, without impacting performance. The ability to automatically surface archive-ready data and set policies to snapshot, move, verify, and re-export data can reduce administrator effort and streamline data management.

In addition, modern archive must be able to scale with exponentially growing data. Unlike legacy archive, which necessitates silos as data grows large, a scale-out archive tier keeps data within the same system for simpler management.

2. API-driven, cloud-native architecture
An API-driven archive solution can plug into customer applications, ensuring that the data can be used. Legacy software wasn’t designed with this kind of automation, making it difficult to use the data after it’s been archived.

Modern archive that’s cloud-native can much more easily plug into customer applications and enable usage. My company’s product, Igneous Hybrid Storage Cloud, is built with event-driven computing, applying the cloud-native concept of having interoperability at every step. Event-driven computing models tie compute to actions on data and are functionally API-driven, adding agility to the software. Building in compatibility with any application is simply a matter of exposing existing APIs to customer-facing applications.

This ensures that data can get used by customer applications. This capability is especially useful in the growing fields of machine learning and AI, where massive repositories of data are needed for compute. The more data, the better—which not only requires a scale-out archive tier, but one that enables that data to be computed.

An example of a machine learning/AI workflow used by Igneous customers involves using Igneous Hybrid Storage Cloud as the archive tier for petabytes of unstructured file data and moving smaller subsets of data to a “hot edge” primary tier from which the data can be processed and computed.

3. As-a-Service delivery
Many of the digital enterprises and organizations with enormous amounts of unstructured file data don’t necessarily have the IT resources or budget to match, let alone the capacity to keep pace with the growing IT requirements of their exponentially growing data.

To keep management overhead reasonable and cost-effective, many organizations are turning to as-a-service solutions. With as-a-service platforms, software is remotely monitored, updated, and troubleshooted, so that organizations can focus on their business, not IT.

Modern archive solutions that are delivered as-a-service can help organizations save on total cost of ownership (TCO) when taking into account the amount of time it frees up for IT administrators to focus on other tasks—like planning long-term data management and archiving strategy.

45
There are hundreds of tech-heavy database reviews out there, but they don’t always give clear guidance on the first step in selecting a database: choosing the best general type for a specific application. All databases are not created equal. Each has specific strengths and weaknesses. While it’s true that workarounds exist to make a favorite database work for most projects, using those tricks adds unnecessary complexity.

Before considering a specific database, take some time to think about what type would best support the project at hand. The question goes deeper than “SQL vs. NoSQL.” Read on for a rundown of the most common database types, the relative merits of each, and how to tell which is the best fit.

[ Which NoSQL database should you use? Let InfoWorld be your guide. NoSQL grudge match: MongoDB and Couchbase Server go nose to nose. • NoSQL standouts: The best key-value databases. • NoSQL standouts: The best document databases. | Go deep into analytics and big data with the InfoWorld Big Data and Analytics Report newsletter. ]
Relational database management systems (Oracle, MySQL, MS Server, PostgreSQL)
Relational databases were developed in the 1970s to handle the increasing flood of data being produced. They have a solid foundational theory and have influenced nearly every database system in use today.

Relational databases store data sets as “relations”: tables with rows and columns where all information is stored as a value of a specific cell. Data in an RDBMS is managed using SQL. Though there are different implementations, SQL is standardized and provides a level of predictability and utility.

After an early flood of vendors tried to take advantage of the system’s popularity with not-quite-relational products, creator E.F. Codd outlined a set of rules that must be followed by all relational database management systems. Codd’s 12 rules revolve around imposing strict internal structure protocols, making sure that searches reliably return requested data, and preventing structural alterations (at least by users). The framework ensured that relational databases are consistent and reliable to this day.
Strengths
Relational databases excel at handling highly structured data and provide support for ACID (Atomicity, Consistency, Isolation, and Durability) transactions. Data is easily stored and retrieved using SQL queries. The structure can be scaled up quickly because adding data without modifying existing data is simple.

Creating limits on what certain user types can access or modify is built into the structure of an RDBMS. Because of this, relational databases are well-suited to applications that require tiered access. For example, customers could view their accounts while agents could both view and make necessary changes.

Weaknesses
The biggest weakness of relational databases is the mirror of their biggest strength. As good as they are at handling structured data, they have a hard time with unstructured data. Representing real world entities in context is difficult in the bounds of an RDBMS. “Sliced” data has to be reassembled from tables into something more readable, and speed can be negatively impacted. The fixed schema doesn’t react well to change, either.

Cost is a consideration with relational databases. They tend to be more expensive to set up and grow. Horizontal scaling, or scaling by adding more servers, is usually both faster and more economical than vertical scaling, which involves adding more resources to a server. However, the structure of relational databases complicates the process. Sharding (where data is horizontally partitioned and distributed across a collection of machines) is necessary to scale out a relational database. Sharding relational databases while maintaining ACID compliance can be a challenge.

Use a relational database for:
Situations where data integrity is absolutely paramount (i.e., for financial applications, defense and security, and private health information)
Highly structured data
Automation of internal processes
Document store (MongoDB, Couchbase)
A document store is a nonrelational database that stores data in JSON, BSON, or XML documents. They feature a flexible schema. Unlike SQL databases, where users must declare a table’s schema before inserting data, document stores don’t enforce document structure. Documents can contain any data desired. They have key-value pairs but also embed attribute metadata to make querying easier.

[ The essentials from InfoWorld: NoSQL grudge match: MongoDB vs. Couchbase Server • Review: MongoDB learns cool new tricks • The essential guide to MongoDB security. • How to work with MongoDB in .Net. | Go deep into analytics and big data with the InfoWorld Big Data and Analytics Report newsletter. ]
Strengths
Document stores are very flexible. They handle semistructured and unstructured data well. Users don’t need to know during set-up what types of data will be stored, so this is a good choice when it isn’t clear in advance what sort of data will be incoming.

Users can create their desired structure in a particular document without affecting all documents. Schema can be modified without causing downtime, which leads to high availability. Write speed is generally fast, as well.

Besides flexibility, developers like document stores because they’re easy to scale horizontally. The sharding necessary for horizontal scaling is much more intuitive than with relational databases, so document stores scale out fast and efficiently.

Weaknesses
Document databases sacrifice ACID compliance for flexibility. Also, while querying can be done in a document it’s not possible across documents.

Use a document database for:
Unstructured or semistructured data
Content management
In-depth data analysis
Rapid prototyping
Key-value store (Redis, Memcached)
A key-value store is a type of nonrelational database where each value is associated with a specific key. It’s also known as an associative array.

The “key” is a unique identifier associated only with the value. Keys can be anything allowed by the DBMS. In Redis, for example, keys man be any binary sequence up to 512MB.

“Values” are stored as blobs and don’t need predefined schema. They can take nearly any form: numbers, strings, counters, JSON, XML, HTML, PHP, binaries, images, short videos, lists, and even another key-value pair encapsulated in an object. Some DBMSs allow for the data type to be specified, but it isn’t mandatory.

[ The essentials from InfoWorld: Why Redis beats Memcached for caching • How to use Redis for real-time stream processing • Manage access control using Redis Bitfields. • How to work with Redis Cache in .Net. | Go deep into analytics and big data with the InfoWorld Big Data and Analytics Report newsletter. ]
Strengths
This style of database has a lot of positives. It’s incredibly flexible, able to handle a very wide array of data types easily. Keys are used to go straight to the value with no index searching or joins, so performance is high. Portability is another benefit: key-value stores can be moved from one system to another without rewriting code. Finally, they’re highly horizontally scalable and have lower operating costs overall.

Weaknesses
Flexibility comes at a price. It’s impossible to query values, because they’re stored as a blob and can only be returned as such. This makes it hard to do reporting or edit parts of values. Not all objects are easy to model as key-value pairs, either.

Use a key-value store for:
Recommendations
User profiles and settings
Unstructured data such as product reviews or blog comments
Session management at scale
Data that will be accessed frequently but not often updated
Wide-column store (Cassandra, HBase)
Wide-column stores, also called column stores or extensible record stores, are dynamic column-oriented nonrelational databases. They’re sometimes seen as a type of key-value store but have attributes of traditional relational databases as well.

Wide-column stores use the concept of a keyspace instead of schemas. A keyspace encompasses column families (similar to tables but more flexible in structure), each of which contains multiple rows with distinct columns. Each row doesn’t need to have the same number or type of column. A timestamp determines the most recent version of data.

[ The essentials from InfoWorld: Get to know Cassandra, the NoSQL maverick • Review: Cassandra lowers the barriers to big data. | HBase: The database big data left behind. • Review: HBase is massively scalable—and hugely complex. | Go deep into analytics and big data with the InfoWorld Big Data and Analytics Report newsletter. ]
Strengths
This type of database has some benefits of both relational and nonrelational databases. It deals better with both structured and semistructured data than other nonrelational databases, and it’s easier to update. Compared to relational databases, it’s more horizontally scalable and faster at scale.

Columnar databases compress better than row-based systems. Also, large data sets are simple to explore. Wide-column stores are particularly good at aggregation queries, for example.

Weaknesses
Writes are expensive in the small. While updating is easy to do in bulk, uploading and updating individual records is hard. Plus, wide-column stores are slower than relational databases when handling transactions.

Use a wide-column store for:
Big data analytics where speed is important
Data warehousing on big data
Large scale projects (this database style is not a good tool for average transactional applications)
Search engine (Elasticsearch)
It may seem strange to include search engines in an article about database types. However, Elasticsearch has seen increased popularity in this sphere as developers look for innovative ways to cut down search lag. Elastisearch is a nonrelational, document-based data storage and retrieval solution specifically arranged and optimized for the storage and rapid retrieval of data.

Strengths
Elastisearch is very scalable. It features flexible schema and fast retrieval of records, with advanced search options including full text search, suggestions, and complex search expressions.

One of the most interesting search features is stemming. Stemming analyzes the root form of a word to find relevant records even when another form is used. For example, a user searching an employment database for “paying jobs” would also find positions tagged as “paid” and “pay.”

Weaknesses
Elastisearch is used more as an intermediary or supplementary store than a primary database. It has low durability and poor security. There’s no innate authentication or access control. Also, Elastisearch doesn’t support transactions.

Use a search engine like Elastisearch for:
Improving user experience with faster search results
Logging
Final considerations
Some applications fit neatly in the strengths of one specific database type, but for most projects there’s overlap between two or more. In those cases, it can be useful to look at which specific databases in the contended styles are good candidates. Vendors offer a wide spectrum of features for tailoring their database to individual standards. Some of these may help resolve uncertainty over factors like security, scalability, and cost.

This article is published as part of the IDG Contributor Network. Want to Join?

Pages: 1 2 [3] 4 5 ... 7