Topics - Tapushe Rabaya Toma

Software Engineering / How to Choose a Data Warehouse That Is the Right Fit for Your Company

« on: May 09, 2018, 06:22:08 PM »

Visit the Link:

http://www.eweek.com/database/how-to-choose-a-data-warehouse-that-is-the-right-fit-for-your-company

Software Engineering / Microsoft Adds Custom Cloud Assessments to Azure Security Center

« on: May 09, 2018, 06:21:06 PM »

Azure Security Center, Microsoft's cloud-based security platform for customer instances, now supports custom assessments, allowing users to tailor the system to their needs.

In its standard form, Azure Security Center applies a set of more than 150 rules for hardening an operating system, spanning firewalls, password policies and other factors that contribute to the system software's security posture.

Azure instances that stray from these rules trigger a security recommendation, alerting users that their virtual machines are vulnerable to attack, unauthorized access and other malicious activities that can lead to a data breach or service disruption.
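To illustrate how rule-based hardening checks and custom assessments fit together, the sketch below evaluates a VM configuration against a tiny baseline plus one custom rule and emits a recommendation for each failure. Everything in it is hypothetical: the rule names, the configuration keys, the 14-character threshold and the `assess` function do not reflect Azure Security Center's actual rule set or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical VM configuration: a flat dict of settings (not an Azure schema).
VmConfig = Dict[str, object]

@dataclass
class Rule:
    name: str
    check: Callable[[VmConfig], bool]  # returns True when the VM complies
    recommendation: str

# A tiny, hypothetical stand-in for a built-in baseline (the real one has 150+ rules).
BASELINE: List[Rule] = [
    Rule("firewall-enabled", lambda c: c.get("firewall_enabled") is True,
         "Enable the host firewall."),
    Rule("min-password-length", lambda c: int(c.get("min_password_length", 0)) >= 14,
         "Require passwords of at least 14 characters."),
]

def assess(config: VmConfig, rules: List[Rule]) -> List[str]:
    """Return one recommendation for each rule the configuration violates."""
    return [f"{r.name}: {r.recommendation}" for r in rules if not r.check(config)]

# A hypothetical custom assessment appended to the baseline.
custom = Rule("rdp-disabled", lambda c: c.get("rdp_open") is False,
              "Close the RDP port to the internet.")

vm = {"firewall_enabled": True, "min_password_length": 8, "rdp_open": True}
for rec in assess(vm, BASELINE + [custom]):
    print(rec)
```

Because custom rules share the same shape as baseline rules, tailoring the system amounts to extending the list that gets evaluated.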

Visit the Link:

http://www.eweek.com/database/microsoft-adds-custom-cloud-assessments-to-azure-security-center

Software Engineering / Oracle Turns On Autonomous Capabilities Throughout Cloud Platform

« on: May 09, 2018, 06:20:21 PM »

Oracle, which once shuddered at the idea that cloud computing was the wave of the future because it was cutting in on its heavy-metal on-premises systems, now embraces its new business model—cloud services—as if all that history never happened.

There is plenty of evidence of the company's criticism, 10 years ago, of the biggest trend in IT since client-server; you can look it up. We all make mistakes, and companies and their founders are no exception. Good thing for Oracle shareholders that the company saw the light a half-dozen years ago, read the trend correctly and launched its own cloud service business (in 2012) before it was too late.

Advancing the timeline to Feb. 12, 2018, Oracle President of Product Development Thomas Kurian (who wasn’t with the company back when it was criticizing clouds, by the way) demonstrated the latest advances in Oracle Cloud Platform at the CloudWorld conference in New York City.

Visit the source:
http://www.eweek.com/database/oracle-turns-on-autonomous-capabilities-throughout-cloud-platform

Software Engineering / Nine Best Practices for Keeping Bad Actors Out of a Database

« on: May 09, 2018, 06:17:42 PM »

Visit the Link:
http://www.eweek.com/database/nine-best-practices-for-keeping-bad-actors-out-of-a-database

Software Engineering / Wi-Fi provider uses big data analytics to drive customer experience

« on: May 09, 2018, 06:15:48 PM »

Just as Uber owns no cars itself, iPass owns no actual Wi-Fi network access points (APs), but acts instead as an aggregator of publicly available hotspots in locations such as airports, hotels and shopping malls, selling its single sign-on service to both consumers and enterprises with highly mobile workforces.
It counts airlines such as Emirates and SAS – which offers pilots access to the iPass network in their electronic flight bags (EFBs) – among its key customers, as well as large tech enterprises such as Ericsson and Hitachi, and incidentally, Uber. Its network partners include big names such as AT&T, BT, China Telecom, Orange and Swisscom.

Around three and a half years ago, the organisation decided to pivot from simply reselling connectivity services to becoming a software company that offers connectivity as a service, powered by big data, analytics and machine learning.

Tomasz Magdanski, iPass director of big data and analytics, who transferred from London to Silicon Valley to lead the project, explained that the pivot came about because, although iPass had successfully established itself as a connectivity layer for its customers, it also had to maintain the whole service even though someone else entirely owned the physical layer.

“The challenge we used to have was how to monitor and manage the user experience, and steer customers on where to connect, where not to connect and why, based on things like reliability and security,” Magdanski told Computer Weekly.

“We wanted a platform capable of collecting all the data our applications around the world are generating and merging that into a recommendation engine to serve our clients.”

Hadoop cluster
To start out, Magdanski explained, he implemented an on-premise Hadoop cluster in iPass’ datacentre, but this turned out to be a less than optimal decision – the build and implementation grew so complex that the process ended up taking iPass 18 months.

“It was brand-new technology and we didn’t have the expertise in-house,” he said. “By the time it was done there were Spark upgrades and other parts of the infrastructure needed replacing, so the team had to start all over with new versions, new cycles, to just stay current with the technology. It was very hard to stay on top of it with a small team and a small budget.

“The next thing to hit us was scale – we got Hadoop up and running but the data and compute started to scale to the point where we started evaluating whether it even made sense to run big data, it was so overwhelming.”

As the crisis at iPass’ datacentre deepened, Magdanski knew the only solution was to move from on-premise Hadoop up into the cloud with additional Apache Spark to better manage and scale the organisation’s data, and meet customer expectations of 24/7 network access.

“We looked at cloud providers and ended up comparing Amazon EMR (Elastic MapReduce) to Databricks. Based on the experience of the previous 18 months, we knew we needed expertise more than just a cloud environment, but Databricks was actually able to do both,” he said.

“They gave us the ability to launch clusters with the touch of a button, reconfigure on the fly, and access to their engineers to bring us up to speed.

“It took 18 months to set up the first iteration of our big data initiative, and six weeks to set up, build and release the second.”
Databricks’ Unified Analytics Platform is specifically designed to eliminate the need for disparate tools, reduce deployment times and improve the productivity of data engineering teams, which Magdanski says has been a noticeable plus for iPass. His team now also uses Databricks to build extract, transform, load (ETL) pipelines, and for streaming, analytics, machine learning, data lake processing and reporting.

“Databricks was the obvious solution as it’s equipped to collect and scale massive amounts of Wi-Fi data. We have been using Databricks in development through to production, which has allowed our data engineers to focus less on building infrastructure and more on creating new data products and providing the best Wi-Fi recommendations to mobile devices,” said Magdanski.

Beyond giving iPass’ IT department some much-needed breathing space, the implementation of Databricks has helped the organisation deliver more relevant and timely information to its customers using data-driven algorithms.

Helping users’ connectivity decisions
It can now help users make far better connectivity decisions by providing them with data on white- or blacklisted APs, steering connections to Wi-Fi or 4G mobile networks based on performance, or from unprotected hotspots to protected ones, for example.
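The steering idea can be made concrete with a toy ranking function. It drops blacklisted or unreliable APs, prefers protected hotspots over open ones, uses measured throughput to break ties, and falls back to the mobile network when no Wi-Fi candidate is acceptable. The fields, the 0.8 success-rate threshold and the ordering are illustrative assumptions, not iPass's actual recommendation algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class AccessPoint:
    ssid: str
    protected: bool         # True if the hotspot uses encryption (e.g. WPA2)
    success_rate: float     # hypothetical share of past connections that worked, 0..1
    throughput_mbps: float  # hypothetical median measured throughput

def choose_network(aps: List[AccessPoint], blacklist: Set[str],
                   min_success: float = 0.8) -> Optional[str]:
    """Return the SSID to join, or None to stay on the mobile (4G) network."""
    candidates = [ap for ap in aps
                  if ap.ssid not in blacklist and ap.success_rate >= min_success]
    if not candidates:
        return None  # no acceptable Wi-Fi: keep the cellular connection
    # Prefer protected hotspots first, then higher throughput.
    best = max(candidates, key=lambda ap: (ap.protected, ap.throughput_mbps))
    return best.ssid

aps = [
    AccessPoint("Airport_Free", protected=False, success_rate=0.95, throughput_mbps=40.0),
    AccessPoint("Lounge_Secure", protected=True, success_rate=0.90, throughput_mbps=25.0),
    AccessPoint("Cafe_Guest", protected=True, success_rate=0.50, throughput_mbps=60.0),
]
print(choose_network(aps, blacklist={"Rogue_AP"}))
```

Here "Cafe_Guest" is filtered out as unreliable even though it is the fastest, and the protected lounge hotspot wins over the faster open one.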

“We now provide reports and dashboards to customers to help them understand their network usage and how our platform fits into their broader connectivity strategy, keeping them secure and productive as they travel,” said Magdanski.

The implementation has also enabled iPass to introduce new product lines. In late 2017, it announced Veri-Fi, a suite of analytics services that effectively productises and monetises the network data itself.

The aggregator can now sell data on aspects of performance such as AP functionality, customer experience, signal interference and more back to its service provider partners (the actual network owners) to help them monitor, manage and improve their own services.

For iPass, this is creating more meaningful, deeper relationships with its business partners, and most importantly for its shareholders, new revenue streams.

Software Engineering / Data analytics is about culture, not technology

« on: May 09, 2018, 06:14:06 PM »

Speakers at the Singapore launch of Tableau’s new data preparation tool said the growing prevalence of data analytics is due to the exponential growth in data volume and leaps in computing power. This presents huge opportunities for organisations to innovate and optimise their businesses.
“Data is the lifeblood of pervasive innovation,” said JY Pook, senior vice-president at Tableau Asia Pacific. “Everyone should be skilful in making decisions based on data on behalf of their organisation. We see smart nation initiatives, organisations and tertiary institutions all working towards understanding their data, and building a data culture.”

For example, the Singapore Sports Institute, a government statutory board, uses analytics to track the performance of more than 70 high-performance athletes, helping them train effectively for competitions and forecasting their success.

“In the future, competitiveness is going to be determined by the ability of organisations to manage data,” said Fermin Diez, an adjunct professor at Singapore Management University.

Diez gave the example of how ride-hailing companies Grab and Uber both harness analytics in their operations, but Grab gained the upper hand through better use of technology.

He noted that although Uber, a global competitor, had entered Asia with a superior platform, Malaysian startup Grab used analytics to understand driver and customer needs better, eventually overtaking Uber in market share.
In fact, more than 60% of Grab’s staff engage in data analysis, said Diez. “We are seeing a broadening of analytics skills to as many people as possible, instead of depending only on data scientists who may seem like high priests in black robes in some back room gazing into crystal balls,” he quipped.

It is a challenge to prepare the data for analysis because it may be in the wrong shape or residing in disparate sources. A recent Harvard Business Review study found that people spend 80% of their time cleaning and shaping data, and only 20% of their time analysing it.

Besides having clean data, Diez said the main challenge is to convince people that data analytics does not always start with data.

“You need to figure out the business problem you are trying to solve,” he said, adding that organisations should consider the hypotheses they need to prove, or disprove, before figuring out what data to look at.

He gave the example of how HR departments tend to link employee turnover to pay, while another approach is to explore the relationship between turnover and company profits.

A trailblazer in this space is Merck Group. The global life science and technology company is using HR data to make business processes more efficient. It has rolled out an analytics app that allows managers and HR employees to compile real-time data for business units. This data is combined with payroll, accounting and employee engagement data to enable decision-making on a variety of issues.

“We often say we have this data and ask what insights it throws up…but part of the problem is that we haven’t trained people to ask a question and then look at data,” said Diez. “That type of fishing is much harder.”

Despite the challenges emerging ASEAN countries face in adopting analytics technology, some have made headway.

“The pace of innovation and adoption of modern business intelligence and analytics is amazing in some subsidiaries [in emerging countries],” said Tableau’s Pook. “They are very nimble and fast. They are doing things in the analytics space that their parent company is not doing yet.”

Software Engineering / Oracle’s autonomous database could leave DBAs unemployed

« on: May 09, 2018, 06:11:19 PM »

The autonomous self-patching, self-healing database, the first version of which is 18c, is part of a long-term play to draw the company’s customers into Oracle’s piece of the cloud, which is increasingly packed with cloud-based applications and services.
Oracle co-CEO Mark Hurd said it could take almost a year to get on-premise databases patched. “If everyone had the autonomous database, that would change to instantaneous,” he said.

So where does that leave Oracle DBAs around the world? Possibly in the unemployment queue, at least according to Hurd.

“There are hundreds of thousands of DBAs managing Oracle databases. If all of that moved to the autonomous database, the number would change to zero,” Hurd said at an Oracle media event in Redwood Shores, California.

That could be appealing for companies with large rosters of DBAs from a cost-cutting perspective, but that day is most likely many years away.

It is early days for the 18c version, which became available in March 2018, and most Oracle customers are still kicking its tyres.

Three customers on a panel said they were evaluating 18c with a view to using it in the future: Michael Sherwood, IT director for the city of Las Vegas; Glenn Coles, Yamaha US CIO; and Lynden Tennison, CIO of Union Pacific Corporation. Hertz and Accenture are also likely early 18c users.

Meanwhile, Pat Sullivan, Accenture’s North America Oracle business group lead, said at the event that his firm has 20,000 DBAs and their future looked reasonably rosy with many set to become more specialised database experts – if the basic database maintenance role went away with the autonomous version.

Hurd said the performance boost from 18c was on a level with the company’s high-end Exadata Database Machine, used by around 5% of Oracle’s on-premise customers.
“They [Exadata customers] get 20 times better performance than our traditional on-premise customers. Imagine everybody getting Exadata-plus performance. Extreme performance, totally patched, totally optimised,” he said.

Hurd was also very bullish on Oracle’s applications business, which includes revenues from on-premise support, on-premise licensing and software as a service (SaaS).

“I made a prediction in the middle of last year that [the applications business] would grow double digits…and that will happen for us during the year,” he said.

The company is also offering its top-level Platinum support at no cost to anyone using Oracle’s Fusion SaaS applications. The support package includes 24/7 rapid response technical support, proactive technical monitoring, implementation guidance and improved on-demand education resources.

On the autonomous platform-as-a-service front, which Oracle is increasingly targeting as a future cash cow, the company announced the availability of three new services that have baked in artificial intelligence (AI) and machine learning algorithms. These are the Oracle Autonomous Analytics Cloud, Oracle Autonomous Integration Cloud, and Oracle Autonomous Visual Builder Cloud.

Software Engineering / How to use Azure Cosmos DB in .Net

« on: May 09, 2018, 06:08:32 PM »

Azure Cosmos DB is an easy-to-use, scalable, NoSQL database available in Microsoft’s Azure cloud. Cosmos DB is hosted on SSDs, so data storage and retrieval are very fast. This database service also supports multiple data models, including document, key-value, graph, and columnar, so it can be used for all kinds of applications. We will focus on Cosmos DB’s document data model, which is accessed through the SQL API (formerly known as the DocumentDB API).

Similar to MongoDB and RavenDB, the document data model in Azure Cosmos DB allows you to store data represented as JSON without an enforced schema. In this article I will discuss the use of Azure Cosmos DB as a JSON document store and how we can work with it in .Net.
Getting started with Azure Cosmos DB
First create a free Microsoft Azure account if you don’t have one. Next, create and configure an Azure Cosmos DB account from the Azure portal. When creating an Azure Cosmos DB account, you will be prompted to select any one of the five API options: Gremlin (graph database), MongoDB (document database), SQL (document database), Cassandra (columnar database), and Azure Table (key-value store). Select SQL as the API for your Azure Cosmos DB account. You can take a look at this Microsoft Quickstart to learn how to configure Azure Cosmos DB with the SQL API.

Now that you have your Cosmos DB account, create a new console application project in Visual Studio and save it with a name. Next, select the project in the Solution Explorer window and install the Microsoft.Azure.DocumentDB .Net client library from the NuGet Package Manager window. Alternatively, you can install the library from the NuGet Package Manager console with the command Install-Package Microsoft.Azure.DocumentDB.

Software Engineering / 3 requirements of modern archive for massive unstructured data

« on: May 09, 2018, 06:07:20 PM »

Perhaps the least understood component of secondary storage strategy, archive has become a necessity for modern digital enterprises with petabytes of data and billions of files.
So, what exactly is archive, and why is it so important?

Archiving data involves moving data that is no longer frequently accessed off primary systems for long-term retention.
The most apparent benefit of archiving data is to save precious space on expensive primary NAS or to retain data for regulatory compliance, but archiving can reap long-term benefits for your business as well. For example, archiving the results of scientific experiments that would be costly to replicate can be extremely valuable later for future studies.

In addition, a strong archive tier can cost-effectively protect and enable usage of the huge data sets needed for enhanced analytics, machine learning, and artificial intelligence workflows.

Legacy archive fails for massive unstructured data
However, legacy archive infrastructure wasn’t built to meet the requirements of massive unstructured data, resulting in three key failures of legacy archive solutions.

First, the scale of data has changed greatly, from terabytes to petabytes and quickly growing. Legacy archive can’t move high volumes of data quickly enough and can’t scale with today’s exploding data sets.

Second, the way organizations use data has also changed. It’s no longer adequate to simply throw data into a vault and keep it safe; organizations need to use their archived data as digital assets become integral to business. As more organizations employ cloud computing and machine learning/AI applications using their huge repositories of data, legacy archive falls short in enabling usage of archived data.

Third, traditional data management must become increasingly automated and delivered as-a-Service to relieve management overhead on enterprise IT and reduce total cost of ownership as data explodes beyond petabytes.

Modern archive must overcome these failures of legacy solutions and meet the following requirements.

1. Ingest petabytes of data
Because today’s digital enterprises are generating and using petabytes of data and billions of files, a modern archive solution must have the capacity to ingest enormous amounts of data.

Legacy software uses single-threaded protocols to move data, an approach that was necessary for writing to tape and worked for terabyte-scale data but fails for today’s petabyte-scale data.

Modern archive needs highly parallel and latency-aware data movement to efficiently move data from where it lives to where it’s needed, without impacting performance. The ability to automatically surface archive-ready data and set policies to snapshot, move, verify, and re-export data can reduce administrator effort and streamline data management.
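The "move, verify" part of that workflow can be sketched with a thread pool that copies files concurrently instead of one at a time, then compares SHA-256 checksums of source and destination. This is a minimal sketch that assumes both tiers are mounted as local paths. It is not Igneous's data mover: real systems add latency awareness, throttling, retries and snapshots.

```python
import hashlib
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Tuple

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def archive_one(src: Path, src_root: Path, dst_root: Path) -> Tuple[Path, bool]:
    """Copy one file under dst_root, preserving its relative path, then verify it."""
    dst = dst_root / src.relative_to(src_root)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return dst, sha256_of(src) == sha256_of(dst)

def archive_tree(src_root: Path, dst_root: Path, workers: int = 8) -> List[Tuple[Path, bool]]:
    """Copy every file under src_root concurrently; return (destination, verified) pairs."""
    files = [p for p in src_root.rglob("*") if p.is_file()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: archive_one(p, src_root, dst_root), files))
```

For example, `archive_tree(Path("/mnt/primary/project"), Path("/mnt/archive/project"))` (hypothetical mount points) returns one pair per file. Only files whose checksums match should then be released from primary storage.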

In addition, modern archive must be able to scale with exponentially growing data. Unlike legacy archive, which necessitates silos as data grows large, a scale-out archive tier keeps data within the same system for simpler management.

2. API-driven, cloud-native architecture
An API-driven archive solution can plug into customer applications, ensuring that the data can be used. Legacy software wasn’t designed with this kind of automation, making it difficult to use the data after it’s been archived.

Modern archive that’s cloud-native can much more easily plug into customer applications and enable usage. My company’s product, Igneous Hybrid Storage Cloud, is built with event-driven computing, applying the cloud-native concept of having interoperability at every step. Event-driven computing models tie compute to actions on data and are functionally API-driven, adding agility to the software. Building in compatibility with any application is simply a matter of exposing existing APIs to customer-facing applications.

This ensures that data can get used by customer applications. This capability is especially useful in the growing fields of machine learning and AI, where massive repositories of data are needed for compute. The more data, the better—which not only requires a scale-out archive tier, but one that enables that data to be computed.

An example of a machine learning/AI workflow used by Igneous customers involves using Igneous Hybrid Storage Cloud as the archive tier for petabytes of unstructured file data and moving smaller subsets of data to a “hot edge” primary tier from which the data can be processed and computed.
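The event-driven model described in this section ties compute to actions on data. A minimal in-process sketch: handlers subscribe to event names, and the archive publishes an event whenever an object lands, so a customer application (for example, an indexer feeding an ML pipeline) can react without polling. The `EventBus` class, the event names and the payload fields are hypothetical and are not part of Igneous Hybrid Storage Cloud's API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]

class EventBus:
    """Tiny publish/subscribe registry mapping event names to handlers (hypothetical)."""
    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> int:
        """Invoke every handler registered for the event; return how many ran."""
        handlers = self._handlers.get(event, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)

bus = EventBus()
indexed: List[str] = []
# A customer application registers interest in newly archived objects.
bus.subscribe("object.archived", lambda p: indexed.append(p["path"]))
bus.publish("object.archived", {"path": "/projects/genomics/run42.bam", "bytes": 7_340_032})
print(indexed)
```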

3. As-a-Service delivery
Many of the digital enterprises and organizations with enormous amounts of unstructured file data don’t necessarily have the IT resources or budget to match, let alone the capacity to keep pace with the growing IT requirements of their exponentially growing data.

To keep management overhead reasonable and cost-effective, many organizations are turning to as-a-service solutions. With as-a-service platforms, software is remotely monitored, updated, and troubleshot, so that organizations can focus on their business, not IT.

Modern archive solutions that are delivered as-a-service can help organizations save on total cost of ownership (TCO) when taking into account the amount of time it frees up for IT administrators to focus on other tasks—like planning long-term data management and archiving strategy.

Software Engineering / How to choose the right type of database for your enterprise

« on: May 09, 2018, 06:05:32 PM »

There are hundreds of tech-heavy database reviews out there, but they don’t always give clear guidance on the first step in selecting a database: choosing the best general type for a specific application. Not all databases are created equal. Each has specific strengths and weaknesses. While it’s true that workarounds exist to make a favorite database work for most projects, using those tricks adds unnecessary complexity.

Before considering a specific database, take some time to think about what type would best support the project at hand. The question goes deeper than “SQL vs. NoSQL.” Read on for a rundown of the most common database types, the relative merits of each, and how to tell which is the best fit.

Relational database management systems (Oracle, MySQL, Microsoft SQL Server, PostgreSQL)
Relational databases were developed in the 1970s to handle the increasing flood of data being produced. They have a solid foundational theory and have influenced nearly every database system in use today.

Relational databases store data sets as “relations”: tables with rows and columns where all information is stored as a value of a specific cell. Data in an RDBMS is managed using SQL. Though there are different implementations, SQL is standardized and provides a level of predictability and utility.

After an early flood of vendors tried to take advantage of the system’s popularity with not-quite-relational products, creator E.F. Codd outlined a set of rules that must be followed by all relational database management systems. Codd’s 12 rules revolve around imposing strict internal structure protocols, making sure that searches reliably return requested data, and preventing structural alterations (at least by users). The framework has kept relational databases consistent and reliable to this day.
Strengths
Relational databases excel at handling highly structured data and provide support for ACID (Atomicity, Consistency, Isolation, and Durability) transactions. Data is easily stored and retrieved using SQL queries. The structure can be scaled up quickly because adding data without modifying existing data is simple.
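SQLite, which ships with Python's standard library, is enough to sketch what ACID atomicity buys you: both halves of a funds transfer commit together, or a failed constraint rolls the whole transaction back. The `accounts` table and the `transfer` and `balances` helpers are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE accounts (
    id INTEGER PRIMARY KEY,
    owner TEXT NOT NULL,
    balance INTEGER NOT NULL CHECK (balance >= 0))""")
conn.executemany("INSERT INTO accounts (id, owner, balance) VALUES (?, ?, ?)",
                 [(1, "alice", 100), (2, "bob", 20)])
conn.commit()

def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move `amount` between accounts atomically; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected a negative balance
        return False

def balances(conn: sqlite3.Connection) -> dict:
    """Map each owner to their current balance with a plain SQL query."""
    return dict(conn.execute("SELECT owner, balance FROM accounts ORDER BY id"))

print(transfer(conn, 1, 2, 30), balances(conn))   # succeeds
print(transfer(conn, 2, 1, 500), balances(conn))  # fails; alice's provisional +500 is undone
```

In the failed transfer, the credit to alice has already executed when bob's debit violates the constraint. The rollback removes it, so no half-finished transfer is ever visible.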

Creating limits on what certain user types can access or modify is built into the structure of an RDBMS. Because of this, relational databases are well-suited to applications that require tiered access. For example, customers could view their accounts while agents could both view and make necessary changes.

Weaknesses
The biggest weakness of relational databases is the mirror of their biggest strength. As good as they are at handling structured data, they have a hard time with unstructured data. Representing real world entities in context is difficult in the bounds of an RDBMS. “Sliced” data has to be reassembled from tables into something more readable, and speed can be negatively impacted. The fixed schema doesn’t react well to change, either.

Cost is a consideration with relational databases. They tend to be more expensive to set up and grow. Horizontal scaling, or scaling by adding more servers, is usually both faster and more economical than vertical scaling, which involves adding more resources to a server. However, the structure of relational databases complicates the process. Sharding (where data is horizontally partitioned and distributed across a collection of machines) is necessary to scale out a relational database. Sharding relational databases while maintaining ACID compliance can be a challenge.
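Hash-based sharding is one common way to partition rows across machines. A stable hash of the shard key picks the server, as in the sketch below, which uses hypothetical shard names. The sketch also shows why ACID gets harder after sharding: a transaction touching two keys stays on one machine only if both keys hash to the same shard. Otherwise it needs a distributed commit protocol.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]  # hypothetical server names

def shard_for(key: str, shards: List[str] = SHARDS) -> str:
    """Map a shard key to a server with a stable hash (Python's built-in hash() is salted)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

def placement(keys: List[str]) -> Dict[str, List[str]]:
    """Group keys by the shard that would store them."""
    out: Dict[str, List[str]] = defaultdict(list)
    for k in keys:
        out[shard_for(k)].append(k)
    return dict(out)

customers = [f"customer-{i}" for i in range(10)]
print(placement(customers))
# A transfer between two customers is single-shard only if both keys land together:
print(shard_for("customer-1") == shard_for("customer-2"))
```

A design caveat: with plain modulo placement, changing the number of shards moves most keys. Production systems often use consistent hashing or range-based partitioning instead.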

Use a relational database for:
Situations where data integrity is absolutely paramount (e.g., financial applications, defense and security, and private health information)
Highly structured data
Automation of internal processes
Document store (MongoDB, Couchbase)
A document store is a nonrelational database that stores data in JSON, BSON, or XML documents. Document stores feature a flexible schema: unlike SQL databases, where users must declare a table’s schema before inserting data, they don’t enforce document structure. Documents can contain any data desired. They are made up of key-value pairs but also embed attribute metadata to make querying easier.
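Python's `json` module is enough to illustrate the flexible-schema point. Two documents in the same collection carry different fields, a new field can be added to one without touching the other, and a query skips documents that lack the requested attribute. The `products` collection and the `find` helper are hypothetical. They mimic, but are not, the query APIs of MongoDB or Couchbase.

```python
import json
from typing import Any, Dict, List

# Two documents in one "collection" with different shapes; no schema was declared.
products: List[Dict[str, Any]] = [
    json.loads('{"_id": 1, "name": "Laptop", "price": 999, "specs": {"ram_gb": 16}}'),
    json.loads('{"_id": 2, "name": "E-book", "price": 12, "formats": ["epub", "pdf"]}'),
]

# Adding a new field to one document does not affect the others.
products[1]["author"] = "A. Writer"

def get_path(doc: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path like 'specs.ram_gb'; return None if any step is missing."""
    cur: Any = doc
    for part in dotted.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur

def find(collection: List[Dict[str, Any]], dotted: str, value: Any) -> List[Dict[str, Any]]:
    """Return the documents whose field at `dotted` equals `value` (hypothetical helper)."""
    return [d for d in collection if get_path(d, dotted) == value]

print([d["name"] for d in find(products, "specs.ram_gb", 16)])
```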

Strengths
Document stores are very flexible. They handle semistructured and unstructured data well. Users don’t need to know during set-up what types of data will be stored, so this is a good choice when it isn’t clear in advance what sort of data will be incoming.

Users can create their desired structure in a particular document without affecting all documents. Schema can be modified without causing downtime, which leads to high availability. Write speed is generally fast, as well.

Besides flexibility, developers like document stores because they’re easy to scale horizontally. The sharding necessary for horizontal scaling is much more intuitive than with relational databases, so document stores scale out fast and efficiently.

Weaknesses
Many document databases relax ACID guarantees in exchange for flexibility. Also, while querying within a document is straightforward, querying across documents (for example, joining related documents) is limited.

Use a document database for:
Unstructured or semistructured data
Content management
In-depth data analysis
Rapid prototyping
Key-value store (Redis, Memcached)
A key-value store is a type of nonrelational database where each value is associated with a specific key. It’s also known as an associative array.

The “key” is a unique identifier associated only with its value. Keys can be anything allowed by the DBMS. In Redis, for example, keys may be any binary sequence up to 512MB.

“Values” are stored as blobs and don’t need predefined schema. They can take nearly any form: numbers, strings, counters, JSON, XML, HTML, PHP, binaries, images, short videos, lists, and even another key-value pair encapsulated in an object. Some DBMSs allow for the data type to be specified, but it isn’t mandatory.
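A key-value store behaves much like a dictionary whose values are opaque byte strings. The sketch below uses a hypothetical in-memory `KVStore`, not the Redis or Memcached API. Lookups go straight to the value by key, but changing one field inside a stored value means fetching the whole blob, decoding it, modifying it and writing it all back.

```python
import json
from typing import Dict, Optional

class KVStore:
    """Toy key-value store (hypothetical): values are opaque bytes, addressed only by key."""
    MAX_KEY_BYTES = 512 * 1024 * 1024  # mirrors the 512MB Redis key limit cited above

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def set(self, key: bytes, value: bytes) -> None:
        if len(key) > self.MAX_KEY_BYTES:
            raise ValueError("key too large")
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)  # direct lookup: no index scan, no joins

store = KVStore()
store.set(b"session:42", json.dumps({"user": "alice", "theme": "dark"}).encode())

# The store cannot update "theme" in place: read, decode, modify, re-encode, write back.
raw = store.get(b"session:42")
profile = json.loads(raw) if raw is not None else {}
profile["theme"] = "light"
store.set(b"session:42", json.dumps(profile).encode())
print(store.get(b"session:42"))
```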

Strengths
This style of database has a lot of positives. It’s incredibly flexible, able to handle a very wide array of data types easily. Keys are used to go straight to the value with no index searching or joins, so performance is high. Portability is another benefit: key-value stores can be moved from one system to another without rewriting code. Finally, they’re highly horizontally scalable and have lower operating costs overall.

Weaknesses
Flexibility comes at a price. Because values are stored as blobs, they generally can’t be queried and can only be returned whole. This makes it hard to do reporting or edit parts of values. Not all objects are easy to model as key-value pairs, either.

Use a key-value store for:
Recommendations
User profiles and settings
Unstructured data such as product reviews or blog comments
Session management at scale
Data that will be accessed frequently but not often updated
Wide-column store (Cassandra, HBase)
Wide-column stores, also called column stores or extensible record stores, are dynamic column-oriented nonrelational databases. They’re sometimes seen as a type of key-value store but have attributes of traditional relational databases as well.

Wide-column stores use the concept of a keyspace instead of schemas. A keyspace encompasses column families (similar to tables but more flexible in structure), each of which contains multiple rows with distinct columns. Each row doesn’t need to have the same number or type of column. A timestamp determines the most recent version of data.
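The keyspace, column family, row and timestamp hierarchy described above can be modeled with nested dictionaries. In this hypothetical sketch, each cell keeps timestamped versions and a read returns the newest one, which is how a timestamp determines the most recent version. Rows in the same column family need not share columns. The names `users` and `profiles` are invented, and the sketch ignores the storage, partitioning and compaction that real systems such as Cassandra or HBase perform.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

# keyspace -> column family -> row key -> column -> [(timestamp, value), ...]
Store = Dict[str, Dict[str, Dict[str, Dict[str, List[Tuple[int, Any]]]]]]

def new_store() -> Store:
    return defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list))))

def put(s: Store, ks: str, cf: str, row: str, col: str, value: Any, ts: int) -> None:
    """Append a new timestamped version of a cell; older versions are kept."""
    s[ks][cf][row][col].append((ts, value))

def get(s: Store, ks: str, cf: str, row: str, col: str) -> Optional[Any]:
    """Return the value with the highest timestamp, or None if the cell is absent."""
    versions = s.get(ks, {}).get(cf, {}).get(row, {}).get(col)
    if not versions:
        return None
    return max(versions, key=lambda v: v[0])[1]

s = new_store()
put(s, "users", "profiles", "u1", "email", "old@example.com", ts=100)
put(s, "users", "profiles", "u1", "email", "new@example.com", ts=200)
put(s, "users", "profiles", "u2", "phone", "+1-555-0100", ts=150)  # different columns per row
print(get(s, "users", "profiles", "u1", "email"))
```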

Strengths
This type of database has some benefits of both relational and nonrelational databases. It deals better with both structured and semistructured data than other nonrelational databases, and it’s easier to update. Compared to relational databases, it’s more horizontally scalable and faster at scale.

Columnar databases compress better than row-based systems. Also, large data sets are simple to explore. Wide-column stores are particularly good at aggregation queries, for example.
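Run-length encoding (RLE) shows why column-oriented layouts tend to compress well. Storing each low-cardinality column contiguously produces long runs of identical values, while interleaving values row by row breaks those runs. The sales data and the `rle` helper are illustrative only. Real engines combine several encodings (dictionary, delta, RLE) with general-purpose compression.

```python
from itertools import groupby
from typing import List, Tuple

def rle(values: List[str]) -> List[Tuple[str, int]]:
    """Run-length encode a sequence as (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]

# 1,000 hypothetical sales rows with two low-cardinality columns.
rows = [("EU" if i < 600 else "US", "online" if (i // 50) % 2 else "store")
        for i in range(1000)]

# Column layout: each column stored contiguously.
region_col = [r[0] for r in rows]
channel_col = [r[1] for r in rows]
column_runs = len(rle(region_col)) + len(rle(channel_col))

# Row layout: values interleaved row by row.
row_stream = [v for r in rows for v in r]
row_runs = len(rle(row_stream))

print(column_runs, row_runs)  # prints: 22 2000
```

The same 2,000 values need 22 runs when stored by column but 2,000 runs when stored by row, because adjacent values in the row stream always come from different columns.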

Weaknesses
Writes are expensive in the small. While updating is easy to do in bulk, uploading and updating individual records is hard. Plus, wide-column stores are slower than relational databases when handling transactions.

Use a wide-column store for:
Big data analytics where speed is important
Data warehousing on big data
Large-scale projects (this database style is not a good tool for average transactional applications)
Search engine (Elasticsearch)
It may seem strange to include search engines in an article about database types. However, Elasticsearch has become increasingly popular in this sphere as developers look for innovative ways to cut down search lag. Elasticsearch is a nonrelational, document-based data store optimized for the rapid retrieval of data.

Strengths
Elasticsearch is very scalable. It features flexible schemas and fast retrieval of records, with advanced search options including full-text search, suggestions, and complex search expressions.

One of the most interesting search features is stemming. Stemming analyzes the root form of a word to find relevant records even when another form is used. For example, a user searching an employment database for “paying jobs” would also find positions tagged as “paid” and “pay.”
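The stemming idea can be sketched with a toy suffix-stripping stemmer and an inverted index. This is not Elasticsearch's analyzer. The suffix list is deliberately tiny, and because these toy suffix rules would not reduce the irregular form "paid" to "pay", the sketch adds a small hypothetical exception table. The key point is that documents and queries pass through the same `stem` function, so "paying", "paid", and "pay" all meet at the stem "pay".

```python
IRREGULAR = {"paid": "pay"}    # hypothetical exception table for irregular forms
SUFFIXES = ("ing", "ed", "s")  # toy suffix rules, checked in order


def stem(word):
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    for suffix in SUFFIXES:
        # Only strip if a stem of at least three letters remains.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def build_index(docs):
    """Inverted index: stem -> set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(stem(token), set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents matching any stemmed query term."""
    hits = set()
    for token in query.split():
        hits |= index.get(stem(token), set())
    return sorted(hits)


docs = {1: "paid internship", 2: "pay negotiable", 3: "volunteer role"}
index = build_index(docs)
print(search(index, "paying jobs"))  # [1, 2]
```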

Weaknesses
Elasticsearch is used more as an intermediary or supplementary store than as a primary database. It has low durability and poor security: there’s no innate authentication or access control. Also, Elasticsearch doesn’t support transactions.

Use a search engine like Elasticsearch for:
Improving user experience with faster search results
Logging
Final considerations
Some applications fit neatly in the strengths of one specific database type, but for most projects there’s overlap between two or more. In those cases, it can be useful to look at which specific databases in the contended styles are good candidates. Vendors offer a wide spectrum of features for tailoring their database to individual standards. Some of these may help resolve uncertainty over factors like security, scalability, and cost.

This article is published as part of the IDG Contributor Network.

Software Engineering / Congolese health workers and patients benefit from Couchbase database

« on: May 09, 2018, 06:02:46 PM »

https://www.computerweekly.com/feature/Congolese-health-workers-and-patients-benefit-from-Couchbase-database

Software Engineering / Case Study: How Admiral made its Teradata data warehouse more agile

« on: May 09, 2018, 06:01:47 PM »

Admiral Group began using Teradata a few years ago because it was finding it difficult to update its old mainframe system to add new insurance products.

Explaining the company’s data warehouse plans, James Gardiner, data warehouse technical lead at Admiral, says: “We wanted to use the US GuideWire software to spin up new products. But all data was coming from the mainframe accessed via SAS or Excel.”

However, Gardiner admits that the Teradata project was slow and painful. “The project was run tactically to ensure the new system went in on time,” he says. “We ran a waterfall methodology and hand-coded Teradata. We had a complicated extract, transform and load (ETL) process to move data from our source systems into Teradata and the documentation was out of date.”
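Here is a minimal sketch of the three ETL stages Gardiner mentions. It is not Admiral's process. It assumes hypothetical pipe-delimited policy rows as the source and a simple type conversion as the transform. An in-memory SQLite database (Python's built-in `sqlite3`) stands in for the Teradata warehouse. Every function name and field here is made up for the example.

```python
import sqlite3


def extract():
    # Hypothetical raw policy rows as they might arrive from a source system.
    return [
        "P001|motor|2018-01-15|450.00",
        "P002|home|2018-02-03|310.50",
    ]


def transform(raw_rows):
    """Split each delimited row and convert fields to proper types."""
    records = []
    for line in raw_rows:
        policy_id, product, start_date, premium = line.split("|")
        records.append((policy_id, product.upper(), start_date, float(premium)))
    return records


def load(conn, records):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS policies "
        "(policy_id TEXT PRIMARY KEY, product TEXT, start_date TEXT, premium REAL)"
    )
    conn.executemany("INSERT INTO policies VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for the warehouse
load(conn, transform(extract()))
print(conn.execute("SELECT COUNT(*), SUM(premium) FROM policies").fetchone())  # (2, 760.5)
```

Writing a script like this by hand for every new business request is the kind of repetitive work the case study describes the team moving away from.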
Following the implementation, the company wanted to try to reimplement the data warehouse project. “We found the traditional method of implementation was not aligned to the business,” says Gardiner.

Because Admiral’s database team were not Teradata experts, Admiral needed Teradata consultants to write custom lines of code by hand for each business request. The company wanted a more agile approach to updating its Teradata data warehouse to enable it to turn around business requirements quickly, says Gardiner.

Admiral began looking at how it could automate the code generation for the Teradata data warehouse, he adds. “We did a proof of concept with WhereScape, which allowed us to become agile, so we could change our methodology.”
By using WhereScape, according to Gardiner, the data warehouse team can now work with the business on rapidly prototyping new ideas, which can then be further developed into products. “We can speed up development by six to eight times and be more flexible with the business,” he says.

“Essentially, we can load data a lot quicker. Admiral will try a lot of different products and spin up trials quickly to see if they bring in new customers.”

To support this, Gardiner says WhereScape allows Admiral to build warehouse components very quickly. Previously, this would have taken months.

The team supporting Teradata is also a lot smaller. “The new WhereScape project has 20 people on the team,” he says. “The previous tactical project had 60 to 80 people.”

Given that Admiral’s data team is mainly trained in SQL Server, Gardiner says: “We can take someone in SQL Server and get this building on Teradata, without lots of hand-holding.”

Software Engineering / Distributed SQL database wagered on for scalability boost, GDPR

« on: May 09, 2018, 06:00:14 PM »

Fans of the simple life found relief of late as a once torrential stream of new distributed databases seemed to slow. But the spigot is still on. As new requirements percolate, additional databases continue to bubble up. New requirements can include global transaction support, the General Data Protection Regulation and more.
Take CockroachDB from Cockroach Labs as an example. Early implementer Kindred Group plc has been testing out this whimsically named distributed SQL database system -- one fashioned on the model of Google's Spanner technology -- as it seeks to achieve the high levels of scalability required for an online gambling business that spans the planet.

Like other companies, Kindred is pursuing a global business strategy. That can mean, for example, taking bets in real time on sundry wagers made by punters in Australia who are watching tennis matches in France. The need to support that distant user has implications for cloud architecture and database choices.

"If you are betting on the next point in a tennis match, it's not possible to run that from a single data center," said Will Mace, head of U.K.-based Kindred Futures, an innovation unit within Kindred Group.

Instead, he said, you tend to move the database closer to the user.

When Mace came on board four years ago, Kindred was reviewing its technology platform to see how it fit in with its global growth plans. The study confirmed the need for new approaches to data handling.

"The business was based out of a single data center," he said. "It became clear it was not going to support the commercial targets."

Mace and his team were aware of globe-spanning Google Spanner, one of a new breed of distributed SQL database systems grouped together under the NewSQL technology umbrella. Initially described by Google in a 2012 research paper, Spanner was built to support large-scale online transaction processing, SQL and transactional consistency across multiple cloud availability regions.

High-speed data engine swap
Google's database seemed like a means to move transactions closer to the users, wherever they might be. But Spanner was not commercially available when Mace's team set about to switch out an existing relational database, which he declined to identify. Moreover, Spanner's use is limited to the Google Cloud, where it's offered in the Google Cloud Spanner service that was launched in May 2017.

So, Kindred began considering CockroachDB instead. Cockroach Labs built a database with some similarities to Spanner, open sourced it and has gone about supporting multiple cloud and on-premises implementations.

"It allows us to run our services and optimize the performance for our customers wherever they are in the world," Mace said. "It's not useful having transactions run around the world and back again. It's better to put the data center closer to the customer while still being able to replicate the data across different data centers, and keep the database in sync in the different places."
Mace said his team has been working very closely with the Cockroach Labs crew in something of a partnership to roll out a transactional system that can handle online bets. But the move to a new operational database is a delicate one.

"It's like swapping out the engine of a car while it's going 100 miles per hour," he said.

The effort is now in a testing environment.
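Mace's point about serving each customer from a nearby data center can be sketched in a few lines. This is a minimal illustration, not how CockroachDB routes requests. It assumes a hypothetical table of round-trip latencies from two user locations to three hypothetical regions and picks the region with the lowest latency. Keeping the replicas in sync, which the database itself handles, is not modeled.

```python
# Hypothetical round-trip latencies in milliseconds from each user location to each region.
LATENCY_MS = {
    "sydney": {"eu-west": 280, "us-east": 200, "ap-southeast": 15},
    "london": {"eu-west": 10, "us-east": 75, "ap-southeast": 270},
}


def nearest_region(user_location):
    """Pick the region with the lowest measured latency for this user."""
    latencies = LATENCY_MS[user_location]
    return min(latencies, key=latencies.get)


print(nearest_region("sydney"))  # ap-southeast
print(nearest_region("london"))  # eu-west
```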

GDPR, data and jurisdictions
CockroachDB has been in production for over a year, according to Cockroach Labs CEO Spencer Kimball. He described it as a cloud-native technology, and said CockroachDB's development team views the distributed SQL database as an elastically expandable system with a shared-nothing, symmetric architecture that can "heal itself autonomously."
According to Kimball, CockroachDB takes its moniker from the irrepressible urban members of the phylum Arthropoda that are sometimes found in New York City, which is Cockroach Labs’ home base. The gritty species’ enduring resilience is seen as a plus -- thus, the name.

Kimball said geo-replication, which lets database architects keep data closer to users and thereby reduce latency, was a key feature of CockroachDB 1.0. The recently released CockroachDB 2.0 offers geo-partitioning, which enables developers to create policies that assign the location of a user's data. It also aligns nicely with one of the hot topics of the day: General Data Protection Regulation (GDPR) compliance requirements in the European Union (EU).

Geo-partitioning of a distributed database system has benefits in this regard, according to Kindred's Mace. It is important as the EU starts to enforce the GDPR requirements that will affect the collection and processing of personal data not only in Europe, but also throughout much of the world.

"We have a desire to keep data in the jurisdiction within which it is given to us," Mace said.

With the updated version of CockroachDB, he said, his company is able to apply a particular location's specific data restrictions to the relevant data.
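The idea of a placement policy can be illustrated in application terms. This is a minimal sketch under stated assumptions: the country-to-region mapping and the `GeoPartitionedTable` class are hypothetical. In CockroachDB the policy is expressed through the database's own SQL and configuration, not through application code like this. Each record is stored only in the partition its country maps to, and data from a country the policy does not cover is refused rather than placed somewhere arbitrary.

```python
# Hypothetical policy: which region may hold data for users from each country.
PARTITION_POLICY = {
    "DE": "eu", "FR": "eu", "SE": "eu",
    "AU": "apac",
    "US": "us",
}


class GeoPartitionedTable:
    def __init__(self, policy):
        self.policy = policy
        # One partition (here just a dict) per region named in the policy.
        self.partitions = {region: {} for region in set(policy.values())}

    def insert(self, user_id, country, record):
        region = self.policy.get(country)
        if region is None:
            # Refuse to store data whose placement the policy does not cover.
            raise ValueError(f"no partition policy for country {country!r}")
        self.partitions[region][user_id] = record
        return region


table = GeoPartitionedTable(PARTITION_POLICY)
print(table.insert("u1", "DE", {"name": "Anna"}))  # eu
print(table.insert("u2", "AU", {"name": "Jack"}))  # apac
```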

Software Engineering / The right way to pick a cloud database

« on: May 09, 2018, 05:58:03 PM »

It’s all the rage these days: moving on-premises data to the cloud. But should you use cloud-native databases or databases that run both in the cloud and on-premises?

There are trade-offs between the cloud-native (meaning “cloud-only”) and dual cloud/on-premises options, mostly involving cost and operational efficiency.

Of course, the question of which database to use can be a contentious one. Many IT organizations have used a specific enterprise database for years, and they’re not about to give up that database in the cloud. The good news: Your favorite on-premises database runs in the cloud as well.

But cloud-native databases, such as AWS Redshift and AWS DynamoDB, are sound alternatives for those traditional databases that run in both the public cloud and on traditional systems. If you’re not wed to your on-premises database, you should look at such cloud-native databases.

Cloud-native databases have both cost and performance efficiencies from using cloud-native services in a public cloud. So, all things considered, they should be cheaper and faster.

The downside of cloud-native databases is that if you need to move the data back to your premises, you’ll need to convert the data into the structure of the on-premises databases, such as Oracle or IBM DB2.

Of course, the upside of using databases that run both on-premises and in the cloud (such as Oracle, SQL Server, and many others) is that you should be able to do migrations lickety-split, and even do real-time replications between the on-premises and cloud-based versions of the same database without having to go through data-structure transformations.

Either way, don’t forget to get real about the total costs. While it’s easy to determine the ops costs of databases (whether in the cloud or not), you need to consider the cost of the DBA work, backup and recovery, data integration, security, and data governance. Getting those numbers takes some digging.
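The cost categories listed above can be totaled in a simple model. This is only a sketch: the `AnnualCosts` class is hypothetical, and the figures are invented solely to show that the option with the smaller cloud bill does not automatically have the smaller total. They are not estimates for any real product.

```python
from dataclasses import dataclass


@dataclass
class AnnualCosts:
    """Yearly cost components for one database option, in whole currency units."""
    cloud_bill: int           # the visible cost of running the database itself
    dba_labor: int
    backup_and_recovery: int
    data_integration: int
    security: int
    data_governance: int

    def total(self) -> int:
        return (self.cloud_bill + self.dba_labor + self.backup_and_recovery
                + self.data_integration + self.security + self.data_governance)


# Hypothetical figures, chosen only to show that the smaller cloud bill
# does not automatically mean the smaller total.
option_a = AnnualCosts(40_000, 35_000, 5_000, 25_000, 8_000, 7_000)
option_b = AnnualCosts(55_000, 20_000, 6_000, 10_000, 8_000, 7_000)

print(option_a.total())  # 120000
print(option_b.total())  # 106000
```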

At the end of the day, the decision comes down to money. Usually, the cheapest solution (in terms of the size of the cloud bill) that meets the needs of the business wins the day—or should.

The trouble is that I’m often fighting database fanboys, who never want to move to a different database than what they have in the datacenter. Of course, if that existing database is cheaper to run on the public cloud and on-premises, I’m all for it. If.

Software Engineering / Cryptocurrencies: Dawn of a new economy

« on: April 30, 2018, 02:26:55 PM »

Mostly due to their revolutionary properties, cryptocurrencies have become a success that Bitcoin’s inventor, Satoshi Nakamoto, didn’t dare to dream of. While every other attempt to create a digital cash system failed to attract a critical mass of users, Bitcoin had something that provoked enthusiasm and fascination. Sometimes it feels more like religion than technology.
Cryptocurrencies are digital gold. Sound money that is secure from political influence. Money that promises to preserve and increase its value over time. Cryptocurrencies are also a fast and comfortable means of payment with a worldwide scope, and they are private and anonymous enough to serve as a means of payment for black markets and any other outlawed economic activity.

But while cryptocurrencies are increasingly used for payment, their use as a means of speculation and a store of value dwarfs the payment aspect. Cryptocurrencies gave birth to an incredibly dynamic, fast-growing market for investors and speculators. Exchanges like OKCoin, Poloniex or ShapeShift enable trading of hundreds of cryptocurrencies. Their daily trade volume exceeds that of major European stock exchanges.

At the same time, the practice of the Initial Coin Offering (ICO), mostly facilitated by Ethereum’s smart contracts, gave life to incredibly successful crowdfunding projects, in which an idea alone is often enough to collect millions of dollars. In the case of “The DAO”, it was more than 150 million dollars.

In this rich ecosystem of coins and tokens, you experience extreme volatility. It’s common for a coin to gain 10 percent in a day (sometimes 100 percent) only to lose the same the next day. If you are lucky, your coin’s value grows by up to 1000 percent in one or two weeks.
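Percentage swings like these compound multiplicatively, so "losing the same" depends on whether the same percentage or the same amount is meant. Losing the same percentage after a gain leaves you below where you started. Giving back a 100 percent gain takes only a 50 percent fall, while a 100 percent fall wipes the value out. Growth of 1000 percent means the value ends at 11 times its start. The sketch below checks this arithmetic from a hypothetical starting value of 100.

```python
def apply_moves(start, percent_moves):
    """Apply a sequence of percentage moves (e.g. +10, -10) multiplicatively."""
    value = start
    for pct in percent_moves:
        value *= 1 + pct / 100
    return value


print(round(apply_moves(100, [+10, -10]), 2))    # 99.0
print(round(apply_moves(100, [+100, -50]), 2))   # 100.0
print(round(apply_moves(100, [+100, -100]), 2))  # 0.0
print(round(apply_moves(100, [+1000]), 2))       # 1100.0
```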

While Bitcoin remains by far the most famous cryptocurrency and most other cryptocurrencies have zero non-speculative impact, investors and users should keep an eye on several cryptocurrencies. Here we present the most popular cryptocurrencies of today.