Today, I am going to explain physical design process.
Purposes of this article creation
- To be able to create DB tables in MySQL in Ruby on Rails on my own.
- To be able to do physical design.
Check Capacity And Performance Requirements and Calculate Amount Of Data Required And Data Area Process
In this case, I will create physical data model for a SNS task management application on Cacoo as a deliverable.
【Process】
- Check Capacity And Performance Requirements.
Check how many users access the site, how often they access the site, how much data they register, and the level of performance required by the users.
※In the case of SNS, real-time performance is required, so quick response is necessary. In the case of a task management service, real-time performance is not necessary to that extent(even if updates are a little slow, it is not a problem or fatal).
※Performance is measured when a prototype is created and tested, so it is skipped here.
- Calculate Amount Of Data Required And Data Area.
Based on the information (for data requirements and performance requirements) obtained in step 1, determine the amount of data and data area required by the DB.
※Because this is a very difficult task, it is difficult to make an estimate out of the blue for new development. Since it is impossible to know what cannot be estimated, it is best to be aware of the fact that the capacity and performance should be fairly generous.
※For excessive performance, it is enough to complain that “it could have been done more cheaply…,” but for insufficient performance, it could lead to a large number of failures and, if done poorly, a large amount of damage. To solve such problems, the introduction of “cloud services” have enough room for consideration.
1.Check Capacity And Performance Requirements.
※Create the DB requirements for grow-sns here (skip about performance this time).
【Main contents to be checked by looking at the specifications(Examples)】
- User data additions(Capacity, such as how much data is added per day or per year).
- Frequency of user access(The contents such as how much data is accessed at a time).
- Performance required for service operation(Based on the above information, how much performance is required for the network and hardware in order for the business to run smoothly).
[Required to estimate data requirements]
- User data additions
- Amount of data added per day
- ~ per month
- ~ per year
[Required to estimate performance]
- Frequency of user access
- Amount of access in a moment
- Amount of access per day
- ~ per month
- ~ per year
[Check if performance is stated]
- Performance required for service operation
- Required Network
- Required Hardware
- Required DB
※If you have a minimum architecture using the AWS pay-as-you-go system, you will not have that much trouble, so we do not define it that strictly (because when developing a new service, it is unknown how many users will use it. It is sufficient to implement the minimum architecture and then change the architecture as the number of users increases).
Performance
【If the project is completely newly developed and cannot be predicted】
- Check the values of similar systems (if the configuration is different from that of similar systems, the results may be misleading).
- Create a prototype and conduct performance testing in the early development phase.
※Most projects have short deadlines, and most of the time we don’t have the time to actually do that much.
2.Calculate Amount Of Data Required And Data Area.
- Calculate Amount Of Data Required And Data Area.
- Calculate the amount of data required based on the “amount of data to be added/access frequency/performance requirements” to avoid DB disk capacity shortage.
- Understand the method of DB redundancy in AWS RDS and Aurora, and the pay-as-you-go system.
Amount Of Data Required And Data Area
【If the project is completely newly developed and cannot be predicted】
- Make sure to have a certain amount of storage capacity to spare.
- Make sure to have a configuration that allows for easy addition of storage capacity.
※If the system is “only storing data for the most recent year,” it may not increase that much, but basically, most systems are increasing rapidly.
Deliverables
【Performance】
- skip.
※Performance is measured when a prototype is created and tested, so it is skipped here. Once Performance is known, instances of RDS, etc. can be determined.
【Amount Of Data Required】
[Amount of data added per day]
- 30MB
- tasks:about 5/day
- likes:about 10/day
- relationships:about 5-10/day
- notifications(Dependent on likes and relationships.):about 15-20/day
[Amount of data added per month]
- 0.9GB
[Number of users increased]
- 100 users increases every 3 months.
※The actual number of users will not increase this nicely, but for ease of estimation, this figure will be used in this case. Actual numbers will be monitored by running the service.
【Data Area】
- 125GB
- 1year:37.8GB
- 2year:125GB
- 3year:233GB
- 4year:341GB
- 5year:469GB
[Why?]
- If you have about 125GB listed between the above 1-3 years of data, it will last for the first 1-2 years.
- If the amount of data changes more than predicted in the first 1-2 years, you can change the AWS DB instance type each time.
- Do not define it too strictly, but decide after actually operating the service. Put flexibility into the data volume until you make the change.
References
【Estimating the data capacity(data size) of a table】
- 「【データベース設計】ハードウェアのサイジング(容量と性能)を決める。(物理設計)」:http://www.code-magagine.com/?p=1725
- 「DB勉強会レポート_2019年01月15日(火)実施分【後編】」:https://techracho.bpsinc.jp/oasist/2019_02_14/69730
- 「[データベース設計編]レコード長×件数でデータ容量を決めてはいけない」:https://xtech.nikkei.com/it/article/COLUMN/20090512/329850/
- 「DB 設計時のサイズ見積り[最新版]」:https://qiita.com/yone098@github/items/b5b31c1a9feb8307febd
- 「11.6 Data Type Storage Requirements」:https://dev.mysql.com/doc/refman/5.6/en/storage-requirements.html#data-types-storage-reqs-strings
【Performance】
Summary/What I learned this time
【Summary】
What I learned this time was about how to estimate data capacity and performance of a DB. I was able to estimate data capacity. The performance is to be determined after prototyping, test operation, and load experiments.
※The rest of the article will be updated when performance is measured after prototyping.
【What I learned this time】
By learning how to estimate data requirements, performance, etc., I was able to calculate the cost of running the service. It also gave me the opportunity to learn about methods to improve the performance of the service, and I learned a lot.
By doing data requirements calculations/requirements and cost estimates for many services, I want to be able to estimate more quickly and more accurately.